AI TRAINING
Fine-Tuning Small Language Models for Production
Build, evaluate, and deploy fine-tuned LLMs using LoRA and QLoRA on real tasks.
What it covers
This hands-on bootcamp covers the full fine-tuning lifecycle for open-weight models such as Llama, Mistral, and Gemma. Participants compare full fine-tuning, LoRA, and QLoRA approaches, curate domain-specific datasets, run rigorous evaluation harnesses, and ship models to inference endpoints. Sessions balance theory with GPU-backed lab work so engineers leave with a working fine-tuned model and a repeatable workflow.
What you'll be able to do
- Select the appropriate fine-tuning strategy (full, LoRA, QLoRA) for a given constraint set on memory, compute, and target performance
- Curate and format a domain-specific instruction or chat dataset ready for supervised fine-tuning
- Run a complete LoRA or QLoRA training job on a 7B–13B parameter model using Axolotl or TRL with correct hyperparameter choices
- Build an evaluation harness combining automated benchmarks and preference scoring to detect overfitting and alignment drift
- Quantise and deploy a fine-tuned model to a production inference endpoint and measure latency/throughput trade-offs
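The dataset-formatting skill above can be sketched in a few lines. This is a minimal, hypothetical example (the field names `question`/`answer` and the system prompt are illustrative, not from any specific dataset) of converting a raw Q/A pair into the chat-messages format that SFT toolchains such as TRL and Axolotl accept:

```python
import json

# Hypothetical raw record from a support-ticket corpus; field names
# are illustrative only.
raw = {
    "question": "How do I reset my API key?",
    "answer": "Go to Settings > API Keys and click Regenerate.",
}

def to_chat_format(record, system_prompt="You are a helpful support assistant."):
    """Convert a Q/A pair into the chat-messages structure most
    SFT toolchains accept (one dict per turn, with role/content keys)."""
    return {"messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": record["question"]},
        {"role": "assistant", "content": record["answer"]},
    ]}

# Each training example becomes one JSONL line.
line = json.dumps(to_chat_format(raw))
print(line)
```

Writing each example as one JSON line keeps the dataset streamable and easy to deduplicate before training.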
Topics covered
- Full fine-tuning vs parameter-efficient methods (LoRA, QLoRA, DoRA)
- Dataset curation: collection, cleaning, deduplication, and formatting (instruction, chat, completion formats)
- Training configuration: learning rate schedules, batch sizing, gradient accumulation, mixed precision
- PEFT and Hugging Face TRL / Axolotl / LLaMA-Factory toolchains
- Evaluation harness design: perplexity, task-specific benchmarks, human-preference scoring
- Overfitting, catastrophic forgetting, and alignment drift diagnostics
- Model quantisation (GGUF, GPTQ, AWQ) for efficient inference
- Deployment to inference endpoints (vLLM, Ollama, HuggingFace Inference, cloud APIs)
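One training-configuration idea from the list above, gradient accumulation, comes down to simple arithmetic: the effective batch size is the per-device micro-batch times the accumulation steps times the device count. A quick sketch (the numbers are illustrative, not recommended hyperparameters):

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_gpus=1):
    """Effective batch = micro-batch x accumulation steps x GPUs.
    Gradient accumulation trades wall-clock time for memory: gradients
    from several small forward/backward passes are summed before one
    optimiser step."""
    return per_device_batch * grad_accum_steps * num_gpus

# To hit a target effective batch of 64 with micro-batches of 4 on one GPU:
target, micro, gpus = 64, 4, 1
steps = target // (micro * gpus)
assert effective_batch_size(micro, steps, gpus) == target
print(steps)  # 16 accumulation steps
```

The same arithmetic explains why halving the micro-batch to fit memory requires doubling the accumulation steps to keep training dynamics comparable.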
Delivery
Delivered over 3–5 consecutive days, either in-person or live-remote via video call with shared GPU cloud workspace (e.g., Lambda Labs or RunPod). Each day follows an 80/20 hands-on to lecture ratio. Participants receive a starter repo, pre-processed sample datasets, and evaluation scripts. A private Slack or Discord channel provides async support for 30 days post-training. In-person delivery requires a venue with stable internet; cloud GPU costs are typically billed separately or included in the group price tier.
What makes it work
- Defining a narrow, well-scoped task with clear success metrics before touching any training code
- Investing at least 40% of total project time in dataset curation and quality checks
- Using automated evaluation loops (e.g., LM-Eval Harness or custom task suites) from day one to catch regressions early
- Running a small-scale baseline experiment before committing compute to full training runs
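The "automated evaluation loops from day one" practice above can be as simple as a regression gate comparing candidate scores against a frozen baseline. A minimal sketch, with hypothetical task names and scores:

```python
def check_regressions(baseline, candidate, tolerance=0.02):
    """Return held-out tasks where the fine-tuned model scores more than
    `tolerance` below the baseline -- a cheap first guard against
    catastrophic forgetting. Scores are accuracy-like, higher is better."""
    return [task for task, base in baseline.items()
            if candidate.get(task, 0.0) < base - tolerance]

# Illustrative scores: the fine-tune improved the domain task but
# regressed on general math reasoning.
baseline  = {"gsm8k": 0.41, "mmlu": 0.58, "domain_qa": 0.62}
candidate = {"gsm8k": 0.33, "mmlu": 0.57, "domain_qa": 0.71}
print(check_regressions(baseline, candidate))  # → ['gsm8k']
```

Wiring a check like this into CI means every training run either passes the gate or surfaces exactly which capability drifted.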
Common mistakes
- Training on too little or poorly cleaned data and attributing poor results to the model architecture rather than the dataset
- Choosing QLoRA without profiling the actual GPU memory footprint, leading to unexpected out-of-memory (OOM) failures once the full workload runs
- Skipping a rigorous evaluation harness and relying on qualitative spot-checks that miss regression on held-out tasks
- Deploying the raw adapter weights without merging or quantising, resulting in inference latency far above baseline
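The memory-profiling mistake above is often avoidable with a back-of-envelope estimate before renting GPUs. A rough sketch (the 10% overhead factor is an assumption, and the estimate covers quantised base weights only, not activations, LoRA optimiser state, or KV cache, which add several GB more):

```python
def quantised_weight_memory_gb(n_params_billions, bits=4, overhead=1.10):
    """Rough floor for quantised base-model weights alone.
    bits=4 matches QLoRA-style NF4 quantisation; overhead is an
    assumed 10% for quantisation metadata and fragmentation."""
    bytes_needed = n_params_billions * 1e9 * bits / 8
    return bytes_needed / 1e9 * overhead

# A 7B model at 4-bit needs roughly 3.85 GB for weights alone.
print(round(quantised_weight_memory_gb(7), 2))  # → 3.85
```

Comparing this floor against the card's VRAM, minus headroom for activations and cache, tells you quickly whether a 7B or 13B run is realistic on the hardware you have.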
When NOT to take this
This bootcamp is the wrong fit for teams that have not yet identified a concrete downstream task — organisations still exploring whether LLMs are relevant to their problem should start with an awareness or literacy programme before investing in fine-tuning infrastructure.
This training is part of a Data & AI catalog built for leaders serious about execution. Take the free diagnostic to see which trainings your team needs.