
AI TRAINING

Fine-Tuning Small Language Models for Production

Build, evaluate, and deploy fine-tuned LLMs using LoRA and QLoRA on real tasks.

Format
bootcamp
Duration
24–40h
Level
advanced
Group size
6–16
Price / participant
€2K–€4K
Group price
€18K–€45K
Audience
ML engineers and AI engineers with existing deep learning experience who want to fine-tune open-weight LLMs for production use cases
Prerequisites
Solid Python and PyTorch skills, familiarity with transformer architecture basics, and access to a GPU environment (cloud or local)

What it covers

This hands-on bootcamp covers the full fine-tuning lifecycle for open-weight models such as Llama, Mistral, and Gemma. Participants compare full fine-tuning, LoRA, and QLoRA approaches, curate domain-specific datasets, run rigorous evaluation harnesses, and ship models to inference endpoints. Sessions balance theory with GPU-backed lab work so engineers leave with a working fine-tuned model and a repeatable workflow.
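The dataset-curation step mentioned above can be sketched as a small pipeline: normalise and hash-deduplicate raw examples, then serialise each one as a JSONL line in the `messages` chat format that TRL and Axolotl both accept. This is an illustrative sketch, not the bootcamp's exact code; function names are ours, and you should check your toolchain's expected schema.

```python
import hashlib
import json

def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical examples hash equal."""
    return " ".join(text.lower().split())

def dedupe(examples: list[str]) -> list[str]:
    """Drop exact duplicates (after normalisation) while preserving order."""
    seen, kept = set(), []
    for ex in examples:
        digest = hashlib.sha256(normalise(ex).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(ex)
    return kept

def to_chat_record(system: str, user: str, assistant: str) -> str:
    """Serialise one example as a JSONL line in the common `messages` format."""
    return json.dumps({"messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]}, ensure_ascii=False)
```

Exact-hash deduplication is the cheapest first pass; fuzzy methods (MinHash, embedding similarity) catch near-duplicates it misses.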

What you'll be able to do

  • Select the appropriate fine-tuning strategy (full, LoRA, QLoRA) for a given constraint set on memory, compute, and target performance
  • Curate and format a domain-specific instruction or chat dataset ready for supervised fine-tuning
  • Run a complete LoRA or QLoRA training job on a 7B–13B parameter model using Axolotl or TRL with correct hyperparameter choices
  • Build an evaluation harness combining automated benchmarks and preference scoring to detect overfitting and alignment drift
  • Quantise and deploy a fine-tuned model to a production inference endpoint and measure latency/throughput trade-offs
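The first outcome above, picking between full fine-tuning, LoRA, and QLoRA, often starts with a back-of-envelope VRAM estimate. The bytes-per-parameter figures below are common rules of thumb, not exact numbers: they cover model and optimizer states only and ignore activations, KV cache, and framework overhead.

```python
# Rough GPU-memory heuristic for choosing a fine-tuning strategy.
BYTES_PER_PARAM = {
    "full": 16.0,   # bf16 weights + grads, fp32 master weights and AdamW moments
    "lora": 2.0,    # frozen bf16 base; adapter states are comparatively negligible
    "qlora": 0.55,  # 4-bit NF4 base plus quantisation constants
}

def estimate_vram_gb(n_params: float, strategy: str) -> float:
    """Back-of-envelope VRAM estimate in GB for the model states alone."""
    return n_params * BYTES_PER_PARAM[strategy] / 1e9

def pick_strategy(n_params: float, vram_gb: float, headroom: float = 0.7) -> str:
    """Pick the least restrictive strategy whose states fit in `headroom`
    of available VRAM (the rest is reserved for activations and cache)."""
    for strategy in ("full", "lora", "qlora"):
        if estimate_vram_gb(n_params, strategy) <= vram_gb * headroom:
            return strategy
    raise ValueError("model does not fit even with QLoRA; shard or shrink it")

# A 7B model on a single 24 GB card: full FT needs ~112 GB of states,
# LoRA ~14 GB, so LoRA is the first option that fits.
print(pick_strategy(7e9, 24))  # -> lora
```

Treat the output as a starting point for profiling, not a guarantee: sequence length and batch size can dominate the ignored activation term.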

Topics covered

  • Full fine-tuning vs parameter-efficient methods (LoRA, QLoRA, DoRA)
  • Dataset curation: collection, cleaning, deduplication, and formatting (instruction, chat, completion formats)
  • Training configuration: learning rate schedules, batch sizing, gradient accumulation, mixed precision
  • PEFT and Hugging Face TRL / Axolotl / LLaMA-Factory toolchains
  • Evaluation harness design: perplexity, task-specific benchmarks, human-preference scoring
  • Overfitting, catastrophic forgetting, and alignment drift diagnostics
  • Model quantisation (GGUF, GPTQ, AWQ) for efficient inference
  • Deployment to inference endpoints (vLLM, Ollama, Hugging Face Inference Endpoints, cloud APIs)
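One building block of the evaluation-harness topic above: perplexity is just the exponential of the mean per-token negative log-likelihood, so it can be computed from any trainer's logged token losses. A minimal sketch, assuming losses are in nats (natural log), as PyTorch's cross-entropy reports them:

```python
import math

def perplexity(token_nlls: list[float]) -> float:
    """Perplexity = exp of the mean per-token negative log-likelihood."""
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns every token probability 1 has perplexity 1;
# a mean NLL of ln(2) corresponds to perplexity ~2.
print(perplexity([0.0, 0.0]))         # -> 1.0
print(perplexity([math.log(2)] * 4))  # ~2.0
```

Perplexity alone cannot detect alignment drift, which is why the curriculum pairs it with task benchmarks and preference scoring.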

Delivery

Delivered over 3–5 consecutive days, either in person or live-remote via video call with a shared GPU cloud workspace (e.g., Lambda Labs or RunPod). Each day follows an 80/20 hands-on-to-lecture ratio. Participants receive a starter repo, pre-processed sample datasets, and evaluation scripts. A private Slack or Discord channel provides async support for 30 days post-training. In-person delivery requires a venue with stable internet; cloud GPU costs are typically billed separately or included in the group price tier.

What makes it work

  • Defining a narrow, well-scoped task with clear success metrics before touching any training code
  • Investing at least 40% of total project time in dataset curation and quality checks
  • Using automated evaluation loops (e.g., LM-Eval Harness or custom task suites) from day one to catch regressions early
  • Running a small-scale baseline experiment before committing compute to full training runs
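The automated-evaluation-loop practice above can be reduced to a gate that runs after every experiment: compare candidate scores against a frozen baseline and fail loudly on any drop beyond tolerance. A sketch with made-up task names and scores:

```python
def regression_gate(baseline: dict, candidate: dict, tolerance: float = 0.01) -> list:
    """Return the tasks where the candidate scores worse than the baseline
    by more than `tolerance` (absolute). Running this after every training
    run surfaces regressions on held-out tasks immediately."""
    return [
        task for task, base_score in baseline.items()
        if candidate.get(task, 0.0) < base_score - tolerance
    ]

# Hypothetical scores: the domain task improved, but gsm8k regressed
# past tolerance -- exactly the trade-off spot-checks tend to miss.
baseline = {"gsm8k": 0.41, "domain_qa": 0.78, "mmlu": 0.62}
candidate = {"gsm8k": 0.39, "domain_qa": 0.85, "mmlu": 0.62}
print(regression_gate(baseline, candidate))  # -> ['gsm8k']
```

In practice the score dicts would come from LM-Eval Harness or a custom task suite rather than hand-entered values.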

Common mistakes

  • Training on too little or poorly cleaned data and attributing poor results to the model architecture rather than the dataset
  • Choosing QLoRA without profiling the actual GPU memory footprint, leading to unexpected OOM errors in production
  • Skipping a rigorous evaluation harness and relying on qualitative spot-checks that miss regression on held-out tasks
  • Deploying the raw adapter weights without merging or quantising, resulting in inference latency far above baseline
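On the last mistake: serving raw, unmerged adapters keeps the extra LoRA matmuls on the critical path of every forward pass, whereas merging folds the low-rank update into the base weights at zero inference cost. A quick way to size that overhead, using hypothetical shapes (rank-16 adapters on the four attention projections of a 32-layer, hidden-size-4096 model):

```python
def lora_extra_params(r: int, layers: list) -> int:
    """Extra parameters carried by unmerged LoRA adapters: each adapted
    weight of shape (d_out, d_in) gains A (r x d_in) and B (d_out x r).
    Merged inference pays none of this; unmerged serving multiplies
    through these matrices on every token."""
    return sum(r * (d_in + d_out) for d_in, d_out in layers)

# Four square 4096x4096 projections per layer, 32 layers, rank 16.
layers = [(4096, 4096)] * 4 * 32
print(lora_extra_params(16, layers))  # -> 16777216 (~16.8M extra parameters)
```

The absolute count is small next to a 7B base, but the extra matmuls and memory traffic per token are what push latency above the merged baseline.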

When NOT to take this

This bootcamp is the wrong fit for teams that have not yet identified a concrete downstream task — organisations still exploring whether LLMs are relevant to their problem should start with an awareness or literacy programme before investing in fine-tuning infrastructure.


This training is part of a Data & AI catalog built for leaders serious about execution. Take the free diagnostic to see which trainings your team needs.