
AI TRAINING

Fine-Tuning LLMs: When, How, and Why

Decide confidently whether to fine-tune, prompt, or use RAG — then execute it correctly.

Format
bootcamp
Duration
16–24h
Level
advanced
Group size
6–16
Price / participant
€2K–€4K
Group price
€18K–€45K
Audience
ML engineers, AI engineers, and technical leads responsible for LLM integration or productionisation
Prerequisites
Solid Python skills, working knowledge of transformer architecture basics, and prior experience deploying or calling LLM APIs

What it covers

Participants work through a structured decision framework comparing prompting, retrieval-augmented generation, and fine-tuning across cost, latency, and quality dimensions. The programme covers dataset curation, instruction-tuning formats, LoRA/QLoRA techniques, evaluation design, and deployment cost modelling. Hands-on labs use open-source tooling (Hugging Face, Axolotl, LM Evaluation Harness) on realistic domain datasets. By the end, teams can confidently scope, execute, and evaluate a fine-tuning project in their own infrastructure.

What you'll be able to do

  • Apply a structured decision tree to determine whether prompting, RAG, or fine-tuning is the right approach for a given use case
  • Curate and format a domain-specific instruction dataset suitable for supervised fine-tuning
  • Run a QLoRA fine-tuning job on an open-source model using Hugging Face TRL or Axolotl
  • Design and execute an evaluation suite combining automated metrics and LLM-as-judge scoring
  • Estimate total cost of ownership (GPU compute, storage, inference) for a fine-tuned model vs hosted API alternatives
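The cost comparison in the last point can be sketched as a back-of-envelope model. All rates below are illustrative assumptions for the sketch, not real pricing:

```python
# Back-of-envelope TCO comparison: self-hosted fine-tuned model vs hosted API.
# Every rate here is an assumed, illustrative number, not a vendor quote.

def self_hosted_monthly_cost(gpu_hourly_rate: float, gpus: int,
                             hours_per_month: float = 730.0,
                             storage_monthly: float = 0.0) -> float:
    """Monthly cost of serving a fine-tuned model on dedicated GPUs."""
    return gpu_hourly_rate * gpus * hours_per_month + storage_monthly

def hosted_api_monthly_cost(requests_per_month: float,
                            tokens_per_request: float,
                            price_per_million_tokens: float) -> float:
    """Monthly cost of calling a hosted API for the same traffic."""
    total_tokens = requests_per_month * tokens_per_request
    return total_tokens / 1_000_000 * price_per_million_tokens

# Example: one mid-range GPU at an assumed $1.20/h plus $20/month storage,
# vs 2M requests/month at 1,500 tokens each and an assumed $0.50/M tokens.
self_hosted = self_hosted_monthly_cost(1.20, gpus=1, storage_monthly=20.0)
api = hosted_api_monthly_cost(2_000_000, 1_500, 0.50)
```

At these assumed rates the dedicated GPU is cheaper, but the crossover moves quickly with traffic, model size, and utilisation, which is why the lab works through the model with participants' own numbers.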

Topics covered

  • Prompting vs RAG vs fine-tuning: a cost-quality-latency decision tree
  • Dataset curation, cleaning, and instruction-format design (JSONL, ShareGPT, Alpaca)
  • Full fine-tuning vs parameter-efficient methods: LoRA, QLoRA, prefix-tuning
  • Supervised fine-tuning (SFT) and RLHF/DPO alignment techniques
  • Evaluation frameworks: BLEU, ROUGE, LLM-as-judge, domain-specific benchmarks
  • Tooling selection: Hugging Face TRL, Axolotl, LLaMA-Factory, OpenAI fine-tune API
  • Infrastructure and cost modelling: GPU hours, cloud vs on-prem, quantisation tradeoffs
  • Deployment and monitoring of fine-tuned models in production
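As a taste of the dataset-curation topic, converting raw examples into Alpaca-style JSONL is a few lines of Python. The field names follow the Alpaca convention; the sample records are invented for illustration:

```python
import json

def to_alpaca_jsonl(records):
    """Convert (instruction, input, output) dicts into Alpaca-style JSONL
    lines for supervised fine-tuning. Records without an input field get
    an empty string, matching the Alpaca convention."""
    lines = []
    for rec in records:
        lines.append(json.dumps({
            "instruction": rec["instruction"],
            "input": rec.get("input", ""),
            "output": rec["output"],
        }, ensure_ascii=False))
    return "\n".join(lines)

# Invented sample records, one with and one without an input field.
sample = [
    {"instruction": "Summarise the clause.",
     "input": "The lessee shall maintain the premises...",
     "output": "The tenant must keep the property in good repair."},
    {"instruction": "Classify the ticket priority.", "output": "high"},
]
print(to_alpaca_jsonl(sample))
```

Real curation is dominated by cleaning, deduplication, and quality filtering rather than serialisation, but getting the format right up front avoids reprocessing later.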

Delivery

Delivered over 2–3 days, either in-person or fully remote via video conferencing with shared cloud GPU environments (e.g., Lambda Labs, RunPod, or AWS). Approximately 60% hands-on labs, 40% instruction and discussion. Participants receive a pre-configured notebook repository and retain access to lab materials post-training. A short async pre-work module (2–3 hours) on transformer fundamentals is recommended for mixed-level cohorts.

What makes it work

  • Define a measurable evaluation benchmark before writing a single training example
  • Start with the smallest model that meets quality requirements to minimise compute cost
  • Invest heavily in dataset quality and diversity — model behaviour reflects data behaviour
  • Track experiments rigorously (Weights & Biases, MLflow) to enable reproducibility and regression detection
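The first point, defining the benchmark before touching training data, can start as small as a script. A minimal sketch, with a hypothetical combination of exact-match accuracy and pre-computed judge scores (a real harness would call a judge model and add domain metrics):

```python
def evaluate(predictions, references, judge_scores=None):
    """Tiny evaluation sketch: exact-match accuracy plus an optional mean
    LLM-as-judge score. judge_scores would come from a separate judge-model
    call; here they are passed in as plain floats for illustration."""
    assert len(predictions) == len(references), "mismatched eval sets"
    exact = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    report = {"exact_match": exact / len(references)}
    if judge_scores is not None:
        report["judge_mean"] = sum(judge_scores) / len(judge_scores)
    return report

# Run the same report on the base model before training and on every
# candidate checkpoint after, so improvements and regressions are visible.
baseline = evaluate(["high", "low"], ["high", "medium"], [4.0, 2.0])
```

Even a crude benchmark like this, run on the base model first, gives every later fine-tune a number to beat.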

Common mistakes

  • Fine-tuning when a well-crafted system prompt or RAG pipeline would solve the problem at a fraction of the cost
  • Using too little or poorly cleaned training data, producing a model that overfits or degrades on out-of-distribution inputs
  • Neglecting evaluation design before training — leading to no reliable signal on whether the fine-tune actually improved the model
  • Ignoring inference cost and latency implications of larger fine-tuned models compared to smaller prompted alternatives

When NOT to take this

A team that has never shipped an LLM-powered feature to production and is jumping straight to fine-tuning to avoid prompt engineering work. Such a team should first validate the use case with prompting before incurring the complexity and cost of fine-tuning.


This training is part of a Data & AI catalog built for leaders serious about execution. Take the free diagnostic to see which trainings your team needs.