AI TRAINING
Fine-Tuning LLMs: When, How, and Why
Choose confidently between fine-tuning, prompting, and RAG, and execute the right approach.
What it covers
Participants work through a structured decision framework comparing prompting, retrieval-augmented generation (RAG), and fine-tuning along the dimensions of cost, latency, and quality. The programme covers dataset curation, instruction-tuning formats, LoRA/QLoRA techniques, evaluation design, and deployment cost modelling. Hands-on labs use open-source tools (Hugging Face, Axolotl, LM Evaluation Harness) on realistic business datasets. By the end of the course, teams are able to scope, execute, and evaluate a fine-tuning project on their own infrastructure.
By the end, you will be able to
- Apply a structured decision tree to determine whether prompting, RAG, or fine-tuning is the right approach for a given use case
- Curate and format a domain-specific instruction dataset suitable for supervised fine-tuning
- Run a QLoRA fine-tuning job on an open-source model using Hugging Face TRL or Axolotl
- Design and execute an evaluation suite combining automated metrics and LLM-as-judge scoring
- Estimate total cost of ownership (GPU compute, storage, inference) for a fine-tuned model vs hosted API alternatives
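The first outcome, a structured decision tree for prompting vs RAG vs fine-tuning, can be sketched as a small rule-of-thumb function. The three questions and their ordering here are illustrative assumptions, not the course's actual framework:

```python
def choose_approach(needs_fresh_knowledge: bool,
                    needs_style_or_format_control: bool,
                    prompt_baseline_good_enough: bool) -> str:
    """Illustrative rule of thumb for prompting vs RAG vs fine-tuning.

    The questions and their priority order are assumptions for this
    sketch, not the course's exact decision tree.
    """
    if prompt_baseline_good_enough:
        return "prompting"       # cheapest: no retrieval, no training
    if needs_fresh_knowledge:
        return "RAG"             # inject up-to-date or proprietary facts
    if needs_style_or_format_control:
        return "fine-tuning"     # bake tone/format/behaviour into weights
    return "prompting"           # default to the cheapest option

# e.g. a support bot over a frequently changing knowledge base:
print(choose_approach(True, False, False))  # prints "RAG"
```

The ordering encodes one common heuristic: exhaust the cheap options (prompting, then retrieval) before paying for training.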
Topics covered
- Prompting vs RAG vs fine-tuning: a cost-quality-latency decision tree
- Dataset curation, cleaning, and instruction-format design (JSONL, ShareGPT, Alpaca)
- Full fine-tuning vs parameter-efficient methods: LoRA, QLoRA, prefix-tuning
- Supervised fine-tuning (SFT) and RLHF/DPO alignment techniques
- Evaluation frameworks: BLEU, ROUGE, LLM-as-judge, domain-specific benchmarks
- Tooling selection: Hugging Face TRL, Axolotl, LLaMA-Factory, OpenAI fine-tune API
- Infrastructure and cost modelling: GPU hours, cloud vs on-prem, quantisation tradeoffs
- Deployment and monitoring of fine-tuned models in production
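The instruction-format topic above can be made concrete with a minimal sketch: one Alpaca-style record written to JSONL, the usual on-disk format for SFT datasets. The field names follow the common Alpaca convention; the record content is invented for illustration:

```python
import json

# One Alpaca-style instruction record (instruction / input / output).
# The content is an invented example, not course material.
record = {
    "instruction": "Classify the sentiment of the customer message.",
    "input": "The replacement part arrived two weeks late.",
    "output": "negative",
}

# SFT datasets are typically stored as JSONL: one JSON object per line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Reading it back, one record per line:
with open("train.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
print(rows[0]["output"])  # prints "negative"
```

ShareGPT-style data differs mainly in shape: a `conversations` list of alternating roles instead of the flat three-field record.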
Format
Delivered over 2–3 days, either in-person or fully remote via video conferencing with shared cloud GPU environments (e.g., Lambda Labs, RunPod, or AWS). Approximately 60% hands-on labs, 40% instruction and discussion. Participants receive a pre-configured notebook repository and retain access to lab materials post-training. A short async pre-work module (2–3 hours) on transformer fundamentals is recommended for mixed-level cohorts.
What makes it work
- Define a measurable evaluation benchmark before writing a single training example
- Start with the smallest model that meets quality requirements to minimise compute cost
- Invest heavily in dataset quality and diversity — model behaviour reflects data behaviour
- Track experiments rigorously (Weights & Biases, MLflow) to enable reproducibility and regression detection
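The first practice above, defining a measurable benchmark before writing any training example, can be as small as a fixed list of cases and an exact-match scorer. This is a minimal sketch with invented cases and a stub standing in for a real model call:

```python
# A tiny fixed benchmark, defined before any training data exists.
# Cases and labels are illustrative, not from the course.
BENCHMARK = [
    {"prompt": "Ticket priority for 'site is down'", "expected": "P1"},
    {"prompt": "Ticket priority for 'typo in footer'", "expected": "P4"},
]

def exact_match_score(model_fn, benchmark) -> float:
    """Fraction of benchmark cases the model answers exactly right."""
    hits = sum(1 for case in benchmark
               if model_fn(case["prompt"]).strip() == case["expected"])
    return hits / len(benchmark)

# Stub "model" standing in for a real inference call:
def stub_model(prompt: str) -> str:
    return "P1" if "down" in prompt else "P2"

print(exact_match_score(stub_model, BENCHMARK))  # prints 0.5
```

Running the same scorer before and after fine-tuning gives the regression signal the "Erreurs fréquentes" section warns is otherwise missing.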
Common mistakes
- Fine-tuning when a well-crafted system prompt or RAG pipeline would solve the problem at a fraction of the cost
- Using too little or poorly cleaned training data, producing a model that overfits or degrades on out-of-distribution inputs
- Neglecting evaluation design before training — leading to no reliable signal on whether the fine-tune actually improved the model
- Ignoring inference cost and latency implications of larger fine-tuned models compared to smaller prompted alternatives
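The last mistake, ignoring inference cost, is easy to catch with back-of-the-envelope arithmetic. This sketch compares a self-hosted fine-tuned model against a hosted API; every price and throughput figure is an illustrative assumption, not a quote:

```python
# All figures below are assumptions for illustration, not real prices.
GPU_HOUR_USD = 2.50              # assumed on-demand price, one A100-class GPU
TOKENS_PER_GPU_HOUR = 3_000_000  # assumed sustained serving throughput
API_USD_PER_1K_TOKENS = 0.01     # assumed hosted-API price

def self_hosted_cost(tokens: int) -> float:
    """GPU cost to serve `tokens` at the assumed throughput."""
    return tokens / TOKENS_PER_GPU_HOUR * GPU_HOUR_USD

def api_cost(tokens: int) -> float:
    """Hosted-API cost for the same volume."""
    return tokens / 1000 * API_USD_PER_1K_TOKENS

monthly_tokens = 500_000_000
print(round(self_hosted_cost(monthly_tokens), 2))  # prints 416.67
print(round(api_cost(monthly_tokens), 2))          # prints 5000.0
```

Under these assumptions self-hosting wins at high volume, but the break-even point moves sharply with utilisation: an idle reserved GPU still bills by the hour, while API pricing is purely per token.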
When NOT to take this course
A team that has never shipped an LLM-powered feature to production and is jumping straight to fine-tuning to avoid prompt-engineering work. Such a team should first validate the use case with prompting before taking on the complexity and cost of fine-tuning.
This course is part of a Data & AI catalogue built for leaders who are serious about execution. Run the free diagnostic to see which courses are priorities for your team.