AI TRAINING
Retrieval-Augmented Generation (RAG) in Production
Build, evaluate, and operate production RAG pipelines that are fast, accurate, and cost-effective.
What it covers
This practitioner-level programme takes engineers from RAG fundamentals through to production deployment: document ingestion strategies, chunking and embedding choices, retriever and reranker architectures, and evaluation frameworks. Participants implement end-to-end pipelines on real data, instrument them for observability, and apply caching and cost-control techniques. The format combines live-coding sessions, architecture reviews, and hands-on workshops built on open-source tools (LangChain, LlamaIndex, Weaviate, RAGAS). By the end of the programme, participants can ship and monitor a RAG system that meets latency, quality, and budget requirements.
By the end, you will be able to
- Design and implement a multi-stage RAG pipeline with chunking, embedding, retrieval, and reranking stages tuned for a real dataset
- Select and justify the right vector store and retriever architecture for a given latency and accuracy trade-off
- Evaluate RAG pipeline quality using RAGAS metrics (faithfulness, context precision, answer relevancy) and iterate systematically
- Instrument a RAG system with distributed tracing and set up alerts for retrieval quality degradation in production
- Apply semantic caching and query routing to reduce LLM API costs by at least 30% without sacrificing answer quality
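The multi-stage pipeline in the first outcome can be sketched in miniature. This is a toy illustration, not course material: the bag-of-words `embed` function stands in for a real embedding model, and the lexical-overlap `rerank` stands in for a cross-encoder.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Stage 1: dense-style top-k retrieval over pre-chunked documents."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def rerank(query: str, candidates: list[str], k: int = 1) -> list[str]:
    """Stage 2: rerank the candidates; a production system would use a
    cross-encoder or LLM-based reranker here."""
    q_terms = set(query.lower().split())
    overlap = lambda c: len(q_terms & set(c.lower().split())) / len(q_terms)
    return sorted(candidates, key=overlap, reverse=True)[:k]

chunks = [
    "RAG pipelines combine retrieval with generation.",
    "Vector stores index embeddings for fast similarity search.",
    "Reranking improves precision on multi-hop questions.",
]
query = "how does reranking improve precision"
top = rerank(query, retrieve(query, chunks))
```

The two-stage shape (cheap wide retrieval, then expensive narrow reranking) is the pattern the programme builds out with real models and vector stores.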
Topics covered
- Document ingestion pipelines and preprocessing strategies
- Chunking strategies: fixed, semantic, recursive, and late chunking
- Embedding model selection and fine-tuning for domain-specific retrieval
- Vector stores, hybrid search, and retriever architectures
- Reranking with cross-encoders and LLM-based rerankers
- RAG evaluation frameworks (RAGAS, TruLens, LangSmith)
- Caching, query routing, and cost control patterns
- Observability, tracing, and production monitoring for RAG systems
Format
Delivered as a blended programme over 3–5 days (on-site or remote), with approximately 60% hands-on labs and 40% instructor-led architecture sessions. Participants work in small teams on a capstone project using their own or provided datasets. All labs run in pre-configured cloud environments; no local GPU required. Printed architecture cheat sheets and a private GitHub repository with all lab code are included. Remote delivery uses Zoom breakout rooms with a lab assistant per group of four.
What makes it work
- Establishing an offline evaluation dataset with ground-truth QA pairs before writing any pipeline code
- Instrumenting retrieval and generation steps from day one with a tracing tool such as LangSmith or Arize Phoenix
- Running chunking and embedding ablations on a representative sample of real production documents before committing to an architecture
- Treating prompt templates and retrieval parameters as versioned artifacts subject to the same CI/CD discipline as application code
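The last practice above — treating retrieval parameters as versioned artifacts — might look like the following sketch. The field names and the hypothetical `example-embed-v1` model name are illustrative assumptions; the point is the stable fingerprint CI can check against.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class RetrievalConfig:
    """Retrieval parameters treated as a versioned artifact, not ad-hoc constants."""
    chunk_size: int = 512
    chunk_overlap: int = 64
    top_k: int = 8
    rerank_top_n: int = 3
    embedding_model: str = "example-embed-v1"  # hypothetical model identifier

    def fingerprint(self) -> str:
        """Stable hash of the full config so CI can detect any parameter drift."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()[:12]

base = RetrievalConfig()
tuned = RetrievalConfig(top_k=12)
```

Committing the fingerprint alongside evaluation results ties every quality measurement to the exact configuration that produced it, the same discipline applied to application code.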
Common mistakes
- Using fixed-size chunking for all document types without considering semantic boundaries, leading to poor retrieval precision
- Skipping systematic evaluation and relying on anecdotal spot-checks, so quality regressions go unnoticed in production
- Ignoring reranking entirely and assuming top-k dense retrieval is sufficient for complex, multi-hop questions
- Treating RAG as a one-time build rather than an observable system, leaving latency spikes and cost overruns undetected
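The first mistake — fixed-size chunking that ignores semantic boundaries — is easy to see in a toy comparison. This sketch contrasts a naive character-window splitter with a sentence-aware one; the sizes are arbitrary illustration values.

```python
import re

def fixed_chunks(text: str, size: int = 40) -> list[str]:
    """Fixed-size chunking: splits mid-sentence, hurting retrieval precision."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def sentence_chunks(text: str, max_chars: int = 80) -> list[str]:
    """Sentence-aware chunking: packs whole sentences up to a character budget,
    so each chunk keeps a complete semantic unit."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks

doc = ("Embeddings map text to vectors. Retrieval finds nearby vectors. "
       "Reranking reorders them.")
broken = fixed_chunks(doc)[0]       # cut mid-sentence
clean = sentence_chunks(doc)        # every chunk ends at a sentence boundary
```

Recursive and semantic chunkers in LangChain and LlamaIndex generalise this idea to paragraphs, headings, and embedding-based boundaries.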
When NOT to take this training
This programme is not appropriate for teams that have not yet shipped any LLM feature to users — organisations still evaluating whether to use AI at all will find the production-operations depth overwhelming and should start with an LLM literacy or prompt-engineering workshop instead.
This training is part of a Data & AI catalogue built for leaders serious about execution. Launch the free diagnostic to see which trainings should be prioritised for your team.