AI TRAINING
Retrieval-Augmented Generation in Production
Build, evaluate, and operate production-grade RAG pipelines that are fast, accurate, and cost-efficient.
What it covers
This practitioner-level programme takes engineers from RAG fundamentals to production deployment, covering document ingestion strategies, chunking and embedding choices, retriever and reranker architectures, and evaluation frameworks. Participants implement end-to-end pipelines with real datasets, instrument them for observability, and apply cost-control and caching techniques. The format combines live coding sessions, architecture reviews, and hands-on labs using open-source tooling (LangChain, LlamaIndex, Weaviate, RAGAS). By the end, participants can ship and monitor a RAG system that meets latency, quality, and budget requirements.
What you'll be able to do
- Design and implement a multi-stage RAG pipeline with chunking, embedding, retrieval, and reranking stages tuned for a real dataset
- Select and justify the right vector store and retriever architecture for a given latency and accuracy trade-off
- Evaluate RAG pipeline quality using RAGAS metrics (faithfulness, context precision, answer relevancy) and iterate systematically
- Instrument a RAG system with distributed tracing and set up alerts for retrieval quality degradation in production
- Apply semantic caching and query routing to reduce LLM API costs by at least 30% without sacrificing answer quality
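The semantic-caching objective above can be sketched in a few lines. This is a toy illustration, not the course's lab code: the `embed` function is a stand-in (a real pipeline would call an embedding model), and the 0.9 similarity threshold is illustrative.

```python
import math
from typing import Optional


def embed(text: str) -> list:
    # Toy stand-in embedding: hash character bigrams into a small
    # dense vector. A real system would call an embedding model.
    vec = [0.0] * 64
    lower = text.lower()
    for a, b in zip(lower, lower[1:]):
        vec[(ord(a) * 31 + ord(b)) % 64] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(u: list, v: list) -> float:
    return sum(a * b for a, b in zip(u, v))


class SemanticCache:
    """Return a cached answer when a new query is close enough to a
    previously answered one, skipping the LLM call entirely."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, query: str) -> Optional[str]:
        qv = embed(query)
        best = max(self.entries, key=lambda e: cosine(qv, e[0]), default=None)
        if best is not None and cosine(qv, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))
```

On a cache hit the LLM call is skipped entirely, which is where the cost savings come from; the threshold trades hit rate against the risk of serving a stale or mismatched answer.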
Topics covered
- Document ingestion pipelines and preprocessing strategies
- Chunking strategies: fixed, semantic, recursive, and late chunking
- Embedding model selection and fine-tuning for domain-specific retrieval
- Vector stores, hybrid search, and retriever architectures
- Reranking with cross-encoders and LLM-based rerankers
- RAG evaluation frameworks (RAGAS, TruLens, LangSmith)
- Caching, query routing, and cost control patterns
- Observability, tracing, and production monitoring for RAG systems
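To make the chunking topic concrete, here is a minimal hand-rolled sketch of recursive chunking: split on the coarsest separator first and recurse with finer separators only into pieces that are still too long. Production splitters (e.g. LangChain's) also merge small adjacent pieces back up toward the size limit, which this sketch omits.

```python
def recursive_chunk(text, max_len=200, separators=("\n\n", "\n", ". ", " ")):
    """Split `text` into chunks of at most `max_len` characters,
    preferring coarse boundaries (paragraphs) over fine ones (words)."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to hard fixed-size cuts.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, rest = separators[0], separators[1:]
    chunks = []
    for piece in text.split(sep):
        if len(piece) <= max_len:
            if piece.strip():
                chunks.append(piece)
        else:
            # Piece still too long: retry with the next-finer separator.
            chunks.extend(recursive_chunk(piece, max_len, rest))
    return chunks
```

The advantage over fixed-size chunking is that paragraph and sentence boundaries are respected wherever possible, so retrieved chunks are more likely to be self-contained units of meaning.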
Delivery
Delivered as a blended programme over 3–5 days (on-site or remote), with approximately 60% hands-on labs and 40% instructor-led architecture sessions. Participants work in small teams on a capstone project using their own or provided datasets. All labs run in pre-configured cloud environments; no local GPU required. Printed architecture cheat sheets and a private GitHub repository with all lab code are included. Remote delivery uses Zoom breakout rooms with a lab assistant per group of four.
What makes it work
- Establishing an offline evaluation dataset with ground-truth QA pairs before writing any pipeline code
- Instrumenting retrieval and generation steps from day one with a tracing tool such as LangSmith or Arize Phoenix
- Running chunking and embedding ablations on a representative sample of real production documents before committing to an architecture
- Treating prompt templates and retrieval parameters as versioned artifacts subject to the same CI/CD discipline as application code
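The evaluation-first practice above can be sketched as a tiny offline harness: a dataset of (question, gold document) pairs, a retriever under test, and two simple metrics. The keyword retriever here is a placeholder for the pipeline being evaluated, and the metrics are simplified proxies for the RAGAS versions.

```python
def keyword_retrieve(query, corpus, k=3):
    # Placeholder retriever: rank documents by query-term overlap.
    # In the real workflow this is the pipeline under evaluation.
    terms = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(terms & set(d.lower().split())))
    return scored[:k]


def evaluate(dataset, corpus, k=3):
    """Hit rate: fraction of questions whose gold document appears in
    the top-k. Context precision (simplified): fraction of retrieved
    documents that are the gold document, averaged over questions."""
    hits, precisions = 0, []
    for question, gold_doc in dataset:
        retrieved = keyword_retrieve(question, corpus, k)
        hits += gold_doc in retrieved
        precisions.append(sum(d == gold_doc for d in retrieved) / len(retrieved))
    return {"hit_rate": hits / len(dataset),
            "context_precision": sum(precisions) / len(precisions)}
```

Because the harness exists before the pipeline does, every chunking or embedding change can be scored against the same fixed dataset rather than judged by spot-checks.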
Common mistakes
- Using fixed-size chunking for all document types without considering semantic boundaries, leading to poor retrieval precision
- Skipping systematic evaluation and relying on anecdotal spot-checks, so quality regressions go unnoticed in production
- Ignoring reranking entirely and assuming top-k dense retrieval is sufficient for complex, multi-hop questions
- Treating RAG as a one-time build rather than an observable system, leaving latency spikes and cost overruns undetected
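One cheap defence against over-reliance on top-k dense retrieval is hybrid search fused with reciprocal rank fusion (RRF), which combines a dense ranking with a keyword ranking before any reranker runs. A minimal sketch of the standard RRF formula (the constant 60 is the conventional default):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of document ids (e.g. one from dense
    retrieval, one from keyword search) by summing 1 / (k + rank) for
    each document across all lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that appear in both lists are rewarded even when neither ranks them first, which is exactly the behaviour that pure top-k dense retrieval lacks on keyword-heavy or multi-hop queries.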
When NOT to take this
This programme is not appropriate for teams that have not yet shipped any LLM feature to users — organisations still evaluating whether to use AI at all will find the production-operations depth overwhelming and should start with an LLM literacy or prompt-engineering workshop instead.
This training is part of a Data & AI catalog built for leaders serious about execution. Take the free diagnostic to see which trainings your team needs.