AI TRAINING
Retrieval-Augmented Generation in Production
Build, evaluate, and operate production-grade RAG pipelines that are fast, accurate, and cost-efficient.
What it covers
This practitioner-level programme takes engineers from RAG fundamentals to production deployment, covering document ingestion strategies, chunking and embedding choices, retriever and reranker architectures, and evaluation frameworks. Participants implement end-to-end pipelines with real datasets, instrument them for observability, and apply cost-control and caching techniques. The format combines live coding sessions, architecture reviews, and hands-on labs using open-source tooling (LangChain, LlamaIndex, Weaviate, RAGAS). By the end, participants can ship and monitor a RAG system that meets latency, quality, and budget requirements.
What you'll be able to do
- Design and implement a multi-stage RAG pipeline with chunking, embedding, retrieval, and reranking stages tuned for a real dataset
- Select and justify the right vector store and retriever architecture for a given latency and accuracy trade-off
- Evaluate RAG pipeline quality using RAGAS metrics (faithfulness, context precision, answer relevancy) and iterate systematically
- Instrument a RAG system with distributed tracing and set up alerts for retrieval quality degradation in production
- Apply semantic caching and query routing to reduce LLM API costs by at least 30% without sacrificing answer quality
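The semantic-caching objective above can be sketched in a few lines. This is a toy illustration, not the course's lab code: the `embed` function is a stand-in (a real pipeline would call an embedding model), and the 0.9 similarity threshold is illustrative.

```python
import math
from typing import Optional


def embed(text: str) -> list:
    # Toy stand-in embedding: hash character bigrams into a small
    # dense vector. A real system would call an embedding model.
    vec = [0.0] * 64
    lower = text.lower()
    for a, b in zip(lower, lower[1:]):
        vec[(ord(a) * 31 + ord(b)) % 64] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(u: list, v: list) -> float:
    return sum(a * b for a, b in zip(u, v))


class SemanticCache:
    """Return a cached answer when a new query is close enough to a
    previously answered one, skipping the LLM call entirely."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, query: str) -> Optional[str]:
        qv = embed(query)
        best = max(self.entries, key=lambda e: cosine(qv, e[0]), default=None)
        if best is not None and cosine(qv, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))
```

On a cache hit the LLM call is skipped entirely, which is where the cost savings come from; the threshold trades hit rate against the risk of serving a stale or mismatched answer.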
Topics covered
- Document ingestion pipelines and preprocessing strategies
- Chunking strategies: fixed, semantic, recursive, and late chunking
- Embedding model selection and fine-tuning for domain-specific retrieval
- Vector stores, hybrid search, and retriever architectures
- Reranking with cross-encoders and LLM-based rerankers
- RAG evaluation frameworks (RAGAS, TruLens, LangSmith)
- Caching, query routing, and cost control patterns
- Observability, tracing, and production monitoring for RAG systems
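To make the chunking topic concrete, here is a minimal hand-rolled sketch of recursive chunking: split on the coarsest separator first and recurse with finer separators only into pieces that are still too long. Production splitters (e.g. LangChain's) also merge small adjacent pieces back up toward the size limit, which this sketch omits.

```python
def recursive_chunk(text, max_len=200, separators=("\n\n", "\n", ". ", " ")):
    """Split `text` into chunks of at most `max_len` characters,
    preferring coarse boundaries (paragraphs) over fine ones (words)."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to hard fixed-size cuts.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, rest = separators[0], separators[1:]
    chunks = []
    for piece in text.split(sep):
        if len(piece) <= max_len:
            if piece.strip():
                chunks.append(piece)
        else:
            # Piece still too long: retry with the next-finer separator.
            chunks.extend(recursive_chunk(piece, max_len, rest))
    return chunks
```

The advantage over fixed-size chunking is that paragraph and sentence boundaries are respected wherever possible, so retrieved chunks are more likely to be self-contained units of meaning.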
Delivery
Delivered as a blended programme over 3–5 days (on-site or remote), with approximately 60% hands-on labs and 40% instructor-led architecture sessions. Participants work in small teams on a capstone project using their own or provided datasets. All labs run in pre-configured cloud environments; no local GPU required. Printed architecture cheat sheets and a private GitHub repository with all lab code are included. Remote delivery uses Zoom breakout rooms with a lab assistant per group of four.
What makes it work
- Establishing an offline evaluation dataset with ground-truth QA pairs before writing any pipeline code
- Instrumenting retrieval and generation steps from day one with a tracing tool such as LangSmith or Arize Phoenix
- Running chunking and embedding ablations on a representative sample of real production documents before committing to an architecture
- Treating prompt templates and retrieval parameters as versioned artifacts subject to the same CI/CD discipline as application code
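The evaluation-first practice above can be sketched as a tiny offline harness: a dataset of (question, gold document) pairs, a retriever under test, and two simple metrics. The keyword retriever here is a placeholder for the pipeline being evaluated, and the metrics are simplified proxies for the RAGAS versions.

```python
def keyword_retrieve(query, corpus, k=3):
    # Placeholder retriever: rank documents by query-term overlap.
    # In the real workflow this is the pipeline under evaluation.
    terms = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(terms & set(d.lower().split())))
    return scored[:k]


def evaluate(dataset, corpus, k=3):
    """Hit rate: fraction of questions whose gold document appears in
    the top-k. Context precision (simplified): fraction of retrieved
    documents that are the gold document, averaged over questions."""
    hits, precisions = 0, []
    for question, gold_doc in dataset:
        retrieved = keyword_retrieve(question, corpus, k)
        hits += gold_doc in retrieved
        precisions.append(sum(d == gold_doc for d in retrieved) / len(retrieved))
    return {"hit_rate": hits / len(dataset),
            "context_precision": sum(precisions) / len(precisions)}
```

Because the harness exists before the pipeline does, every chunking or embedding change can be scored against the same fixed dataset rather than judged by spot-checks.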
Common mistakes
- Using fixed-size chunking for all document types without considering semantic boundaries, leading to poor retrieval precision
- Skipping systematic evaluation and relying on anecdotal spot-checks, so quality regressions go unnoticed in production
- Ignoring reranking entirely and assuming top-k dense retrieval is sufficient for complex, multi-hop questions
- Treating RAG as a one-time build rather than an observable system, leaving latency spikes and cost overruns undetected
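One cheap defence against over-reliance on top-k dense retrieval is hybrid search fused with reciprocal rank fusion (RRF), which combines a dense ranking with a keyword ranking before any reranker runs. A minimal sketch of the standard RRF formula (the constant 60 is the conventional default):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of document ids (e.g. one from dense
    retrieval, one from keyword search) by summing 1 / (k + rank) for
    each document across all lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that appear in both lists are rewarded even when neither ranks them first, which is exactly the behaviour that pure top-k dense retrieval lacks on keyword-heavy or multi-hop queries.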
When NOT to take this
This programme is not appropriate for teams that have not yet shipped any LLM feature to users — organisations still evaluating whether to use AI at all will find the production-operations depth overwhelming and should start with an LLM literacy or prompt-engineering workshop instead.
This training is part of a Data & AI catalog built for leaders serious about execution. Take the free diagnostic to see which trainings your team needs.