AI TRAINING
AI Observability and Monitoring in Production
Master the tools and practices to keep LLM-powered systems reliable, observable, and cost-efficient in production.
What it covers
This practitioner-level programme equips platform engineers and SREs with the skills to instrument, trace, and evaluate AI systems running in production. Participants learn to implement structured logging for LLM calls, detect model drift, set up cost and latency alerting, and run continuous evaluations against live traffic. The format combines hands-on labs using real observability tooling (LangSmith, Arize, Prometheus, OpenTelemetry) with architecture reviews and incident post-mortems. By the end, teams can build and operate a production-grade AI observability stack from scratch.
What you'll be able to do
- Instrument an LLM API pipeline with OpenTelemetry spans and structured JSON logs exportable to any backend (a minimal sketch follows this list)
- Configure automated eval pipelines that score model outputs on live traffic using LLM-as-judge and rule-based checks
- Build a drift detection alert that triggers when embedding cosine similarity or output toxicity scores shift beyond a set threshold
- Define and implement SLOs for AI endpoints covering P95 latency, error rate, and per-token cost budgets
- Diagnose and remediate a simulated production incident involving degraded LLM response quality, using an observability dashboard to isolate the cause
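To make the first outcome above concrete, here is a minimal sketch of the kind of instrumentation the labs build up, assuming an already-configured OpenTelemetry SDK and exporter, and using the OpenAI Python client purely as a stand-in for any provider; the span attribute names are illustrative, not official semantic conventions.

```python
# Minimal sketch: wrap an LLM call in an OpenTelemetry span and emit a
# structured JSON log line with prompt content and token counts. Assumes
# the OpenTelemetry SDK and an exporter are configured elsewhere; the span
# attribute names are illustrative, not official semantic conventions.
import json
import logging
import time

from openai import OpenAI  # example client; any SDK that reports token usage works
from opentelemetry import trace

tracer = trace.get_tracer("llm.pipeline")
logger = logging.getLogger("llm.calls")
client = OpenAI()

def traced_completion(prompt: str, model: str = "gpt-4o-mini") -> str:
    with tracer.start_as_current_span("llm.chat_completion") as span:
        start = time.perf_counter()
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        latency_ms = (time.perf_counter() - start) * 1000
        answer = response.choices[0].message.content or ""
        usage = response.usage

        # Span attributes become queryable in any OTel-compatible backend.
        span.set_attribute("llm.model", model)
        span.set_attribute("llm.prompt_tokens", usage.prompt_tokens)
        span.set_attribute("llm.completion_tokens", usage.completion_tokens)
        span.set_attribute("llm.latency_ms", latency_ms)

        # Structured JSON log keeps prompt/response content and token counts,
        # which later cost attribution and live-traffic evals rely on.
        logger.info(json.dumps({
            "event": "llm_call",
            "model": model,
            "prompt": prompt,
            "response": answer,
            "prompt_tokens": usage.prompt_tokens,
            "completion_tokens": usage.completion_tokens,
            "latency_ms": round(latency_ms, 1),
        }))
        return answer
```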
Topics covered
- Tracing LLM calls with OpenTelemetry and vendor-specific SDKs
- Structured logging strategies for prompt/response pipelines
- Eval-in-production: running automated quality checks against live traffic
- Drift detection for embedding models and output distributions (sketched below)
- Cost monitoring and token-budget alerting across providers
- Latency profiling and SLO definition for AI endpoints
- Observability tool landscape: LangSmith, Arize, Helicone, Datadog AI
- Incident response playbooks for degraded model behaviour
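As a taste of the drift-detection topic, the sketch below compares the centroid of a recent window of embeddings against a reference window, assuming those embeddings are already being collected; the 0.95 threshold and the alert hook are placeholders.

```python
# Minimal sketch of embedding-drift detection: compare the centroid of a
# recent window of embeddings against a reference (known-good) centroid and
# alert when cosine similarity drops below a threshold. The 0.95 threshold
# and the alert hook are illustrative placeholders.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def check_embedding_drift(
    reference: np.ndarray,    # shape (n_ref, dim): embeddings from a baseline window
    live: np.ndarray,         # shape (n_live, dim): embeddings from recent traffic
    threshold: float = 0.95,  # similarity below this triggers the alert
) -> bool:
    similarity = cosine_similarity(reference.mean(axis=0), live.mean(axis=0))
    drifted = similarity < threshold
    if drifted:
        # Swap in your alerting integration (Alertmanager, PagerDuty, Slack, ...).
        print(f"DRIFT ALERT: centroid similarity {similarity:.3f} < {threshold}")
    return drifted
```

The same threshold-and-alert pattern applies to output-quality signals such as toxicity scores.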
Delivery
Delivered as a 3-day intensive bootcamp (in-person or remote with live instruction), with approximately 60% hands-on lab time and 40% concept delivery and discussion. Participants work in small squads on a shared staging environment pre-configured with a multi-step LLM application. Materials include lab notebooks, architecture reference cards, and a starter observability stack template (Docker Compose + Prometheus + Grafana + LangSmith). Remote delivery uses breakout rooms for squad work with a shared cloud sandbox. Post-bootcamp, a follow-up office-hours session (2h) is recommended at the 30-day mark.
What makes it work
- Defining SLOs for AI endpoints before instrumenting: clarity on what 'good' looks like drives the right metric selection (see the sketch after this list)
- Embedding eval-in-production as a standard release gate alongside unit and integration tests
- Assigning clear ownership of cost and quality dashboards to a named team or on-call rotation
- Starting with a minimal viable observability stack (traces + cost + one quality metric) and iterating rather than boiling the ocean
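One way to make 'SLOs before instrumentation' tangible is to write the SLO down as data before any dashboard exists, then evaluate raw measurements against it. The sketch below assumes you can already query latencies, error counts, and token costs; the targets, field names, and example values are illustrative.

```python
# Minimal sketch: capture an AI endpoint's SLO as data, then evaluate raw
# measurements against it. Targets, field names, and example values are
# illustrative placeholders.
from dataclasses import dataclass

import numpy as np

@dataclass
class AIEndpointSLO:
    p95_latency_ms: float              # 95% of requests should complete within this time
    max_error_rate: float              # tolerated fraction of failed requests
    max_cost_per_1k_tokens_usd: float  # budget ceiling per 1,000 tokens

def evaluate_slo(
    slo: AIEndpointSLO,
    latencies_ms: list[float],
    error_count: int,
    request_count: int,
    total_cost_usd: float,
    total_tokens: int,
) -> dict[str, bool]:
    cost_per_1k = total_cost_usd / (total_tokens / 1000) if total_tokens else 0.0
    return {
        "latency_ok": float(np.percentile(latencies_ms, 95)) <= slo.p95_latency_ms,
        "errors_ok": (error_count / max(request_count, 1)) <= slo.max_error_rate,
        "cost_ok": cost_per_1k <= slo.max_cost_per_1k_tokens_usd,
    }

# Example targets: 800 ms P95, 1% error budget, $0.02 per 1k tokens.
slo = AIEndpointSLO(p95_latency_ms=800.0, max_error_rate=0.01, max_cost_per_1k_tokens_usd=0.02)
```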
Common mistakes
- Treating LLM observability as identical to classic APM — ignoring semantic drift and output quality as first-class signals
- Logging only errors and latency while omitting prompt content and token counts, making cost attribution impossible
- Running evals only offline at release time, with no mechanism to catch quality regressions in live traffic (a minimal live-traffic check is sketched after this list)
- Setting up dashboards without defining SLOs first, resulting in metrics that nobody acts on
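To illustrate the alternative to offline-only evals, here is a minimal rule-based check that could run over sampled live traffic, assuming structured log records with prompt and response fields like the logging sketch earlier; the rules and pass-rate handling are illustrative, and in practice they sit alongside LLM-as-judge scoring.

```python
# Minimal sketch of a rule-based eval over sampled live traffic: score each
# logged response against simple checks and return an overall pass rate that
# can feed a dashboard or alert. The rules, the assumed log fields, and the
# pass-rate handling are all illustrative.
import json

def rule_based_score(prompt: str, response: str) -> dict[str, bool]:
    return {
        "non_empty": len(response.strip()) > 0,
        "no_refusal": "i can't help with that" not in response.lower(),
        "reasonable_length": len(response) < 8000,
    }

def eval_live_sample(log_lines: list[str]) -> float:
    """Score a sample of structured llm_call log lines; return the pass rate."""
    passes, total = 0, 0
    for line in log_lines:
        record = json.loads(line)
        checks = rule_based_score(record.get("prompt", ""), record.get("response", ""))
        total += len(checks)
        passes += sum(checks.values())
    return passes / total if total else 1.0
```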
When NOT to take this
This bootcamp is not the right fit for teams that have not yet deployed any AI model or LLM feature to production; they would benefit more from an MLOps or LLM application-building programme before investing in observability infrastructure.
This training is part of a Data & AI catalog built for leaders serious about execution. Take the free diagnostic to see which trainings your team needs.