AI TRAINING
MLOps for Production AI Teams
Build and operate reliable ML pipelines from experimentation to production with modern MLOps tooling.
What it covers
This practitioner-level programme covers the full MLOps lifecycle: CI/CD for models, feature stores, model registries, serving infrastructure, and production monitoring. Participants work through hands-on labs deploying real pipelines using industry-standard tools such as MLflow, Kubeflow, and Feast. The course addresses drift detection, automated retraining triggers, rollback strategies, and governance requirements. By the end, teams can design and operate a production-grade ML platform aligned with their organisation's scale and data maturity.
What you'll be able to do
- Design and implement a CI/CD pipeline that automatically trains, validates, and deploys an ML model on code or data changes
- Configure a feature store to serve low-latency features consistently across training and inference environments
- Set up a model registry with versioning, stage transitions, and approval gates using MLflow (see the sketch after this list)
- Instrument a deployed model with drift detection alerts and an automated retraining trigger
- Execute a safe rollback from a degraded model version using a blue/green or canary deployment strategy
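For a flavour of the labs, here is a minimal sketch of the registry workflow from the MLflow outcome above, assuming a tracking server is configured and a model has already been logged under a run; the run ID and model name are illustrative, not course artefacts.

```python
import mlflow
from mlflow.tracking import MlflowClient

# Assumes MLFLOW_TRACKING_URI points at a running tracking server and
# that "runs:/<run_id>/model" refers to a model logged by a training run.
client = MlflowClient()

# Register the logged model under a shared, discoverable name.
version = mlflow.register_model(
    model_uri="runs:/abc123/model",  # illustrative run ID
    name="churn-classifier",         # illustrative registry name
)

# Promote the new version to Staging; an approval gate would normally
# sit between Staging and Production.
client.transition_model_version_stage(
    name="churn-classifier",
    version=version.version,
    stage="Staging",
)
```

Note that recent MLflow releases deprecate stage transitions in favour of model version aliases; the shape of the workflow is the same.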
Topics covered
- CI/CD pipelines for model training and deployment
- Feature stores: design, ingestion, and serving (Feast, Tecton)
- Model registries and versioning with MLflow and DVC
- Model serving patterns: batch, real-time, shadow, and canary deployments
- Production monitoring: data drift, concept drift, and performance degradation (a minimal drift check is sketched after this list)
- Automated retraining triggers and pipeline orchestration (Airflow, Kubeflow Pipelines)
- Rollback strategies and blue/green deployments
- Governance, lineage tracking, and audit trails
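To illustrate the monitoring topic above, here is a minimal data-drift check using a two-sample Kolmogorov–Smirnov test on a single feature; the threshold, window sizes, and simulated data are all illustrative, and production setups typically reach for a dedicated library such as Evidently.

```python
import numpy as np
from scipy.stats import ks_2samp

def drifted(reference: np.ndarray, live: np.ndarray, alpha: float = 0.05) -> bool:
    """Flag drift when the live serving window's distribution differs
    significantly from the training-time reference sample."""
    result = ks_2samp(reference, live)
    return result.pvalue < alpha  # low p-value: distributions likely differ

# Illustrative check: a training-time sample vs. a shifted live window.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=5_000)
live = rng.normal(0.4, 1.0, size=1_000)  # simulated input shift

if drifted(reference, live):
    print("Data drift detected: alert and evaluate a retraining trigger")
```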
Delivery
Delivered as a 3–5 day intensive bootcamp, available in-person or remote-live. Each day combines 40% concept sessions with 60% hands-on labs in a shared cloud environment (AWS or GCP). Participants receive a pre-configured lab repo, reference architecture diagrams, and access to a post-bootcamp Slack channel with 30 days of follow-up support. In-person delivery is recommended for teams co-building a shared platform.
What makes it work
- Assign a dedicated ML platform owner who maintains tooling standards and onboards new model owners
- Define and automate model quality gates (accuracy thresholds, bias checks) as part of the CI pipeline from day one (see the gate sketch after this list)
- Start with a single end-to-end reference pipeline on a real use case before generalising to a platform
- Establish a shared model registry and naming convention so all teams discover and reuse existing model assets
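To make the quality-gate point concrete, here is a minimal gate script: any CI system fails the pipeline on a non-zero exit code, which blocks deployment of an underperforming candidate. The metrics file, key, and threshold are illustrative assumptions, not a prescribed format.

```python
import json
import sys

ACCURACY_THRESHOLD = 0.92  # illustrative; set per use case

# Assumes an earlier CI step (training/evaluation) wrote metrics.json.
with open("metrics.json") as f:
    metrics = json.load(f)

accuracy = metrics["accuracy"]
if accuracy < ACCURACY_THRESHOLD:
    print(f"Quality gate FAILED: accuracy {accuracy:.3f} < {ACCURACY_THRESHOLD}")
    sys.exit(1)  # non-zero exit fails the CI job and blocks deployment

print(f"Quality gate passed: accuracy {accuracy:.3f}")
```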
Common mistakes
- Treating model deployment as a one-off script rather than a reproducible, versioned pipeline
- Skipping feature store adoption and duplicating feature logic between training and serving, causing training-serving skew
- Monitoring only infrastructure metrics (CPU, latency) and missing model-level drift until business impact is visible
- Over-engineering the MLOps stack before validating that the use case justifies the operational complexity
When NOT to take this
A team with fewer than two models in production and no dedicated ML engineer should hold off: the overhead of a full MLOps stack will stall delivery rather than accelerate it. A lightweight experiment-tracking setup (MLflow alone) is sufficient at that stage.
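For scale, that lightweight setup can be as small as the following sketch, assuming only a pip-installed MLflow with local file-based tracking; the experiment name and values are illustrative.

```python
import mlflow

# Local file-based tracking: no server, registry, or feature store required.
mlflow.set_experiment("churn-baseline")  # illustrative experiment name

with mlflow.start_run():
    mlflow.log_param("model_type", "logistic_regression")
    mlflow.log_metric("val_accuracy", 0.87)  # illustrative metric
```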