MLflow and W&B for Experiment Tracking
Master experiment tracking, model registries, and hyperparameter sweeps using MLflow and Weights & Biases.
What it covers
This hands-on practitioner bootcamp covers the full ML experiment lifecycle using two industry-standard tools: MLflow and Weights & Biases. Participants learn to instrument training runs, compare experiments, manage model versions, and run automated hyperparameter sweeps. The programme also addresses team collaboration workflows, artifact management, and the trade-offs between self-hosted and SaaS deployments. The format is lab-heavy, with real datasets and model-training exercises throughout.
What you'll be able to do
- Instrument any Python-based training script with MLflow or W&B logging in under 15 minutes
- Configure and run a W&B Sweep or MLflow hyperparameter search over a real model to identify optimal configurations
- Register, version, and promote models through staging to production using the MLflow Model Registry
- Design a team collaboration workflow with shared experiment namespaces, tagging conventions, and access control policies
- Evaluate and justify a self-hosted versus SaaS deployment decision based on data sensitivity, cost, and team size
Topics covered
- MLflow tracking: logging metrics, params, artifacts, and tags
- Weights & Biases: runs, sweeps, and the W&B dashboard
- Model registry: versioning, staging, and promotion workflows
- Hyperparameter optimisation with W&B Sweeps and MLflow Projects
- Artifact management and dataset versioning
- Team collaboration patterns: shared experiments and access controls
- Self-hosted MLflow vs W&B SaaS: cost, security, scalability trade-offs
- CI/CD integration for automated experiment pipelines
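As a taste of the sweeps topic, the following is a sketch of a W&B sweep configuration expressed as the Python dict you would pass to `wandb.sweep()`. The parameter names and ranges are assumptions for illustration; the field names (`method`, `metric`, `parameters`, `early_terminate`) follow the W&B sweep config schema.

```python
# Hypothetical W&B sweep configuration; parameter ranges are illustrative.
sweep_config = {
    "method": "bayes",  # Bayesian search over the parameter space
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-4, "max": 1e-1},
        "batch_size": {"values": [16, 32, 64]},
    },
    # An early-termination policy guards against runaway compute costs
    "early_terminate": {"type": "hyperband", "min_iter": 3},
}
```

In practice this dict is registered with `wandb.sweep(sweep_config, project=...)` and executed by one or more `wandb.agent()` processes.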
Delivery
Delivered over two to three days, either on-site or remote via video conferencing with a shared cloud environment (e.g., AWS SageMaker Studio or Google Colab Enterprise). Each session follows a 30% concept / 70% lab ratio. Participants receive pre-configured Docker environments and Jupyter notebooks. A capstone exercise on the final day requires integrating both tools into a mini ML pipeline. Remote delivery uses breakout rooms for pair-lab exercises.
What makes it work
- Establishing shared naming conventions and tagging standards before the first team experiment run
- Integrating experiment tracking into CI/CD so every training job is automatically logged without developer effort
- Nominating a model registry owner who reviews and approves promotions from staging to production
- Starting with a small reproducibility audit of past experiments to immediately demonstrate business value
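A shared tagging standard can be as simple as a small helper every team member calls before launching a run. This sketch is illustrative; the tag names and the `run_tags` helper are assumptions, not a prescribed standard.

```python
# Illustrative tagging convention; tag names and helper are hypothetical.
def run_tags(team: str, project: str, ticket: str, git_sha: str) -> dict:
    """Build a consistent tag set for every experiment run."""
    return {
        "team": team,
        "project": project,
        "ticket": ticket,        # links the run back to the work item
        "git_sha": git_sha[:8],  # short commit hash for reproducibility
    }

tags = run_tags("vision", "churn-model", "ML-142", "a1b2c3d4e5f6")
```

The resulting dict can be passed to `mlflow.set_tags(tags)` inside an active run, so every experiment is filterable by team, project, ticket, and commit.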
Common mistakes
- Logging only final metrics rather than per-step metrics, making it impossible to diagnose training instability
- Skipping the model registry and relying on file paths, leading to broken reproducibility when models move to production
- Running W&B Sweeps without setting a stopping strategy, resulting in runaway compute costs
- Choosing self-hosted MLflow without planning storage backend and proxy auth, causing painful migrations later
When NOT to take this
A team that has not yet standardised its training framework (some using TensorFlow, others PyTorch, others AutoML SaaS) will struggle to get value from this training — establish a common modelling stack first.
This training is part of a Data & AI catalog built for leaders serious about execution. Take the free diagnostic to see which trainings your team needs.