MLflow and W&B for Experiment Tracking
Master experiment tracking, model registries, and hyperparameter sweeps using MLflow and Weights & Biases.
What it covers
This hands-on practitioner bootcamp covers the full ML experiment lifecycle using two industry-standard tools: MLflow and Weights & Biases. Participants learn to instrument training runs, compare experiments, manage model versions, and run automated hyperparameter sweeps. The programme also addresses team collaboration workflows, artifact management, and the trade-offs between self-hosted and SaaS deployments. The format is lab-heavy, with real datasets and model-training exercises throughout.
What you'll be able to do
- Instrument any Python-based training script with MLflow or W&B logging in under 15 minutes
- Configure and run a W&B Sweep or MLflow hyperparameter search over a real model to identify optimal configurations
- Register, version, and promote models through staging to production using the MLflow Model Registry
- Design a team collaboration workflow with shared experiment namespaces, tagging conventions, and access control policies
- Evaluate and justify a self-hosted versus SaaS deployment decision based on data sensitivity, cost, and team size
Topics covered
- MLflow tracking: logging metrics, params, artifacts, and tags
- Weights & Biases: runs, sweeps, and the W&B dashboard
- Model registry: versioning, staging, and promotion workflows
- Hyperparameter optimisation with W&B Sweeps and MLflow Projects
- Artifact management and dataset versioning
- Team collaboration patterns: shared experiments and access controls
- Self-hosted MLflow vs W&B SaaS: cost, security, scalability trade-offs
- CI/CD integration for automated experiment pipelines
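As a taste of the sweeps topic, the following is a sketch of a W&B sweep configuration expressed as the Python dict you would pass to `wandb.sweep()`. The parameter names and ranges are assumptions for illustration; the field names (`method`, `metric`, `parameters`, `early_terminate`) follow the W&B sweep config schema.

```python
# Hypothetical W&B sweep configuration; parameter ranges are illustrative.
sweep_config = {
    "method": "bayes",  # Bayesian search over the parameter space
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-4, "max": 1e-1},
        "batch_size": {"values": [16, 32, 64]},
    },
    # An early-termination policy guards against runaway compute costs
    "early_terminate": {"type": "hyperband", "min_iter": 3},
}
```

In practice this dict is registered with `wandb.sweep(sweep_config, project=...)` and executed by one or more `wandb.agent()` processes.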
Delivery
Delivered over two to three days, either on-site or remote via video conferencing with a shared cloud environment (e.g., AWS SageMaker Studio or Google Colab Enterprise). Each session follows a 30% concept / 70% lab ratio. Participants receive pre-configured Docker environments and Jupyter notebooks. A capstone exercise on the final day requires integrating both tools into a mini ML pipeline. Remote delivery uses breakout rooms for pair-lab exercises.
What makes it work
- Establishing shared naming conventions and tagging standards before the first team experiment run
- Integrating experiment tracking into CI/CD so every training job is automatically logged without developer effort
- Nominating a model registry owner who reviews and approves promotions from staging to production
- Starting with a small reproducibility audit of past experiments to immediately demonstrate business value
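A shared tagging standard can be as simple as a small helper every team member calls before launching a run. This sketch is illustrative; the tag names and the `run_tags` helper are assumptions, not a prescribed standard.

```python
# Illustrative tagging convention; tag names and helper are hypothetical.
def run_tags(team: str, project: str, ticket: str, git_sha: str) -> dict:
    """Build a consistent tag set for every experiment run."""
    return {
        "team": team,
        "project": project,
        "ticket": ticket,        # links the run back to the work item
        "git_sha": git_sha[:8],  # short commit hash for reproducibility
    }

tags = run_tags("vision", "churn-model", "ML-142", "a1b2c3d4e5f6")
```

The resulting dict can be passed to `mlflow.set_tags(tags)` inside an active run, so every experiment is filterable by team, project, ticket, and commit.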
Common mistakes
- Logging only final metrics rather than per-step metrics, making it impossible to diagnose training instability
- Skipping the model registry and relying on file paths, leading to broken reproducibility when models move to production
- Running W&B Sweeps without setting a stopping strategy, resulting in runaway compute costs
- Choosing self-hosted MLflow without planning storage backend and proxy auth, causing painful migrations later
When NOT to take this
A team that has not yet standardised its training framework (some using TensorFlow, others PyTorch, others AutoML SaaS) will struggle to get value from this training — establish a common modelling stack first.
This training is part of a Data & AI catalog built for leaders serious about execution. Take the free diagnostic to see which trainings your team needs.