AI TRAINING

Feature Engineering Fundamentals for ML

Transform raw data into high-quality features that meaningfully improve machine learning model performance.

Format: bootcamp
Duration: 14–24 hours
Level: practitioner
Group size: 6–16
Price / participant: €1K–€3K
Group price: €8K–€18K
Audience: Data analysts and business intelligence professionals transitioning into machine learning roles
Prerequisites: Working knowledge of Python and pandas; familiarity with basic supervised ML concepts (train/test split, model evaluation metrics)

What it covers

This practitioner-level training teaches analysts and data professionals how to systematically engineer features from structured and semi-structured data. Participants learn categorical encoding strategies, numerical scaling, interaction and polynomial features, temporal feature extraction, and how to prevent target leakage. The programme combines hands-on labs in Python (pandas, scikit-learn) with real datasets, and closes with an introduction to feature stores for production pipelines. Participants leave with a reusable feature engineering playbook they can apply to their own datasets immediately.

What you'll be able to do

  • Apply the main categorical encoding strategies (ordinal, one-hot, target, and frequency encoding) and justify which to use for a given dataset and model type
  • Build temporal features including lag variables, rolling aggregates, and cyclical encodings from raw datetime columns
  • Detect and eliminate target leakage in a feature pipeline, using chronological train/validation splits for time-dependent data
  • Implement a reusable feature transformation pipeline using scikit-learn Pipeline and ColumnTransformer
  • Register and retrieve features from a basic feature store setup using Feast or Hopsworks
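
As an illustration of the temporal features named above, here is a minimal pandas sketch rather than material from the course itself. The column names (timestamp, sales), the window size, and the helper add_temporal_features are hypothetical. It builds a lag variable, a rolling mean that excludes the current row, and a cyclical encoding of the day of week.

```python
import numpy as np
import pandas as pd


def add_temporal_features(
    df: pd.DataFrame, ts_col: str = "timestamp", value_col: str = "sales"
) -> pd.DataFrame:
    """Add lag, rolling, and cyclical features (hypothetical helper)."""
    out = df.sort_values(ts_col).reset_index(drop=True).copy()

    # Lag variable: the value observed one row (here, one day) earlier.
    out["lag_1"] = out[value_col].shift(1)

    # Rolling mean of the three previous rows. Shifting first keeps the
    # current row's value out of its own feature.
    out["rolling_mean_3"] = out[value_col].shift(1).rolling(window=3).mean()

    # Cyclical encoding: map day of week (0-6) onto a circle so that
    # Sunday and Monday end up close together.
    dow = out[ts_col].dt.dayofweek
    out["dow_sin"] = np.sin(2 * np.pi * dow / 7)
    out["dow_cos"] = np.cos(2 * np.pi * dow / 7)
    return out


# Toy data, invented for illustration; 2024-01-01 is a Monday.
demo = pd.DataFrame(
    {
        "timestamp": pd.date_range("2024-01-01", periods=6, freq="D"),
        "sales": [10.0, 12.0, 11.0, 15.0, 14.0, 18.0],
    }
)
features = add_temporal_features(demo)
```

The first rows contain missing values for the lag and rolling columns because no history exists yet; whether to drop or impute them depends on the model.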

Topics covered

  • Categorical encoding: ordinal, one-hot, target, and frequency encoding
  • Numerical scaling: min-max, standardisation, robust scaling, log transforms
  • Interaction features and polynomial feature construction
  • Temporal and date-based feature extraction (lag, rolling windows, seasonality)
  • Handling missing values as features vs. imputation strategies
  • Target leakage detection and prevention techniques
  • Feature selection methods: filter, wrapper, and embedded approaches
  • Introduction to feature stores (Feast, Hopsworks) for production reuse
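
To make one of these topics concrete, the sketch below shows frequency encoding in pandas. It is an illustrative assumption, not the course's lab code: the two helper functions and the toy categories are hypothetical. The frequencies are learned from training rows only and then mapped onto new rows, with unseen categories set to 0.

```python
import pandas as pd


def fit_frequency_encoding(train_col: pd.Series) -> pd.Series:
    """Learn each category's relative frequency from training data only."""
    return train_col.value_counts(normalize=True)


def apply_frequency_encoding(col: pd.Series, freq_map: pd.Series) -> pd.Series:
    """Replace categories with learned frequencies; unseen categories get 0."""
    return col.map(freq_map).fillna(0.0)


# Toy data, invented for illustration.
train = pd.Series(["a", "a", "b", "c"])
test = pd.Series(["a", "d"])  # "d" never appears in the training data

freq_map = fit_frequency_encoding(train)
encoded_test = apply_frequency_encoding(test, freq_map)
```

Keeping the fit and apply steps separate is a deliberate choice: it makes it harder to compute the statistics on validation or test rows by accident.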

Delivery

Delivered over two to three days, either in person or live online (Zoom/Teams). Roughly 40% concept instruction and 60% hands-on lab work. Each module pairs a short lecture with a Jupyter notebook exercise on a real-world dataset (e-commerce or financial). Participants receive a GitHub repo with all materials, a feature engineering checklist, and a reusable scikit-learn pipeline template. For remote delivery, participants need Python 3.10+ and a configured conda environment (setup guide provided in advance).

What makes it work

  • Anchoring every exercise to a real business dataset the participants recognise, increasing relevance and retention
  • Introducing feature stores early so participants see how engineered features are reused in production rather than recreated per model
  • Pairing feature engineering training with a model evaluation module so participants can measure the impact of each transformation
  • Encouraging participants to bring their own dataset for a capstone exercise during the final session

Common mistakes

  • Computing target encodings before splitting the data, which leaks label information into the features and inflates validation scores
  • Fitting scalers or encoders on the full dataset rather than only on the training folds
  • Creating dozens of interaction features without a selection step, which inflates dimensionality and raises the risk of overfitting
  • Treating feature engineering as a one-off step rather than building reproducible, versioned transformation pipelines
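
One common way to avoid the fit-on-everything mistake is to put all preprocessing inside a scikit-learn Pipeline, so that cross-validation refits the imputer, scaler, and encoder on each training fold. The sketch below is a minimal illustration under stated assumptions: the data is synthetic, and the column names (amount, channel) and the choice of LogisticRegression are hypothetical, not taken from the course materials.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data, invented for illustration only.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame(
    {
        "amount": rng.normal(100, 20, n),
        "channel": rng.choice(["web", "store", "app"], n),
    }
)
y = (X["amount"] + rng.normal(0, 10, n) > 100).astype(int)

# Numeric columns: impute, then scale.
numeric = Pipeline(
    [
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]
)

# Route each column group to its own transformer.
preprocess = ColumnTransformer(
    [
        ("num", numeric, ["amount"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["channel"]),
    ]
)

model = Pipeline(
    [
        ("preprocess", preprocess),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# Because preprocessing lives inside the pipeline, each fold fits the
# transformers on its own training portion and never sees held-out rows.
scores = cross_val_score(model, X, y, cv=5)
```

For time-dependent data, passing sklearn.model_selection.TimeSeriesSplit as cv keeps each validation fold chronologically after its training fold.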

When NOT to take this

This training is not the right fit for teams that have not yet established a baseline ML workflow — if participants have never trained and evaluated a model end-to-end, a broader ML fundamentals course should come first.
