AI TRAINING
Feature Engineering Fundamentals for ML
Transform raw data into high-quality features that meaningfully improve machine learning model performance.
What it covers
This practitioner-level training teaches analysts and data professionals how to systematically engineer features from structured and semi-structured data. Participants learn categorical encoding strategies, numerical scaling, interaction and polynomial features, temporal feature extraction, and how to prevent target leakage. The programme combines hands-on labs in Python (pandas, scikit-learn) with real datasets, and closes with an introduction to feature stores for production pipelines. Participants leave with a reusable feature engineering playbook they can apply to their own datasets immediately.
What you'll be able to do
- Apply at least five categorical encoding strategies and justify which to use for a given dataset and model type
- Build temporal features including lag variables, rolling aggregates, and cyclical encodings from raw datetime columns
- Detect and eliminate target leakage in a feature pipeline using chronological train/validation splits
- Implement a reusable feature transformation pipeline using scikit-learn Pipeline and ColumnTransformer
- Register and retrieve features from a basic feature store setup using Feast or Hopsworks
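As a taste of the pipeline skill above, here is a minimal sketch of a reusable scikit-learn Pipeline combined with ColumnTransformer. The toy DataFrame and column names are hypothetical, invented purely for illustration:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy dataset: two numeric columns (one with a missing value)
# and one categorical column
df = pd.DataFrame({
    "price": [10.0, 12.5, None, 9.0],
    "quantity": [1, 3, 2, 5],
    "category": ["a", "b", "a", "c"],
})

numeric_cols = ["price", "quantity"]
categorical_cols = ["category"]

# Each column group gets its own transformation chain; the whole object
# can be fit on training data and reused on new data unchanged
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

X = preprocess.fit_transform(df)
print(X.shape)  # (4, 5): 2 scaled numeric + 3 one-hot columns
```

The same `preprocess` object can then be dropped in front of any estimator inside a larger Pipeline, which is the pattern the training's reusable template builds on.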
Topics covered
- Categorical encoding: ordinal, one-hot, target, and frequency encoding
- Numerical scaling: min-max, standardisation, robust scaling, log transforms
- Interaction features and polynomial feature construction
- Temporal and date-based feature extraction (lag, rolling windows, seasonality)
- Handling missing values as features vs. imputation strategies
- Target leakage detection and prevention techniques
- Feature selection methods: filter, wrapper, and embedded approaches
- Introduction to feature stores (Feast, Hopsworks) for production reuse
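To make the temporal topics above concrete, here is a short sketch of lag, rolling-window, and cyclical features built with pandas and NumPy. The daily sales series is a made-up example, not a dataset from the course:

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales series
ts = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=10, freq="D"),
    "sales": [12, 15, 11, 18, 20, 17, 13, 19, 22, 16],
})

# Lag feature: yesterday's sales
ts["sales_lag_1"] = ts["sales"].shift(1)

# Rolling aggregate: 3-day mean, shifted first so each row
# only sees strictly past values (no leakage from the current day)
ts["sales_roll_mean_3"] = ts["sales"].shift(1).rolling(3).mean()

# Cyclical encoding of day-of-week, so Sunday (6) and Monday (0)
# end up close together instead of 6 units apart
dow = ts["date"].dt.dayofweek
ts["dow_sin"] = np.sin(2 * np.pi * dow / 7)
ts["dow_cos"] = np.cos(2 * np.pi * dow / 7)
```

Note the `shift(1)` before the rolling mean: computing rolling statistics that include the current row is one of the subtler leakage sources the leakage module addresses.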
Delivery
Delivered over two to three days either in-person or live-virtual (Zoom/Teams). Roughly 40% concept instruction and 60% hands-on lab work. Each module pairs a short lecture with a Jupyter notebook exercise on a real-world dataset (e-commerce or financial). Participants receive a GitHub repo with all materials, a feature engineering checklist, and a reusable sklearn pipeline template. Remote delivery requires participants to have Python 3.10+ and a configured conda environment (setup guide provided in advance).
What makes it work
- Anchoring every exercise to a real business dataset the participants recognise, increasing relevance and retention
- Introducing feature stores early so participants see how engineered features are reused in production rather than recreated per model
- Pairing feature engineering training with a model evaluation module so participants can measure the impact of each transformation
- Encouraging participants to bring their own dataset for a capstone exercise during the final session
Common mistakes
- Fitting target encoding on the full dataset before splitting, leaking target information and inflating validation scores
- Applying scaling or encoding fit on the full dataset rather than only on training folds
- Creating dozens of interaction features without a selection step, leading to the curse of dimensionality
- Treating feature engineering as a one-off step rather than building reproducible, versioned transformation pipelines
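The second mistake above has a standard fix: wrap the transformation and the model in one Pipeline so the transformer is refit inside each training fold. A minimal sketch, using a synthetic dataset purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic classification data standing in for a real dataset
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Wrong: fitting the scaler on all rows leaks validation-fold
# statistics into training
#   X_scaled = StandardScaler().fit_transform(X)

# Right: inside cross_val_score, the Pipeline refits the scaler on
# each training fold only, so validation folds never influence it
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

The same pattern applies to encoders and imputers: anything with a `fit` step belongs inside the Pipeline, never applied to the full dataset up front.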
When NOT to take this
This training is not the right fit for teams that have not yet established a baseline ML workflow — if participants have never trained and evaluated a model end-to-end, a broader ML fundamentals course should come first.
This training is part of a Data & AI catalog built for leaders serious about execution. Take the free diagnostic to see which trainings your team needs.