AI TRAINING
Feature Engineering Fundamentals for ML
Turn your raw data into high-quality features that significantly improve your machine learning models.
What it covers
This practitioner-level training teaches analysts and data professionals to build features systematically from structured and semi-structured data. Participants master categorical encoding strategies, numerical scaling, the creation of interaction and temporal features, and the prevention of target leakage. The training alternates theory with hands-on workshops in Python (pandas, scikit-learn) on real datasets, and concludes with an introduction to feature stores for production pipelines. Participants leave with a reusable reference guide they can apply immediately to their own projects.
By the end, you will be able to
- Apply at least five categorical encoding strategies and justify which to use for a given dataset and model type
- Build temporal features including lag variables, rolling aggregates, and cyclical encodings from raw datetime columns
- Detect and eliminate target leakage in a feature pipeline using chronological train/validation splits
- Implement a reusable feature transformation pipeline using scikit-learn Pipeline and ColumnTransformer
- Register and retrieve features from a basic feature store setup using Feast or Hopsworks
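
As an illustration of the reusable pipeline outcome above, here is a minimal sketch of a scikit-learn Pipeline wrapped around a ColumnTransformer. The toy DataFrame, the column names (amount, channel, churned) and the choice of LogisticRegression are assumptions made for the example, not course material.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "amount": [12.0, 250.0, np.nan, 40.0, 99.0, 5.0],
    "channel": ["web", "store", "web", "app", "store", "app"],
    "churned": [0, 1, 0, 0, 1, 0],
})
X, y = df[["amount", "channel"]], df["churned"]

# Numeric branch: fill missing values with the median, then standardise.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical branch: one-hot encode, ignoring categories unseen at fit time.
categorical = OneHotEncoder(handle_unknown="ignore")

# Route each column to the branch that should transform it.
preprocess = ColumnTransformer([
    ("num", numeric, ["amount"]),
    ("cat", categorical, ["channel"]),
])

# Preprocessing and model live in one object, so fit() learns both together.
model = Pipeline([
    ("features", preprocess),
    ("clf", LogisticRegression()),
])
model.fit(X, y)

# Inspect the engineered feature matrix: 1 scaled column + 3 one-hot columns.
features = model.named_steps["features"].transform(X)
n_feature_columns = features.shape[1]
predictions = model.predict(X)
```

Because the transformers are steps of the pipeline, passing the whole object to a cross-validation routine refits them on each training fold, which is one common way to keep preprocessing reproducible.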
Topics covered
- Categorical encoding: ordinal, one-hot, target, and frequency encoding
- Numerical scaling: min-max, standardisation, robust scaling, log transforms
- Interaction features and polynomial feature construction
- Temporal and date-based feature extraction (lag, rolling windows, seasonality)
- Handling missing values as features vs. imputation strategies
- Target leakage detection and prevention techniques
- Feature selection methods: filter, wrapper, and embedded approaches
- Introduction to feature stores (Feast, Hopsworks) for production reuse
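
To make the temporal topics above concrete, the following sketch derives a lag feature, a rolling aggregate and a cyclical day-of-week encoding with pandas. The daily sales series, its values and the column names are hypothetical; window sizes would depend on the dataset at hand.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales series; names and values are illustrative only.
sales = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=10, freq="D"),
    "units": [5, 7, 6, 9, 12, 4, 3, 8, 10, 11],
})

# Lag feature: the previous day's value (first row has no history, so NaN).
sales["units_lag_1"] = sales["units"].shift(1)

# Rolling mean of the 3 previous days. Shifting first keeps the current
# row's own value out of its window, so the feature only uses past data.
sales["units_roll_mean_3"] = sales["units"].shift(1).rolling(window=3).mean()

# Cyclical encoding: map day of week (0 = Monday .. 6 = Sunday) onto a
# circle so that Sunday and Monday end up close to each other.
day_of_week = sales["date"].dt.dayofweek
sales["dow_sin"] = np.sin(2 * np.pi * day_of_week / 7)
sales["dow_cos"] = np.cos(2 * np.pi * day_of_week / 7)
```

The shift before the rolling window is a design choice: without it, each row's aggregate would include the value being predicted when units is also the target.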
Format
Delivered over two to three days either in-person or live-virtual (Zoom/Teams). Roughly 40% concept instruction and 60% hands-on lab work. Each module pairs a short lecture with a Jupyter notebook exercise on a real-world dataset (e-commerce or financial). Participants receive a GitHub repo with all materials, a feature engineering checklist, and a reusable sklearn pipeline template. Remote delivery requires participants to have Python 3.10+ and a configured conda environment (setup guide provided in advance).
What makes it work
- Anchoring every exercise to a real business dataset the participants recognise, increasing relevance and retention
- Introducing feature stores early so participants see how engineered features are reused in production rather than recreated per model
- Pairing feature engineering training with a model evaluation module so participants can measure the impact of each transformation
- Encouraging participants to bring their own dataset for a capstone exercise during the final session
Common mistakes
- Computing target encodings before splitting the data, causing leakage that inflates validation scores
- Fitting scalers or encoders on the full dataset rather than only on the training folds
- Creating dozens of interaction features without a selection step, leading to the curse of dimensionality
- Treating feature engineering as a one-off step rather than building reproducible, versioned transformation pipelines
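
The first two mistakes above can be avoided by splitting first and fitting every transformation on the training portion only. The sketch below shows one way to do this with a chronological split; the synthetic data, the 80/20 cutoff and the column names (amount, segment, target) are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical time-ordered data; all names and values are illustrative.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=100, freq="D"),
    "amount": rng.normal(loc=100.0, scale=20.0, size=100),
    "segment": rng.choice(["a", "b", "c"], size=100),
    "target": rng.integers(0, 2, size=100),
})

# Chronological split: the validation set is strictly later than training.
df = df.sort_values("date")
cutoff = int(len(df) * 0.8)
train, valid = df.iloc[:cutoff], df.iloc[cutoff:]

# Fit the scaler on training rows only, then reuse it on validation rows.
scaler = StandardScaler()
train_scaled = scaler.fit_transform(train[["amount"]])
valid_scaled = scaler.transform(valid[["amount"]])

# Target encoding learned from training rows only. Segments unseen in
# training fall back to the global training mean.
global_mean = train["target"].mean()
segment_means = train.groupby("segment")["target"].mean()
valid_target_enc = valid["segment"].map(segment_means).fillna(global_mean)

# Note: encoding the training rows with these same means would still leak
# each row's own target; out-of-fold encoding is a common remedy.
```

Here the validation features never see validation targets or validation statistics, so the validation score stays an honest estimate under these assumptions.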
When NOT to take this training
This training is not the right fit for teams that have not yet established a baseline ML workflow. If participants have never trained and evaluated a model end-to-end, a broader ML fundamentals course should come first.
This training is part of a Data & AI catalog built for leaders who are serious about execution. Run the free diagnostic to see which trainings are a priority for your team.