AI TRAINING

Feature Engineering Fundamentals for ML

Turn your raw data into high-quality features that significantly improve your machine learning models.

Format: bootcamp
Duration: 14–24h
Level: practitioner
Group size: 6–16
Price per participant: €1K–€3K
Group price: €8K–€18K
Audience: Data analysts and business intelligence professionals transitioning into machine learning roles
Prerequisites: Working knowledge of Python and pandas; familiarity with basic supervised ML concepts (train/test split, model evaluation metrics)

What it covers

This practitioner-level training teaches analysts and data professionals to build features systematically from structured and semi-structured data. Participants learn categorical encoding strategies, scaling of numerical variables, construction of interaction and temporal features, and target leakage prevention. The course alternates theory with hands-on Python workshops (pandas, scikit-learn) on real datasets and closes with an introduction to feature stores for production pipelines. Participants leave with a reusable reference guide they can apply immediately to their own projects.

By the end, you will be able to

Apply at least five categorical encoding strategies and justify which to use for a given dataset and model type
Build temporal features including lag variables, rolling aggregates, and cyclical encodings from raw datetime columns
Detect and eliminate target leakage in a feature pipeline using chronological train/validation splits
Implement a reusable feature transformation pipeline using scikit-learn Pipeline and ColumnTransformer
Register and retrieve features from a basic feature store setup using Feast or Hopsworks
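The pipeline outcome can be pictured with a short sketch: a `ColumnTransformer` that imputes and scales numeric columns and one-hot encodes a categorical one, wrapped in a `Pipeline` with a classifier so every statistic is learned from training data only. The churn-style column names, the toy rows, and the choice of `LogisticRegression` are assumptions for illustration, not the course template itself. `add_indicator=True` also shows the "missing values as features" idea.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names for an e-commerce churn dataset
numeric_cols = ["basket_value", "days_since_last_order"]
categorical_cols = ["channel"]

numeric_steps = Pipeline([
    # Median imputation; add_indicator=True appends a "was missing" flag per column
    ("impute", SimpleImputer(strategy="median", add_indicator=True)),
    ("scale", StandardScaler()),
])

preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

model = Pipeline([
    ("features", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Toy training data (illustrative only)
X_train = pd.DataFrame({
    "basket_value": [20.0, np.nan, 55.0, 80.0, 12.0, 40.0],
    "days_since_last_order": [3, 40, 7, np.nan, 60, 10],
    "channel": ["web", "app", "web", "store", "app", "web"],
})
y_train = [0, 1, 0, 0, 1, 0]

# Medians, means, scales and categories are all learned from X_train only
model.fit(X_train, y_train)

# New rows go through exactly the same fitted transformations;
# the unseen channel "partner" is encoded as all zeros
X_new = pd.DataFrame({
    "basket_value": [np.nan],
    "days_since_last_order": [5],
    "channel": ["partner"],
})
print(model.predict(X_new))
```

Because the whole object is one estimator, it can be passed to `cross_val_score` or saved and versioned as a unit, which is what makes the transformation reusable across projects.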

Topics covered

Categorical encoding: ordinal, one-hot, target, and frequency encoding
Numerical scaling: min-max, standardisation, robust scaling, log transforms
Interaction features and polynomial feature construction
Temporal and date-based feature extraction (lag, rolling windows, seasonality)
Missing values: encoding missingness as a feature vs. imputation strategies
Target leakage detection and prevention techniques
Feature selection methods: filter, wrapper, and embedded approaches
Introduction to feature stores (Feast, Hopsworks) for production reuse
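For the temporal topic, a minimal pandas sketch of the three feature types mentioned (lag, rolling window, cyclical seasonality) might look like this. The store/date/units columns and values are hypothetical. The rolling mean is computed on the shifted series so that a row's own value is never included, which is one common way to keep these features leakage-free.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales for two stores (illustrative data only)
sales = pd.DataFrame({
    "store": ["A"] * 5 + ["B"] * 5,
    "date": list(pd.date_range("2024-01-01", periods=5, freq="D")) * 2,
    "units": [10, 12, 9, 15, 14, 3, 4, 6, 5, 7],
})
sales = sales.sort_values(["store", "date"]).reset_index(drop=True)

g = sales.groupby("store")["units"]

# Lag feature: the previous day's value for the same store
sales["units_lag1"] = g.shift(1)

# Rolling mean over the previous 3 days; shift(1) first so today's value is excluded
sales["units_roll3"] = g.transform(
    lambda s: s.shift(1).rolling(3, min_periods=1).mean()
)

# Cyclical encoding of day of week so Sunday (6) sits next to Monday (0)
dow = sales["date"].dt.dayofweek
sales["dow_sin"] = np.sin(2 * np.pi * dow / 7)
sales["dow_cos"] = np.cos(2 * np.pi * dow / 7)

print(sales)
```

Grouping by store before shifting prevents one store's history from bleeding into another's features.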

Delivery

Delivered over two to three days either in-person or live-virtual (Zoom/Teams). Roughly 40% concept instruction and 60% hands-on lab work. Each module pairs a short lecture with a Jupyter notebook exercise on a real-world dataset (e-commerce or financial). Participants receive a GitHub repo with all materials, a feature engineering checklist, and a reusable sklearn pipeline template. Remote delivery requires participants to have Python 3.10+ and a configured conda environment (setup guide provided in advance).

What makes it work

Anchoring every exercise to a real business dataset the participants recognise, increasing relevance and retention
Introducing feature stores early so participants see how engineered features are reused in production rather than recreated per model
Pairing feature engineering training with a model evaluation module so participants can measure the impact of each transformation
Encouraging participants to bring their own dataset for a capstone exercise during the final session

Common mistakes

Target-encoding categorical features before splitting the data, so validation labels leak into the features and inflate validation scores
Fitting scalers or encoders on the full dataset rather than only on the training folds
Creating dozens of interaction features without a selection step, leading to the curse of dimensionality
Treating feature engineering as a one-off step rather than building reproducible, versioned transformation pipelines
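The first two mistakes share a fix: split first, then fit every statistic on the training part only. Below is a minimal sketch for target encoding with a chronological split. The fraud-style columns, the cutoff date, and the plain per-category mean are assumptions for illustration.

```python
import pandas as pd

# Hypothetical transactions with a binary target, ordered by date
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=8, freq="D"),
    "merchant": ["m1", "m2", "m1", "m3", "m2", "m1", "m3", "m4"],
    "is_fraud": [1, 0, 1, 0, 0, 0, 1, 0],
})

# Chronological split: train on the past, validate on the future
cutoff = pd.Timestamp("2024-01-06")
train = df[df["date"] < cutoff].copy()
valid = df[df["date"] >= cutoff].copy()

# Fit the target encoding on training rows ONLY
global_mean = train["is_fraud"].mean()
merchant_mean = train.groupby("merchant")["is_fraud"].mean()

train["merchant_te"] = train["merchant"].map(merchant_mean)
# Validation rows reuse training statistics; unseen merchants fall back to the global mean
valid["merchant_te"] = valid["merchant"].map(merchant_mean).fillna(global_mean)

print(valid[["merchant", "merchant_te"]])
```

In-sample means on the training rows still overfit, so in practice you would also smooth them or compute them out-of-fold; scikit-learn 1.3+ ships a `TargetEncoder` whose `fit_transform` uses internal cross-fitting for this purpose.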

When NOT to take this training

This training is not the right fit for teams that have not yet established a baseline ML workflow. If participants have never trained and evaluated a model end-to-end, a broader ML fundamentals course should come first.

Providers to consider

Sources

This training is part of a Data & AI catalogue built for leaders who are serious about execution. Run the free diagnostic to see which trainings your team should prioritise.
