
AI TRAINING

Data pipelines for AI workloads

Build reliable, observable data pipelines to feed your AI models in production.

Format
bootcamp
Duration
24–40h
Level
practitioner
Group size
6–16
Price per participant
€2K–€4K
Group price
€22K–€45K
Audience
Data engineers and senior analytics engineers working on AI or ML-adjacent infrastructure
Prerequisites
Solid Python skills, working knowledge of SQL, and hands-on experience building or maintaining at least one data pipeline in a production environment

What it covers

This practitioner-level programme trains data engineers in the patterns and tools needed to design, build, and operate pipelines suited to AI and ML workloads. Participants work through ELT design, streaming-versus-batch trade-offs, schema evolution, and quality controls with Airflow, dbt, Dagster, and Prefect. The format alternates guided sessions with hands-on labs built on real use cases. By the end of the programme, participants can deliver resilient, observable pipelines that meet the requirements of production ML systems.

By the end, you will be able to

  • Design and implement an ELT pipeline using dbt and a cloud data warehouse optimised for ML feature generation
  • Choose between streaming and batch ingestion architectures based on model latency and data freshness requirements
  • Configure Dagster or Prefect to orchestrate a multi-step AI data workflow with retries, branching, and SLA alerts
  • Implement schema evolution policies that prevent silent data drift from breaking downstream model training
  • Write and deploy data quality checks using Great Expectations or dbt tests that gate pipeline progression
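
The last outcome above maps onto dbt's built-in tests. A minimal sketch of a dbt `schema.yml` that gates pipeline progression, assuming a hypothetical `user_features` model (model and column names are illustrative):

```yaml
# models/schema.yml — illustrative dbt test config.
# A failing test stops the run, gating downstream steps.
version: 2

models:
  - name: user_features
    columns:
      - name: user_id
        tests:
          - not_null
          - unique
      - name: feature_vector_version
        tests:
          - accepted_values:
              values: ['v1', 'v2', 'v3']
```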

Topics covered

  • ELT patterns optimised for feature stores and model training data
  • Streaming vs batch trade-offs for real-time inference pipelines
  • Schema evolution and backward compatibility strategies
  • Orchestration with Airflow, Dagster, and Prefect — when to use which
  • Data transformation and lineage with dbt
  • Data quality gates: expectations, anomaly detection, and alerting
  • Pipeline observability: logging, metrics, and SLAs
  • Handling large-scale data for LLM fine-tuning and RAG workloads
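
The schema-evolution topic above can be sketched in a few lines. A minimal backward-compatibility check, assuming schemas are plain column-to-type mappings (column and type names are illustrative, not tied to any particular warehouse):

```python
# Minimal sketch of a backward-compatibility check between two pipeline
# schema versions, assuming schemas are plain {column: type-name} dicts.

def breaking_changes(old_schema: dict, new_schema: dict) -> list[str]:
    """Return human-readable descriptions of backward-incompatible changes."""
    problems = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            problems.append(f"column dropped: {column}")
        elif new_schema[column] != old_type:
            problems.append(
                f"type changed: {column} {old_type} -> {new_schema[column]}"
            )
    # New columns are backward compatible as long as downstream readers
    # ignore unknown fields, so they are not flagged here.
    return problems

v1 = {"user_id": "int64", "event_ts": "timestamp", "score": "float64"}
v2 = {"user_id": "int64", "event_ts": "string", "label": "int64"}

print(breaking_changes(v1, v2))
# Flags the "event_ts" type change and the dropped "score" column.
```

A check like this, run as a mandatory pipeline step, is what turns silent schema drift into a loud, early failure.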

Delivery format

Delivered as a 3–5 day intensive bootcamp, available in-person or fully remote via collaborative tooling (VS Code Live Share, shared cloud environments). Approximately 60% hands-on lab time on realistic datasets. Participants receive a pre-configured cloud sandbox (GCP or AWS) and access to recorded sessions for 90 days post-training. A capstone project, building an end-to-end pipeline for a simulated LLM embedding refresh workflow, is assessed and returned with written feedback.

What makes it work

  • Establish data contracts between pipeline producers and ML consumers before writing transformation code
  • Instrument pipelines with observability from day one — SLA tracking, freshness metrics, and anomaly alerts
  • Run quality gates as mandatory pipeline steps rather than optional monitoring layers
  • Align orchestration tool choice with the team's existing DevOps practices and cloud vendor ecosystem
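
The first practice above, agreeing on data contracts before writing transformation code, can be sketched as a simple validation step. Field names and types here are illustrative assumptions; real contracts would live in a shared, versioned artifact:

```python
# Minimal sketch of a producer/consumer data contract, assuming the contract
# is expressed as required fields plus simple type checks.

CONTRACT = {
    "user_id": int,
    "feature_vector_version": str,
    "signup_ts": str,  # ISO 8601 string expected by the ML consumer
}

def violates_contract(record: dict) -> list[str]:
    """Return contract violations for one produced record."""
    violations = []
    for field, expected_type in CONTRACT.items():
        if field not in record:
            violations.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            violations.append(f"wrong type for {field}")
    return violations

good = {"user_id": 42, "feature_vector_version": "v3",
        "signup_ts": "2024-05-01T09:00:00Z"}
bad = {"user_id": "42", "feature_vector_version": "v3"}

print(violates_contract(good))  # []
print(violates_contract(bad))
```

Running the check on the producer side keeps contract violations from ever reaching the ML consumer.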

Common mistakes

  • Reusing analytical ETL pipelines for ML workloads without adapting for feature consistency and point-in-time correctness
  • Choosing streaming by default without evaluating whether model latency requirements actually justify the operational overhead
  • Skipping data quality gates in early pipeline versions and discovering silent schema drift only after model degradation in production
  • Treating pipeline orchestration tool selection as purely technical and ignoring team familiarity and operational support costs
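
The point-in-time correctness pitfall above comes down to one rule: for a training label at time t, never join a feature value observed after t. A minimal sketch, assuming feature history is stored as sorted (timestamp, value) pairs:

```python
# Minimal sketch of a point-in-time correct feature lookup: for a training
# label at time t, use only the latest feature value observed at or before t.
import bisect

def feature_as_of(history: list[tuple[int, float]], label_ts: int):
    """Return the feature value valid at label_ts, or None if none exists yet."""
    timestamps = [ts for ts, _ in history]
    i = bisect.bisect_right(timestamps, label_ts)
    if i == 0:
        # Feature did not exist yet; using a later value would leak the future.
        return None
    return history[i - 1][1]

history = [(100, 0.2), (200, 0.5), (300, 0.9)]
print(feature_as_of(history, 250))  # 0.5 — not the "future" 0.9 at t=300
print(feature_as_of(history, 50))   # None
```

Analytical ETL pipelines typically skip this as-of logic, which is exactly why reusing them for ML training data leaks future information.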

When NOT to take this training

This training is not the right fit for teams that have not yet standardised on a cloud data warehouse and have no existing pipelines — they need foundational data engineering onboarding before tackling AI-specific pipeline patterns.

Providers to consider

Sources

This training is part of a Data & AI catalogue built for leaders serious about execution. Run the free diagnostic to see which trainings should be prioritised for your team.