What is a Data Pipeline?
An automated workflow that extracts, transforms, and loads data from sources to destinations.
A data pipeline is an automated series of processes that move data from one or more sources to one or more destinations, typically involving extraction (pulling data from source systems), transformation (cleaning, enriching, and restructuring), and loading (writing to the target system). Modern pipelines handle batch and real-time streaming, include data quality checks, and support observability and monitoring.
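The extract-transform-load stages described above can be sketched in plain Python. This is a minimal illustration only: the in-memory "source" records, the completeness check, and all function names are assumptions for the example, not a real pipeline framework.

```python
# Minimal batch ETL sketch. The source records, destination dict, and
# function names are illustrative assumptions, not a real API.

def extract(source):
    """Pull raw records from a source system (here, a list of dicts)."""
    return list(source)

def transform(records):
    """Clean and restructure: drop incomplete rows, normalize emails."""
    cleaned = []
    for r in records:
        if r.get("user_id") is None:  # data quality check: completeness
            continue
        cleaned.append({
            "user_id": r["user_id"],
            "email": r.get("email", "").strip().lower(),
        })
    return cleaned

def load(records, destination):
    """Write transformed records to the target (here, a dict keyed by id)."""
    for r in records:
        destination[r["user_id"]] = r
    return destination

source = [
    {"user_id": 1, "email": "  Ada@Example.com "},
    {"user_id": None, "email": "broken@row"},  # fails the quality check
]
warehouse = load(transform(extract(source)), {})
print(warehouse)  # → {1: {'user_id': 1, 'email': 'ada@example.com'}}
```

A production pipeline would replace the in-memory source and destination with connectors to real systems (databases, object storage, message queues) and add scheduling, retries, and monitoring around these same three stages.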
Related terms
Data Lake
A centralized repository that stores raw data in its native format at any scale.
Data Warehouse
A structured repository optimized for analytical queries and business intelligence reporting.
Data Quality
The degree to which data is accurate, complete, consistent, timely, and fit for its intended use.