What is Data Lake?
A centralized repository that stores raw data in its native format at any scale.
A data lake is a centralized storage repository that holds vast amounts of raw data in its native format — structured, semi-structured, and unstructured — until needed for analysis. Unlike data warehouses that require schema-on-write, data lakes use schema-on-read, offering flexibility in how data is processed. They support diverse workloads including batch processing, real-time analytics, and machine learning. However, without proper governance, data lakes can become 'data swamps' where data is disorganized and unusable.
Related terms
Data Mesh
A decentralized data architecture that treats data as a product owned by domain teams.
Data Catalog
A searchable inventory of all data assets in an organization with metadata, lineage, and access information.
Data Governance
The framework of policies, processes, and standards for managing data assets across an organization.
Learn more
Put this into practice
Assess your maturity, discover initiatives, and build your transformation roadmap.
Start free assessment