Back to glossaryDefinition

What is Data Lake?

A centralized repository that stores raw data in its native format at any scale.

A data lake is a centralized storage repository that holds vast amounts of raw data in its native format — structured, semi-structured, and unstructured — until needed for analysis. Unlike data warehouses that require schema-on-write, data lakes use schema-on-read, offering flexibility in how data is processed. They support diverse workloads including batch processing, real-time analytics, and machine learning. However, without proper governance, data lakes can become 'data swamps' where data is disorganized and unusable.

Learn more

Put this into practice

Assess your maturity, discover initiatives, and build your transformation roadmap.

Start free assessment