Data Lake vs Data Warehouse

by Diligejy 2022. 12. 10.

1. Comparison of Data Lake & Data Warehouse

  Data Lake                              | Data Warehouse
  ---------------------------------------+-------------------------------------------------
  Stores all the raw data                | Stores specific data for a specific use
  Can be petabytes (1 PB = 1 million GB) | Relatively small
  Stores all data structures             | Stores mainly structured data
  Cost effective                         | More costly to update
  Difficult to analyze                   | Optimized for data analysis
  Requires an up-to-date data catalog    | -
  Used by data scientists                | Also used by data analysts and business analysts
  Big data, real-time analytics          | Ad-hoc, read-only queries

2. Data Catalog for data lakes

- What is the source of this data?

- Where is this data used?

- Who is the owner of the data?

- How often is this data updated?

- Keeping a catalog is good practice in terms of data governance

- A catalog helps ensure reproducibility
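
As a rough illustration of the questions above, a catalog entry could be modeled as a small record with one field per question. This is only a minimal sketch: `CatalogEntry`, `DataCatalog`, and every dataset, team, and path name below are hypothetical and do not come from any particular catalog tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for one dataset in a data lake."""
    name: str
    source: str            # What is the source of this data?
    used_by: List[str]     # Where is this data used?
    owner: str             # Who is the owner of the data?
    update_frequency: str  # How often is this data updated?


class DataCatalog:
    """Toy in-memory catalog: maps dataset names to their entries."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Re-registering a name overwrites the old entry, keeping it up to date.
        self._entries[entry.name] = entry

    def lookup(self, name: str) -> CatalogEntry:
        # Raises KeyError if the dataset was never cataloged.
        return self._entries[name]

    def owned_by(self, owner: str) -> List[str]:
        # Names of all datasets a given owner is responsible for, sorted.
        return sorted(e.name for e in self._entries.values() if e.owner == owner)


catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="clickstream_raw",          # made-up dataset name
    source="web server logs",
    used_by=["recommendation model", "traffic dashboard"],
    owner="data-platform-team",
    update_frequency="hourly",
))
catalog.register(CatalogEntry(
    name="orders_raw",               # made-up dataset name
    source="orders database export",
    used_by=["revenue report"],
    owner="data-platform-team",
    update_frequency="daily",
))

entry = catalog.lookup("clickstream_raw")
print(entry.owner, entry.update_frequency)
```

With a record like this, anyone picking up a dataset can check where it came from and who to ask about it before using it, which is the governance and reproducibility benefit listed above.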

 

 
