data-engineering-storage-lakehouse
Lakehouse Formats
Lakehouse table formats add ACID transactions, schema evolution, and time travel to data lakes stored on object storage (Amazon S3, Google Cloud Storage, Azure Blob Storage). This skill covers the three major open table formats: Delta Lake, Apache Iceberg, and Apache Hudi.
Quick Comparison
| Feature | Delta Lake | Apache Iceberg | Apache Hudi |
|---|---|---|---|
| ACID Transactions | ✅ | ✅ | ✅ |
| Time Travel | ✅ | ✅ | ✅ |
| Schema Evolution | ✅ | Advanced (branching) | ✅ |
| Primary Ecosystem | Spark/Databricks | Engine-agnostic | Spark (CDC focus) |
| Write Optimization | Copy-on-write | CoW, Merge-on-Read | CoW, Merge-on-Read |
| Python API | deltalake (pure), PySpark | pyiceberg (pure) | PySpark only |
| Best For | Spark ecosystems, Databricks | Multi-engine analytics | Change data capture, streaming |
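
All three formats get atomic commits and time travel from the same basic idea: data files are immutable, and an ordered log of commits records which files belong to each table version. The sketch below is a toy, standard-library-only model of that idea and does not use the API of any of the three formats. `ToyTable`, `Commit`, and the file names are hypothetical, and real formats add metadata, concurrency control, and compaction on top.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    """One atomic table change: data files added and files logically removed."""
    added: frozenset
    removed: frozenset = frozenset()


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a table format's commit log."""
    log: list = field(default_factory=list)

    def commit(self, added=(), removed=()):
        # Appending one log entry is the atomic step: a reader sees either
        # the whole commit or none of it.
        self.log.append(Commit(frozenset(added), frozenset(removed)))
        return len(self.log) - 1  # version number of the new snapshot

    def files_at(self, version=None):
        # Time travel: replay the log only up to the requested version.
        if version is None:
            version = len(self.log) - 1
        live = set()
        for entry in self.log[: version + 1]:
            live -= entry.removed
            live |= entry.added
        return live


table = ToyTable()
v0 = table.commit(added=["part-0.parquet"])
v1 = table.commit(added=["part-1.parquet"])
# Copy-on-write style update: rewrite part-0 into a new file, retire the old one.
v2 = table.commit(added=["part-0-rewritten.parquet"], removed=["part-0.parquet"])

print(sorted(table.files_at()))    # latest snapshot
print(sorted(table.files_at(v1)))  # snapshot as of version 1
```

Because old files are only removed logically, reading at `v1` still resolves to the original `part-0.parquet`. In this model that is why time travel costs no extra copies until old files are physically cleaned up.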