data-engineering-storage-lakehouse

Lakehouse Formats

Lakehouse formats add ACID transactions, schema evolution, and time travel to data lakes stored on object storage (S3, GCS, Azure Blob Storage). This skill covers the three major open table formats: Delta Lake, Apache Iceberg, and Apache Hudi.
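
To make atomic commits and time travel concrete, here is a minimal toy sketch in plain Python. It is not how Delta Lake, Iceberg, or Hudi are implemented; `ToyTable`, `Snapshot`, `commit`, and `read` are hypothetical names used only for illustration. The assumption is that a table is an append-only list of immutable snapshots, so a commit either publishes a whole new version or fails, and any earlier version stays readable.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """Immutable set of rows visible at one table version (hypothetical)."""
    version: int
    rows: tuple


@dataclass
class ToyTable:
    """Toy table whose state is an append-only list of snapshots."""
    snapshots: list = field(default_factory=lambda: [Snapshot(0, ())])

    def latest_version(self) -> int:
        return self.snapshots[-1].version

    def commit(self, new_rows, expected_version: int) -> int:
        # Optimistic concurrency: reject the commit if another writer
        # published a version after this writer read the table.
        if expected_version != self.latest_version():
            raise RuntimeError("conflict: table changed since it was read")
        current = self.snapshots[-1]
        snapshot = Snapshot(current.version + 1, current.rows + tuple(new_rows))
        # Publishing is one step: readers see all of the new rows or none.
        self.snapshots.append(snapshot)
        return snapshot.version

    def read(self, version=None):
        # Time travel: old snapshots are never modified, so any version
        # can still be read after later commits.
        if version is None:
            version = self.latest_version()
        return list(self.snapshots[version].rows)


table = ToyTable()
v1 = table.commit([{"id": 1}], expected_version=0)
v2 = table.commit([{"id": 2}], expected_version=v1)
latest_rows = table.read()           # rows at the newest version
rows_at_v1 = table.read(version=1)   # rows as of the first commit
```

A writer that passes a stale `expected_version` gets an error instead of silently overwriting newer data, which is the toy equivalent of a failed transaction.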

Quick Comparison

| Feature | Delta Lake | Apache Iceberg | Apache Hudi |
| --- | --- | --- | --- |
| ACID Transactions | Yes | Yes | Yes |
| Time Travel | Yes | Yes | Yes |
| Schema Evolution | Yes | Advanced (branching) | Yes |
| Primary Ecosystem | Spark/Databricks | Engine-agnostic | Spark (CDC focus) |
| Write Optimization | Copy-on-write | CoW, Merge-on-Read | CoW, Merge-on-Read |
| Python API | deltalake (pure), PySpark | pyiceberg (pure) | PySpark only |
| Best For | Spark ecosystems, Databricks | Multi-engine analytics | Change data capture, streaming |
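
The difference between the copy-on-write (CoW) and merge-on-read write strategies in the table can be sketched with a toy model. This is an illustration under stated assumptions, not any format's real storage layout: a plain dict stands in for a data file, a list of dicts stands in for a log of pending changes, and `cow_update`, `mor_update`, and `mor_read` are hypothetical helper names.

```python
def cow_update(base_file: dict, updates: dict) -> dict:
    """Copy-on-write: rewrite the whole file at write time with updates applied."""
    new_file = dict(base_file)   # copy every existing record
    new_file.update(updates)     # apply the changes during the write
    return new_file              # readers scan this file directly


def mor_update(delta_log: list, updates: dict) -> list:
    """Merge-on-read: append changes to a small log and leave the base file untouched."""
    return delta_log + [dict(updates)]


def mor_read(base_file: dict, delta_log: list) -> dict:
    """Merge-on-read: readers combine the base file with the logged changes."""
    merged = dict(base_file)
    for delta in delta_log:      # later log entries win over earlier ones
        merged.update(delta)
    return merged


base = {1: "a", 2: "b"}
updates = {2: "B", 3: "c"}

cow_result = cow_update(base, updates)   # write is expensive, read is a plain scan
log = mor_update([], updates)            # write is cheap, only the changes are stored
mor_result = mor_read(base, log)         # read pays the merge cost
```

Both strategies return the same data to the reader. The trade-off is where the work happens: copy-on-write pays at write time, merge-on-read pays at read time until the log is compacted into the base file.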

When to Use Which?

  • Delta Lake: You're in the Spark/Databricks ecosystem and need mature tooling with the pure-Python deltalake library
More from legout/data-platform-agent-skills

First Seen
Feb 11, 2026