databricks-iceberg
Apache Iceberg on Databricks
Databricks provides multiple ways to work with Apache Iceberg: native managed Iceberg tables, UniForm for Delta-to-Iceberg interoperability, and the Iceberg REST Catalog (IRC) for external engine access.
Critical Rules (always follow)
- MUST use Unity Catalog — all Iceberg features require UC-enabled workspaces
- MUST NOT install an Iceberg library into Databricks Runtime (DBR includes built-in Iceberg support; adding a library causes version conflicts)
- MUST NOT set `write.metadata.path` or `write.metadata.previous-versions-max`: Databricks manages metadata locations automatically; overriding causes corruption
- MUST determine which Iceberg pattern fits the use case before writing code; see the When to Use section below
- MUST know that both `PARTITIONED BY` and `CLUSTER BY` produce the same Iceberg metadata for external engines. UC maintains an Iceberg partition spec with partition fields corresponding to the clustering keys, so external engines reading via IRC see a partitioned Iceberg table (not Hive-style, but proper Iceberg partition fields) and can prune on those fields; internally UC uses those fields as liquid clustering keys. The only differences between the two syntaxes are: (1) `PARTITIONED BY` is standard Iceberg DDL (any engine can create the table), while `CLUSTER BY` is DBR-only DDL; (2) `PARTITIONED BY` auto-handles DV/row-tracking properties, while `CLUSTER BY` requires manual TBLPROPERTIES on v2
- MUST NOT use expression-based partition transforms (`bucket()`, `years()`, `months()`, `days()`, `hours()`) with `PARTITIONED BY` on managed Iceberg tables: only plain column references are supported; expression transforms cause errors
- MUST disable deletion vectors and row tracking when using `CLUSTER BY` on Iceberg v2 tables: set `'delta.enableDeletionVectors' = false` and `'delta.enableRowTracking' = false` in TBLPROPERTIES (Iceberg v3 handles this automatically; `PARTITIONED BY` handles this automatically on both v2 and v3)
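
The DDL-related rules above can be sketched as a small Python helper that assembles a `CREATE TABLE ... USING ICEBERG` statement and refuses inputs that break them. This is an illustrative sketch, not a Databricks API: `build_create_table`, its parameters, and the table name `main.sales.orders` are hypothetical, and the plain-column check is a simple regex that would also reject backtick-quoted identifiers.

```python
import re

# Hypothetical helper for illustration only; nothing here is a Databricks API.
_PLAIN_COLUMN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_FORBIDDEN_PROPERTIES = {
    "write.metadata.path",
    "write.metadata.previous-versions-max",
}


def build_create_table(table, columns, keys, layout="PARTITIONED BY",
                       iceberg_v2=True, properties=None):
    """Return a CREATE TABLE statement that respects the rules listed above."""
    if layout not in ("PARTITIONED BY", "CLUSTER BY"):
        raise ValueError(f"unknown layout clause: {layout}")

    # Rule: only plain column references; bucket(), days(), etc. are rejected.
    for key in keys:
        if not _PLAIN_COLUMN.match(key):
            raise ValueError(f"expression transforms are not supported: {key}")

    # Property values are rendered verbatim, so callers quote strings themselves.
    props = dict(properties or {})

    # Rule: never override the metadata settings that Databricks manages.
    blocked = _FORBIDDEN_PROPERTIES.intersection(props)
    if blocked:
        raise ValueError(f"must not set: {sorted(blocked)}")

    # Rule: CLUSTER BY on Iceberg v2 needs both properties disabled by hand.
    if layout == "CLUSTER BY" and iceberg_v2:
        props["delta.enableDeletionVectors"] = "false"
        props["delta.enableRowTracking"] = "false"

    column_sql = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    ddl = (f"CREATE TABLE {table} ({column_sql}) "
           f"USING ICEBERG {layout} ({', '.join(keys)})")
    if props:
        rendered = ", ".join(f"'{k}' = {v}" for k, v in sorted(props.items()))
        ddl += f" TBLPROPERTIES ({rendered})"
    return ddl


# Assumed example schema; the catalog, schema, and table names are made up.
ORDER_COLUMNS = [("order_id", "BIGINT"), ("order_date", "DATE")]

clustered_v2 = build_create_table(
    "main.sales.orders", ORDER_COLUMNS, ["order_date"], layout="CLUSTER BY")
partitioned = build_create_table(
    "main.sales.orders", ORDER_COLUMNS, ["order_date"])

print(clustered_v2)
print(partitioned)
```

With `CLUSTER BY` on v2 the helper appends the two `TBLPROPERTIES` entries, while the `PARTITIONED BY` variant emits none, mirroring the rule that `PARTITIONED BY` handles them automatically. Passing a key such as `days(order_date)` or a property such as `write.metadata.path` raises `ValueError` instead of producing DDL.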