DQX Data Quality Framework Patterns
Overview
DQX is a Python-based data quality framework from Databricks Labs that validates PySpark DataFrames with richer diagnostics than standard DLT expectations. This skill provides production-grade patterns for integrating DQX into medallion architecture pipelines.
Recommended Version: >=0.12.0 (float support, outlier detection, JSON validation, AI-assisted rules)
Key Benefits:
- Detailed diagnostic information (`_error`, `_warning` columns)
- Flexible quarantine strategies (drop, mark, split)
- Dataset-level checks (uniqueness, foreign keys, outliers, aggregations)
- YAML/JSON/Delta/Lakebase check storage with governance
- Auto-profiling and AI-assisted rule generation (0.10.0+)
- Summary metrics for quality tracking over time (0.10.0+)
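To make the diagnostic columns and the three quarantine strategies from the list above concrete, here is a minimal conceptual sketch in plain Python. It is not the DQX API: the check names, the `mark`/`split`/`drop` helpers, the sample rows, and the rule that only error-level failures send a row to quarantine are all assumptions made for illustration. In a real pipeline DQX applies the equivalent logic to PySpark DataFrames.

```python
# Conceptual sketch only: plain Python, NOT the DQX API.
# All names below (CHECKS, mark, split, drop) are hypothetical.
from typing import Callable

Row = dict
# (check name, criticality, predicate returning True when the row passes)
Check = tuple[str, str, Callable[[Row], bool]]

CHECKS: list[Check] = [
    ("order_id_not_null", "error", lambda r: r.get("order_id") is not None),
    ("amount_non_negative", "warn",
     lambda r: r.get("amount") is None or r["amount"] >= 0),
]


def mark(rows: list[Row], checks: list[Check]) -> list[Row]:
    """Mark: keep every row and attach _error / _warning diagnostics."""
    marked = []
    for row in rows:
        failed = [(name, crit) for name, crit, passes in checks if not passes(row)]
        marked.append({
            **row,
            "_error": [name for name, crit in failed if crit == "error"],
            "_warning": [name for name, crit in failed if crit == "warn"],
        })
    return marked


def split(rows: list[Row], checks: list[Check]) -> tuple[list[Row], list[Row]]:
    """Split: rows with error-level failures go to a quarantine set (assumed rule)."""
    marked = mark(rows, checks)
    good = [r for r in marked if not r["_error"]]
    quarantined = [r for r in marked if r["_error"]]
    return good, quarantined


def drop(rows: list[Row], checks: list[Check]) -> list[Row]:
    """Drop: discard rows with error-level failures and keep the rest."""
    good, _ = split(rows, checks)
    return good


# Made-up sample rows for illustration.
rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": None, "amount": 5.0},   # fails the error-level check
    {"order_id": 2, "amount": -3.0},     # fails the warn-level check only
]

good, quarantined = split(rows, CHECKS)
```

In this sketch a warning never removes a row; it only annotates it. That mirrors the general idea of separating error-level from warning-level checks, though the exact routing of warned rows in DQX should be confirmed against its documentation.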
Quick Start (3-4 hours for pilot)
Goal: Add DQX diagnostics to one Silver table without disrupting existing DLT expectations.