data-engineering-quality

Data Quality and Testing

Data validation and testing frameworks for ensuring pipeline correctness and data quality: Great Expectations (enterprise-oriented) and Pandera (lightweight). Both can run inside orchestration tools to automate validation.

Quick Comparison

| Feature | Great Expectations | Pandera |
|---|---|---|
| Approach | Declarative "expectations" | Schema definitions with checks |
| DataFrame Support | Pandas, Spark, SQL databases (including BigQuery) | Pandas, Polars, PySpark, Dask |
| Validation Output | JSON-serializable results with detailed diagnostics | Validated DataFrame on success; SchemaError raised on failure (SchemaErrors with lazy=True) |
| Best For | Enterprise data platforms, comprehensive profiling | Python-centric pipelines, lightweight validation |
| Learning Curve | Steeper (concepts: Data Context, Checkpoints) | Lower (Python classes and decorators) |
| Integration | CI/CD, Airflow, Prefect, Dagster | pytest, FastAPI, any Python code |

When to Use Which?

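  • Great Expectations: choose it when you need comprehensive data documentation (Data Docs), profiling, and validation with rich reporting. It suits organizations with dedicated data quality teams.
  • Pandera: choose it for Python-centric pipelines that need lightweight schema validation embedded directly in code, tests, or services.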

More from legout/data-platform-agent-skills

First Seen
Feb 11, 2026