data-quality-assessment
Five Quality Dimensions
Score each dimension from 1 (worst) to 5 (best) when evaluating any data source or pipeline:
1. Completeness - What percentage of expected records and fields are present?
- Null rate per column
- Missing record detection (expected vs actual row counts)
- Required field coverage
- Score 5: <1% nulls in required fields. Score 1: >20% missing data.
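The completeness checks above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (`null_rate`, `completeness_score`, `assess_completeness`) are hypothetical, rows are assumed to be dictionaries, and only the cut-offs for scores 5 and 1 come from the rubric. The cut-offs for scores 2 to 4 are assumptions and should be tuned to your data.

```python
from typing import Any


def null_rate(rows: list[dict[str, Any]], column: str) -> float:
    """Fraction of rows where `column` is absent or None."""
    if not rows:
        return 1.0  # assumption: an empty dataset counts as fully missing
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def completeness_score(worst_missing_rate: float) -> int:
    """Map a missing-data rate (0.0 to 1.0) to a 1-5 score."""
    if worst_missing_rate < 0.01:   # from the rubric: score 5 is <1% nulls
        return 5
    if worst_missing_rate > 0.20:   # from the rubric: score 1 is >20% missing
        return 1
    # Assumed intermediate cut-offs; the rubric does not define scores 2-4.
    if worst_missing_rate < 0.05:
        return 4
    if worst_missing_rate < 0.10:
        return 3
    return 2


def assess_completeness(
    rows: list[dict[str, Any]],
    required: list[str],
    expected_rows: int,
) -> dict[str, Any]:
    """Combine per-column null rates with expected vs actual row counts."""
    rates = {col: null_rate(rows, col) for col in required}
    row_coverage = len(rows) / expected_rows if expected_rows else 0.0
    missing_rows = max(0.0, 1.0 - row_coverage)
    # Score on the worst signal: the weakest required column or missing rows.
    worst = max([missing_rows, *rates.values()])
    return {
        "null_rates": rates,
        "row_coverage": row_coverage,
        "score": completeness_score(worst),
    }
```

Scoring on the worst required column, rather than the average, is a design choice made here so that one badly populated field cannot hide behind several clean ones.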
2. Accuracy - Does the data reflect reality?
- Cross-field validation (age matches birth date, totals match line items)
- Reference data matching (codes exist in terminology tables)
- Statistical outlier detection
- Score 5: <0.1% error rate verified against gold standard. Score 1: Known systematic errors unresolved.
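As a rough sketch of the three accuracy checks, the helpers below use only the standard library. All names are hypothetical, and the record layout (an `age` integer and a `birth_date` date) is assumed for illustration. The rubric does not name an outlier method, so the modified z-score based on the median absolute deviation is one possible choice among many, and the 3.5 threshold is a common convention rather than a requirement.

```python
from datetime import date
from statistics import median
from typing import Any, Iterable


def age_on(birth_date: date, as_of: date) -> int:
    """Whole years elapsed between birth_date and as_of."""
    years = as_of.year - birth_date.year
    # Subtract one if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def age_matches_birth_date(record: dict[str, Any], as_of: date) -> bool:
    """Cross-field validation: the stored age agrees with the birth date."""
    return record["age"] == age_on(record["birth_date"], as_of)


def unknown_codes(codes: Iterable[str], reference: set[str]) -> list[str]:
    """Reference data matching: codes absent from the terminology table."""
    return [code for code in codes if code not in reference]


def mad_outliers(values: list[float], threshold: float = 3.5) -> list[float]:
    """Values whose modified z-score (MAD-based) exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, so nothing can be flagged
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]
```

The error rate for scoring would then be the number of failed checks divided by the number of records checked, compared against whatever gold standard is available.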
3. Timeliness - Is the data fresh enough for its intended use?
- Data freshness (time since last update vs SLA)
- Pipeline latency (ingestion to availability)
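Both timeliness measures reduce to timestamp arithmetic. The sketch below assumes timezone-aware `datetime` values and a freshness SLA expressed as a `timedelta`; the function names are hypothetical and no score thresholds are implied.

```python
from datetime import datetime, timedelta, timezone


def freshness(last_updated: datetime, now: datetime) -> timedelta:
    """Time elapsed since the data was last updated."""
    return now - last_updated


def within_sla(last_updated: datetime, now: datetime, sla: timedelta) -> bool:
    """True if the data is no older than the freshness SLA allows."""
    return freshness(last_updated, now) <= sla


def pipeline_latency(ingested_at: datetime, available_at: datetime) -> timedelta:
    """Time between ingestion and availability to consumers."""
    return available_at - ingested_at


if __name__ == "__main__":
    # Illustrative timestamps only; substitute values from your own pipeline.
    now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    last_updated = datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)
    print(within_sla(last_updated, now, sla=timedelta(hours=24)))
```

Passing `now` explicitly, instead of calling `datetime.now()` inside the functions, keeps the checks deterministic and easy to test.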