data-quality

Purpose

This skill ensures data accuracy, completeness, and consistency in pipelines by performing validation, cleaning, and monitoring. It integrates with data engineering workflows to catch issues early through rule-based checks and automated alerts.

When to Use

Use this skill when building or maintaining data pipelines that handle large datasets, such as ETL processes, to prevent downstream errors. Apply it for compliance checks in regulated industries or when integrating new data sources that might introduce inconsistencies. Avoid it for simple data tasks without validation needs.

Key Capabilities

  • Validate data schemas and values using predefined rules (e.g., check for nulls, duplicates, or type mismatches); see the validation sketch after this list.
  • Clean datasets by applying transformations like trimming strings or imputing missing values.
  • Monitor pipelines in real-time with metrics and alerts for anomalies.
  • Support various data formats (CSV, JSON, Parquet) and integrate with storage systems such as S3 or databases.
  • Generate reports or logs for auditing, including summary statistics and error counts.
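
A minimal sketch of what a rule-based validation pass could look like, assuming pandas DataFrames as the working format. The ValidationReport class, the validate() helper, and the rule set are illustrative assumptions, not this skill's actual API:

```python
# Hypothetical rule-based validation sketch (not this skill's real API).
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class ValidationReport:
    """Collects rule violations for logging or alerting (illustrative)."""
    errors: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate(df: pd.DataFrame, required: dict[str, str], key: str) -> ValidationReport:
    """Check schema, nulls, and duplicate keys against a required-columns spec."""
    report = ValidationReport()
    # Schema rules: every required column must exist with the expected dtype.
    for col, dtype in required.items():
        if col not in df.columns:
            report.errors.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            report.errors.append(f"{col}: expected {dtype}, got {df[col].dtype}")
    # Value rules: no nulls in required columns, no duplicate key values.
    for col in required:
        if col in df.columns and df[col].isna().any():
            report.errors.append(f"{col}: {int(df[col].isna().sum())} nulls")
    if key in df.columns and df[key].duplicated().any():
        report.errors.append(f"{key}: {int(df[key].duplicated().sum())} duplicate keys")
    return report
```

A call like validate(df, {"id": "int64", "email": "object"}, key="id") returns a report whose error list can feed the audit logs and error counts described above.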

Usage Patterns

Invoke this skill via the CLI for quick checks, or integrate it into Python scripts for automated pipelines. Always load configuration files first, then run validations. For monitoring, set up recurring jobs. Example pattern: load data, apply validation, clean if needed, and monitor outputs, as sketched below.
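
The sketch below walks through that load, validate, clean, monitor pattern end to end, reusing the hypothetical validate() helper from the earlier sketch. The config layout, the clean() rules, and the logging hook are assumptions chosen for illustration, not prescribed by this skill:

```python
# Hypothetical end-to-end pattern: load config, load data, validate,
# clean, write output, and log metrics for monitoring.
import json
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("data-quality")


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply simple cleaning rules: trim strings, impute missing numerics."""
    for col in df.select_dtypes(include="object"):
        df[col] = df[col].str.strip()
    for col in df.select_dtypes(include="number"):
        df[col] = df[col].fillna(df[col].median())  # median imputation, one common policy
    return df


def run(config_path: str) -> None:
    with open(config_path) as f:        # configuration is always loaded first
        config = json.load(f)
    df = pd.read_csv(config["source"])  # CSV here; JSON/Parquet load similarly
    report = validate(df, config["schema"], key=config["key"])
    for err in report.errors:
        log.error(err)                  # a real pipeline would raise an alert here
    clean(df).to_parquet(config["target"])
    log.info("rows=%d errors=%d", len(df), len(report.errors))
```

For recurring monitoring, the same run() call can be scheduled (e.g., with cron or an orchestrator such as Airflow) so error counts are logged on every run.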
