Data Cleaning Pipeline
A skill for building systematic, reproducible data cleaning pipelines for research datasets. Covers common data quality issues, step-by-step cleaning workflows, handling missing values, detecting and treating outliers, validating data integrity, and documenting cleaning decisions for reproducibility.
The Data Cleaning Workflow
Pipeline Overview
Data cleaning should follow a consistent, documented order. Each step builds on the previous one, and the entire pipeline should be scripted for reproducibility.
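One way to make the order explicit and reproducible is to write each step as a function and run the steps from a single list, logging the state of the data after each one. The following is a minimal sketch using pandas, not a prescribed implementation: the step functions (`strip_string_whitespace`, `drop_exact_duplicates`), the `run_pipeline` helper, and the example columns are all hypothetical names chosen for illustration.

```python
import pandas as pd


def strip_string_whitespace(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical step: trim leading/trailing whitespace in text columns."""
    out = df.copy()  # never mutate the caller's data
    for col in out.columns:
        is_text = (pd.api.types.is_object_dtype(out[col])
                   or pd.api.types.is_string_dtype(out[col]))
        if is_text:
            # Only touch real strings; missing values pass through unchanged.
            out[col] = out[col].map(
                lambda v: v.strip() if isinstance(v, str) else v
            )
    return out


def drop_exact_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical step: remove rows identical across every column."""
    return df.drop_duplicates().reset_index(drop=True)


def run_pipeline(df: pd.DataFrame, steps):
    """Apply steps in order and record the data's shape after each one."""
    log = []
    current = df
    for step in steps:
        current = step(current)
        log.append({
            "step": step.__name__,
            "rows": len(current),
            "columns": current.shape[1],
            "missing_cells": int(current.isna().sum().sum()),
        })
    return current, pd.DataFrame(log)


# Illustrative data only; the column names are assumptions.
raw = pd.DataFrame({"site": [" A", "A", "B "], "value": [1.0, 1.0, None]})

# Order matters: whitespace is trimmed first so that " A" and "A"
# are recognised as duplicates by the second step.
cleaned, log = run_pipeline(raw, [strip_string_whitespace, drop_exact_duplicates])
print(log)
```

Keeping the step list in one place means the order of operations is itself documented, and the log gives a simple record of how each step changed the row count and the number of missing cells.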
Data Cleaning Pipeline (recommended order):
1. Initial Assessment
- Load data, check dimensions, inspect dtypes
- Generate summary statistics and missing value report
- Identify structural issues (merged cells, inconsistent delimiters)