data-cleaning-pipeline


Data Cleaning Pipeline

A skill for building systematic, reproducible data cleaning pipelines for research datasets. Covers common data quality issues, step-by-step cleaning workflows, handling missing values, detecting and treating outliers, validating data integrity, and documenting cleaning decisions for reproducibility.

The Data Cleaning Workflow

Pipeline Overview

Data cleaning should follow a consistent, documented order. Each step builds on the previous one, and the entire pipeline should be scripted for reproducibility.

Data Cleaning Pipeline (recommended order):

1. Initial Assessment
   - Load data, check dimensions, inspect dtypes
   - Generate summary statistics and missing value report
   - Identify structural issues (merged cells, inconsistent delimiters)
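The initial assessment step can be sketched roughly as follows. This is a minimal illustration, assuming the data is tabular and loaded with pandas; the function name `initial_assessment`, the report keys, and the inline sample data are hypothetical and not part of the skill itself. Structural checks such as delimiter detection are not covered here.

```python
import io

import pandas as pd


def initial_assessment(df: pd.DataFrame) -> dict:
    """Collect a basic data quality overview without modifying df.

    Hypothetical helper for illustration only.
    """
    missing = df.isna().sum()
    return {
        # Dimensions as (rows, columns)
        "shape": df.shape,
        # Column name -> dtype name
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Summary statistics for numeric and non-numeric columns alike
        "summary": df.describe(include="all"),
        # Missing value report: count and percentage per column
        "missing": pd.DataFrame(
            {
                "n_missing": missing,
                "pct_missing": (missing / len(df) * 100).round(1),
            }
        ),
        # Non-numeric columns often signal mixed types or stray text
        "non_numeric_columns": df.select_dtypes(exclude="number").columns.tolist(),
    }


# Small made-up sample standing in for a real research dataset
raw = io.StringIO("id,age,score\n1,34,88.5\n2,,91.0\n3,unknown,\n")
df = pd.read_csv(raw)

report = initial_assessment(df)
print(report["shape"])
print(report["dtypes"])
print(report["missing"])
print(report["non_numeric_columns"])
```

In this sample the `age` column is reported as non-numeric because of the stray text value, which is the kind of issue the assessment is meant to surface before any cleaning begins.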
First Seen
Mar 31, 2026
data-cleaning-pipeline — wentorai/research-plugins