Building Data Pipelines
Build robust, efficient batch data pipelines in Python. This skill covers the complete pipeline lifecycle: extracting data from sources, transforming it with DataFrames or SQL, loading it to destinations, and operating pipelines to production standards.
When to use this skill
Use this skill when:
- Building ETL/ELT pipelines in Python
- Choosing between Polars, DuckDB, PyArrow, or SQL for data processing
- Designing data layer architecture (Bronze/Silver/Gold)
- Implementing incremental loading with watermarks or CDC
- Deciding on append vs overwrite vs merge semantics
- Setting up partitioning and file sizing strategies
- Validating data quality at pipeline boundaries
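
To make a few of the decisions above concrete (watermark-based incremental loading, merge semantics, and validation at a pipeline boundary), here is a minimal sketch using only the Python standard library. Everything in it is an illustrative assumption rather than part of this skill: the function names, the `id` and `updated_at` fields, and the in-memory dict standing in for a destination table.

```python
from datetime import datetime, timezone

UTC = timezone.utc

def extract_incremental(rows, watermark):
    """Keep only rows changed strictly after the stored watermark."""
    return [r for r in rows if r["updated_at"] > watermark]

def validate(rows):
    """Boundary check: reject the whole batch if any row lacks a key."""
    bad = [r for r in rows if r.get("id") is None]
    if bad:
        raise ValueError(f"{len(bad)} row(s) missing 'id'")
    return rows

def merge(target, rows):
    """Merge (upsert) semantics: insert new keys, overwrite existing ones.

    Rows are applied in updated_at order so the latest version wins.
    """
    for r in sorted(rows, key=lambda r: r["updated_at"]):
        target[r["id"]] = r
    return target

def run(source, target, watermark):
    """Extract, validate, load, then return the advanced watermark."""
    batch = validate(extract_incremental(source, watermark))
    merge(target, batch)
    # Advance the watermark only after the load succeeded.
    return max((r["updated_at"] for r in batch), default=watermark)

# Hypothetical source rows; id 1 appears twice with different versions.
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, tzinfo=UTC), "amount": 10},
    {"id": 2, "updated_at": datetime(2024, 1, 2, tzinfo=UTC), "amount": 20},
    {"id": 1, "updated_at": datetime(2024, 1, 3, tzinfo=UTC), "amount": 15},
]
target = {}  # stand-in for a destination table keyed by id

first_wm = run(source, target, datetime(2023, 12, 31, tzinfo=UTC))
second_wm = run(source, target, first_wm)  # re-run picks up nothing new
```

With merge semantics a re-run is idempotent: the second call finds no rows past the watermark and leaves the target unchanged. Append semantics would instead keep both versions of `id` 1, and overwrite semantics would replace the target with each batch.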