ml-data-pipeline-architecture

ML Data Pipeline Architecture

Patterns for efficient ML data pipelines using Polars, Arrow, and ClickHouse.

ADR: 2026-01-22-polars-preference-hook (efficiency preferences framework)

Note: A PreToolUse hook enforces the Polars preference. To use Pandas in a file, add the comment # polars-exception: <reason> at the top of that file.
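The exception rule in the note above can be illustrated with a small checker. This is only a sketch of the idea, not the actual hook: the function name `pandas_allowed`, the regular expressions, and the choice to treat the first five lines as the "file top" are all assumptions made for illustration.

```python
import re

# Hypothetical patterns; the real hook's matching rules are not shown in this document.
EXCEPTION_RE = re.compile(r"^#\s*polars-exception:\s*(\S.*)$")
PANDAS_IMPORT_RE = re.compile(
    r"^\s*(?:import\s+pandas\b|from\s+pandas\b)", re.MULTILINE
)


def pandas_allowed(source: str, header_lines: int = 5) -> bool:
    """Return True if the source has no Pandas import, or if it declares
    a polars-exception comment with a non-empty reason near the top.

    header_lines is an assumed definition of "file top".
    """
    if not PANDAS_IMPORT_RE.search(source):
        return True  # nothing to enforce
    for line in source.splitlines()[:header_lines]:
        if EXCEPTION_RE.match(line.strip()):
            return True  # exception declared with a reason
    return False
```

Under these assumptions, a file that starts with `# polars-exception: vendor SDK returns pandas objects` followed by `import pandas as pd` passes, while the same import without the comment, or with an empty reason, is rejected.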

Self-Evolving Skill: This skill improves through use. If instructions are wrong, parameters have drifted, or a workaround was needed, fix this file immediately rather than deferring. Only update for real, reproducible issues.

When to Use This Skill

Use this skill when:

  • Deciding between Polars and Pandas for a data pipeline
  • Optimizing memory usage with zero-copy Arrow patterns
  • Loading data from ClickHouse into PyTorch DataLoaders
  • Implementing lazy evaluation for large datasets
  • Migrating existing Pandas code to Polars
Installs: 92
GitHub Stars: 46
First Seen: Feb 7, 2026