ml-data-pipeline-architecture
ML Data Pipeline Architecture
Patterns for efficient ML data pipelines using Polars, Arrow, and ClickHouse.
ADR: 2026-01-22-polars-preference-hook (efficiency preferences framework)
Note: A PreToolUse hook enforces the Polars preference. To use Pandas instead, add this comment at the top of the file:
# polars-exception: <reason>
Self-Evolving Skill: This skill improves through use. If instructions are wrong, parameters have drifted, or a workaround was needed, fix this file immediately rather than deferring. Only update it for real, reproducible issues.
When to Use This Skill
Use this skill when:
- Deciding between Polars and Pandas for a data pipeline
- Optimizing memory usage with zero-copy Arrow patterns
- Loading data from ClickHouse into PyTorch DataLoaders
- Implementing lazy evaluation for large datasets
- Migrating existing Pandas code to Polars