scikit-learn
scikit-learn ML Pipelines
Build reproducible ML workflows with scikit-learn Pipelines, ColumnTransformers, cross-validation, and MLflow experiment tracking.
Principles
- Prefer Pipeline/ColumnTransformer so preprocessing travels with the model
- Make runs deterministic: set random_state everywhere and seed numpy
- Keep train/val/test separation; use cross-validation for small datasets
- Persist the whole pipeline with joblib and load it for inference
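The principles above can be sketched in one short script. This is a minimal illustration, not the project's actual code: the synthetic data, the column names (age, income, city), the LogisticRegression estimator, the SEED value, and the pipeline.joblib file name are all assumptions made up for the example.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

SEED = 42  # hypothetical seed; reuse the same value everywhere
np.random.seed(SEED)               # seed numpy's legacy global RNG
rng = np.random.default_rng(SEED)  # seeded generator for the toy data

# Synthetic toy data (an assumption for illustration only).
n = 200
X = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(50_000, 15_000, n),
    "city": rng.choice(["a", "b", "c"], n),
})
y = (X["age"] + rng.normal(0, 5, n) > 40).astype(int)

NUMERIC = ["age", "income"]
CATEGORICAL = ["city"]

# Preprocessing lives inside the pipeline, so it travels with the model.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), NUMERIC),
    ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
])
pipe = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(random_state=SEED, max_iter=1000)),
])

# Hold out a test set that is never touched during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED, stratify=y
)

# Cross-validate on the training portion only (useful for small datasets).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
cv_scores = cross_val_score(pipe, X_train, y_train, cv=cv)

# Fit on the full training split, then evaluate once on the held-out test set.
pipe.fit(X_train, y_train)
test_accuracy = pipe.score(X_test, y_test)

# Persist the whole pipeline (preprocessing + model) and reload it for inference.
model_path = os.path.join(tempfile.mkdtemp(), "pipeline.joblib")
joblib.dump(pipe, model_path)
loaded = joblib.load(model_path)
same_predictions = bool((loaded.predict(X_test) == pipe.predict(X_test)).all())
```

Because the scaler and encoder are steps of the pipeline, cross_val_score refits them inside each fold, which keeps validation data out of the preprocessing statistics. The reloaded object accepts raw feature frames directly, so inference code does not need to repeat any preprocessing.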
Project Layout
.
    data/
        raw/
        processed/
    src/
        features.py    # build features, column lists