ai-ml-data-science
Data Science Engineering Suite - Quick Reference
This skill turns raw data and questions into validated, documented models ready for production:
- EDA workflows: Structured exploration with drift detection
- Feature engineering: Reproducible feature pipelines with leakage prevention and train/serve parity
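- Model selection: Baselines first; strong tabular defaults (gradient-boosted trees such as LightGBM or CatBoost); escalate complexity only when a measurable gain over the baseline justifies it
- Evaluation & reporting: Slice analysis, uncertainty estimates, model cards, production metrics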
- SQL transformation: SQLMesh for staging/intermediate/marts layers
- MLOps: CI/CD, CT (continuous training), CM (continuous monitoring)
- Production patterns: Data contracts, lineage, feedback loops, streaming features
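Drift detection, listed under the EDA bullet above, often starts with a per-feature score that compares a reference sample with a newer one. The sketch below computes the population stability index (PSI) in plain Python. It is an illustration, not how Evidently or any other library computes drift, and the function name, equal-width binning, and `eps` smoothing are assumptions.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI of `current` (e.g. production) against `reference` (e.g. training).

    Bins are equal-width over the reference range; values outside that
    range are clipped into the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1  # clip out-of-range values
        # eps keeps the log term finite when a bin is empty
        return [max(c / len(values), eps) for c in counts]

    ref = bin_fractions(reference)
    cur = bin_fractions(current)
    # PSI = sum over bins of (cur - ref) * ln(cur / ref)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


if __name__ == "__main__":
    train = [i / 100 for i in range(100)]
    shifted = [0.9 + i / 1000 for i in range(100)]
    print(population_stability_index(train, train))    # 0.0
    print(population_stability_index(train, shifted))  # large: mass moved to the top bin
```

A commonly quoted rule of thumb reads PSI under 0.1 as stable and above 0.25 as a significant shift; treat those cut-offs as starting points to tune per feature.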
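Leakage prevention and train/serve parity both come down to fitting transform parameters once, on training data only, and shipping those exact parameters to serving. A minimal sketch, assuming a single numeric feature standardized with z-scores; `ScalerParams` and its JSON format are hypothetical, and in practice a scikit-learn `Pipeline` or a feature store would usually play this role.

```python
import json
import statistics
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class ScalerParams:
    """Fitted standardization parameters shared by training and serving code."""

    mean: float
    std: float

    @classmethod
    def fit(cls, train_values: List[float]) -> "ScalerParams":
        # Fit on the training split only: using validation, test, or future rows
        # here would leak information the model cannot have at serving time.
        std = statistics.pstdev(train_values)
        return cls(mean=statistics.fmean(train_values), std=std if std > 0 else 1.0)

    def transform(self, value: float) -> float:
        return (value - self.mean) / self.std

    def to_json(self) -> str:
        # Persist next to the model artifact so serving reuses the exact same numbers.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "ScalerParams":
        return cls(**json.loads(payload))


if __name__ == "__main__":
    train, test = [1.0, 2.0, 3.0, 4.0], [2.5, 10.0]
    params = ScalerParams.fit(train)            # training pipeline
    artifact = params.to_json()
    serving = ScalerParams.from_json(artifact)  # serving pipeline
    # Parity check: both sides produce identical features for the same input.
    assert [serving.transform(x) for x in test] == [params.transform(x) for x in test]
    print(artifact)
```

Because serving never recomputes statistics, the features it produces cannot silently diverge from what the model saw in training.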
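"Baselines first" can be made concrete by scoring a trivial predictor before any real model and requiring a margin before escalating. The sketch assumes a classification task scored by accuracy; the helper names and the 0.02 `min_gain` default are illustrative choices, not recommendations from this skill.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline_accuracy(
    y_train: Sequence[Hashable], y_test: Sequence[Hashable]
) -> float:
    """Test accuracy of always predicting the most frequent training label."""
    majority_label, _ = Counter(y_train).most_common(1)[0]
    return accuracy(y_test, [majority_label] * len(y_test))


def justify_escalation(baseline: float, candidate: float, min_gain: float = 0.02) -> bool:
    """Accept a more complex model only if it beats the baseline by `min_gain`."""
    return candidate - baseline >= min_gain


if __name__ == "__main__":
    y_train = ["churn", "stay", "stay", "stay"]
    y_test = ["stay", "stay", "churn", "stay"]
    base = majority_baseline_accuracy(y_train, y_test)
    print(base, justify_escalation(base, candidate=0.76))  # 0.75 False
```

The same pattern carries over to regression with a predict-the-training-mean baseline and an error metric.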
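Slice analysis with uncertainty means reporting a metric per segment together with its sample size and an interval, so small slices are not over-read. This sketch uses a percentile bootstrap over per-row correctness; the function names, the 1,000 resamples, and the fixed seed are assumptions for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def bootstrap_ci(
    outcomes: Sequence[int], n_resamples: int = 1000, alpha: float = 0.05, seed: int = 0
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    means = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples))
    k = round(alpha / 2 * n_resamples)  # resamples trimmed from each tail
    return means[k], means[n_resamples - k - 1]


def slice_report(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    slices: Sequence[Hashable],
) -> Dict[Hashable, dict]:
    """Per-slice accuracy with sample size and a 95% bootstrap interval."""
    outcomes_by_slice: Dict[Hashable, List[int]] = defaultdict(list)
    for t, p, s in zip(y_true, y_pred, slices):
        outcomes_by_slice[s].append(int(t == p))
    return {
        s: {"n": len(o), "accuracy": sum(o) / len(o), "ci95": bootstrap_ci(o)}
        for s, o in outcomes_by_slice.items()
    }


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 1, 1, 1, 0, 0]
    region = ["EU"] * 4 + ["US"] * 4
    for name, row in slice_report(y_true, y_pred, region).items():
        print(name, row)
```

With only four rows per slice the intervals come out very wide, which is exactly the signal a slice report should surface.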
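A data contract pins down the columns, types, nullability, and value ranges a producer promises to deliver, and the consumer checks each batch against it before training or scoring. The following is a hypothetical, dependency-free sketch; `ColumnContract` and `validate_rows` are illustrative names, and real deployments typically rely on a dedicated schema or contract tool.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ColumnContract:
    """One column's agreed schema (hypothetical minimal stand-in for a contract tool)."""

    name: str
    dtype: type
    nullable: bool = False
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def validate_rows(rows: List[Dict[str, Any]], contract: List[ColumnContract]) -> List[str]:
    """Return human-readable violations; an empty list means the batch conforms."""
    violations: List[str] = []
    for i, row in enumerate(rows):
        for col in contract:
            if col.name not in row:
                violations.append(f"row {i}: missing column {col.name!r}")
                continue
            value = row[col.name]
            if value is None:
                if not col.nullable:
                    violations.append(f"row {i}: {col.name!r} is null")
                continue
            if not isinstance(value, col.dtype):
                violations.append(
                    f"row {i}: {col.name!r} expected {col.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
                continue
            if col.min_value is not None and value < col.min_value:
                violations.append(f"row {i}: {col.name!r}={value} below {col.min_value}")
            if col.max_value is not None and value > col.max_value:
                violations.append(f"row {i}: {col.name!r}={value} above {col.max_value}")
    return violations


if __name__ == "__main__":
    contract = [
        ColumnContract("age", int, min_value=0, max_value=120),
        ColumnContract("country", str),
    ]
    batch = [{"age": 34, "country": "DE"}, {"age": -1, "country": None}, {"country": "FR"}]
    for problem in validate_rows(batch, contract):
        print(problem)
```

Note that `isinstance(True, int)` is true in Python, so a stricter contract would reject booleans explicitly for integer columns.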
Modern emphasis (2026): feature stores, automated retraining, drift monitoring (Evidently), train/serve parity, and agentic ML loops (plan -> execute -> evaluate -> improve). Core tools: LightGBM, CatBoost, scikit-learn, PyTorch, Polars (lazy queries run on the streaming engine for larger-than-RAM datasets), and lakeFS for data versioning.
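Automated retraining (CT) is usually gated by monitoring (CM) signals. This sketch combines per-feature drift scores such as PSI with a drop in a live quality metric, and declines to retrain without enough fresh labels. `RetrainPolicy`, `should_retrain`, and every threshold value are assumptions to adapt, not defaults from Evidently or any other tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RetrainPolicy:
    max_psi: float = 0.25          # per-feature drift threshold (assumed)
    max_metric_drop: float = 0.03  # tolerated drop vs. the reference metric (assumed)
    min_new_labels: int = 1_000    # fresh labeled rows needed to retrain (assumed)


def should_retrain(
    policy: RetrainPolicy,
    feature_psi: Dict[str, float],
    live_metric: float,
    reference_metric: float,
    new_labels: int,
) -> Tuple[bool, List[str]]:
    """Decide whether a continuous-training run should start, and say why."""
    reasons: List[str] = []
    drifted = sorted(f for f, psi in feature_psi.items() if psi > policy.max_psi)
    if drifted:
        reasons.append("drift in " + ", ".join(drifted))
    if reference_metric - live_metric > policy.max_metric_drop:
        reasons.append(f"metric fell from {reference_metric:.3f} to {live_metric:.3f}")
    if reasons and new_labels < policy.min_new_labels:
        # A trigger fired, but retraining on too little fresh data is not worth it yet.
        return False, reasons + [f"only {new_labels} new labels"]
    return bool(reasons), reasons


if __name__ == "__main__":
    decision, why = should_retrain(
        RetrainPolicy(), {"age": 0.05, "income": 0.4}, live_metric=0.80,
        reference_metric=0.85, new_labels=5_000,
    )
    print(decision, why)
```

Returning the reasons alongside the decision keeps each automated retrain auditable in the pipeline logs.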
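The agentic loop (plan -> execute -> evaluate -> improve) can be reduced to its control flow. In this sketch the "plan" step is a plain neighbour-proposal function, a greedy hill-climbing stand-in for what would be an LLM agent or tuner in practice; `improve_loop`, the toy objective, and the stop-when-gains-vanish rule are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


def improve_loop(
    evaluate: Callable[[Config], float],
    propose: Callable[[Config], List[Config]],
    initial: Config,
    max_rounds: int = 10,
    min_gain: float = 1e-3,
) -> Tuple[Config, float, List[Tuple[Config, float]]]:
    """Plan -> execute -> evaluate -> improve, stopping when gains vanish."""
    best, best_score = initial, evaluate(initial)
    history = [(best, best_score)]
    for _ in range(max_rounds):
        candidates = propose(best)                       # plan
        scored = [(c, evaluate(c)) for c in candidates]  # execute + evaluate
        history.extend(scored)
        if not scored:
            break
        cand, cand_score = max(scored, key=lambda cs: cs[1])
        if cand_score - best_score < min_gain:           # no meaningful gain: stop
            break
        best, best_score = cand, cand_score              # improve
    return best, best_score, history


def toy_score(cfg: Config) -> float:
    # Toy objective peaking at depth 6, standing in for a validation score.
    return -((cfg["depth"] - 6) ** 2)


def neighbours(cfg: Config) -> List[Config]:
    return [{"depth": cfg["depth"] - 1}, {"depth": cfg["depth"] + 1}]


if __name__ == "__main__":
    best, best_score, _ = improve_loop(toy_score, neighbours, {"depth": 2})
    print(best, best_score)  # {'depth': 6} 0
```

Keeping the full `history` makes each iteration reviewable, which matters more once the proposal step is an autonomous agent.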