scientific-data-preprocessing
Scientific Data Preprocessing Skill
⚠️ CRITICAL: USER'S HARD-WON EXPERIENCE - MANDATORY CONSULTATION ⚠️
This skill encapsulates painful lessons learned from real preprocessing disasters (88.9% error rate documented). ALWAYS use this skill for planning, reflection, and validation when ANY data preprocessing is involved.
Why this skill is mandatory:
- Based on actual project failures (V1.0, V2.0 case studies)
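- Prevents data leakage (for example, fitting scalers or imputers on the full dataset before the train/test split), which inflates validation scores and causes production failures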
- Catches semantic errors AI agents commonly make
- Saves weeks of debugging and model retraining
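The data-leakage point can be made concrete with a minimal sketch. It assumes a single numeric feature held in a plain Python list, and the helper names (`split_rows`, `fit_standardizer`, `apply_standardizer`) are hypothetical, not part of this skill. The point it illustrates is ordering: split first, learn scaling statistics from the training split only, then reuse those statistics on the test split.

```python
import random
import statistics


def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_standardizer(values):
    """Learn mean and population standard deviation from the given values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # avoid division by zero for constant features
    return mean, std


def apply_standardizer(values, params):
    """Scale values with previously learned (mean, std); never re-fits."""
    mean, std = params
    return [(v - mean) / std for v in values]


data = [float(x) for x in range(100)]

# Leak-free order: split FIRST, then fit on the training split only.
train, test = split_rows(data)
params = fit_standardizer(train)
train_scaled = apply_standardizer(train, params)
test_scaled = apply_standardizer(test, params)  # test reuses train statistics

# Leaky order (avoid): statistics computed on all rows, so the test split
# influences the scaling that both training and evaluation see.
leaky_params = fit_standardizer(data)
```

Libraries such as scikit-learn package the same fit-on-train, transform-both pattern (for example, `StandardScaler` inside a `Pipeline`), which is usually preferable to hand-rolled helpers like these.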
When to invoke (DO NOT SKIP):
- ✅ Before starting ANY data preprocessing task
- ✅ During preprocessing for reflection and validation
- ✅ After preprocessing for comprehensive audit
- ✅ When reviewing AI-generated preprocessing code
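For the post-preprocessing audit and code-review steps, a minimal automated check might look like the sketch below. It assumes each split is a list of numeric rows paired with a list of record ids; `audit_split` is a hypothetical helper, and its three checks (id overlap between splits, leftover missing values, id/row count mismatch) are illustrative examples rather than this skill's documented checklist.

```python
import math


def audit_split(train_ids, test_ids, train_rows, test_rows):
    """Return human-readable problems found after preprocessing (empty if none)."""
    problems = []

    # Leakage: the same record must not appear in both splits.
    overlap = set(train_ids) & set(test_ids)
    if overlap:
        problems.append(f"{len(overlap)} id(s) appear in both train and test")

    # Completeness: no None or NaN should survive preprocessing.
    for name, rows in (("train", train_rows), ("test", test_rows)):
        bad = sum(
            1
            for row in rows
            for v in row
            if v is None or (isinstance(v, float) and math.isnan(v))
        )
        if bad:
            problems.append(f"{bad} missing value(s) remain in {name}")

    # Bookkeeping: every row needs exactly one id.
    if len(train_ids) != len(train_rows) or len(test_ids) != len(test_rows):
        problems.append("id count does not match row count")

    return problems


clean = audit_split([1, 2], [3], [[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6]])
leaky = audit_split([1, 2], [2], [[0.1, None], [0.3, 0.4]], [[0.5, float("nan")]])
print(clean)  # []
print(leaky)
```

Returning a list of problems instead of raising on the first failure lets a reviewer see every issue in one pass.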