Dataset Curator
The Dataset Curator skill guides you through the critical process of preparing high-quality training data for machine learning models. Data quality is one of the biggest drivers of model performance, yet it often receives too little investment. This skill helps you systematically clean, validate, augment, and maintain datasets that lead to better models.
From initial collection to ongoing maintenance, this skill covers deduplication, label quality assessment, bias detection, augmentation strategies, and version control. It applies practices from production ML systems so that your datasets are not only clean but also tailored to your learning objectives.
Whether you are building a classifier, fine-tuning an LLM, or training a custom model, this skill ensures your data foundation is solid.
Core Workflows
Workflow 1: Assess Dataset Quality
- Profile the dataset:
  - Size and dimensionality
  - Label distribution and balance
  - Missing value patterns
  - Feature statistics
- Identify quality issues:
  - Duplicates (exact and near-duplicate)
  - Mislabeled examples
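The profiling steps above can be sketched in plain Python. This is a minimal, standard-library-only illustration, assuming the dataset fits in memory as a list of dicts, that missing values are encoded as `None` or NaN, and that the label lives under a key named `label`. `profile_dataset` and the sample rows are hypothetical; in practice pandas (`DataFrame.describe`, `isna`, `value_counts`) covers the same ground.

```python
import math
import statistics
from collections import Counter


def _is_missing(value):
    # Assumption: None and float NaN mean "missing"; adjust for other sentinels.
    return value is None or (isinstance(value, float) and math.isnan(value))


def _is_numeric(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool) and not _is_missing(value)


def profile_dataset(records, label_key="label"):
    """Hypothetical helper: size, label balance, missingness, numeric stats."""
    feature_keys = sorted({key for r in records for key in r} - {label_key})

    # Label distribution plus a simple imbalance ratio (largest class / smallest class).
    label_counts = Counter(r[label_key] for r in records if not _is_missing(r.get(label_key)))
    imbalance = max(label_counts.values()) / min(label_counts.values()) if label_counts else None

    # A key absent from a record counts as missing, same as an explicit None.
    missing = {key: sum(_is_missing(r.get(key)) for r in records) for key in feature_keys}

    # Summary statistics for every feature that has numeric values.
    numeric_stats = {}
    for key in feature_keys:
        values = [r[key] for r in records if _is_numeric(r.get(key))]
        if values:
            numeric_stats[key] = {
                "count": len(values),
                "mean": statistics.mean(values),
                "std": statistics.pstdev(values),
                "min": min(values),
                "max": max(values),
            }

    return {
        "n_rows": len(records),
        "n_features": len(feature_keys),
        "label_counts": dict(label_counts),
        "imbalance_ratio": imbalance,
        "missing": missing,
        "numeric_stats": numeric_stats,
    }


rows = [
    {"text_len": 12, "source": "web", "label": "spam"},
    {"text_len": 40, "source": None, "label": "ham"},
    {"text_len": 35, "source": "email", "label": "ham"},
    {"text_len": None, "source": "web", "label": "ham"},
]
report = profile_dataset(rows)
print(report["label_counts"], report["imbalance_ratio"], report["missing"])
```

An imbalance ratio well above 1 or a feature with many missing values is a signal to look closer, not an automatic verdict; what counts as "too imbalanced" depends on your task.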
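For the duplicate check, one common approach is to hash normalized text to catch exact duplicates and to compare sets of word shingles with Jaccard similarity to catch near-duplicates. The sketch below assumes a text dataset; the helper names, the 3-word shingle size, and the 0.7 threshold are illustrative assumptions, not recommended defaults.

```python
import hashlib
import re
from itertools import combinations


def normalize(text):
    # Lowercase and collapse whitespace so formatting differences do not hide duplicates.
    return re.sub(r"\s+", " ", text.lower()).strip()


def exact_duplicate_groups(texts):
    """Group indices whose normalized text has the same SHA-256 digest."""
    groups = {}
    for i, text in enumerate(texts):
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        groups.setdefault(digest, []).append(i)
    return [idxs for idxs in groups.values() if len(idxs) > 1]


def shingles(text, n=3):
    # Set of overlapping n-word tuples; texts shorter than n become one shingle.
    words = normalize(text).split()
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    # |A ∩ B| / |A ∪ B|; two empty sets are treated as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(texts, threshold=0.7, n=3):
    """Return (i, j, similarity) for every pair at or above the threshold."""
    sets = [shingles(t, n) for t in texts]
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        score = jaccard(sets[i], sets[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


texts = [
    "The quick brown fox jumps over the lazy dog",
    "the quick  brown fox jumps over the lazy dog",
    "The quick brown fox jumps over the lazy dog today",
    "Completely unrelated sentence about datasets",
]
print(exact_duplicate_groups(texts))
print(near_duplicate_pairs(texts))
```

Exact duplicates also show up as near-duplicate pairs with a score of 1.0, so you may want to collapse them first. The pairwise loop is O(n²); for large corpora, MinHash with locality-sensitive hashing is the usual way to find candidate pairs before computing similarity.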
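Mislabeled examples are harder to detect automatically. One simple heuristic, swapped in here for illustration rather than taken from this skill, is to flag examples whose label disagrees with most of their nearest neighbors in feature space. The sketch assumes numeric feature vectors (for example, embeddings) and brute-force Euclidean distance; `knn_label_disagreement`, `k=3`, and the 0.5 agreement cutoff are hypothetical choices.

```python
import math
from collections import Counter


def knn_label_disagreement(features, labels, k=3, min_agreement=0.5):
    """Return (index, given_label, neighbor_majority, agreement) for suspicious examples."""
    flagged = []
    for i, x in enumerate(features):
        # Brute-force Euclidean distance to every other example: O(n^2) overall.
        dists = sorted((math.dist(x, features[j]), j) for j in range(len(features)) if j != i)
        neighbor_labels = [labels[j] for _, j in dists[:k]]
        if not neighbor_labels:
            continue
        # Fraction of the k nearest neighbors that share this example's label.
        agreement = neighbor_labels.count(labels[i]) / len(neighbor_labels)
        if agreement < min_agreement:
            majority = Counter(neighbor_labels).most_common(1)[0][0]
            flagged.append((i, labels[i], majority, agreement))
    return flagged


features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
# Index 3 sits inside the "a" cluster but carries label "b".
labels = ["a", "a", "a", "b", "b", "b", "b", "b"]
print(knn_label_disagreement(features, labels))
```

Treat flagged examples as candidates for human review rather than relabeling them automatically, since a point can disagree with its neighbors because it is a genuine edge case. Model-based methods such as confident learning, which rely on out-of-sample predicted probabilities, are a common next step when this heuristic is too coarse.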