Model Evaluator
The Model Evaluator skill helps you rigorously assess and compare machine learning model performance across multiple dimensions. It guides you through selecting appropriate metrics, designing evaluation protocols, avoiding common statistical pitfalls, and making data-driven decisions about model selection.
Proper model evaluation goes beyond accuracy scores. This skill covers evaluation across the full spectrum: predictive performance, computational efficiency, robustness, fairness, calibration, and production readiness. It helps you answer not just "which model is best?" but "which model is best for my specific use case and constraints?"
Whether you are comparing LLMs, classifiers, or custom models, this skill ensures your evaluation methodology is sound and your conclusions are reliable.
Core Workflows
Workflow 1: Design Evaluation Protocol
- Define evaluation objectives:
  - Primary goal (accuracy, speed, cost, etc.)
  - Secondary constraints
  - Failure modes to test
  - Real-world conditions to simulate
- Select appropriate metrics:

  Task Type | Primary Metrics | Secondary Metrics
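
The objectives in Workflow 1 can be written down as a small data structure before any model is run, so that the selection rule is fixed in advance. The following is a minimal sketch, not part of the skill itself: the class `EvaluationProtocol`, its field names, and the methods `satisfies_constraints` and `select` are hypothetical, and the metric names and numbers in the usage section are made-up placeholders rather than measured results. It assumes that secondary constraints are upper bounds and that the primary goal is a metric to maximize.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EvaluationProtocol:
    """Hypothetical container for the evaluation objectives of Workflow 1."""

    primary_goal: str  # name of the metric to maximize, e.g. "accuracy"
    # Assumption: each secondary constraint is an upper bound (metric -> limit).
    secondary_constraints: Dict[str, float] = field(default_factory=dict)
    failure_modes: List[str] = field(default_factory=list)
    real_world_conditions: List[str] = field(default_factory=list)

    def satisfies_constraints(self, results: Dict[str, float]) -> bool:
        """Return True if every constrained metric is present and within its limit."""
        return all(
            name in results and results[name] <= limit
            for name, limit in self.secondary_constraints.items()
        )

    def select(self, candidates: Dict[str, Dict[str, float]]) -> Optional[str]:
        """Pick the candidate with the best primary metric among those meeting all constraints."""
        eligible = {
            name: results
            for name, results in candidates.items()
            if self.primary_goal in results and self.satisfies_constraints(results)
        }
        if not eligible:
            return None  # no candidate meets the constraints
        return max(eligible, key=lambda name: eligible[name][self.primary_goal])


# Illustrative usage with placeholder numbers (not real measurements).
protocol = EvaluationProtocol(
    primary_goal="accuracy",
    secondary_constraints={"latency_ms": 100.0},
    failure_modes=["empty input", "out-of-distribution input"],
    real_world_conditions=["noisy labels"],
)
candidates = {
    "model_a": {"accuracy": 0.91, "latency_ms": 250.0},
    "model_b": {"accuracy": 0.88, "latency_ms": 40.0},
}
best = protocol.select(candidates)
print(best)
```

Here `model_a` has the higher placeholder accuracy but exceeds the latency limit, so the sketch selects `model_b`. This is the point of fixing constraints up front: the question becomes which model is best for the stated use case, not which has the highest headline score.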