sc-evaluate
LLM Evaluation Skill
Run an LLM pipeline evaluation against gold-standard datasets using oracle LLM-as-judge scoring. The skill measures output quality across weighted dimensions, identifies weak pipeline steps, and suggests prompt improvements.
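To make "weighted dimensions" and "weak steps" concrete, here is a minimal Python sketch of how per-dimension judge scores might be combined into one score per step and how the weakest step could be picked out. The dimension names, weights, score scale (0 to 1), and function names are all assumptions for illustration; this document does not specify how the skill computes scores.

```python
# Hypothetical dimensions and weights; the skill's real ones are not documented here.
WEIGHTS = {"accuracy": 0.5, "completeness": 0.3, "clarity": 0.2}


def weighted_score(scores: dict, weights: dict = WEIGHTS) -> float:
    """Combine per-dimension scores into a single weighted average."""
    total_weight = sum(weights.values())
    return sum(scores[dim] * w for dim, w in weights.items()) / total_weight


def weakest_step(step_scores: dict) -> int:
    """Return the step number with the lowest weighted score."""
    return min(step_scores, key=lambda step: weighted_score(step_scores[step]))


# Made-up judge scores for three pipeline steps, on an assumed 0-1 scale.
judge_scores = {
    1: {"accuracy": 0.9, "completeness": 0.8, "clarity": 0.7},
    2: {"accuracy": 0.4, "completeness": 0.6, "clarity": 0.9},
    3: {"accuracy": 0.8, "completeness": 0.9, "clarity": 0.8},
}

for step, scores in judge_scores.items():
    print(f"step {step}: {weighted_score(scores):.2f}")
print("weakest step:", weakest_step(judge_scores))
```

Normalizing by the sum of the weights keeps the result on the same scale as the inputs even if the weights do not add up to 1.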
Quick Start
# Full evaluation (all test cases, all steps)
/sc:evaluate
# Quick spot check
/sc:evaluate --cases=case_1,case_2 --steps=1,2,3
# Re-evaluate existing results without re-running pipeline
/sc:evaluate --skip-pipeline
# Generate outputs only (no evaluation)
/sc:evaluate --skip-eval
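The flags above can be read as two filters (`--cases`, `--steps`) and two switches (`--skip-pipeline`, `--skip-eval`). The sketch below shows one way such options could be parsed in Python with `argparse`. It is an illustration only: the `parse_args` helper, the `prog` name, and the assumption that steps are integers are hypothetical and do not describe how the skill itself handles its arguments.

```python
import argparse


def parse_args(argv):
    """Parse evaluate-style flags; a hypothetical helper, not the skill's code."""
    parser = argparse.ArgumentParser(prog="sc-evaluate")
    # Comma-separated test case names, e.g. --cases=case_1,case_2
    parser.add_argument("--cases", type=lambda s: s.split(","), default=None)
    # Comma-separated step numbers (assumed to be integers), e.g. --steps=1,2,3
    parser.add_argument(
        "--steps",
        type=lambda s: [int(x) for x in s.split(",")],
        default=None,
    )
    # Re-evaluate existing results without re-running the pipeline
    parser.add_argument("--skip-pipeline", action="store_true")
    # Generate outputs only, with no evaluation
    parser.add_argument("--skip-eval", action="store_true")
    return parser.parse_args(argv)


args = parse_args(["--cases=case_1,case_2", "--steps=1,2,3"])
print(args.cases, args.steps, args.skip_pipeline, args.skip_eval)
```

In this sketch, leaving `--cases` or `--steps` unset yields `None`, which a caller could treat as "all test cases" or "all steps" to match the full-evaluation default.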