skill-forge-benchmark
Skill Benchmarking & Performance Tracking
Measure and compare skill performance across iterations with statistical rigor using multiple trials, variance analysis, and trend tracking.
Process
Step 1: Define Benchmark Configuration
Accept configuration as:
- Existing eval set: Path to evals/evals.json (from /skill-forge eval)
- Benchmark config: Custom config with trial count and thresholds
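
The two accepted input forms could be normalized into a single configuration object along these lines. This is a minimal sketch, not the skill's actual schema: the `BenchmarkConfig` class, the `resolve_config` helper, and the field names `trials`, `min_pass_rate`, and `max_stddev` (with their defaults) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Union


@dataclass
class BenchmarkConfig:
    # Hypothetical fields and defaults; the real config keys may differ.
    eval_set: Path = Path("evals/evals.json")
    trials: int = 5              # number of repeated runs per eval
    min_pass_rate: float = 0.8   # threshold on the mean pass rate
    max_stddev: float = 0.1      # threshold on run-to-run variation


def resolve_config(source: Union[str, Path, dict]) -> BenchmarkConfig:
    """Accept either a path to an existing eval set or a custom config mapping."""
    if isinstance(source, dict):
        defaults = BenchmarkConfig()
        config = BenchmarkConfig(
            eval_set=Path(source.get("eval_set", defaults.eval_set)),
            trials=int(source.get("trials", defaults.trials)),
            min_pass_rate=float(source.get("min_pass_rate", defaults.min_pass_rate)),
            max_stddev=float(source.get("max_stddev", defaults.max_stddev)),
        )
    else:
        # A bare path means: use the existing eval set with default settings.
        config = BenchmarkConfig(eval_set=Path(source))

    # Variance cannot be estimated from a single trial.
    if config.trials < 2:
        raise ValueError("trials must be at least 2 for variance analysis")
    if not 0.0 <= config.min_pass_rate <= 1.0:
        raise ValueError("min_pass_rate must be between 0 and 1")
    return config
```

For example, `resolve_config("evals/evals.json")` yields the defaults for an existing eval set, while `resolve_config({"trials": 10, "min_pass_rate": 0.9})` overrides only the listed values.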