skill-forge-benchmark
Skill Benchmarking & Performance Tracking
Measure and compare skill performance across iterations with statistical rigor using multiple trials, variance analysis, and trend tracking.
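As a rough illustration of what multiple trials and variance analysis can mean in practice, the sketch below summarizes the scores from repeated trials of one iteration and compares two iterations. The function names `summarize_trials` and `mean_delta`, and the idea that each trial yields a single numeric score, are assumptions made for this example and are not taken from the skill itself.

```python
import statistics


def summarize_trials(scores):
    """Summarize per-trial scores for one skill iteration.

    Assumption: each trial produces a single numeric score.
    """
    mean = statistics.mean(scores)
    # Sample standard deviation needs at least two trials.
    stdev = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return {"n": len(scores), "mean": mean, "stdev": stdev}


def mean_delta(previous, current):
    """Difference in mean score between two iteration summaries."""
    return current["mean"] - previous["mean"]
```

A delta that is small relative to the standard deviation of either iteration is hard to distinguish from noise, which is the reason for running more than one trial.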
Process
Step 1: Define Benchmark Configuration
Accept configuration as:
- Existing eval set: Path to evals/evals.json (from /skill-forge eval)
- Benchmark config: Custom config with trial count and thresholds
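One possible way to tell the two input forms apart is sketched below. It assumes, for illustration only, that a benchmark config is a JSON object containing an `eval_set_path` key and that any other JSON file is a plain eval set; the function name `resolve_benchmark_input` and the returned dictionary shape are hypothetical.

```python
import json
from pathlib import Path


def resolve_benchmark_input(path):
    """Classify a JSON file as a benchmark config or a plain eval set.

    Assumption: a benchmark config is a JSON object with an
    "eval_set_path" key; anything else is treated as an eval set.
    """
    path = Path(path)
    data = json.loads(path.read_text(encoding="utf-8"))
    if isinstance(data, dict) and "eval_set_path" in data:
        return {"kind": "benchmark_config", "config": data}
    # Plain eval set: wrap its path so later steps see one shape.
    return {"kind": "eval_set", "config": {"eval_set_path": str(path)}}
```

Normalizing both forms into a single shape keeps the later steps from having to branch on how the benchmark was configured.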
Benchmark config schema:
{
"skill_name": "my-skill",
"skill_path": "./my-skill",
"eval_set_path": "./evals/evals.json",