agent-comparison

Agent Comparison Skill

Compare agent variants through controlled A/B benchmarks. The skill runs identical tasks on both agents, grades output quality with domain-specific checklists, and reports the total session token cost of reaching a working solution. It is exclusively for comparing agent variants: use agent-evaluation for single-agent assessment and skill-eval for skill testing.
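The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the `RunResult` record, the `quality_score`, `summarize`, and `compare` helpers, the checklist item names, and the sample numbers are all hypothetical. It also assumes that "token cost to a working solution" means the total tokens of sessions that ended with a working solution, which the skill's reference files may define differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RunResult:
    """One agent variant's result on one task (hypothetical record shape)."""
    checklist: dict[str, bool]  # checklist item -> whether it passed
    session_tokens: int         # total tokens spent across the whole session
    solved: bool                # whether the session ended with a working solution


def quality_score(result: RunResult) -> float:
    """Fraction of checklist items passed (0.0 for an empty checklist)."""
    if not result.checklist:
        return 0.0
    return sum(result.checklist.values()) / len(result.checklist)


def summarize(runs: list[RunResult]) -> dict[str, float]:
    """Aggregate one variant's runs over the shared task set."""
    if not runs:
        raise ValueError("at least one run is required")
    solved = [r for r in runs if r.solved]
    return {
        "mean_quality": sum(quality_score(r) for r in runs) / len(runs),
        "solve_rate": len(solved) / len(runs),
        # Assumption: only sessions that reached a working solution count
        # toward token cost; infinity marks a variant that never solved a task.
        "mean_tokens_to_solution": (
            sum(r.session_tokens for r in solved) / len(solved)
            if solved
            else float("inf")
        ),
    }


def compare(runs_a: list[RunResult], runs_b: list[RunResult]) -> dict[str, dict[str, float]]:
    """Summarize both variants; both must have run the same number of tasks."""
    if len(runs_a) != len(runs_b):
        raise ValueError("both variants must run the identical task set")
    return {"A": summarize(runs_a), "B": summarize(runs_b)}


# Illustrative, made-up results for two tasks per variant.
runs_a = [
    RunResult({"compiles": True, "tests_pass": True}, session_tokens=12000, solved=True),
    RunResult({"compiles": True, "tests_pass": False}, session_tokens=20000, solved=False),
]
runs_b = [
    RunResult({"compiles": True, "tests_pass": True}, session_tokens=9000, solved=True),
    RunResult({"compiles": True, "tests_pass": True}, session_tokens=15000, solved=True),
]
report = compare(runs_a, runs_b)
for variant, summary in report.items():
    print(variant, summary)
```

Keeping quality and token cost as separate fields, rather than folding them into one score, lets a reader see when a variant is cheaper only because it solves fewer tasks.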

Reference Loading Table

| Signal | Load These Files | Why |
| --- | --- | --- |
| tasks related to this reference | benchmark-tasks.md | Loads detailed guidance from benchmark-tasks.md. |
| example-driven tasks, errors | examples-and-errors.md | Loads detailed guidance from examples-and-errors.md. |
| tasks related to this reference | grading-rubric.md | Loads detailed guidance from grading-rubric.md. |
| tasks related to this reference | methodology.md | Loads detailed guidance from methodology.md. |
| tasks related to this reference | optimization-guide.md | Loads detailed guidance from optimization-guide.md. |
| tasks related to this reference | optimize-phase.md | Loads detailed guidance from optimize-phase.md. |
| tasks related to this reference | report-template.md | Loads detailed guidance from report-template.md. |

Instructions

See references/examples-and-errors.md for error handling. See references/optimize-phase.md for Phase 5 OPTIMIZE full procedure. See references/methodology.md for December 2024 benchmark data.

Installs: 7
GitHub Stars: 366
First Seen: Mar 23, 2026