Agent Comparison Skill

Compare agent variants through controlled A/B benchmarks. The skill runs identical tasks on both agents, grades output quality with domain-specific checklists, and reports the total session token cost of reaching a working solution. It is intended only for comparing agent variants: use agent-evaluation for single-agent assessment and skill-eval for skill testing.
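The comparison loop described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `RunResult`, `grade`, `compare`, the two stub agents, and the checklist items are all hypothetical names invented for this example, and the quality score is assumed to be the fraction of checklist items passed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical result type: the skill's real data model is not shown here.
@dataclass
class RunResult:
    output: str          # what the agent produced for the task
    session_tokens: int  # total tokens spent in the session to reach a working solution

Agent = Callable[[str], RunResult]
Checklist = Dict[str, Callable[[str], bool]]

def grade(output: str, checklist: Checklist) -> float:
    """Return the fraction of checklist items the output satisfies (assumed scoring rule)."""
    if not checklist:
        return 0.0
    passed = sum(1 for check in checklist.values() if check(output))
    return passed / len(checklist)

def compare(tasks: List[str], agent_a: Agent, agent_b: Agent,
            checklist: Checklist) -> List[dict]:
    """Run every task on both agents and record quality and token cost per agent."""
    report = []
    for task in tasks:
        row = {"task": task}
        for label, agent in (("A", agent_a), ("B", agent_b)):
            result = agent(task)  # identical task for both variants
            row[label] = {
                "quality": grade(result.output, checklist),
                "tokens": result.session_tokens,
            }
        report.append(row)
    return report

# Stub agents standing in for real agent variants (illustrative only).
def terse_agent(task: str) -> RunResult:
    return RunResult(output=f"def solve(): pass  # {task}", session_tokens=1200)

def thorough_agent(task: str) -> RunResult:
    body = f'def solve():\n    """{task}"""\n    return None\n'
    return RunResult(output=body, session_tokens=3100)

# A toy domain-specific checklist (illustrative only).
checklist: Checklist = {
    "defines a function": lambda out: "def " in out,
    "has a docstring": lambda out: '"""' in out,
}

tasks = ["parse a CSV file", "reverse a linked list"]
report = compare(tasks, terse_agent, thorough_agent, checklist)

for row in report:
    print(row["task"], row["A"], row["B"])
```

Keeping quality and token cost as separate columns, rather than folding them into one score, leaves the trade-off visible: in this toy run the second variant passes more checklist items but spends more tokens to get there.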

Reference Loading Table

| Signal | Load These Files | Why |
| --- | --- | --- |
| tasks related to this reference | `benchmark-tasks.md` | Loads detailed guidance from `benchmark-tasks.md`. |
| example-driven tasks, errors | `examples-and-errors.md` | Loads detailed guidance from `examples-and-errors.md`. |
| tasks related to this reference | `grading-rubric.md` | Loads detailed guidance from `grading-rubric.md`. |
| tasks related to this reference | `methodology.md` | Loads detailed guidance from `methodology.md`. |
| tasks related to this reference | `optimization-guide.md` | Loads detailed guidance from `optimization-guide.md`. |
| tasks related to this reference | `optimize-phase.md` | Loads detailed guidance from `optimize-phase.md`. |
| tasks related to this reference | `report-template.md` | Loads detailed guidance from `report-template.md`. |

Instructions

See references/examples-and-errors.md for error handling, references/optimize-phase.md for the full Phase 5 (OPTIMIZE) procedure, and references/methodology.md for the December 2024 benchmark data.
