agent-comparison
Agent Comparison Skill
Compare agent variants through controlled A/B benchmarks. The skill runs identical tasks on both agents, grades output quality with domain-specific checklists, and reports the total session token cost required to reach a working solution. This skill is exclusively for comparing agent variants: use agent-evaluation for single-agent assessment and skill-eval for skill testing.
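The comparison loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the skill's actual implementation. The `SessionResult` record, the pass/fail checklist format, the per-turn token counts, and the `compare` function are all hypothetical. Quality is scored as the fraction of checklist items passed, cost is the sum of tokens over the whole session, and only tasks that both variants ran are compared, so the two sides always see identical work.

```python
from dataclasses import dataclass, field


@dataclass
class SessionResult:
    """One agent's run on one benchmark task (hypothetical record format)."""
    agent: str
    task_id: str
    checklist: dict[str, bool]  # checklist item -> passed?
    turn_tokens: list[int] = field(default_factory=list)  # tokens used per turn

    @property
    def quality(self) -> float:
        # Fraction of checklist items passed; 0.0 for an empty checklist.
        if not self.checklist:
            return 0.0
        return sum(self.checklist.values()) / len(self.checklist)

    @property
    def session_tokens(self) -> int:
        # Total tokens across the whole session, not just the final answer.
        return sum(self.turn_tokens)


def compare(a: list[SessionResult], b: list[SessionResult]) -> dict[str, dict[str, object]]:
    """Summarize variants A and B over the tasks both of them ran."""
    a_by_task = {r.task_id: r for r in a}
    b_by_task = {r.task_id: r for r in b}
    # Only identical tasks are comparable; tasks run by one side are ignored.
    shared = sorted(a_by_task.keys() & b_by_task.keys())
    if not shared:
        raise ValueError("no identical tasks to compare")

    summary: dict[str, dict[str, object]] = {}
    for label, by_task in (("A", a_by_task), ("B", b_by_task)):
        runs = [by_task[t] for t in shared]
        summary[label] = {
            "agent": runs[0].agent,
            "mean_quality": sum(r.quality for r in runs) / len(runs),
            "total_tokens": sum(r.session_tokens for r in runs),
        }
    return summary
```

For example, if a baseline passes one of two checklist items using 2,000 session tokens and a variant passes both using 1,300, the summary reports the variant as both higher quality and cheaper on that task. Keying the summary by "A" and "B" rather than by agent name avoids a collision when both variants share a name.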
Reference Loading Table
| Signal | Load These Files | Why |
|---|---|---|
| tasks related to this reference | benchmark-tasks.md | Loads detailed guidance from benchmark-tasks.md. |
| example-driven tasks, errors | examples-and-errors.md | Loads detailed guidance from examples-and-errors.md. |
| tasks related to this reference | grading-rubric.md | Loads detailed guidance from grading-rubric.md. |
| tasks related to this reference | methodology.md | Loads detailed guidance from methodology.md. |
| tasks related to this reference | optimization-guide.md | Loads detailed guidance from optimization-guide.md. |
| tasks related to this reference | optimize-phase.md | Loads detailed guidance from optimize-phase.md. |
| tasks related to this reference | report-template.md | Loads detailed guidance from report-template.md. |
Instructions
See references/examples-and-errors.md for error handling. See references/optimize-phase.md for the full Phase 5 OPTIMIZE procedure. See references/methodology.md for December 2024 benchmark data.
More from notque/claude-code-toolkit
- generate-claudemd: Generate project-specific CLAUDE.md from repo analysis.
- fish-shell-config: Fish shell configuration and PATH management.
- pptx-generator: PPTX presentation generation with visual QA: slides, pitch decks.
- codebase-overview: Systematic codebase exploration and architecture mapping.
- image-to-video: FFmpeg-based video creation from image and audio.
- data-analysis: Decision-first data analysis with statistical rigor gates.