recursive-benchmark

Use this skill to run a fair benchmark that compares the same coding agent with recursive-mode off and recursive-mode on.

The benchmark should use the same project requirements, the same model family, and the same success criteria for both arms. The recursive-on arm should additionally start from a bootstrapped recursive-mode scaffold, a run-local 00-requirements.md, and a command-style prompt. That prompt should explicitly tell the agent to use the bootstrapped recursive control-plane files as the recursive-mode skill before it implements the run end to end.

For fairness, the recursive-off arm should receive controller guidance only in the chat prompt. Do not place benchmark requirements, rubric, or prompt documents inside its repo or benchmark workspace.
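The fairness rules above can be sketched as a small pre-flight check. This is only an illustration, not part of the skill: ArmConfig, fairness_problems, the CONTROLLER_DOC_HINTS file-name fragments, and the example paths are all hypothetical names chosen for the sketch. Only 00-requirements.md comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArmConfig:
    """Hypothetical description of one benchmark arm."""
    name: str
    requirements: str            # project requirements shared by both arms
    model_family: str            # must match across arms
    success_criteria: tuple      # must match across arms
    workspace_files: frozenset = frozenset()  # relative paths in the arm's workspace


# Assumed file-name fragments for controller-side documents (illustrative only).
CONTROLLER_DOC_HINTS = ("rubric", "benchmark-prompt", "benchmark-requirements")


def fairness_problems(off: ArmConfig, on: ArmConfig) -> list:
    """Return human-readable fairness violations; an empty list means none found."""
    problems = []

    # Both arms must share requirements, model family, and success criteria.
    for attr in ("requirements", "model_family", "success_criteria"):
        if getattr(off, attr) != getattr(on, attr):
            problems.append(f"{attr} differs between arms")

    # The recursive-off arm gets controller guidance only in the chat prompt,
    # so no controller documents should sit in its workspace.
    leaked = sorted(
        path for path in off.workspace_files
        if any(hint in path.lower() for hint in CONTROLLER_DOC_HINTS)
    )
    if leaked:
        problems.append(f"recursive-off workspace contains controller documents: {leaked}")

    # The recursive-on arm starts with a run-local 00-requirements.md.
    if not any(path.endswith("00-requirements.md") for path in on.workspace_files):
        problems.append("recursive-on workspace is missing a run-local 00-requirements.md")

    return problems
```

Running a check like this before launching either arm keeps a misconfigured pair from producing results that cannot be compared.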

The currently maintained benchmark runners are Codex CLI, Kimi CLI, and OpenCode CLI. For OpenCode, prefer provider-qualified model ids and use the dedicated CLI binary rather than the desktop wrapper.
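As a minimal sketch, a controller script could reject unqualified model ids before starting an OpenCode run. It assumes a provider-qualified id has the form provider/model, which this document does not spell out. The function name and the placeholder ids are hypothetical.

```python
def is_provider_qualified(model_id: str) -> bool:
    """Return True if model_id looks like 'provider/model' (assumed format)."""
    provider, sep, model = model_id.partition("/")
    # Require the separator plus non-empty text on both sides of it.
    return bool(sep) and bool(provider.strip()) and bool(model.strip())


# Placeholder ids for illustration only; substitute real ids from your provider.
print(is_provider_qualified("example-provider/example-model"))
print(is_provider_qualified("example-model"))
```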

Primary Use Case

Use recursive-benchmark when the user wants to: