The Agent Skills Directory

[COMMAND_EXECUTION]: The skill orchestrates benchmark cycles by invoking a local Python script (run-recursive-benchmark.py) to manage repository setup and execution of coding agent CLIs.
[UNVERIFIABLE_DEPENDENCIES_AND_REMOTE_CODE_EXECUTION]: Mentions the use of standard development stacks including React, Vite, and Rust/Trunk, which involve downloading dependencies from official registries during project initialization.
[INDIRECT_PROMPT_INJECTION]: Ingests project requirements from external files (00-requirements.md) as input for the benchmarked agent; while inherent to the benchmarking process, this creates an ingestion point for instructions that could influence agent behavior.
[DATA_EXPOSURE_AND_EXFILTRATION]: Captures and aggregates execution metadata, raw logs, and screenshot artifacts to generate local comparative reports on agent performance.

recursive-benchmark