recursive-benchmark
Pass
Audited by Gen Agent Trust Hub on Apr 19, 2026
Risk Level: SAFE
Full Analysis
- [COMMAND_EXECUTION]: The skill orchestrates benchmark cycles by invoking a local Python script (`run-recursive-benchmark.py`) to manage repository setup and execution of coding agent CLIs.
- [UNVERIFIABLE_DEPENDENCIES_AND_REMOTE_CODE_EXECUTION]: Mentions the use of standard development stacks including React, Vite, and Rust/Trunk, which involve downloading dependencies from official registries during project initialization.
- [INDIRECT_PROMPT_INJECTION]: Ingests project requirements from external files (`00-requirements.md`) as input for the benchmarked agent; while inherent to the benchmarking process, this creates an ingestion point for instructions that could influence agent behavior.
- [DATA_EXPOSURE_AND_EXFILTRATION]: Captures and aggregates execution metadata, raw logs, and screenshot artifacts to generate local comparative reports on agent performance.
Audit Metadata