recursive-benchmark

Pass

Audited by Gen Agent Trust Hub on Apr 19, 2026

Risk Level: SAFE
Full Analysis
  • [COMMAND_EXECUTION]: The skill orchestrates benchmark cycles by invoking a local Python script (run-recursive-benchmark.py) to manage repository setup and execution of coding agent CLIs.
  • [UNVERIFIABLE_DEPENDENCIES_AND_REMOTE_CODE_EXECUTION]: Mentions the use of standard development stacks including React, Vite, and Rust/Trunk, which involve downloading dependencies from official registries during project initialization.
  • [INDIRECT_PROMPT_INJECTION]: Ingests project requirements from external files (00-requirements.md) as input for the benchmarked agent; while inherent to the benchmarking process, this creates an ingestion point for instructions that could influence agent behavior.
  • [DATA_EXPOSURE_AND_EXFILTRATION]: Captures and aggregates execution metadata, raw logs, and screenshot artifacts to generate local comparative reports on agent performance.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 19, 2026, 04:30 AM
Security Audit — agent-trust-hub — recursive-benchmark