agent-evals

Pass

Audited by Gen Agent Trust Hub on Jun 13, 2026

Risk Level: SAFE (categories flagged: PROMPT_INJECTION, COMMAND_EXECUTION)
Full Analysis
  • [PROMPT_INJECTION]: The skill processes untrusted workspace data (code, rule files, prompts) which serves as a surface for indirect prompt injection.
      • Ingestion points: The Step 1 Workspace Audit (SKILL.md) and the autonomous improvement loop (autonomous-improve-loop.mjs) read local workspace file contents into the agent's context.
      • Boundary markers: SKILL.md requires human review checkpoints and the use of held-out evaluation sets to mitigate the risk of malicious instructions influencing the optimization loop.
      • Capability inventory: Scaffolds like autonomous-improve-loop.mjs execute shell commands via spawnSync and apply code patches with git apply. The level-3-sandbox-harness.py template executes arbitrary shell commands within Docker containers via subprocess.run.
      • Sanitization: The autonomous-improve-loop.mjs template includes a redact function to filter credentials and secrets from data sent to external optimizers.
  • [COMMAND_EXECUTION]: Testing infrastructure and provided templates utilize shell commands for environment setup and isolated execution.
      • evals/phase2-grader.py uses subprocess.check_call and os.execv to bootstrap a Python virtual environment and install dependencies during testing.
      • references/templates/level-3-sandbox-harness.py uses subprocess.run to coordinate agent tasks within Docker containers, providing a sandbox for high-risk operations.
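To illustrate the Sanitization finding above, here is a minimal sketch of what a redaction pass over outbound text might look like. It is written in Python for brevity, not taken from autonomous-improve-loop.mjs (whose redact function is not reproduced in this audit and may work differently). The name `redact`, the `REDACTION_RULES` list, the `[REDACTED]` placeholder, and all patterns are assumptions chosen for illustration; a real filter would likely need a broader rule set.

```python
import re

# Hypothetical rule set (an assumption, not the skill's actual rules).
# Each entry is (compiled pattern, replacement); rules run in order.
REDACTION_RULES = [
    # PEM-encoded private key blocks, including multi-line bodies.
    (
        re.compile(
            r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
            re.DOTALL,
        ),
        "[REDACTED PRIVATE KEY]",
    ),
    # HTTP bearer tokens, e.g. "Authorization: Bearer abc.def.ghi".
    (
        re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/=-]+"),
        "Bearer [REDACTED]",
    ),
    # key=value or key: value pairs whose key name suggests a credential.
    # Only spaces/tabs are allowed around the separator so a match never
    # spills onto the next line.
    (
        re.compile(
            r"(?i)\b(\w*(?:api[_-]?key|secret|token|password)\w*)([ \t]*[:=][ \t]*)\S+"
        ),
        r"\1\2[REDACTED]",
    ),
]


def redact(text: str) -> str:
    """Return a copy of `text` with likely secrets replaced by placeholders."""
    for pattern, replacement in REDACTION_RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    sample = (
        "OPENAI_API_KEY=sk-test-123\n"
        "Authorization: Bearer abc.def.ghi\n"
        "plain line"
    )
    print(redact(sample))
```

Pattern-based redaction of this kind is best-effort: it reduces accidental credential leakage to an external optimizer but does not replace the human review checkpoints noted under Boundary markers.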
Audit Metadata
Risk Level: SAFE
Analyzed: Jun 13, 2026, 01:50 AM