agent-evals
Fail
Audited by Snyk on Jun 13, 2026
Risk Level: CRITICAL
Full Analysis
CRITICAL E006: Malicious code pattern detected in skill scripts.
- Malicious code pattern detected (high risk: 0.90). The repository includes an autonomous improvement controller that intentionally sends local trace and candidate data, along with the contents of allowlisted files, to an external LLM endpoint (https://api.openai.com). It can also apply model-generated unified diffs via git apply and run shell commands. Because redaction is only partial, this creates clear data-exfiltration and remote-change/execution attack surfaces: the code can be abused to leak secrets or to enable unauthorized code modifications.
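To illustrate why partial redaction still leaves an exfiltration surface, here is a minimal Python sketch. It is not the audited script's code: the `SECRET_PATTERNS` list, the `redact` helper, and the sample trace are all hypothetical, chosen only to show how a pattern-based filter can pass secrets it does not recognize.

```python
import re

# Hypothetical redaction rules; the audited controller's real rules are not
# shown in this report.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),  # API keys with an "sk-" prefix
    re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS-style access key IDs
]


def redact(text: str) -> str:
    """Replace every match of a known secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


# Hypothetical trace content that would be included in an outbound request.
trace = "\n".join([
    "OPENAI_API_KEY=sk-" + "a" * 24,  # matches the first pattern
    "DB_PASSWORD=hunter2-prod",       # matches no pattern, so it passes through
])

outbound = redact(trace)
print(outbound)
```

In this sketch the API key is replaced, but the database password is sent unchanged because no rule describes it. Any redaction scheme built on a fixed list of patterns has the same limitation.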
MEDIUM W012: Unverifiable external dependency detected (runtime URL that controls agent).
- Potentially malicious external URL detected (high risk: 0.90). The autonomous improvement controller (references/templates/autonomous-improve-loop.mjs) sends a runtime POST request to the OpenAI Responses endpoint at https://api.openai.com/v1/responses to generate unified-diff patches from failed trace/eval candidates. Model output fetched from that URL therefore directly controls the prompts and patches the script may apply, and the controller cannot function without it.
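The following Python sketch shows one way a reviewer might inspect which files a model-generated unified diff would touch before anything is applied. It is an assumption-laden illustration, not part of the audited script: `touched_paths`, `outside_allowlist`, the sample diff, and the `prompts/` prefix are all hypothetical.

```python
def touched_paths(diff_text: str) -> list[str]:
    """Collect file paths named in the '---' / '+++' headers of a unified diff.

    This is deliberately conservative: a hunk line that happens to start with
    '--- ' or '+++ ' is also reported, which over-reports rather than misses.
    """
    paths: list[str] = []
    for line in diff_text.splitlines():
        if line.startswith(("--- ", "+++ ")):
            target = line[4:].split("\t", 1)[0].strip()
            if target == "/dev/null":  # file creation or deletion marker
                continue
            if target.startswith(("a/", "b/")):  # strip git's path prefixes
                target = target[2:]
            if target not in paths:
                paths.append(target)
    return paths


def outside_allowlist(diff_text: str, allowed_prefixes: list[str]) -> list[str]:
    """Return touched paths that fall outside the allowed prefixes."""
    allowed = tuple(allowed_prefixes)
    return [
        p for p in touched_paths(diff_text)
        if ".." in p.split("/") or not p.startswith(allowed)
    ]


# Hypothetical model output: one expected change and one unexpected change.
sample_diff = """\
--- a/prompts/system.md
+++ b/prompts/system.md
@@ -1 +1 @@
-old
+new
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -1 +1 @@
-old
+new
"""

violations = outside_allowlist(sample_diff, ["prompts/"])
print(violations)
```

Here the diff edits a CI workflow file in addition to the prompt file, and the check reports it. A check like this narrows what external model output can change, but it does not remove the dependency on that output.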
Issues (2)
CRITICAL E006: Malicious code pattern detected in skill scripts.
MEDIUM W012: Unverifiable external dependency detected (runtime URL that controls agent).
Audit Metadata