context-eval

Warn

Audited by Gen Agent Trust Hub on Apr 2, 2026

Risk Level: MEDIUMCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The script optimize_harness.py uses subprocess.run() to execute a command string provided via the --llm-cmd argument. This allows for the execution of arbitrary shell commands within the host environment, depending on the arguments supplied by the agent or user.
  • [PROMPT_INJECTION]: The skill is designed to process external and potentially untrusted data known as "context harnesses." These inputs are interpolated directly into prompts (e.g., in optimize_harness.py and when spawning subagents like those defined in comparator.md and analyzer.md) without sufficient boundary markers or sanitization, creating a surface for indirect prompt injection.
  • Ingestion points: optimize_harness.py (reads harness content), generate_viewer.py (reads and embeds workspace files).
  • Boundary markers: Absent; harness content and eval definitions are concatenated directly into the LLM prompts.
  • Capability inventory: subprocess.run in optimize_harness.py, file system read/write operations across multiple scripts, and local web server execution in generate_viewer.py.
  • Sanitization: No escaping or validation is performed on the content of the harness files before they are processed or sent to LLMs.
  • [COMMAND_EXECUTION]: The generate_viewer.py script starts a local web server (http.server.HTTPServer) and uses webbrowser.open() to launch the system browser. This increases the local attack surface and could be leveraged if an attacker can influence the files being served or the URL being opened.
  • [DATA_EXPOSURE]: generate_viewer.py reads files from the workspace, converts images to base64, and embeds them into a self-contained HTML file. If pointed at a directory containing sensitive information, it would expose that data in the generated viewer.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Apr 2, 2026, 03:00 PM