skill-system-debug

Fail

Audited by Gen Agent Trust Hub on Apr 11, 2026

Risk Level: HIGHCOMMAND_EXECUTIONDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The bisect operation in scripts/debug_tool.py accepts a test command string which is split into arguments and passed directly to git bisect run. This utility executes the provided command repeatedly on different commits, providing a vector for arbitrary command execution within the agent's environment.
  • [DATA_EXFILTRATION]: The trace and compare operations in scripts/debug_tool.py provide unrestricted filesystem access. The trace function parses imports and reads the contents of referenced files, while the compare function recursively reads and diffs directory contents. Neither function implements path validation or sandboxing, allowing an attacker to read sensitive files like SSH keys, AWS credentials, or environment variables by providing absolute or relative paths outside the project scope.
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection because it ingests untrusted data from the codebase and presents it to the agent without isolation.
  • Ingestion points: _run_grep_search, _run_behavior_search, and _parse_imports in scripts/debug_tool.py read content from files across the project directory.
  • Boundary markers: No delimiters or 'ignore' instructions are used to wrap the gathered content when returning it to the agent.
  • Capability inventory: The skill possesses high-privilege capabilities including command execution and broad filesystem reading.
  • Sanitization: Content read from the filesystem is directly incorporated into the JSON output (e.g., in the 'hypotheses' or 'key_diffs' fields) without any escaping or filtering of potentially malicious instructions embedded in the code or data files.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Apr 11, 2026, 12:55 AM