result-to-claim

Pass

Audited by Gen Agent Trust Hub on May 12, 2026

Risk Level: SAFECOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill instructs the agent to execute commands on remote servers using SSH (e.g., ssh server "tail -100 /path/to/training.log") to collect experiment logs. This provides a mechanism for remote execution and potential lateral movement if SSH access is configured in the environment.
  • [COMMAND_EXECUTION]: The workflow relies on executing local scripts such as tools/research_wiki.py and tools/save_trace.sh. These scripts are external to the skill definition, and their behavior cannot be verified within this scope.
  • [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection. It ingests data from external sources (W&B API, remote logs, local project files) and interpolates it directly into a prompt for the mcp__codex__codex tool. An attacker with control over experiment logs or metadata could embed malicious instructions to manipulate the model's judgment.
  • Ingestion points: Data is gathered from EXPERIMENT_LOG.md, EXPERIMENT_TRACKER.md, docs/research_contract.md, remote logs via SSH, and the Weights & Biases API.
  • Boundary markers: Minimal use of headers (e.g., Results:, Baselines:) without explicit instructions to ignore embedded commands or clear delimiters.
  • Capability inventory: Access to Bash(*), Write, Edit, and MCP tools (mcp__codex__codex).
  • Sanitization: The skill does not perform any validation, escaping, or filtering of the ingested data before prompt construction.
Audit Metadata
Risk Level
SAFE
Analyzed
May 12, 2026, 12:59 AM