experiment-loop

Warn

Audited by Gen Agent Trust Hub on Jun 22, 2026

Risk Level: MEDIUM
Findings: COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill executes arbitrary shell commands provided in the measurement_cmd field of experiment definitions.
    • Evidence: The documentation describes measurement_cmd as a "Shell command that produces JSON with the metric value" and provides examples such as npm run bench:api and python eval/run_evals.py.
  • [REMOTE_CODE_EXECUTION]: The skill automates a process where one agent (spark) modifies the codebase and a subsequent step executes the modified code via benchmarks or tests.
    • Evidence: The "5-Step Loop" explicitly includes a "MODIFY" phase followed by a "TEST" phase in which the measurements are run.
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection if the configuration file (thoughts/EXPERIMENTS.md) is poisoned with malicious commands.
    • Ingestion points: The skill reads experiment definitions from thoughts/EXPERIMENTS.md or the user's task description.
    • Boundary markers: Absent. There are no delimiters or warnings to prevent the agent from executing malicious instructions embedded in the measurement_cmd field.
    • Capability inventory: The skill can execute shell commands, modify files within a defined scope, and perform git operations (git stash).
    • Sanitization: Absent. The skill does not validate or sanitize the shell commands before execution.
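
To illustrate the kind of check the Sanitization finding says is missing, here is a minimal sketch of validating a measurement_cmd string before it is run. It is not part of the audited skill: the function name validate_measurement_cmd, the ALLOWED_COMMANDS list, and the choice of rejected characters are all assumptions made for illustration. The two allowlisted commands are the examples quoted in the evidence above.

```python
import shlex

# Hypothetical allowlist (an assumption, not from the skill): the exact
# argument vectors an experiment definition is permitted to run.
ALLOWED_COMMANDS = [
    ["npm", "run", "bench:api"],
    ["python", "eval/run_evals.py"],
]

# Characters that are only meaningful to a shell. Rejecting them means a
# command cannot chain, pipe, redirect, or substitute other commands.
SHELL_METACHARS = set(";&|`$<>\n\\(){}*?!~#")


def validate_measurement_cmd(cmd: str) -> list[str]:
    """Return the parsed argv if cmd is allowlisted, else raise ValueError."""
    if any(ch in SHELL_METACHARS for ch in cmd):
        raise ValueError("measurement_cmd contains shell metacharacters")
    try:
        argv = shlex.split(cmd)
    except ValueError as exc:  # e.g. an unbalanced quote
        raise ValueError(f"measurement_cmd could not be parsed: {exc}") from exc
    if argv not in ALLOWED_COMMANDS:
        raise ValueError("measurement_cmd is not on the allowlist")
    return argv
```

The returned list is meant to be passed to something like subprocess.run(argv, shell=False), so the string never reaches a shell. A check of this kind only constrains the command string; it does not address the REMOTE_CODE_EXECUTION finding, because an allowlisted benchmark still executes whatever code the MODIFY phase produced.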
Audit Metadata
Risk Level: MEDIUM
Analyzed: Jun 22, 2026, 07:46 AM