The Agent Skills Directory

[COMMAND_EXECUTION]: The skill executes arbitrary shell commands provided in the measurement_cmd field of experiment definitions.
Evidence: The documentation describes measurement_cmd as a "Shell command that produces JSON with the metric value" and provides examples like npm run bench:api and python eval/run_evals.py.
[REMOTE_CODE_EXECUTION]: The skill automates a process where one agent (spark) modifies the codebase and a subsequent step executes the modified code via benchmarks or tests.
Evidence: The "5-Step Loop" explicitly includes a "MODIFY" phase followed by a "TEST" phase where measurements are run.
[PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection if the configuration file (thoughts/EXPERIMENTS.md) is poisoned with malicious commands.
Ingestion points: The skill reads experiment definitions from thoughts/EXPERIMENTS.md or the user's task description.
Boundary markers: Absent. There are no delimiters or warnings to prevent the agent from executing malicious instructions embedded in the measurement_cmd field.
Capability inventory: The skill has the ability to execute shell commands, modify files within a defined scope, and perform git operations (git stash).
Sanitization: Absent. The skill does not validate or sanitize the shell commands before execution.

experiment-loop