iterate-ml-experiment
Pass
Audited by Gen Agent Trust Hub on May 28, 2026
Risk Level: SAFE
COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, DATA_EXFILTRATION, EXTERNAL_DOWNLOADS, PROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The skill executes shell commands using 'pixi run' to launch Python experiments and scratch scripts. It also uses the 'gh' command-line tool to post comments to GitHub issues.
- [REMOTE_CODE_EXECUTION]: The skill generates Python scripts within the 'experiments/' and 'scratch/' directories based on project context, sibling skill outputs, and external sources. These scripts are subsequently executed via shell calls. This behavior is mitigated by explicit user approval gates (G-DESIGN for the design note and G-RUN for the execution).
- [DATA_EXFILTRATION]: The skill can send experiment outcomes to external platforms via GitHub issue comments. This outbound communication requires structured user consent via an 'AskUserQuestion' tool.
- [EXTERNAL_DOWNLOADS]: Instructions specify fetching information from user-provided URLs (scientific articles) and GitHub resources. This data is used to synthesize experiment proposals but is subject to user review before any resulting code is generated or run.
- [PROMPT_INJECTION]: The instructions use imperative language and 'Stop conditions' to enforce a strict operational sequence (e.g., 'non-negotiable first emit', 'STOP and write the design note'). These are internal workflow constraints designed to maintain project integrity and do not represent attempts to bypass underlying model safety filters.
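The approval-gate mitigation noted under REMOTE_CODE_EXECUTION could look roughly like the sketch below. Only the gate names G-DESIGN and G-RUN come from the audit text; `ApprovalLog`, `run_gated`, and the injected `runner` callable are hypothetical names used for illustration, and nothing here is taken from the skill's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence

# Gate names come from the audit text; everything else is a hypothetical sketch.
REQUIRED_GATES = ("G-DESIGN", "G-RUN")


@dataclass
class ApprovalLog:
    """Records which gates the user has explicitly approved."""

    approved: set[str] = field(default_factory=set)

    def approve(self, gate: str) -> None:
        if gate not in REQUIRED_GATES:
            raise ValueError(f"unknown gate: {gate}")
        self.approved.add(gate)

    def missing(self) -> list[str]:
        # Keep the required order so the earliest missing gate is listed first.
        return [g for g in REQUIRED_GATES if g not in self.approved]


def run_gated(
    command: Sequence[str],
    log: ApprovalLog,
    runner: Callable[[Sequence[str]], int],
) -> int:
    """Call runner(command) only when every required gate has been approved."""
    missing = log.missing()
    if missing:
        raise PermissionError(f"missing approvals: {', '.join(missing)}")
    return runner(command)
```

In practice the `runner` might wrap `subprocess.run` around a command such as `["pixi", "run", "python", "experiments/example.py"]` (the script path is made up). Injecting the runner keeps the gate check testable without launching any process.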
Audit Metadata