eval-creator-ci

Warn

Audited by Gen Agent Trust Hub on Apr 24, 2026

Risk Level: MEDIUMCOMMAND_EXECUTIONREMOTE_CODE_EXECUTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill defines a command-check verification method that executes shell commands and evaluates their exit codes or output. This provides a direct path for the agent to execute arbitrary code within the CI environment if the eval case definitions are modified or maliciously crafted.
  • [REMOTE_CODE_EXECUTION]: In its 'Create Evals' mode, the skill ingests data from upstream artifacts (specifically from learning-aggregator-ci) to dynamically generate new eval case files. These files can contain command-check definitions, effectively allowing untrusted or compromised upstream data to dictate shell commands that will be executed in subsequent CI runs.
  • [DATA_EXFILTRATION]: The execution logic for command-check involves reporting 'actual' command output to PR comments and check annotations. This behavior could be exploited to exfiltrate sensitive environment variables or file contents if an eval case is configured to output such data (e.g., executing printenv or cat .env).
  • [INDIRECT_PROMPT_INJECTION]: The skill has a significant attack surface for indirect injection:
  • Ingestion points: Reads promotion candidates from external GitHub Action artifacts and gap reports.
  • Boundary markers: None are specified to prevent the agent from interpreting instructions embedded within the pattern data as commands.
  • Capability inventory: Full shell access via command-check and the ability to commit new files to the repository.
  • Sanitization: The instructions do not define validation or sanitization steps for the 'verification methods' or 'commands' extracted from external data sources.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Apr 24, 2026, 03:31 AM