self-play
Pass
Audited by Gen Agent Trust Hub on Apr 28, 2026
Risk Level: SAFE
COMMAND_EXECUTION, CREDENTIALS_UNSAFE, PROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The skill instructs the agent to execute multiple local shell commands, including project builds (`make build`), evaluation scripts via the `uv` tool (`uv run python -m scripts.run_probes`), and a project CLI (`./panda schema`). These actions are standard for the skill's purpose as a development automation tool.
- [CREDENTIALS_UNSAFE]: The instructions specify that `OPENROUTER_API_KEY` must be set in the environment for the LLM-based evaluation logic. No hardcoded credentials or exfiltration patterns were detected; environment-based secret management is a standard practice.
- [PROMPT_INJECTION]: The skill exhibits a surface for indirect prompt injection by ingesting untrusted data from probe definitions and test results to drive autonomous code modification and repository commits.
  - Ingestion points: Reads data from `tests/eval/cases/probes.yaml` and result files in `tests/eval/probes/results/`.
  - Boundary markers: No explicit boundary markers or 'ignore' instructions are provided for the ingested data.
  - Capability inventory: The agent can write to files (`modules/clickhouse/examples.yaml`, `runbooks/*.md`), execute shell commands (`make build`), and perform version control operations (`git commit`, `git revert`).
  - Sanitization: There is no documented validation or sanitization of the probe content before it is used to determine and write repository fixes.
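
The boundary-marker and sanitization gaps noted above could be narrowed with a small pre-processing step before probe content reaches the model. The following is a minimal sketch only, not part of the audited skill: the field names (`id`, `question`, `expected`), the length cap, and the marker format are all assumptions, since the actual schema of `probes.yaml` is not shown in this report.

```python
import re
import secrets

# Hypothetical allowlist: the real probes.yaml schema is not shown in the audit.
ALLOWED_FIELDS = ("id", "question", "expected")
MAX_FIELD_LEN = 2000  # assumed cap, chosen for illustration

# Control characters other than tab, newline, and carriage return.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def sanitize_probe(probe: dict) -> dict:
    """Keep only allowlisted string fields, strip control chars, cap length."""
    clean = {}
    for key in ALLOWED_FIELDS:
        value = probe.get(key)
        if isinstance(value, str):
            clean[key] = _CONTROL_CHARS.sub("", value)[:MAX_FIELD_LEN]
    return clean


def wrap_untrusted(text: str) -> str:
    """Fence untrusted text between markers that carry a random tag."""
    tag = secrets.token_hex(8)
    # Regenerate in the (very unlikely) case the text already contains the tag.
    while tag in text:
        tag = secrets.token_hex(8)
    return f"<<UNTRUSTED {tag}>>\n{text}\n<<END UNTRUSTED {tag}>>"
```

Because the tag is generated per call, ingested content cannot predict the closing marker and forge an early end to the fenced region. The surrounding prompt would still need to state that text between the markers is data, not instructions; this sketch reduces the injection surface but does not eliminate it.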
Audit Metadata