spec-tests
Pass
Audited by Gen Agent Trust Hub on Mar 29, 2026
Risk Level: SAFE
Findings: COMMAND_EXECUTION, DATA_EXFILTRATION, PROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The skill's primary functionality relies on executing external CLI tools through shell commands.
  - Evidence: The Python scripts `run_tests_claude.py`, `run_tests_opencode.py`, and `run_tests_codex.py` use `subprocess.run()` to call `claude`, `opencode`, and `codex` respectively.
  - Evidence: The PowerShell script `Invoke-SpecTests.ps1` executes the `copilot` CLI tool.
  - Risk: While the tools themselves are well known, executing commands from scripts provided within a skill carries inherent risk if the environment is not restricted.
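For illustration, the invocation pattern described above can be sketched as follows. This is a minimal, hypothetical simplification — the actual runner scripts, their flags, and their prompt handling are not reproduced here:

```python
import subprocess

def run_judge(argv: list[str], timeout: int = 300) -> str:
    """Invoke an LLM CLI (e.g. `claude` or `codex`) as a subprocess and
    return its stdout.

    Simplified sketch of what the run_tests_*.py scripts do: the command
    is passed as an argv list (no shell interpretation) and output is
    captured for later verdict parsing.
    """
    result = subprocess.run(
        argv,
        capture_output=True,  # collect stdout for the verdict parser
        text=True,            # decode bytes to str
        timeout=timeout,      # bound how long the CLI may run
        check=True,           # raise if the CLI exits non-zero
    )
    return result.stdout

if __name__ == "__main__":
    # Using `echo` as a stand-in CLI so the sketch runs anywhere.
    print(run_judge(["echo", "judge prompt here"]))
```

Even with the argv-list form avoiding shell injection, the risk noted above remains: whatever binary name the skill's scripts choose is executed with the user's privileges.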
- [DATA_EXFILTRATION]: The skill reads local files and sends their contents to external LLM services for evaluation.
  - Evidence: The `judge_prompt.md` instructs the LLM to use a "Read tool" to access target files specified in the test frontmatter. The Python and PowerShell runners facilitate this by passing file paths or content to the LLM CLIs.
  - Risk: If a specification file points to sensitive local files (e.g., `.env`, SSH keys), those files will be read and their contents processed by external LLM providers.
- [PROMPT_INJECTION]: The skill uses complex system prompts and user-provided specifications to steer LLM behavior.
  - Evidence: `judge_prompt.md` contains strict behavioral directives such as "CRITICAL: You must respond with ONLY a JSON object" and "No other text... Do not wrap the JSON in backticks."
  - Risk: Maliciously crafted specifications could attempt to override these instructions to extract system prompts or bypass evaluation logic.
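The data flow behind these two findings can be sketched as follows. The frontmatter key name (`targets`), the naive parsing, and the prompt layout are all assumptions for illustration, not the skill's actual implementation:

```python
from pathlib import Path

# A hypothetical spec file: YAML frontmatter naming target files,
# followed by the judge's assertion block.
SPEC = """\
---
targets:
  - ./app/auth.py
---
BEGIN_ASSERTION
Passwords are hashed before storage.
END_ASSERTION
"""

def parse_targets(spec_text: str) -> list[str]:
    """Naive scan for `targets:` entries in the frontmatter.
    Illustrative only; real runners would use a YAML parser."""
    targets, in_targets = [], False
    for line in spec_text.splitlines():
        if line.strip() == "targets:":
            in_targets = True
        elif in_targets and line.strip().startswith("- "):
            targets.append(line.strip()[2:])
        elif in_targets:
            break
    return targets

def build_judge_prompt(spec_text: str,
                       read=lambda p: Path(p).read_text()) -> str:
    """Inline each target file's content into the judge prompt verbatim.
    Nothing prevents a target from being `.env` or an SSH key, and the
    content is not escaped before the external LLM sees it."""
    parts = [spec_text]
    for t in parse_targets(spec_text):
        parts.append(f"--- file: {t} ---\n{read(t)}")
    return "\n".join(parts)
```

Because the target file's bytes land inside the same prompt as the judge's directives, any secret in the file leaves the machine, and any instruction in the file competes with the judge's own.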
- [INDIRECT_PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection from the files it evaluates.
  - Ingestion points: The target files specified in the YAML frontmatter of spec files (e.g., in `specs/tests/authentication.md`) are read and analyzed by the LLM judge.
  - Boundary markers: The `judge_prompt.md` uses `BEGIN_ASSERTION` and `END_ASSERTION` blocks to delimit test conditions, but the target file content itself is not strictly isolated or sanitized.
  - Capability inventory: The runners can read any local file accessible to the user and perform network operations via the LLM CLI tools.
  - Sanitization: There is no evidence of sanitization or escaping of the target file content before it is processed by the LLM judge.
  - Risk: A target file could contain instructions designed to trick the LLM judge into reporting a "PASS" verdict regardless of actual implementation quality.
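The missing sanitization step could take a shape like the following. This is an illustrative mitigation, not part of the audited skill — the marker names come from `judge_prompt.md`; the fencing scheme is an assumption:

```python
import secrets

def wrap_untrusted(content: str) -> str:
    """Fence untrusted file content so it cannot close or forge the
    judge's BEGIN_ASSERTION/END_ASSERTION blocks.

    Illustrative mitigation only; the audited skill performs no such step.
    """
    # Neutralize the judge's own markers if they appear in the file.
    for marker in ("BEGIN_ASSERTION", "END_ASSERTION"):
        content = content.replace(marker, marker.replace("_", r"\_"))
    # Unpredictable, single-use delimiter the file cannot guess in advance.
    tag = secrets.token_hex(8)
    return f"<untrusted-{tag}>\n{content}\n</untrusted-{tag}>"
```

A random fence plus marker escaping raises the bar for a target file that tries to smuggle a fake "PASS" verdict into the judge's context, though it does not eliminate the risk — instruction-following models can still be steered by plain prose inside the fence.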
Audit Metadata