skills/ianphil/my-skills/spec-tests/Gen Agent Trust Hub

spec-tests

Pass

Audited by Gen Agent Trust Hub on Mar 29, 2026

Risk Level: SAFECOMMAND_EXECUTIONDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill's primary functionality relies on executing external CLI tools through shell commands.
  • Evidence: The Python scripts run_tests_claude.py, run_tests_opencode.py, and run_tests_codex.py use subprocess.run() to call claude, opencode, and codex respectively.
  • Evidence: The PowerShell script Invoke-SpecTests.ps1 executes the copilot CLI tool.
  • Risk: While the tools themselves are well-known, executing arbitrary commands from scripts provided within a skill carries inherent risks if the environment is not restricted.
  • [DATA_EXFILTRATION]: The skill reads local files and sends their contents to external LLM services for evaluation.
  • Evidence: The judge_prompt.md instructs the LLM to use a "Read tool" to access target files specified in the test frontmatter. The Python and PowerShell runners facilitate this by passing file paths or content to the LLM CLIs.
  • Risk: If a specification file points to sensitive local files (e.g., .env, SSH keys), those files will be read and their content processed by external LLM providers.
  • [PROMPT_INJECTION]: The skill uses complex system prompts and user-provided specifications to steer LLM behavior.
  • Evidence: judge_prompt.md contains strict behavioral directives such as "CRITICAL: You must respond with ONLY a JSON object" and "No other text... Do not wrap the JSON in backticks."
  • Risk: Maliciously crafted specifications could attempt to override these instructions to extract system prompts or bypass evaluation logic.
  • [INDIRECT_PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection from the files it evaluates.
  • Ingestion points: The target files specified in the YAML frontmatter of spec files (e.g., in specs/tests/authentication.md) are read and analyzed by the LLM judge.
  • Boundary markers: The judge_prompt.md uses BEGIN_ASSERTION and END_ASSERTION blocks to delimit test conditions, but the target file content itself is not strictly isolated or sanitized.
  • Capability inventory: The runners possess the capability to read any local file accessible to the user and perform network operations via the LLM CLI tools.
  • Sanitization: There is no evidence of sanitization or escaping of the target file content before it is processed by the LLM judge.
  • Risk: A target file could contain instructions designed to trick the LLM judge into reporting a "PASS" verdict regardless of actual implementation quality.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 29, 2026, 03:29 AM