exhaustive-real-world-scenario-qa

Pass

Audited by Gen Agent Trust Hub on May 15, 2026

Risk Level: SAFECOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [PROMPT_INJECTION]: The skill processes untrusted external data from Notion specs and live web page content to generate and execute test cases, which introduces an indirect prompt injection surface.
  • Ingestion points: playwright-cli snapshot (ingests live DOM content) and mcp__claude_ai_Notion__notion-fetch (ingests external documentation).
  • Boundary markers: No specific boundary markers or 'ignore embedded instructions' warnings are used when interpolating external content into the test generation process.
  • Capability inventory: The skill utilizes playwright-cli (browser interaction, JavaScript evaluation), git, and gh (GitHub CLI).
  • Sanitization: No explicit sanitization or filtering of the ingested external content is mentioned before processing.
  • [COMMAND_EXECUTION]: The skill makes extensive use of CLI tools including git, gh, and playwright-cli for project analysis and browser automation. These are used within the documented scope of the skill.
  • [PROMPT_INJECTION]: The skill uses detailed role-play instructions to guide sub-agents (Spec Tester, Design Checker, Bug Hunter). These instructions are functional for task delegation and do not attempt to bypass platform safety guidelines.
  • [SAFE]: The skill correctly manages credentials by retrieving them from a memory tool (mcp__serena__read_memory) or prompting the user, rather than hardcoding sensitive information.
Audit Metadata
Risk Level
SAFE
Analyzed
May 15, 2026, 03:29 AM
Security Audit — agent-trust-hub — exhaustive-real-world-scenario-qa