evaluator

Pass

Audited by Gen Agent Trust Hub on Mar 29, 2026

Risk Level: SAFE
COMMAND_EXECUTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill utilizes local shell commands to manage development servers and perform browser-based verification.
      • Evidence: Executes startup commands such as npm run dev, yarn dev, pnpm dev, and bun dev to ensure the target application is reachable.
      • Evidence: Uses curl to perform health checks on the local service at http://localhost:3000.
      • Evidence: Employs playwright-cli to perform automated browser actions including open, snapshot, screenshot, click, fill, and console logging.
  • [PROMPT_INJECTION]: The skill processes a 'Sprint Contract' that defines the test plan and interaction parameters. Because the contract's contents steer the agent's actions, it is a surface for indirect prompt injection.
      • Ingestion points: Data is ingested from the 'Sprint Contract' block described in SKILL.md to dictate navigation and form interactions.
      • Boundary markers: Absent; the skill does not use specific delimiters to isolate the ingested contract instructions from the agent's core protocol.
      • Capability inventory: High-privilege browser automation (clicks, form fills, navigation) and shell command execution.
      • Sanitization: Input values from the contract are used directly for automation without explicit validation or escaping within the skill's logic.
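
The health check noted under COMMAND_EXECUTION can be illustrated with a short polling loop. This is only a sketch in Python, not the skill's actual code (the audit says the skill uses curl): the function names `is_reachable` and `wait_for_server`, the retry count, and the delay are all assumptions made for illustration. Only the URL http://localhost:3000 comes from the audit.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if the server answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as exc:
        # The server responded, but with an error status (4xx or 5xx).
        return exc.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


def wait_for_server(
    url: str,
    attempts: int = 10,           # hypothetical default
    delay: float = 1.0,           # hypothetical default, in seconds
    probe: Callable[[str], bool] = is_reachable,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the URL until the probe succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    # URL taken from the audit evidence; everything else is illustrative.
    print(wait_for_server("http://localhost:3000", attempts=3))
```

Passing `probe` and `sleep` as parameters keeps the loop testable without a running server or real waiting.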
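
The missing boundary markers and sanitization noted under PROMPT_INJECTION could be addressed along the following lines. This is a minimal sketch under stated assumptions, not something the audited skill contains: the delimiter format, the host allowlist, the length limit, and the names `wrap_untrusted`, `is_allowed_url`, and `clean_fill_value` are all hypothetical.

```python
import secrets
from urllib.parse import urlparse

# Assumption: the evaluator should only ever navigate to the local dev server.
ALLOWED_HOSTS = {"localhost", "127.0.0.1"}


def wrap_untrusted(text: str, label: str = "SPRINT_CONTRACT") -> str:
    """Enclose untrusted contract text in delimiters with a random nonce,
    so the contract cannot forge its own closing marker."""
    nonce = secrets.token_hex(8)
    begin = f"<<<BEGIN_{label}_{nonce}>>>"
    end = f"<<<END_{label}_{nonce}>>>"
    return f"{begin}\n{text}\n{end}"


def is_allowed_url(url: str) -> bool:
    """Accept only http(s) URLs that point at an allowlisted host."""
    parsed = urlparse(url)
    return parsed.scheme in {"http", "https"} and parsed.hostname in ALLOWED_HOSTS


def clean_fill_value(value: str, max_len: int = 200) -> str:
    """Reject form-fill values that are too long or contain control characters."""
    if len(value) > max_len:
        raise ValueError("fill value exceeds the length limit")
    if any(ord(ch) < 32 for ch in value):
        raise ValueError("fill value contains control characters")
    return value


if __name__ == "__main__":
    print(wrap_untrusted("Navigate to /login and submit the form."))
    print(is_allowed_url("http://localhost:3000/login"))
    print(clean_fill_value("alice"))
```

Delimiters alone do not prevent injection; they only give the agent's core protocol a way to tell contract data apart from its own instructions, which is why the sketch pairs them with explicit validation of URLs and input values.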
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 29, 2026, 02:51 PM