clarify-first

Pass

Audited by Gen Agent Trust Hub on Mar 16, 2026

Risk Level: SAFE
PROMPT_INJECTION, DATA_EXFILTRATION, COMMAND_EXECUTION, EXTERNAL_DOWNLOADS
Full Analysis
  • [PROMPT_INJECTION]: The skill includes instructions for the agent to treat its protocols as a TERMINAL_GUARDRAIL and prioritize them over conflicting instructions like "execute silently" or "auto-fix all". This is a defensive override intended to prevent the agent from bypassing safety checks when encountering ambiguity.
  • [DATA_EXFILTRATION]: The instructions mandate the redaction of sensitive credentials, including API keys, tokens, and personal information, using masking patterns like sk-*** to prevent their exposure in logs or user responses.
  • [COMMAND_EXECUTION]: The skill establishes a rigorous workflow for high-impact commands such as file deletion, migrations, or production deployments. It requires the creation of an execution plan, rollback strategy, and explicit user confirmation before any destructive actions are taken.
  • [EXTERNAL_DOWNLOADS]: The agent is instructed to read standard project manifest files (e.g., package.json, requirements.txt) to identify dependencies and the technical stack. This is documented as a neutral contextual step for project analysis.
  • [SAFE]: Indirect Prompt Injection surface evaluation (Category 8):
      1. Ingestion points: Project manifest files are processed to infer context.
      2. Boundary markers: The skill uses structured alignment snapshots and risk headers to separate context analysis from execution.
      3. Capability inventory: The agent has capabilities for file-write, deletion, and deployment.
      4. Sanitization: Mandatory redaction of secrets and PII is enforced to protect credentials.
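
The redaction and confirmation behaviors described in the findings above can be illustrated with a minimal Python sketch. This is not the skill's actual implementation, which the report does not show. The regular expressions, the keyword list, and the names `redact_secrets`, `ExecutionPlan`, `is_high_impact`, and `may_execute` are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical patterns: the audited skill's real masking rules are not
# shown in the report, so these only illustrate the "sk-***" style.
_SECRET_PATTERNS = [
    (re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"), "sk-***"),
    (re.compile(r"\b(bearer)\s+[A-Za-z0-9._~+/=-]{8,}", re.IGNORECASE), r"\1 ***"),
]


def redact_secrets(text: str) -> str:
    """Mask anything that looks like a credential before it is logged or shown."""
    for pattern, mask in _SECRET_PATTERNS:
        text = pattern.sub(mask, text)
    return text


@dataclass
class ExecutionPlan:
    command: str
    steps: List[str]   # ordered description of what will be done
    rollback: str      # how to undo the change if it goes wrong


# Assumed keyword list; a real classifier would likely be more careful.
HIGH_IMPACT_KEYWORDS = ("rm ", "drop ", "migrate", "deploy")


def is_high_impact(command: str) -> bool:
    """Rough check for deletion, migration, or deployment commands."""
    lowered = command.lower()
    return any(keyword in lowered for keyword in HIGH_IMPACT_KEYWORDS)


def may_execute(plan: ExecutionPlan, user_confirmed: bool) -> bool:
    """Allow high-impact commands only with a plan, a rollback, and confirmation."""
    if not is_high_impact(plan.command):
        return True
    has_plan = bool(plan.steps) and bool(plan.rollback.strip())
    return has_plan and user_confirmed
```

In this sketch a low-impact command such as `ls -la` passes without confirmation, while `rm -rf build` is refused until the plan lists its steps, names a rollback, and the user has explicitly confirmed.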
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 16, 2026, 01:46 AM