anth-policy-guardrails
Pass
Audited by Gen Agent Trust Hub on May 19, 2026
Risk Level: SAFE
Full Analysis
- [SAFE]: The skill implements best-practice security patterns for AI integrations: regex-based input validation for PII (Social Security numbers and credit card numbers) and output filtering for sensitive credentials such as API keys and private keys.
- [SAFE]: The system prompt template explicitly includes instructions to resist system prompt extraction and ignore override attempts.
- [SAFE]: All external references point to official, well-known Anthropic documentation.
- [SAFE]: No suspicious command execution, network exfiltration, or obfuscation patterns were detected.
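The input-validation and output-filtering patterns noted above can be illustrated with a minimal sketch. This report does not reproduce the skill's actual code, so every name and regex below (`PII_PATTERNS`, `SECRET_PATTERNS`, `find_pii`, `redact_secrets`, and the `sk-` key prefix) is a hypothetical stand-in, not the audited implementation.

```python
import re

# Hypothetical input-side patterns; the audited skill's real regexes are not shown in this report.
PII_PATTERNS = {
    # US Social Security number written as 123-45-6789
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}

# Hypothetical output-side patterns for credentials.
SECRET_PATTERNS = [
    # Assumed API key shape: "sk-" followed by at least 20 key characters
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    # PEM-encoded private key block, with or without an algorithm label
    re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,
    ),
]


def find_pii(text: str) -> list[str]:
    """Return the labels of every PII pattern that matches the user input."""
    return [label for label, pattern in PII_PATTERNS.items() if pattern.search(text)]


def redact_secrets(text: str) -> str:
    """Replace anything that looks like a credential in model output."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

In a design like this, `find_pii` would run before a prompt is sent to the model, and `redact_secrets` would run on the response before it is shown or logged. Patterns this simple produce false positives (any 13 to 16 digit run looks like a card number), so a real deployment would likely add checks such as Luhn validation.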
Audit Metadata