self-correction-loop

Pass

Audited by Gen Agent Trust Hub on Mar 17, 2026

Risk Level: SAFE
Findings: PROMPT_INJECTION, NO_CODE
Full Analysis
  • [PROMPT_INJECTION]: The skill creates a persistent instruction surface by capturing untrusted user feedback and storing it as authoritative logic in a MEMORY.md file. This allows for long-term behavioral modifications without security oversight.
      • Ingestion points: Captures user input phrases like 'remember this', 'don't do that again', or 'always/never do X' as instructions to modify future behavior.
      • Boundary markers: Absent. No logic exists to isolate or sanitize user-provided rules from the agent's core safety instructions.
      • Capability inventory: The skill relies on natural language instructions to perform file-writing and mandates a review of the stored rules at every session start, creating a persistent influence loop.
      • Sanitization: Absent. The generalization process explicitly encourages turning specific user comments into broad, project-wide rules without validation.
  • [NO_CODE]: The skill is composed entirely of instructional documentation and metadata. It contains no executable code, scripts, or binary dependencies. This reduces direct execution risk but concentrates the attack surface in instruction-based vulnerabilities.
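The boundary markers and sanitization that the analysis reports as absent could look roughly like the sketch below. It is illustrative only and is not part of the audited skill, which contains no code. The trigger phrases come from the ingestion points listed above; the marker strings, the override deny-list, the 200-character cap, and the function names are all hypothetical choices made for this example.

```python
import re
from typing import Optional

# Trigger phrases from the audit's "Ingestion points" finding.
TRIGGERS = re.compile(
    r"\b(remember this|don't do that again|always|never)\b", re.IGNORECASE
)

# Hypothetical deny-list: feedback that tries to override core instructions.
OVERRIDE = re.compile(
    r"\b(ignore|disregard|override|bypass)\b.*"
    r"\b(instructions?|rules?|safety|polic(?:y|ies))\b",
    re.IGNORECASE,
)

# Hypothetical boundary markers that label an entry as untrusted user input.
BEGIN = "<!-- BEGIN UNTRUSTED USER PREFERENCE -->"
END = "<!-- END UNTRUSTED USER PREFERENCE -->"
MAX_LEN = 200  # assumed cap on stored rule length


def sanitize(message: str) -> Optional[str]:
    """Return cleaned text, or None if the message should be rejected."""
    # Collapse whitespace so the entry always occupies a single line.
    text = " ".join(message.split())
    if OVERRIDE.search(text):
        return None
    # Remove comment delimiters so the text cannot close the boundary marker.
    text = text.replace("<!--", "").replace("-->", "")
    return text[:MAX_LEN]


def format_memory_entry(message: str) -> Optional[str]:
    """Wrap qualifying feedback in boundary markers for MEMORY.md."""
    if TRIGGERS.search(message) is None:
        return None  # not a 'remember this' style request
    clean = sanitize(message)
    if clean is None:
        return None  # rejected by the deny-list
    return f"{BEGIN}\n{clean}\n{END}\n"
```

A caller would append the returned string to MEMORY.md only when it is not `None`. A regex deny-list like this one is easy to evade, so in practice it would complement human review of stored rules, not replace it.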
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 17, 2026, 05:20 AM