self-improvement
Warn
Audited by Gen Agent Trust Hub on Apr 8, 2026
Risk Level: MEDIUMCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
- [PROMPT_INJECTION]: The skill is highly vulnerable to indirect prompt injection. It is instructed to ingest data from external, potentially untrusted sources and 'promote' derived insights into instructions that govern the agent's behavior.
- Ingestion points: Processes content from user corrections, command error messages, and GitHub PR/Issue data (via
ghCLI). - Boundary markers: The skill lacks any instruction to sanitize or wrap the derived insights in protective delimiters, increasing the risk that injected instructions will be followed by the agent in future sessions.
- Capability inventory: The skill performs file writes to the project root (
CLAUDE.md) and the user's global configuration directory (~/.claude/CLAUDE.md). - Sanitization: The instructions rely on the LLM to 'distill the insight', which does not protect against adversarial data designed to look like a legitimate learning but containing hidden instructions.
- [COMMAND_EXECUTION]: The skill uses the
ghCLI to fetch data from remote GitHub repositories (gh pr list,gh pr view). While these are standard tools, the retrieved content is used as the primary source for persistent behavioral changes. - [PERSISTENCE_MECHANISM]: The skill is designed to create permanent modifications to
CLAUDE.mdfiles. These files are typically loaded at the start of every session. The instruction to 'Act, don't ask' and 'promote immediately' specifically removes the user-in-the-loop safety check, allowing a malicious injection to become a persistent part of the agent's global identity without the user noticing. - [DATA_EXPOSURE]: The skill targets
~/.claude/CLAUDE.mdfor modification. Accessing the user home directory (~/) is a sensitive operation, as it allows the skill to influence the agent's behavior globally across all projects on the user's machine, rather than being restricted to the current workspace.
Audit Metadata