The Agent Skills Directory

[PROMPT_INJECTION]: The agent uses a reflection loop to update its core system_prompt based on user-provided feedback. This architecture is vulnerable to indirect prompt injection, where an attacker could provide feedback designed to trick the reflection model into incorporating malicious instructions or safety bypasses into the agent's persistent state.
- Ingestion points: Untrusted data enters the agent context via the user_msg and feedback parameters in the reflect_and_update method in SKILL.md.
- Boundary markers: The reflection prompt uses simple text headers (USER:, AGENT:, Feedback:) which provide minimal protection against instructions embedded within the feedback.
- Capability inventory: The agent has the capability to perform LLM completions via the anthropic client and write its updated state and instructions to the local filesystem via agent_config.json.
- Sanitization: There is no explicit sanitization or rule-based validation of the proposed instruction changes; the system relies entirely on the second LLM call to safely 'merge' and filter the updates.
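
The boundary-marker and sanitization gaps described in this finding could be narrowed with a rule-based check that runs before any proposed prompt is written to agent_config.json. The following is a minimal sketch, not the skill's actual code: the names wrap_untrusted, validate_update, SUSPICIOUS_PATTERNS, MAX_GROWTH_CHARS, and protected_lines are all hypothetical, and the deny-list patterns and the growth cap are illustrative assumptions. A pattern list of this kind will not catch every injection, so it is best treated as one layer alongside human review of persistent changes.

```python
import re
import secrets

# Hypothetical deny-list; real deployments would tune and extend these patterns.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(
        r"(disable|bypass|override)\s+(the\s+)?(safety|guardrails?|restrictions?)",
        re.IGNORECASE,
    ),
    re.compile(r"system\s*prompt", re.IGNORECASE),
]

# Assumed cap on how many characters a single update may add to the prompt.
MAX_GROWTH_CHARS = 500


def wrap_untrusted(label: str, text: str) -> str:
    """Fence untrusted text with a random per-call tag.

    Unlike fixed headers such as "Feedback:", the tag is unknown to the
    author of the text, so the text cannot forge a matching closing marker.
    """
    tag = secrets.token_hex(8)
    return f"<{label}-{tag}>\n{text}\n</{label}-{tag}>"


def validate_update(current: str, proposed: str, protected_lines: list[str]) -> list[str]:
    """Return reasons to reject the proposed prompt; an empty list means no rule fired."""
    problems = []

    # Rule 1: reject updates that add an unusually large amount of text.
    if len(proposed) - len(current) > MAX_GROWTH_CHARS:
        problems.append("update grows the prompt by more than the allowed amount")

    # Rule 2: instructions marked as protected must survive every update.
    for line in protected_lines:
        if line not in proposed:
            problems.append(f"protected instruction removed: {line!r}")

    # Rule 3: flag deny-list matches that the update newly introduces.
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(proposed) and not pattern.search(current):
            problems.append(f"new text matches suspicious pattern: {pattern.pattern}")

    return problems
```

In this sketch, reflect_and_update would pass feedback through wrap_untrusted before building the reflection prompt, then call validate_update on the model's proposed prompt and skip the write to agent_config.json whenever the returned list is non-empty.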

hermes-agent