hermes-agent
Pass
Audited by Gen Agent Trust Hub on Apr 24, 2026
Risk Level: SAFE
PROMPT_INJECTION
Full Analysis
- [PROMPT_INJECTION]: The agent uses a reflection loop to update its core `system_prompt` based on user-provided feedback. This architecture is vulnerable to indirect prompt injection: an attacker could supply feedback designed to trick the reflection model into incorporating malicious instructions or safety bypasses into the agent's persistent state.
  - Ingestion points: Untrusted data enters the agent context via the `user_msg` and `feedback` parameters of the `reflect_and_update` method in `SKILL.md`.
  - Boundary markers: The reflection prompt uses simple text headers (`USER:`, `AGENT:`, `Feedback:`), which provide minimal protection against instructions embedded within the feedback.
  - Capability inventory: The agent can perform LLM completions via the `anthropic` client and write its updated state and instructions to the local filesystem via `agent_config.json`.
  - Sanitization: There is no explicit sanitization or rule-based validation of the proposed instruction changes; the system relies entirely on the second LLM call to safely "merge" and filter the updates.
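The boundary-marker and sanitization gaps noted in this finding can be illustrated with a minimal Python sketch. It is not taken from the audited skill: the function names `wrap_untrusted` and `validate_update`, the deny-list patterns, and the `MAX_PROMPT_GROWTH` limit are all hypothetical, chosen only to show what nonce-based delimiters and a rule-based check on proposed prompt updates might look like.

```python
import re
import secrets

# Hypothetical deny-list; the patterns are illustrative, not exhaustive.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"disregard (the )?(system prompt|safety)", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"(bypass|disable|override) (the )?(safety|guardrails?|filters?)", re.I),
]

# Hypothetical cap on how many characters a single update may add.
MAX_PROMPT_GROWTH = 500


def wrap_untrusted(label: str, text: str) -> str:
    """Wrap untrusted text in delimiters that carry a random nonce.

    Embedded text cannot guess the nonce, so it cannot forge the
    closing marker the way it can forge a plain "Feedback:" header.
    """
    nonce = secrets.token_hex(8)
    return f"<{label}-{nonce}>\n{text}\n</{label}-{nonce}>"


def validate_update(current_prompt: str, proposed_prompt: str) -> list[str]:
    """Return the reasons to reject an update; an empty list means accept."""
    problems = []
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(proposed_prompt):
            problems.append(f"matched deny-list pattern: {pattern.pattern}")
    if len(proposed_prompt) - len(current_prompt) > MAX_PROMPT_GROWTH:
        problems.append("update grows the prompt beyond the allowed size")
    return problems
```

Under these assumptions, a caller would wrap `user_msg` and `feedback` with `wrap_untrusted` before building the reflection prompt, and would write the new prompt to `agent_config.json` only when `validate_update` returns an empty list. Pattern matching of this kind is easy to evade, so it would complement the second LLM call rather than replace it.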
Audit Metadata