memory-harness

Warn

Audited by Gen Agent Trust Hub on Apr 20, 2026

Risk Level: MEDIUM (COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, DATA_EXFILTRATION)
Full Analysis
  • [COMMAND_EXECUTION]: In SKILL.md, the instructions for recording episodes interpolate shell variables such as $TOOL_NAME and $OBSERVATION directly into Python string literals within a python3 -c call. This allows Python code injection if those variables contain single quotes (e.g., via malicious tool output or file content).
  • [COMMAND_EXECUTION]: The replay-trace.sh script parses trace data and pipes it to a shell loop using tab or pipe delimiters. If recorded observations contain these delimiters, command arguments can be misparsed when calling do-memory-cli.
  • [REMOTE_CODE_EXECUTION]: The skill employs dynamic execution patterns, generating and executing Python code at runtime based on data retrieved from execution traces and the local environment.
  • [DATA_EXFILTRATION]: The skill records detailed agent interactions, tool outputs, and project metadata (using git rev-parse) to the .memory-traces/ directory. This risks exposing sensitive internal information if the traces are inadvertently shared, committed to version control, or used in shared CI/CD environments.
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection (Category 8):
    • Ingestion points: The skill reads historical session data from JSON files in the .memory-traces/ directory during replay and evaluation modes.
    • Boundary markers: None. Trace data is processed as structured input, but without markers to prevent the agent from obeying instructions embedded in past observations.
    • Capability inventory: The skill can execute shell commands, generate Python code, and create or modify files in the local filesystem.
    • Sanitization: Inconsistent. replay-trace.sh attempts to strip single quotes, but the primary recording logic in SKILL.md does not, so potentially malicious instructions can be stored and later re-processed.
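One possible mitigation for the two COMMAND_EXECUTION findings is sketched below. It is not the skill's actual recording logic, which this report does not reproduce. The names `RECORDER` and `record_episode` are hypothetical; only the variable names `TOOL_NAME` and `OBSERVATION` come from the findings. The idea is to keep the `python3 -c` program text constant and hand untrusted values over as environment data, then serialize each record as JSON so that tabs, pipes, and quotes in an observation cannot act as delimiters.

```python
import json
import os
import subprocess
import sys

# Hypothetical recorder program. The source text is a constant: untrusted
# values are read from the environment at run time and are never spliced
# into the Python code itself.
RECORDER = (
    "import json, os; "
    "print(json.dumps({'tool': os.environ['TOOL_NAME'], "
    "'observation': os.environ['OBSERVATION']}))"
)


def record_episode(tool_name: str, observation: str) -> dict:
    """Run the constant recorder with untrusted values passed as env vars.

    Assumes a POSIX-like environment; values must not contain NUL bytes.
    """
    env = dict(os.environ, TOOL_NAME=tool_name, OBSERVATION=observation)
    result = subprocess.run(
        [sys.executable, "-c", RECORDER],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # One JSON object per line: tabs are escaped and pipes stay inside the
    # string, so a line-oriented reader does not need a field delimiter.
    return json.loads(result.stdout)


if __name__ == "__main__":
    hostile = "it's'); __import__('os').system('echo pwned'); ('\ta|b"
    print(record_episode("grep", hostile))
```

In a shell script the same pattern would be to export the variables (or prefix the command with `TOOL_NAME="$TOOL_NAME" OBSERVATION="$OBSERVATION"`) and wrap the Python program in single quotes so the shell performs no interpolation inside it. This addresses only injection at recording time and delimiter misparsing; it does not address the prompt injection or data exposure findings.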
Audit Metadata
Risk Level: MEDIUM
Analyzed: Apr 20, 2026, 07:01 PM