The Agent Skills Directory

[CREDENTIALS_UNSAFE]: The skill extracts Claude Code OAuth credentials from the macOS Keychain.
Evidence: scripts/utils.py contains the function read_macos_keychain_credentials which executes security find-generic-password -s "Claude Code-credentials" -w to retrieve session tokens.
Evidence: These credentials are then written to a plain-text .credentials.json file within the evaluation workspace in scripts/run_evals.py and scripts/run_triggers.py.
[DATA_EXFILTRATION]: The skill implements a policy where run artifacts are never deleted, leading to long-term exposure of sensitive data.
Evidence: SKILL.md states 'Artifacts are forever. Never delete, overwrite, or rotate run folders.' This means the extracted OAuth tokens remain in the ~/bmad-evals/ directory indefinitely in plain text.
[COMMAND_EXECUTION]: The skill makes extensive use of the subprocess module and PTYs to execute shell commands on the host and inside containers.
Evidence: scripts/pty_runner.py uses pty.openpty() to simulate an interactive terminal for claude, allowing it to capture output while bypasses certain interactive restrictions.
Evidence: Numerous subprocess.run and subprocess.Popen calls are used for Docker management, rsync operations, and CLI execution across scripts/run_evals.py, scripts/docker_setup.py, and scripts/utils.py.
[REMOTE_CODE_EXECUTION]: The skill dynamically generates shell scripts as strings and executes them inside Docker containers using bash -c.
Evidence: In scripts/run_evals.py, the container_script variable is a multi-line shell script that incorporates variables like SKILL_NAME and SKILL_SRC through string interpolation. If a tested skill has a malicious name or path, it could lead to command injection within the container context.
[PROMPT_INJECTION]: The skill acts as an intermediary, passing prompts from evals.json directly into an isolated Claude instance.
Evidence: This creates a surface for Indirect Prompt Injection (Category 8) where a malicious evaluation file could attempt to compromise the evaluator or the host environment if the isolation layers (Docker/Local) were bypassed.

bmad-eval-runner