skills/jonmumm/skills/evals-first/Gen Agent Trust Hub

evals-first

Warn

Audited by Gen Agent Trust Hub on Mar 29, 2026

Risk Level: MEDIUMCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill provides instructions and templates for the agent to execute shell commands within pre-commit hooks and iterative optimization loops. Specifically, the 'Metric-driven eval loop' pattern (Phase 3) directs the agent to repeatedly execute model commands and a user-defined measurement script (e.g., 'run_eval_metric') based on automated feedback.
  • [REMOTE_CODE_EXECUTION]: The 'LLM Judges' and 'Signal-Based Orchestration' sections describe workflows involving sub-agents and CLI tool invocations ('claude', 'codex') to process code and prompts. This introduces a path for arbitrary code execution if the evaluation logic or inputs are manipulated.
  • [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection. It guides the agent to process untrusted 'reference materials' (Phase 1 and 6) such as external guides or papers to derive evaluation criteria. These criteria then serve as instructions that 'condition' the agent's generation of specifications and code, allowing malicious external data to influence the agent's core output.
  • Ingestion points: Phase 1, Step 2 ('Collect reference materials') and Phase 6 ('Converting Reference Materials into Evals') in SKILL.md.
  • Boundary markers: None identified for isolating untrusted reference data from instructions.
  • Capability inventory: Includes shell command execution (Phase 3) and file system modification (Phase 4/5) in SKILL.md.
  • Sanitization: No validation or filtering of external content is mentioned before it is used for agent conditioning.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Mar 29, 2026, 06:00 AM