dspy-evaluate
Pass
Audited by Gen Agent Trust Hub on Mar 22, 2026
Risk Level: SAFE (PROMPT_INJECTION, COMMAND_EXECUTION)
Full Analysis
- [PROMPT_INJECTION]: The skill demonstrates LM-as-judge patterns in `examples.md` (e.g., lines 88, 140, 149) that interpolate potentially untrusted data into evaluation prompts, creating a surface for indirect prompt injection attacks.
  - Ingestion points: Fields such as `predicted_explanation`, `predicted_answer`, and `answer` (from prediction) are fed directly into `dspy.Signature` classes.
  - Boundary markers: Absent. The signatures do not use delimiters (like triple quotes) or specific isolation instructions to distinguish untrusted content from the judge's instructions.
  - Capability inventory: The skill performs automated scoring and includes a CLI script for execution; `scripts/run_eval.py` can load program states from local files.
  - Sanitization: Absent. No evidence of input filtering or escaping is demonstrated in the provided examples.
- [COMMAND_EXECUTION]: The `scripts/run_eval.py` utility restores program state from a file path provided via command-line arguments using `dspy.Module().load()`. While standard for the framework, loading serialized model states carries a risk of arbitrary code execution if the source file is malicious or if the underlying framework uses unsafe deserialization methods.
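
The two gaps noted above (no boundary markers or escaping on judge inputs, and an unchecked file path passed to the loader) could be narrowed along the following lines. This is a minimal sketch, not code from the audited skill: `wrap_untrusted`, `resolve_state_path`, the `<untrusted>` delimiter, and the `.json`-only rule are hypothetical choices made for illustration. The sketch deliberately calls no DSPy APIs.

```python
import html
from pathlib import Path

# Hypothetical boundary markers; any token that cannot survive escaping works.
OPEN, CLOSE = "<untrusted>", "</untrusted>"


def wrap_untrusted(value: str, max_len: int = 4000) -> str:
    """Truncate, escape, and delimit a model-produced field before it is
    interpolated into a judge prompt."""
    # Escaping '<', '>' and '&' means the field cannot contain the markers,
    # so it cannot close the block early and smuggle in instructions.
    escaped = html.escape(value[:max_len], quote=False)
    return f"{OPEN}\n{escaped}\n{CLOSE}"


def resolve_state_path(raw: str, allowed_dir: Path) -> Path:
    """Resolve a CLI-supplied path and reject anything outside allowed_dir
    or without a .json suffix (an illustrative policy, not a DSPy rule)."""
    base = allowed_dir.resolve()
    path = (base / raw).resolve()  # absolute or '..' inputs are caught below
    if not path.is_relative_to(base):
        raise ValueError(f"{raw!r} is outside {base}")
    if path.suffix != ".json":
        raise ValueError(f"{raw!r} is not a .json state file")
    return path
```

A judge signature would then receive `wrap_untrusted(prediction_field)` together with an instruction to treat everything inside the markers as data, and the evaluation script would pass the result of `resolve_state_path(...)` to the loader rather than the raw argument. Neither step removes the risk; both only reduce the exposed surface.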
Audit Metadata