openjudge
Pass
Audited by Gen Agent Trust Hub on Mar 21, 2026
Risk Level: SAFEEXTERNAL_DOWNLOADSPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
- [EXTERNAL_DOWNLOADS]: The skill requires the installation of the
py-openjudgePython package from a public registry, which is the core framework used by the skill. - [PROMPT_INJECTION]: The skill demonstrates an indirect prompt injection surface by processing untrusted dataset content for evaluation.
- Ingestion points: Untrusted data enters the context via
datasetsamples containingquery,response, andcontextfields used inGradingRunner.arun()and various grader_aevaluate()methods. - Boundary markers: Prompt templates in
graders.mdandanalyzer.mddo not utilize delimiters or explicit instructions to ignore commands embedded within the evaluated content. - Capability inventory: The framework includes a
CodeExecutionGraderfor executing code and anAgenticGraderthat performs tool calls via a ReAct agent. - Sanitization: No input validation or escaping mechanisms are described for the data being interpolated into LLM prompts.
- [COMMAND_EXECUTION]: The
CodeExecutionGraderclass ingraders.mdis designed to execute code provided in theresponsefield to calculate test case pass rates. While this is a primary function for code-evaluation tasks, executing untrusted code poses a risk to the host environment.
Audit Metadata