skills/modelscope/agentjet/openjudge/Gen Agent Trust Hub

openjudge

Pass

Audited by Gen Agent Trust Hub on Mar 21, 2026

Risk Level: SAFEEXTERNAL_DOWNLOADSPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
  • [EXTERNAL_DOWNLOADS]: The skill requires the installation of the py-openjudge Python package from a public registry, which is the core framework used by the skill.
  • [PROMPT_INJECTION]: The skill demonstrates an indirect prompt injection surface by processing untrusted dataset content for evaluation.
  • Ingestion points: Untrusted data enters the context via dataset samples containing query, response, and context fields used in GradingRunner.arun() and various grader _aevaluate() methods.
  • Boundary markers: Prompt templates in graders.md and analyzer.md do not utilize delimiters or explicit instructions to ignore commands embedded within the evaluated content.
  • Capability inventory: The framework includes a CodeExecutionGrader for executing code and an AgenticGrader that performs tool calls via a ReAct agent.
  • Sanitization: No input validation or escaping mechanisms are described for the data being interpolated into LLM prompts.
  • [COMMAND_EXECUTION]: The CodeExecutionGrader class in graders.md is designed to execute code provided in the response field to calculate test case pass rates. While this is a primary function for code-evaluation tasks, executing untrusted code poses a risk to the host environment.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 21, 2026, 01:46 AM