The Agent Skills Directory

[EXTERNAL_DOWNLOADS]: The skill requires the installation of the py-openjudge Python package from a public registry, which is the core framework used by the skill.
[PROMPT_INJECTION]: The skill demonstrates an indirect prompt injection surface by processing untrusted dataset content for evaluation.
Ingestion points: Untrusted data enters the context via dataset samples containing query, response, and context fields used in GradingRunner.arun() and various grader _aevaluate() methods.
Boundary markers: Prompt templates in graders.md and analyzer.md do not utilize delimiters or explicit instructions to ignore commands embedded within the evaluated content.
Capability inventory: The framework includes a CodeExecutionGrader for executing code and an AgenticGrader that performs tool calls via a ReAct agent.
Sanitization: No input validation or escaping mechanisms are described for the data being interpolated into LLM prompts.
[COMMAND_EXECUTION]: The CodeExecutionGrader class in graders.md is designed to execute code provided in the response field to calculate test case pass rates. While this is a primary function for code-evaluation tasks, executing untrusted code poses a risk to the host environment.

openjudge