benchmark-skill

Pass

Audited by Gen Agent Trust Hub on Mar 28, 2026

Risk Level: SAFE
COMMAND_EXECUTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill uses the Bash tool to execute a local Python script (aggregate_benchmark.py) located in a sibling directory relative to the skill root. This is a standard practice for repository-based tooling to separate logic from configuration.
  • [PROMPT_INJECTION]: The skill processes benchmark data from the .skill-eval/ workspace. Because that data may be untrusted, the workspace is an indirect prompt injection surface.
    • Ingestion points: Reads and displays content from the .skill-eval/ directory and benchmark.md.
    • Boundary markers: The generated summary is displayed without explicit boundary markers and without any instruction to ignore directives embedded in the data.
    • Capability inventory: The skill has Bash and Read capabilities, which it uses to process and report on local files.
    • Sanitization: The skill relies on the agent's path resolution logic for the $ARGUMENTS variable rather than interpolating raw user input directly into a shell command.
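
The sanitization and boundary-marker findings above can be illustrated with a minimal Python sketch. It is not the skill's actual code: the helper names, the marker strings, the SCRIPT location, and the assumption that aggregate_benchmark.py takes a single path argument are all hypothetical and used only for illustration.

```python
import sys
from pathlib import Path

WORKSPACE = Path(".skill-eval")          # workspace name from the audit text
SCRIPT = Path("aggregate_benchmark.py")  # hypothetical location; the real directory is not stated


def resolve_in_workspace(user_arg: str, workspace: Path = WORKSPACE) -> Path:
    """Resolve user_arg under the workspace and reject paths that escape it."""
    root = workspace.resolve()
    candidate = (root / user_arg).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes workspace: {user_arg!r}")
    return candidate


def build_command(target: Path) -> list[str]:
    """Build an argv list; the path is a single argument, never shell syntax."""
    # Assumption: the script accepts one positional path argument.
    return [sys.executable, str(SCRIPT), str(target)]


def wrap_untrusted(text: str, label: str = "benchmark-data") -> str:
    """Fence untrusted content with explicit boundary markers."""
    # Break up any marker-like sequence inside the data so it cannot
    # close the fence early.
    safe = text.replace("<<<", "< < <")
    return f"<<<BEGIN UNTRUSTED {label}>>>\n{safe}\n<<<END UNTRUSTED {label}>>>"
```

Passing the list from build_command to subprocess.run without shell=True keeps the resolved path out of shell parsing. An agent displaying benchmark.md could pass its content through wrap_untrusted first, so that the summary is clearly delimited as data rather than instructions.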
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 28, 2026, 10:15 PM