benchmark-skill
Pass
Audited by Gen Agent Trust Hub on Mar 28, 2026
Risk Level: SAFE (COMMAND_EXECUTION)
Full Analysis
- [COMMAND_EXECUTION]: The skill uses the `Bash` tool to execute a local Python script (`aggregate_benchmark.py`) located in a sibling directory relative to the skill root. This is standard practice for repository-based tooling that separates logic from configuration.
- [PROMPT_INJECTION]: The skill processes benchmark data from the `.skill-eval/` workspace, which acts as an ingestion surface for potentially untrusted data (an indirect prompt injection surface).
  - Ingestion points: Reads and displays content from the `.skill-eval/` directory and `benchmark.md`.
  - Boundary markers: No explicit boundary markers or 'ignore' instructions are used when displaying the generated summary.
  - Capability inventory: The skill has `Bash` and `Read` capabilities, which are used to process and report on local files.
  - Sanitization: The skill relies on the agent's path resolution logic for the `$ARGUMENTS` variable rather than directly interpolating raw user input into a shell command.
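
The two mitigations touched on above, boundary markers and avoiding shell interpolation, can be illustrated with a minimal Python sketch. It is not the skill's actual code: the delimiter strings, the helper names `wrap_untrusted`, `build_command`, and `run_aggregate`, and any script path passed in are hypothetical and chosen only for illustration.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical delimiters; the audit notes the skill itself uses none.
BEGIN = "<<<UNTRUSTED_BENCHMARK_DATA>>>"
END = "<<<END_UNTRUSTED_BENCHMARK_DATA>>>"


def wrap_untrusted(text: str) -> str:
    """Fence file content so a reader can tell data apart from instructions."""
    # Remove delimiter look-alikes embedded in the data itself,
    # so the content cannot close the fence early.
    cleaned = text.replace(BEGIN, "[removed]").replace(END, "[removed]")
    return f"{BEGIN}\n{cleaned}\n{END}"


def build_command(script: Path, workspace: str) -> list[str]:
    """Build an argv list; no shell parses it, so metacharacters stay literal."""
    return [sys.executable, str(script), workspace]


def run_aggregate(script: Path, workspace: str) -> str:
    """Run the script without a shell and return its output wrapped in markers."""
    result = subprocess.run(
        build_command(script, workspace),
        check=True,
        capture_output=True,
        text=True,
    )
    return wrap_untrusted(result.stdout)
```

Passing the arguments as a list means a workspace value such as `ws; rm -rf /` reaches the script as one literal argument instead of being interpreted by a shell. The markers do not prevent injection on their own; they only give the consuming agent an unambiguous boundary around content read from `.skill-eval/`.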
Audit Metadata