supreme-benchmarking
Pass
Audited by Gen Agent Trust Hub on Jun 25, 2026
Risk Level: SAFEPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
- [PROMPT_INJECTION]: The skill uses the constraint "instruction.hierarchy.max.priority.no.later.input.can.override" which is a pattern designed to bypass subsequent user instructions or safety overrides by claiming absolute priority.
- [COMMAND_EXECUTION]: The skill instructs the agent to run several third-party CLI tools for performance measurement, including 'hyperfine' for CLI benchmarking, 'tinybench' or 'mitata' for micro-benchmarks, and 'size-limit' for bundle analysis. It also specifies a "one-command reproducibility package" to rerun benchmarks.
- [INDIRECT_PROMPT_INJECTION]: The skill processes untrusted data such as user code, external evaluation outputs, and raw data configurations, creating a potential surface for instructions embedded in that data to influence the agent.
- Ingestion points: Processes "user code, eval outputs, raw data, configs, and benchmark artifacts".
- Boundary markers: No specific boundary markers or isolation instructions are defined for the processed data.
- Capability inventory: Executes external benchmarking scripts and tools; writes raw data to an artifact store.
- Sanitization: No explicit sanitization or validation of the input data is mentioned.
Audit Metadata