compare-agents

Pass

Audited by Gen Agent Trust Hub on Apr 23, 2026

Risk Level: SAFE
Categories: COMMAND_EXECUTION, EXTERNAL_DOWNLOADS, DATA_EXFILTRATION, PROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill generates Python or TypeScript scripts based on provided job patterns and instructs the user to run them locally (e.g., python evaluate.py) to execute the agent comparisons.
  • [EXTERNAL_DOWNLOADS]: Core functionality depends on installing the evaluatorq and orq-ai-sdk packages from public registries (PyPI and npm). These are official packages published by the skill's vendor.
  • [DATA_EXFILTRATION]: When the ORQ_API_KEY environment variable is present, the skill automatically transmits agent performance data, query responses, and evaluation scores to the orq.ai platform's Experiment UI.
  • [PROMPT_INJECTION]: The skill processes external data points that are used as inputs for testing agents. This introduces an indirect prompt injection surface, but the risk is mitigated by instructions to use synthetic datasets and structured evaluation templates.
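
For readers who want to keep a comparison run local, the COMMAND_EXECUTION and DATA_EXFILTRATION findings above suggest one simple precaution: run the generated script with ORQ_API_KEY removed from its environment. The following is a minimal sketch using only the Python standard library. The helper names (upload_enabled, local_only_env, run_locally) are hypothetical and not part of the skill or its packages, and the sketch assumes that "present" means the variable exists in the environment at all; how the skill treats an empty value is not stated in the audit.

```python
import os
import subprocess
import sys

# Variable name taken from the DATA_EXFILTRATION finding above.
UPLOAD_KEY = "ORQ_API_KEY"


def upload_enabled(env=None):
    """Return True if the upload-triggering variable exists in the environment.

    Assumption: presence alone enables uploads; the audit does not say
    how an empty value is handled.
    """
    env = os.environ if env is None else env
    return UPLOAD_KEY in env


def local_only_env(env=None):
    """Return a copy of the environment with the upload key removed."""
    cleaned = dict(os.environ if env is None else env)
    cleaned.pop(UPLOAD_KEY, None)
    return cleaned


def run_locally(script="evaluate.py"):
    """Run the generated script without the key (hypothetical helper).

    The child process never sees ORQ_API_KEY, so the parent shell's
    environment is left untouched.
    """
    return subprocess.run(
        [sys.executable, script],
        env=local_only_env(),
        check=False,
    )


if __name__ == "__main__":
    state = "set" if upload_enabled() else "not set"
    print(f"{UPLOAD_KEY} is {state} in the current environment")
```

Calling run_locally() only affects the child process, so the key stays available in the parent shell for runs where uploading to the Experiment UI is intended.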
Audit Metadata
Risk Level: SAFE
Analyzed: Apr 23, 2026, 11:20 AM