benchmark-models

Pass

Audited by Gen Agent Trust Hub on May 20, 2026

Risk Level: SAFE
Full Analysis
  • [DATA_EXPOSURE]: The skill checks for the presence of API keys in ~/.claude/.credentials.json and environment variables to determine which model adapters are available for benchmarking.
  • [DYNAMIC_EXECUTION]: The skill configures its shell environment dynamically by evaluating the output of local binaries with eval (e.g., gstack-slug) and source (e.g., gstack-repo-mode).
  • [COMMAND_EXECUTION]: The skill invokes multiple local binaries in the ~/.claude/skills/gstack/bin/ directory to manage skill updates, configuration, and telemetry logging, and to run the core benchmarking engine.
  • [DATA_EXFILTRATION]: The skill includes opt-in mechanisms to sync project artifacts with a private GitHub repository and to send usage telemetry to the vendor's infrastructure. Both features stay inactive unless the user explicitly consents during the interactive setup flow.
  • [PROMPT_INJECTION]: The skill has an indirect prompt injection surface because it ingests user-provided prompts or content from local files and passes them to the models being benchmarked. It uses interactive checkpoints to confirm prompt sources.
  • [COMMAND_EXECUTION]: If the user opts in, the skill modifies the local CLAUDE.md file to add skill routing rules and commits the change with standard Git operations (git add, git commit).
Audit Metadata
  • Risk Level: SAFE
  • Analyzed: May 20, 2026, 05:21 PM
Security Audit — agent-trust-hub — benchmark-models