evaluate-skill

Pass

Audited by Gen Agent Trust Hub on Jun 28, 2026

Risk Level: SAFE (findings: COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, EXTERNAL_DOWNLOADS)
Full Analysis
  • [EXTERNAL_DOWNLOADS]: The skill instructs the user to install the caliper-eval package via pipx and references a third-party Homebrew tap (steipete/tap/summarize) for the summarize tool used in the evaluation examples.
  • [REMOTE_CODE_EXECUTION]: The Caliper evaluation framework executes Python code snippets supplied in the assert field of .eval.yaml files. This dynamic execution is used to verify task outcomes deterministically, for example by checking whether a file exists or matches a pattern.
  • [COMMAND_EXECUTION]: The skill uses shell commands to run evaluations, manage files, and execute helper scripts, including a screenshot helper that invokes powershell -ExecutionPolicy Bypass on Windows to capture desktop images.
  • [DATA_EXPOSURE]: The summarize reference skill explains how to configure API keys for several LLM providers (OpenAI, Anthropic, Google, xAI) through environment variables or a local config file (~/.summarize/config.json).
Audit Metadata
Risk Level: SAFE
Analyzed: Jun 28, 2026, 08:34 PM
Security Audit — agent-trust-hub — evaluate-skill