instruction-eval
Pass
Audited by Gen Agent Trust Hub on May 15, 2026
Risk Level: SAFE, EXTERNAL_DOWNLOADS, COMMAND_EXECUTION
Full Analysis
- [COMMAND_EXECUTION]: The script `scripts/eval.py` uses `subprocess.run` to interact with the local git environment to discover repository roots and fetch file contents at specific git references for A/B testing.
- [EXTERNAL_DOWNLOADS]: The skill requires the `anthropic` and `pyyaml` Python packages to interface with the LLM API and parse test configuration files. These are standard dependencies for this type of tool.
- [COMMAND_EXECUTION]: The evaluation scripts execute LLM-as-judge probes, which involve sending local skill content and test prompts to the Anthropic API to verify that instruction changes have not caused regressions or safety bypasses.
- [SAFE]: The `!git rev-parse --show-toplevel` and similar git commands used within the scripts are standard for developer tools and do not represent privilege escalation or malicious activity.
- [SAFE]: The use of `os.environ.get("ANTHROPIC_API_KEY")` is a standard and safe practice for managing API credentials through environment variables rather than hardcoding.
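For readers who want to see what the patterns flagged above typically look like, here is a minimal sketch. It is not the actual contents of `scripts/eval.py`: the helper names `repo_root`, `file_at_ref`, and `api_key` are hypothetical, and the use of `git show <ref>:<path>` to read a file at a reference is an assumption about how such a script might fetch content for A/B testing.

```python
import os
import subprocess


def repo_root(cwd=None):
    """Return the top-level directory of the git repository containing cwd."""
    result = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git exits non-zero
    )
    return result.stdout.strip()


def file_at_ref(ref, path, cwd=None):
    """Return the contents of path (relative to the repo root) at a git ref.

    Assumption: `git show <ref>:<path>` is used here for illustration; the
    audited script may fetch file contents differently.
    """
    result = subprocess.run(
        ["git", "show", f"{ref}:{path}"],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def api_key():
    """Read the API key from the environment instead of hardcoding it."""
    key = os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    return key
```

Passing the command as a list, without `shell=True`, keeps the git reference and path from being interpreted by a shell, which is consistent with the audit's view that these calls are low risk.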
Audit Metadata