The Agent Skills Directory

[COMMAND_EXECUTION]: The script scripts/eval.py calls subprocess.run to run local git commands that discover the repository root and fetch file contents at specific git references for A/B testing.
[EXTERNAL_DOWNLOADS]: The skill requires the anthropic and pyyaml Python packages to interface with the LLM API and parse test configuration files. These are standard dependencies for this type of tool.
[COMMAND_EXECUTION]: The evaluation scripts run LLM-as-judge probes, which send local skill content and test prompts to the Anthropic API to verify that instruction changes have not caused regressions or safety bypasses.
[SAFE]: The !git rev-parse --show-toplevel and similar git commands used within the scripts are standard for developer tools and do not represent privilege escalation or malicious activity.
[SAFE]: Reading the key with os.environ.get("ANTHROPIC_API_KEY") is a standard and safe practice: API credentials are supplied through environment variables rather than hardcoded in the scripts.
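
For reference, the patterns named in the findings above can be sketched roughly as follows. This is a minimal illustration and not the actual contents of scripts/eval.py: the function names repo_root, file_at_ref, and api_key are hypothetical, and using `git show <ref>:<path>` to read a file at a reference is an assumption about how the fetch might be done.

```python
import os
import subprocess
from typing import Optional


def repo_root(cwd: Optional[str] = None) -> str:
    """Return the top-level directory of the git repository containing cwd.

    Hypothetical helper: wraps `git rev-parse --show-toplevel`.
    """
    result = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git exits non-zero
    )
    return result.stdout.strip()


def file_at_ref(ref: str, path: str, cwd: Optional[str] = None) -> str:
    """Return the contents of a file as it existed at a given git reference.

    Assumption: `git show <ref>:<path>` is used, with path given relative
    to the repository root. Arguments are passed as a list (no shell), so
    ref and path are never interpreted by a shell.
    """
    result = subprocess.run(
        ["git", "show", f"{ref}:{path}"],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def api_key() -> Optional[str]:
    """Read the API key from the environment; None if it is not set."""
    return os.environ.get("ANTHROPIC_API_KEY")
```

Passing the command as an argument list without shell=True is what keeps this kind of git invocation low-risk: the reference and path strings reach git as plain arguments, and nothing is written to the repository.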

instruction-eval