The Agent Skills Directory

[COMMAND_EXECUTION]: The evaluation scripts (evals/phase2-grader.py and evals/integration-test.sh) run shell commands to create virtual environments and install the dependencies required by the skill's test suite.
[COMMAND_EXECUTION]: The evals/phase2-grader.py script uses os.execv to re-execute itself inside a newly created virtual environment after verifying that the Anthropic SDK is available.
[COMMAND_EXECUTION]: The evaluation fixture evals/fixtures/agent.py contains a subprocess.run call with shell=True. This file is explicitly documented as a mock workspace component used to test the skill's ability to diagnose and remediate un-sandboxed execution risks in user code.
[COMMAND_EXECUTION]: The autonomous-improve-loop.mjs template uses spawnSync to execute shell commands that run user-defined evaluation and benchmarking suites. These commands are configured through environment variables so that the template can be adapted to different CI/CD environments.
[COMMAND_EXECUTION]: The level-3-sandbox-harness.py template uses docker run to execute user-defined shell commands within an isolated container. This implements a recommended security control for AI agents with system access.
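To illustrate the contrast between the un-sandboxed shell=True pattern in the fixture and the container-based control in the harness template, the following is a minimal sketch of wrapping a command in docker run. It is not the code from level-3-sandbox-harness.py; the function names, the python:3.12-slim image, and the chosen resource limits are assumptions made for illustration.

```python
import subprocess

# Hypothetical default image; the actual template may use a different one.
DEFAULT_IMAGE = "python:3.12-slim"


def build_docker_argv(command: str, image: str = DEFAULT_IMAGE) -> list[str]:
    """Build an argv list that runs `command` inside a locked-down container."""
    return [
        "docker", "run",
        "--rm",                # remove the container when it exits
        "--network", "none",   # no network access from inside the sandbox
        "--read-only",         # root filesystem mounted read-only
        "--cap-drop", "ALL",   # drop all Linux capabilities
        "--memory", "256m",    # illustrative memory cap
        image,
        "sh", "-c", command,   # the shell runs inside the container only
    ]


def run_sandboxed(command: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run `command` in the container; the host never passes it to a host shell."""
    argv = build_docker_argv(command)
    # argv is passed as a list, so shell=True is not needed on the host side.
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout, check=False)
```

The user-supplied string is still interpreted by a shell, but only by the one inside the container, where the network, filesystem, and capabilities are restricted.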
[EXTERNAL_DOWNLOADS]: The skill's test automation suite downloads and installs the anthropic and dspy-ai packages from official registries to support its internal evaluation and integration testing.
[DATA_EXFILTRATION]: The autonomous-improve-loop.mjs template sends the contents of allowlisted workspace files, along with execution traces, to the OpenAI API in order to generate improvement patches. The script includes a redaction mechanism designed to filter out API keys and secrets before the data is transmitted.
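The report does not show the template's redaction rules, so the following is only a sketch of what a pattern-based redaction pass of this kind might look like. The patterns, the redact function name, and the [REDACTED] placeholder are all assumptions, and a real filter would likely need a broader rule set.

```python
import re

# Hypothetical patterns for illustration; not taken from autonomous-improve-loop.mjs.
_SECRET_PATTERNS = [
    # Tokens shaped like "sk-..." API keys.
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    # Strings shaped like AWS access key IDs.
    re.compile(r"AKIA[0-9A-Z]{16}"),
    # "password: value" style assignments; group 1 keeps the label and separator.
    re.compile(r"\b((?:api[_-]?key|secret|token|password)\s*[:=]\s*)\S+", re.IGNORECASE),
]


def redact(text: str) -> str:
    """Replace anything matching a secret pattern before the text leaves the machine."""
    for pattern in _SECRET_PATTERNS:
        if pattern.groups:
            # Keep the label (group 1) and mask only the value.
            text = pattern.sub(r"\1[REDACTED]", text)
        else:
            text = pattern.sub("[REDACTED]", text)
    return text
```

For example, redact("password: hunter2") returns "password: [REDACTED]", while text with no matching pattern passes through unchanged. Pattern-based redaction is best-effort: secrets in unfamiliar formats are not caught, which is why the allowlist of transmitted files matters as a separate control.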

agent-evals