The Agent Skills Directory

[COMMAND_EXECUTION]: The skill executes local repository scripts and allows the agent to run commands provided by the user.
Evidence: The instructions in SKILL.md direct the agent to run node scripts/run-benchmark.js and node scripts/update-benchmark-dashboard.js.
Context: Running these scripts is part of the skill's primary purpose as a benchmark assistant.
[SAFE]: The skill processes local data and handles benchmark cases under explicit privacy and scope constraints.
Ingestion points: Reads from the benchmark-responses.json file and the benchmarks/ and evals/ directories.
Capability: When external credentials or unknown configurations are required, the skill limits its actions to generating prompts or asking the user for responses.
Privacy: The instructions explicitly state that private raw conversations must not be included in benchmark cases and that local drafts should be preserved unless the user specifically asks to commit them.

benchmark-assistant