The Agent Skills Directory

- [COMMAND_EXECUTION]: The skill instructs the user on how to author and execute bash or TypeScript scripts (e.g., bash graders/check.sh) to perform deterministic scoring. This is the core intended functionality of the evaluation framework.
- [CREDENTIALS_UNSAFE]: The error handling section mentions the need to set GEMINI_API_KEY or ANTHROPIC_API_KEY in the environment to resolve LLM-related failures. There are no hardcoded secrets or evidence of credential exfiltration.
- [SAFE]: The skill uses structured JSON schemas for outputs and provides validation procedures, which are standard practices for development and evaluation tasks.
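
To make the audited behaviors concrete, here is a minimal TypeScript sketch of what a deterministic grader with structured output and validation might look like. The skill's actual schema and scripts are not shown here, so the `GraderResult` shape and the functions `gradeOutput`, `isGraderResult`, and `availableLlmKeys` are all hypothetical names chosen for illustration. Only the two environment variable names come from the findings above.

```typescript
// Hypothetical result shape (assumption): the skill's real JSON schema is not shown.
interface GraderResult {
  score: number; // fraction of checks passed, from 0 to 1
  details: string; // human-readable explanation
}

// Deterministic scoring: the same output and requirements always give the same score.
function gradeOutput(output: string, required: string[]): GraderResult {
  if (required.length === 0) {
    return { score: 1, details: "no requirements" };
  }
  const missing = required.filter((s) => !output.includes(s));
  const score = (required.length - missing.length) / required.length;
  const details =
    missing.length === 0 ? "all checks passed" : `missing: ${missing.join(", ")}`;
  return { score, details };
}

// Validation step: confirm that a parsed JSON value matches the assumed shape.
function isGraderResult(value: unknown): value is GraderResult {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const score = record["score"];
  const details = record["details"];
  return (
    typeof score === "number" &&
    score >= 0 &&
    score <= 1 &&
    typeof details === "string"
  );
}

// Report which LLM API keys are set, reading names only and never logging values.
function availableLlmKeys(env: Record<string, string | undefined>): string[] {
  return ["GEMINI_API_KEY", "ANTHROPIC_API_KEY"].filter((name) => Boolean(env[name]));
}
```

Taking the environment as a parameter rather than reading a global keeps the key check easy to test, and returning only key names avoids echoing secret values into logs.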

skillgrade-graders