The Agent Skills Directory

[COMMAND_EXECUTION]: The skill uses several Python scripts (run_eval.py, eval_compare.py, optimize_description.py) that invoke external CLI tools via subprocess.run(). Specifically, they call the claude CLI to run benchmarks and the go toolchain to validate generated code. These calls are essential to the skill's primary function of measuring and verifying skill performance. The scripts pass arguments as lists rather than shell strings, so no shell parses them, which significantly reduces the risk of shell injection.
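
To illustrate why list-based arguments matter, here is a minimal sketch. It is not taken from the skill's scripts: the helper name run_tool and the sample input are hypothetical, and the current Python interpreter stands in for an external CLI so the example runs anywhere.

```python
import subprocess
import sys


def run_tool(args, timeout=30):
    """Run an external tool with list-based arguments (hypothetical helper).

    Because args is a list and shell=True is not set, no shell parses the
    values, so metacharacters such as ';' are passed through literally.
    """
    return subprocess.run(
        args,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )


# A hostile-looking value: in a shell string, ';' would start a second command.
untrusted = "hello; echo INJECTED"

# The child process simply echoes back its first argument.
result = run_tool(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted]
)
print(result.stdout.strip())  # prints: hello; echo INJECTED
```

The whole string arrives as a single argument and is printed as-is. The same value interpolated into a command string run with shell=True could be split into two commands.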
[DYNAMIC_EXECUTION]: The eval_compare.py script runs build, test, and static-analysis checks (go build, go test, go vet) on code produced during evaluation runs. Although this means compiling and executing dynamically generated code, execution is restricted to a local workspace and is the intended behavior for a software development evaluation tool.
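
As a rough sketch of how such checks might be pinned to a workspace, the snippet below runs the three go commands with the working directory set to a given folder. It does not reproduce eval_compare.py: the function name run_go_checks, the ./... package pattern, and the timeout value are assumptions. Note that setting cwd only fixes the working directory; it is not a sandbox.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Assumed check set, mirroring the commands named in the finding.
GO_CHECKS = (
    ("go", "build", "./..."),
    ("go", "vet", "./..."),
    ("go", "test", "./..."),
)


def run_go_checks(workspace, timeout=120):
    """Run each check with cwd pinned to workspace (hypothetical helper).

    Returns a dict mapping the command string to its exit code, or to None
    when the go toolchain is missing or the command timed out.
    """
    if shutil.which("go") is None:
        return {" ".join(cmd): None for cmd in GO_CHECKS}

    results = {}
    for cmd in GO_CHECKS:
        try:
            proc = subprocess.run(
                list(cmd),          # list arguments, no shell involved
                cwd=workspace,      # sets the working directory only
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            results[" ".join(cmd)] = proc.returncode
        except subprocess.TimeoutExpired:
            results[" ".join(cmd)] = None
    return results


# Usage: an empty temporary directory stands in for an evaluation workspace.
with tempfile.TemporaryDirectory() as tmp:
    outcome = run_go_checks(Path(tmp))
    for name, code in outcome.items():
        print(name, "->", code)
```

A per-command timeout is included because go test executes the generated code, which may hang or loop.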
[INDIRECT_PROMPT_INJECTION]: The skill's core workflow involves ingesting untrusted 'test prompts' or 'eval queries' and passing them to an LLM via the claude CLI. This creates a surface for indirect prompt injection; however, the skill is explicitly designed for testing and measurement, and its instructions include safety-conscious patterns such as warning against hardcoded secrets and encouraging the use of gates to verify outcomes.

skill-creator