skill-creator

Pass

Audited by Gen Agent Trust Hub on Mar 24, 2026

Risk Level: SAFE
Full Analysis
  • [COMMAND_EXECUTION]: The skill includes several Python scripts (scripts/run_eval.py, scripts/improve_description.py, scripts/package_skill.py) that use subprocess.run and subprocess.Popen. These calls invoke the system's official CLI tool (claude) to test skill triggering and to perform packaging operations. Both uses are legitimate and necessary for the skill's purpose as an evaluation and development toolkit.
  • [EXTERNAL_DOWNLOADS]: The human review interface (eval-viewer/viewer.html) loads the well-known SheetJS spreadsheet-processing library from a public Content Delivery Network (CDN). This is used for the legitimate purpose of rendering .xlsx files within the local viewer and does not constitute a security risk.
  • [DATA_EXFILTRATION]: The eval-viewer/generate_review.py script starts a local web server (binding to 127.0.0.1) to display evaluation results. This local server is used strictly for human-in-the-loop review of the task outputs generated during testing and does not transmit data to external third-party servers.
  • [REMOTE_CODE_EXECUTION]: The skill facilitates testing by spawning subagents to execute task prompts defined in evals/evals.json. This is the primary intended behavior for a benchmarking tool, and the execution is confined within the platform's standard subagent safety boundaries.
  • [PROMPT_INJECTION]: The skill includes guidance on making skill descriptions "pushy" to improve triggering accuracy. This is a documented technique for managing LLM behavior and does not involve bypassing safety filters or overriding core ethical constraints.
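The local-only server behavior noted in the [DATA_EXFILTRATION] finding follows a standard pattern: binding to the loopback address makes the review page reachable only from the same machine. A minimal sketch of that pattern, using only the Python standard library (the handler content and port selection here are illustrative assumptions, not the actual generate_review.py implementation):

```python
import http.server
import threading
import urllib.request

class ReviewHandler(http.server.BaseHTTPRequestHandler):
    """Serves a placeholder review page; stands in for the real viewer."""

    def do_GET(self):
        body = b"<html><body>evaluation results</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet

# Binding to 127.0.0.1 means no other host can connect, so evaluation
# output stays on the local machine. Port 0 lets the OS pick a free port;
# the real tool may well use a fixed one.
server = http.server.HTTPServer(("127.0.0.1", 0), ReviewHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/"
with urllib.request.urlopen(url) as resp:
    page = resp.read()
server.shutdown()
```

Because the socket is bound to the loopback interface rather than 0.0.0.0, this setup cannot transmit data to external parties, which is the basis for the finding's SAFE assessment.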
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 24, 2026, 05:19 AM