skill-creator

Pass

Audited by Gen Agent Trust Hub on Apr 24, 2026

Risk Level: SAFECOMMAND_EXECUTIONEXTERNAL_DOWNLOADSPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill extensively uses the Python subprocess module in scripts like run_eval.py, improve_description.py, and run_loop.py. These scripts execute the claude CLI and other internal Python utilities to automate skill testing and configuration. These operations are intended to manage the skill development lifecycle.
  • [EXTERNAL_DOWNLOADS]: The evaluation viewer component (eval-viewer/viewer.html) loads the SheetJS library (xlsx.full.min.js) and Poppins fonts from well-known content delivery networks (CDNs) to facilitate spreadsheet rendering and UI styling.
  • [DATA_EXFILTRATION]: The eval-viewer/generate_review.py script starts a local HTTP server on port 3117 using the http.server module. This server hosts the evaluation viewer UI on the local loopback interface for user review of test cases.
  • [PROMPT_INJECTION]: As a developer tool for creating other skills, this skill processes external data including user-defined test prompts and output from subagent runs, which could serve as a surface for indirect prompt injection.
  • Ingestion points: Test prompts from evals/evals.json and execution results from the outputs/ directory are read and processed by grader and analyzer agents.
  • Boundary markers: The skill uses markdown headers and backticks to delineate instructions from data, but does not implement explicit 'ignore instructions' safety markers for all interpolated content.
  • Capability inventory: Capabilities include shell command execution, filesystem write access for workspace management, and local network server hosting.
  • Sanitization: The skill relies on standard model behavior and does not perform explicit escaping or sanitization of potential instructions embedded within the processed evaluation data.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 24, 2026, 02:14 AM