eval-driven-dev

Pass

Audited by Gen Agent Trust Hub on Apr 22, 2026

Risk Level: SAFE
Flagged categories: PROMPT_INJECTION, COMMAND_EXECUTION, EXTERNAL_DOWNLOADS, DATA_EXFILTRATION
Full Analysis
  • [PROMPT_INJECTION]: The skill includes instructions that explicitly direct the agent to proceed through a multi-step workflow without seeking user confirmation ("Do not ask the user for confirmation at intermediate steps — verify each step yourself and continue"). This autonomy abuse pattern reduces human oversight during code modification and execution. Additionally, the skill's purpose involves processing untrusted application data and external sources, which presents an indirect prompt injection surface.
  • [COMMAND_EXECUTION]: The resources/setup.sh script automates environment configuration and launches a background web server (pixie start) to display results. The core workflow involves the agent writing and executing a custom Python Runnable (pixie_qa/run_app.py) to drive the application code under test.
  • [EXTERNAL_DOWNLOADS]: The setup process involves automated installation of the pixie-qa package and uses npx to update the skill from the vendor's resources. These downloads are required for the pipeline's functionality.
  • [DATA_EXFILTRATION]: The skill starts a background web server to visualize application traces and evaluation results. These traces capture data at application boundaries (via wrap calls) and may include sensitive information or credentials if the application handles them, which creates a potential local data exposure channel.
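
The local exposure noted under DATA_EXFILTRATION could be reduced by masking sensitive fields before a payload is recorded in a trace. The following is a minimal sketch, assuming traced payloads are plain dicts, lists, and tuples. The `redact` helper and the key pattern are hypothetical illustrations and are not part of pixie-qa or its wrap API.

```python
import re
from typing import Any

# Hypothetical pattern for key names that usually hold secrets;
# adjust it to the field names the application actually uses.
SENSITIVE_KEY = re.compile(
    r"(password|secret|token|api[_-]?key|authorization)", re.IGNORECASE
)


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive-looking dict entries masked."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]"
            if isinstance(key, str) and SENSITIVE_KEY.search(key)
            else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        # Rebuild the same container type with each element redacted.
        return type(value)(redact(item) for item in value)
    # Scalars and unknown types pass through unchanged.
    return value


if __name__ == "__main__":
    payload = {"user": "alice", "api_key": "abc123", "items": [{"token": "t"}]}
    print(redact(payload))
```

A helper like this would be applied to inputs and outputs before they are handed to whatever records the trace. It matches on key names only, so secrets embedded in free-text values would still pass through.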
Audit Metadata
Risk Level: SAFE
Analyzed: Apr 22, 2026, 12:57 AM