eval-driven-dev
Pass
Audited by Gen Agent Trust Hub on Apr 22, 2026
Risk Level: SAFE
PROMPT_INJECTION, COMMAND_EXECUTION, EXTERNAL_DOWNLOADS, DATA_EXFILTRATION
Full Analysis
- [PROMPT_INJECTION]: The skill includes instructions that explicitly direct the agent to proceed through a multi-step workflow without seeking user confirmation ("Do not ask the user for confirmation at intermediate steps — verify each step yourself and continue"). This autonomy abuse pattern reduces human oversight during code modification and execution. Additionally, the skill's purpose involves processing untrusted application data and external sources, which presents an indirect prompt injection surface.
- [COMMAND_EXECUTION]: The `resources/setup.sh` script automates environment configuration and launches a background web server (pixie start) to display results. The core workflow involves the agent writing and executing a custom Python Runnable (pixie_qa/run_app.py) to drive the application code under test.
- [EXTERNAL_DOWNLOADS]: The setup process involves automated installation of the `pixie-qa` package and uses `npx` to update the skill from the vendor's resources. These downloads are required for the pipeline's functionality.
- [DATA_EXFILTRATION]: The skill starts a background web server to visualize application traces and evaluation results. These traces capture data from application boundaries (via `wrap` calls), which may include sensitive information or credentials if handled by the application, creating a potential local data exposure channel.
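
To illustrate the data exposure finding, the following is a minimal sketch of masking sensitive-looking fields before a payload is handed to any tracing layer. It does not use the pixie-qa API, which this report does not show; the `redact` helper, the `SENSITIVE_KEY` pattern, and the sample payload are all hypothetical and would need adapting to the fields a real application handles.

```python
import re
from typing import Any

# Hypothetical pattern for key names that usually hold secrets (assumption,
# not taken from the audited skill).
SENSITIVE_KEY = re.compile(
    r"(password|secret|token|api[_-]?key|authorization)", re.IGNORECASE
)
MASK = "[REDACTED]"


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive-looking dict entries masked."""
    if isinstance(value, dict):
        return {
            key: MASK
            if isinstance(key, str) and SENSITIVE_KEY.search(key)
            else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        # Rebuild the same container type with each element redacted.
        return type(value)(redact(item) for item in value)
    return value


# Hypothetical payload as it might cross an application boundary.
payload = {
    "user": "alice",
    "api_key": "sk-123",
    "headers": {"Authorization": "Bearer abc"},
}
print(redact(payload))
# {'user': 'alice', 'api_key': '[REDACTED]', 'headers': {'Authorization': '[REDACTED]'}}
```

Matching on key names is a deliberately simple choice: it catches common credential fields but will miss secrets embedded in free-text values, so it reduces rather than eliminates what a local trace viewer could expose.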
Audit Metadata