run-evals
Pass
Audited by Gen Agent Trust Hub on May 14, 2026
Risk Level: SAFECOMMAND_EXECUTION
Full Analysis
- [COMMAND_EXECUTION]: The skill uses the
daytonaCLI to create and manage cloud sandboxes. This includes executing setup scripts within the sandbox environment and configuring resource access (e.g.,--public). These actions are performed to facilitate end-to-end testing of the application UI.\n- [PROMPT_INJECTION]: The skill presents an indirect prompt injection surface (Category 8) because it processes external data from evaluation files and application UI content during runtime. This is an inherent part of its functionality as a testing tool. Ingestion points: Evaluation files located in theevals/directory and browser page content retrieved viabrowser_evaluate. Boundary markers: No specific delimiters or warnings to ignore embedded instructions are present in the skill instructions. Capability inventory: The skill can execute shell commands within the sandbox viadaytona execand manipulate the browser environment using tools likebrowser_evaluateandbrowser_click. Sanitization: No explicit sanitization or validation logic is defined for the external data ingested during the evaluation flows.
Audit Metadata