run-evals

Pass

Audited by Gen Agent Trust Hub on May 14, 2026

Risk Level: SAFECOMMAND_EXECUTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill uses the daytona CLI to create and manage cloud sandboxes. This includes executing setup scripts within the sandbox environment and configuring resource access (e.g., --public). These actions are performed to facilitate end-to-end testing of the application UI.\n- [PROMPT_INJECTION]: The skill presents an indirect prompt injection surface (Category 8) because it processes external data from evaluation files and application UI content during runtime. This is an inherent part of its functionality as a testing tool. Ingestion points: Evaluation files located in the evals/ directory and browser page content retrieved via browser_evaluate. Boundary markers: No specific delimiters or warnings to ignore embedded instructions are present in the skill instructions. Capability inventory: The skill can execute shell commands within the sandbox via daytona exec and manipulate the browser environment using tools like browser_evaluate and browser_click. Sanitization: No explicit sanitization or validation logic is defined for the external data ingested during the evaluation flows.
Audit Metadata
Risk Level
SAFE
Analyzed
May 14, 2026, 07:23 AM