agentflow-evals

Pass

Audited by Gen Agent Trust Hub on May 1, 2026

Risk Level: SAFE
Capability tags: COMMAND_EXECUTION, EXTERNAL_DOWNLOADS, REMOTE_CODE_EXECUTION, PROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill provides instructions for executing the agentflow eval CLI to manage evaluation suites, run trials, and generate reports.
  • [EXTERNAL_DOWNLOADS]: The documentation references setup scripts (npm run setup:eval-repos, npm run setup:realworld-evals) that clone external GitHub repositories to serve as test fixtures for workflow evaluations.
  • [REMOTE_CODE_EXECUTION]: The framework supports custom_script criteria, which execute suite-specific scripts to perform objective validations of workflow outputs and artifacts.
  • [PROMPT_INJECTION]: The skill is designed to analyze external, untrusted data from GitHub issues and repositories, which introduces a surface for indirect prompt injection.
  • Ingestion points: External data enters the context through cloned repositories and pinned GitHub issue metadata used in the eval scenarios (references/operations-and-dogfood.md).
  • Boundary markers: The skill guidelines recommend keeping oracle metadata and upstream PR patches hidden from the agent's context to maintain evaluation integrity (references/eval-patterns.md).
  • Capability inventory: The system executes Agentflow graphs that can perform tool calls and file system operations, and it runs suite-provided custom scripts (references/grading-and-reporting.md).
  • Sanitization: The provided documentation does not specify sanitization or filtering protocols for the ingested repository content.
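The custom_script criteria flagged above are suite-supplied validators that inspect workflow outputs and artifacts. As a rough illustration of that shape, here is a minimal sketch of such a validator; the invocation convention (output directory as argv[1], exit code 0 for pass), the report.json artifact name, and the "completed" status field are all hypothetical assumptions, not details taken from the skill's documentation:

```python
"""Minimal sketch of a suite-specific custom_script validator.

Assumptions (not from the skill): the eval harness passes the
workflow's output directory as argv[1] and treats exit code 0 as
pass, nonzero as fail.
"""
import json
import sys
from pathlib import Path


def validate(output_dir: Path) -> list[str]:
    """Return a list of failure messages; an empty list means pass."""
    failures = []
    report = output_dir / "report.json"  # hypothetical expected artifact
    if not report.is_file():
        failures.append(f"missing artifact: {report}")
        return failures
    data = json.loads(report.read_text())
    if data.get("status") != "completed":
        failures.append(f"unexpected status: {data.get('status')!r}")
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    problems = validate(Path(sys.argv[1]))
    for msg in problems:
        print(msg, file=sys.stderr)
    sys.exit(1 if problems else 0)
```

Because such scripts run with the agent's full file system access, this is the surface the REMOTE_CODE_EXECUTION tag refers to: the script's contents come from the suite, not from the skill itself.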
Audit Metadata
Risk Level: SAFE
Analyzed: May 1, 2026, 06:53 PM