sf-eval

Pass

Audited by Gen Agent Trust Hub on Mar 28, 2026

Risk Level: SAFE · COMMAND_EXECUTION · PROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill executes a local bash script, evals/checks/static-checks.sh, when a user requests a static check on a file. This is an intended function for code quality verification and is restricted to the local environment.
  • [PROMPT_INJECTION]: The instructions for 'Baseline Generation' guide the AI to simulate a generic model by intentionally omitting platform-specific security patterns such as WITH USER_MODE. This is a documented benchmarking technique used to quantify the value of skill-provided context and does not target the agent's safety guardrails.
  • [DATA_EXPOSURE]: The skill reads local benchmark definitions, rubrics, and judge prompts from the project's evals/ directory to facilitate the comparison and scoring process. These operations are confined to the local repository context.
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 28, 2026, 11:15 PM