gaia-submission
Warn
Audited by Gen Agent Trust Hub on Jun 13, 2026
Risk Level: MEDIUMEXTERNAL_DOWNLOADSCOMMAND_EXECUTIONPROMPT_INJECTIONREMOTE_CODE_EXECUTION
Full Analysis
- [EXTERNAL_DOWNLOADS]: The skill utilizes
npx @claude-flow/cli@latestto install the required CLI components. This package is hosted on a public registry and does not originate from a recognized trusted organization or well-known service. - [REMOTE_CODE_EXECUTION]: The use of
npxresults in the download and immediate execution of remote JavaScript code in the local environment. This execution pattern lacks specific version pinning or integrity verification. - [COMMAND_EXECUTION]: The skill executes multiple shell commands, including
/gaiafor benchmark operations andnpxfor metadata storage and task tracking. These commands interact with the local file system and network. - [DATA_EXPOSURE]: The environment validation phase checks for the presence of
ANTHROPIC_API_KEYandHF_TOKEN. While it only echoes the prefix of these keys for verification, it highlights the skill's reliance on and potential exposure of highly sensitive credentials. - [INDIRECT_PROMPT_INJECTION]: The skill processes benchmark trajectories and results which contain untrusted agent-generated data.
- Ingestion points: Reads result files from
~/.cache/ruflo/gaia/results-latest.jsonand processestrajectories.jsonlduring packaging and comparison phases. - Boundary markers: No specific boundary markers or instructions to ignore embedded commands are present when the agent processes the benchmark content.
- Capability inventory: The skill has access to shell execution, file system writes, and persistent memory storage through MCP tools.
- Sanitization: There is no evidence of sanitization, validation, or escaping of the ingested JSON data before it is parsed or used to generate reports.
Audit Metadata