gaia-submission

Warn

Audited by Gen Agent Trust Hub on Jun 13, 2026

Risk Level: MEDIUMEXTERNAL_DOWNLOADSCOMMAND_EXECUTIONPROMPT_INJECTIONREMOTE_CODE_EXECUTION
Full Analysis
  • [EXTERNAL_DOWNLOADS]: The skill utilizes npx @claude-flow/cli@latest to install the required CLI components. This package is hosted on a public registry and does not originate from a recognized trusted organization or well-known service.
  • [REMOTE_CODE_EXECUTION]: The use of npx results in the download and immediate execution of remote JavaScript code in the local environment. This execution pattern lacks specific version pinning or integrity verification.
  • [COMMAND_EXECUTION]: The skill executes multiple shell commands, including /gaia for benchmark operations and npx for metadata storage and task tracking. These commands interact with the local file system and network.
  • [DATA_EXPOSURE]: The environment validation phase checks for the presence of ANTHROPIC_API_KEY and HF_TOKEN. While it only echoes the prefix of these keys for verification, it highlights the skill's reliance on and potential exposure of highly sensitive credentials.
  • [INDIRECT_PROMPT_INJECTION]: The skill processes benchmark trajectories and results which contain untrusted agent-generated data.
  • Ingestion points: Reads result files from ~/.cache/ruflo/gaia/results-latest.json and processes trajectories.jsonl during packaging and comparison phases.
  • Boundary markers: No specific boundary markers or instructions to ignore embedded commands are present when the agent processes the benchmark content.
  • Capability inventory: The skill has access to shell execution, file system writes, and persistent memory storage through MCP tools.
  • Sanitization: There is no evidence of sanitization, validation, or escaping of the ingested JSON data before it is parsed or used to generate reports.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Jun 13, 2026, 01:23 PM
Security Audit — agent-trust-hub — gaia-submission