flagrelease-entrance-flagos

Pass

Audited by Gen Agent Trust Hub on Mar 26, 2026

Risk Level: SAFE (COMMAND_EXECUTION, EXTERNAL_DOWNLOADS, PROMPT_INJECTION)
Full Analysis
  • [COMMAND_EXECUTION]: The skill uses shell commands to manage Docker containers and execute diagnostics.
    • Evidence: Commands like docker ps and docker inspect are used to identify and verify the target environment. Local scripts are copied into containers and executed to perform benchmarks and environment checks.
  • [EXTERNAL_DOWNLOADS]: The pipeline automates the installation of various software components and the downloading of model weights.
    • Evidence: The skill coordinates the installation of packages including vLLM, FlagTree, FlagGems, and FlagCX. It also manages the download of the Qwen3-0.6B model weights during smoke testing.
  • [PROMPT_INJECTION]: The skill ingests data from external sources and user inputs, creating a surface for indirect prompt injection.
    • Ingestion points: The skill reads output from docker CLI commands and accepts user-provided container names and model paths via AskUserQuestion (SKILL.md).
    • Boundary markers: No specific delimiters or safety instructions are used to separate untrusted data from the agent's internal logic.
    • Capability inventory: The skill has broad capabilities, including unrestricted Bash access, file modification (Write), and the ability to orchestrate other agents.
    • Sanitization: There is no evidence of input validation or output encoding for data retrieved from the system or the user before it is processed by the agent.
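
The missing boundary markers and sanitization noted above could be addressed along the following lines. This is a minimal sketch and not code from the audited skill: the function names, the marker strings, and the container-name pattern are all assumptions chosen for illustration. The pattern mirrors the character set Docker is understood to accept for container names, which should be confirmed against the Docker version in use.

```python
import re

# Assumption: container names start with an alphanumeric character and
# continue with alphanumerics, "_", "." or "-". Verify against your Docker version.
_CONTAINER_NAME = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_.-]+")

# Hypothetical boundary markers; any strings unlikely to occur in real output work.
BEGIN = "<<<BEGIN_UNTRUSTED>>>"
END = "<<<END_UNTRUSTED>>>"


def validate_container_name(name: str) -> str:
    """Reject user-provided names containing shell metacharacters or option prefixes."""
    if not _CONTAINER_NAME.fullmatch(name):
        raise ValueError(f"unsafe container name: {name!r}")
    return name


def inspect_argv(name: str) -> list[str]:
    """Build an argument list for `docker inspect`.

    Passing a list to subprocess.run (without shell=True) avoids shell
    interpolation of the user-provided name.
    """
    return ["docker", "inspect", validate_container_name(name)]


def wrap_untrusted(text: str) -> str:
    """Wrap command output in boundary markers before it reaches the agent.

    Marker look-alikes inside the text are neutralized so the untrusted
    content cannot close the block early.
    """
    cleaned = text.replace(BEGIN, "[marker removed]").replace(END, "[marker removed]")
    return f"{BEGIN}\n{cleaned}\n{END}"
```

A caller would pass the result of inspect_argv(...) to subprocess.run and hand only wrap_untrusted(result.stdout) to the agent, together with an instruction to treat everything between the markers as data rather than instructions. Delimiters reduce the injection surface but do not eliminate it.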
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 26, 2026, 05:55 AM