flagrelease-entrance-flagos
Pass
Audited by Gen Agent Trust Hub on Mar 26, 2026
Risk Level: SAFE
COMMAND_EXECUTION, EXTERNAL_DOWNLOADS, PROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The skill uses shell commands to manage Docker containers and execute diagnostics.
  - Evidence: Commands like `docker ps` and `docker inspect` are used to identify and verify the target environment. Local scripts are copied into containers and executed to perform benchmarks and environment checks.
- [EXTERNAL_DOWNLOADS]: The pipeline automates the installation of various software components and the downloading of model weights.
  - Evidence: Coordinates the installation of packages including `vLLM`, `FlagTree`, `FlagGems`, and `FlagCX`. It also manages the download of the `Qwen3-0.6B` model weights during smoke testing.
- [PROMPT_INJECTION]: The skill ingests data from external sources and user inputs, creating a surface for indirect prompt injection.
  - Ingestion points: The skill reads output from `docker` CLI commands and accepts user-provided container names and model paths via `AskUserQuestion` (SKILL.md).
  - Boundary markers: No specific delimiters or safety instructions are used to separate untrusted data from the agent's internal logic.
  - Capability inventory: The skill has broad capabilities, including unrestricted `Bash` access, file modification (`Write`), and the ability to orchestrate other agents.
  - Sanitization: There is no evidence of input validation or output encoding for data retrieved from the system or the user before it is processed by the agent.
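
The missing boundary markers and sanitization noted above could be addressed along these lines. This is a minimal sketch, not part of the audited skill: the helper names `validate_container_name` and `wrap_untrusted`, the `<<<UNTRUSTED ...>>>` marker format, and the exact name pattern are all assumptions made for illustration.

```python
import re

# Assumption: container names are limited to this character set. Adjust the
# pattern if your environment allows something different.
_CONTAINER_NAME = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_.-]+")


def validate_container_name(name: str) -> str:
    """Return the name unchanged if it looks like a container name, else raise."""
    if not _CONTAINER_NAME.fullmatch(name):
        raise ValueError(f"rejected container name: {name!r}")
    return name


def wrap_untrusted(source: str, text: str) -> str:
    """Fence untrusted text in explicit markers so it is treated as data only."""
    begin = f"<<<UNTRUSTED {source}>>>"
    end = f"<<<END UNTRUSTED {source}>>>"
    # Strip marker look-alikes so the payload cannot close the fence early.
    cleaned = text.replace("<<<", "").replace(">>>", "")
    return f"{begin}\n{cleaned}\n{end}"


def inspect_argv(name: str) -> list[str]:
    """Build an argument list for `docker inspect` without going through a shell."""
    return ["docker", "inspect", validate_container_name(name)]
```

Passing the list from `inspect_argv` to `subprocess.run` without `shell=True` keeps a user-supplied name from being interpreted as shell syntax, and wrapping command output with `wrap_untrusted` before it reaches the agent gives the prompt an explicit boundary between instructions and data.
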
Audit Metadata