perf-test
Fail
Audited by Gen Agent Trust Hub on Mar 25, 2026
Risk Level: HIGH (REMOTE_CODE_EXECUTION, COMMAND_EXECUTION)
Full Analysis
- [REMOTE_CODE_EXECUTION]: The vLLM serving command in 'SKILL.md' and the benchmark execution in 'scripts/run_benchmark.py' both pass the '--trust-remote-code' flag. This flag lets the model being served or benchmarked execute arbitrary Python code from its repository, which is a critical risk if the model comes from an untrusted source.
- [COMMAND_EXECUTION]: The skill uses 'docker exec' and 'docker cp' extensively to manage the lifecycle of the benchmarking environment, which lets it execute arbitrary commands and modify files inside the target Docker container.
- [REMOTE_CODE_EXECUTION]: 'SKILL.md' contains a command that pipes the output of a network request ('curl http://localhost:8000/v1/models') directly into 'python3 -c'. Although the inline script is only a simple JSON parser, piping network output into a language interpreter is a high-risk pattern and is flagged for potential remote code execution.
- [COMMAND_EXECUTION]: The scripts 'run_all_benchmarks.py' and 'run_benchmark.py' use 'subprocess.run' to invoke the 'vllm' CLI. 'shell=True' is not used, so shell metacharacters are not interpreted, but both scripts append user-provided 'extra_args' to the command line, which could allow unintended CLI flags to be injected.
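The 'extra_args' finding can be illustrated with a minimal sketch. The helper name `build_vllm_command` and the `ALLOWED_FLAGS` set are hypothetical (the audited scripts are not reproduced here); the listed flags are real `vllm serve` options, but which ones a benchmark should accept is an assumption. Building an argv list without `shell=True` blocks shell injection, yet any flag in 'extra_args' still reaches the CLI, so each flag name is checked before it is appended.

```python
import shlex

# Hypothetical allowlist of flags a benchmark caller may tune.
# These are real `vllm serve` options; the selection is illustrative only.
ALLOWED_FLAGS = {
    "--max-model-len",
    "--tensor-parallel-size",
    "--gpu-memory-utilization",
}


def build_vllm_command(model: str, extra_args: list[str]) -> list[str]:
    """Hypothetical helper: build a `vllm serve` argv, rejecting unknown flags.

    An argv list (no shell=True) prevents shell injection, but appended flags
    are passed to the vllm CLI unchanged, so each flag name is validated first.
    Non-flag tokens (flag values) are passed through as-is.
    """
    for token in extra_args:
        if token.startswith("-"):
            # Handle both "--flag value" and "--flag=value" forms.
            flag = token.split("=", 1)[0]
            if flag not in ALLOWED_FLAGS:
                raise ValueError(f"flag not allowed in extra_args: {flag}")
    return ["vllm", "serve", model, *extra_args]


if __name__ == "__main__":
    # Accepted: a tuning flag from the allowlist.
    print(build_vllm_command("facebook/opt-125m", shlex.split("--max-model-len 2048")))
    # Rejected: the flag the audit flags as a remote-code-execution risk.
    try:
        build_vllm_command("facebook/opt-125m", ["--trust-remote-code"])
    except ValueError as exc:
        print("rejected:", exc)
```

An allowlist is used instead of a denylist so that flags added in later vLLM releases are rejected by default rather than silently passed through.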
Recommendations
- HIGH: Downloads and executes remote code from http://localhost:8000/v1/models. DO NOT USE without thorough review.
- The AI audit detected serious security threats.
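The flagged 'curl ... | python3 -c' pattern can be replaced by fetching and parsing the model list in a single Python process, so the response body is only ever handled as JSON data and no shell pipeline is involved. This is a minimal sketch, not the skill's own code; it assumes the server at http://localhost:8000 is OpenAI-compatible (as vLLM is) and returns a body shaped like `{"data": [{"id": ...}, ...]}`. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: an OpenAI-compatible server (such as vLLM) listens here and
# returns {"object": "list", "data": [{"id": "<model>", ...}, ...]}.
MODELS_URL = "http://localhost:8000/v1/models"


def parse_model_ids(body: bytes) -> list[str]:
    """Extract model IDs from a /v1/models response body, treating it as data only."""
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(url: str = MODELS_URL, timeout: float = 5.0) -> list[str]:
    """Fetch the model list directly instead of piping curl into python3 -c."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_model_ids(resp.read())


if __name__ == "__main__":
    try:
        print(list_served_models())
    except (OSError, ValueError) as exc:
        # OSError covers URLError and timeouts (server down or unreachable);
        # ValueError covers a non-JSON reply.
        print("could not list models:", exc)
```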
Audit Metadata