The Agent Skills Directory

- [REMOTE_CODE_EXECUTION]: The vLLM serving command in 'SKILL.md' and the benchmark execution in 'scripts/run_benchmark.py' both include the '--trust-remote-code' flag. This setting allows the model being served or benchmarked to execute arbitrary Python code from its repository, which is a critical risk if the model originates from an untrusted source.
- [COMMAND_EXECUTION]: The skill uses 'docker exec' and 'docker cp' extensively to manage the lifecycle of the benchmarking environment. This allows the skill to execute arbitrary commands and modify files within the target Docker container.
- [REMOTE_CODE_EXECUTION]: 'SKILL.md' contains a command that pipes the output of a network request ('curl http://localhost:8000/v1/models') directly into 'python3 -c'. While the inline script is a simple JSON parser, piping external data to a language interpreter is a high-risk pattern flagged for potential remote code execution.
- [COMMAND_EXECUTION]: The scripts 'run_all_benchmarks.py' and 'run_benchmark.py' use 'subprocess.run' to invoke the 'vllm' CLI. Although 'shell=True' is not used, the scripts allow passing user-provided 'extra_args' which are appended to the command line, potentially allowing the injection of unintended CLI flags.
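
One possible mitigation for the 'extra_args' finding is to check user-supplied flags against an allowlist before they are appended to the command. The sketch below is illustrative only and is not taken from the skill's scripts: the names `ALLOWED_FLAGS`, `BLOCKED_FLAGS`, `validate_extra_args`, and `build_command` are hypothetical, and the allowlist contents are an assumption about which flags a benchmark run might reasonably need.

```python
# Hypothetical allowlist of pass-through flags (an assumption, not the skill's config).
ALLOWED_FLAGS = {"--max-model-len", "--dtype", "--tensor-parallel-size"}
# Flags rejected even if someone later adds them to the allowlist by mistake.
BLOCKED_FLAGS = {"--trust-remote-code"}


def validate_extra_args(extra_args):
    """Return a copy of extra_args if every flag is permitted, else raise ValueError."""
    for arg in extra_args:
        if not arg.startswith("-"):
            continue  # treated as the value of the preceding flag
        flag = arg.split("=", 1)[0]  # handle both "--flag value" and "--flag=value"
        if flag in BLOCKED_FLAGS or flag not in ALLOWED_FLAGS:
            raise ValueError(f"flag not permitted: {flag}")
    return list(extra_args)


def build_command(base_cmd, extra_args):
    """Append validated extra_args to base_cmd (a list, for subprocess.run without a shell)."""
    return [*base_cmd, *validate_extra_args(extra_args)]
```

The check fails closed: anything that starts with '-' and is not allowlisted is rejected, which also rejects values such as negative numbers. That trade-off seems acceptable for a benchmarking helper, but it is a design choice rather than a requirement.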

perf-test