llm-serving-auto-benchmark
Pass
Audited by Gen Agent Trust Hub on May 13, 2026
Risk Level: SAFE
Full Analysis
- [COMMAND_EXECUTION]: The skill provides templates for executing benchmark workloads using Docker and native CLI tools (e.g., `vllm serve`, `sglang.launch_server`). These are standard operational procedures for the intended benchmarking use case.
- [EXTERNAL_DOWNLOADS]: The skill references and pulls official container images from well-known technology providers such as NVIDIA (nvcr.io), LMSYS (lmsysorg), and vLLM. These downloads originate from trusted registries and are required for the skill's primary function.
- [CREDENTIALS_UNSAFE]: The skill documentation includes clear instructions on secret hygiene, specifically advising users to pass sensitive keys like `HF_TOKEN` via environment variables and unquoted container arguments to prevent them from being printed in logs or artifacts.
- [INDIRECT_PROMPT_INJECTION]: The skill contains a Python script (`compare_benchmark_results.py`) that processes benchmark output data (JSONL). While this represents a theoretical ingestion surface for untrusted data, it is a standard component of a benchmarking workflow where the data is generated by the user's own controlled runs.
- [DYNAMIC_EXECUTION]: The `validate_cookbook_configs.py` script uses `yaml.safe_load` to parse configuration files, which is a secure practice that prevents arbitrary code execution during the validation phase.
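The secret-hygiene pattern noted under [CREDENTIALS_UNSAFE] can be sketched in a few lines. The container image name and the placeholder token below are illustrative assumptions, not values taken from the skill itself; the point is that the command forwards the variable by name only, so the secret value never appears in any printed or logged command line:

```python
import os

# Placeholder value for illustration only; in practice HF_TOKEN is already
# set in the user's environment and never written into a command string.
os.environ.setdefault("HF_TOKEN", "hf_placeholder_for_illustration")
token = os.environ["HF_TOKEN"]

# Forward the variable by name (`-e HF_TOKEN`), letting the container
# runtime read the value from the environment. The assembled command
# contains only the variable name, never the secret itself.
cmd = ["docker", "run", "--rm", "-e", "HF_TOKEN", "vllm/vllm-openai:latest"]
assert token not in " ".join(cmd)
```

The same principle applies to logging and saved benchmark artifacts: anything that serializes the command line stays free of the token.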
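The `yaml.safe_load` behavior that the [DYNAMIC_EXECUTION] finding relies on can be demonstrated directly. The config keys below are hypothetical, not the skill's actual schema; the second document shows why `safe_load` is the secure choice:

```python
import yaml

# safe_load builds only plain data types (dicts, lists, scalars).
cfg = yaml.safe_load("model: meta-llama/Llama-3-8B\ntensor_parallel: 2")
print(cfg)  # {'model': 'meta-llama/Llama-3-8B', 'tensor_parallel': 2}

# A document that tries to instantiate an arbitrary Python object via a
# !!python tag is rejected with a YAMLError instead of being executed.
blocked = False
try:
    yaml.safe_load("!!python/object/apply:os.system ['echo pwned']")
except yaml.YAMLError:
    blocked = True
print(blocked)  # True
```

Using `yaml.load` with the full loader on untrusted input would construct that object and run the payload; restricting validation to `safe_load` closes that path.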
Audit Metadata