llm-serving-auto-benchmark

Verdict: Pass

Audited by Gen Agent Trust Hub on May 13, 2026

Risk Level: SAFE
Full Analysis
  • [COMMAND_EXECUTION]: The skill provides templates for executing benchmark workloads with Docker and native CLI tools (e.g., vllm serve, sglang.launch_server). These are standard operational procedures for the intended benchmarking use case; a minimal launch sketch appears after this list.
  • [EXTERNAL_DOWNLOADS]: The skill references and pulls official container images from well-known technology providers such as NVIDIA (nvcr.io), LMSYS (lmsysorg), and vLLM. These downloads originate from trusted registries and are required for the skill's primary function.
  • [CREDENTIALS_UNSAFE]: The skill documentation includes clear instructions on secret hygiene, specifically advising users to pass sensitive keys like HF_TOKEN via environment variables, forwarded to the container by name only (with no inline value), so the secret is never printed in command lines, logs, or artifacts (see the container-launch sketch after this list).
  • [INDIRECT_PROMPT_INJECTION]: The skill contains a Python script (compare_benchmark_results.py) that processes benchmark output data (JSONL). While this is a theoretical ingestion surface for untrusted data, it is a standard component of a benchmarking workflow in which the data is generated by the user's own controlled runs (see the JSONL sketch after this list).
  • [DYNAMIC_EXECUTION]: The validate_cookbook_configs.py script uses yaml.safe_load to parse configuration files, a secure practice that prevents arbitrary code execution during the validation phase (see the validator sketch after this list).
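
The launch templates themselves are not reproduced in this audit. As a purely illustrative sketch of the pattern described in the COMMAND_EXECUTION finding, a harness along these lines would start the serving CLI as a child process; the model name, port, and warm-up delay below are placeholders, not values taken from the skill.

```python
import subprocess
import time

# Illustrative only: launch an OpenAI-compatible vLLM server as a child process.
# Model and port are placeholders, not values from the audited skill.
server = subprocess.Popen(
    ["vllm", "serve", "Qwen/Qwen2.5-7B-Instruct", "--port", "8000"]
)
try:
    time.sleep(30)      # a real harness would poll the server's health endpoint
    # ... run the benchmark client against http://localhost:8000 here ...
finally:
    server.terminate()  # always tear the server down when the run finishes
    server.wait()
```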
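
The secret-hygiene guidance in the CREDENTIALS_UNSAFE finding can be illustrated with a generic container launch; the image and model names are placeholders. The relevant detail is that -e HF_TOKEN is passed by name only, so Docker reads the value from the host environment and the token never appears in the assembled command line, shell history, or captured logs.

```python
import os
import subprocess

# Sketch of the hygiene pattern, not the skill's actual template.
assert "HF_TOKEN" in os.environ, "export HF_TOKEN on the host before launching"

cmd = [
    "docker", "run", "--rm", "--gpus", "all",
    "-e", "HF_TOKEN",               # name only: Docker inherits the host value
    "-p", "8000:8000",
    "vllm/vllm-openai:latest",      # placeholder image from a public registry
    "--model", "Qwen/Qwen2.5-7B-Instruct",
]
print(" ".join(cmd))                # safe to log: the token is not in the command
subprocess.run(cmd, check=True)
```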
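
compare_benchmark_results.py itself is not shown in the audit; the sketch below captures the general pattern the INDIRECT_PROMPT_INJECTION finding describes: JSONL records are parsed with json.loads and compared strictly as data, so a hostile record could at worst skew a report, never execute code. The field names (model, p50_latency_ms) are hypothetical.

```python
import json
from pathlib import Path

def load_jsonl(path: Path) -> list[dict]:
    """Parse one JSON object per non-empty line; malformed lines fail loudly."""
    return [json.loads(line) for line in path.read_text().splitlines() if line.strip()]

# Hypothetical schema; the real script's field names are not given in the audit.
baseline = load_jsonl(Path("baseline.jsonl"))
candidate = load_jsonl(Path("candidate.jsonl"))

for base, cand in zip(baseline, candidate):
    delta = cand["p50_latency_ms"] - base["p50_latency_ms"]
    print(f"{base['model']}: p50 latency delta {delta:+.1f} ms")
```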
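
The DYNAMIC_EXECUTION finding reflects standard PyYAML behavior: yaml.safe_load constructs only plain mappings, sequences, and scalars, whereas full yaml.load with an unsafe loader can instantiate arbitrary Python objects via tags such as !!python/object. Below is a minimal validator in the same spirit as validate_cookbook_configs.py, with an invented set of required keys.

```python
import sys
import yaml  # PyYAML

REQUIRED_KEYS = {"model", "backend", "num_prompts"}  # hypothetical schema

def validate(path: str) -> list[str]:
    with open(path) as f:
        # safe_load refuses object-constructing tags, so parsing a config can
        # never run code; yaml.load with an unsafe loader could.
        config = yaml.safe_load(f)
    if not isinstance(config, dict):
        return [f"{path}: top level must be a mapping"]
    missing = REQUIRED_KEYS - config.keys()
    return [f"{path}: missing required key '{key}'" for key in sorted(missing)]

if __name__ == "__main__":
    errors = [err for path in sys.argv[1:] for err in validate(path)]
    print("\n".join(errors) or "all configs valid")
    sys.exit(1 if errors else 0)
```
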
Audit Metadata
  • Risk Level: SAFE
  • Analyzed: May 13, 2026, 07:21 AM