llm-serving-auto-benchmark

Verdict: Pass

Audited by Gen Agent Trust Hub on May 13, 2026

Risk Level: SAFE
Full Analysis
  • [COMMAND_EXECUTION]: The skill provides templates for executing benchmark workloads with Docker and native CLI tools (e.g., vllm serve, sglang.launch_server). These are standard operational procedures for the intended benchmarking use case; a minimal launch sketch appears after this list.
  • [EXTERNAL_DOWNLOADS]: The skill references and pulls official container images from well-known technology providers such as NVIDIA (nvcr.io), LMSYS (lmsysorg), and vLLM. These downloads originate from trusted registries and are required for the skill's primary function.
  • [CREDENTIALS_UNSAFE]: The skill documentation includes clear instructions on secret hygiene, specifically advising users to pass sensitive keys like HF_TOKEN via environment variables, forwarded to the container by name only (with no inline value), so the secret is never printed in command lines, logs, or artifacts (see the container-launch sketch after this list).
  • [INDIRECT_PROMPT_INJECTION]: The skill contains a Python script (compare_benchmark_results.py) that processes benchmark output data (JSONL). While this is a theoretical ingestion surface for untrusted data, it is a standard component of a benchmarking workflow in which the data is generated by the user's own controlled runs (see the JSONL sketch after this list).
  • [DYNAMIC_EXECUTION]: The validate_cookbook_configs.py script uses yaml.safe_load to parse configuration files, a secure practice that prevents arbitrary code execution during the validation phase (see the validator sketch after this list).
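
The launch templates themselves are not reproduced in this audit. As a purely illustrative sketch of the pattern described in the COMMAND_EXECUTION finding, a harness along these lines would start the serving CLI as a child process; the model name, port, and warm-up delay below are placeholders, not values taken from the skill.

```python
import subprocess
import time

# Illustrative only: launch an OpenAI-compatible vLLM server as a child process.
# Model and port are placeholders, not values from the audited skill.
server = subprocess.Popen(
    ["vllm", "serve", "Qwen/Qwen2.5-7B-Instruct", "--port", "8000"]
)
try:
    time.sleep(30)      # a real harness would poll the server's health endpoint
    # ... run the benchmark client against http://localhost:8000 here ...
finally:
    server.terminate()  # always tear the server down when the run finishes
    server.wait()
```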
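
The secret-hygiene guidance in the CREDENTIALS_UNSAFE finding can be illustrated with a generic container launch; the image and model names are placeholders. The relevant detail is that -e HF_TOKEN is passed by name only, so Docker reads the value from the host environment and the token never appears in the assembled command line, shell history, or captured logs.

```python
import os
import subprocess

# Sketch of the hygiene pattern, not the skill's actual template.
assert "HF_TOKEN" in os.environ, "export HF_TOKEN on the host before launching"

cmd = [
    "docker", "run", "--rm", "--gpus", "all",
    "-e", "HF_TOKEN",               # name only: Docker inherits the host value
    "-p", "8000:8000",
    "vllm/vllm-openai:latest",      # placeholder image from a public registry
    "--model", "Qwen/Qwen2.5-7B-Instruct",
]
print(" ".join(cmd))                # safe to log: the token is not in the command
subprocess.run(cmd, check=True)
```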
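
compare_benchmark_results.py itself is not shown in the audit; the sketch below captures the general pattern the INDIRECT_PROMPT_INJECTION finding describes: JSONL records are parsed with json.loads and compared strictly as data, so a hostile record could at worst skew a report, never execute code. The field names (model, p50_latency_ms) are hypothetical.

```python
import json
from pathlib import Path

def load_jsonl(path: Path) -> list[dict]:
    """Parse one JSON object per non-empty line; malformed lines fail loudly."""
    return [json.loads(line) for line in path.read_text().splitlines() if line.strip()]

# Hypothetical schema; the real script's field names are not given in the audit.
baseline = load_jsonl(Path("baseline.jsonl"))
candidate = load_jsonl(Path("candidate.jsonl"))

for base, cand in zip(baseline, candidate):
    delta = cand["p50_latency_ms"] - base["p50_latency_ms"]
    print(f"{base['model']}: p50 latency delta {delta:+.1f} ms")
```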
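
The DYNAMIC_EXECUTION finding reflects standard PyYAML behavior: yaml.safe_load constructs only plain mappings, sequences, and scalars, whereas full yaml.load with an unsafe loader can instantiate arbitrary Python objects via tags such as !!python/object. Below is a minimal validator in the same spirit as validate_cookbook_configs.py, with an invented set of required keys.

```python
import sys
import yaml  # PyYAML

REQUIRED_KEYS = {"model", "backend", "num_prompts"}  # hypothetical schema

def validate(path: str) -> list[str]:
    with open(path) as f:
        # safe_load refuses object-constructing tags, so parsing a config can
        # never run code; yaml.load with an unsafe loader could.
        config = yaml.safe_load(f)
    if not isinstance(config, dict):
        return [f"{path}: top level must be a mapping"]
    missing = REQUIRED_KEYS - config.keys()
    return [f"{path}: missing required key '{key}'" for key in sorted(missing)]

if __name__ == "__main__":
    errors = [err for path in sys.argv[1:] for err in validate(path)]
    print("\n".join(errors) or "all configs valid")
    sys.exit(1 if errors else 0)
```
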
Audit Metadata
  • Risk Level: SAFE
  • Analyzed: May 13, 2026, 07:21 AM