hugging-face-evaluation-manager
Pass
Audited by Gen Agent Trust Hub on Mar 28, 2026
Risk Level: SAFEPROMPT_INJECTIONREMOTE_CODE_EXECUTIONCOMMAND_EXECUTIONEXTERNAL_DOWNLOADS
Full Analysis
- [PROMPT_INJECTION]: The skill processes untrusted markdown content from Hugging Face model README files to extract evaluation metrics, creating an indirect prompt injection surface where a malicious repository owner could embed instructions to influence the agent's behavior.\n
- Ingestion points: README content is fetched from external repositories via the Hugging Face API in
scripts/evaluation_manager.py.\n - Boundary markers: The skill does not implement explicit boundary markers or instructions to the agent to ignore embedded instructions within the processed README content.\n
- Capability inventory: The skill has the ability to update remote model cards (
push_to_hubinscripts/evaluation_manager.py) and execute CLI tools viasubprocess.runin several evaluation scripts.\n - Sanitization: While the script parses the markdown to extract numeric metrics, it does not sanitize or escape the text content to prevent it from being interpreted as instructions by the agent.\n- [REMOTE_CODE_EXECUTION]: Several scripts, including
inspect_vllm_uv.py,lighteval_vllm_uv.py, andrun_vllm_eval_job.py, include a--trust-remote-codeparameter. This flag enables the Hugging Face Transformers library to execute arbitrary Python code contained within a model's repository. While a standard feature in the ecosystem for custom architectures, it represents a significant security risk when used with unvetted third-party models.\n- [COMMAND_EXECUTION]: The skill usessubprocess.runacross multiple scripts to wrap CLI tools such ashf,inspect, andlighteval. The commands are invoked using lists of arguments rather than shell strings, which is a safe practice to prevent shell injection. However, these tools still perform high-privilege operations like submitting compute jobs and modifying remote repository metadata.\n- [EXTERNAL_DOWNLOADS]: The skill fetches evaluation data from the Artificial Analysis API (artificialanalysis.ai) and interacts with the Hugging Face Hub. It also utilizesuv runwhich dynamically installs dependencies listed in script headers (PEP 723), such asinspect-ai,lighteval, andvllm, from official registries.
Audit Metadata