npu-op-benchmark
Fail
Audited by Gen Agent Trust Hub on May 19, 2026
Risk Level: HIGH (CREDENTIALS_UNSAFE, COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION)
Full Analysis
- [CREDENTIALS_UNSAFE]: The instructions in `SKILL.md` and `references/usage.md` explicitly direct the agent to solicit the user's server IP, account, and password. Entering raw credentials into an LLM conversation is an unsafe practice, as it exposes sensitive access information to the model and session logs.
- [COMMAND_EXECUTION]: The skill relies on multiple shell command execution points:
  - `scripts/find_docker_cann.sh` utilizes `docker exec` to run shell commands like `pip list` and `find` inside containers.
  - `scripts/bench_repeat_interleave.py` uses `subprocess.call` to launch the benchmark runner.
  - `scripts/cann_detect.sh` performs directory listing and symlink checks on the host system to identify CANN installations.
- [REMOTE_CODE_EXECUTION]: The skill is designed to execute Python scripts (`bench_op.py`) within a remote NPU environment. `SKILL.md` indicates that the agent should prioritize using "user provided" demo code, which facilitates the execution of arbitrary scripts on the target server if the input source is untrusted.
- [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection due to its handling of external data:
  - Ingestion points: `SKILL.md` and `references/usage.md` note the ingestion of user-provided demo code and operator details.
  - Boundary markers: There are no markers or specific instructions to isolate or ignore embedded instructions within the user-supplied code.
  - Capability inventory: The environment allows command execution via shell scripts and Python-based operator execution.
  - Sanitization: There is no verification or sanitization of user-provided code before it is executed in the target environment.
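
To make the missing boundary markers and sanitization step concrete, here is a minimal sketch of what a pre-execution check on user-provided demo code could look like. It is not part of the audited skill: the function names, the marker format, and the deny-lists are all hypothetical, and a static scan of this kind is easy to evade, so it would complement a sandbox and human review rather than replace them.

```python
import ast
import secrets

# Hypothetical deny-lists for illustration only; a real policy needs review.
BLOCKED_MODULES = {"os", "subprocess", "socket", "shutil"}
BLOCKED_CALLS = {"eval", "exec", "compile", "__import__"}


def find_risky_constructs(source):
    """Return findings for blocked imports and calls, ordered by line number."""
    hits = []
    tree = ast.parse(source)  # raises SyntaxError if the code is not valid Python
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    hits.append((node.lineno, f"import {alias.name}"))
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x"
            if (node.module or "").split(".")[0] in BLOCKED_MODULES:
                hits.append((node.lineno, f"from {node.module} import ..."))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                hits.append((node.lineno, f"call to {node.func.id}()"))
    return [f"line {lineno}: {what}" for lineno, what in sorted(hits)]


def wrap_untrusted(source):
    """Fence user code between nonce-tagged markers so it is handled as data."""
    nonce = secrets.token_hex(8)  # fresh per call, so the code cannot forge the end marker
    return (
        f"<<UNTRUSTED_CODE {nonce}>>\n"
        f"{source}\n"
        f"<<END_UNTRUSTED_CODE {nonce}>>"
    )
```

In this sketch, an agent would call `find_risky_constructs` on the submitted code and stop for confirmation if the list is non-empty, and would pass only the output of `wrap_untrusted` into any prompt, together with an instruction to ignore directives that appear between the markers.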
Recommendations
- AI detected serious security threats
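
For the CREDENTIALS_UNSAFE finding in the analysis above, one possible mitigation is to have the user supply connection details on their own machine rather than typing them into the conversation. The sketch below assumes this approach; the environment variable names `NPU_HOST`, `NPU_USER`, and `NPU_PASSWORD` and both function names are hypothetical and do not come from the audited skill. Key-based SSH authentication would avoid handling a password at all and may be preferable where available.

```python
import getpass
import os


def load_target(env=None, prompt=getpass.getpass):
    """Collect connection details locally instead of via the chat transcript.

    NPU_HOST, NPU_USER and NPU_PASSWORD are hypothetical variable names.
    """
    env = os.environ if env is None else env
    host = env.get("NPU_HOST")
    user = env.get("NPU_USER")
    if not host or not user:
        raise ValueError("set NPU_HOST and NPU_USER in the local environment")
    # Fall back to a non-echoing prompt on the user's own terminal.
    password = env.get("NPU_PASSWORD") or prompt(f"Password for {user}@{host}: ")
    return {"host": host, "user": user, "password": password}


def redact(target):
    """Return a copy that is safe to log or to show to the model."""
    return {**target, "password": "***"}
```

Only the result of `redact` would ever be echoed back into the session, so the password stays out of the model context and the session logs.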
Audit Metadata