npu-op-benchmark

Fail

Audited by Gen Agent Trust Hub on May 19, 2026

Risk Level: HIGH
CREDENTIALS_UNSAFE, COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION
Full Analysis
  • [CREDENTIALS_UNSAFE]: The instructions in SKILL.md and references/usage.md explicitly direct the agent to ask the user for the server IP, account, and password. Entering raw credentials into an LLM conversation is unsafe because it exposes them to the model and to session logs.
  • [COMMAND_EXECUTION]: The skill executes shell commands at several points:
      • scripts/find_docker_cann.sh uses docker exec to run shell commands such as pip list and find inside containers.
      • scripts/bench_repeat_interleave.py uses subprocess.call to launch the benchmark runner.
      • scripts/cann_detect.sh lists directories and checks symlinks on the host system to identify CANN installations.
  • [REMOTE_CODE_EXECUTION]: The skill is designed to execute Python scripts (bench_op.py) within a remote NPU environment. SKILL.md tells the agent to prioritize "user provided" demo code, which allows arbitrary scripts to run on the target server if the input source is untrusted.
  • [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection due to its handling of external data:
      • Ingestion points: SKILL.md and references/usage.md note the ingestion of user-provided demo code and operator details.
      • Boundary markers: There are no markers or specific instructions to isolate or ignore embedded instructions within the user-supplied code.
      • Capability inventory: The environment allows command execution via shell scripts and Python-based operator execution.
      • Sanitization: There is no verification or sanitization of user-provided code before it is executed in the target environment.
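The missing boundary markers and pre-execution checks noted above could look roughly like the sketch below. It is not part of the audited skill: the marker strings, the RISKY_MODULES set, and the function names wrap_untrusted and risky_imports are all hypothetical, chosen only for illustration. The check is a static import scan using Python's standard ast module, so it flags obvious cases but is not a sandbox; code that uses __import__ or exec would pass it.

```python
import ast

# Hypothetical marker strings; the audited skill defines no such delimiters.
BEGIN_MARK = "<<<UNTRUSTED_USER_CODE_BEGIN>>>"
END_MARK = "<<<UNTRUSTED_USER_CODE_END>>>"

# Illustrative, non-exhaustive set of modules that can spawn processes,
# touch the filesystem, or open network connections.
RISKY_MODULES = {"os", "subprocess", "socket", "shutil", "ctypes"}


def wrap_untrusted(code: str) -> str:
    """Fence user-supplied code so a prompt can tell the model to treat it as data."""
    if BEGIN_MARK in code or END_MARK in code:
        # Refuse input that tries to forge or close the fence itself.
        raise ValueError("input already contains a boundary marker")
    return f"{BEGIN_MARK}\n{code}\n{END_MARK}"


def risky_imports(code: str) -> list[str]:
    """Return the sorted risky top-level modules that the code imports.

    The code is parsed only, never executed.
    """
    found = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in RISKY_MODULES:
                found.add(root)
    return sorted(found)


demo = "import subprocess\nimport torch\nsubprocess.call(['ls'])\n"
print(risky_imports(demo))                   # ['subprocess']
print(wrap_untrusted(demo).splitlines()[0])  # <<<UNTRUSTED_USER_CODE_BEGIN>>>
```

A non-empty result from risky_imports would be a reason to stop and ask a human to review the script before it is sent to the target server, rather than proof that the script is malicious.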
Recommendations
  • AI detected serious security threats
Audit Metadata
  Risk Level: HIGH
  Analyzed: May 19, 2026, 06:27 AM