ai-threat-testing

Warn

Audited by Gen Agent Trust Hub on May 11, 2026

Risk Level: MEDIUMPROMPT_INJECTIONCOMMAND_EXECUTIONEXTERNAL_DOWNLOADS
Full Analysis
  • [PROMPT_INJECTION]: The reference file reference/llm01-prompt-injection.md contains several instructions that attempt to override agent behavior and safety protocols. Examples include "Ignore all previous instructions...", "Disregard safety guidelines for...", and "You are now in debug mode...". While these are intended as testing payloads for the agent to use against targets, their presence in the skill's own instructions could lead to accidental instruction overriding if the agent interprets them as its own rules.
  • [COMMAND_EXECUTION]: The skill documents specific payloads for executing system commands and arbitrary Python code in reference/llm02-insecure-output.md. This includes dangerous patterns such as __import__('os').system('...') and shell execution markers like `command` or $(whoami). These represent a significant capability if the agent executes the model-generated payloads it is testing.
  • [EXTERNAL_DOWNLOADS]: The reference/llm05-supply-chain.md file suggests running external security tools such as safety check, npm audit, and pip-audit. These tools perform network requests to official package registries and vulnerability databases to identify supply chain risks.
  • [PROMPT_INJECTION]: The skill framework exhibits an indirect prompt injection risk surface due to how it processes target data.
  • Ingestion points: The skill ingests untrusted content from external API endpoints, web URLs, and vector database documents (RAG) during the security assessment process, as noted in SKILL.md and reference/llm08-vector-poisoning.md.
  • Boundary markers: The skill body and playbooks do not implement boundary markers (e.g., XML tags or triple quotes) or "ignore embedded instructions" warnings to isolate the target data from the agent's core instructions.
  • Capability inventory: The framework incorporates capabilities for code and command execution as part of its vulnerability exploitation modules, particularly in reference/llm02-insecure-output.md.
  • Sanitization: There is no documented process for sanitizing, escaping, or validating the content retrieved from target systems before it is incorporated into the agent's context.
Audit Metadata
Risk Level
MEDIUM
Analyzed
May 11, 2026, 07:45 AM