ai-threat-testing
Warn
Audited by Gen Agent Trust Hub on May 11, 2026
Risk Level: MEDIUMPROMPT_INJECTIONCOMMAND_EXECUTIONEXTERNAL_DOWNLOADS
Full Analysis
- [PROMPT_INJECTION]: The reference file
reference/llm01-prompt-injection.mdcontains several instructions that attempt to override agent behavior and safety protocols. Examples include "Ignore all previous instructions...", "Disregard safety guidelines for...", and "You are now in debug mode...". While these are intended as testing payloads for the agent to use against targets, their presence in the skill's own instructions could lead to accidental instruction overriding if the agent interprets them as its own rules. - [COMMAND_EXECUTION]: The skill documents specific payloads for executing system commands and arbitrary Python code in
reference/llm02-insecure-output.md. This includes dangerous patterns such as__import__('os').system('...')and shell execution markers like`command`or$(whoami). These represent a significant capability if the agent executes the model-generated payloads it is testing. - [EXTERNAL_DOWNLOADS]: The
reference/llm05-supply-chain.mdfile suggests running external security tools such assafety check,npm audit, andpip-audit. These tools perform network requests to official package registries and vulnerability databases to identify supply chain risks. - [PROMPT_INJECTION]: The skill framework exhibits an indirect prompt injection risk surface due to how it processes target data.
- Ingestion points: The skill ingests untrusted content from external API endpoints, web URLs, and vector database documents (RAG) during the security assessment process, as noted in
SKILL.mdandreference/llm08-vector-poisoning.md. - Boundary markers: The skill body and playbooks do not implement boundary markers (e.g., XML tags or triple quotes) or "ignore embedded instructions" warnings to isolate the target data from the agent's core instructions.
- Capability inventory: The framework incorporates capabilities for code and command execution as part of its vulnerability exploitation modules, particularly in
reference/llm02-insecure-output.md. - Sanitization: There is no documented process for sanitizing, escaping, or validating the content retrieved from target systems before it is incorporated into the agent's context.
Audit Metadata