detecting-ai-model-prompt-injection-attacks
Installation
SKILL.md
Detecting AI Model Prompt Injection Attacks
When to Use
- Scanning user inputs to LLM-powered applications before they are forwarded to the model
- Building an input validation layer for chatbots, AI agents, or retrieval-augmented generation (RAG) pipelines
- Monitoring logs of LLM interactions to retrospectively identify prompt injection attempts
- Evaluating the effectiveness of existing prompt injection defenses through red-team testing
- Classifying prompt injection payloads during security incident investigations involving AI systems
Do not use as the sole defense mechanism against prompt injection -- always combine with output validation, privilege separation, and least-privilege tool access. Not suitable for detecting jailbreaks that do not involve injection of adversarial instructions.
Prerequisites
- Python 3.10+ with pip for installing detection dependencies
- The
transformersandtorchlibraries for running the DeBERTa-based classifier model - The
protectai/deberta-v3-base-prompt-injection-v2model from Hugging Face (downloaded on first run, approximately 700 MB) - Network access to Hugging Face Hub for initial model download (offline mode supported after first download)
- Sample prompt injection payloads for testing (the script includes a built-in test suite)