detecting-ai-model-prompt-injection-attacks

Installation
SKILL.md

Detecting AI Model Prompt Injection Attacks

When to Use

  • Scanning user inputs to LLM-powered applications before they are forwarded to the model
  • Building an input validation layer for chatbots, AI agents, or retrieval-augmented generation (RAG) pipelines
  • Monitoring logs of LLM interactions to retrospectively identify prompt injection attempts
  • Evaluating the effectiveness of existing prompt injection defenses through red-team testing
  • Classifying prompt injection payloads during security incident investigations involving AI systems

Do not use as the sole defense mechanism against prompt injection -- always combine with output validation, privilege separation, and least-privilege tool access. Not suitable for detecting jailbreaks that do not involve injection of adversarial instructions.

Prerequisites

  • Python 3.10+ with pip for installing detection dependencies
  • The transformers and torch libraries for running the DeBERTa-based classifier model
  • The protectai/deberta-v3-base-prompt-injection-v2 model from Hugging Face (downloaded on first run, approximately 700 MB)
  • Network access to Hugging Face Hub for initial model download (offline mode supported after first download)
  • Sample prompt injection payloads for testing (the script includes a built-in test suite)
Related skills
Installs
23
GitHub Stars
6.2K
First Seen
Mar 20, 2026