testing-prompt-injection-in-rag-pipelines
Testing Prompt Injection in RAG Pipelines
Authorized-use-only notice: This skill describes offensive testing techniques against Retrieval-Augmented Generation (RAG) systems. Run these probes only against applications you own or have explicit written authorization to test. Adversarial inputs that exfiltrate documents or hijack a model can cause real harm to production systems and downstream users. Always test in a non-production environment first and follow your engagement rules of engagement (RoE).
Overview
Retrieval-Augmented Generation (RAG) pipelines combine a large language model (LLM) with a retrieval layer (a vector store such as FAISS, Chroma, Pinecone, Milvus, or pgvector) so the model can answer questions over private documents. The retrieval layer is an injection surface: any text that the retriever returns is concatenated into the model's context window and is treated by the model as authoritative. An attacker who can influence the document corpus (a poisoned PDF, a malicious wiki edit, a planted support ticket, a crafted email) can plant instructions that the model will follow when that chunk is retrieved. This is indirect prompt injection delivered through the retrieval channel, and it maps to MITRE ATLAS AML.T0051 (LLM Prompt Injection) and OWASP LLM01:2025 Prompt Injection.
Beyond text-level injection, RAG pipelines are vulnerable at the embedding layer. An attacker who understands the embedding model can craft text that lands near high-value queries in vector space ("embedding manipulation" / retrieval poisoning), guaranteeing that the malicious chunk is retrieved for a target query even when it is not semantically relevant to a human. This skill walks through systematically probing both surfaces using NVIDIA garak, Promptfoo red-team plugins, and Microsoft PyRIT, with verified, runnable commands from each tool's documentation.
When to Use
- When security-testing a RAG chatbot, internal knowledge assistant, or document-Q&A product before or after release.
- When validating that retrieval guardrails (input/output filtering, context sandboxing) actually block injected instructions.
- During an AI red-team engagement scoped to test the LLM application layer (OWASP LLM Top 10 coverage).
- When you ingest user-controllable or third-party content into a vector store and need to prove the blast radius of a poisoned document.
- As a regression gate in CI/CD: re-run the probe suite on every prompt-template or retriever change.