testing-for-system-prompt-leakage

Installation
SKILL.md

Testing for System Prompt Leakage

Authorized use only: The extraction payloads below are for assessing LLM applications you own or have written authorization to test. Extracting prompts, secrets, or routing logic from systems you are not authorized to test may be unlawful.

Overview

A system prompt (a.k.a. developer message, preamble, or instructions) steers an LLM application's behavior. OWASP LLM07:2025 System Prompt Leakage addresses the risk that these prompts contain sensitive material that was never meant to be exposed — API keys, database connection strings, internal role/permission logic, model-routing rules, content policies, and tool definitions. Two principles frame this skill:

  1. The system prompt must never be treated as a secret or used as a security control. If leaking it breaks your security model, the security model is wrong. The real findings during a leakage test are the secrets and logic embedded in the prompt that should have been enforced server-side.
  2. System prompts are extractable. Through direct requests, instruction-override (jailbreak) framing, translation/encoding tricks, completion attacks, and few-shot replay, adversaries can reliably recover preambles.

This maps to MITRE ATLAS AML.T0057 — LLM Data Leakage: triggering unintentional disclosure (here, of the system prompt and embedded data) through crafted queries. Testing combines manual payloads with automated scanners — garak (NVIDIA's LLM vulnerability scanner) and Promptfoo (red-team eval) provide repeatable extraction probes.

When to Use

  • During an LLM application penetration test or red-team engagement (OWASP LLM07 coverage).
  • When validating that no secrets, credentials, or authorization logic live in the system prompt.
  • When verifying that guardrails block prompt-extraction attempts.
  • When building a regression suite that fails the build if a new prompt leaks.
  • When assessing multi-agent or RAG apps where the preamble defines tool routing.
Installs
22
GitHub Stars
24.2K
First Seen
11 days ago
testing-for-system-prompt-leakage — mukul975/anthropic-cybersecurity-skills