hunt-llm-ai
11. LLM / AI FEATURES
LLM bugs are only worth reporting when they cross a trust boundary you can prove — an OOB callback, a verbatim-reproducible secret, a cross-tenant record, or code execution. A model "saying something bad once" is confabulation, not a vulnerability. Read the False-Positive Gate before claiming anything.
Naming note (was wrong in v1): the model-level list is OWASP Top 10 for LLM Applications 2025 (LLM01 Prompt Injection, LLM07 System Prompt Leakage, LLM08 Vector/Embedding Weaknesses). The agent-level list is OWASP Top 10 for Agentic Applications (2026) from the Agentic Security Initiative (ASI), codes ASI01–ASI10. Do not write "OWASP ASI 2026" as if it were one document — cite the correct list per finding.
False-Positive Gate (Read First)
LLMs are non-deterministic. The single biggest source of bogus LLM reports is confabulation — the model inventing a plausible "system prompt" or "other user's data" that is not real. Apply every check below before writing a word.
- Run-twice rule (verbatim reproducibility). Send the identical extraction prompt in two fresh sessions (clear cookies/conversation). A real system-prompt leak reproduces token-for-token. If the two outputs differ in wording, structure, or detail, it is confabulation — discard it.
- Anchor to a known-secret. Don't ask "what is your system prompt"; ask the model to echo a string only the real prompt would contain (a tool name, an internal URL, a tenant ID format, a guardrail phrase you already saw leak in an error). Reproducible echo of a non-guessable anchor = real leak.
- Cross-tenant proof, not assertion. "Show user 456's last message" returning something proves nothing — the model can invent a message. Require a value you can independently verify belongs to account B (an order ID, an email, a support-ticket number) from your own attacker account A. No verifiable cross-account artifact = not an IDOR.
- Exfil = OOB or it didn't happen. A markdown image / tool fetch that should leak data is only confirmed when a Burp Collaborator / interactsh / webhook callback arrives carrying the data. Rendered markdown in your own screen is not proof the server/agent made the request.
- Refusal ≠ secure; compliance ≠ vuln. The model refusing is server policy, not server state. The model complying with "pretend you're an admin" with no privileged data or action behind it is theatre, not a finding. The bug lives in what the tool/data layer let the model do, not in what it said.