Predictive Coding & Information Bottleneck Hallucination Detection

This skill enables Claude to build production-grade hallucination detection systems based on the Predictive Coding and Information Bottleneck (PCIB) framework. Instead of relying on expensive 70B+ LLM judges or slow retrieval loops, PCIB extracts four interpretable signals from LLM outputs -- Uptake (prediction error), Stress (semantic stability), Conflict (logical consistency), and Falsifiability (confident contradiction) -- then feeds them into a lightweight Random Forest classifier. The result is a sub-1M-parameter detector that achieves 0.87 AUROC at 5ms inference while using 75x less training data than comparable methods.

When to Use

When building a hallucination detection layer for a RAG pipeline that needs to flag unfaithful answers before they reach users
When implementing real-time quality gates on LLM-generated content (customer support, medical summaries, legal analysis)
When the user wants interpretable hallucination scores with per-signal explanations, not just a binary pass/fail
When designing a two-tier verification system where a fast first-pass filter reduces load on expensive LLM judges
When creating labeled datasets for hallucination detection with minimal annotation effort (the framework works with as few as 200 balanced samples)
When auditing an existing LLM system to understand why specific outputs are hallucinated (entity-level analysis, grounding strength, perturbation fragility)

Key Technique

The PCIB framework is grounded in two principles, one from neuroscience and one from information theory. Predictive Coding treats the LLM as a hierarchical prediction machine: when given context, a grounded answer aligns with the model's updated beliefs, while a hallucination requires the model to suppress the provided context in favor of prior biases. This is measured as Uptake -- the KL divergence between the answer distribution conditioned on context+question and the answer distribution conditioned on the question alone. High Uptake means the context meaningfully shaped the answer (likely factual); low Uptake means the model ignored the context (likely hallucinated).
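
As a rough illustration of the Uptake signal, the sketch below computes a KL divergence between two discrete answer distributions. It is a minimal sketch under stated assumptions, not the framework's own implementation: the function names kl_divergence and uptake are hypothetical, the direction KL(with context || question only) is one reading of the description above, and the three-way toy distributions are made-up numbers standing in for whatever answer distribution the LLM exposes.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same support")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            # Clamp q to avoid division by zero (an illustrative choice).
            total += pi * math.log(pi / max(qi, eps))
    return total


def uptake(p_with_context, p_question_only):
    """Hypothetical helper: how far the context moved the answer distribution."""
    return kl_divergence(p_with_context, p_question_only)


# Toy distributions over three candidate answers (made-up numbers).
prior = [0.5, 0.3, 0.2]        # question alone
grounded = [0.05, 0.9, 0.05]   # context shifted the belief strongly
ignored = [0.5, 0.3, 0.2]      # context changed nothing

high_uptake = uptake(grounded, prior)
low_uptake = uptake(ignored, prior)
print(high_uptake, low_uptake)
```

In this toy case the grounded distribution yields a clearly positive Uptake, while the distribution that ignores the context yields zero, which matches the interpretation given above.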

The Information Bottleneck principle posits that factual claims are robust compressed representations, while hallucinated claims are noise that degrades under perturbation. PCIB tests this by paraphrasing extracted claims at temperature 0.7, then measuring how much the NLI entailment probability shifts (Stress via Jensen-Shannon divergence) and whether contradictions emerge (Conflict via NLI contradiction probability). The insight: a true fact survives rephrasing; a hallucination crumbles.
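
The Stress and Conflict signals can be sketched in the same spirit. This is an illustrative sketch only: it assumes an NLI model has already produced a probability triple [entailment, neutral, contradiction] for the original claim and for each paraphrase, and the aggregation choices (mean JS divergence for Stress, maximum contradiction probability for Conflict) are assumptions rather than details stated by the framework. The function names and the toy probabilities are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (bounded between 0 and 1)."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]

    def kl(a, b):
        # b is the mixture m, so b > 0 wherever a > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0.0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def stress(original_nli, paraphrase_nlis):
    """Assumed aggregation: mean JS divergence from the original claim's NLI output."""
    divergences = [js_divergence(original_nli, p) for p in paraphrase_nlis]
    return sum(divergences) / len(divergences)


def conflict(paraphrase_nlis, contradiction_index=2):
    """Assumed aggregation: worst-case contradiction probability across paraphrases."""
    return max(p[contradiction_index] for p in paraphrase_nlis)


# Toy NLI outputs as [entailment, neutral, contradiction] (made-up numbers).
stable_original = [0.90, 0.08, 0.02]
stable_paraphrases = [[0.88, 0.10, 0.02], [0.91, 0.07, 0.02]]

fragile_original = [0.70, 0.20, 0.10]
fragile_paraphrases = [[0.20, 0.20, 0.60], [0.10, 0.30, 0.60]]

stable_stress = stress(stable_original, stable_paraphrases)
fragile_stress = stress(fragile_original, fragile_paraphrases)
print(stable_stress, conflict(stable_paraphrases))
print(fragile_stress, conflict(fragile_paraphrases))
```

With these toy inputs the stable claim keeps nearly the same NLI distribution under paraphrase, so its Stress stays close to zero, while the fragile claim shifts toward contradiction and scores higher on both signals.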
