Council Review

Run any question, plan, or code through 5 independent advisors who use distinct reasoning methods, collaborate to refine answers, peer-review each other anonymously, and synthesize a verdict you can trust.

This skill implements the Diverse Multi-Agent Debate (DMAD) pattern. It is collaborative, not adversarial: agents seek truth through diversity of reasoning, not by arguing opposing positions.

Why This Works (Research Backing)

Method diversity beats single-method debate. DMAD (ICLR 2025) shows that agents using distinct reasoning methods reliably outperform homogeneous councils — diverse medium-capacity models can beat GPT-4 on GSM-8K (91% vs 82%) when each agent applies a different reasoning approach.
Collaborative debate beats adversarial debate. M3MADBench (2026) shows that across all modalities, collaborative DMAD outperforms adversarial Div-MAD "by a substantial margin." Adversarial paradigms introduce divergent noise; for open questions, plans, and decisions, collaborative deliberation is the right tool.
Anonymous peer review prevents provider bias. The finding recurs across the literature: when reviewers can see role or provider names, they defer to them, so peer-review responses must be anonymized and shuffled before review.
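Confidence calibration breaks the martingale ceiling. In vanilla multi-agent debate (MAD), each round leaves an agent's expected belief in the correct answer unchanged (the belief process is a martingale), so vanilla MAD often underperforms a simple majority vote. Confidence-modulated updates ("Demystifying MAD" 2026) break that ceiling and systematically shift the council toward correct answers.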
Adaptive stopping cuts cost. S2 MAD (via llmcouncil) stops the debate once a Kolmogorov-Smirnov (KS) statistic signals convergence, and reports up to 94.5% cost reduction on convergent questions.
A true devil's advocate is the only reliable way to induce disagreement (V2). Among techniques for breaking consensus in multi-agent LLM teams, only a dedicated devil's advocate attacking the emerging answer produces genuine disagreement; soft role-framing and "please dissent" instructions are statistically indistinguishable from baseline. An LLM devil's advocate that challenges the recommendation measurably raises group decision accuracy (OpenReview 2026; IUI 2024). V2 therefore adds a mandatory devil's-advocate pass against the consensus: one devil's advocate attacks the converged answer, rather than the council keeping a standing split between advocates and skeptics.
Sycophancy collapses councils into premature consensus (V2). LLMs defer, both to each other and to the answer implied by the question's framing, and that deference can drop a council below single-agent accuracy (Peacemaker-or-Troublemaker 2026; CONSENSAGENT). V2 adds a sycophancy guardrail to the advisor and peer-review prompts, plus a structured independent-assessment step (Kahneman's Mediating Assessments Protocol, 2019), so the chairman judges key attributes separately before making the holistic call.
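The anonymize-and-shuffle step behind anonymous peer review can be sketched as follows. This is a minimal illustration, assuming each advisor response arrives as a (role name, text) pair; the function name `anonymize_for_review`, the "Response A/B/..." labels, the advisor role names, and the seeded `random.Random` are hypothetical choices, not part of the skill. The returned key stays with the orchestrator so the chairman can map labels back to advisors after review.

```python
import random
import string


def anonymize_for_review(responses, seed=None):
    """Strip role names and shuffle advisor responses before peer review.

    responses: list of (role_name, text) pairs (hypothetical input shape).
    Returns (labeled, key): labeled is a list of (label, text) in shuffled
    order; key maps each label back to the original role name.
    """
    if len(responses) > len(string.ascii_uppercase):
        raise ValueError("too many responses for single-letter labels")
    rng = random.Random(seed)
    shuffled = list(responses)
    rng.shuffle(shuffled)  # remove ordering cues that could hint at the author
    labeled, key = [], {}
    for letter, (role, text) in zip(string.ascii_uppercase, shuffled):
        label = f"Response {letter}"
        labeled.append((label, text))
        key[label] = role  # kept by the orchestrator, never shown to reviewers
    return labeled, key


# Hypothetical advisor roles and answers.
advisors = [
    ("First-principles", "Ship behind a feature flag."),
    ("Analogical", "Similar launches needed a staged rollout."),
    ("Pre-mortem", "Most likely failure: migration timeout."),
]
labeled, key = anonymize_for_review(advisors, seed=7)
for label, text in labeled:
    print(label, "->", text)
```

Reviewers only ever see `labeled`; shuffling matters as well as relabeling, because a fixed order would let reviewers infer which advisor wrote which response.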
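To see how confidence can change an outcome relative to a plain majority vote, here is a simplified aggregation sketch. It is not the update rule from "Demystifying MAD"; it only weights each advisor's final answer by a self-reported confidence in [0, 1], which is an assumption about how confidences are elicited. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(answers):
    """Plain plurality vote; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def confidence_weighted_vote(answers, confidences):
    """Sum each answer's confidence mass and return the heaviest answer."""
    if len(answers) != len(confidences):
        raise ValueError("need one confidence per answer")
    totals = defaultdict(float)
    for answer, conf in zip(answers, confidences):
        totals[answer] += conf
    return max(totals, key=totals.get)


# Two confident advisors outweigh three hesitant ones.
answers = ["A", "A", "B", "B", "B"]
confidences = [0.9, 0.95, 0.3, 0.4, 0.35]
print(majority_vote(answers))                          # B
print(confidence_weighted_vote(answers, confidences))  # A
```

Weighting by confidence only helps when confidences are reasonably calibrated; an overconfident advisor can swing the result just as easily as a correct one.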
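KS-based adaptive stopping can be sketched by comparing consecutive rounds. This assumes the signal compared between rounds is the list of per-advisor confidence scores; the exact signal and threshold used by S2 MAD and llmcouncil are not stated here, so `threshold=0.25` and the function names are placeholders. The two-sample KS statistic is computed by hand to keep the example dependency-free; `scipy.stats.ks_2samp` reports the same two-sided statistic.

```python
from bisect import bisect_right


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of xs and ys."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)

    def ecdf(sample, t):
        # Fraction of the sample that is <= t.
        return bisect_right(sample, t) / len(sample)

    # Both ECDFs are step functions that only jump at observed values,
    # so evaluating at those points is enough to find the supremum.
    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in set(xs) | set(ys))


def should_stop(prev_round, curr_round, threshold=0.25):
    """Stop debating once the confidence distribution barely moves between
    consecutive rounds (threshold is a hypothetical placeholder)."""
    return ks_statistic(prev_round, curr_round) <= threshold


round_1 = [0.5, 0.6, 0.7, 0.8, 0.9]
round_2 = [0.85, 0.9, 0.9, 0.9, 0.95]
round_3 = [0.85, 0.9, 0.9, 0.95, 0.95]
print(should_stop(round_1, round_2))  # False: distribution shifted a lot
print(should_stop(round_2, round_3))  # True: nearly unchanged
```

The cost saving comes from skipping further rounds on questions where the council has already settled; a divergent question keeps failing the check and continues.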
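The independent-assessment step can be sketched in the spirit of Kahneman's Mediating Assessments Protocol: score each key attribute separately, then hand the chairman the full profile before any holistic verdict. This is a minimal illustration, not the skill's implementation; the attribute names, the 1-5 scale, the `Assessment` type, and the `mediating_assessments` and `chairman_brief` functions are hypothetical. In a real council `assess` would be a separate model call per attribute; a stub dictionary stands in here.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    attribute: str
    score: int      # 1-5 scale (hypothetical)
    evidence: str   # short justification for the score


def mediating_assessments(attributes, assess):
    """Score each attribute separately, in a fixed order, before any overall
    verdict. `assess` is a callable(attribute) -> (score, evidence); it only
    receives the attribute name, never the other attributes' scores."""
    results = []
    for attribute in attributes:
        score, evidence = assess(attribute)
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range for {attribute!r}")
        results.append(Assessment(attribute, score, evidence))
    return results


def chairman_brief(results):
    """Assemble the independent assessments into the brief the chairman reads
    before making the holistic call."""
    lines = [f"- {r.attribute}: {r.score}/5 ({r.evidence})" for r in results]
    weakest = min(results, key=lambda r: r.score)
    lines.append(f"Weakest attribute: {weakest.attribute}")
    return "\n".join(lines)


# Stub standing in for per-attribute model calls (hypothetical data).
canned = {
    "feasibility": (4, "team has shipped similar features"),
    "risk": (2, "migration has no rollback path"),
    "cost": (3, "two extra sprints"),
}
results = mediating_assessments(["feasibility", "risk", "cost"], lambda a: canned[a])
print(chairman_brief(results))
```

The brief deliberately stops short of a verdict: the protocol delays the holistic judgment until every attribute has been assessed, so a strong impression on one attribute cannot color the others.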

For stress-testing a known artifact (PR, draft, spec), use the separate /adversarial-review skill instead — single-critic adversarial probing is the right tool there.

When to Use