council-review
Council Review
Run any question, plan, or code through 5 independent advisors who use distinct reasoning methods, collaborate to refine answers, peer-review each other anonymously, and synthesize a verdict you can trust.
This skill implements the Diverse Multi-Agent Debate (DMAD) pattern. It is collaborative, not adversarial: agents seek truth through diversity of reasoning, not by arguing opposing positions.
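The anonymous peer-review round can be sketched in Python. This is a minimal illustration under stated assumptions, not the skill's actual code. `AdvisorResponse`, `anonymize`, `build_review_prompt`, and the five advisor labels are hypothetical, and the calls to the underlying models are left out. Responses are shuffled and relabeled before any reviewer sees them. A private mapping lets the synthesis step attribute each response afterwards.

```python
import random
from dataclasses import dataclass


@dataclass
class AdvisorResponse:
    advisor: str  # hypothetical advisor label, e.g. the reasoning method it uses
    text: str


def anonymize(responses, seed=None):
    """Shuffle responses and relabel them "Response A", "Response B", ...

    Returns (anonymized, mapping): the list reviewers see, and a private
    label -> advisor mapping kept only for the synthesis step.
    """
    rng = random.Random(seed)
    shuffled = list(responses)  # copy so the caller's list keeps its order
    rng.shuffle(shuffled)
    anonymized, mapping = [], {}
    for index, response in enumerate(shuffled):
        label = f"Response {chr(ord('A') + index)}"
        anonymized.append((label, response.text))
        mapping[label] = response.advisor
    return anonymized, mapping


def build_review_prompt(anonymized):
    """Build a peer-review prompt that never reveals who wrote what."""
    parts = ["Rank these responses by the quality of their reasoning."]
    for label, text in anonymized:
        parts.append(f"{label}:\n{text}")
    return "\n\n".join(parts)


# Usage with five hypothetical reasoning-method labels.
responses = [
    AdvisorResponse("first-principles", "Ship behind a feature flag."),
    AdvisorResponse("analogical", "Mirror last quarter's staged rollout."),
    AdvisorResponse("step-by-step", "Migrate the schema first, then ship."),
    AdvisorResponse("inversion", "Delay until the failure modes are mapped."),
    AdvisorResponse("base-rate", "Most rollouts like this need one rollback."),
]
anonymized, mapping = anonymize(responses, seed=7)
print(build_review_prompt(anonymized))  # mapping stays out of the prompt
```

Seeding the shuffle makes a run reproducible; `seed=None` gives a fresh order each time.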
Why This Works (Research Backing)
- Method diversity beats single-method debate. DMAD (ICLR 2025) shows that agents using distinct reasoning methods reliably outperform homogeneous councils — diverse medium-capacity models can beat GPT-4 on GSM-8K (91% vs 82%) when each agent applies a different reasoning approach.
- Collaborative debate beats adversarial debate. M3MADBench (2026) shows that across all modalities, collaborative DMAD outperforms adversarial Div-MAD "by a substantial margin." Adversarial paradigms introduce divergent noise; for open questions, plans, and decisions, collaborative deliberation is the right tool.
- Anonymous peer review prevents provider bias. This finding is consistent across the literature: reviewers defer to role names when they are visible, so peer-review responses must be anonymized and shuffled.
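- Confidence calibration breaks the martingale ceiling. Vanilla MAD often underperforms a simple majority vote, because without an extra signal the agents' beliefs evolve like a martingale and further rounds do not raise expected accuracy. Confidence-modulated updates ("Demystifying MAD" 2026) systematically shift the council toward correct answers.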
- Adaptive stopping cuts cost. Convergence detection based on the Kolmogorov-Smirnov (KS) statistic (S2 MAD, via llmcouncil) is reported to cut cost by up to 94.5% on convergent questions.
- A true devil's advocate is the only reliable way to induce disagreement (V2). Across techniques for breaking consensus in multi-agent LLM teams, only a dedicated devil's advocate attacking the emerging answer produces genuine disagreement; soft role-framing and "please dissent" instructions perform statistically indistinguishably from baseline. An LLM devil's advocate that challenges the recommendation measurably raises group decision accuracy (OpenReview 2026; IUI 2024). V2 adds this as a mandatory pass against the consensus: one devil's advocate versus the converged answer, not a standing advocates/skeptics split.
- Sycophancy collapses councils into premature consensus (V2). LLMs defer, both to each other and to the answer implied by the framing, which can drop a council below single-agent accuracy (Peacemaker-or-Troublemaker 2026; CONSENSAGENT). V2 adds a sycophancy guardrail to the advisor and peer-review prompts, plus a structured independent-assessment step (Kahneman's Mediating Assessments Protocol, 2019) so the chairman judges key attributes separately before making the holistic call.
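The confidence-weighted aggregation and adaptive stopping described in the list above can be sketched in a few lines. This is a hypothetical illustration, not the method of the cited papers or of llmcouncil. It assumes each advisor reports an answer plus a self-rated confidence in [0, 1]. Votes are weighted by that confidence, a simplified stand-in for confidence-modulated updates. The stop rule compares the advisors' confidence distributions in consecutive rounds against an illustrative `threshold`. All function names and the 0.2 default are assumptions.

```python
from collections import defaultdict


def confidence_weighted_vote(answers):
    """Pick the answer with the highest total confidence.

    answers: list of (answer, confidence) pairs, confidence in [0, 1].
    Unlike a plain majority vote, a confident minority can outweigh a
    hesitant majority. Ties go to the answer seen first.
    """
    if not answers:
        raise ValueError("need at least one answer")
    totals = defaultdict(float)
    for answer, confidence in answers:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range: {confidence}")
        totals[answer] += confidence
    return max(totals, key=totals.get)


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of xs and ys."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    gap = 0.0
    for point in sorted(set(xs) | set(ys)):
        # After these loops, i/n and j/m are the empirical CDFs at point.
        while i < n and xs[i] <= point:
            i += 1
        while j < m and ys[j] <= point:
            j += 1
        # Integer numerator avoids float drift such as 0.8 - 0.6 > 0.2.
        gap = max(gap, abs(i * m - j * n) / (n * m))
    return gap


def should_stop(prev_confidences, curr_confidences, threshold=0.2):
    """Hypothetical stopping rule: end the debate once the advisors'
    confidence distribution barely moves between consecutive rounds.
    With five advisors per round, the statistic moves in steps of 0.2."""
    return ks_statistic(prev_confidences, curr_confidences) <= threshold
```

In this sketch a council that answers ("A", 0.9), ("B", 0.4), ("B", 0.4) settles on "A" even though a plain majority vote would pick "B". Comparing distributions rather than individual scores lets the loop stop when the council as a whole has settled, even if individual advisors still shift slightly.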
For stress-testing a known artifact (PR, draft, spec), use the separate /adversarial-review skill instead — single-critic adversarial probing is the right tool there.
When to Use