speculative-decoding

Originally from davila7/claude-code-templates

Speculative Decoding: Accelerating LLM Inference

When to Use This Skill

Use Speculative Decoding when you need to:

Speed up inference by 1.5-3.6× without quality loss
Reduce latency for real-time applications (chatbots, code generation)
Optimize throughput for high-volume serving
Deploy efficiently on limited hardware
Generate faster without changing model architecture

Key Techniques: Draft model speculative decoding, Medusa (multiple heads), Lookahead Decoding (Jacobi iteration)

Papers: Medusa (arXiv 2401.10774), Lookahead Decoding (ICML 2024), Speculative Decoding Survey (ACL 2024)
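
To make the "without quality loss" claim concrete, here is a minimal sketch of the first technique listed above, draft model speculative decoding, using the standard accept/reject rule. It is a toy under stated assumptions, not the implementation from any of the cited papers or libraries: `target_probs` and `draft_probs` are hypothetical stand-ins for a large and a small model, and the 4-token vocabulary is invented for illustration.

```python
import random

VOCAB = 4  # toy vocabulary size (assumption for illustration)


def target_probs(context):
    """Hypothetical 'large' model: next-token distribution given the context."""
    last = context[-1] if context else 0
    weights = [1.0 + ((last + t) % VOCAB) for t in range(VOCAB)]
    total = sum(weights)
    return [w / total for w in weights]


def draft_probs(context):
    """Hypothetical 'small' model: the target blended with a uniform distribution."""
    return [0.7 * p + 0.3 / VOCAB for p in target_probs(context)]


def sample(weights, rng):
    """Draw one token index in proportion to the given non-negative weights."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def speculative_step(context, k, rng, draft=draft_probs, target=target_probs):
    """Run one draft-then-verify round and return between 1 and k+1 new tokens."""
    # 1. The draft model proposes k tokens autoregressively.
    proposed, q_dists = [], []
    ctx = list(context)
    for _ in range(k):
        q = draft(ctx)
        tok = sample(q, rng)
        proposed.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    # 2. The target model checks each proposal. A real system scores all k
    #    positions in a single forward pass; this toy calls it in a loop.
    accepted = []
    ctx = list(context)
    for tok, q in zip(proposed, q_dists):
        p = target(ctx)
        if rng.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(tok)  # proposal accepted
            ctx.append(tok)
        else:
            # Rejected: resample from the residual max(0, p - q) and stop.
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            accepted.append(sample(residual, rng))
            return accepted

    # 3. Every proposal was accepted: take one extra token from the target.
    accepted.append(sample(target(ctx), rng))
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [0]
    while len(tokens) < 20:
        tokens.extend(speculative_step(tokens, k=4, rng=rng))
    print(tokens)
```

Each round emits at least one token and at most k+1, which is where the speedup comes from when the draft model agrees with the target often. The accept/reject step is what keeps the output distribution identical to sampling from the target model alone.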

Repository

orchestra-resea…h-skills

First Seen

Feb 7, 2026

Security Audits

Gen Agent Trust Hub: Pass

speculative-decoding — orchestra-research/ai-research-skills