speculative-decoding

Speculative Decoding: Accelerating LLM Inference

When to Use This Skill

Use Speculative Decoding when you need to:

Speed up inference by 1.5-3.6× without quality loss
Reduce latency for real-time applications (chatbots, code generation)
Optimize throughput for high-volume serving
Deploy efficiently on limited hardware
Generate faster without changing model architecture

Key Techniques: Draft model speculative decoding, Medusa (multiple heads), Lookahead Decoding (Jacobi iteration)

Papers: Medusa (arXiv 2401.10774), Lookahead Decoding (ICML 2024), Speculative Decoding Survey (ACL 2024)
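
As a rough illustration of the first technique listed above (draft model speculative decoding), here is a minimal greedy sketch. Everything in it is an assumption for illustration: `target_next` and `draft_next` are hypothetical toy stand-ins for a large and a small model, and the verification loop calls the target once per token, where a real system would score all drafted tokens in one batched forward pass. Sampling-based variants use a rejection-sampling acceptance rule rather than the exact-match check shown here.

```python
from typing import Callable, List, Tuple

# A "model" here is just a greedy next-token function over integer token IDs.
NextToken = Callable[[List[int]], int]

VOCAB = 50  # toy vocabulary size (assumption for this sketch)


def target_next(ctx: List[int]) -> int:
    """Hypothetical stand-in for the large, slow target model."""
    return (sum(ctx) * 31 + len(ctx)) % VOCAB


def draft_next(ctx: List[int]) -> int:
    """Hypothetical small draft model: agrees with the target except
    whenever the context length is a multiple of 4."""
    t = target_next(ctx)
    return t if len(ctx) % 4 else (t + 1) % VOCAB


def greedy_decode(next_token: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(next_token(seq))
    return seq


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns the sequence and the number of
    verification rounds (each round models one batched target pass)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        # 1. The draft model cheaply proposes up to k tokens.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))

        # 2. The target verifies the proposal position by position.
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)        # draft token accepted
            else:
                accepted.append(expected)   # replace with the target's token, stop
                break
        else:
            # Every draft token matched: the same pass yields one extra token.
            if len(seq) + len(accepted) < goal:
                accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq, rounds


prompt = [1, 2, 3]
spec_out, rounds = speculative_decode(target_next, draft_next, prompt, n_new=20)
baseline = greedy_decode(target_next, prompt, n_new=20)
print(spec_out == baseline, rounds)
```

Because every emitted token is either confirmed or supplied by the target, the output matches plain greedy decoding from the target alone; the saving comes from needing fewer target rounds than generated tokens when the draft agrees often.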

Repository

davila7/claude-…emplates

First Seen

Jan 21, 2026

speculative-decoding — davila7/claude-code-templates