speculative-decoding

SKILL.md

Speculative Decoding: Accelerating LLM Inference

When to Use This Skill

Use Speculative Decoding when you need to:

Speed up inference by 1.5-3.6× without quality loss
Reduce latency for real-time applications (chatbots, code generation)
Optimize throughput for high-volume serving
Deploy efficiently on limited hardware
Generate faster without changing model architecture
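
As a rough way to judge whether a speedup in this range is plausible for a given setup, the sketch below uses the simplified model common in the speculative decoding literature: each drafted token is assumed to be accepted independently with probability `alpha`, `gamma` tokens are drafted per step, and one draft-model pass costs `cost_ratio` times one target-model pass. All three names are illustrative assumptions, not parameters defined by this skill, and real acceptance rates vary by prompt and model pair.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model pass.

    Assumes each drafted token is accepted independently with
    probability alpha (a simplifying assumption).
    """
    if alpha >= 1.0:
        # Every draft is accepted, plus one bonus token from the target.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Estimated wall-clock speedup over plain autoregressive decoding.

    One speculative step costs gamma draft passes plus one target pass,
    measured in units of a single target pass.
    """
    step_cost = gamma * cost_ratio + 1.0
    return expected_tokens_per_step(alpha, gamma) / step_cost


if __name__ == "__main__":
    # Hypothetical numbers: 80% acceptance, 4 drafted tokens,
    # draft model 20x cheaper than the target.
    print(round(expected_speedup(alpha=0.8, gamma=4, cost_ratio=0.05), 2))
```

Under this model a low acceptance rate or an expensive draft model can push the estimate below 1, meaning speculation would slow decoding down, so it is worth measuring `alpha` on representative prompts before committing to a configuration.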

Key Techniques: draft-model speculative decoding, Medusa (multiple decoding heads), Lookahead Decoding (Jacobi iteration)
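
To make the first technique concrete, here is a minimal sketch of draft-model speculative decoding in its greedy form. It is a toy under stated assumptions, not the implementation used by this skill or by any serving library: `target_next` and `draft_next` are hypothetical callables that return the greedy next token for a prefix, and the verification loop calls the target once per position, whereas a real system scores all drafted positions in a single batched forward pass.

```python
from typing import Callable, List

# Hypothetical interface: maps a token prefix to the greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(next_token: NextToken, prompt: List[int],
                  max_new_tokens: int) -> List[int]:
    """Plain autoregressive greedy decoding, used as the reference."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


def speculative_greedy_decode(target_next: NextToken, draft_next: NextToken,
                              prompt: List[int], max_new_tokens: int,
                              gamma: int = 4) -> List[int]:
    """Greedy speculative decoding with a cheap draft model."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1. Draft gamma tokens autoregressively with the cheap model.
        draft: List[int] = []
        for _ in range(gamma):
            draft.append(draft_next(tokens + draft))

        # 2. Verify: keep drafted tokens while they match the target's
        #    greedy choice. The first mismatch is replaced by the target's
        #    token; if all match, the target contributes one bonus token.
        accepted: List[int] = []
        for i in range(gamma + 1):
            target_tok = target_next(tokens + accepted)
            accepted.append(target_tok)
            if i == gamma or draft[i] != target_tok:
                break
        tokens.extend(accepted)
    # The last step may overshoot the budget, so trim to the limit.
    return tokens[:limit]


if __name__ == "__main__":
    # Toy models over digits: the target counts upward modulo 10,
    # and the draft agrees except after tokens where t % 3 == 2.
    target = lambda seq: (seq[-1] + 1) % 10
    draft = lambda seq: 0 if seq[-1] % 3 == 2 else (seq[-1] + 1) % 10
    out = speculative_greedy_decode(target, draft, [0], 12, gamma=4)
    assert out == greedy_decode(target, [0], 12)
    print(out)
```

Because every emitted token is the target's own greedy choice, the output matches plain greedy decoding with the target model; the draft model affects only how many tokens are produced per verification step. Sampling-based variants replace the exact-match check with a probabilistic acceptance rule, which this sketch does not cover.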

Papers: Medusa (arXiv 2401.10774), Lookahead Decoding (ICML 2024), Speculative Decoding Survey (ACL 2024)

Installs: 7
Repository: firecrawl/ai-re…h-skills
GitHub Stars: 10
First Seen: Mar 28, 2026
Security Audits: Gen Agent Trust Hub (Warn)

speculative-decoding — firecrawl/ai-research-skills