speculative-decoding

Speculative Decoding: Accelerating LLM Inference

When to Use This Skill

Use Speculative Decoding when you need to:

  • Speed up inference by 1.5-3.6× without quality loss
  • Reduce latency for real-time applications (chatbots, code generation)
  • Optimize throughput for high-volume serving
  • Deploy efficiently on limited hardware
  • Generate faster without changing model architecture
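To make the speedup figure above more concrete, here is a rough sketch of how many tokens one target-model pass is expected to yield. It assumes each drafted token is accepted independently with probability `alpha` (a simplification, and real acceptance rates vary by prompt and model pair) and that `gamma` tokens are drafted per step. The function name is hypothetical and does not come from any library.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens produced per target-model verification pass.

    alpha: ASSUMED per-token acceptance probability (treated as i.i.d.)
    gamma: number of tokens the draft model proposes per step
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every draft token is accepted, plus one bonus token from the target.
        return float(gamma + 1)
    # Truncated geometric series: 1 + alpha + alpha^2 + ... + alpha^gamma
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    for alpha in (0.5, 0.7, 0.9):
        print(alpha, round(expected_tokens_per_step(alpha, gamma=4), 3))
```

This counts tokens per target pass only. Wall-clock speedup is lower because the draft model's own cost is not included here.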

Key Techniques: Draft model speculative decoding, Medusa (multiple heads), Lookahead Decoding (Jacobi iteration)
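As an illustration of the first technique, draft model speculative decoding, the following is a minimal sketch of the greedy variant using toy stand-in models. It is not the API of any library: `toy_target`, `toy_draft`, `greedy_decode`, and `speculative_decode` are hypothetical names, and the "models" are plain functions that map a token prefix to a next token. A real system would verify all drafted positions in a single batched forward pass of the target model; the inner loop below only simulates that.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: token prefix -> greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verification passes)."""
    tokens = list(prompt)
    limit = len(prompt) + max_new
    passes = 0
    while len(tokens) < limit:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks the proposal (simulated here position by position).
        passes += 1
        accepted: List[int] = []
        for guess in proposal:
            expected = target(tokens + accepted)
            if guess != expected:
                accepted.append(expected)  # first mismatch: keep the target's token
                break
            accepted.append(guess)
        else:
            # All k guesses matched, so the target contributes one extra token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:limit], passes


def toy_target(prefix: List[int]) -> int:
    """Toy deterministic 'large model' (assumes a non-empty prefix)."""
    return (prefix[-1] * 3 + len(prefix)) % 11


def toy_draft(prefix: List[int]) -> int:
    """Toy 'small model': agrees with the target except at every 4th position."""
    guess = toy_target(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 11


if __name__ == "__main__":
    spec, passes = speculative_decode(toy_target, toy_draft, [1, 2], max_new=20)
    print(spec == greedy_decode(toy_target, [1, 2], 20), passes)
```

Because every kept token is one the target itself would have chosen, the output matches plain greedy decoding while needing fewer target passes. Sampling-based variants use a probabilistic accept/reject rule instead of the exact-match check shown here.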

Papers: Medusa (arXiv 2401.10774), Lookahead Decoding (ICML 2024), Speculative Decoding Survey (ACL 2024)

Installation

# Standard speculative decoding (transformers)

First seen: Jan 20, 2026