speculative-decoding

Installation

SKILL.md

Speculative Decoding: Accelerating LLM Inference

When to Use This Skill

Use Speculative Decoding when you need to:

Speed up inference by 1.5-3.6× without quality loss
Reduce latency for real-time applications (chatbots, code generation)
Optimize throughput for high-volume serving
Deploy efficiently on limited hardware
Generate faster without changing model architecture

Key Techniques: Draft model speculative decoding, Medusa (multiple heads), Lookahead Decoding (Jacobi iteration)

Papers: Medusa (arXiv 2401.10774), Lookahead Decoding (ICML 2024), Speculative Decoding Survey (ACL 2024)

Installation

# Standard speculative decoding (transformers)

Related skills

More from ovachiever/droid-tings

security-auditor
Continuous security vulnerability scanning for OWASP Top 10, common vulnerabilities, and insecure patterns. Use when reviewing code, before deployments, or on file changes. Scans for SQL injection, XSS, secrets exposure, auth issues. Triggers on file changes, security mentions, deployment prep.
752
react-hook-form-zod
|
459
nextjs-shadcn-builder
Build new Next.js applications or migrate existing frontends (React, Vue, Angular, vanilla JS, etc.) to Next.js + shadcn/ui with systematic analysis and conversion. Enforces shadcn design principles - CSS variables for theming, standard UI components, no hardcoded values, consistent typography/colors. Use for creating Next.js apps, migrating frontends, adopting shadcn/ui, or standardizing component libraries. Includes MCP integration for shadcn documentation and automated codebase analysis.
226
deep-reading-analyst
Comprehensive framework for deep analysis of articles, papers, and long-form content using 10+ thinking models (SCQA, 5W2H, critical thinking, inversion, mental models, first principles, systems thinking, six thinking hats). Use when users want to: (1) deeply understand complex articles/content, (2) analyze arguments and identify logical flaws, (3) extract actionable insights from reading materials, (4) create study notes or learning summaries, (5) compare multiple sources, (6) transform knowledge into practical applications, or (7) apply specific thinking frameworks. Triggered by phrases like 'analyze this article,' 'help me understand,' 'deep dive into,' 'extract insights from,' 'use [framework name],' or when users provide URLs/long-form content for analysis.
191
playwright browser automation
Complete browser automation with Playwright. Auto-detects dev servers, writes clean test scripts to /tmp. Test pages, fill forms, take screenshots, check responsive design, validate UX, test login flows, check links, automate any browser task. Use when user wants to test websites, automate browser interactions, validate web functionality, or perform any browser-based testing.
146
threejs-graphics-optimizer
Performance optimization rules for THREE.js and graphics programming. Covers mobile-first optimization, fallback patterns, memory management, render loop efficiency, and general graphics best practices for smooth 60fps experiences across devices.
107

Installs

Repository

ovachiever/droid-tings

GitHub Stars

First Seen

Jan 20, 2026

Security Audits

Gen Agent Trust HubPass

SocketPass

SnykWarn

speculative-decoding

Speculative Decoding: Accelerating LLM Inference

When to Use This Skill

Installation

More from ovachiever/droid-tings

security-auditor

react-hook-form-zod

nextjs-shadcn-builder

deep-reading-analyst

playwright browser automation

threejs-graphics-optimizer