dflash-mlx-speculative-decoding

dflash-mlx Speculative Decoding

Skill by ara.so — Daily 2026 Skills collection.

DFlash implements lossless speculative decoding for MLX on Apple Silicon. A small draft model (~1B parameters) generates a block of 16 tokens in parallel using block diffusion, and the target model verifies all 16 in a single forward pass. Tokens are emitted only after target verification, so the output is lossless: every emitted token is the target model's greedy argmax.
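The draft-then-verify loop can be illustrated with a minimal, library-free sketch. This is not the dflash-mlx implementation: `toy_target`, `toy_draft`, `verify_block`, and `speculative_generate` are hypothetical names, the "models" are deterministic toy functions, and the verification step is written as a Python loop where a real target model would score all 16 positions in one forward pass. Emitting one extra target token per block follows the standard speculative decoding scheme and is an assumption here.

```python
from typing import Callable, List

def toy_target(prefix: List[int]) -> int:
    """Hypothetical stand-in for the target model's greedy next token."""
    return (sum(prefix) * 31 + len(prefix)) % 50

def toy_draft(prefix: List[int], k: int) -> List[int]:
    """Hypothetical draft: mimics the target but is wrong at some positions."""
    seq, out = list(prefix), []
    for _ in range(k):
        tok = toy_target(seq)
        if len(seq) % 7 == 0:          # inject an occasional draft error
            tok = (tok + 1) % 50
        out.append(tok)
        seq.append(tok)
    return out

def verify_block(prefix: List[int], proposal: List[int],
                 target_next: Callable[[List[int]], int]) -> List[int]:
    """Accept drafted tokens while they match the target's greedy choice."""
    emitted: List[int] = []
    for tok in proposal:
        expected = target_next(prefix + emitted)
        if tok != expected:
            emitted.append(expected)   # first mismatch: use the target's token
            return emitted
        emitted.append(tok)
    # Whole block accepted: the target also supplies the next token (assumed).
    emitted.append(target_next(prefix + emitted))
    return emitted

def speculative_generate(prompt, draft, target_next, max_new_tokens, block_size=16):
    out: List[int] = []
    while len(out) < max_new_tokens:
        proposal = draft(prompt + out, block_size)
        out.extend(verify_block(prompt + out, proposal, target_next))
    return out[:max_new_tokens]

def greedy_generate(prompt, target_next, max_new_tokens):
    """Baseline: one target call per token, no draft model."""
    seq = list(prompt)
    for _ in range(max_new_tokens):
        seq.append(target_next(seq))
    return seq[len(prompt):]
```

Because every emitted token is either a drafted token that equals the target's greedy choice or the target's own token, `speculative_generate` returns the same sequence as `greedy_generate` no matter how often the draft is wrong. Draft quality affects only how many tokens each verification pass yields.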

Typical speedups range from 1.7x to 4.1x over baseline mlx_lm, depending on model size and context length. Acceptance rates are around 87–90% for Qwen3.5 models.
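One rough way to read acceptance figures like these is the textbook speculative-decoding estimate of tokens produced per target pass. The sketch below assumes each drafted token is accepted independently with the same probability and that the target contributes one extra token per pass. Both are simplifying assumptions, and the source does not say how its acceptance rate is measured, so this is an illustration rather than a prediction of DFlash's speedup. The function name is hypothetical.

```python
def expected_tokens_per_pass(accept_prob: float, block_size: int) -> float:
    """Expected tokens emitted per target forward pass.

    Assumes independent per-token acceptance with probability accept_prob
    and one extra target token per pass: (1 - p**(k + 1)) / (1 - p).
    """
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must be between 0 and 1")
    if accept_prob == 1.0:
        return float(block_size + 1)   # every drafted token accepted
    return (1.0 - accept_prob ** (block_size + 1)) / (1.0 - accept_prob)

# Example: compare a few acceptance probabilities for 16-token blocks.
for p in (0.5, 0.8, 0.9):
    print(p, round(expected_tokens_per_pass(p, 16), 2))
```

Wall-clock speedup is lower than this token count suggests, since each step also pays for the draft model and for a verification pass over the whole block.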

Installation

pip install dflash-mlx

# or isolated install
pipx install dflash-mlx

Requires Python 3.10+, MLX 0.31.1+, and an Apple Silicon Mac.
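The requirements above can be checked before installing with a short standard-library script. This is a sketch, not part of dflash-mlx: `parse_version` and `check_environment` are hypothetical helper names, and it assumes MLX is installed under the PyPI distribution name `mlx`.

```python
import platform
import sys
from importlib import metadata

def parse_version(text: str) -> tuple:
    """Turn '0.31.1' into (0, 31, 1); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def check_environment() -> dict:
    """Report whether each stated requirement appears to be met."""
    try:
        mlx_version = metadata.version("mlx")   # assumed distribution name
    except metadata.PackageNotFoundError:
        mlx_version = None
    return {
        "python_ok": sys.version_info >= (3, 10),
        # A Python build running under Rosetta reports x86_64 here.
        "apple_silicon": platform.system() == "Darwin"
        and platform.machine() == "arm64",
        "mlx_ok": mlx_version is not None
        and parse_version(mlx_version) >= (0, 31, 1),
    }

if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If `mlx_ok` is reported as missing on an otherwise suitable machine, installing dflash-mlx with pip may pull in a compatible MLX as a dependency, though the source does not confirm this.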

Installs: 264
GitHub Stars: 4
First Seen: Apr 14, 2026