hqq-quantization

HQQ - Half-Quadratic Quantization

Fast, calibration-free weight quantization supporting 8/4/3/2/1-bit precision with multiple optimized backends.

When to use HQQ

Use HQQ when you:

  • Need to quantize a model without calibration data (no dataset required)
  • Need fast quantization (minutes, versus hours for GPTQ or AWQ)
  • Plan to deploy with vLLM or HuggingFace Transformers
  • Want to fine-tune a quantized model with LoRA/PEFT
  • Want to experiment with extreme quantization (2-bit or 1-bit)
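
For the HuggingFace Transformers route, a minimal sketch of on-the-fly HQQ quantization might look like the following. It assumes the `transformers`, `torch`, and `hqq` packages are installed and that a CUDA GPU is available. The helper name `load_hqq_quantized` and the default values are illustrative choices, not part of this skill.

```python
def load_hqq_quantized(model_id: str, nbits: int = 4, group_size: int = 64):
    """Load a causal LM and quantize its linear layers with HQQ while loading.

    Hypothetical helper for illustration. `model_id` is any Hugging Face Hub
    model identifier chosen by the caller.
    """
    # Imported lazily so that defining this helper does not require the
    # heavy dependencies to be installed.
    import torch
    from transformers import AutoModelForCausalLM, HqqConfig

    # No calibration dataset is passed anywhere: only the bit width and
    # the group size are configured.
    quant_config = HqqConfig(nbits=nbits, group_size=group_size)

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="cuda",  # assumption: a CUDA device is present
        quantization_config=quant_config,
    )
    return model
```

Lower `nbits` values shrink the model further at the cost of accuracy, and a smaller `group_size` usually recovers some accuracy in exchange for more per-group metadata.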

Key advantages:

  • No calibration: quantize a model directly from its weights, with no sample data
  • Multiple backends: PyTorch, ATen, TorchAO, Marlin, and BitBLAS for optimized inference
  • Flexible precision: 8/4/3/2/1-bit with configurable group sizes
  • Framework integration: native HuggingFace and vLLM support
  • PEFT compatible: fine-tune quantized models with LoRA
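
To make "configurable group sizes" and "no calibration" concrete, here is a dependency-free sketch of group-wise affine quantization. It uses plain round-to-nearest with a per-group scale and zero point. This is a simplified stand-in, not HQQ's half-quadratic optimization of those parameters, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Group = Tuple[List[int], float, float]  # (integer codes, scale, zero point)


def quantize_group(values: List[float], nbits: int) -> Group:
    """Round-to-nearest affine quantization of one group of weights."""
    levels = (1 << nbits) - 1           # highest integer code, e.g. 15 for 4-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Clamp to guard against floating-point overshoot at the range edges.
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def quantize_weights(weights: List[float], nbits: int = 4,
                     group_size: int = 64) -> List[Group]:
    """Quantize a flat weight list group by group, using only the weights."""
    return [
        quantize_group(weights[start:start + group_size], nbits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize_weights(groups: List[Group]) -> List[float]:
    """Reconstruct approximate weights from codes, scales, and zero points."""
    restored: List[float] = []
    for codes, scale, zero in groups:
        restored.extend(c * scale + zero for c in codes)
    return restored


if __name__ == "__main__":
    # Synthetic weights standing in for one row of a weight matrix.
    weights = [math.sin(i) for i in range(256)]
    for nbits in (8, 4, 3, 2, 1):
        groups = quantize_weights(weights, nbits=nbits, group_size=64)
        restored = dequantize_weights(groups)
        max_err = max(abs(a - b) for a, b in zip(weights, restored))
        print(f"{nbits}-bit: max reconstruction error = {max_err:.4f}")
```

Each group stores its own scale and zero point, so smaller groups track the local weight range more closely but add storage overhead. The reconstruction error of any weight is bounded by half of its group's scale.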

First Seen
Jan 21, 2026