awq-quantization


AWQ (Activation-aware Weight Quantization)

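AWQ is a 4-bit weight quantization method that protects the most salient weights, identified from activation patterns rather than from the weights alone. It achieves roughly 2.5-3x inference speedup over FP16 with minimal accuracy loss.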
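The idea of preserving salient weights can be illustrated with a toy sketch in plain Python. It is not the AutoAWQ or llm-awq API; every function name below is hypothetical, and the fixed exponent `alpha=0.5` is an assumption made for illustration (the actual method searches for a good exponent on calibration data). The sketch scales each input channel by its activation magnitude before rounding to 4 bits, then folds the inverse scale back, so channels that see large activations lose less precision.

```python
# Toy illustration of activation-aware scaling before 4-bit quantization.
# All names are hypothetical; this is not a real library API.

def quantize_4bit(row):
    """Symmetric round-to-nearest quantization of one weight row to ints in [-8, 7]."""
    max_abs = max(abs(w) for w in row) or 1.0
    step = max_abs / 7.0
    q = [max(-8, min(7, round(w / step))) for w in row]
    return q, step

def awq_style_quantize(row, act_magnitude, alpha=0.5):
    """Scale channels by act_magnitude**alpha, quantize, then undo the scale."""
    scales = [a ** alpha for a in act_magnitude]
    scaled = [w * s for w, s in zip(row, scales)]
    q, step = quantize_4bit(scaled)
    # Dividing by the scale makes the result comparable with the original weights.
    return [v * step / s for v, s in zip(q, scales)]

def output_error(row, row_hat, act):
    """Absolute error of the dot product row . act caused by quantization."""
    exact = sum(w * a for w, a in zip(row, act))
    approx = sum(w * a for w, a in zip(row_hat, act))
    return abs(exact - approx)

# Channel 0 has a small weight but a large typical activation, so it is "salient".
weights = [0.07, 1.0, -0.6, 0.3]
act_magnitude = [8.0, 0.5, 0.5, 0.5]

q, step = quantize_4bit(weights)
plain = [v * step for v in q]
aware = awq_style_quantize(weights, act_magnitude)

plain_err = output_error(weights, plain, act_magnitude)
aware_err = output_error(weights, aware, act_magnitude)
print(f"plain 4-bit error: {plain_err:.4f}")
print(f"activation-aware error: {aware_err:.4f}")
```

In this example plain rounding sends the salient weight to zero, while the scaled version keeps it close to its original value, so the output error drops sharply. A real implementation works per group of weights on full matrices and uses calibration activations.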

When to use AWQ

Use AWQ when:

  • Need 4-bit quantization with <5% accuracy loss
  • Deploying instruction-tuned or chat models (AWQ tends to generalize better than GPTQ on these)
  • Want ~2.5-3x inference speedup over FP16
  • Using vLLM for production serving
  • Have Ampere or newer GPUs (A100, H100, RTX 40xx) for Marlin kernel support

Use GPTQ instead when:

  • Need maximum ecosystem compatibility (more tools support GPTQ)
  • Working with ExLlamaV2 backend specifically
  • Have older GPUs without Marlin support
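The hardware criterion in the two lists above can be sketched as a small helper. The function name is hypothetical, and it assumes that "Ampere or newer" corresponds to CUDA compute capability 8.0 and above (A100 is 8.0, RTX 40xx is 8.9, H100 is 9.0). It covers only the AWQ versus GPTQ choice and ignores the accuracy and serving-stack considerations.

```python
# Hypothetical helper encoding the AWQ-vs-GPTQ guidance above.
# Assumption: Marlin kernel support starts at compute capability 8.0 (Ampere).

def suggest_format(compute_capability, needs_exllamav2=False):
    """Return "awq" or "gptq" for a GPU given as a (major, minor) tuple."""
    if needs_exllamav2:
        # The guidance above ties the ExLlamaV2 backend to GPTQ.
        return "gptq"
    if tuple(compute_capability) >= (8, 0):
        return "awq"
    # Older GPUs without Marlin support fall back to GPTQ.
    return "gptq"

print(suggest_format((8, 0)))   # A100
print(suggest_format((7, 5)))   # T4, pre-Ampere
```

With PyTorch installed, `torch.cuda.get_device_capability()` returns a tuple in this shape, so its result could be passed in directly.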

Use bitsandbytes instead when:

First Seen
Jan 21, 2026