awq-quantization

Originally fromzechenzhangagi/ai-research-skills

Installation

SKILL.md

AWQ (Activation-aware Weight Quantization)

4-bit quantization that preserves salient weights based on activation patterns, achieving 3x speedup with minimal accuracy loss.

When to use AWQ

Use AWQ when:

Need 4-bit quantization with <5% accuracy loss
Deploying instruction-tuned or chat models (AWQ generalizes better)
Want ~2.5-3x inference speedup over FP16
Using vLLM for production serving
Have Ampere+ GPUs (A100, H100, RTX 40xx) for Marlin kernel support

Use GPTQ instead when:

Need maximum ecosystem compatibility (more tools support GPTQ)
Working with ExLlamaV2 backend specifically
Have older GPUs without Marlin support

Installs

365

Repository

orchestra-resea…h-skills

GitHub Stars

10.5K

First Seen

Feb 7, 2026

Security Audits

Gen Agent Trust HubFail

awq-quantization — orchestra-research/ai-research-skills