awq-quantization
AWQ (Activation-aware Weight Quantization)
4-bit weight quantization that protects the most salient weight channels identified from activation statistics, achieving roughly 3x inference speedup over FP16 with minimal accuracy loss.
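A minimal quantization sketch using the AutoAWQ package; the source checkpoint, output path, and quant settings below are illustrative assumptions, not prescriptions:

```python
# Sketch: 4-bit AWQ quantization with AutoAWQ (assumed installed via `pip install autoawq`).
# Model paths and quant settings are illustrative only.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-hf"   # assumed source checkpoint
quant_path = "llama-2-7b-awq"             # output directory for the quantized model

quant_config = {
    "zero_point": True,    # asymmetric (zero-point) quantization
    "q_group_size": 128,   # weights quantized in groups of 128
    "w_bit": 4,            # 4-bit weights
    "version": "GEMM",     # kernel variant
}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Calibration passes a small dataset through the model to find activation-salient
# channels, which AWQ protects via per-channel scaling before 4-bit packing.
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```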
When to use AWQ
Use AWQ when:
- Need 4-bit quantization with <5% accuracy loss
- Deploying instruction-tuned or chat models (AWQ generalizes better)
- Want ~2.5-3x inference speedup over FP16
- Using vLLM for production serving (a loading sketch follows these lists)
- Have Ampere+ GPUs (A100, H100, RTX 40xx) for Marlin kernel support
Use GPTQ instead when:
- Need maximum ecosystem compatibility (more tools support GPTQ)
- Working with ExLlamaV2 backend specifically
- Have older GPUs without Marlin support
Use bitsandbytes instead when:
- Need on-the-fly quantization at load time without a calibration step
- Fine-tuning with QLoRA (4-bit NF4 base weights)
- Prioritizing convenience over peak inference speed
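As the serving-side counterpart, here is a minimal sketch of loading a pre-quantized AWQ checkpoint in vLLM; the model ID is an assumption, and any AWQ checkpoint or local quantized directory would work:

```python
# Sketch: serving an AWQ checkpoint with vLLM. The model ID below is an
# assumed pre-quantized checkpoint; substitute your own AWQ model directory.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Llama-2-7B-AWQ",  # assumed AWQ-quantized checkpoint
    quantization="awq",               # tell vLLM the weights are AWQ-quantized
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain activation-aware weight quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```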