gptq

Installation
SKILL.md

GPTQ (Generative Pre-trained Transformer Quantization)

Post-training quantization method that compresses LLMs to 4-bit with minimal accuracy loss using group-wise quantization.

When to use GPTQ

Use GPTQ when:

  • Need to fit large models (70B+) on limited GPU memory
  • Want 4× memory reduction with <2% accuracy loss
  • Deploying on consumer GPUs (RTX 4090, 3090)
  • Need faster inference (3-4× speedup vs FP16)

Use AWQ instead when:

  • Need slightly better accuracy (<1% loss)
  • Have newer GPUs (Ampere, Ada)
  • Want Marlin kernel support (2× faster on some GPUs)

Use bitsandbytes instead when:

  • Need simple integration with transformers
Related skills

More from ovachiever/droid-tings

Installs
43
GitHub Stars
43
First Seen
Jan 20, 2026