GPTQ (Generative Pre-trained Transformer Quantization)
Post-training quantization method that compresses LLM weights to 4-bit with minimal accuracy loss, using approximate second-order (Hessian) information and group-wise quantization.
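A minimal one-shot quantization sketch via the transformers GPTQConfig integration (backed by optimum/AutoGPTQ); the model id and output directory are placeholders, and calibration uses the built-in "c4" dataset option:

```python
# GPTQ quantization sketch via transformers' GPTQConfig.
# Assumes `optimum` and `auto-gptq` are installed; model id is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder: any causal LM
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit with group size 128 is the common GPTQ configuration;
# "c4" selects the built-in calibration dataset.
quant_config = GPTQConfig(bits=4, group_size=128, dataset="c4", tokenizer=tokenizer)

# Quantization runs during from_pretrained and needs a GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config,
)

model.save_pretrained("llama-2-7b-gptq-4bit")  # placeholder output dir
tokenizer.save_pretrained("llama-2-7b-gptq-4bit")
```

Quantization is one-shot: the saved directory can afterwards be loaded like any other checkpoint.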
When to use GPTQ
Use GPTQ when:
- Need to fit large models (70B+) on limited GPU memory
- Want 4× memory reduction with <2% accuracy loss
- Deploying on consumer GPUs (RTX 4090, 3090)
- Need faster inference (3-4× speedup vs FP16); see the loading sketch after this list
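Most deployments skip calibration entirely and load a checkpoint someone has already quantized. A minimal loading sketch, assuming a GPTQ checkpoint published on the Hugging Face Hub (the model id below is one such example); transformers reads the quantization config from the repo, so no extra setup is needed:

```python
# Load a pre-quantized GPTQ checkpoint on a single consumer GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-7B-GPTQ"  # example GPTQ checkpoint on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",        # places the 4-bit weights on available GPUs
    torch_dtype=torch.float16,
)

inputs = tokenizer("GPTQ compresses weights to", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```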
Use AWQ instead when:
- Need slightly better accuracy (<1% loss)
- Have newer GPUs (Ampere, Ada)
- Want Marlin kernel support (2× faster on some GPUs)
Use bitsandbytes instead when:
- Need simple integration with transformers
- Want on-the-fly quantization at load time, with no calibration pass
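For comparison, a sketch of that simpler bitsandbytes path: quantization happens on the fly as weights are loaded, so there is no calibration step and no pre-quantized checkpoint required (the model id is again a placeholder):

```python
# bitsandbytes 4-bit loading: no calibration pass, quantized at load time.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 over 4-bit weights
)

# Placeholder model id: any full-precision causal LM works here.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    device_map="auto",
    quantization_config=bnb_config,
)
```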