knowledge-distillation

Originally from ovachiever/droid-tings

Knowledge Distillation: Compressing LLMs

When to Use This Skill

Use Knowledge Distillation when you need to:

  • Compress models from 70B → 7B while retaining 90%+ performance
  • Transfer capabilities from proprietary models (GPT-4) to open-source (LLaMA, Mistral)
  • Reduce inference costs by deploying smaller student models
  • Create specialized models by distilling domain-specific knowledge
  • Improve small models using synthetic data from large teachers

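The last point above, improving small models with teacher-generated synthetic data, is often called response (or sequence-level) distillation: rather than matching logits, the student is fine-tuned on text the teacher produces. A minimal sketch of the data-collection step, where `teacher_generate` is a hypothetical stand-in for a call to a large model or API:

```python
# Response distillation sketch: build a supervised fine-tuning dataset
# from teacher completions. `teacher_generate` is a placeholder; in
# practice it would wrap an API call or a local model's generate().

def teacher_generate(prompt: str) -> str:
    # Placeholder teacher completion (assumption, not a real model call)
    return prompt + " ... teacher completion"

def build_distillation_dataset(prompts):
    # Collect (prompt, response) pairs; the student is then fine-tuned
    # on these with ordinary next-token cross-entropy.
    return [{"prompt": p, "response": teacher_generate(p)} for p in prompts]

dataset = build_distillation_dataset(
    ["Explain KV caching.", "What is LoRA?"]
)
```

The resulting pairs feed a standard supervised fine-tuning loop; no access to teacher logits is required, which is why this approach also works with closed API-only teachers.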
Key Techniques: Temperature scaling, soft targets, reverse KLD (MiniLLM), logit distillation, response distillation
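The first three techniques can be sketched as loss functions. A minimal PyTorch version, assuming classification-style logits (the hyperparameters `T` and `alpha` are illustrative defaults, not values from the papers):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: teacher distribution softened by temperature T
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # Forward KL(teacher || student); the T^2 factor keeps gradient
    # magnitudes comparable across temperatures (Hinton et al. 2015)
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * T * T
    # Standard cross-entropy on the hard labels
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

def reverse_kl(student_logits, teacher_logits, T=1.0):
    # Reverse KL(student || teacher), the MiniLLM objective: mode-seeking,
    # so the student concentrates on high-probability teacher outputs
    # instead of spreading mass over the whole teacher distribution.
    log_s = F.log_softmax(student_logits / T, dim=-1)
    log_t = F.log_softmax(teacher_logits / T, dim=-1)
    return (log_s.exp() * (log_s - log_t)).sum(dim=-1).mean()
```

Forward KL is the classic soft-target loss; MiniLLM argues reverse KL suits generative LLMs better because averaging over teacher modes produces degenerate text.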

Papers: Hinton et al. 2015 (arXiv:1503.02531), MiniLLM (arXiv:2306.08543), KD Survey (arXiv:2402.13116)

Installation

```bash
# Standard transformers stack (exact command elided in the original;
# this is the assumed standard install)
pip install transformers torch
```