ML Inference Optimization

When to Use This Skill

Use this skill when:

  • Optimizing ML inference latency
  • Reducing model size for deployment
  • Implementing model compression techniques
  • Designing inference caching strategies
  • Deploying models at the edge
  • Balancing accuracy vs. latency trade-offs
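One of the strategies listed above, inference result caching, can be sketched in a few lines. The sketch below is illustrative and framework-agnostic: the `InferenceCache` class name, the JSON/SHA-256 key scheme, and the `max_entries` default are all assumptions, not part of any particular serving stack.

```python
import hashlib
import json
from collections import OrderedDict

class InferenceCache:
    """LRU cache keyed by a hash of the serialized model input.

    Illustrative sketch: the key scheme (JSON + SHA-256) and the
    eviction policy (LRU via OrderedDict) are design choices, not
    taken from any specific serving framework.
    """

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self._store = OrderedDict()

    def _key(self, inputs):
        payload = json.dumps(inputs, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def get_or_run(self, inputs, run_model):
        key = self._key(inputs)
        if key in self._store:
            self._store.move_to_end(key)    # mark as most recently used
            return self._store[key]
        result = run_model(inputs)          # cache miss: run inference
        self._store[key] = result
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False) # evict least recently used
        return result
```

In practice the win comes from repeated or near-duplicate requests (e.g. popular prompts); for continuous-valued inputs a cache like this only helps if inputs are bucketed or normalized before hashing.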

Keywords: inference optimization, latency, model compression, distillation, pruning, quantization, caching, edge ML, TensorRT, ONNX, model serving, batching, hardware acceleration
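Several of these keywords (quantization, TensorRT, ONNX) rest on the same arithmetic: mapping floats to a small integer range and back. A minimal sketch of standard affine (asymmetric) int8 quantization in pure Python, with the scale/zero-point derivation written out; this is the textbook scheme, not the exact implementation of any one toolkit.

```python
def quantize_int8(values):
    """Affine int8 quantization: q = round(x / scale) + zero_point,
    clamped to [-128, 127]. The range is widened to include 0.0 so
    that zero is exactly representable (a standard requirement).
    """
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / 255.0 or 1.0            # avoid div-by-zero for constant input
    zero_point = round(-lo / scale) - 128
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Inverse mapping: x ≈ (q - zero_point) * scale."""
    return [(qi - zero_point) * scale for qi in q]
```

The round trip loses at most half a quantization step (`scale / 2`) per value, which is the accuracy-vs-size trade-off the bullet list above refers to: 4x smaller weights in exchange for bounded per-value error.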

Inference Optimization Overview

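A core server-side technique in this space is dynamic batching: collect incoming requests until either a batch-size cap or a latency deadline is hit, then run them through the model together. The sketch below is a hypothetical standalone version; the function name and the `max_batch_size`/`max_wait_s` defaults are illustrative (servers such as Triton implement this natively with their own knobs).

```python
import queue
import time

def batch_requests(request_queue, max_batch_size=8, max_wait_s=0.01):
    """Collect requests into one batch. Flush when the batch is full,
    or when max_wait_s has elapsed since the first request arrived.
    Blocks until at least one request is available.
    """
    batch = []
    deadline = None
    while len(batch) < max_batch_size:
        timeout = None if deadline is None else max(0.0, deadline - time.monotonic())
        try:
            item = request_queue.get(timeout=timeout)
        except queue.Empty:
            break                                   # deadline hit: flush partial batch
        batch.append(item)
        if deadline is None:
            deadline = time.monotonic() + max_wait_s  # clock starts at first request
        request_queue.task_done()
    return batch
```

The trade-off is explicit in the two parameters: a larger `max_batch_size` improves throughput on batch-friendly hardware, while `max_wait_s` bounds the extra latency any single request can pay for the privilege.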