LLM Inference Batching Scheduler

This skill provides guidance for solving LLM inference batching and scheduling optimization problems, where requests must be grouped into batches while minimizing cost, padding waste, and latency.
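
To make the problem concrete, the objects involved can be modeled roughly as below. This is a minimal sketch, not an API defined by this skill: the `Request` and `Batch` field names and the `padding_waste` helper are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Request:
    id: str
    num_tokens: int      # prompt length in tokens
    arrival_time: float  # arrival timestamp in seconds

@dataclass
class Batch:
    shape: tuple[int, int]   # (batch_size, padded_sequence_length)
    requests: list[Request]

    def padding_waste(self) -> int:
        """Token slots the shape pays for but the requests do not use."""
        batch_size, seq_len = self.shape
        return batch_size * seq_len - sum(r.num_tokens for r in self.requests)
```

The scheduler's job is then to choose a small set of shapes and assign each request to a batch so that total cost, summed `padding_waste`, and per-request latency all stay within their limits.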

Problem Understanding

Before implementation, thoroughly analyze the problem structure:

Constraint Analysis

  1. Identify all hard constraints - Extract exact limits for:
     • Maximum unique shapes allowed (e.g., ≤ 8 shapes across all buckets)
     • Latency thresholds (P95, P99)
     • Cost budget thresholds
     • Padding ratio limits
  2. Compute hard bounds early - Before coding, calculate (see the sketch after this list):
     • Minimum possible padding from alignment requirements
     • Minimum number of batches required for coverage
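
A minimal sketch of the bounds in step 2, assuming sequences must be padded up to a multiple of an alignment value and batches have a maximum size. The function names and the `align=128` / `max_batch_size=32` defaults are illustrative assumptions, not values fixed by this skill.

```python
import math

def min_alignment_padding(token_lengths: list[int], align: int = 128) -> int:
    """Lower bound on padded tokens: even with one bucket per request,
    each sequence is rounded up to the next multiple of `align`."""
    return sum(math.ceil(n / align) * align - n for n in token_lengths)

def min_batches(num_requests: int, max_batch_size: int = 32) -> int:
    """Lower bound on batch count needed to cover all requests,
    ignoring shape compatibility."""
    return math.ceil(num_requests / max_batch_size)

# Example: requests of 100, 120, and 500 tokens pad to 128, 128, and 512.
lengths = [100, 120, 500]
assert min_alignment_padding(lengths) == 28 + 8 + 12
assert min_batches(len(lengths), max_batch_size=2) == 2  # ceil(3 / 2)
```

Bounds like these serve as early feasibility checks: if the minimum achievable padding already exceeds the padding ratio limit, no bucketing scheme can satisfy the constraint, and the shape budget or alignment must be revisited before any scheduling code is written.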