LLM Inference Batching Scheduler

This skill provides guidance for solving LLM inference batching and scheduling optimization problems, where requests must be grouped into batches while minimizing cost, padding waste, and latency.
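
To make the problem concrete, the objects involved can be modeled roughly as below. This is a minimal sketch, not an API defined by this skill: the `Request` and `Batch` field names and the `padding_waste` helper are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Request:
    id: str
    num_tokens: int      # prompt length in tokens
    arrival_time: float  # arrival timestamp in seconds

@dataclass
class Batch:
    shape: tuple[int, int]   # (batch_size, padded_sequence_length)
    requests: list[Request]

    def padding_waste(self) -> int:
        """Token slots the shape pays for but the requests do not use."""
        batch_size, seq_len = self.shape
        return batch_size * seq_len - sum(r.num_tokens for r in self.requests)
```

The scheduler's job is then to choose a small set of shapes and assign each request to a batch so that total cost, summed `padding_waste`, and per-request latency all stay within their limits.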

Problem Understanding

Before implementation, thoroughly analyze the problem structure:

Constraint Analysis

  1. Identify all hard constraints - Extract exact limits for:
     • Maximum unique shapes allowed (e.g., ≤ 8 shapes across all buckets)
     • Latency thresholds (P95, P99)
     • Cost budget thresholds
     • Padding ratio limits
  2. Compute hard bounds early - Before coding, calculate (see the sketch after this list):
     • Minimum possible padding from alignment requirements
     • Minimum number of batches required for coverage
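
A minimal sketch of the bounds in step 2, assuming sequences must be padded up to a multiple of an alignment value and batches have a maximum size. The function names and the `align=128` / `max_batch_size=32` defaults are illustrative assumptions, not values fixed by this skill.

```python
import math

def min_alignment_padding(token_lengths: list[int], align: int = 128) -> int:
    """Lower bound on padded tokens: even with one bucket per request,
    each sequence is rounded up to the next multiple of `align`."""
    return sum(math.ceil(n / align) * align - n for n in token_lengths)

def min_batches(num_requests: int, max_batch_size: int = 32) -> int:
    """Lower bound on batch count needed to cover all requests,
    ignoring shape compatibility."""
    return math.ceil(num_requests / max_batch_size)

# Example: requests of 100, 120, and 500 tokens pad to 128, 128, and 512.
lengths = [100, 120, 500]
assert min_alignment_padding(lengths) == 28 + 8 + 12
assert min_batches(len(lengths), max_batch_size=2) == 2  # ceil(3 / 2)
```

Bounds like these serve as early feasibility checks: if the minimum achievable padding already exceeds the padding ratio limit, no bucketing scheme can satisfy the constraint, and the shape budget or alignment must be revisited before any scheduling code is written.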