cost-aware-llm-pipeline

Summary

Intelligent model routing, budget tracking, and retry logic to optimize LLM API costs without sacrificing quality.

  • Routes requests to cheaper models (Haiku) for simple tasks and expensive models (Sonnet, Opus) only when complexity thresholds are met, reducing spend by 3–19x on routine work
  • Tracks cumulative API costs with immutable dataclasses, enforces budget limits, and fails early to prevent overspend
  • Implements narrow retry logic that retries only on transient errors (network, rate limit, server errors) and fails immediately on permanent failures (auth, validation)
  • Caches long system prompts using Claude's prompt caching feature to reduce token usage and latency on repeated requests
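Two of the patterns above, budget tracking with immutable dataclasses and narrow retry, can be sketched as follows. This is a minimal illustration, not this skill's actual API: the names (`Budget`, `call_with_retry`), the error classes, and the backoff parameters are assumptions.

```python
import time
from dataclasses import dataclass, replace


class BudgetExceededError(Exception):
    """Raised *before* a call that would push cumulative spend past the limit."""


@dataclass(frozen=True)
class Budget:
    """Immutable spend record: charging returns a new Budget rather than mutating."""
    limit_usd: float
    spent_usd: float = 0.0

    def charge(self, cost_usd: float) -> "Budget":
        new_total = self.spent_usd + cost_usd
        # Fail early: refuse the call instead of discovering overspend afterwards.
        if new_total > self.limit_usd:
            raise BudgetExceededError(
                f"${new_total:.4f} would exceed the ${self.limit_usd:.2f} limit"
            )
        return replace(self, spent_usd=new_total)


# Stand-in error taxonomy (assumed): real code would map SDK exceptions
# such as network, rate-limit, and 5xx errors onto the transient side.
class TransientError(Exception): ...
class PermanentError(Exception): ...


def call_with_retry(fn, max_attempts: int = 3, base_delay: float = 0.01):
    """Retry only transient failures, with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
        # PermanentError (auth, validation) is deliberately not caught:
        # it propagates immediately rather than wasting retries.
```

Keeping `Budget` frozen means each charge produces an auditable new value, so concurrent or speculative calls cannot silently corrupt the running total.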
SKILL.md

Cost-Aware LLM Pipeline

Patterns for controlling LLM API costs while maintaining quality. Combines model routing, budget tracking, retry logic, and prompt caching into a composable pipeline.

When to Activate

  • Building applications that call LLM APIs (Claude, GPT, etc.)
  • Processing batches of items with varying complexity
  • Staying within a budget for API spend
  • Optimizing cost without sacrificing quality on complex tasks

Core Concepts

1. Model Routing by Task Complexity

Automatically select cheaper models for simple tasks, reserving expensive models for complex ones.

MODEL_SONNET = "claude-sonnet-4-6"
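The snippet above is cut off; a minimal sketch of the routing idea might continue as below. The Haiku and Opus model IDs, the complexity heuristic, and the threshold values are illustrative assumptions, not values taken from this skill.

```python
# MODEL_SONNET mirrors the constant above; the other two IDs are placeholders.
MODEL_SONNET = "claude-sonnet-4-6"
MODEL_HAIKU = "claude-haiku-4-6"   # assumed ID
MODEL_OPUS = "claude-opus-4-6"     # assumed ID


def estimate_complexity(task: str) -> float:
    """Crude heuristic in [0, 1]: longer, more structured prompts score higher."""
    length_score = min(len(task) / 2000, 1.0)
    structure_score = min(task.count("\n") / 20, 1.0)
    return 0.7 * length_score + 0.3 * structure_score


def route_model(task: str, low: float = 0.3, high: float = 0.7) -> str:
    """Cheapest model for simple tasks; escalate only past complexity thresholds."""
    score = estimate_complexity(task)
    if score < low:
        return MODEL_HAIKU
    if score < high:
        return MODEL_SONNET
    return MODEL_OPUS


print(route_model("Summarize this sentence."))  # short task -> claude-haiku-4-6
```

In practice the heuristic would be replaced by whatever complexity signal the pipeline already has (input length, tool count, a classifier), but the routing shape stays the same: a monotone score compared against two thresholds.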
Installs: 3.7K
GitHub Stars: 179.7K
First Seen: Feb 17, 2026