cost-aware-llm-pipeline

Summary

Intelligent model routing, budget tracking, and retry logic to optimize LLM API costs without sacrificing quality.

  • Routes requests to cheaper models (Haiku) for simple tasks and expensive models (Sonnet, Opus) only when complexity thresholds are met, reducing spend by 3–19x on routine work
  • Tracks cumulative API costs with immutable dataclasses, enforces budget limits, and fails early to prevent overspend
  • Implements narrow retry logic that retries only on transient errors (network, rate limit, server errors) and fails immediately on permanent failures (auth, validation)
  • Caches long system prompts using Claude's prompt caching feature to reduce token usage and latency on repeated requests
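
As a rough illustration of the budget-tracking bullet above, the following minimal sketch uses a frozen dataclass so that each recorded cost yields a new tracker rather than mutating the old one. The names `CostTracker`, `BudgetExceededError`, `record`, and `remaining_usd` are hypothetical and chosen for this example; they are not taken from the skill itself.

```python
from dataclasses import dataclass, replace


class BudgetExceededError(RuntimeError):
    """Raised when recording a cost would push spend past the budget."""


@dataclass(frozen=True)
class CostTracker:
    """Immutable running total of API spend (hypothetical helper)."""

    budget_usd: float
    spent_usd: float = 0.0

    @property
    def remaining_usd(self) -> float:
        return self.budget_usd - self.spent_usd

    def record(self, cost_usd: float) -> "CostTracker":
        """Return a new tracker that includes cost_usd.

        Raises BudgetExceededError instead of returning a tracker
        whose total would exceed the budget.
        """
        if cost_usd < 0:
            raise ValueError("cost_usd must be non-negative")
        new_total = self.spent_usd + cost_usd
        if new_total > self.budget_usd:
            raise BudgetExceededError(
                f"spend {new_total:.4f} would exceed budget {self.budget_usd:.4f}"
            )
        # frozen=True forbids assignment, so build a fresh instance.
        return replace(self, spent_usd=new_total)
```

To fail early, one option is to call `record` with an estimated cost before issuing the request, so an over-budget call is rejected before any tokens are billed.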
SKILL.md

Cost-Aware LLM Pipeline

Patterns for controlling LLM API costs while maintaining quality. Combines model routing, budget tracking, retry logic, and prompt caching into a composable pipeline.
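
The retry piece of such a pipeline might look like the sketch below. It assumes two hypothetical exception classes, `TransientError` and `PermanentError`, standing in for whatever your SDK raises for network, rate-limit, and server errors versus auth and validation errors. The function name `call_with_retry` and its parameters are likewise illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for network, rate-limit, or server errors (assumption)."""


class PermanentError(Exception):
    """Stand-in for auth or validation errors (assumption)."""


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying only on TransientError with exponential backoff.

    Any other exception, including PermanentError, is not caught and
    therefore propagates on the first failure.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last transient error
            sleep(base_delay * 2 ** (attempt - 1))  # 1x, 2x, 4x, ...
    raise AssertionError("unreachable")
```

Catching only the transient class keeps the retry narrow: a bad API key or malformed request fails on the first attempt instead of burning time and budget on calls that cannot succeed.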

When to Activate

  • Building applications that call LLM APIs (Claude, GPT, etc.)
  • Processing batches of items with varying complexity
  • Staying within a fixed budget for API spend
  • Optimizing cost without sacrificing quality on complex tasks

Core Concepts

1. Model Routing by Task Complexity

Automatically select cheaper models for simple tasks, reserving expensive models for complex ones.
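
A minimal sketch of this kind of routing is shown below. The complexity signal (prompt length plus an explicit flag), the threshold value, and the tier labels are all assumptions made for illustration. A real router would map the labels to actual model identifiers and tune the thresholds against its own workload.

```python
# Tier labels only; these are placeholders, not real API model IDs.
MODEL_HAIKU = "haiku"
MODEL_SONNET = "sonnet"
MODEL_OPUS = "opus"


def select_model(
    prompt: str,
    needs_deep_reasoning: bool = False,
    sonnet_min_chars: int = 2000,  # hypothetical threshold
) -> str:
    """Pick the cheapest tier that the task's complexity allows.

    - Tasks flagged as needing deep reasoning go to the top tier.
    - Long prompts (>= sonnet_min_chars) go to the middle tier.
    - Everything else goes to the cheapest tier.
    """
    if needs_deep_reasoning:
        return MODEL_OPUS
    if len(prompt) >= sonnet_min_chars:
        return MODEL_SONNET
    return MODEL_HAIKU
```

For example, `select_model("Classify this ticket as bug or feature")` returns the cheapest tier, while passing `needs_deep_reasoning=True` escalates regardless of prompt length.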

Installs: 3.8K
GitHub Stars: 181.5K
First Seen: Feb 17, 2026