cost-aware-llm-pipeline

Summary

Intelligent model routing, budget tracking, and retry logic to optimize LLM API costs without sacrificing quality.

  • Routes requests to cheaper models (Haiku) for simple tasks and expensive models (Sonnet, Opus) only when complexity thresholds are met, reducing spend by 3–19x on routine work
  • Tracks cumulative API costs with immutable dataclasses, enforces budget limits, and fails early to prevent overspend
  • Implements narrow retry logic that retries only on transient errors (network, rate limit, server errors) and fails immediately on permanent failures (auth, validation)
  • Caches long system prompts using Claude's prompt caching feature to reduce token usage and latency on repeated requests
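Two of the patterns above, budget tracking with immutable dataclasses and narrow retry, can be sketched as follows. This is a minimal illustration, not this skill's actual API: the names (`Budget`, `call_with_retry`), the error classes, and the backoff parameters are assumptions.

```python
import time
from dataclasses import dataclass, replace


class BudgetExceededError(Exception):
    """Raised *before* a call that would push cumulative spend past the limit."""


@dataclass(frozen=True)
class Budget:
    """Immutable spend record: charging returns a new Budget rather than mutating."""
    limit_usd: float
    spent_usd: float = 0.0

    def charge(self, cost_usd: float) -> "Budget":
        new_total = self.spent_usd + cost_usd
        # Fail early: refuse the call instead of discovering overspend afterwards.
        if new_total > self.limit_usd:
            raise BudgetExceededError(
                f"${new_total:.4f} would exceed the ${self.limit_usd:.2f} limit"
            )
        return replace(self, spent_usd=new_total)


# Stand-in error taxonomy (assumed): real code would map SDK exceptions
# such as network, rate-limit, and 5xx errors onto the transient side.
class TransientError(Exception): ...
class PermanentError(Exception): ...


def call_with_retry(fn, max_attempts: int = 3, base_delay: float = 0.01):
    """Retry only transient failures, with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
        # PermanentError (auth, validation) is deliberately not caught:
        # it propagates immediately rather than wasting retries.
```

Keeping `Budget` frozen means each charge produces an auditable new value, so concurrent or speculative calls cannot silently corrupt the running total.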
SKILL.md

Cost-Aware LLM Pipeline

Patterns for controlling LLM API costs while maintaining quality. Combines model routing, budget tracking, retry logic, and prompt caching into a composable pipeline.

When to Activate

  • Building applications that call LLM APIs (Claude, GPT, etc.)
  • Processing batches of items with varying complexity
  • Staying within a budget for API spend
  • Optimizing cost without sacrificing quality on complex tasks

Core Concepts

1. Model Routing by Task Complexity

Automatically select cheaper models for simple tasks, reserving expensive models for complex ones.

MODEL_SONNET = "claude-sonnet-4-6"
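The snippet above is cut off; a minimal sketch of the routing idea might continue as below. The Haiku and Opus model IDs, the complexity heuristic, and the threshold values are illustrative assumptions, not values taken from this skill.

```python
# MODEL_SONNET mirrors the constant above; the other two IDs are placeholders.
MODEL_SONNET = "claude-sonnet-4-6"
MODEL_HAIKU = "claude-haiku-4-6"   # assumed ID
MODEL_OPUS = "claude-opus-4-6"     # assumed ID


def estimate_complexity(task: str) -> float:
    """Crude heuristic in [0, 1]: longer, more structured prompts score higher."""
    length_score = min(len(task) / 2000, 1.0)
    structure_score = min(task.count("\n") / 20, 1.0)
    return 0.7 * length_score + 0.3 * structure_score


def route_model(task: str, low: float = 0.3, high: float = 0.7) -> str:
    """Cheapest model for simple tasks; escalate only past complexity thresholds."""
    score = estimate_complexity(task)
    if score < low:
        return MODEL_HAIKU
    if score < high:
        return MODEL_SONNET
    return MODEL_OPUS


print(route_model("Summarize this sentence."))  # short task -> claude-haiku-4-6
```

In practice the heuristic would be replaced by whatever complexity signal the pipeline already has (input length, tool count, a classifier), but the routing shape stays the same: a monotone score compared against two thresholds.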
Installs: 3.7K
GitHub Stars: 179.7K
First Seen: Feb 17, 2026