cost-aware-llm-pipeline
Intelligent model routing, budget tracking, and retry logic to optimize LLM API costs without sacrificing quality.
- Routes requests to cheaper models (Haiku) for simple tasks and expensive models (Sonnet, Opus) only when complexity thresholds are met, reducing spend by 3–19x on routine work
- Tracks cumulative API costs with immutable dataclasses, enforces budget limits, and fails early to prevent overspend
- Implements narrow retry logic that retries only on transient errors (network, rate limit, server errors) and fails immediately on permanent failures (auth, validation)
- Caches long system prompts using Claude's prompt caching feature to reduce token usage and latency on repeated requests
Cost-Aware LLM Pipeline
Patterns for controlling LLM API costs while maintaining quality. Combines model routing, budget tracking, retry logic, and prompt caching into a composable pipeline.
When to Activate
- Building applications that call LLM APIs (Claude, GPT, etc.)
- Processing batches of items with varying complexity
- Need to stay within a budget for API spend
- Optimizing cost without sacrificing quality on complex tasks
Core Concepts
1. Model Routing by Task Complexity
Automatically select cheaper models for simple tasks, reserving expensive models for complex ones.
MODEL_SONNET = "claude-sonnet-4-6"
More from affaan-m/everything-claude-code
security-review
Use this skill when adding authentication, handling user input, working with secrets, creating API endpoints, or implementing payment/sensitive features. Provides comprehensive security checklist and patterns.
7.9Kgolang-patterns
Idiomatic Go patterns, best practices, and conventions for building robust, efficient, and maintainable Go applications.
7.4Kcoding-standards
Baseline cross-project coding conventions for naming, readability, immutability, and code-quality review. Use detailed frontend or backend skills for framework-specific patterns.
6.7Kfrontend-patterns
Frontend development patterns for React, Next.js, state management, performance optimization, and UI best practices.
6.6Kbackend-patterns
Backend architecture patterns, API design, database optimization, and server-side best practices for Node.js, Express, and Next.js API routes.
6.6Kgolang-testing
Go testing patterns including table-driven tests, subtests, benchmarks, fuzzing, and test coverage. Follows TDD methodology with idiomatic Go practices.
6.1K