cost-aware-llm-pipeline
Intelligent model routing, budget tracking, and retry logic to optimize LLM API costs without sacrificing quality.
- Routes requests to cheaper models (Haiku) for simple tasks and expensive models (Sonnet, Opus) only when complexity thresholds are met, reducing spend by 3–19x on routine work
- Tracks cumulative API costs with immutable dataclasses, enforces budget limits, and fails early to prevent overspend
- Implements narrow retry logic that retries only on transient errors (network, rate limit, server errors) and fails immediately on permanent failures (auth, validation)
- Caches long system prompts using Claude's prompt caching feature to reduce token usage and latency on repeated requests
Cost-Aware LLM Pipeline
Patterns for controlling LLM API costs while maintaining quality. Combines model routing, budget tracking, retry logic, and prompt caching into a composable pipeline.
When to Activate
- Building applications that call LLM APIs (Claude, GPT, etc.)
- Processing batches of items with varying complexity
- Need to stay within a budget for API spend
- Optimizing cost without sacrificing quality on complex tasks
Core Concepts
1. Model Routing by Task Complexity
Automatically select cheaper models for simple tasks, reserving expensive models for complex ones.
More from affaan-m/everything-claude-code
security-review
Use this skill when adding authentication, handling user input, working with secrets, creating API endpoints, or implementing payment/sensitive features. Provides comprehensive security checklist and patterns.
8.1Kgolang-patterns
Idiomatic Go patterns, best practices, and conventions for building robust, efficient, and maintainable Go applications.
7.6Kcoding-standards
Baseline cross-project coding conventions for naming, readability, immutability, and code-quality review. Use detailed frontend or backend skills for framework-specific patterns.
6.8Kfrontend-patterns
Frontend development patterns for React, Next.js, state management, performance optimization, and UI best practices.
6.8Kbackend-patterns
Backend architecture patterns, API design, database optimization, and server-side best practices for Node.js, Express, and Next.js API routes.
6.7Kgolang-testing
Go testing patterns including table-driven tests, subtests, benchmarks, fuzzing, and test coverage. Follows TDD methodology with idiomatic Go practices.
6.3K