Groq Cost Tuning
Overview
Reduce Groq inference costs by routing each use case to the cheapest model that meets its quality bar and by controlling token volume. Groq's per-token prices are low (Llama 3.1 8B at ~$0.05/M tokens, Llama 3.3 70B at ~$0.59/M tokens, Mixtral at ~$0.24/M tokens), but its high throughput (500+ tokens/sec) makes it easy to burn through large token volumes quickly.
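A back-of-envelope estimate shows how volume, not unit price, tends to drive the bill. The sketch below treats the approximate prices quoted above as flat per-million-token rates, which is a simplifying assumption (actual pricing may differ for input and output tokens). The helper name `estimateMonthlyCost`, the short model keys, and the workload figures are hypothetical.

```typescript
// Approximate prices in USD per 1M tokens, copied from the overview.
// Assumption: one blended rate per model; check current Groq pricing before relying on these.
const PRICE_PER_1M_TOKENS: Record<string, number> = {
  "llama-3.1-8b": 0.05,
  "llama-3.3-70b": 0.59,
  "mixtral": 0.24,
};

// Hypothetical helper: estimated spend for a steady workload over `days` days.
function estimateMonthlyCost(
  model: string,
  tokensPerRequest: number,
  requestsPerDay: number,
  days: number = 30,
): number {
  const price = PRICE_PER_1M_TOKENS[model];
  if (price === undefined) {
    throw new Error(`Unknown model: ${model}`);
  }
  const totalTokens = tokensPerRequest * requestsPerDay * days;
  return (totalTokens / 1_000_000) * price;
}

// Illustrative workload: 2,000 tokens per request, 100,000 requests per day.
// That is 6,000M tokens over 30 days.
console.log(estimateMonthlyCost("llama-3.1-8b", 2000, 100_000).toFixed(2));  // prints "300.00"
console.log(estimateMonthlyCost("llama-3.3-70b", 2000, 100_000).toFixed(2)); // prints "3540.00"
```

Under these assumed numbers, the same traffic costs roughly twelve times more on the 70B model, which is the gap that per-task routing tries to avoid paying where it is not needed.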
Prerequisites
- Groq Cloud account with billing dashboard access
- Understanding of which use cases need which model quality
- Application-level request routing capability
Instructions
Step 1: Implement Smart Model Routing
// Route each request to the cheapest model that meets its quality requirements.
// costPer1MTokens is the approximate price in USD per 1M tokens.
const MODEL_ROUTING: Record<string, { model: string; costPer1MTokens: number }> = {
  'classification':   { model: 'llama-3.1-8b-instant',    costPer1MTokens: 0.05 },
  'summarization':    { model: 'llama-3.1-8b-instant',    costPer1MTokens: 0.05 },
  'code-review':      { model: 'llama-3.3-70b-versatile', costPer1MTokens: 0.59 },
  'creative-writing': { model: 'llama-3.3-70b-versatile', costPer1MTokens: 0.59 },
};
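One way such a routing table might be consumed is sketched below. It is self-contained, so it repeats the four routes listed above under a separate name. The names `ROUTES`, `selectRoute`, and `estimateRequestCost` are hypothetical, and falling back to the cheapest model for unknown task types is an assumption, not something the original specifies; a quality-sensitive application might prefer to fall back to the larger model instead.

```typescript
type Route = { model: string; costPer1MTokens: number };

// Same four routes as the table above, repeated so this sketch runs on its own.
const ROUTES: Record<string, Route> = {
  "classification":   { model: "llama-3.1-8b-instant",    costPer1MTokens: 0.05 },
  "summarization":    { model: "llama-3.1-8b-instant",    costPer1MTokens: 0.05 },
  "code-review":      { model: "llama-3.3-70b-versatile", costPer1MTokens: 0.59 },
  "creative-writing": { model: "llama-3.3-70b-versatile", costPer1MTokens: 0.59 },
};

// Assumption: unrecognized task types use the cheapest model.
const FALLBACK_ROUTE: Route = { model: "llama-3.1-8b-instant", costPer1MTokens: 0.05 };

// Look up the route for a task type, falling back when the type is unknown.
function selectRoute(taskType: string): Route {
  return ROUTES[taskType] ?? FALLBACK_ROUTE;
}

// Estimated cost in USD of a single request that uses `totalTokens` tokens.
function estimateRequestCost(taskType: string, totalTokens: number): number {
  return (totalTokens / 1_000_000) * selectRoute(taskType).costPer1MTokens;
}

console.log(selectRoute("code-review").model);  // prints "llama-3.3-70b-versatile"
console.log(selectRoute("unknown-task").model); // prints "llama-3.1-8b-instant"
```

The selected `model` string would then be passed as the model parameter of the chat completion request, so the routing decision stays in one place rather than being scattered across call sites.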