mistral-performance-tuning
Mistral AI Performance Tuning
Overview
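Optimize Mistral AI API response times and throughput. The key levers are model selection (Mistral Small at roughly 200 ms time to first token (TTFT) versus roughly 500 ms for Large), prompt length (fewer input and output tokens mean faster responses), streaming (lower perceived latency, since text appears as it is generated), caching (repeated requests are served locally without an API call), and concurrent request management.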
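To make the caching lever concrete, here is a minimal sketch of an in-memory response cache keyed on the model name and the message list. The names `cache_key` and `CachedCompleter` are hypothetical, and the `complete` callable stands in for whatever function your integration uses to call the Mistral API; nothing here is part of an official SDK. It assumes identical requests should return identical answers, which in practice is only reasonable for deterministic settings such as temperature 0.

```python
import hashlib
import json
from typing import Callable, Dict, List

Message = Dict[str, str]


def cache_key(model: str, messages: List[Message]) -> str:
    """Build a stable key from the model name and the full message list."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedCompleter:
    """Wrap a completion function so repeated requests skip the API call.

    `complete` is an assumption: any callable taking (model, messages) and
    returning the response text, e.g. a thin wrapper around your API client.
    """

    def __init__(self, complete: Callable[[str, List[Message]], str]) -> None:
        self._complete = complete
        self._cache: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, model: str, messages: List[Message]) -> str:
        key = cache_key(model, messages)
        if key in self._cache:
            # Served from memory: no network round trip, no tokens billed.
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        result = self._complete(model, messages)
        self._cache[key] = result
        return result
```

The cache is unbounded and lives in a single process; a production setup would likely add a size limit, an expiry time, or a shared store, depending on how often prompts repeat.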
Prerequisites
- Mistral API integration in production
- Understanding of RPM/TPM limits for your tier
- Application architecture supporting streaming
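Concurrent request management and the RPM/TPM limits interact: sending every request at once is the quickest way to hit a rate limit. One common approach, sketched here under assumptions, is to cap the number of in-flight requests with an `asyncio.Semaphore`. The helper name `run_bounded` is hypothetical, each job is assumed to be a zero-argument coroutine function that performs one API call, and the right value for `max_in_flight` depends on your tier's limits and typical request duration.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def run_bounded(
    jobs: List[Callable[[], Awaitable[T]]],
    max_in_flight: int,
) -> List[T]:
    """Run all jobs concurrently, but never more than `max_in_flight` at once.

    Results are returned in the same order as `jobs`.
    """
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        # Each job waits here until one of the slots is free.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

A semaphore bounds concurrency, not requests per minute, so it is a coarse guard rather than a full rate limiter; retrying with backoff on rate-limit responses is still worth having alongside it.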