High Concurrency & Scalability

When to Use

Choose or refactor concurrency models (threads, async/await, actors, coroutines) to meet throughput and latency targets
Reduce lock contention and design low-contention, lock-free, or partitioned data paths
Size connection pools, file descriptors, thread pools, and memory limits per dependency
Design caching layers, TTL strategy, and stampede / thundering-herd mitigation
Plan horizontal scaling, load balancing, session affinity, and stateless vs sticky tradeoffs
Apply backpressure, bounded queues, rate limiting, and bulkheads under overload
Scale the data layer (read replicas, query routing, sharding design, connection-pool tuning, hot-key mitigation)
Profile bottlenecks, model capacity, and tie scaling triggers to SLOs and error budgets
Define autoscaling signals, warm pools, and cold-start vs cost tradeoffs
Architect multi-region read paths and CDN/edge caching at a design level
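For the pool-sizing item above, one common starting point is Little's Law (L = λ·W): the average number of connections in use equals the rate at which requests acquire a connection times the mean time each one holds it. The sketch below applies that and then splits a backend's connection limit across application instances. The function names, the 1.5x headroom factor, and the `reserved` parameter are hypothetical illustrations, not values from any particular database or pool library.

```python
import math


def estimate_pool_size(
    requests_per_second: float,
    hold_time_seconds: float,
    headroom: float = 1.5,  # hypothetical burst/variance margin, tune from measurements
    minimum: int = 1,
) -> int:
    """Little's Law: average connections in use = arrival rate * mean hold time."""
    if requests_per_second < 0 or hold_time_seconds < 0:
        raise ValueError("rate and hold time must be non-negative")
    average_in_use = requests_per_second * hold_time_seconds
    # Round up so fractional demand still gets a whole connection.
    return max(minimum, math.ceil(average_in_use * headroom))


def per_instance_cap(backend_max_connections: int, instances: int, reserved: int = 0) -> int:
    """Divide a backend's connection limit evenly across app instances.

    `reserved` keeps some connections free for admin tools, migrations, etc.
    """
    if instances <= 0:
        raise ValueError("instances must be positive")
    return max(0, (backend_max_connections - reserved) // instances)


if __name__ == "__main__":
    wanted = estimate_pool_size(100, 0.25)              # 100 req/s, 250 ms hold
    cap = per_instance_cap(100, instances=4, reserved=10)
    print(wanted, cap)
```

If the per-instance estimate exceeds the cap, adding pool slots will not help. The options are shorter hold times, more read replicas, or sharding, which is why pool sizing is listed per dependency rather than globally.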
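For the stampede / thundering-herd item above, one widely used mitigation is "single-flight" loading: when many callers miss the same expired key at once, only one of them (the leader) calls the backend, and the rest wait for its result. The minimal thread-based sketch below assumes a synchronous `loader` function and a fixed TTL. `SingleFlightCache` and its parameters are hypothetical names for illustration, not a specific library's API.

```python
import threading
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class SingleFlightCache:
    """TTL cache where concurrent misses on one key trigger a single load."""

    def __init__(
        self,
        loader: Callable[[Hashable], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        # key -> (value, expires_at)
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}
        # key -> Event set when the in-flight load for that key finishes
        self._inflight: Dict[Hashable, threading.Event] = {}

    def get(self, key: Hashable) -> Any:
        while True:
            with self._lock:
                entry = self._entries.get(key)
                if entry is not None and entry[1] > self._clock():
                    return entry[0]  # fresh hit, no backend call
                event = self._inflight.get(key)
                is_leader = event is None
                if is_leader:
                    event = threading.Event()
                    self._inflight[key] = event
            if not is_leader:
                # Another caller is already loading this key: wait, then re-check.
                event.wait()
                continue
            try:
                value = self._loader(key)  # only the leader reaches the backend
                with self._lock:
                    self._entries[key] = (value, self._clock() + self._ttl)
                return value
            finally:
                # Wake followers whether the load succeeded or raised.
                with self._lock:
                    del self._inflight[key]
                event.set()
```

In this sketch, if the leader's load raises, waiting callers loop and one of them becomes the next leader, so failures are retried rather than shared. Adding random jitter to the TTL is a complementary way to keep many keys from expiring at the same moment.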
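For the backpressure and bounded-queue item above, the core idea is that admission fails fast once a fixed capacity is reached, instead of letting an unbounded queue grow memory and latency without limit. The sketch below uses the standard library's `queue.Queue` with `put_nowait`, which raises `queue.Full` when the queue is at `maxsize`. `BoundedWorkQueue`, `try_submit`, and `take` are hypothetical names chosen for illustration.

```python
import queue
from typing import Any


class BoundedWorkQueue:
    """Admits work only while there is room; otherwise rejects it (load shedding)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._q: "queue.Queue[Any]" = queue.Queue(maxsize=capacity)
        self.rejected = 0  # count of shed submissions, useful as an overload metric

    def try_submit(self, item: Any) -> bool:
        """Enqueue without blocking; return False if the queue is full."""
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def take(self) -> Any:
        """Remove the oldest item without blocking; raises queue.Empty if none."""
        return self._q.get_nowait()


if __name__ == "__main__":
    wq = BoundedWorkQueue(capacity=2)
    print([wq.try_submit(x) for x in ("a", "b", "c")], wq.rejected)
```

A rejected submission is the signal to push back upstream, for example by returning HTTP 429 or 503 or asking the caller to retry with backoff. Blocking `put()` with a timeout is the alternative when slowing producers down is preferable to dropping work.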