qdrant-scaling-qps

Scaling for Query Throughput (QPS)

Throughput scaling means handling more parallel queries per second. This is different from latency tuning: throughput and latency pull in opposite tuning directions and cannot both be optimized on the same node.

High throughput favors fewer, larger segments, so each query pays less per-segment overhead.

Performance Tuning for Higher RPS

Use fewer, larger segments (default_segment_number: 2). See: Maximizing throughput
Enable quantization with always_ram=true to reduce disk IO. See: Quantization
Use the batch search API to amortize per-request overhead. See: Batch search
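
As a rough illustration of the three settings above, the sketch below builds the JSON bodies for Qdrant's REST API without sending any request. The vector size, the distance metric, and the choice of int8 scalar quantization are assumptions made for the example; the source only specifies default_segment_number: 2 and always_ram=true. The helper names collection_config and batch_query_body are hypothetical.

```python
import json


def collection_config(vector_size: int, distance: str = "Cosine") -> dict:
    """Body for PUT /collections/{collection_name}, tuned for throughput.

    vector_size, distance and the int8 scalar quantizer are illustrative
    assumptions; only default_segment_number and always_ram come from the text.
    """
    return {
        "vectors": {"size": vector_size, "distance": distance},
        # Fewer, larger segments: less per-segment overhead per query.
        "optimizers_config": {"default_segment_number": 2},
        # Keep quantized vectors in RAM to cut disk IO during search.
        "quantization_config": {
            "scalar": {"type": "int8", "always_ram": True},
        },
    }


def batch_query_body(vectors: list, limit: int = 10) -> dict:
    """Body for POST /collections/{collection_name}/points/query/batch.

    One HTTP round trip carries several queries, amortizing request overhead.
    """
    return {"searches": [{"query": v, "limit": limit} for v in vectors]}


if __name__ == "__main__":
    print(json.dumps(collection_config(4), indent=2))
    queries = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
    print(json.dumps(batch_query_body(queries, limit=3), indent=2))
```

The printed bodies can be sent with any HTTP client; check them against the API reference for the Qdrant version in use before relying on them.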

Minimize impact of Update Workloads

Configure update throughput control (v1.17+) so that searches over not-yet-optimized data do not degrade reads. See: Low latency search
Set optimizer_cpu_budget to limit the CPUs used for indexing (e.g. 2 on an 8-CPU node leaves 6 for queries).
Configure delayed read fan-out (v1.17+) to reduce tail latency. See: Delayed fan-outs
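
The optimizer_cpu_budget arithmetic above can be sketched as a small helper. It assumes the semantics described in Qdrant's configuration comments: a positive value is the exact number of CPUs given to optimization, a negative value is subtracted from the available CPUs, and 0 lets Qdrant choose automatically. The function name query_cpus is hypothetical, and the result is only a planning estimate, not a hard partition enforced by the server.

```python
from typing import Optional


def query_cpus(total_cpus: int, optimizer_cpu_budget: int) -> Optional[int]:
    """Estimate CPUs left for queries under a given optimizer_cpu_budget.

    Assumed semantics (verify against your Qdrant version):
      > 0  exact number of CPUs used for optimization
      < 0  total CPUs minus this many are used for optimization
      == 0 automatic selection, so no estimate is returned
    """
    if total_cpus < 1:
        raise ValueError("total_cpus must be at least 1")
    if optimizer_cpu_budget == 0:
        return None  # auto mode: the split is decided by Qdrant
    if optimizer_cpu_budget > 0:
        optimizer = min(optimizer_cpu_budget, total_cpus)
    else:
        # Negative budget: leave abs(budget) CPUs free, keep at least 1 for indexing.
        optimizer = max(total_cpus + optimizer_cpu_budget, 1)
    return total_cpus - optimizer


if __name__ == "__main__":
    # The example from the text: budget 2 on an 8-CPU node.
    print(query_cpus(8, 2))
```

In a self-hosted deployment this value typically lives under storage.performance in the Qdrant configuration file.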

First Seen

Apr 12, 2026

qdrant-scaling-qps — qdrant/skills