qdrant-minimize-latency

Scaling for Query Latency

The latency of a single query is determined by the slowest component in the query execution path. It is sometimes correlated with throughput, but not always: the two are tuned in opposite directions.

Low-latency optimization aims to put as many resources as possible to work on a single query, while throughput optimization aims to minimize per-query resource usage so that more queries can run in parallel.

Performance Tuning for Lower Latency

  • Increase segment count to match the number of CPU cores (for example, default_segment_number: 16 on a 16-core machine); see "Minimizing latency"
  • Keep quantized vectors and HNSW in RAM (always_ram=true)
  • Reduce hnsw_ef at query time to trade recall for speed; see "Search params"
  • Use local NVMe, avoid network-attached storage
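
The settings above can be sketched as request bodies for Qdrant's REST API. This is a minimal illustration, not a definitive configuration: the helper names `collection_config` and `query_body` are hypothetical, the vector size, distance metric, and `hnsw_ef` values are placeholder assumptions, and the bodies are assumed to target the collection-creation and point-query endpoints. It uses only the Python standard library and builds the JSON without sending it.

```python
import json
import os
from typing import List, Optional


def collection_config(vector_size: int, cores: Optional[int] = None) -> dict:
    """Build a collection-creation body tuned for single-query latency.

    Hypothetical helper: the field names follow Qdrant's REST schema,
    the concrete values are assumptions for illustration.
    """
    # One segment per core, so a single query can be searched in parallel.
    cores = cores or os.cpu_count() or 1
    return {
        "vectors": {"size": vector_size, "distance": "Cosine"},
        "optimizers_config": {"default_segment_number": cores},
        # Keep quantized vectors in RAM.
        "quantization_config": {
            "scalar": {"type": "int8", "always_ram": True}
        },
        # Keep the HNSW graph in RAM rather than on disk.
        "hnsw_config": {"on_disk": False},
    }


def query_body(vector: List[float], limit: int = 10, hnsw_ef: int = 64) -> dict:
    """Build a query body; a lower hnsw_ef trades recall for speed."""
    return {"query": vector, "limit": limit, "params": {"hnsw_ef": hnsw_ef}}


if __name__ == "__main__":
    print(json.dumps(collection_config(768, cores=16), indent=2))
    print(json.dumps(query_body([0.1, 0.2, 0.3, 0.4], hnsw_ef=32)))
```

A reasonable way to pick `hnsw_ef` is to lower it step by step while measuring recall on a held-out query set, and stop before recall drops below what the application can tolerate.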

Memory Pressure and Latency

RAM is the most critical resource for latency. If the working set exceeds available RAM, the OS evicts pages from its cache and reads fall through to disk, causing severe, sustained latency degradation.
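
A rough way to check whether a collection is likely to fit in RAM is sketched below. It assumes the common rule of thumb of `vectors × dimensions × 4 bytes × 1.5` for float32 vectors plus index overhead. The function names, the 1.5 overhead factor, and the 80% headroom threshold are illustrative assumptions, not measured values, so treat the result as a first estimate and verify against real memory usage.

```python
def estimate_ram_bytes(
    num_vectors: int,
    dim: int,
    bytes_per_component: int = 4,  # float32; use 1 for int8-quantized vectors
    overhead: float = 1.5,         # assumed factor for index and metadata
) -> int:
    """Estimate the RAM needed to hold the vectors and their index."""
    return int(num_vectors * dim * bytes_per_component * overhead)


def fits_in_ram(required_bytes: int, available_bytes: int,
                headroom: float = 0.8) -> bool:
    """Return True if the working set fits within a fraction of RAM.

    The headroom leaves space for the OS and other processes
    (0.8 is an assumed, adjustable threshold).
    """
    return required_bytes <= available_bytes * headroom


if __name__ == "__main__":
    need = estimate_ram_bytes(1_000_000, 768)
    print(f"estimated working set: {need / 1024**3:.2f} GiB")
    print("fits in 8 GiB:", fits_in_ram(need, 8 * 1024**3))
```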

Repository
qdrant/skills
First Seen
Apr 12, 2026