qdrant-horizontal-scaling
What to Do When Qdrant Needs More Capacity
Scale vertically first: operations are simpler, there is no network overhead, and a single node is good up to ~100M vectors depending on dimensions and quantization. Scale horizontally when data exceeds single-node capacity, you need fault tolerance, you need to isolate tenants, or the workload is IOPS-bound (more nodes = more independent IOPS).
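To see why the per-node ceiling depends on dimensions and quantization, here is a rough back-of-the-envelope sketch. The function name `estimate_ram_gib`, the 768-dimension example, and the 1.5x overhead factor are assumptions for illustration (the factor is a common rule of thumb for index and metadata overhead, not a guarantee), and the int8 case assumes only the quantized vectors are held in RAM.

```python
def estimate_ram_gib(num_vectors: int, dimensions: int,
                     bytes_per_component: float = 4.0,
                     overhead_factor: float = 1.5) -> float:
    """Rough RAM estimate in GiB for keeping vectors in memory.

    bytes_per_component: 4.0 for float32, 1.0 for int8 scalar quantization.
    overhead_factor: assumed multiplier for index and metadata overhead.
    """
    raw_bytes = num_vectors * dimensions * bytes_per_component
    return raw_bytes * overhead_factor / 1024**3


# Hypothetical workload: 100M vectors with 768 dimensions.
full = estimate_ram_gib(100_000_000, 768)
quantized = estimate_ram_gib(100_000_000, 768, bytes_per_component=1.0)
print(f"float32: {full:.0f} GiB, int8: {quantized:.0f} GiB")
```

If the estimate fits comfortably on one machine you can provision, vertical scaling is usually the simpler path; if it does not, that is a signal to plan for sharding.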
Most basic distributed configuration
- 3 nodes, 3 shards with replication_factor: 2 for zero-downtime scaling
A minimum of 3 nodes is important for consensus and fault tolerance. With 3 nodes, you can lose 1 node without downtime because the remaining 2 still form a majority. With 2 nodes, losing 1 node leaves no majority and causes downtime for collection operations.
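The node-count reasoning can be checked with a small calculation. This is a minimal sketch assuming a majority-quorum consensus protocol such as Raft; the helper names `raft_quorum` and `tolerable_failures` are made up for this example.

```python
def raft_quorum(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerable_failures(nodes: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return nodes - raft_quorum(nodes)


for n in (1, 2, 3, 5):
    print(f"nodes={n} quorum={raft_quorum(n)} "
          f"can_lose={tolerable_failures(n)}")
```

Note that a 2-node cluster tolerates no more failures than a single node, which is why 3 is the practical minimum.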
A replication factor of 2 means each shard has one additional replica, so there are 2 copies of the data. This allows for zero-downtime scaling and maintenance. With replication_factor: 1, zero downtime is not guaranteed even for point-level operations, and cluster maintenance requires downtime.
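As an illustration, the basic configuration above could be expressed as the request body for Qdrant's collection-creation endpoint (`PUT /collections/{collection_name}`). This sketch only builds and prints the request; the collection name `documents`, the vector size of 768, the `Cosine` distance, and the helper `collection_request` are placeholder assumptions, not values from this guide.

```python
import json


def collection_request(name: str, vector_size: int,
                       shard_number: int = 3,
                       replication_factor: int = 2):
    """Build the path and JSON body for creating a sharded, replicated collection."""
    path = f"/collections/{name}"
    body = {
        # Placeholder vector settings; adjust to your embedding model.
        "vectors": {"size": vector_size, "distance": "Cosine"},
        # 3 shards spread across the 3 nodes.
        "shard_number": shard_number,
        # 2 copies of every shard, so one node can go down.
        "replication_factor": replication_factor,
    }
    return path, body


path, body = collection_request("documents", 768)
print("PUT", path)
print(json.dumps(body, indent=2))
```

Sending this body with any HTTP client against a running cluster should create the collection; shard_number is best decided up front, as discussed in the next section.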
Choosing number of shards
Shards are the unit of data distribution. More shards allow more nodes and better distribution, but add overhead. Fewer shards reduce overhead but limit horizontal scaling.
For a cluster of 3-6 nodes, the recommended shard count is 6-12. This allows for 2-4 shards per node, which balances distribution and overhead.
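The arithmetic behind this recommendation can be sketched as follows. The helpers `shards_per_node` and `even_node_counts` are hypothetical names for this example; the second one lists the cluster sizes across which a given shard count divides evenly, which is one reasonable way to compare candidate shard counts, not an official sizing rule.

```python
def shards_per_node(shard_number: int, nodes: int,
                    replication_factor: int = 1) -> float:
    """Average number of shard replicas each node hosts."""
    return shard_number * replication_factor / nodes


def even_node_counts(shard_number: int, max_nodes: int) -> list[int]:
    """Cluster sizes (up to max_nodes) that split the shards evenly."""
    return [n for n in range(1, max_nodes + 1) if shard_number % n == 0]


for shards in (6, 12):
    print(f"{shards} shards on 3 nodes: {shards_per_node(shards, 3):.0f} per node; "
          f"even splits at node counts {even_node_counts(shards, 6)}")
```

Keep in mind that with replication_factor: 2 each node hosts twice as many shard replicas as the unreplicated figure suggests.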