Qdrant Driver Report (Jan 2026) | qail-qdrant and official client

Single-query ratio: 1.17x (140us vs 164us)
Pool ratio: 1.46x (16.2ms vs 23.6ms)
HTTP/2 batch ratio: 4.00x (4.8ms vs 19.0ms)

Single-Query Search

1,000 sequential searches on localhost.

Driver	Latency/query	Throughput	Relative to official client
qail-qdrant gRPC	140.3us	7,126 ops/s	1.17x
Official client	164.0us	6,096 ops/s	baseline
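
The throughput and ratio columns follow directly from the latency column. As a rough check, the sketch below recomputes them from the two reported latencies. The helper names `ops_per_sec` and `speedup` are illustrative and not part of qail-qdrant. Because the published latencies are rounded, the recomputed throughput can differ from the table by an operation or two per second.

```rust
/// Convert a mean per-query latency in microseconds to operations per second.
fn ops_per_sec(latency_us: f64) -> f64 {
    1_000_000.0 / latency_us
}

/// Speedup of a candidate over a baseline, both given as latencies (lower is better).
fn speedup(baseline_us: f64, candidate_us: f64) -> f64 {
    baseline_us / candidate_us
}

fn main() {
    // Latencies copied from the table above.
    let qail_us = 140.3;
    let official_us = 164.0;

    println!("qail-qdrant gRPC: {:.0} ops/s", ops_per_sec(qail_us));
    println!("official client:  {:.0} ops/s", ops_per_sec(official_us));
    println!("ratio:            {:.2}x", speedup(official_us, qail_us));
}
```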

Throughput Comparison

Single-query search throughput in operations per second. Higher is better.

Chart: single-query throughput. Official client 6.1K ops/s, qail-qdrant gRPC 7.1K ops/s (1.17×).

Implementation notes

Buffer pooling uses .split() rather than .clone() on the hot path.
The transport path talks to h2 directly rather than routing through a heavier wrapper.
Protobuf tags are pre-computed before the request loop.
Vector copies were reduced to a single memcpy for the 1536-float case.
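
To make the last two notes concrete, here is a minimal std-only sketch of what a pre-computed protobuf tag and a single bulk copy of a float vector can look like. It is not the qail-qdrant source: the field number `VECTOR_FIELD` and the function names are assumptions made for illustration. The protobuf rules it relies on are standard: a tag is `(field_number << 3) | wire_type`, and a packed `repeated float` is a length-delimited run of little-endian 32-bit values.

```rust
/// Protobuf wire type for length-delimited fields (including packed repeated scalars).
const WIRE_LEN: u8 = 2;

/// Hypothetical field number for the vector; not taken from Qdrant's .proto files.
const VECTOR_FIELD: u8 = 2;

/// Tag byte computed once at compile time instead of inside the request loop.
const VECTOR_TAG: u8 = (VECTOR_FIELD << 3) | WIRE_LEN;

/// Append `v` as a protobuf base-128 varint.
fn put_varint(buf: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        buf.push(((v & 0x7f) as u8) | 0x80);
        v >>= 7;
    }
    buf.push(v as u8);
}

/// Append a packed float field: tag, byte length, then the raw float payload.
fn encode_vector(buf: &mut Vec<u8>, vector: &[f32]) {
    buf.push(VECTOR_TAG);
    put_varint(buf, std::mem::size_of_val(vector) as u64);

    if cfg!(target_endian = "little") {
        // SAFETY: f32 has no padding bytes and u8 has alignment 1, so the
        // float slice can be viewed as bytes for the lifetime of `vector`.
        let bytes = unsafe {
            std::slice::from_raw_parts(
                vector.as_ptr() as *const u8,
                std::mem::size_of_val(vector),
            )
        };
        // One bulk copy for the whole vector.
        buf.extend_from_slice(bytes);
    } else {
        // Big-endian targets need a per-element byte swap.
        for x in vector {
            buf.extend_from_slice(&x.to_le_bytes());
        }
    }
}

fn main() {
    let vector = vec![0.5_f32; 1536];
    let mut buf = Vec::with_capacity(3 + vector.len() * 4);
    encode_vector(&mut buf, &vector);
    println!("encoded {} bytes", buf.len());
}
```

On little-endian hardware the in-memory layout of `[f32]` already matches the wire format, which is why the payload can go out as one copy rather than 1536 separate writes.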

HTTP/2 Batch Search

50 queries sent concurrently over a single connection.

Approach	Total time	Per query	Relative to sequential
HTTP/2 pipelined	4.8ms	95us	4.00x
Sequential	19.0ms	380us	baseline

Batch Latency — Per Query

Per-query latency under 50 concurrent searches. Lower is better.

Chart: per-query latency for 50 queries. Sequential 380 μs, HTTP/2 pipelined 95.0 μs (4.00×).

Interpretation

The 50-request HTTP/2 batch reduced per-query latency from 380us to 95us in this harness. Treat the number as a transport-shape result tied to this workload rather than a general claim about every vector search path.
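
The shape of that result can be illustrated without a network at all. The sketch below is a simulation, not the benchmark: `fake_search` is a hypothetical stand-in that only sleeps for a fixed latency, and threads stand in for multiplexed HTTP/2 streams. It shows why overlapping in-flight requests shortens total wall time compared with waiting for each reply in turn; the absolute numbers it prints say nothing about Qdrant.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for one search round trip; it only sleeps.
fn fake_search(id: usize, latency: Duration) -> usize {
    thread::sleep(latency);
    id
}

/// Issue each request only after the previous reply has arrived.
fn run_sequential(n: usize, latency: Duration) -> Vec<usize> {
    (0..n).map(|id| fake_search(id, latency)).collect()
}

/// Start all requests first, then collect the replies in request order.
fn run_overlapped(n: usize, latency: Duration) -> Vec<usize> {
    thread::scope(|s| {
        let handles: Vec<_> = (0..n)
            .map(|id| s.spawn(move || fake_search(id, latency)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let latency = Duration::from_millis(2);

    let start = Instant::now();
    let a = run_sequential(50, latency);
    let sequential = start.elapsed();

    let start = Instant::now();
    let b = run_overlapped(50, latency);
    let overlapped = start.elapsed();

    // Both strategies return the same replies; only the wall time differs.
    assert_eq!(a, b);
    println!("sequential: {sequential:?}, overlapped: {overlapped:?}");
}
```

In the simulation the overlapped run approaches the cost of a single round trip because the waits are independent. A real server also has to execute the searches, which is one reason the measured gain is 4.00x rather than anything close to 50x.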

Reproduce Results

git clone https://github.com/qail-io/qail.git
cd qail/qdrant

docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
python3 examples/seed_qdrant.py
cargo run --example fair_benchmark --release
cargo run --example batch_benchmark --release