- Single-query ratio: 1.17x (140us vs 164us)
- Pool ratio: 1.46x (16.2ms vs 23.6ms)
- HTTP/2 batch ratio: 4.00x (4.8ms vs 19.0ms)
Single-Query Search
1,000 sequential searches on localhost.
| Driver | Latency/query | Throughput | Relative to official client |
|---|---|---|---|
| qail-qdrant gRPC | 140.3us | 7,126 ops/s | 1.17x |
| Official client | 164.0us | 6,096 ops/s | baseline |
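The throughput and ratio columns follow from the latency column by simple arithmetic. The sketch below shows that derivation; the function names are illustrative and not part of qail-qdrant. It assumes throughput is the reciprocal of mean latency, which is reasonable for a strictly sequential run.

```rust
/// Throughput in operations per second for a strictly sequential run,
/// given the mean latency per query in microseconds.
fn throughput_ops_per_sec(latency_us: f64) -> f64 {
    1_000_000.0 / latency_us
}

/// Speedup of a candidate over a baseline, both expressed as latencies.
/// Values above 1.0 mean the candidate is faster.
fn speedup(baseline_latency_us: f64, candidate_latency_us: f64) -> f64 {
    baseline_latency_us / candidate_latency_us
}
```

With the rounded latencies from the table, `throughput_ops_per_sec(140.3)` lands within a couple of ops/s of the reported 7,126, and `speedup(164.0, 140.3)` rounds to 1.17. The small throughput gap is consistent with the table having been computed from unrounded timings, though that is an assumption.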
Implementation notes
- Buffer pooling uses `.split()` rather than `.clone()` on the hot path.
- The transport path talks to h2 directly rather than routing through a heavier wrapper.
- Protobuf tags are pre-computed before the request loop.
- Vector copies were reduced to a single `memcpy` for the 1536-float case.
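Two of the notes above, pre-computed protobuf tags and a single bulk copy of the vector, can be illustrated with a minimal standard-library sketch. This is not the qail-qdrant implementation: the field number, constant names, and function names are hypothetical. It relies only on the protobuf wire format, where a tag is `(field_number << 3) | wire_type` and a packed repeated `float` field is a length-delimited run of little-endian 4-byte values.

```rust
/// Protobuf wire type 2: length-delimited (used for packed repeated floats).
const WIRE_TYPE_LEN: u32 = 2;

/// Hypothetical field number for the query vector (assumption, not from qail).
const VECTOR_FIELD_NUMBER: u32 = 1;

/// Tag computed once at compile time instead of inside the request loop.
/// A single-byte tag is only valid for field numbers below 16.
const VECTOR_TAG: u8 = ((VECTOR_FIELD_NUMBER << 3) | WIRE_TYPE_LEN) as u8;

/// Append `value` as a protobuf base-128 varint.
fn put_varint(buf: &mut Vec<u8>, mut value: u64) {
    while value >= 0x80 {
        buf.push(((value & 0x7f) as u8) | 0x80);
        value >>= 7;
    }
    buf.push(value as u8);
}

/// Append a packed repeated float field: tag, byte length, then raw values.
fn encode_packed_floats(buf: &mut Vec<u8>, vector: &[f32]) {
    let byte_len = std::mem::size_of_val(vector);
    buf.reserve(byte_len + 11);
    buf.push(VECTOR_TAG);
    put_varint(buf, byte_len as u64);
    if cfg!(target_endian = "little") {
        // On little-endian targets the in-memory layout already matches the
        // wire layout, so the whole vector is appended in one bulk copy.
        // SAFETY: f32 has no padding bytes, u8 has alignment 1, and
        // `byte_len` is exactly the size of the borrowed slice.
        let bytes =
            unsafe { std::slice::from_raw_parts(vector.as_ptr() as *const u8, byte_len) };
        buf.extend_from_slice(bytes);
    } else {
        // Portable fallback: convert each value to little-endian bytes.
        for x in vector {
            buf.extend_from_slice(&x.to_le_bytes());
        }
    }
}
```

For a 1536-float vector this writes one tag byte, a two-byte varint length (6144), and 6144 payload bytes. A crate such as `bytemuck` could replace the `unsafe` cast with a checked one; the raw cast is used here only to keep the sketch dependency-free.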
HTTP/2 Batch Search
50 queries sent concurrently over a single connection.
| Approach | Total time | Per query | Relative to sequential |
|---|---|---|---|
| HTTP/2 pipelined | 4.8ms | 95us | 4.00x |
| Sequential | 19.0ms | 380us | baseline |
Interpretation
The 50-request HTTP/2 batch reduced the amortized per-query time from 380us to 95us in this harness, a 4.00x improvement over sending the same queries sequentially. Treat the number as a transport-level result tied to this workload rather than a general claim about every vector search path.
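The per-query figures in the batch table are amortized values: total wall-clock time divided by the number of queries in the batch. A minimal sketch of that calculation follows; the function name is illustrative only.

```rust
/// Amortized time per query in microseconds, given the total wall-clock
/// time for the batch in milliseconds and the number of queries in it.
fn amortized_per_query_us(total_ms: f64, queries: u32) -> f64 {
    total_ms * 1_000.0 / f64::from(queries)
}
```

For the sequential run, `amortized_per_query_us(19.0, 50)` gives 380us. For the pipelined run, the rounded 4.8ms total gives about 96us, slightly above the table's 95us; the table value was presumably derived from an unrounded total, which is an assumption. Dividing the two per-query values from the table, 380 / 95, yields the reported 4.00x.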
Reproduce Results
```bash
git clone https://github.com/qail-io/qail.git
cd qail/qdrant
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
python3 examples/seed_qdrant.py
cargo run --example fair_benchmark --release
cargo run --example batch_benchmark --release
```