Performance & Observability

The gateway is built for production throughput, with prepared statement caching, query caching, and rate limiting. Observability comes from Prometheus metrics, request tracing headers, and a health check endpoint.


Query Cache

LRU cache with configurable TTL and automatic table-level invalidation.

[gateway.cache]
enabled = true
max_entries = 1000
ttl_secs = 60

Behavior

  • Cache key: Normalized query string + auth context hash
  • Invalidation: Automatic on any mutation (INSERT, UPDATE, DELETE) to the same table
  • Eviction: LRU when capacity is reached
  • TTL: Configurable per-entry time-to-live
  • Correctness: Concurrent reads during a mutation never serve stale data — invalidation is atomic
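
The behavior listed above can be sketched as a small in-memory cache. This is an illustration, not the gateway's implementation: the class name `QueryCache`, its methods, and the whitespace-only key normalization are assumptions made for the example, and the sketch is single-threaded, so it does not show the atomic invalidation the gateway provides.

```python
import time
from collections import OrderedDict


class QueryCache:
    """Toy LRU cache with a TTL and table-level invalidation (illustrative only)."""

    def __init__(self, max_entries=1000, ttl_secs=60, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_secs = ttl_secs
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._entries = OrderedDict()  # key -> (expires_at, table, value)

    @staticmethod
    def make_key(query, auth_hash):
        # Hypothetical normalization: collapse runs of whitespace only.
        return (" ".join(query.split()), auth_hash)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None  # a miss (X-Cache: MISS)
        expires_at, _table, value = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # TTL elapsed, treat as a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value  # a hit (X-Cache: HIT)

    def put(self, key, table, value):
        self._entries[key] = (self.clock() + self.ttl_secs, table, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def invalidate_table(self, table):
        # Intended to run after an INSERT, UPDATE, or DELETE on `table`.
        stale = [k for k, (_exp, t, _val) in self._entries.items() if t == table]
        for k in stale:
            del self._entries[k]
```

Including the auth context hash in the key keeps two callers with different permissions from sharing a cached result for the same query text.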

Cache Headers

Responses include cache status headers:

X-Cache: HIT                    # Served from cache
X-Cache: MISS                   # Fresh query executed

Prepared Statement Caching

The gateway caches prepared statements per query shape, eliminating repeated parse overhead. This is separate from the query cache — it caches the PostgreSQL server-side prepared statement, not the result.
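
As a rough illustration of caching per query shape, the sketch below maps parameterized SQL text to a statement name, so requests that differ only in parameter values reuse one prepared statement. The `StatementCache` class and the `stmt_N` naming scheme are hypothetical and are not taken from the gateway.

```python
class StatementCache:
    """Maps a query shape (SQL text with placeholders) to a statement name."""

    def __init__(self):
        self._names = {}

    def statement_for(self, sql_shape):
        """Return (statement_name, needs_prepare) for the given shape."""
        name = self._names.get(sql_shape)
        if name is not None:
            return name, False  # already prepared: skip the parse step
        name = f"stmt_{len(self._names)}"  # hypothetical naming scheme
        self._names[sql_shape] = name
        return name, True  # first time seen: the caller must prepare it


cache = StatementCache()
shape = "SELECT * FROM orders WHERE status = $1"
first = cache.statement_for(shape)   # e.g. a request with status=paid
second = cache.statement_for(shape)  # e.g. a request with status=pending, same shape
```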


Rate Limiting

Built-in token-bucket rate limiter keyed by client IP:

[gateway]
rate_limit_rate = 100.0         # Tokens refilled per second
rate_limit_burst = 200          # Maximum burst capacity

When exceeded, the gateway returns:

HTTP/1.1 429 Too Many Requests
Retry-After: 1

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests",
    "status": 429
  }
}
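
For readers unfamiliar with token buckets, here is a minimal sketch of the algorithm using the two settings shown above. It is a generic token bucket, not the gateway's code: the `TokenBucket` class, the `check` helper, and the per-IP dictionary are assumptions made for the example.

```python
import time


class TokenBucket:
    """Generic token bucket: refills at `rate` tokens/second up to `burst`."""

    def __init__(self, rate=100.0, burst=200, clock=time.monotonic):
        self.rate = rate      # corresponds to rate_limit_rate
        self.burst = burst    # corresponds to rate_limit_burst
        self.clock = clock    # injectable for deterministic tests
        self.tokens = float(burst)  # start with a full bucket
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill for the elapsed time, capped at the burst capacity.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # each request spends one token
            return True
        return False


buckets = {}  # client IP -> TokenBucket (hypothetical keying structure)


def check(ip):
    """Return the HTTP status a request from `ip` would receive in this sketch."""
    if ip not in buckets:
        buckets[ip] = TokenBucket()
    return 200 if buckets[ip].allow() else 429
```

With these defaults, a client can send up to 200 requests at once, then sustain about 100 requests per second.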

Request Timeouts

All queries have a configurable timeout (default: 30 seconds, set via statement_timeout_ms in qail.toml). Long-running queries are cancelled and return:

{
  "error": {
    "code": "TIMEOUT",
    "message": "Request timed out",
    "status": 408
  }
}

Prometheus Metrics

GET /metrics

Exposes request counts, latencies (p50/p95/p99), error rates, cache hit ratios, and connection pool stats in Prometheus format.
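
A scrape of this endpoint can be post-processed with a few lines of code. The sketch below parses the Prometheus text format and derives a cache hit ratio. The metric names `gateway_cache_hits_total` and `gateway_cache_misses_total` are placeholders, so check the actual `/metrics` output for the real names. The parser assumes samples carry no trailing timestamp.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition lines into {name_with_labels: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, _, value = line.rpartition(" ")
        samples[name_and_labels] = float(value)
    return samples


def cache_hit_ratio(samples):
    # Metric names below are hypothetical placeholders.
    hits = samples.get("gateway_cache_hits_total", 0.0)
    misses = samples.get("gateway_cache_misses_total", 0.0)
    total = hits + misses
    return hits / total if total else 0.0


SAMPLE = """\
# TYPE gateway_cache_hits_total counter
gateway_cache_hits_total 750
# TYPE gateway_cache_misses_total counter
gateway_cache_misses_total 250
"""

ratio = cache_hit_ratio(parse_metrics(SAMPLE))
```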


Request Tracing

Every response includes tracing headers:

X-Request-Id: req-uuid-...       # Unique request identifier
X-Response-Time: 12ms            # Duration
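
On the client side, these headers can feed simple latency checks. The sketch below assumes the duration is always formatted as a number followed by `ms`, as in the example above. The helper names and the 100 ms threshold are illustrative, not gateway settings.

```python
def parse_response_time_ms(header_value):
    """Convert an X-Response-Time value such as '12ms' to milliseconds."""
    text = header_value.strip()
    if not text.endswith("ms"):
        raise ValueError(f"unexpected X-Response-Time format: {header_value!r}")
    return float(text[:-2])


def is_slow(headers, threshold_ms=100.0):
    # threshold_ms is an arbitrary budget chosen for this example.
    return parse_response_time_ms(headers["X-Response-Time"]) > threshold_ms
```

Logging the `X-Request-Id` value alongside any slow response makes it easier to correlate a client-side observation with gateway logs.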

Health Check

GET /health

Returns 200 OK when the gateway and database connection are healthy.


Schema Introspection

GET /api/_schema         # Full schema: tables, columns, types, FKs
GET /api/_openapi        # Auto-generated OpenAPI 3.0 spec

The OpenAPI spec is generated from the live database schema — always up to date.


EXPLAIN Cost Guard

The gateway runs EXPLAIN on read queries before execution to reject expensive queries early.

Configure thresholds in qail.toml:

[gateway]
explain_max_cost = 100000.0    # Reject if estimated cost exceeds this
explain_max_rows = 1000000     # Reject if estimated rows exceed this

Per-role overrides allow analytics roles to run heavier queries:

[gateway.role_overrides.reporting]
explain_max_cost = 500000.0
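
The threshold check with role overrides can be pictured roughly as follows. This is a sketch under two assumptions that the text above does not confirm: a role override replaces only the fields it sets and inherits the rest, and a query is rejected when either limit is exceeded. The function names are hypothetical.

```python
DEFAULT_LIMITS = {"explain_max_cost": 100000.0, "explain_max_rows": 1000000}
ROLE_OVERRIDES = {"reporting": {"explain_max_cost": 500000.0}}


def limits_for(role):
    # Assumption: fields missing from an override fall back to the defaults.
    return {**DEFAULT_LIMITS, **ROLE_OVERRIDES.get(role, {})}


def check_plan(estimated_cost, estimated_rows, role=None):
    """Return a rejection reason, or None if the plan is within limits."""
    limits = limits_for(role)
    if estimated_cost > limits["explain_max_cost"]:
        return (f"estimated cost {estimated_cost:.0f} exceeds limit "
                f"{limits['explain_max_cost']:.0f}")
    if estimated_rows > limits["explain_max_rows"]:
        return (f"estimated rows {estimated_rows} exceeds limit "
                f"{limits['explain_max_rows']}")
    return None


# A plan costing 150000 is rejected by default but allowed for "reporting".
default_result = check_plan(150000.0, 10)
reporting_result = check_plan(150000.0, 10, role="reporting")
```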

Rejected queries return a structured QUERY_TOO_EXPENSIVE error:

{
  "code": "QUERY_TOO_EXPENSIVE",
  "message": "Query rejected: estimated cost 150000 exceeds limit 100000...",
  "details": "{\"estimated_cost\":150000,\"cost_limit\":100000,\"estimated_rows\":2000000,\"row_limit\":1000000,\"suggestions\":[\"Add WHERE clauses\",\"Reduce ?expand depth\"]}"
}

The details field is machine-readable JSON — client SDKs can parse it to display actionable suggestions.
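
Because `details` is a JSON document encoded as a string, a client has to decode twice. The sketch below does this for the example response above. The function name is hypothetical, and the code accepts the error either at the top level or nested under an `error` key, since both shapes appear in this document.

```python
import json

RESPONSE_BODY = r'''
{
  "code": "QUERY_TOO_EXPENSIVE",
  "message": "Query rejected: estimated cost 150000 exceeds limit 100000...",
  "details": "{\"estimated_cost\":150000,\"cost_limit\":100000,\"estimated_rows\":2000000,\"row_limit\":1000000,\"suggestions\":[\"Add WHERE clauses\",\"Reduce ?expand depth\"]}"
}
'''


def explain_rejection(body):
    """Return the decoded details of a QUERY_TOO_EXPENSIVE error, else None."""
    payload = json.loads(body)
    error = payload.get("error", payload)  # tolerate a wrapped or bare error
    if error.get("code") != "QUERY_TOO_EXPENSIVE":
        return None
    return json.loads(error["details"])  # second decode: details is a JSON string


details = explain_rejection(RESPONSE_BODY)
for tip in details["suggestions"]:
    print("-", tip)
```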

EXPLAIN Endpoint

Inspect query plans manually:

GET /api/orders/_explain?status=paid&expand=users

Returns the PostgreSQL EXPLAIN ANALYZE output for the generated query.


Error Responses

All errors follow a consistent JSON structure:

{
  "error": {
    "code": "QUERY_ERROR",
    "message": "column \"foo\" does not exist",
    "status": 400,
    "request_id": "req-uuid-..."
  }
}

Error Codes

| Code | HTTP | Description |
|------|------|-------------|
| RATE_LIMITED | 429 | Rate limit exceeded |
| CONCURRENCY_LIMIT | 429 | Tenant concurrency limit reached |
| QUERY_TOO_EXPENSIVE | 422 | EXPLAIN cost/row estimate exceeded threshold |
| TIMEOUT | 408 | Query exceeded statement timeout |
| PARSE_ERROR | 400 | Malformed query parameters |
| QUERY_ERROR | 400 | Database query failed |
| UNAUTHORIZED | 401 | Authentication failed |
| FORBIDDEN | 403 | Policy denied access |
| NOT_FOUND | 404 | Resource not found |
| INTERNAL | 500 | Unexpected server error |
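
A client can use the codes above to decide how to react to a failure. The sketch below parses an error body and flags it as retryable or not. The `RETRYABLE` set is a client-side policy assumed for this example, not something the gateway prescribes, and `classify` is a hypothetical helper.

```python
import json

# Code-to-status mapping copied from the error code table.
ERROR_STATUS = {
    "RATE_LIMITED": 429,
    "CONCURRENCY_LIMIT": 429,
    "QUERY_TOO_EXPENSIVE": 422,
    "TIMEOUT": 408,
    "PARSE_ERROR": 400,
    "QUERY_ERROR": 400,
    "UNAUTHORIZED": 401,
    "FORBIDDEN": 403,
    "NOT_FOUND": 404,
    "INTERNAL": 500,
}

# Assumption: only load-related failures are worth retrying automatically.
RETRYABLE = {"RATE_LIMITED", "CONCURRENCY_LIMIT", "TIMEOUT"}


def classify(body):
    """Summarize a gateway error body for client-side handling."""
    err = json.loads(body)["error"]
    code = err["code"]
    return {
        "code": code,
        "status": err.get("status", ERROR_STATUS.get(code)),
        "retryable": code in RETRYABLE,
        "request_id": err.get("request_id"),
    }


example = classify(
    '{"error": {"code": "QUERY_ERROR", "message": "column \\"foo\\" does not exist",'
    ' "status": 400, "request_id": "req-uuid-..."}}'
)
```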

Benchmark: Gateway vs GraphQL

The gateway's ?expand= parameter performs server-side JOINs, the same approach as the Qail AST driver, but over HTTP:

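| Approach | Avg Latency | DB Queries | vs Qail |
|----------|-------------|------------|---------|
| Qail AST (binary) | 449µs | 1 | baseline |
| Gateway (?expand=) | 635µs | 1 | 1.4× |
| GraphQL + DataLoader | 1.52ms | 3 | 3.4× |
| GraphQL naive (N+1) | 18.2ms | 151 | 40× |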

The 1.4× gap is pure JSON serialization overhead. On the wire, the gateway executes the exact same single-query JOIN as the Qail driver.
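
The multipliers in the table follow directly from the reported average latencies. The short calculation below reproduces them from the table's figures, and the variable names are illustrative.

```python
BASELINE_US = 449.0  # Qail AST (binary), in microseconds

latencies_us = {
    "Gateway (?expand=)": 635.0,
    "GraphQL + DataLoader": 1520.0,   # 1.52 ms
    "GraphQL naive (N+1)": 18200.0,   # 18.2 ms
}

# Ratio of each approach's average latency to the baseline.
ratios = {name: us / BASELINE_US for name, us in latencies_us.items()}

# Absolute overhead of the gateway over the binary driver, in microseconds.
gateway_overhead_us = latencies_us["Gateway (?expand=)"] - BASELINE_US

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The gateway's 1.4× corresponds to an absolute difference of 186µs per request. The naive GraphQL ratio works out to roughly 40.5, which the table reports as 40×.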

Unlike GraphQL, the gateway makes N+1 structurally impossible. ?expand= always resolves to a server-side JOIN — there’s no resolver pattern to misconfigure.