# ADR-001 — Use pgvector + HNSW over Qdrant / Pinecone / Weaviate

- **Status:** Accepted
- **Date:** 2026-05-09
- **Module:** 01 — Embed, Store & Search
- **Stakeholders:** retrieval engineer, data infra lead, security reviewer

## Context

The platform serves hybrid (semantic + lexical + reranked) search over
document corpora of up to 1M documents. The vector index is on the hot
path of every query: it has to meet the <100 ms P99 budget while staying
within a single team's operational footprint. The options considered:

1. **Pinecone** — managed, opinionated, fastest path to a hosted index.
2. **Qdrant** — open-source-with-managed, broader query feature set
   (payload filters as first-class), strong gRPC story.
3. **Weaviate** — open-source, multi-modal, schema-first.
4. **pgvector** — Postgres extension. SQL is the API; the vector index
   is just another table with an HNSW or IVFFlat index.

We are building a reference platform for tutorial purposes — the choice
has to be reproducible by a learner on a laptop in <15 minutes and
survive a real production deploy.

## Decision

Adopt **pgvector + HNSW**, with Qdrant kept as a documented alternative
in `docker-compose.qdrant.yml`.

```sql
-- seed/01_create_tables.sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id UUID PRIMARY KEY,
  content TEXT NOT NULL,
  embedding vector(1536) NOT NULL,
  metadata JSONB,
  ts_content tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED,
  created_at TIMESTAMPTZ DEFAULT now()
);

-- migrations/tune_hnsw.sql
CREATE INDEX documents_embedding_hnsw_idx
  ON documents USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- Query-time recall/latency knob. SET is session-scoped, so it must be
-- issued on each connection (40 is also pgvector's default).
SET hnsw.ef_search = 40;
```

```python
# Module 02's recall benchmark sweeps ef_search 10–200 vs p50 latency
# Result: ef_search=40 hits ~0.81 recall@10 with hybrid+rerank
```
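
A query against this index could look like the following minimal sketch. It assumes an open `psycopg2` connection and a query embedding computed elsewhere; `to_vector_literal`, `NEAREST_SQL`, and `nearest_documents` are illustrative names, not functions from the starter kit. `SET LOCAL` keeps the `ef_search` override scoped to a single transaction.

```python
from typing import List, Sequence, Tuple


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format floats as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# <=> is pgvector's cosine-distance operator, matching vector_cosine_ops,
# so ordering by it lets the planner use the HNSW index.
NEAREST_SQL = """
SELECT id, content, 1 - (embedding <=> %(q)s::vector) AS cosine_similarity
FROM documents
ORDER BY embedding <=> %(q)s::vector
LIMIT %(k)s
"""


def nearest_documents(conn, embedding: Sequence[float], k: int = 10,
                      ef_search: int = 40) -> List[Tuple]:
    """Top-k cosine search on an open psycopg2 connection (assumed)."""
    with conn:  # psycopg2: commits on success, rolls back on error
        with conn.cursor() as cur:
            # int() guards the interpolated value; SET LOCAL applies only
            # until the end of this transaction.
            cur.execute(f"SET LOCAL hnsw.ef_search = {int(ef_search)}")
            cur.execute(NEAREST_SQL,
                        {"q": to_vector_literal(embedding), "k": k})
            return cur.fetchall()
```

Passing the embedding as a text literal avoids a dependency on an adapter package; a dedicated pgvector adapter for the driver would be a reasonable alternative.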

## Tradeoffs we accept

| Lever                           | Pinecone            | Qdrant                | Weaviate             | pgvector (chosen)                         |
| ------------------------------- | ------------------- | --------------------- | -------------------- | ----------------------------------------- |
| Day-1 setup                     | Vendor account      | Self-host or managed  | Self-host or managed | `CREATE EXTENSION vector` — 30 seconds    |
| Single-store JOIN with metadata | Build it            | Payload filter (good) | Limited              | Native SQL `WHERE` + JSONB                |
| Tutorial reproducibility        | Cloud account       | Docker container      | Docker container     | Same container as the rest of the app     |
| P99 latency at 1M vectors       | <50 ms              | <50 ms                | <100 ms              | <100 ms (with HNSW + ef_search tuned)     |
| Full-text search (BM25)         | Build it / external | Limited               | Built-in             | Native (`tsvector` + GIN — see ADR-002)   |
| Operational footprint           | Zero (managed)      | One container         | One container        | Zero new infra (already running Postgres) |
| Vendor lock-in                  | High                | None                  | None                 | None                                      |
| Cost at <10M vectors            | $$                  | $                     | $                    | No added cost (reuses RDS db.t4g.medium)  |

We optimize for **single-store hybrid retrieval** + **operational
parsimony**. The hybrid + rerank pipeline (ADR-002 + ADR-003) reads
both the vector index AND the BM25 GIN index in the same SQL query
plan — that's only possible when both are in Postgres. A separate
vector store would force a fan-out + RRF in the application layer
with cross-store consistency questions.
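
For contrast, this is roughly what the application-layer fan-out would involve if the vector and lexical indexes lived in separate stores: each store returns its own ranked list, and the service merges them with reciprocal rank fusion. A minimal sketch, where the function name is illustrative and `k = 60` is the constant conventionally used for RRF, not a value taken from ADR-002:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def reciprocal_rank_fusion(rankings: Iterable[Sequence[str]],
                           k: int = 60) -> List[str]:
    """Merge ranked id lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]   # e.g. from a separate vector store
lexical_hits = ["b", "d", "a"]  # e.g. from a separate text index
fused = reciprocal_rank_fusion([vector_hits, lexical_hits])
# fused == ['b', 'a', 'd', 'c']
```

The merge itself is small; the cost of the two-store design is in the two network round trips and in keeping both stores consistent on writes and deletes.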

Pinecone is the right answer if the team has zero Postgres operations
expertise. Qdrant is the right answer if vector-native payload filters
are a frequent workload. Both are documented as exit ramps.

## Consequences (positive)

- Single SQL query can run vector + BM25 + JSONB metadata filter and
  return ranked results — no fan-out, no cross-store joins.
- HNSW + GIN indexes live next to the source-of-truth row, so
  cascade-delete (GDPR — see Module 05) is one transaction.
- Backups are plain `pg_dump`. Per-tenant restore is a filtered copy
  (`WHERE tenant_id = ...`), assuming rows carry a `tenant_id`.
- Local development is one container (the same `pgvector/pgvector:pg16`
  image as production).
- Bench harness (`scripts/benchmark_hnsw.py`) sweeps `ef_search` 10–200
  and produces a recall-vs-latency curve — Module 02's tuning artifact.
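
The sweep behind that curve can be sketched as follows. This is not the contents of `scripts/benchmark_hnsw.py`; it assumes a hypothetical `search(query_vector, k, ef_search)` callable and exact top-k neighbors precomputed by a brute-force scan, and it reports mean recall@k and median latency per `ef_search` value.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical signature: (query_vector, k, ef_search) -> ranked doc ids.
SearchFn = Callable[[Sequence[float], int, int], List[str]]


def recall_at_k(retrieved: Sequence[str], exact_top_k: Sequence[str],
                k: int = 10) -> float:
    """Fraction of the exact top-k neighbors that the ANN search returned."""
    truth = set(exact_top_k[:k])
    if not truth:
        raise ValueError("exact_top_k must not be empty")
    return len(set(retrieved[:k]) & truth) / len(truth)


def sweep_ef_search(search: SearchFn,
                    queries: Dict[str, Sequence[float]],
                    exact: Dict[str, Sequence[str]],
                    ef_values: Sequence[int],
                    k: int = 10) -> List[Tuple[int, float, float]]:
    """Return (ef_search, mean recall@k, median latency in ms) per setting."""
    rows = []
    for ef in ef_values:
        recalls, latencies_ms = [], []
        for query_id, vector in queries.items():
            started = time.perf_counter()
            retrieved = search(vector, k, ef)
            latencies_ms.append((time.perf_counter() - started) * 1000.0)
            recalls.append(recall_at_k(retrieved, exact[query_id], k))
        rows.append((ef, statistics.mean(recalls),
                     statistics.median(latencies_ms)))
    return rows
```

Plotting the second column against the third gives the recall-vs-latency curve; a real harness would also warm the cache and repeat each query several times before trusting the latency numbers.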

## Consequences (negative)

- **No native gRPC streaming.** Postgres protocol is the only on-ramp;
  high-fanout services that need binary streaming would prefer Qdrant.
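- **Index build is slow on small instances.** A 1M-vector HNSW
  build takes ~10–15 minutes on `db.t4g.medium`. Builds are
  single-threaded before pgvector 0.6.0; later versions parallelize
  via `max_parallel_maintenance_workers`, which helps little on 2
  vCPUs. Mitigation: batch during off-hours; ADR-004 makes incremental
  updates cheap.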
- **No managed UI for browsing the vector store.** A learner queries
  via SQL or `psql` directly. This is fine for the tutorial, but
  production teams typically front it with a dashboard.
- **Memory ceiling.** Query latency depends on the 1M × 1536-dim HNSW
  index staying resident in memory (`shared_buffers` plus the OS page
  cache). RDS instance sizing matters at scale.
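
To put a rough number on that memory ceiling, the raw vector payload can be estimated directly. The sketch below assumes pgvector's documented storage of 4 bytes per dimension plus an 8-byte header per vector, and ignores HNSW graph links and page overhead, so it is a lower bound rather than a measurement; the function names are illustrative.

```python
def vector_bytes(dims: int, bytes_per_float: int = 4,
                 header_bytes: int = 8) -> int:
    """Storage for one vector: 4 bytes per dimension plus a small header."""
    return dims * bytes_per_float + header_bytes


def raw_payload_bytes(n_vectors: int, dims: int) -> int:
    """Lower bound on index size: vectors only, no graph or page overhead."""
    return n_vectors * vector_bytes(dims)


payload = raw_payload_bytes(1_000_000, 1536)  # 6_152_000_000 bytes
print(f"{payload / 1e9:.2f} GB ({payload / 2**30:.2f} GiB)")
```

That is roughly 6 GB before any graph overhead, more than the 4 GiB of RAM on a `db.t4g.medium`, which suggests that instance class cannot keep the full 1M-vector index resident and that the latency target should be validated on the instance size actually deployed.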

## Reversal plan

The retrieval interface is `api/main.py`'s `/search` + `/search/hybrid`
endpoints, both of which call `psycopg2` against the local index.
Replacement is bounded:

1. Add `api/qdrant_client.py` (or `pinecone_client.py`) with the same
   `search(query_vector, k, filter)` signature.
2. Switch the search endpoint behind a feature flag.
3. Re-run Module 02's eval harness — recall@10 and MRR assertions in
   `scripts/eval.py` will fail loud if the swap regresses quality.
4. Cut over after a 1-week soak with shadow traffic.
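
Steps 1, 2, and 4 can be sketched as below. Everything here is an assumption for illustration: the `VectorSearchBackend` protocol mirrors the `search(query_vector, k, filter)` signature from step 1, `VECTOR_BACKEND` is a hypothetical flag name, and `StaticBackend` is an in-memory stand-in so the example runs without Postgres or Qdrant.

```python
import os
from typing import (Any, Dict, List, Mapping, Optional, Protocol, Sequence,
                    Tuple)

Hit = Tuple[str, float]  # (document id, similarity score)


class VectorSearchBackend(Protocol):
    """Shared signature that every store client would implement."""

    def search(self, query_vector: Sequence[float], k: int,
               filter: Optional[Dict[str, Any]] = None) -> List[Hit]:
        ...


class StaticBackend:
    """Stand-in backend returning canned hits; ignores vector and filter."""

    def __init__(self, hits: List[Hit]) -> None:
        self._hits = hits

    def search(self, query_vector: Sequence[float], k: int,
               filter: Optional[Dict[str, Any]] = None) -> List[Hit]:
        return self._hits[:k]


def select_backend(backends: Mapping[str, VectorSearchBackend],
                   env: Mapping[str, str] = os.environ) -> VectorSearchBackend:
    """Pick the backend named by the (hypothetical) VECTOR_BACKEND flag."""
    name = env.get("VECTOR_BACKEND", "pgvector")
    if name not in backends:
        raise ValueError(f"unknown VECTOR_BACKEND {name!r}")
    return backends[name]


def shadow_overlap(primary: VectorSearchBackend, shadow: VectorSearchBackend,
                   query_vector: Sequence[float], k: int,
                   filter: Optional[Dict[str, Any]] = None) -> float:
    """Share of the primary's top-k ids that the shadow backend also returns."""
    primary_ids = {doc_id for doc_id, _ in primary.search(query_vector, k, filter)}
    shadow_ids = {doc_id for doc_id, _ in shadow.search(query_vector, k, filter)}
    if not primary_ids:
        return 1.0 if not shadow_ids else 0.0
    return len(primary_ids & shadow_ids) / len(primary_ids)
```

During the soak, logging `shadow_overlap` per request gives a cheap divergence signal between the two stores, alongside the offline recall@10 and MRR checks in step 3.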

The starter kit ships `docker-compose.qdrant.yml` for exactly this —
Qdrant is one `docker compose up` away if you outgrow pgvector.

Estimated effort: **1–2 engineer-weeks** for a tested swap. Reversible.

## References

- `seed/01_create_tables.sql` (schema with HNSW + GIN)
- `migrations/tune_hnsw.sql` (M / ef_construction / ef_search tuning)
- `scripts/benchmark_hnsw.py` (recall vs latency sweep)
- `docker-compose.qdrant.yml` (alternative vector store)
- ADR-002 (BM25 + RRF — depends on single-store assumption)
- ADR-005 (Deprecated single-embedding-version — orthogonal to vector store choice)
