Context
Original v0 design (ADR-005, originally accepted) had one shared chunks table with a tenant_id column and a WHERE tenant_id = ? filter on every retrieval query. This was the simplest possible design and shipped on day 4 of M02 build-out.
It worked fine for ~2 weeks. Then we hit two production-shaped problems that the design couldn't survive:
- Cross-tenant chunk leak via the cross-encoder reranker. The reranker took the top-50 candidates from semantic + BM25, then reranked. Under a specific failure mode, when the WHERE tenant_id = ? clause was missing from the keyword search path because of a copy-paste bug, the reranker would silently fold in chunks from other tenants because they were the most relevant. The user got a confidently-wrong answer that referenced another tenant's documents. Caught by an internal compliance reviewer; logged as runbooks/incident-2026-04-21-cross-tenant-leak.md.
- Index-level performance regression. At 50k chunks across 3 tenants, the HNSW index started returning candidates ranked by global similarity, then filtering by tenant_id post-retrieval. The ANN traversal didn't know about tenants. Result: for a query that should return 5 relevant chunks for tenant A, the index would return 50 chunks (top globally) and then filter; if tenant A only had 2 chunks in that top-50, the user got an under-recall problem nobody could explain from the SQL side.
The first incident was the gating one. The second was the performance one. Together they killed the design.
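The under-recall effect can be reproduced without a database. The following is a minimal sketch, not the production code: an exact brute-force search stands in for the approximate HNSW traversal, squared Euclidean distance stands in for the pgvector distance operator, and the corpus, the Chunk shape, and the function names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    id: str
    tenant_id: str
    embedding: tuple[float, float]


def distance(a, b):
    """Squared Euclidean distance, standing in for pgvector's <=> ordering."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def post_filter_search(chunks, query, tenant_id, k, candidates=50):
    """Shared index: global top-`candidates` first, tenant filter second."""
    top_global = sorted(chunks, key=lambda c: distance(c.embedding, query))[:candidates]
    return [c for c in top_global if c.tenant_id == tenant_id][:k]


def per_tenant_search(chunks, query, tenant_id, k):
    """Per-tenant index: only the caller's chunks are ever candidates."""
    own = [c for c in chunks if c.tenant_id == tenant_id]
    return sorted(own, key=lambda c: distance(c.embedding, query))[:k]


query = (0.0, 0.0)
corpus = [
    # Tenant "a": two chunks near the query, three further away.
    Chunk("a-0", "a", (0.01, 0.0)),
    Chunk("a-1", "a", (0.02, 0.0)),
    Chunk("a-2", "a", (3.0, 0.0)),
    Chunk("a-3", "a", (4.0, 0.0)),
    Chunk("a-4", "a", (5.0, 0.0)),
]
# Tenants "b" and "c": 100 chunks each, all closer to the query than a-2..a-4.
corpus += [Chunk(f"b-{j}", "b", (0.1 + 0.001 * j, 0.0)) for j in range(100)]
corpus += [Chunk(f"c-{j}", "c", (0.5 + 0.001 * j, 0.0)) for j in range(100)]

shared = post_filter_search(corpus, query, "a", k=5)
isolated = per_tenant_search(corpus, query, "a", k=5)
print(len(shared), len(isolated))  # 2 5
```

Tenant "a" asked for 5 chunks and owns 5, but the shared path returns only the 2 that made it into the global top-50. The SQL is correct in both cases, which is why the problem was hard to explain from the query side.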
What was originally decided
# DEPRECATED: see "What we got wrong" below
class SharedTenantRetriever:
    """One vector index. Filter by tenant_id."""

    async def retrieve(self, query: str, tenant_id: str, k: int = 5) -> list[Chunk]:
        embedding = await self.encoder.encode(query)
        return await self.db.fetch("""
            SELECT id, content, embedding <=> $1 AS distance
            FROM chunks
            WHERE tenant_id = $2  -- the entire tenant boundary, hopefully
            ORDER BY distance
            LIMIT $3
        """, embedding, tenant_id, k)

# Hybrid path had the same WHERE clause, until someone forgot it on the BM25 path.
What we reversed to
Per-tenant index isolation. Each tenant gets their own logical index (table) with no cross-tenant data path:
# src/retrieval/tenant_aware.py (current)
class TenantAwareRetriever:
    """One index per tenant. No WHERE clause needed; the table IS the boundary."""

    def __init__(self, tenant_registry: TenantRegistry):
        self._registry = tenant_registry

    async def retrieve(self, query: str, tenant_id: str, k: int = 5) -> list[Chunk]:
        config = self._registry.get(tenant_id)
        embedding = await self.encoder.encode(query)
        # No tenant_id WHERE clause: the table itself is tenant-scoped
        return await self.db.fetch(f"""
            SELECT id, content, embedding <=> $1 AS distance
            FROM {config.chunks_table}  -- e.g. chunks_acme, chunks_globex
            ORDER BY distance
            LIMIT $2
        """, embedding, k)
Per-tenant tables are created on tenant onboarding via a migration template. M04's TenantConfig carries chunks_table (and cache_prefix, index_name) so callers never construct table names from raw user input.
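Because the table name is interpolated into the SQL string rather than bound as a parameter, the registry is the only thing standing between a tenant identifier and the query text. The sketch below shows one way that guarantee could look. It is not the contents of src/serving/tenant_isolation.py: the field names come from this ADR, but the "chunks_" naming pattern, the validation regex, the register method, and UnknownTenantError are assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Assumption: per-tenant tables are named "chunks_<lowercase identifier>".
_TABLE_NAME = re.compile(r"chunks_[a-z][a-z0-9_]{0,40}")


class UnknownTenantError(KeyError):
    """Raised when a tenant_id has no registered config (hypothetical name)."""


@dataclass(frozen=True)
class TenantConfig:
    tenant_id: str
    chunks_table: str
    cache_prefix: str
    index_name: str

    def __post_init__(self):
        # Reject anything that is not a plain identifier before it can ever
        # be interpolated into a query string.
        if not _TABLE_NAME.fullmatch(self.chunks_table):
            raise ValueError(f"unsafe chunks_table: {self.chunks_table!r}")


class TenantRegistry:
    def __init__(self):
        self._configs = {}

    def register(self, config):
        self._configs[config.tenant_id] = config

    def get(self, tenant_id):
        # Callers pass a tenant_id and get back a vetted config; they never
        # build a table name themselves.
        try:
            return self._configs[tenant_id]
        except KeyError:
            raise UnknownTenantError(tenant_id) from None


registry = TenantRegistry()
registry.register(TenantConfig("acme", "chunks_acme", "acme:", "chunks_acme_hnsw"))
print(registry.get("acme").chunks_table)  # chunks_acme
```

An unknown tenant fails loudly at lookup instead of falling through to some default table, and a malformed table name fails at construction time, before any query is built.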
Why reversed
- 2026-04-21 incident: Cross-tenant chunk leak through the reranker because the BM25 path was missing the WHERE tenant_id = ? clause. The compliance reviewer caught it during an unrelated audit. Two-week dwell time.
- 2026-04-22 architecture review: Engineering manager asked, "what is the actual tenant boundary here?" The answer "the WHERE clause on every query" was untenable. There were 6 retrieval entry points. One of them was already missing the clause.
The incident was the gating one. The architecture review was the credibility one: capability-style boundaries enforced by every developer remembering aren't real boundaries.
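The leak mechanism is easy to reproduce in miniature. The toy below is an illustration only, not the incident code: the corpus, the relevance scores, and every function name are invented, and a plain sort by score stands in for the cross-encoder. One candidate path applies the tenant filter, the other forgets it, and the reranker has no reason to prefer the caller's own chunks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    id: str
    tenant_id: str
    score: float  # toy relevance score; higher is better


CORPUS = [
    Chunk("acme-1", "acme", 0.61),
    Chunk("acme-2", "acme", 0.55),
    Chunk("globex-1", "globex", 0.93),
]


def semantic_path(tenant_id):
    """Vector path: applies the tenant filter."""
    return [c for c in CORPUS if c.tenant_id == tenant_id]


def keyword_path_buggy(tenant_id):
    """Keyword path with the filter forgotten: tenant_id is never used."""
    return list(CORPUS)


def rerank(candidates, k):
    """Stand-in for the cross-encoder: dedupe, then sort by relevance only."""
    unique = {c.id: c for c in candidates}
    return sorted(unique.values(), key=lambda c: c.score, reverse=True)[:k]


def hybrid_retrieve(tenant_id, k=2):
    return rerank(semantic_path(tenant_id) + keyword_path_buggy(tenant_id), k)


def leaked_chunks(results, tenant_id):
    """Isolation check: anything not owned by the caller is a leak."""
    return [c for c in results if c.tenant_id != tenant_id]


results = hybrid_retrieve("acme")
print([c.id for c in results])                         # ['globex-1', 'acme-1']
print([c.id for c in leaked_chunks(results, "acme")])  # ['globex-1']
```

The other tenant's chunk ranks first precisely because it is the most relevant, so nothing downstream looks wrong. A check like leaked_chunks only helps if it runs against every entry point, which is the same "remember it everywhere" problem one layer up.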
What we got wrong (and what we'd do again)
Got wrong: assumed multi-tenancy was a query concern when it's actually a schema concern. A WHERE tenant_id = ? clause is a runtime check that a developer has to remember to apply on every query path. Per-tenant tables make the database structurally enforce the boundary — there's no path through the schema that crosses tenants.
Would do again: the chunks table shape (id, content, embedding, source, etc.). Only the multi-tenant topology changed.
Reversal cost
- Migration template (migrations/per_tenant_chunks.sql.j2): 1 day
- TenantRegistry + TenantAwareRetriever: 1 day
- Backfill script (split shared chunks into 3 per-tenant tables): 0.5 day
- Test rewrite + per-tenant test fixtures: 1 day
- Onboarding flow update (create per-tenant table on tenant create): 1 day
- Total: ~4.5 engineer-days
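For a sense of what the backfill item involves, here is a rough sketch of generating the copy statements. It is not the actual backfill script: the function name is hypothetical, the column list is limited to the four columns this ADR names (the real table has more), and the tenant-to-table mapping uses the example names from the retriever comment above.

```python
import re

# Assumption: per-tenant table names are plain lowercase identifiers.
_IDENTIFIER = re.compile(r"[a-z][a-z0-9_]*")

# Assumption: only the columns this ADR names; the real table has more.
COLUMNS = ("id", "content", "embedding", "source")


def backfill_statements(tenant_tables):
    """Build one parameterised INSERT ... SELECT per tenant.

    `tenant_tables` maps tenant_id -> destination table name. Returns a list
    of (sql, tenant_id) pairs; the caller's driver binds tenant_id as $1.
    """
    column_list = ", ".join(COLUMNS)
    statements = []
    for tenant_id, table in sorted(tenant_tables.items()):
        # Table names are interpolated, so they must be vetted identifiers.
        if not _IDENTIFIER.fullmatch(table):
            raise ValueError(f"unsafe table name: {table!r}")
        sql = (
            f"INSERT INTO {table} ({column_list}) "
            f"SELECT {column_list} FROM chunks WHERE tenant_id = $1"
        )
        statements.append((sql, tenant_id))
    return statements


for sql, tenant_id in backfill_statements(
    {"acme": "chunks_acme", "globex": "chunks_globex"}
):
    print(tenant_id, "->", sql)
```

This is the last place a tenant_id filter on the shared table is needed: once the rows are split, the shared table can be retired and the filter goes with it.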
Lessons
- "Tenant boundary at the query layer" is not a real boundary. Every code path needs to remember it; one path forgetting = one incident.
- Schema is a feature, not a polish item. The reversal cost ~4.5 engineer-days; the leak cost compliance review effort + a customer-facing apology + 2 weeks of dwell time we'll never know the cost of. Schema-level isolation is cheaper than incident response.
- HNSW does not understand tenants. Vector indexes optimize global similarity; tenant filtering applied post-retrieval gives you under-recall on small tenants. This is a class-of-bug, not a v0 oversight.
- Document the deprecation. The next engineer who proposes "let's just put tenant_id on chunks and filter at query time" needs to find this ADR before they re-introduce the bug.
References
- src/retrieval/tenant_aware.py: current per-tenant retriever
- src/serving/tenant_isolation.py: TenantConfig + TenantRegistry
- migrations/per_tenant_chunks.sql.j2: table-creation template
- runbooks/incident-2026-04-21-cross-tenant-leak.md: the incident
- tests/test_tenant_isolation.py: 5-pass tenant isolation gate
- ADR-002 (pgvector retrieval shape stays the same; only the multi-tenant topology changed)
- ADR-004 (failure cascade respects per-tenant boundaries)