ADR-005: Multi-tenant retrieval via row-filter on shared index (DEPRECATED) | Full-Stack AI Platform

Context

The original v0 design (ADR-005 as first accepted) used one shared chunks table with a tenant_id column and applied a WHERE tenant_id = ? filter on every retrieval query. It was the simplest possible design and shipped on day 4 of the M02 build-out.

It worked fine for ~2 weeks. Then we hit two production-shaped problems that the design couldn't survive:

Cross-tenant chunk leak via the cross-encoder reranker. The reranker took the top-50 candidates from semantic + BM25 retrieval and reranked them. A copy-paste bug dropped the WHERE tenant_id = ? clause from the keyword (BM25) search path, so other tenants' chunks entered the candidate pool. The reranker scores relevance, not ownership, so it silently promoted those chunks whenever they were the most relevant. The user got a confidently wrong answer that referenced another tenant's documents. An internal compliance reviewer caught it; it is logged as runbooks/incident-2026-04-21-cross-tenant-leak.md.
Index-level performance regression (recall, not latency). At 50k chunks across 3 tenants the problem surfaced: the HNSW index returns candidates ranked by global similarity, and the tenant_id filter was applied to them post-retrieval. The ANN traversal didn't know about tenants. Result: for a query that should return 5 relevant chunks for tenant A, the index would return the 50 globally closest chunks and then filter; if tenant A had only 2 chunks in that top-50, the user got 2 results instead of 5, an under-recall problem nobody could explain from the SQL side.
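The under-recall mechanism can be reproduced without a database. The sketch below is a toy model, not pgvector: it assumes the index surfaces a fixed global top-50 candidate window (standing in for the ANN candidate list, whose size in pgvector is governed by the hnsw.ef_search setting) and applies the tenant filter afterwards, which is the order of operations described above. The tenant names and the 48/2 split are made up to mirror the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    chunk_id: int
    tenant_id: str
    similarity: float  # higher = closer to the query


def post_filtered_search(rows, tenant_id, k, candidate_window=50):
    """Toy model of 'ANN first, WHERE tenant_id later'.

    Only the global top `candidate_window` rows are ever surfaced; the tenant
    filter runs on that window, so relevant rows outside it are never seen.
    """
    window = sorted(rows, key=lambda r: r.similarity, reverse=True)[:candidate_window]
    return [r for r in window if r.tenant_id == tenant_id][:k]


def exact_tenant_search(rows, tenant_id, k):
    """What the user expected: filter to the tenant first, then rank."""
    own = [r for r in rows if r.tenant_id == tenant_id]
    return sorted(own, key=lambda r: r.similarity, reverse=True)[:k]


# Hypothetical corpus: a large tenant dominates the global top-50.
rows = [Row(i, "globex", 0.99 - i * 0.001) for i in range(48)]       # 48 very close rows
rows += [Row(100, "acme", 0.955), Row(101, "acme", 0.954)]          # 2 acme rows inside the window
rows += [Row(200 + i, "acme", 0.90 - i * 0.01) for i in range(10)]  # relevant acme rows just outside it

print(len(post_filtered_search(rows, "acme", k=5)))  # 2
print(len(exact_tenant_search(rows, "acme", k=5)))   # 5
```

Nothing in the toy model is buggy in the usual sense; the loss comes purely from filtering after a similarity-only cut, which is why it could not be explained by reading the SQL.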

The first problem was the gating one; the second was a performance problem. Together they killed the design.

What was originally decided

```python
# DEPRECATED: see "What we got wrong" below
class SharedTenantRetriever:
    """One vector index. Filter by tenant_id."""

    async def retrieve(self, query: str, tenant_id: str, k: int = 5) -> list[Chunk]:
        embedding = await self.encoder.encode(query)
        return await self.db.fetch("""
            SELECT id, content, embedding <=> $1 AS distance
            FROM chunks
            WHERE tenant_id = $2     -- the entire tenant boundary, hopefully
            ORDER BY distance
            LIMIT $3
        """, embedding, tenant_id, k)

# Hybrid path had the same WHERE clause, until someone forgot it on the BM25 path.
```
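To make the leak path concrete, here is a minimal in-memory sketch of a hybrid retriever in the shape described in the Context section: a semantic path and a keyword path each produce candidates, the union is reranked, and the top-k are returned. Everything here is hypothetical: the chunk data is invented, and a precomputed `relevance` number stands in for a real cross-encoder score. The only bug is the one from the incident: the keyword path omits the tenant filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    tenant_id: str
    text: str
    relevance: float  # stand-in for the cross-encoder score


CORPUS = [
    Chunk("a1", "acme", "acme refund policy draft", 0.61),
    Chunk("a2", "acme", "acme onboarding guide", 0.40),
    Chunk("g1", "globex", "refund policy for enterprise contracts", 0.93),
]


def semantic_candidates(query, tenant_id):
    # Filtered correctly (the WHERE tenant_id = ? equivalent).
    return [c for c in CORPUS if c.tenant_id == tenant_id]


def keyword_candidates(query, tenant_id):
    # BUG (the copy-paste omission): no tenant filter on the keyword path.
    terms = query.lower().split()
    return [c for c in CORPUS if any(t in c.text for t in terms)]


def hybrid_retrieve(query, tenant_id, k=2):
    pool = {c.chunk_id: c for c in semantic_candidates(query, tenant_id)}
    pool.update({c.chunk_id: c for c in keyword_candidates(query, tenant_id)})
    # The reranker sees relevance, not ownership, so the foreign chunk wins.
    return sorted(pool.values(), key=lambda c: c.relevance, reverse=True)[:k]


top = hybrid_retrieve("refund policy", tenant_id="acme")
print([(c.chunk_id, c.tenant_id) for c in top])  # [('g1', 'globex'), ('a1', 'acme')]
```

Note that the reranker itself behaves correctly; it just amplifies whatever the unfiltered path lets into the pool, which is why the leaked chunk ended up ranked first.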

What we reversed to

Per-tenant index isolation. Each tenant gets their own logical index (table) with no cross-tenant data path:

```python
# src/retrieval/tenant_aware.py (current)
class TenantAwareRetriever:
    """One index per tenant. No WHERE clause needed; the table IS the boundary."""

    def __init__(self, tenant_registry: TenantRegistry, encoder, db):
        self._registry = tenant_registry
        self.encoder = encoder  # query embedder used by retrieve()
        self.db = db            # async DB handle exposing fetch(sql, *args)

    async def retrieve(self, query: str, tenant_id: str, k: int = 5) -> list[Chunk]:
        config = self._registry.get(tenant_id)
        embedding = await self.encoder.encode(query)
        # No tenant_id WHERE clause: the table itself is tenant-scoped.
        # chunks_table comes from the registry, never from user input, so
        # the f-string interpolation cannot be steered by a caller.
        return await self.db.fetch(f"""
            SELECT id, content, embedding <=> $1 AS distance
            FROM {config.chunks_table}    -- e.g. chunks_acme, chunks_globex
            ORDER BY distance
            LIMIT $2
        """, embedding, k)
```

Per-tenant tables are created at tenant onboarding from a migration template. M04's TenantConfig carries chunks_table (along with cache_prefix and index_name), so callers never construct table names from raw user input: the only identifier interpolated into the retrieval SQL comes from the registry.
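A minimal sketch of the registry contract, assuming TenantConfig is a frozen dataclass with the three fields named above. The identifier regex, the `UnknownTenantError` exception, the naming scheme, and the in-memory dict are illustrative assumptions; the real src/serving/tenant_isolation.py may differ. The two properties it demonstrates are that an unknown tenant fails closed (no fallback to a shared table) and that table names are validated once, at registration, rather than at every query.

```python
import re
from dataclasses import dataclass

# Conservative SQL identifier check (assumption): lowercase letters, digits,
# underscores, at most 63 characters (PostgreSQL's default identifier limit).
_IDENT = re.compile(r"^[a-z_][a-z0-9_]{0,62}$")


class UnknownTenantError(KeyError):
    """Raised instead of falling back to a shared or default table."""


@dataclass(frozen=True)
class TenantConfig:
    tenant_id: str
    chunks_table: str
    cache_prefix: str
    index_name: str

    def __post_init__(self):
        for name in (self.chunks_table, self.index_name):
            if not _IDENT.match(name):
                raise ValueError(f"unsafe SQL identifier: {name!r}")


class TenantRegistry:
    def __init__(self):
        self._configs: dict[str, TenantConfig] = {}

    def register(self, tenant_id: str) -> TenantConfig:
        # Hypothetical naming scheme mirroring the examples (chunks_acme, ...).
        config = TenantConfig(
            tenant_id=tenant_id,
            chunks_table=f"chunks_{tenant_id}",
            cache_prefix=f"tenant:{tenant_id}:",
            index_name=f"chunks_{tenant_id}_embedding_hnsw",
        )
        self._configs[tenant_id] = config
        return config

    def get(self, tenant_id: str) -> TenantConfig:
        try:
            return self._configs[tenant_id]
        except KeyError:
            raise UnknownTenantError(tenant_id) from None


registry = TenantRegistry()
registry.register("acme")
print(registry.get("acme").chunks_table)  # chunks_acme
```

A hostile tenant id such as "acme; DROP TABLE chunks" never reaches SQL: the derived table name fails validation and registration raises ValueError.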

Why reversed

2026-04-21 incident: Cross-tenant chunk leak through the reranker because the BM25 path was missing the WHERE tenant_id = ? clause. The compliance reviewer caught it during an unrelated audit. Two-week dwell time.
2026-04-22 architecture review: the engineering manager asked, "What is the actual tenant boundary here?" The answer, "the WHERE clause on every query," was untenable: there were 6 retrieval entry points, and one of them was already missing the clause.

The incident was the gating event; the review was the credibility one. A boundary that holds only if every developer remembers to enforce it on every query path is not a real boundary.
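One way to turn "six entry points, one missing the clause" into a test is a canary check: seed each tenant with a unique marker chunk, call every retrieval entry point as every tenant, and fail if any result contains another tenant's marker. The sketch below is a hypothetical illustration of that idea using in-memory fakes; it is not the contents of tests/test_tenant_isolation.py, and the entry-point signature and the third tenant name ("initech") are made up.

```python
from typing import Callable

# Hypothetical entry-point shape: (query, tenant_id) -> list of chunk texts.
EntryPoint = Callable[[str, str], list[str]]

TENANTS = ["acme", "globex", "initech"]  # "initech" is a made-up third tenant


def canary(tenant_id: str) -> str:
    """A marker string that only this tenant's corpus contains."""
    return f"CANARY-{tenant_id}"


def find_leaks(entry_points: dict[str, EntryPoint], tenants: list[str]) -> list[str]:
    """Call every entry point as every tenant; report any foreign canary seen."""
    leaks = []
    for name, retrieve in entry_points.items():
        for tenant in tenants:
            foreign = {canary(t) for t in tenants if t != tenant}
            for text in retrieve("CANARY", tenant):
                if text in foreign:
                    leaks.append(f"{name}: {tenant} saw {text}")
    return leaks


# In-memory fakes standing in for real retrieval paths.
STORE = {t: [canary(t)] for t in TENANTS}


def semantic(query: str, tenant: str) -> list[str]:
    return list(STORE[tenant])  # scoped to the caller's tenant


def bm25_missing_filter(query: str, tenant: str) -> list[str]:
    # The incident bug: matches across every tenant's chunks.
    return [text for texts in STORE.values() for text in texts if query in text]


leaks = find_leaks({"semantic": semantic, "bm25": bm25_missing_filter}, TENANTS)
print(len(leaks))  # 6: each tenant sees the other two canaries via bm25
```

The check is only as good as its entry-point list, which is the same remember-every-path weakness the ADR describes; per-tenant tables remove the need for it to be exhaustive.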

What we got wrong (and what we'd do again)

Got wrong: we assumed multi-tenancy was a query concern when it is actually a schema concern. A WHERE tenant_id = ? clause is a runtime check that a developer has to remember to apply on every query path. Per-tenant tables make the database enforce the boundary structurally: no query against one tenant's table can return another tenant's rows.

Would do again: the chunks table shape (id, content, embedding, source, etc.). Only the multi-tenant topology changed.
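Because only the topology changed, each per-tenant table is the old chunks shape minus tenant_id, stamped out once per tenant. The real template is Jinja (migrations/per_tenant_chunks.sql.j2); the sketch below uses Python's standard string.Template to stay dependency-free. The column types, the 1536 embedding dimension, and the HNSW index with vector_cosine_ops are assumptions (cosine ops match the <=> cosine-distance operator the retrieval queries use).

```python
from string import Template

# Assumed shape: the original chunks columns, minus tenant_id.
PER_TENANT_DDL = Template("""\
CREATE TABLE IF NOT EXISTS $table (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector($dim) NOT NULL,
    source    text
);
CREATE INDEX IF NOT EXISTS $index ON $table
    USING hnsw (embedding vector_cosine_ops);
""")


def render_tenant_ddl(chunks_table: str, index_name: str, dim: int = 1536) -> str:
    # Identifiers are expected to be registry-validated already; this only renders.
    return PER_TENANT_DDL.substitute(table=chunks_table, index=index_name, dim=dim)


print(render_tenant_ddl("chunks_acme", "chunks_acme_embedding_hnsw"))
```

Each tenant gets its own HNSW graph, so the ANN traversal only ever ranks that tenant's chunks and the post-filter under-recall described in the Context section cannot occur.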

Reversal cost

Migration template (migrations/per_tenant_chunks.sql.j2): 1 day
TenantRegistry + TenantAwareRetriever: 1 day
Backfill script (split shared chunks into 3 per-tenant tables): 0.5 day
Test rewrite + per-tenant test fixtures: 1 day
Onboarding flow update (create per-tenant table on tenant create): 1 day
Total: ~4.5 engineer-days
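A sketch of the backfill step, assuming the shared table is still named chunks with the columns listed in this ADR (id, content, embedding, source) and that id is a serial column in the new tables. It builds a statement plan of (sql, params) pairs rather than executing anything; the function name is hypothetical. The one-off migration is the one place where filtering the legacy table by tenant_id is still appropriate.

```python
def backfill_plan(tables: dict[str, str]) -> list[tuple[str, tuple]]:
    """Return (sql, params) pairs that split the shared table per tenant.

    `tables` maps tenant_id -> per-tenant chunks table (registry-validated).
    tenant_id stays a bind parameter; only vetted table names are interpolated.
    """
    plan = []
    for tenant_id, table in sorted(tables.items()):
        plan.append((
            f"INSERT INTO {table} (id, content, embedding, source) "
            "SELECT id, content, embedding, source FROM chunks WHERE tenant_id = $1",
            (tenant_id,),
        ))
        # Copying explicit ids does not advance a serial sequence, so resync it
        # to avoid primary-key collisions on the next insert (assumes id is serial).
        plan.append((
            f"SELECT setval(pg_get_serial_sequence('{table}', 'id'), "
            f"(SELECT COALESCE(MAX(id), 1) FROM {table}))",
            (),
        ))
    return plan


for sql, params in backfill_plan({"acme": "chunks_acme", "globex": "chunks_globex"}):
    print(params, sql)
```

Keeping the original ids means references to chunk ids elsewhere (logs, eval sets) stay valid after the split, at the cost of the sequence resync.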

Lessons

"Tenant boundary at the query layer" is not a real boundary. Every code path needs to remember it; one path forgetting = one incident.
Schema is a feature, not a polish item. The reversal cost ~4.5 engineer-days; the leak cost compliance review effort, a customer-facing apology, and two weeks of dwell time whose full cost we will never know. Schema-level isolation is cheaper than incident response.
HNSW does not understand tenants. Vector indexes optimize for global similarity; filtering by tenant after retrieval gives you under-recall on small tenants. This is a class of bug, not a v0 oversight.
Document the deprecation. The next engineer who proposes "let's just put tenant_id on chunks and filter at query time" needs to find this ADR before they re-introduce the bug.

References

src/retrieval/tenant_aware.py — current per-tenant retriever
src/serving/tenant_isolation.py — TenantConfig + TenantRegistry
migrations/per_tenant_chunks.sql.j2 — table-creation template
runbooks/incident-2026-04-21-cross-tenant-leak.md — the incident
tests/test_tenant_isolation.py — 5-pass tenant isolation gate
ADR-002 (pgvector retrieval shape stays the same; only the multi-tenant topology changed)
ADR-004 (failure cascade respects per-tenant boundaries)