ADR-002: Multi-tenant isolation via Postgres RLS, not schema-per-tenant | Enterprise AI Platform

Context

The platform serves multiple tenants from a single Postgres instance with pgvector for retrieval. Tenant isolation is a hard requirement — a query from acme must never return rows belonging to northwind, even on programmer error. The classic options:

Schema-per-tenant — one Postgres schema per tenant; queries route to the right schema via a session variable.
Database-per-tenant — strong isolation, very expensive ops.
Shared schema + Row-Level Security — single schema with a tenant_id column on every row and a Postgres POLICY enforcing tenant_id = current_setting('app.current_tenant_id').

Decision

Adopt shared schema + Row-Level Security. Each table has a tenant_id UUID NOT NULL; an RLS policy filters rows by the session-scoped variable app.current_tenant_id; the TenantContextMiddleware sets the variable on every request.

-- db/rls_setup.sql
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON documents
  FOR ALL
  USING (tenant_id = current_setting('app.current_tenant_id')::uuid);

# middleware/tenant_context.py
@contextmanager
def tenant_db_context(conn, tenant_id: str):
    with conn.cursor() as cur:
        cur.execute("SET LOCAL app.current_tenant_id = %s", (tenant_id,))
        yield

Tradeoffs we accept

Lever	Schema-per-tenant	Shared + RLS (chosen)
Onboarding cost	New schema + migrations per tenant	New `tenant_id`, no DDL
Schema-evolution operations	N migrations per change	1 migration per change
Cross-tenant queries (analytics)	Cross-schema gymnastics	Single `WHERE` clause
Per-row enforcement	None — a missing schema selector is silent	DB-level — a missing session var fails closed
Backup / restore granularity	Per-tenant trivial	Logical export with `WHERE tenant_id = ?`
Index efficiency	Smaller per-tenant indexes	Larger index, but `tenant_id` always in `WHERE`
Auditability	Implicit via schema selector	Explicit via session variable in every query plan

We optimize for operational simplicity at <50 tenants and defense in depth (the RLS policy is the floor; application-level filters are the ceiling — both must be wrong to leak). At 500+ tenants schema-per- tenant becomes attractive; the migration plan is in §Reversal.

Consequences (positive)

Onboarding a new tenant is one API call (POST /tenants) — no DDL, no migration deploy, no per-tenant index creation.
Cross-tenant analytics (e.g. compliance reports, billing aggregations) is a single SQL with tenant_id IN (...) — no schema iteration loop.
Every query plan in EXPLAIN shows the RLS predicate. A leakage bug is grep-visible — see security/leakage_checker.py.
Backups are a single pg_dump. Per-tenant restore is WHERE tenant_id.

Consequences (negative)

Index size. A 4M-chunk pgvector index per tenant becomes a 32M-row index across 8 tenants. Mitigation: tenant_id is the leading column on every relevant index; query plans use tenant_id first.
Noisy-neighbor query performance. A heavy tenant can warm the shared buffer cache to its data. Mitigation: compute/resource_fairness.py caps concurrent requests per tenant.
Compliance friction with strict-isolation tenants. Some regulated tenants will not accept shared-schema RLS in a security review. Mitigation: ADR-005 documents the v1 → schema-per-tenant migration as pre-planned for the Pro+ tenant tier.
Single point of compromise. A Postgres CVE that bypasses RLS (rare, but: CVE-2023-5869 was a real one) leaks all tenants. Mitigation: security/leakage_checker.py + the application-level WHERE tenant_id = ? belt-and-suspenders pattern.

Reversal plan

Migration to schema-per-tenant for any individual tenant is documented:

Create new schema tenant_{uuid} with the canonical DDL.
Replicate rows: INSERT INTO tenant_{uuid}.documents SELECT * FROM public.documents WHERE tenant_id = '{uuid}'.
Switch TenantContextMiddleware to set search_path instead of app.current_tenant_id for that tenant.
Drop rows from public.* after a soak period.

Estimated effort: 3 engineer-days per tenant for a verified migration. Per-tenant — does not require a global cutover. Reversible.

References

apps/web/public/downloads/enterprise-ai-platform-starter.zip!/db/rls_setup.py
apps/web/public/downloads/enterprise-ai-platform-starter.zip!/middleware/tenant_context.py
apps/web/public/downloads/enterprise-ai-platform-starter.zip!/security/leakage_checker.py
ADR-005 (deprecated single shared documents table — receipt of the v1 migration)