Skip to content
Back to Enterprise AI Platform

Multi-tenant isolation via Postgres RLS, not schema-per-tenant

✓ AcceptedEnterprise AI Platform04 — Multi-Tenancy at Scale
By AI-DE Engineering Team·Stakeholders: platform engineer, security engineer, DBA

Context

The platform serves multiple tenants from a single Postgres instance with pgvector for retrieval. Tenant isolation is a hard requirement — a query from acme must never return rows belonging to northwind, even on programmer error. The classic options:

  1. Schema-per-tenant — one Postgres schema per tenant; queries route to the right schema via a session variable.
  2. Database-per-tenant — strong isolation, very expensive ops.
  3. Shared schema + Row-Level Security — single schema with a tenant_id column on every row and a Postgres POLICY enforcing tenant_id = current_setting('app.current_tenant_id').

Decision

Adopt shared schema + Row-Level Security. Each table has a tenant_id UUID NOT NULL; an RLS policy filters rows by the session-scoped variable app.current_tenant_id; the TenantContextMiddleware sets the variable on every request.

-- db/rls_setup.sql
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON documents
  FOR ALL
  USING (tenant_id = current_setting('app.current_tenant_id')::uuid);
# middleware/tenant_context.py
@contextmanager
def tenant_db_context(conn, tenant_id: str):
    with conn.cursor() as cur:
        cur.execute("SET LOCAL app.current_tenant_id = %s", (tenant_id,))
        yield

Tradeoffs we accept

LeverSchema-per-tenantShared + RLS (chosen)
Onboarding costNew schema + migrations per tenantNew tenant_id, no DDL
Schema-evolution operationsN migrations per change1 migration per change
Cross-tenant queries (analytics)Cross-schema gymnasticsSingle WHERE clause
Per-row enforcementNone — a missing schema selector is silentDB-level — a missing session var fails closed
Backup / restore granularityPer-tenant trivialLogical export with WHERE tenant_id = ?
Index efficiencySmaller per-tenant indexesLarger index, but tenant_id always in WHERE
AuditabilityImplicit via schema selectorExplicit via session variable in every query plan

We optimize for operational simplicity at <50 tenants and defense in depth (the RLS policy is the floor; application-level filters are the ceiling — both must be wrong to leak). At 500+ tenants schema-per- tenant becomes attractive; the migration plan is in §Reversal.

Consequences (positive)

  • Onboarding a new tenant is one API call (POST /tenants) — no DDL, no migration deploy, no per-tenant index creation.
  • Cross-tenant analytics (e.g. compliance reports, billing aggregations) is a single SQL with tenant_id IN (...) — no schema iteration loop.
  • Every query plan in EXPLAIN shows the RLS predicate. A leakage bug is grep-visible — see security/leakage_checker.py.
  • Backups are a single pg_dump. Per-tenant restore is WHERE tenant_id.

Consequences (negative)

  • Index size. A 4M-chunk pgvector index per tenant becomes a 32M-row index across 8 tenants. Mitigation: tenant_id is the leading column on every relevant index; query plans use tenant_id first.
  • Noisy-neighbor query performance. A heavy tenant can warm the shared buffer cache to its data. Mitigation: compute/resource_fairness.py caps concurrent requests per tenant.
  • Compliance friction with strict-isolation tenants. Some regulated tenants will not accept shared-schema RLS in a security review. Mitigation: ADR-005 documents the v1 → schema-per-tenant migration as pre-planned for the Pro+ tenant tier.
  • Single point of compromise. A Postgres CVE that bypasses RLS (rare, but: CVE-2023-5869 was a real one) leaks all tenants. Mitigation: security/leakage_checker.py + the application-level WHERE tenant_id = ? belt-and-suspenders pattern.

Reversal plan

Migration to schema-per-tenant for any individual tenant is documented:

  1. Create new schema tenant_{uuid} with the canonical DDL.
  2. Replicate rows: INSERT INTO tenant_{uuid}.documents SELECT * FROM public.documents WHERE tenant_id = '{uuid}'.
  3. Switch TenantContextMiddleware to set search_path instead of app.current_tenant_id for that tenant.
  4. Drop rows from public.* after a soak period.

Estimated effort: 3 engineer-days per tenant for a verified migration. Per-tenant — does not require a global cutover. Reversible.

References

  • apps/web/public/downloads/enterprise-ai-platform-starter.zip!/db/rls_setup.py
  • apps/web/public/downloads/enterprise-ai-platform-starter.zip!/middleware/tenant_context.py
  • apps/web/public/downloads/enterprise-ai-platform-starter.zip!/security/leakage_checker.py
  • ADR-005 (deprecated single shared documents table — receipt of the v1 migration)
Built into the project

This decision shipped as part of Enterprise AI Platform — see the full architecture, starter kit, and 4 more ADRs.

Open project →
Press Cmd+K to open