Context
The platform serves multiple tenants from a single Postgres instance
with pgvector for retrieval. Tenant isolation is a hard requirement —
a query from acme must never return rows belonging to northwind,
even on programmer error. The classic options:
- Schema-per-tenant — one Postgres schema per tenant; queries route to the right schema via a session variable.
- Database-per-tenant — strong isolation, very expensive ops.
- Shared schema + Row-Level Security — single schema with a
tenant_idcolumn on every row and a PostgresPOLICYenforcingtenant_id = current_setting('app.current_tenant_id').
Decision
Adopt shared schema + Row-Level Security. Each table has a
tenant_id UUID NOT NULL; an RLS policy filters rows by the
session-scoped variable app.current_tenant_id; the
TenantContextMiddleware sets the variable on every request.
-- db/rls_setup.sql
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
FOR ALL
USING (tenant_id = current_setting('app.current_tenant_id')::uuid);
# middleware/tenant_context.py
@contextmanager
def tenant_db_context(conn, tenant_id: str):
with conn.cursor() as cur:
cur.execute("SET LOCAL app.current_tenant_id = %s", (tenant_id,))
yield
Tradeoffs we accept
| Lever | Schema-per-tenant | Shared + RLS (chosen) |
|---|---|---|
| Onboarding cost | New schema + migrations per tenant | New tenant_id, no DDL |
| Schema-evolution operations | N migrations per change | 1 migration per change |
| Cross-tenant queries (analytics) | Cross-schema gymnastics | Single WHERE clause |
| Per-row enforcement | None — a missing schema selector is silent | DB-level — a missing session var fails closed |
| Backup / restore granularity | Per-tenant trivial | Logical export with WHERE tenant_id = ? |
| Index efficiency | Smaller per-tenant indexes | Larger index, but tenant_id always in WHERE |
| Auditability | Implicit via schema selector | Explicit via session variable in every query plan |
We optimize for operational simplicity at <50 tenants and defense in depth (the RLS policy is the floor; application-level filters are the ceiling — both must be wrong to leak). At 500+ tenants schema-per- tenant becomes attractive; the migration plan is in §Reversal.
Consequences (positive)
- Onboarding a new tenant is one API call (
POST /tenants) — no DDL, no migration deploy, no per-tenant index creation. - Cross-tenant analytics (e.g. compliance reports, billing aggregations)
is a single SQL with
tenant_id IN (...)— no schema iteration loop. - Every query plan in
EXPLAINshows the RLS predicate. A leakage bug is grep-visible — seesecurity/leakage_checker.py. - Backups are a single
pg_dump. Per-tenant restore isWHERE tenant_id.
Consequences (negative)
- Index size. A 4M-chunk pgvector index per tenant becomes a 32M-row
index across 8 tenants. Mitigation:
tenant_idis the leading column on every relevant index; query plans usetenant_idfirst. - Noisy-neighbor query performance. A heavy tenant can warm the
shared buffer cache to its data. Mitigation:
compute/resource_fairness.pycaps concurrent requests per tenant. - Compliance friction with strict-isolation tenants. Some regulated tenants will not accept shared-schema RLS in a security review. Mitigation: ADR-005 documents the v1 → schema-per-tenant migration as pre-planned for the Pro+ tenant tier.
- Single point of compromise. A Postgres CVE that bypasses RLS
(rare, but: CVE-2023-5869 was a real one) leaks all tenants.
Mitigation:
security/leakage_checker.py+ the application-levelWHERE tenant_id = ?belt-and-suspenders pattern.
Reversal plan
Migration to schema-per-tenant for any individual tenant is documented:
- Create new schema
tenant_{uuid}with the canonical DDL. - Replicate rows:
INSERT INTO tenant_{uuid}.documents SELECT * FROM public.documents WHERE tenant_id = '{uuid}'. - Switch
TenantContextMiddlewareto setsearch_pathinstead ofapp.current_tenant_idfor that tenant. - Drop rows from
public.*after a soak period.
Estimated effort: 3 engineer-days per tenant for a verified migration. Per-tenant — does not require a global cutover. Reversible.
References
apps/web/public/downloads/enterprise-ai-platform-starter.zip!/db/rls_setup.pyapps/web/public/downloads/enterprise-ai-platform-starter.zip!/middleware/tenant_context.pyapps/web/public/downloads/enterprise-ai-platform-starter.zip!/security/leakage_checker.py- ADR-005 (deprecated single shared
documentstable — receipt of the v1 migration)