Context (when this was Accepted)
In Module 01 we shipped a single documents table with no tenant_id
column. The MVP was for one customer; the multi-tenant story was a
"phase 2 problem". The schema looked like:
-- v1 schema (Module 01)
CREATE TABLE documents (
    id UUID PRIMARY KEY,
    embedding vector(1536) NOT NULL,
    content TEXT NOT NULL,
    permission_level TEXT NOT NULL, -- PUBLIC | INTERNAL | CONFIDENTIAL | RESTRICTED
    created_at TIMESTAMPTZ DEFAULT now()
);
Permissions were enforced application-side: the RBACRetriever
checked the requesting user's role against permission_level. This
ADR records the v1 decision so future readers can see why we
shipped it and why we reversed it.
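To make the v1 enforcement model concrete, here is a minimal sketch of an application-side check of a role against permission_level. It is not the shipped RBACRetriever code. The role names, the ROLE_CLEARANCE mapping, and the assumption that the four levels form a strict ordering are all hypothetical.

```python
from dataclasses import dataclass

# Assumption: the four levels are ordered; a higher rank means more restricted.
LEVEL_RANK = {"PUBLIC": 0, "INTERNAL": 1, "CONFIDENTIAL": 2, "RESTRICTED": 3}

# Hypothetical roles and the highest level each one may read.
ROLE_CLEARANCE = {
    "guest": "PUBLIC",
    "employee": "INTERNAL",
    "manager": "CONFIDENTIAL",
    "admin": "RESTRICTED",
}


@dataclass(frozen=True)
class Document:
    id: str
    content: str
    permission_level: str


@dataclass(frozen=True)
class User:
    name: str
    role: str


def can_read(user: User, doc: Document) -> bool:
    """Return True if the user's role clears the document's permission_level."""
    # Unknown roles fall back to the least privileged clearance.
    clearance = ROLE_CLEARANCE.get(user.role, "PUBLIC")
    return LEVEL_RANK[clearance] >= LEVEL_RANK[doc.permission_level]


def filter_readable(user: User, candidates: list[Document]) -> list[Document]:
    """Application-side filter applied to retrieval candidates."""
    return [doc for doc in candidates if can_read(user, doc)]
```

Nothing in this check knows which customer owns a document, so it can only separate users within one customer's data.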
What changed
In Module 02 a security review surfaced two failure modes:
- Cross-tenant join risk. As lineage and audit tables added joins on document_id, a missing WHERE clause in any one query could return rows from another customer.
- Index rot at scale. Once the table held 4M chunks per tenant across 8 tenants, the single permission_level index became less selective than a (tenant_id, permission_level) composite would be.
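The cross-tenant join risk can be reproduced in a few lines. The sketch below uses an in-memory SQLite database in place of PostgreSQL. The audit_log table, its customer column, and the sample rows are hypothetical stand-ins, since the ADR does not show the real lineage or audit schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- v1-style table: documents carries no tenant_id column.
    CREATE TABLE documents (id TEXT PRIMARY KEY, content TEXT NOT NULL);
    -- Hypothetical audit table; the customer column is an assumption.
    CREATE TABLE audit_log (
        document_id TEXT NOT NULL REFERENCES documents(id),
        customer TEXT NOT NULL,
        action TEXT NOT NULL
    );
    INSERT INTO documents VALUES ('d1', 'acme pricing'), ('d2', 'globex roadmap');
    INSERT INTO audit_log VALUES ('d1', 'acme', 'read'), ('d2', 'globex', 'read');
    """
)

JOIN = """
    SELECT d.content
    FROM audit_log a
    JOIN documents d ON d.id = a.document_id
"""

# Intended query: the join is scoped to the requesting customer.
scoped = conn.execute(
    JOIN + " WHERE a.customer = ? ORDER BY d.id", ("acme",)
).fetchall()

# Same join with the WHERE clause forgotten: every customer's rows come back.
unscoped = conn.execute(JOIN + " ORDER BY d.id").fetchall()

print(scoped)
print(unscoped)
```

Both queries are valid SQL and both succeed. The only safeguard is that every author of every query remembers the filter.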
The fix landed in Module 04: we added tenant_id UUID NOT NULL,
backfilled, recreated indexes with tenant_id as the leading column,
enabled Row-Level Security with the tenant_isolation policy, and
threaded the app.current_tenant_id session variable through
TenantContextMiddleware. The reversal is documented in ADR-002.
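The migration steps above map onto PostgreSQL statements roughly as follows. This is a sketch of the sequence, not the contents of db/rls_setup.py. The index name, the document_owner mapping table used for the backfill, and the exact policy expression are assumptions.

```python
def rls_migration_statements(table: str = "documents") -> list[str]:
    """Return the migration steps, in order, as PostgreSQL statements."""
    return [
        # 1. Add the column as nullable so existing rows stay valid.
        f"ALTER TABLE {table} ADD COLUMN tenant_id UUID",
        # 2. Backfill. document_owner is a hypothetical mapping table;
        #    the real source of tenant ownership is not given in the ADR.
        f"UPDATE {table} d SET tenant_id = o.tenant_id "
        "FROM document_owner o WHERE o.document_id = d.id",
        # 3. Tighten to NOT NULL only after every row has a tenant.
        f"ALTER TABLE {table} ALTER COLUMN tenant_id SET NOT NULL",
        # 4. Composite index with tenant_id as the leading column.
        f"CREATE INDEX {table}_tenant_permission_idx "
        f"ON {table} (tenant_id, permission_level)",
        # 5. Turn on Row-Level Security and attach the isolation policy.
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY",
        f"CREATE POLICY tenant_isolation ON {table} "
        "USING (tenant_id = current_setting('app.current_tenant_id')::uuid)",
    ]


# Statement a middleware could run at the start of each transaction.
# The third argument (true) scopes the setting to the current transaction.
SET_TENANT_SQL = "SELECT set_config('app.current_tenant_id', %(tenant_id)s, true)"

if __name__ == "__main__":
    for statement in rls_migration_statements():
        print(statement + ";")
```

One caveat worth checking against the real setup: in PostgreSQL, superusers and the table owner bypass RLS by default, so the application normally connects as a separate role or the table also uses FORCE ROW LEVEL SECURITY.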
Why we left this ADR Deprecated rather than deleting it
Future maintainers will look at the schema in db/rls_setup.py and
see RLS as an obvious requirement. The interesting question — why
didn't they ship RLS in v1 — is answered by this ADR, not by the
production schema. Deleting this ADR would make the v1 → v4 migration
an unexplained git-archaeology exercise.
The MADR convention treats Deprecated ADRs as part of the permanent record. We follow that convention.
What we got wrong (and what we'd do again)
Got wrong:
- We deferred multi-tenancy as "phase 2" but shipped an interface that committed to single-tenant assumptions. A tenant_id NULLABLE column from day 1 would have made the migration a backfill exercise instead of a schema-change exercise.
- We treated permission_level as a tenant-isolation primitive. It isn't; it's a within-tenant access-control primitive. Confusing the two cost us a security review.
Got right:
- The RBACRetriever interface (a class with retrieve(query, user)) was the right abstraction. The migration to multi-tenancy required no caller-side changes: the retriever started accepting a tenant_id from the request context and the rest was internal.
- Shipping the audit log on day 1. The migration's verification step was "diff the audit log row counts before/after", which was possible only because the audit log existed.
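One way a retriever can pick up a tenant from the request context without changing its retrieve(query, user) signature is a context variable set once per request. The sketch below assumes that approach. The current_tenant_id variable, the Chunk type, and the in-memory store are hypothetical, and the role check is omitted for brevity.

```python
from contextvars import ContextVar
from dataclasses import dataclass

# Set once per request (by middleware, in a real service); the name is hypothetical.
current_tenant_id: ContextVar[str] = ContextVar("current_tenant_id")


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    content: str


class RBACRetriever:
    def __init__(self, store: list[Chunk]) -> None:
        self._store = store  # in-memory stand-in for the documents table

    def retrieve(self, query: str, user: str) -> list[str]:
        """Same signature before and after the migration."""
        # Raises LookupError if no tenant was set, so a request without
        # tenant context fails instead of silently returning all rows.
        tenant_id = current_tenant_id.get()
        return [
            chunk.content
            for chunk in self._store
            if chunk.tenant_id == tenant_id and query.lower() in chunk.content.lower()
        ]


retriever = RBACRetriever(
    [Chunk("t1", "Acme pricing sheet"), Chunk("t2", "Globex pricing sheet")]
)

token = current_tenant_id.set("t1")  # what the middleware would do per request
try:
    results = retriever.retrieve("pricing", user="alice")
finally:
    current_tenant_id.reset(token)

print(results)
```

The caller still passes only query and user; tenant scoping happens inside the retriever.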
References
- ADR-002 (RLS multi-tenant — the replacement)
- apps/web/public/downloads/enterprise-ai-platform-starter.zip!/sql/init.sql (current schema)
- apps/web/public/downloads/enterprise-ai-platform-starter.zip!/db/rls_setup.py (RLS policy code)
- apps/web/public/downloads/enterprise-ai-platform-starter.zip!/rag/rbac_retriever.py (the retriever; same shape across the migration)