Context
When Module 02 first shipped, the persistence layer was synchronous
SQLAlchemy throughout — models.py declares declarative_base() and
sessionmaker() against a sync engine, and every database call in
tracker.py, aggregator.py, api.py, and the early budget prototypes
went through a Session object.
The reasoning at the time was sound:
- One driver, one ORM, one mental model. SQLAlchemy is the industry default; new contributors don't need to learn a second async stack.
- Cost-tracking writes are cheap. A single insert on llm_requests takes < 5 ms with the right indexes; blocking that briefly looked acceptable.
- FastAPI's threadpool absorbs blocking I/O. Sync handlers under FastAPI run in a worker thread, so a blocked session.commit() doesn't freeze the event loop; it just consumes a thread slot.
This held through Modules 02–04. Routing, caching, and aggregation are either off-hot-path (aggregator) or have effective in-memory paths (cache hits don't touch Postgres). Sync was fine.
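The threadpool argument above can be illustrated without FastAPI. The following is a minimal sketch using only the standard library: blocking_commit is a hypothetical stand-in for session.commit(), and asyncio.to_thread plays the role that FastAPI's threadpool plays for sync handlers. It is not the project's code.

```python
import asyncio
import time


def blocking_commit(duration_s: float) -> str:
    """Hypothetical stand-in for a blocking session.commit(); it only sleeps."""
    time.sleep(duration_s)
    return "committed"


async def heartbeat(stop: asyncio.Event, interval_s: float) -> int:
    """Count how often the event loop gets to run while the commit blocks."""
    ticks = 0
    while not stop.is_set():
        await asyncio.sleep(interval_s)
        ticks += 1
    return ticks


async def run_demo(duration_s: float = 0.2, interval_s: float = 0.01) -> tuple[str, int]:
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(stop, interval_s))
    # Offload the blocking call to a worker thread. The event loop keeps
    # scheduling the heartbeat, but one thread slot is occupied throughout.
    result = await asyncio.to_thread(blocking_commit, duration_s)
    stop.set()
    ticks = await beat
    return result, ticks
```

Calling asyncio.run(run_demo()) should return "committed" along with a heartbeat count well above zero, which is the property the original design relied on: the loop stays responsive, at the price of one occupied thread per blocked call.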
What changed
Module 05 introduced budget enforcement on the hot path. The shape of the work changed:
- Every request now does a budget check against Postgres before the model call. At 50k req/day that's ~50k extra queries/day, sustained.
- The check has to complete inside the platform's p95 latency budget (~380 ms total), so a 50–80 ms blocking budget read consumes 13–21% of the budget on its own.
- Blocked threads accumulate. With FastAPI's default WORKERS=4 × THREADS=40 pool (160 threads) and a 50 ms median budget read (20 reads per second per thread), the pool saturates at ~3,200 req/s steady-state. Burst traffic stalls the response queue.
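The arithmetic behind the figures in this list can be checked with a few lines. This is a back-of-the-envelope sketch: the function names are illustrative, and it assumes every thread is permanently busy with budget reads and nothing else.

```python
def pool_capacity_rps(workers: int, threads_per_worker: int, service_time_ms: float) -> float:
    """Upper bound on sustained requests/second for a fully busy thread pool."""
    total_threads = workers * threads_per_worker
    reads_per_thread_per_second = 1000 / service_time_ms
    return total_threads * reads_per_thread_per_second


def budget_share_percent(read_ms: float, p95_budget_ms: float) -> float:
    """Share of the end-to-end latency budget consumed by one blocking read."""
    return read_ms / p95_budget_ms * 100


capacity = pool_capacity_rps(workers=4, threads_per_worker=40, service_time_ms=50)
low_share = budget_share_percent(50, 380)
high_share = budget_share_percent(80, 380)
```

With the numbers quoted above, capacity comes out at 3,200 req/s and the read consumes roughly 13% to 21% of the 380 ms budget. The capacity figure is a ceiling for a perfectly smooth arrival rate; it says nothing about how long requests wait once arrivals bunch up.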
We ran a 5-minute load test (tests/load/budget_burst.py) at 100 RPS
sustained. Sync result:
p50 latency: 142 ms
p95 latency: 890 ms <-- 134% over budget
p99 latency: 2.4 s <-- thread-pool saturation visible
That's the regression that killed this ADR.
What we got wrong (and what we'd do again)
Wrong: assuming "fine on Modules 02–04" extrapolates to "fine on Module 05." It didn't, because Module 05 changed the hot-path shape: cost-tracking writes (off-hot-path-ish, fire-and-forget) are not the same as budget reads (synchronously gating the model call).
Wrong: treating the threadpool as a free buffer. FastAPI's threadpool handles blocking I/O gracefully, not cheaply. Under burst load, every thread slot ends up held by a blocked call, so new requests wait for a free slot, and that queueing shows up as p99 latency cliffs.
Right: keeping sync everywhere through M02–M04. The cost of mixed sync + async early would have been real (two patterns to teach, two error modes, two test fixtures). We got real production value from sync first, then paid the rewrite cost only when the workload demanded it.
Right: documenting this as a deprecated ADR rather than silently
rewriting. Every contributor who reads governance.py and wonders "why is
this asyncpg when the rest is SQLAlchemy?" gets the answer here.
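To make the "not a free buffer" point concrete, here is a toy queueing model of a fixed-size pool. It is a sketch under simplifying assumptions (constant service time, FIFO scheduling, no other work on the threads) and is not the project's load test, so it does not reproduce the measured numbers. The function name is illustrative.

```python
import heapq


def simulate_latencies(arrivals_ms: list[float], slots: int, service_ms: float) -> list[float]:
    """Per-request latency (queue wait + service) for a pool with `slots` threads.

    `arrivals_ms` must be sorted; each request takes the earliest-free slot.
    """
    free_at = [0.0] * slots  # time at which each slot next becomes free
    heapq.heapify(free_at)
    latencies = []
    for arrival in arrivals_ms:
        slot_free = heapq.heappop(free_at)
        start = max(arrival, slot_free)  # wait if no slot is free yet
        finish = start + service_ms
        heapq.heappush(free_at, finish)
        latencies.append(finish - arrival)
    return latencies


# A burst of 8 simultaneous requests against 4 slots, 50 ms per read.
burst = simulate_latencies([0.0] * 8, slots=4, service_ms=50)
```

In this toy run the first four requests finish in 50 ms and the next four in 100 ms: the second wave pays a full service time of pure waiting. Tail latency grows with burst size even when average throughput is far below the pool's nominal capacity.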
How we reversed it
The reversal landed in two PRs over Module 05:
- PR #142: Introduce asyncpg for governance. src/cost/governance.py, src/cost/budget.py, and src/cost/failures.py switch to asyncpg.Pool. The pool is sized at min_size=10, max_size=30 against the same Postgres instance the sync layer uses.
- PR #148: Cohabit cleanly. A new db/__init__.py exposes both the sync SessionLocal (for ORM models, aggregator, tracker) and the async pool (for governance). Both use the same connection string. The coexistence is documented in db/README.md.
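An async budget read against such a pool might look roughly like the sketch below. Only the pool sizing and the llm_requests table name come from this ADR. The team_id and cost_usd columns, the query, the function names, and the limit_usd parameter are assumptions made for illustration, and the real three-tier check in src/cost/budget.py is not reproduced here.

```python
import asyncio
from typing import Any, Protocol


class SupportsFetchval(Protocol):
    """The single asyncpg.Pool method this sketch relies on."""

    async def fetchval(self, query: str, *args: Any) -> Any: ...


# Hypothetical query: the column names are assumptions, not the real schema.
SPENT_QUERY = "SELECT COALESCE(SUM(cost_usd), 0) FROM llm_requests WHERE team_id = $1"


async def create_governance_pool(dsn: str):
    """Create the async pool with the sizing quoted for PR #142."""
    import asyncpg  # imported lazily so the rest of the sketch loads without the driver

    return await asyncpg.create_pool(dsn, min_size=10, max_size=30)


async def within_budget(
    pool: SupportsFetchval, team_id: str, estimated_cost_usd: float, limit_usd: float
) -> bool:
    """Await the spend total; the event loop serves other requests meanwhile."""
    spent = await pool.fetchval(SPENT_QUERY, team_id)
    return float(spent) + estimated_cost_usd <= limit_usd
```

A handler would await within_budget(...) before the model call. Because the read is awaited rather than run in a worker thread, a slow query holds a pooled connection but no thread slot.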
Same load test after the reversal:
p50 latency: 108 ms
p95 latency: 340 ms <-- inside budget
p99 latency: 680 ms <-- no saturation cliff
The sync layer remains for ORM models, aggregator, and ingestion. Only the hot-path budget reads moved to async. We did not "fix it everywhere" — that's a Phase-2 cleanup in a future module if and when the next bottleneck appears.
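For reference, the "over budget" percentages can be recomputed from the two p95 figures and the ~380 ms budget quoted earlier. This is a small sketch; the helper name is illustrative.

```python
P95_BUDGET_MS = 380  # platform p95 latency budget quoted in this ADR


def percent_over_budget(p95_ms: float, budget_ms: float = P95_BUDGET_MS) -> float:
    """Positive when p95 exceeds the budget, negative when there is headroom."""
    return (p95_ms - budget_ms) / budget_ms * 100


sync_over = percent_over_budget(890)   # sync load-test p95
async_over = percent_over_budget(340)  # p95 after the reversal
```

The sync run lands at about 134% over budget, while the run after the reversal comes in roughly 11% under it.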
Why we reversed it (in one sentence)
Sync I/O on the hot path saturated the FastAPI threadpool under burst
load and pushed p95 latency 134% over budget; async budget reads through
asyncpg.Pool brought it back inside budget without changing the rest of
the persistence layer.
What this ADR replaces
- The original M02–M04 design assumed a single SQLAlchemy session for all paths. ADR-002 (the live three-tier budget design) supersedes that for the budget hot path.
- Cost-tracking writes (tracker.py) are explicitly not migrated to async; they remain sync, as ADR-004 documents.
References
- src/cost/governance.py: async budget enforcement
- src/cost/budget.py: async three-tier check
- tracker.py + aggregator.py: kept sync (post-ADR-005, pre-ADR-006 if ever needed)
- tests/test_smoke.py::test_budget_under_burst: the regression test that pinned the saturation
- ADR-002 (live design)
- ADR-004 (sync persistence pattern that survived the reversal)