ADR-005: Sync SQLAlchemy across all hot paths (DEPRECATED) | AI Cost Optimization

Context

When Module 02 first shipped, the persistence layer was synchronous SQLAlchemy throughout — models.py declares declarative_base() and sessionmaker() against a sync engine, and every database call in tracker.py, aggregator.py, api.py, and the early budget prototypes went through a Session object.

The reasoning at the time was sound:

One driver, one ORM, one mental model. SQLAlchemy is the industry default; new contributors don't need to learn a second async stack.
Cost-tracking writes are cheap. A single insert into llm_requests takes < 5 ms with the right indexes; blocking for that long looked acceptable.
FastAPI's threadpool absorbs blocking I/O. Sync handlers under FastAPI run in a worker thread, so a blocked session.commit() doesn't freeze the event loop — it just consumes a thread slot.
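
The third point can be illustrated without FastAPI at all. The sketch below is not code from the repository: it uses the standard library's asyncio.to_thread as a stand-in for the way FastAPI hands sync handlers to a worker thread, and blocking_commit and heartbeat are hypothetical names. A blocking call runs in a thread while a heartbeat coroutine keeps ticking on the event loop.

```python
import asyncio
import time


def blocking_commit(duration_s: float) -> str:
    """Stand-in for a blocking session.commit(); just sleeps."""
    time.sleep(duration_s)
    return "committed"


async def heartbeat(ticks: list[float], interval_s: float, stop: asyncio.Event) -> None:
    """Record a timestamp on every loop iteration until asked to stop."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(interval_s)


async def main() -> tuple[str, int]:
    ticks: list[float] = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, 0.01, stop))
    # The blocking call occupies one worker thread; the event loop stays free.
    result = await asyncio.to_thread(blocking_commit, 0.1)
    stop.set()
    await hb
    return result, len(ticks)


if __name__ == "__main__":
    result, n_ticks = asyncio.run(main())
    print(f"{result}; event loop ticked {n_ticks} times while the thread was blocked")
```

The heartbeat keeps running during the blocking call, which is the sense in which the event loop is not frozen. The thread itself is unavailable for the full duration of the call.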

This held through Modules 02–04. Routing, caching, and aggregation are either off-hot-path (aggregator) or have effective in-memory paths (cache hits don't touch Postgres). Sync was fine.

What changed

Module 05 introduced budget enforcement on the hot path. The shape of the work changed:

Every request now does a budget check against Postgres before the model call. At 50k req/day that's ~50k extra queries/day, sustained.
The check has to complete inside the platform's p95 latency budget (~380 ms total), so a 50–80 ms blocking budget read consumes 13–21% of the budget on its own.
Blocked threads accumulate. With a WORKERS=4 × THREADS=40 pool (40 threads per worker process is the default limit FastAPI inherits from AnyIO) and a 50 ms median budget read, the pool saturates at ~3,200 req/s steady-state: 160 threads ÷ 0.05 s per read. Burst traffic stalls the response queue.
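
The arithmetic behind the budget-share and saturation figures can be reproduced in a few lines. This is a back-of-the-envelope sketch, not code from the repository; the function names are invented for illustration, and the capacity estimate assumes each thread is held only for the duration of the budget read.

```python
def budget_share(read_ms: float, p95_budget_ms: float) -> float:
    """Fraction of the latency budget consumed by one blocking read."""
    return read_ms / p95_budget_ms


def saturation_rps(workers: int, threads_per_worker: int, hold_s: float) -> float:
    """Upper bound on throughput when every thread is busy.

    Each thread completes 1 / hold_s requests per second, so the pool as a
    whole tops out at (workers * threads_per_worker) / hold_s.
    """
    return workers * threads_per_worker / hold_s


if __name__ == "__main__":
    low, high = budget_share(50, 380), budget_share(80, 380)
    print(f"budget read uses {low:.0%} to {high:.0%} of the p95 budget")
    print(f"pool ceiling: {saturation_rps(4, 40, 0.050):,.0f} req/s")
```

A sync handler holds its thread for the whole request rather than just the read, so the practical ceiling sits below this bound.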

We ran a 5-minute load test (tests/load/budget_burst.py) at 100 RPS sustained. Sync result:

p50 latency:  142 ms
p95 latency:  890 ms   <-- 134% over budget
p99 latency: 2.4 s     <-- thread-pool saturation visible

That's the regression that killed this ADR.

What we got wrong (and what we'd do again)

Wrong: assuming "fine on Modules 02–04" extrapolates to "fine on Module 05." It didn't, because Module 05 changed the hot-path shape: cost-tracking writes (off-hot-path-ish, fire-and-forget) are not the same as budget reads (synchronously gating the model call).

Wrong: treating the threadpool as a free buffer. FastAPI's threadpool handles blocking I/O gracefully, not cheaply. Under burst load every thread ends up blocked, new requests wait for a free thread slot, and that wait shows up as p99 latency cliffs.
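
The queueing effect is easy to reproduce in isolation. The following is a toy simulation under stated assumptions, not the project's load test: a small ThreadPoolExecutor stands in for the server threadpool, time.sleep stands in for the blocking budget read, and the names blocking_read and burst_latencies are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_read(submitted_at: float, hold_s: float) -> float:
    """Stand-in for a blocking budget read; returns the request's total latency."""
    time.sleep(hold_s)
    return time.monotonic() - submitted_at


def burst_latencies(pool_size: int, burst: int, hold_s: float) -> list[float]:
    """Submit `burst` requests at once to a pool of `pool_size` threads."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [
            pool.submit(blocking_read, time.monotonic(), hold_s)
            for _ in range(burst)
        ]
        # Latency includes the time a request spent waiting for a free thread.
        return sorted(f.result() for f in futures)


if __name__ == "__main__":
    latencies = burst_latencies(pool_size=4, burst=16, hold_s=0.05)
    print(f"fastest: {latencies[0] * 1000:.0f} ms, slowest: {latencies[-1] * 1000:.0f} ms")
```

Requests that arrive while all threads are busy pay for every read queued ahead of them, so tail latency grows with burst size even though each individual read stays at 50 ms.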

Right: keeping sync everywhere through M02–M04. The cost of mixed sync + async early would have been real (two patterns to teach, two error modes, two test fixtures). We got real production value from sync first, then paid the rewrite cost only when the workload demanded it.

Right: documenting this as a deprecated ADR rather than silently rewriting. Every contributor who reads governance.py and wonders "why is this asyncpg when the rest is SQLAlchemy?" gets the answer here.

How we reversed it

The reversal landed in two PRs over Module 05:

PR #142 — Introduce asyncpg for governance. src/cost/governance.py, src/cost/budget.py, and src/cost/failures.py switch to asyncpg.Pool. The pool is sized at min_size=10, max_size=30 against the same Postgres instance the sync layer uses.
PR #148 — Cohabit cleanly. A new db/__init__.py exposes both the sync SessionLocal (for ORM models, aggregator, tracker) and the async pool (for governance). Both use the same connection string. The coexistence is documented in db/README.md.
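
A minimal sketch of what an async budget read through asyncpg might look like, using the pool sizing quoted for PR #142. This is an illustration, not the contents of governance.py or budget.py: the budgets table, its columns, the function names, and the deny-on-missing-row policy are all assumptions.

```python
from decimal import Decimal

# Hypothetical query: the `budgets` table and its columns are assumptions.
BUDGET_QUERY = """
    SELECT limit_usd - spent_usd AS remaining_usd
    FROM budgets
    WHERE team_id = $1
"""


async def create_governance_pool(dsn: str):
    """Create the async pool with the sizing quoted for PR #142."""
    import asyncpg  # imported here so the sketch loads without the driver installed

    return await asyncpg.create_pool(dsn=dsn, min_size=10, max_size=30)


async def has_budget(pool, team_id: str, estimated_cost_usd: Decimal) -> bool:
    """Return True when the team's remaining budget covers the estimated cost.

    `pool` is expected to behave like asyncpg.Pool: `acquire()` yields a
    connection and `fetchrow()` returns a mapping-like record or None.
    """
    async with pool.acquire() as conn:
        row = await conn.fetchrow(BUDGET_QUERY, team_id)
    if row is None:
        return False  # assumption: teams without a budget row are denied
    return row["remaining_usd"] >= estimated_cost_usd
```

While the query is in flight the coroutine is suspended rather than parked on a thread, so a slow read costs a pool connection but no threadpool slot.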

Same load test after the reversal:

p50 latency:  108 ms
p95 latency:  340 ms   <-- inside budget
p99 latency:  680 ms   <-- no saturation cliff
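
For reference, the budget comparison for both runs can be checked with a short calculation. The p95 figures and the 380 ms budget come from this document; the helper name over_budget_pct is made up for the example.

```python
P95_BUDGET_MS = 380  # platform p95 latency budget quoted in this ADR

# p95 figures from the two runs of the same load test
RUNS = {"sync": 890, "async": 340}


def over_budget_pct(observed_ms: float, budget_ms: float = P95_BUDGET_MS) -> float:
    """Percentage by which a latency exceeds the budget (negative means headroom)."""
    return (observed_ms - budget_ms) / budget_ms * 100


if __name__ == "__main__":
    for name, p95_ms in RUNS.items():
        print(f"{name}: p95 {p95_ms} ms, {over_budget_pct(p95_ms):+.0f}% vs budget")
```

The sync run lands roughly 134% over the budget, and the async run roughly 11% under it.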

The sync layer remains for ORM models, aggregator, and ingestion. Only the hot-path budget reads moved to async. We did not "fix it everywhere" — that's a Phase-2 cleanup in a future module if and when the next bottleneck appears.

Why we reversed it (in one sentence)

Sync I/O on the hot path saturated the FastAPI threadpool under burst load and pushed p95 latency 134% over budget; async budget reads through asyncpg.Pool brought it back inside budget without changing the rest of the persistence layer.

What this ADR replaces

The original M02–M04 design assumed a single SQLAlchemy session for all paths. ADR-002 (the live three-tier budget design) supersedes that for the budget hot path.
Cost-tracking writes (tracker.py) are explicitly not migrated to async; they remain sync, as ADR-004 documents.

References

src/cost/governance.py — async budget enforcement
src/cost/budget.py — async three-tier check
tracker.py + aggregator.py — kept sync (post-ADR-005, pre-ADR-006 if ever needed)
tests/test_smoke.py::test_budget_under_burst — the regression test that pinned the saturation
ADR-002 (live design)
ADR-004 (sync persistence pattern that survived the reversal)