
Three-tier budget hierarchy with fail-open enforcement

✓ Accepted · AI Cost Optimization · 05: Platform Operations & Cost Governance
By AI-DE Engineering Team · Stakeholders: platform engineer, finance partner, on-call SRE

Context

By Module 04 the platform tracks every token and routes every prompt — but nothing prevents a runaway team or a runaway model from burning the monthly budget in a weekend. Three forces drive the budget design:

  1. Multi-level accountability. Finance owns the org-wide cap. Team leads own per-team allocations (each team is a P&L line in the company plan). Engineers own per-user safety nets (so one bad agent loop doesn't take down a team). Any single-level system makes one of these stakeholders unhappy.
  2. Latency-critical hot path. Budget checks sit on every /chat call (Module 02 instrumented latency_ms end-to-end, with a p95 of ~380 ms). A budget-check pattern that blocks for 30 ms is an 8% latency hit; one that blocks for 200 ms is a regression we'd revert before shipping.
  3. Database is a shared dependency. The same Postgres instance handles llm_requests writes, cost_daily_summary upserts (ADR-004), and budget reads. A Postgres outage must not freeze the LLM platform; that's an availability target written into sla.yaml.

The naive design (single global cap, hard-reject on breach) fails (1) and (2). The "freeze on Postgres outage" pattern fails (3). We needed a design that handles all three.

Decision

We adopt a three-tier hierarchical budget with fail-open enforcement.

Hierarchy: Org cap (top) → Team cap (mid) → User cap (bottom). Each level has its own budget_tiers row keyed by (tier_id, period_start) with a UNIQUE constraint. Spending is recorded at request time and rolled up daily via the same ON CONFLICT pattern as cost_daily_summary (ADR-004).

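```python
# src/cost/budget.py
async def check_budget(org_id, team_id, user_id, est_cost_usd) -> BudgetVerdict:
    # Three reads, async, single round-trip via UNNEST. fetch() returns one
    # record per tier (fetchrow() would return only the first), and asyncpg
    # records are indexed by column name, not by attribute.
    rows = await asyncpg_conn.fetch(_BUDGET_SQL, [org_id, team_id, user_id])
    # Assumes _BUDGET_SQL returns tier_kind values 'org', 'team', 'user'.
    tier = {row["tier_kind"]: row for row in rows}
    org, team, user = tier["org"], tier["team"], tier["user"]

    # Narrowest scope first, so the verdict names the tightest cap.
    if user["spent"] + est_cost_usd > user["cap"]:  return BudgetVerdict.OVER_USER
    if team["spent"] + est_cost_usd > team["cap"]:  return BudgetVerdict.OVER_TEAM
    if org["spent"] + est_cost_usd > org["cap"]:    return BudgetVerdict.OVER_ORG
    return BudgetVerdict.OK
```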
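To make the check order and the rollup semantics concrete without a database, here is a minimal in-memory sketch. It assumes each budget_tiers row carries `spent` and `cap` in USD (the column names the check uses) and mimics the `ON CONFLICT ... DO UPDATE` rollup with a dict keyed by `(tier_id, period_start)`. `BudgetStore`, `TierRow`, `set_cap`, and `record_spend` are hypothetical names, not part of src/cost/budget.py.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class BudgetVerdict(Enum):
    OK = "ok"
    OVER_USER = "over_user_cap"
    OVER_TEAM = "over_team_cap"
    OVER_ORG = "over_org_cap"


@dataclass
class TierRow:
    cap: float          # USD allowed in this period
    spent: float = 0.0  # USD spent so far in this period


@dataclass
class BudgetStore:
    # Hypothetical stand-in for budget_tiers: the dict key plays the role
    # of the UNIQUE (tier_id, period_start) constraint.
    rows: dict = field(default_factory=dict)

    def set_cap(self, tier_id: str, period_start: date, cap: float) -> None:
        self.rows[(tier_id, period_start)] = TierRow(cap=cap)

    def record_spend(self, tier_id: str, period_start: date, amount: float) -> None:
        # Mirrors the DO UPDATE branch of
        #   INSERT ... ON CONFLICT (tier_id, period_start)
        #   DO UPDATE SET spent = budget_tiers.spent + EXCLUDED.spent
        # The cap row is assumed to exist already (set_cap was called).
        self.rows[(tier_id, period_start)].spent += amount

    def check(self, org_id, team_id, user_id, period_start, est_cost_usd) -> BudgetVerdict:
        # Narrowest scope first, so the verdict names the tightest cap.
        for tier_id, verdict in ((user_id, BudgetVerdict.OVER_USER),
                                 (team_id, BudgetVerdict.OVER_TEAM),
                                 (org_id, BudgetVerdict.OVER_ORG)):
            row = self.rows[(tier_id, period_start)]
            if row.spent + est_cost_usd > row.cap:
                return verdict
        return BudgetVerdict.OK
```

Because the key includes `period_start`, a new period gets fresh rows instead of resetting old ones, so past periods stay queryable.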

Fail-open: if Postgres is unreachable or misses the timeout, enforcement falls back to a Redis-backed sliding-window rate limiter (30 req/min/user) and returns OK for any request the limiter admits. Each admitted request is recorded in overage_requests for post-hoc reconciliation once Postgres is reachable again. Hard-reject triggers only when the fallback cannot admit the request: the user has exhausted the rate limit, or Redis is unreachable as well.

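```python
# src/cost/governance.py
import asyncio
import asyncpg

async def enforce_budget(org_id, team_id, user_id, est_cost_usd) -> BudgetVerdict:  # name illustrative
    try:
        return await asyncio.wait_for(
            check_budget(org_id, team_id, user_id, est_cost_usd), timeout=0.05)
    # "Unreachable" surfaces as OSError (refused/reset) or InterfaceError
    # (closed connection), not only PostgresError; a slow DB hits the timeout.
    except (asyncpg.PostgresError, asyncpg.InterfaceError, OSError, asyncio.TimeoutError):
        if rate_limiter.allow(user_id):
            # Assumed: record_overage buffers rather than writing to the down DB.
            await record_overage(user_id, est_cost_usd, reason="db_unreachable")
            return BudgetVerdict.OK
        return BudgetVerdict.HARD_REJECT
```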

The Postgres timeout is 50 ms, roughly 1/8 of the ~380 ms p95 latency budget. It caps what a slow Postgres can add to any single request at 50 ms, and even on a slow-Postgres day it adds < 6 ms to the median path.
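The fail-open path can be exercised without Postgres or Redis. The sketch below is a minimal, in-memory illustration: it assumes a sliding-window log (one timestamp per admitted request, pruned to the last 60 s) as a stand-in for the Redis limiter, and uses plain strings for verdicts. `SlidingWindowLimiter`, `guarded_check`, and the `overages` list are hypothetical names; a production limiter would keep the log in Redis so every API replica shares one count.

```python
import asyncio
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """In-memory sliding-window log: at most `limit` requests per `window_s` per user."""

    def __init__(self, limit: int = 30, window_s: float = 60.0, clock=time.monotonic):
        self.limit, self.window_s, self.clock = limit, window_s, clock
        self.log = defaultdict(deque)  # user_id -> timestamps of admitted requests

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        q = self.log[user_id]
        # Drop timestamps that have slid out of the window.
        while q and q[0] <= now - self.window_s:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True


async def guarded_check(check, limiter, user_id, est_cost_usd, overages, timeout=0.05):
    """Return the verdict from `check`, or fail open behind `limiter`.

    `check` is a zero-argument coroutine function standing in for check_budget.
    """
    try:
        return await asyncio.wait_for(check(), timeout=timeout)
    except (OSError, asyncio.TimeoutError):
        # Budget DB slow or unreachable: admit within the rate limit and
        # keep an audit entry for later reconciliation.
        if limiter.allow(user_id):
            overages.append((user_id, est_cost_usd, "db_unreachable"))
            return "ok"
        return "hard_reject"
```

Keeping one timestamp per request makes the window exact at the cost of O(limit) memory per user; a Redis sorted set (ZADD, ZREMRANGEBYSCORE, ZCARD) is a common way to hold the same log in a shared store.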

Tradeoffs we accept

| Lever | Alternative | Chosen |
|---|---|---|
| Fairness model | Single global cap | 3-tier with org/team/user caps |
| Outage behaviour | Hard-reject if budget DB down | Fail-open with rate limit + overage log |
| Reconciliation | Synchronous, strict | Eventual, via overage_requests |
| Hot-path block | Per-call DB write | Per-call DB read + nightly rollup write |
| Data model | One table per scope | One budget_tiers table with tier_kind column |

We accept the overage risk because the worst case is bounded per user: 30 req/min × 5 minutes of Postgres downtime × ~$0.005/req = ~$0.75 per user before Postgres recovers. The rate limiter caps the spend rate, not the total, so exposure grows linearly with outage length and with the number of active users. That is still materially less than the cost of falsely rejecting paying users during a Postgres blip.
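The bound can be written as a one-line formula, which makes it easy to re-evaluate for other outage lengths or user counts. The inputs are the figures from the paragraph above; `worst_case_overage_usd` and its `active_users` parameter are illustrative additions, since the $0.75 figure is per user.

```python
def worst_case_overage_usd(rate_per_min: float, outage_min: float,
                           cost_per_req_usd: float, active_users: int = 1) -> float:
    """Upper bound on fail-open spend: every active user saturates the
    fallback rate limit for the whole outage."""
    return rate_per_min * outage_min * cost_per_req_usd * active_users
```

For example, 200 active users saturating the limit through the same 5-minute outage gives ~$150.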

Consequences (positive)

  • Latency intact. The p95 budget-check latency is 8 ms (one async query, single round-trip via Postgres UNNEST), about 2% of the ~380 ms end-to-end p95.
  • Three stakeholders, three knobs. Finance updates org caps in governance.py; team leads update team caps via the admin UI; users see their per-user spent/cap on every response in X-Cost-Headers.
  • Postgres outage is recoverable, not catastrophic. The fail-open path is exercised in tests/test_smoke.py::test_budget_fails_open_when_db_down; the reconciliation flow has a runbook entry.
  • Reconciliation is auditable. Every fail-open decision lands in overage_requests with (user_id, amount_usd, reason, requested_at, approval_status) — finance can review weekly.
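The audit tuple listed above maps naturally onto a small record type. Below is a minimal sketch of the weekly finance review, assuming `approval_status` takes the string values "pending", "approved", and "rejected" (the ADR names the column but not its values); `OverageRequest` and `pending_by_user` are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class OverageRequest:
    user_id: str
    amount_usd: float
    reason: str              # e.g. "db_unreachable"
    requested_at: datetime
    approval_status: str     # assumed values: "pending", "approved", "rejected"


def pending_by_user(rows, since: datetime) -> dict:
    """Sum still-pending fail-open spend per user for rows requested at or after `since`."""
    totals = defaultdict(float)
    for r in rows:
        if r.approval_status == "pending" and r.requested_at >= since:
            totals[r.user_id] += r.amount_usd
    return dict(totals)
```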

Consequences (negative)

  • Three ways to over-spend. A team can hit its cap while the org cap is fine; a user can hit their cap while the team cap is fine. Surface the reason on the 429 response (X-Budget-Reason: over_team_cap) so the cause is obvious when debugging; the alternative, a single opaque "budget exceeded" error, is worse.
  • Eventual consistency on reconciliation. Overage rows can sit unreconciled for up to 24 h before the nightly job picks them up. We accept the lag because the absolute amounts are small and the alternative is freezing the platform.
  • More moving parts in incident response. The runbook entry "Postgres unreachable" now has a sub-step: "Check overage_requests table for the fail-open volume; if > $X/hour, switch enforcement to hard-reject via BUDGET_FAIL_OPEN=false."
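Surfacing the reason can be as simple as mapping each verdict to a status and header. This sketch assumes the verdict values double as header strings; only `over_team_cap` appears in the ADR, so `over_user_cap`, `over_org_cap`, and `budget_unavailable` are illustrative, as is the `budget_response` helper.

```python
from enum import Enum


class BudgetVerdict(Enum):
    OK = "ok"
    OVER_USER = "over_user_cap"        # illustrative
    OVER_TEAM = "over_team_cap"        # value given in the ADR
    OVER_ORG = "over_org_cap"          # illustrative
    HARD_REJECT = "budget_unavailable" # illustrative


def budget_response(verdict: BudgetVerdict):
    """Map a verdict to (status, headers); None means let the request through."""
    if verdict is BudgetVerdict.OK:
        return None
    # Name the exact cap that tripped instead of an opaque "budget exceeded".
    return 429, {"X-Budget-Reason": verdict.value}
```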

Reversal plan

If finance loses tolerance for fail-open behaviour (e.g. a single bad outage produces > $10k of overage), the reversal is:

  1. Set BUDGET_FAIL_OPEN=false in .env.
  2. src/cost/governance.py switches the except branch to BudgetVerdict.HARD_REJECT instead of the rate-limit fallback.
  3. Document the SLA change in sla.yaml: budget enforcement availability is now bound to Postgres availability (drops from 99.9% to 99.5%).
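Steps 1 and 2 amount to gating the except branch on the flag. A minimal sketch, assuming the flag is read from the process environment and treated as on unless explicitly set to a false-like value ("false", "0", "no"); those spellings, the default, and the helper names `fail_open_enabled` and `on_budget_db_error` are assumptions, while `BUDGET_FAIL_OPEN` itself comes from the ADR.

```python
import os


def fail_open_enabled(env=os.environ) -> bool:
    """Assumed semantics: fail-open stays on unless BUDGET_FAIL_OPEN is explicitly false."""
    return env.get("BUDGET_FAIL_OPEN", "true").strip().lower() not in {"false", "0", "no"}


def on_budget_db_error(user_id: str, rate_limit_allows, env=os.environ) -> str:
    """Verdict for the except branch when the budget check errors or times out."""
    if not fail_open_enabled(env):
        # Reversed behaviour: enforcement is bound to Postgres availability.
        return "hard_reject"
    return "ok" if rate_limit_allows(user_id) else "hard_reject"
```

Defaulting to fail-open when the variable is unset preserves the current behaviour; that default is an assumption, not something the ADR specifies.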

Estimated effort: 1 engineer-day, plus a coordinated SLA update with the on-call team.

References

  • migrations/005_budget_tables.sql: budget_tiers, overage_requests, cost_anomalies schemas
  • src/cost/budget.py — three-tier budget check
  • src/cost/governance.py — fail-open enforcement + reconciliation
  • src/cost/failures.py — recovery strategies + retry/backoff
  • runbook/cost-incident-response.py — "Postgres unreachable" playbook
  • sla.yaml — budget engine availability target (99.9%)
  • ADR-004 (per-request detail + daily rollup pattern this design rolls up into)
Built into the project

This decision shipped as part of AI Cost Optimization — see the full architecture, starter kit, and 4 more ADRs.
