Context
A full-stack RAG platform makes hundreds of design decisions — chunk size, embedding model, retrieval strategy, judge model, cache TTL, rate limits. Without a single source of truth for what good looks like, those decisions get made locally by whichever engineer has the keyboard, and the platform ends up optimized for nothing in particular.
We've shipped two RAG systems before this one without a system contract. The pattern was always:
- Sprint 1-2 — fast iteration, every decision feels obvious
- Sprint 3-4 — first stakeholder asks "is the answer fresh?" / "how slow is too slow?" — nobody knows
- Sprint 5+ — argument about whether the system is "ready" without a shared definition of ready
A SystemContract turns that argument into a config file.
Three options on the table:
- Option A: No contract. Make decisions locally, justify in PRs. (What we did before.)
- Option B: Lightweight SLO list in a wiki. (Most teams' compromise.)
- Option C: Code-level dataclass that gates real CI checks. Decisions reference contract values. Misses block merges.
Decision
Adopt Option C. The SystemContract is a Python dataclass shipped in system_contract.py (Module 01) — it's the first artifact in the project before any data flows.
    # system_contract.py
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class SystemContract:
        """Platform SLAs declared upfront. Every component gates against these."""
        # Freshness: how stale can ingested data be?
        freshness_hours: int          # 1-168 (1 hour to 1 week)
        # Latency: p50/p95 for the end-to-end query path
        latency_p50_seconds: float    # 0.5-30
        latency_p95_seconds: float    # 0.5-30
        # Quality: minimum scores on the gold-set eval
        correctness_floor: float      # 0.5-1.0 (recall@5 against gold)
        coverage_floor: float         # 0.5-1.0 (fraction of gold queries with a grounded answer)
        # Sources: declared inputs (each must have a manifest)
        sources: list[SourceSpec]

        def __post_init__(self):
            # Hard validation: an invalid contract means the system won't boot.
            # Raise explicitly, because `assert` statements are stripped under `python -O`.
            if not 1 <= self.freshness_hours <= 168:
                raise ValueError(f"freshness_hours out of range: {self.freshness_hours}")
            if not 0.5 <= self.correctness_floor <= 1.0:
                raise ValueError(f"correctness_floor out of range: {self.correctness_floor}")
            # ... etc
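To make "invalid contract = system won't boot" concrete, here is a minimal, self-contained sketch of building a contract and watching validation reject a bad one. The SourceSpec fields, the try_build helper, the example values, and the p50 <= p95 rule are all assumptions for illustration; the range bounds come from the comments in the dataclass above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSpec:
    """Hypothetical shape; the ADR only requires that each source has a manifest."""
    name: str
    manifest_path: str


@dataclass(frozen=True)
class SystemContract:
    freshness_hours: int
    latency_p50_seconds: float
    latency_p95_seconds: float
    correctness_floor: float
    coverage_floor: float
    sources: tuple  # tuple of SourceSpec, so the frozen contract stays immutable

    def __post_init__(self):
        # Range checks mirror the ranges documented on each field.
        ranges = {
            "freshness_hours": (1, 168),
            "latency_p50_seconds": (0.5, 30),
            "latency_p95_seconds": (0.5, 30),
            "correctness_floor": (0.5, 1.0),
            "coverage_floor": (0.5, 1.0),
        }
        for field_name, (low, high) in ranges.items():
            value = getattr(self, field_name)
            if not low <= value <= high:
                raise ValueError(f"{field_name}={value} outside [{low}, {high}]")
        # Assumed extra rule (not stated in the ADR): p50 must not exceed p95.
        if self.latency_p50_seconds > self.latency_p95_seconds:
            raise ValueError("latency_p50_seconds exceeds latency_p95_seconds")


def try_build(**overrides):
    """Build a contract from illustrative defaults; return (contract, error)."""
    values = dict(
        freshness_hours=24,
        latency_p50_seconds=2.0,
        latency_p95_seconds=8.0,
        correctness_floor=0.8,
        coverage_floor=0.9,
        sources=(SourceSpec("docs", "manifests/docs.json"),),
    )
    values.update(overrides)
    try:
        return SystemContract(**values), None
    except ValueError as exc:
        return None, str(exc)


contract, error = try_build()
bad_contract, bad_error = try_build(correctness_floor=0.3)
print(contract.correctness_floor)  # 0.8
print(bad_error)                   # correctness_floor=0.3 outside [0.5, 1.0]
```

Using a tuple for sources is a design choice in this sketch: frozen=True blocks attribute reassignment but would not stop someone appending to a list.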
CI gates use it directly:
    # scripts/check_contract.py
    def assert_contract_compliance(contract: SystemContract, eval_results: EvalRun) -> None:
        if eval_results.recall_at_5 < contract.correctness_floor:
            raise ContractBreach(
                f"recall@5 = {eval_results.recall_at_5} < contract floor {contract.correctness_floor}"
            )
        # ... and the same shape for latency, freshness, coverage
The ReleaseGate in M05 reads from the contract directly — no copy-paste of thresholds.
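One way the "same shape" checks for latency, freshness, and coverage might look is sketched below. It collects every breach before raising, so a failing CI run reports all misses at once. The EvalRun field names (coverage, oldest_doc_age_hours, and the latency fields), the trimmed Contract stand-in, and find_breaches are hypothetical; only recall_at_5, the contract field names, and ContractBreach appear in the original.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """Trimmed stand-in for SystemContract (sources omitted)."""
    freshness_hours: int
    latency_p50_seconds: float
    latency_p95_seconds: float
    correctness_floor: float
    coverage_floor: float


@dataclass(frozen=True)
class EvalRun:
    """Hypothetical eval result; only recall_at_5 is named in the ADR."""
    recall_at_5: float
    coverage: float
    latency_p50_seconds: float
    latency_p95_seconds: float
    oldest_doc_age_hours: float


class ContractBreach(Exception):
    """Raised when an eval run misses one or more contract values."""


def find_breaches(contract: Contract, run: EvalRun) -> list[str]:
    breaches = []
    # Floors: the measured value must be at least the contract value.
    floors = [
        ("recall@5", run.recall_at_5, contract.correctness_floor),
        ("coverage", run.coverage, contract.coverage_floor),
    ]
    for name, measured, floor in floors:
        if measured < floor:
            breaches.append(f"{name} = {measured} < contract floor {floor}")
    # Ceilings: the measured value must not exceed the contract value.
    ceilings = [
        ("latency p50 (s)", run.latency_p50_seconds, contract.latency_p50_seconds),
        ("latency p95 (s)", run.latency_p95_seconds, contract.latency_p95_seconds),
        ("data age (h)", run.oldest_doc_age_hours, contract.freshness_hours),
    ]
    for name, measured, ceiling in ceilings:
        if measured > ceiling:
            breaches.append(f"{name} = {measured} > contract ceiling {ceiling}")
    return breaches


def assert_contract_compliance(contract: Contract, run: EvalRun) -> None:
    breaches = find_breaches(contract, run)
    if breaches:
        raise ContractBreach("; ".join(breaches))


contract = Contract(24, 2.0, 8.0, 0.8, 0.9)  # illustrative values
good_run = EvalRun(0.85, 0.93, 1.4, 6.2, 12)
bad_run = EvalRun(0.72, 0.93, 1.4, 9.5, 12)

assert_contract_compliance(contract, good_run)  # passes silently
for line in find_breaches(contract, bad_run):
    print(line)
```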
Tradeoffs we accept
| Lever | Alternative | Chosen |
|---|---|---|
| Speed of decision | Wiki-only SLOs | Code-level dataclass — gates real CI checks |
| Flexibility | "Soft targets, revisit quarterly" | Hard floors that block merges |
| Expectation setting | Stakeholders find out what's slow mid-project | Stakeholders sign the contract before sprint 1 |
| Discoverability | Buried in Confluence | Top of repo, frozen dataclass |
Consequences (positive)
- Every PR review can ask one question: "does this respect the contract?" — no more case-by-case debate.
- M05's release gate reads the contract directly. We can't ship a regression past a floor without explicitly amending the contract first (which is a separate PR, separately reviewed).
- New engineers find the contract in their first hour and understand the platform's constraints before reading any code.
- The cost model (M06) routes against the contract's latency budget: latency_p95_seconds is the cap that decides "use Haiku or Sonnet" per query.
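As a rough illustration of routing against the latency budget, the sketch below picks the highest-ranked model whose estimated p95 fits in what remains of latency_p95_seconds after retrieval. ModelProfile, pick_model, the fallback rule, and the latency estimates are hypothetical placeholders, not measurements or the actual M06 logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    quality_rank: int        # higher is preferred when the budget allows
    est_p95_seconds: float   # placeholder; real values would come from measurement


def pick_model(latency_p95_seconds: float, elapsed_seconds: float,
               models: list[ModelProfile]) -> ModelProfile:
    """Pick the highest-ranked model whose estimated p95 fits the remaining budget."""
    remaining = latency_p95_seconds - elapsed_seconds
    fitting = [m for m in models if m.est_p95_seconds <= remaining]
    if not fitting:
        # Nothing fits: degrade to the fastest model rather than fail the query.
        return min(models, key=lambda m: m.est_p95_seconds)
    return max(fitting, key=lambda m: m.quality_rank)


MODELS = [
    ModelProfile("haiku", quality_rank=1, est_p95_seconds=2.0),
    ModelProfile("sonnet", quality_rank=2, est_p95_seconds=6.0),
]

# With an 8 s p95 budget: 1 s spent leaves room for sonnet, 4 s spent does not.
print(pick_model(8.0, elapsed_seconds=1.0, models=MODELS).name)  # sonnet
print(pick_model(8.0, elapsed_seconds=4.0, models=MODELS).name)  # haiku
```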
Consequences (negative)
- Contract amendments are PRs. That's friction, by design — but real friction. A PM who wants to relax a floor for a launch deadline has to argue it on a public diff.
- The contract has to be honest. We've seen teams default correctness_floor: 0.5 to make CI green; that's worse than no contract because it provides false comfort. Mitigation: M05's eval rolls up both contract compliance and delta from the previous release, so floors that drift down get flagged.
- Some constraints are hard to express in a frozen dataclass (e.g. "freshness depends on source"). We model this with a SourceSpec list inside the contract, but it's a leaky abstraction.
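One reading of "floors that drift down get flagged" is a diff between the previous release's contract values and the current ones. The sketch below takes that interpretation and treats a lowered floor or a raised latency or freshness limit as loosening. The loosened_fields helper and the example values are hypothetical; the field names are the contract's own.

```python
# Floors loosen when they go down; latency and freshness limits loosen when they go up.
FLOORS = ("correctness_floor", "coverage_floor")
CEILINGS = ("freshness_hours", "latency_p50_seconds", "latency_p95_seconds")


def loosened_fields(previous: dict, current: dict) -> dict:
    """Return {field: (old, new)} for every contract value that became more permissive."""
    loosened = {}
    for name in FLOORS:
        if current[name] < previous[name]:
            loosened[name] = (previous[name], current[name])
    for name in CEILINGS:
        if current[name] > previous[name]:
            loosened[name] = (previous[name], current[name])
    return loosened


# Illustrative values, e.g. produced by dataclasses.asdict() on two contract versions.
previous = {
    "freshness_hours": 24,
    "latency_p50_seconds": 2.0,
    "latency_p95_seconds": 8.0,
    "correctness_floor": 0.8,
    "coverage_floor": 0.9,
}
current = dict(previous, correctness_floor=0.7, latency_p95_seconds=10.0)

for name, (old, new) in loosened_fields(previous, current).items():
    print(f"{name} loosened: {old} -> {new}")
```

Tightened values are deliberately ignored, since ratcheting up is the direction we want.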
Reversal plan
Drop the contract → SLO wiki: ~1 engineer-day. Migrate the dataclass values into a YAML file the team reviews quarterly. Triggers:
- Contract amendments exceed ~5 per quarter (signal that the floors aren't load-bearing; they're paper).
- The team finds itself defaulting all floors to permissive values (signal that the contract is theater).
- Stakeholders stop reading contract changes (signal that the contract isn't where decisions get made anymore).
Soft tightening (move floors over time): ~0.5 engineer-day per amendment. This is the path we expect: start with a permissive contract and ratchet up as the system matures.
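The amendment-rate trigger can be checked mechanically. The sketch below assumes we already have the merge dates of contract-amending PRs (how they are collected is left open) and reports any quarter with more than the ~5 threshold. The function names and the dates are hypothetical.

```python
from collections import Counter
from datetime import date


def quarter_of(d: date) -> tuple[int, int]:
    """Map a date to (year, quarter), with quarters numbered 1 to 4."""
    return d.year, (d.month - 1) // 3 + 1


def quarters_over_threshold(amendment_dates: list[date], threshold: int = 5) -> dict:
    """Return {(year, quarter): count} for quarters with more than `threshold` amendments."""
    counts = Counter(quarter_of(d) for d in amendment_dates)
    return {q: n for q, n in sorted(counts.items()) if n > threshold}


# Illustrative dates: six amendments in 2024 Q1, two in 2024 Q2.
amendments = [
    date(2024, 1, 8), date(2024, 1, 22), date(2024, 2, 5),
    date(2024, 2, 19), date(2024, 3, 4), date(2024, 3, 25),
    date(2024, 4, 15), date(2024, 6, 3),
]

print(quarters_over_threshold(amendments))  # {(2024, 1): 6}
```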
References
- system_contract.py: the dataclass + validation
- scripts/check_contract.py: CI compliance check
- evaluation/release_gate.py: M05 reads from the contract
- serving/cost_guardrail.py: M06 reads from the contract for latency budget
- ADR-002 (retrieval choices reference correctness_floor)
- ADR-004 (failure cascade respects latency_p95_seconds)