SystemContract as the platform's north star (declared upfront, not derived)

Status: Accepted · Full-Stack AI Platform · Module 01: Data Foundation & Ingestion
By AI-DE Engineering Team · Stakeholders: ML engineer, platform owner, eng manager, product

Context

A full-stack RAG platform makes hundreds of design decisions — chunk size, embedding model, retrieval strategy, judge model, cache TTL, rate limits. Without a single source of truth for what good looks like, those decisions get made locally by whichever engineer has the keyboard, and the platform ends up optimized for nothing in particular.

We've shipped two RAG systems before this one without a system contract. The pattern was always:

  1. Sprint 1-2 — fast iteration, every decision feels obvious
  2. Sprint 3-4 — first stakeholder asks "is the answer fresh?" / "how slow is too slow?" — nobody knows
  3. Sprint 5+ — argument about whether the system is "ready" without a shared definition of ready

A SystemContract turns that argument into a config file.

Three options on the table:

  • Option A: No contract. Make decisions locally, justify in PRs. (What we did before.)
  • Option B: Lightweight SLO list in a wiki. (Most teams' compromise.)
  • Option C: Code-level dataclass that gates real CI checks. Decisions reference contract values. Misses block merges.

Decision

Adopt Option C. The SystemContract is a Python dataclass shipped in system_contract.py (Module 01). It is the first artifact in the project, written before any data flows.

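# system_contract.py
from __future__ import annotations  # keeps the SourceSpec annotation from being evaluated at class creation

from dataclasses import dataclass

# SourceSpec (one declared input source) is assumed to be defined elsewhere in the project.

@dataclass(frozen=True)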
class SystemContract:
    """Platform SLAs declared upfront. Every component gates against these."""

    # Freshness — how stale can ingested data be?
    freshness_hours: int             # 1-168 (1 hour to 1 week)

    # Latency — p50/p95 for the end-to-end query path
    latency_p50_seconds: float       # 0.5-30
    latency_p95_seconds: float       # 0.5-30

    # Quality — minimum scores on the gold-set eval
    correctness_floor: float         # 0.5-1.0 (recall@5 against gold)
    coverage_floor: float            # 0.5-1.0 (% of gold queries with grounded answer)

    # Sources — declared inputs (each must have a manifest)
    sources: list[SourceSpec]

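    def __post_init__(self):
        # Hard validation: an invalid contract means the system won't boot.
        # Raise explicitly instead of using assert, because asserts are stripped under `python -O`.
        if not 1 <= self.freshness_hours <= 168:
            raise ValueError(f"freshness_hours out of range: {self.freshness_hours}")
        if not 0.5 <= self.correctness_floor <= 1.0:
            raise ValueError(f"correctness_floor out of range: {self.correctness_floor}")
        # ... etc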
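
To make the boot-time behaviour concrete, here is a minimal, self-contained sketch of constructing one valid and one invalid contract. The SourceSpec fields (name, manifest_path), the helper _require_range, the tuple type for sources, the p50 <= p95 invariant, and all example values are assumptions for illustration; only the field names and ranges come from the listing above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSpec:
    # Hypothetical fields; the real SourceSpec is not shown in this ADR.
    name: str
    manifest_path: str


def _require_range(field_name: str, value: float, low: float, high: float) -> None:
    """Raise ValueError when value falls outside [low, high]."""
    if not low <= value <= high:
        raise ValueError(f"{field_name}={value} is outside [{low}, {high}]")


@dataclass(frozen=True)
class SystemContract:
    freshness_hours: int
    latency_p50_seconds: float
    latency_p95_seconds: float
    correctness_floor: float
    coverage_floor: float
    sources: tuple[SourceSpec, ...]  # tuple is an assumption; the ADR uses a list

    def __post_init__(self) -> None:
        # Ranges mirror the field comments in system_contract.py.
        _require_range("freshness_hours", self.freshness_hours, 1, 168)
        _require_range("latency_p50_seconds", self.latency_p50_seconds, 0.5, 30)
        _require_range("latency_p95_seconds", self.latency_p95_seconds, 0.5, 30)
        _require_range("correctness_floor", self.correctness_floor, 0.5, 1.0)
        _require_range("coverage_floor", self.coverage_floor, 0.5, 1.0)
        # Assumed extra invariant: the median cannot exceed the tail latency.
        if self.latency_p50_seconds > self.latency_p95_seconds:
            raise ValueError("latency_p50_seconds exceeds latency_p95_seconds")


# Example values are illustrative, not the project's real targets.
valid = SystemContract(
    freshness_hours=24,
    latency_p50_seconds=1.5,
    latency_p95_seconds=4.0,
    correctness_floor=0.8,
    coverage_floor=0.9,
    sources=(SourceSpec(name="docs", manifest_path="manifests/docs.json"),),
)

try:
    SystemContract(
        freshness_hours=0,  # below the 1-hour minimum
        latency_p50_seconds=1.5,
        latency_p95_seconds=4.0,
        correctness_floor=0.8,
        coverage_floor=0.9,
        sources=(),
    )
except ValueError as exc:
    rejection = str(exc)
    print(f"contract rejected: {rejection}")
```

Because the check runs in __post_init__, an out-of-range value fails at import or startup time rather than surfacing later as a silently permissive gate.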

CI gates use it directly:

# scripts/check_contract.py
def assert_contract_compliance(contract: SystemContract, eval_results: EvalRun) -> None:
    if eval_results.recall_at_5 < contract.correctness_floor:
        raise ContractBreach(
            f"recall@5 = {eval_results.recall_at_5} < contract floor {contract.correctness_floor}"
        )
    # ... and the same shape for latency, freshness, coverage
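
One way the remaining checks could look, as a sketch rather than the project's actual script. The EvalRun field names other than recall_at_5, the find_breaches helper, and the choice to report every breach at once (instead of stopping at the first) are assumptions; SystemContract is a trimmed stand-in so the example runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass


class ContractBreach(Exception):
    """Raised when an eval run violates one or more contract values."""


@dataclass(frozen=True)
class SystemContract:  # trimmed stand-in for the real contract
    freshness_hours: int
    latency_p50_seconds: float
    latency_p95_seconds: float
    correctness_floor: float
    coverage_floor: float


@dataclass(frozen=True)
class EvalRun:  # hypothetical shape; only recall_at_5 appears in the ADR
    recall_at_5: float
    coverage: float
    latency_p50_seconds: float
    latency_p95_seconds: float
    data_age_hours: float


def find_breaches(contract: SystemContract, run: EvalRun) -> list[str]:
    """Return one message per violated contract value (empty list if compliant)."""
    breaches = []
    # Floors: the measured value must be at least the contract value.
    for label, measured, floor in [
        ("recall@5", run.recall_at_5, contract.correctness_floor),
        ("coverage", run.coverage, contract.coverage_floor),
    ]:
        if measured < floor:
            breaches.append(f"{label} = {measured} < contract floor {floor}")
    # Ceilings: the measured value must not exceed the contract value.
    for label, measured, ceiling in [
        ("latency p50 (s)", run.latency_p50_seconds, contract.latency_p50_seconds),
        ("latency p95 (s)", run.latency_p95_seconds, contract.latency_p95_seconds),
        ("data age (h)", run.data_age_hours, contract.freshness_hours),
    ]:
        if measured > ceiling:
            breaches.append(f"{label} = {measured} > contract ceiling {ceiling}")
    return breaches


def assert_contract_compliance(contract: SystemContract, run: EvalRun) -> None:
    breaches = find_breaches(contract, run)
    if breaches:
        raise ContractBreach("; ".join(breaches))
```

Collecting all breaches before raising is a design choice, not something the ADR prescribes: it lets a failing CI run show every violated value in a single message.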

The ReleaseGate in M05 reads from the contract directly — no copy-paste of thresholds.

Tradeoffs we accept

Lever             | Alternative                       | Chosen
Speed of decision | Wiki-only SLOs                    | Code-level dataclass that gates real CI checks
Flexibility       | "Soft targets, revisit quarterly" | Hard floors that block merges
Discovery         | Stakeholders learn what's slow    | Stakeholders sign the contract before sprint 1
Discoverability   | Buried in Confluence              | Top of repo, frozen dataclass

Consequences (positive)

  • Every PR review can ask one question: "does this respect the contract?" — no more case-by-case debate.
  • M05's release gate reads the contract directly. We can't ship a regression past a floor without explicitly amending the contract first (which is a separate PR, separately reviewed).
  • New engineers find the contract in their first hour and understand the platform's constraints before reading any code.
  • The cost model (M06) routes against the contract's latency budget: latency_p95_seconds is the cap that decides whether a given query uses Haiku or Sonnet.
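
The latency-budget routing in the last point could be sketched roughly as follows. This is not the M06 implementation: ModelOption, choose_model, the quality ranking, the retrieval overhead term, and every latency number are placeholders chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    est_p95_seconds: float  # placeholder estimate, not a measured figure
    quality_rank: int       # higher means preferred when it fits the budget


def choose_model(
    latency_p95_budget: float,
    retrieval_seconds: float,
    options: list[ModelOption],
) -> ModelOption:
    """Pick the highest-ranked model whose estimated p95 fits the remaining budget."""
    remaining = latency_p95_budget - retrieval_seconds
    fitting = [o for o in options if o.est_p95_seconds <= remaining]
    if not fitting:
        # Nothing fits: fall back to the fastest option (an assumed policy).
        return min(options, key=lambda o: o.est_p95_seconds)
    return max(fitting, key=lambda o: o.quality_rank)


OPTIONS = [
    ModelOption(name="haiku", est_p95_seconds=1.0, quality_rank=1),
    ModelOption(name="sonnet", est_p95_seconds=3.0, quality_rank=2),
]

# With a 4.0 s contract budget and 0.5 s spent on retrieval, 3.5 s remain.
print(choose_model(4.0, 0.5, OPTIONS).name)
```

The point of the sketch is that the budget is passed in from the contract rather than hard-coded in the router, so amending latency_p95_seconds changes routing without touching the routing code.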

Consequences (negative)

  • Contract amendments are PRs. That's friction, by design — but real friction. A PM who wants to relax a floor for a launch deadline has to argue it on a public diff.
  • The contract has to be honest. We've seen teams set correctness_floor to 0.5, the lowest value the validation allows, just to make CI green; that's worse than no contract because it provides false comfort. Mitigation: M05's eval rolls up both contract compliance and the delta from the previous release, so floors that drift down get flagged.
  • Some constraints are hard to express in a frozen dataclass (e.g. "freshness depends on source"). We model this with a SourceSpec list inside the contract, but it's a leaky abstraction.
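
As an illustration of the source-dependent freshness problem in the last point, one possible shape is a per-source override that falls back to the contract-wide value. The freshness_hours field on SourceSpec and both helper functions are hypothetical; the ADR does not show how SourceSpec actually models this.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceSpec:
    name: str
    # Hypothetical per-source override; None means "use the contract default".
    freshness_hours: Optional[int] = None


def effective_freshness_hours(default_hours: int, source: SourceSpec) -> int:
    """Resolve the freshness limit that applies to one source."""
    if source.freshness_hours is None:
        return default_hours
    return source.freshness_hours


def stale_sources(
    default_hours: int,
    sources: list[SourceSpec],
    age_hours: dict[str, float],
) -> list[str]:
    """Names of sources whose last ingest is older than their effective limit."""
    return [
        s.name
        for s in sources
        if age_hours[s.name] > effective_freshness_hours(default_hours, s)
    ]


sources = [
    SourceSpec(name="pricing", freshness_hours=1),  # must be near-live
    SourceSpec(name="handbook"),                    # inherits the default
]
# Pricing was ingested 3 h ago, the handbook 20 h ago; the default limit is 24 h.
print(stale_sources(24, sources, {"pricing": 3.0, "handbook": 20.0}))
```

The leak the ADR mentions is visible here: the single freshness_hours number on the contract no longer tells the whole story once any source carries an override.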

Reversal plan

Drop the contract → SLO wiki: ~1 engineer-day. Migrate the dataclass values into a YAML file the team reviews quarterly. Triggers:

  1. Contract amendments exceed ~5 per quarter (a signal that the floors aren't load-bearing; they're paper).
  2. The team finds itself defaulting all floors to permissive values (signal that the contract is theater).
  3. Stakeholders stop reading contract changes (signal that the contract isn't where decisions get made anymore).

Soft tightening (moving floors over time): ~0.5 engineer-day per amendment. This is the path we expect: start with a permissive contract and ratchet it up as the system matures.
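
A ratchet of this kind can be sketched with dataclasses.replace, which builds a new frozen instance and re-runs __post_init__ validation. The tighten helper, the HIGHER_IS_STRICTER set, and the starting values are assumptions; SystemContract is trimmed to two fields so the example stays short.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SystemContract:  # trimmed stand-in for the real contract
    latency_p95_seconds: float
    correctness_floor: float

    def __post_init__(self) -> None:
        if not 0.5 <= self.latency_p95_seconds <= 30:
            raise ValueError(f"latency_p95_seconds out of range: {self.latency_p95_seconds}")
        if not 0.5 <= self.correctness_floor <= 1.0:
            raise ValueError(f"correctness_floor out of range: {self.correctness_floor}")


# Fields where a larger value is stricter; every other field is treated as a ceiling.
HIGHER_IS_STRICTER = {"correctness_floor"}


def tighten(contract: SystemContract, **changes: float) -> SystemContract:
    """Return an amended contract, refusing any change that loosens a value."""
    for name, new_value in changes.items():
        old_value = getattr(contract, name)
        if name in HIGHER_IS_STRICTER:
            loosened = new_value < old_value
        else:
            loosened = new_value > old_value
        if loosened:
            raise ValueError(f"{name}: {old_value} -> {new_value} loosens the contract")
    # replace() constructs a new instance, so range validation runs again.
    return replace(contract, **changes)


launch = SystemContract(latency_p95_seconds=6.0, correctness_floor=0.6)
matured = tighten(launch, correctness_floor=0.75, latency_p95_seconds=4.0)
print(matured)
```

A deliberate loosening would still be possible through an ordinary amendment PR; the helper only keeps the routine ratchet path from moving in the wrong direction by accident.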

References

  • system_contract.py — the dataclass + validation
  • scripts/check_contract.py — CI compliance check
  • evaluation/release_gate.py — M05 reads from the contract
  • serving/cost_guardrail.py — M06 reads from the contract for latency budget
  • ADR-002 (retrieval choices reference correctness_floor)
  • ADR-004 (failure cascade respects latency_p95_seconds)