# ADR-001 — SystemContract as the platform's north star (declared upfront, not derived)

- **Status:** Accepted
- **Date:** 2026-04-02
- **Module:** 01 — Data Foundation & Ingestion
- **Stakeholders:** ML engineer, platform owner, eng manager, product

## Context

A full-stack RAG platform makes hundreds of design decisions — chunk size, embedding model, retrieval strategy, judge model, cache TTL, rate limits. Without a single source of truth for _what good looks like_, those decisions get made locally by whichever engineer has the keyboard, and the platform ends up optimized for nothing in particular.

We've shipped two RAG systems before this one without a system contract. The pattern was always:

1. Sprint 1-2 — fast iteration, every decision feels obvious
2. Sprint 3-4 — first stakeholder asks "is the answer fresh?" / "how slow is too slow?" — nobody knows
3. Sprint 5+ — argument about whether the system is "ready" without a shared definition of ready

A SystemContract turns that argument into a config file.

Three options on the table:

- **Option A:** No contract. Make decisions locally, justify in PRs. (What we did before.)
- **Option B:** Lightweight SLO list in a wiki. (Most teams' compromise.)
- **Option C:** Code-level dataclass enforced by real CI checks. Decisions reference contract values. Misses block merges.

## Decision

**Adopt Option C.** The SystemContract is a Python dataclass shipped in `system_contract.py` (Module 01). It is the _first artifact_ in the project, committed before any data flows.

```python
# system_contract.py
from dataclasses import dataclass

# SourceSpec is defined elsewhere and not shown in this ADR.


@dataclass(frozen=True)
class SystemContract:
    """Platform SLAs declared upfront. Every component gates against these."""

    # Freshness: how stale can ingested data be?
    freshness_hours: int             # 1-168 (1 hour to 1 week)

    # Latency: p50/p95 for the end-to-end query path
    latency_p50_seconds: float       # 0.5-30
    latency_p95_seconds: float       # 0.5-30

    # Quality: minimum scores on the gold-set eval
    correctness_floor: float         # 0.5-1.0 (recall@5 against gold)
    coverage_floor: float            # 0.5-1.0 (fraction of gold queries with grounded answer)

    # Sources: declared inputs (each must have a manifest)
    sources: list[SourceSpec]

    def __post_init__(self):
        # Hard validation: invalid contract = system won't boot.
        # Raise explicitly; `assert` statements are stripped under `python -O`.
        if not 1 <= self.freshness_hours <= 168:
            raise ValueError(f"freshness_hours out of range: {self.freshness_hours}")
        if not 0.5 <= self.correctness_floor <= 1.0:
            raise ValueError(f"correctness_floor out of range: {self.correctness_floor}")
        # ... etc
```

CI gates use it directly:

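```python
# scripts/check_contract.py
from system_contract import SystemContract

# EvalRun is the eval result type; its import is not shown in this ADR.


class ContractBreach(Exception):
    """Raised when an eval run misses a contract value."""


def assert_contract_compliance(contract: SystemContract, eval_results: EvalRun) -> None:
    if eval_results.recall_at_5 < contract.correctness_floor:
        raise ContractBreach(
            f"recall@5 = {eval_results.recall_at_5} < contract floor {contract.correctness_floor}"
        )
    # ... and the same shape for latency, freshness, coverage
```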
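
The elided checks could follow the same pattern. The sketch below is an illustration under assumptions, not the actual gate: `Contract` is a trimmed stand-in for `SystemContract`, and every `EvalRun` field other than `recall_at_5` (`coverage`, the two latency fields, `data_age_hours`) is hypothetical. It collects every miss before raising, so a single CI run reports all breaches at once.

```python
from dataclasses import dataclass


class ContractBreach(Exception):
    """Raised when an eval run misses one or more contract values."""


@dataclass(frozen=True)
class Contract:  # trimmed stand-in for SystemContract
    freshness_hours: int
    latency_p50_seconds: float
    latency_p95_seconds: float
    correctness_floor: float
    coverage_floor: float


@dataclass(frozen=True)
class EvalRun:  # hypothetical shape; only recall_at_5 is named in this ADR
    recall_at_5: float
    coverage: float
    latency_p50_seconds: float
    latency_p95_seconds: float
    data_age_hours: float


def find_breaches(contract: Contract, run: EvalRun) -> list[str]:
    """Return one message per missed contract value (empty list = compliant)."""
    breaches = []
    # Floors: the measured value must not fall below the contract value.
    floors = [
        ("recall@5", run.recall_at_5, contract.correctness_floor),
        ("coverage", run.coverage, contract.coverage_floor),
    ]
    for name, measured, floor in floors:
        if measured < floor:
            breaches.append(f"{name} = {measured} < contract floor {floor}")
    # Ceilings: the measured value must not exceed the contract value.
    ceilings = [
        ("latency p50 (s)", run.latency_p50_seconds, contract.latency_p50_seconds),
        ("latency p95 (s)", run.latency_p95_seconds, contract.latency_p95_seconds),
        ("data age (h)", run.data_age_hours, contract.freshness_hours),
    ]
    for name, measured, ceiling in ceilings:
        if measured > ceiling:
            breaches.append(f"{name} = {measured} > contract ceiling {ceiling}")
    return breaches


def assert_all_compliant(contract: Contract, run: EvalRun) -> None:
    """Raise a single ContractBreach that lists every miss."""
    breaches = find_breaches(contract, run)
    if breaches:
        raise ContractBreach("; ".join(breaches))
```

Separating `find_breaches` from the raising wrapper keeps the comparison logic easy to unit-test and lets a report reuse the same messages.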

The `ReleaseGate` in M05 reads from the contract directly — no copy-paste of thresholds.

## Tradeoffs we accept

| Lever             | Alternative                       | Chosen                                         |
| ----------------- | --------------------------------- | ---------------------------------------------- |
| Speed of decision | Wiki-only SLOs                    | Code-level dataclass — gates real CI checks    |
| Flexibility       | "Soft targets, revisit quarterly" | Hard floors that block merges                  |
| Discovery         | Stakeholders learn what's slow    | Stakeholders sign the contract before sprint 1 |
| Discoverability   | Buried in Confluence              | Top of repo, frozen dataclass                  |

## Consequences (positive)

- Every PR review can ask one question: "does this respect the contract?" — no more case-by-case debate.
- M05's release gate reads the contract directly. We can't ship a regression past a floor without explicitly amending the contract first (which is a separate PR, separately reviewed).
- New engineers find the contract in their first hour and understand the platform's constraints before reading any code.
- The cost model (M06) routes against the contract's latency budget: `latency_p95_seconds` is the cap that decides "use Haiku or Sonnet" per query.
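
To make the routing rule in the last bullet concrete, here is a minimal sketch. It is not the M06 code: `ModelOption`, `pick_model`, the `quality_rank` ordering, and all latency figures are hypothetical placeholders. The only value taken from the contract is `latency_p95_seconds`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:  # hypothetical type, for illustration only
    name: str
    quality_rank: int        # higher = preferred when the budget allows
    est_p95_seconds: float   # placeholder estimate, not a measurement


def pick_model(
    options: list[ModelOption],
    latency_p95_seconds: float,
    overhead_seconds: float,
) -> ModelOption:
    """Pick the highest-ranked model that fits the remaining latency budget.

    `overhead_seconds` is the estimated time this query spends outside the
    model call (retrieval, reranking, and so on).
    """
    budget = latency_p95_seconds - overhead_seconds
    fitting = [m for m in options if m.est_p95_seconds <= budget]
    if not fitting:
        raise ValueError(f"no model fits a budget of {budget:.1f}s")
    return max(fitting, key=lambda m: m.quality_rank)


# Placeholder options; the numbers are assumptions chosen for the example.
OPTIONS = [
    ModelOption("haiku", quality_rank=1, est_p95_seconds=2.0),
    ModelOption("sonnet", quality_rank=2, est_p95_seconds=6.0),
]
```

With an 8-second cap and 1 second of overhead, the sketch picks the higher-ranked option; with a 5-second cap it falls back to the faster one. The cap comes from the contract, so tightening `latency_p95_seconds` changes routing without touching this function.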

## Consequences (negative)

- Contract amendments are PRs. That's friction, by design — but real friction. A PM who wants to relax a floor for a launch deadline has to argue it on a public diff.
- The contract has to be honest. We've seen teams set `correctness_floor` to 0.5, the lowest value validation allows, just to make CI green; that's worse than no contract because it provides false comfort. Mitigation: M05's eval rolls up _both_ contract compliance _and_ delta from the previous release, so floors that drift down get flagged.
- Some constraints are hard to express in a frozen dataclass (e.g. "freshness depends on source"). We model this with a `SourceSpec` list inside the contract, but it's a leaky abstraction.
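
One way the per-source case in the last bullet might look, as a sketch only. This ADR does not show the real `SourceSpec`, so the fields below (`name`, `manifest_path`, `freshness_hours`) and the `effective_freshness_hours` helper are assumptions. A source with no override inherits the contract-wide value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceSpec:  # hypothetical fields; the real SourceSpec is not shown here
    name: str
    manifest_path: str
    freshness_hours: Optional[int] = None  # None = inherit the contract-wide value


def effective_freshness_hours(contract_freshness_hours: int, source: SourceSpec) -> int:
    """Return the per-source override if set, else the contract-wide value."""
    if source.freshness_hours is not None:
        return source.freshness_hours
    return contract_freshness_hours


# Example sources with made-up names and paths.
docs = SourceSpec(name="docs", manifest_path="manifests/docs.json")
tickets = SourceSpec(name="tickets", manifest_path="manifests/tickets.json", freshness_hours=1)
```

The leak is visible in the helper's signature: once overrides exist, the contract's top-level `freshness_hours` alone no longer states the real bound for every source.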

## Reversal plan

**Drop the contract → SLO wiki:** ~1 engineer-day. Migrate the dataclass values into a YAML file the team reviews quarterly. Triggers:

1. Contract amendments exceed ~5 per quarter (signal that the floors aren't load-bearing; they're paper).
2. The team finds itself defaulting all floors to permissive values (signal that the contract is theater).
3. Stakeholders stop reading contract changes (signal that the contract isn't where decisions get made anymore).

**Soft tightening (move floors over time):** ~0.5 engineer-day per amendment. This is the path we expect: start with a permissive contract, then ratchet the floors up as the system matures.
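
Because the dataclass is frozen, an amendment produces a new contract value rather than mutating the old one. The sketch below uses the standard library's `dataclasses.replace`, which builds the new instance through `__init__` and therefore re-runs `__post_init__` validation. The trimmed `Contract` class, the `STRICTER_DIRECTION` table, and the `loosened_fields` helper are hypothetical stand-ins; the helper flags any amendment that relaxes a value so it can get extra review.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Contract:  # trimmed stand-in for SystemContract
    freshness_hours: int
    latency_p95_seconds: float
    correctness_floor: float
    coverage_floor: float

    def __post_init__(self):
        if not 0.5 <= self.correctness_floor <= 1.0:
            raise ValueError(f"correctness_floor out of range: {self.correctness_floor}")
        if not 0.5 <= self.coverage_floor <= 1.0:
            raise ValueError(f"coverage_floor out of range: {self.coverage_floor}")


# +1: a higher value is stricter. -1: a lower value is stricter.
STRICTER_DIRECTION = {
    "freshness_hours": -1,
    "latency_p95_seconds": -1,
    "correctness_floor": +1,
    "coverage_floor": +1,
}


def loosened_fields(old: Contract, new: Contract) -> list[str]:
    """Names of fields where `new` is more permissive than `old`."""
    return [
        name
        for name, direction in STRICTER_DIRECTION.items()
        if (getattr(new, name) - getattr(old, name)) * direction < 0
    ]


# Illustrative values only.
v1 = Contract(freshness_hours=48, latency_p95_seconds=10.0,
              correctness_floor=0.6, coverage_floor=0.6)
v2 = replace(v1, correctness_floor=0.75)    # tightening amendment
v3 = replace(v2, latency_p95_seconds=15.0)  # relaxing amendment
```

Here `loosened_fields(v1, v2)` is empty, while `loosened_fields(v2, v3)` names `latency_p95_seconds`. An out-of-range amendment such as `replace(v1, correctness_floor=0.2)` raises `ValueError` at construction.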

## References

- `system_contract.py` — the dataclass + validation
- `scripts/check_contract.py` — CI compliance check
- `evaluation/release_gate.py` — M05 reads from the contract
- `serving/cost_guardrail.py` — M06 reads from the contract for latency budget
- ADR-002 (retrieval choices reference `correctness_floor`)
- ADR-004 (failure cascade respects `latency_p95_seconds`)