Data Contracts Explained: What They Are and How They Work

A data contract is a versioned YAML agreement between a data producer and its consumers. It defines schema, quality rules, SLAs, and ownership — stored in Git and enforced in CI/CD. When a producer tries to make a breaking change, the CI check blocks the PR until consumers are notified and updated.

Minimal ODCS-style data contract (YAML, simplified field names)

# contracts/orders.yml
apiVersion: v2.2.2
kind: DataContract
id: orders-v1
dataset: orders
version: 1.0.0
owner:
  team: data-platform
  contact: data-platform@company.com
sla:
  freshness_hours: 1
  min_rows: 1000
schema:
  - name: order_id
    type: integer
    nullable: false
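
A contract file like this only helps if the fields it promises are actually present. As a rough sketch, assuming the YAML has already been parsed into a Python dict by a YAML library, a presence check for the fields used in the example might look like this. `REQUIRED_FIELDS` and `missing_fields` are illustrative names chosen here, not part of ODCS.

```python
# Hypothetical list of dotted paths that the example contract is expected to define.
REQUIRED_FIELDS = [
    "apiVersion", "kind", "id", "dataset", "version",
    "owner.team", "owner.contact",
    "sla.freshness_hours", "sla.min_rows",
    "schema",
]


def missing_fields(contract):
    """Return the dotted paths from REQUIRED_FIELDS that are absent from the contract."""
    missing = []
    for path in REQUIRED_FIELDS:
        node = contract
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing


# The example contract, written as the dict a YAML parser would produce.
orders_contract = {
    "apiVersion": "v2.2.2",
    "kind": "DataContract",
    "id": "orders-v1",
    "dataset": "orders",
    "version": "1.0.0",
    "owner": {"team": "data-platform", "contact": "data-platform@company.com"},
    "sla": {"freshness_hours": 1, "min_rows": 1000},
    "schema": [{"name": "order_id", "type": "integer", "nullable": False}],
}

print(missing_fields(orders_contract))
```

A CI job could fail the build whenever the returned list is non-empty.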

The 5 Components of a Data Contract

🪪

Identity

Dataset name, unique ID, semantic version, and ODCS API version. The version follows semver: patch for descriptions, minor for additions, major for breaking changes.

id · dataset · version · apiVersion

👤

Ownership

Team name, contact email, and PagerDuty rotation. Without a named owner, nobody is accountable when the contract is violated.

owner.team · owner.contact · owner.oncall

📐

Schema

Column definitions with name, type, nullability, and PII/sensitivity tags. PII tags drive access control policies and compliance checks.

schema[].name · type · nullable · pii · sensitivity
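
To illustrate how PII tags could feed an access policy, here is a minimal sketch. It assumes `pii` is a boolean on each column entry and that unprivileged readers should see a masked value. The helper names and the `"***"` placeholder are hypothetical.

```python
def pii_columns(schema):
    """Return the names of columns tagged as PII (a missing tag is treated as not PII)."""
    return [col["name"] for col in schema if col.get("pii", False)]


def mask_row(row, schema):
    """Return a copy of the row with every PII column replaced by a placeholder."""
    hidden = set(pii_columns(schema))
    return {key: ("***" if key in hidden else value) for key, value in row.items()}


# Illustrative schema; the email column is an assumption, not part of the example contract.
schema = [
    {"name": "order_id", "type": "integer", "nullable": False},
    {"name": "email", "type": "string", "nullable": True, "pii": True},
]

print(mask_row({"order_id": 7, "email": "a@example.com"}, schema))
```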

🎯

SLA

Freshness window, minimum row count, uptime target, and alert channel. These values drive both CI/CD checks and runtime quality monitoring.

sla.freshness_hours · min_rows · uptime_percent · alert_channel
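
A runtime monitor could evaluate the two SLA values from the example roughly as follows. This is a sketch under stated assumptions: the caller supplies the last load timestamp and the row count, and `check_sla` is a hypothetical helper rather than part of any monitoring tool.

```python
from datetime import datetime, timedelta, timezone


def check_sla(sla, last_loaded_at, row_count, now=None):
    """Return a list of SLA violations; an empty list means the dataset is healthy."""
    now = now or datetime.now(timezone.utc)
    violations = []

    # Freshness: the most recent load must fall within the allowed window.
    max_age = timedelta(hours=sla["freshness_hours"])
    age = now - last_loaded_at
    if age > max_age:
        violations.append(f"stale: last load was {age} ago, limit is {max_age}")

    # Volume: the dataset must contain at least the agreed number of rows.
    if row_count < sla["min_rows"]:
        violations.append(f"too few rows: {row_count} < {sla['min_rows']}")

    return violations


sla = {"freshness_hours": 1, "min_rows": 1000}
loaded = datetime.now(timezone.utc) - timedelta(minutes=20)
print(check_sla(sla, loaded, row_count=5000))
```

Passing `now` explicitly keeps the check deterministic in tests; in production it defaults to the current UTC time.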

🔐

Compatibility Policy

Explicit list of breaking vs non-breaking changes. This is what the CI/CD diff script uses to decide whether to block a pull request.

compatibility.breaking_changes · non_breaking

Breaking vs Non-Breaking Changes

The compatibility policy is the heart of the contract — it defines what the CI/CD diff script will block.

Breaking (CI blocks PR)

  • Column removal
  • Column rename
  • Type narrowing (bigint → int)
  • Nullable → required (existing rows with nulls fail)
  • Primary key change

Non-Breaking (CI passes)

  • Adding a nullable column
  • Updating a description
  • Type widening (int → bigint)
  • Required → nullable
  • Adding a PII tag
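
The two lists above can be turned into a small classifier. The sketch below compares an old and a new schema and sorts each difference into breaking or non-breaking. It makes several assumptions: the `TYPE_RANK` table is a hypothetical widening order covering only integer types, any type change it does not recognise is treated as breaking, and adding a required column is treated as breaking even though the lists do not mention that case. A rename shows up as a removal plus an addition, so it is flagged as breaking through the removal.

```python
# Hypothetical widening order; a real policy would cover more types.
TYPE_RANK = {"int": 1, "integer": 1, "bigint": 2}


def classify_changes(old_schema, new_schema):
    """Return (breaking, non_breaking) lists describing the schema differences."""
    old = {col["name"]: col for col in old_schema}
    new = {col["name"]: col for col in new_schema}
    breaking, non_breaking = [], []

    for name, col in old.items():
        if name not in new:
            breaking.append(f"column removed: {name}")
            continue
        updated = new[name]

        if col["type"] != updated["type"]:
            old_rank = TYPE_RANK.get(col["type"])
            new_rank = TYPE_RANK.get(updated["type"])
            change = f"{name} {col['type']} -> {updated['type']}"
            if old_rank is not None and new_rank is not None and new_rank >= old_rank:
                non_breaking.append(f"compatible type change: {change}")
            else:
                # Narrowing and unknown type changes are blocked conservatively.
                breaking.append(f"incompatible type change: {change}")

        was_nullable = col.get("nullable", True)
        is_nullable = updated.get("nullable", True)
        if was_nullable and not is_nullable:
            breaking.append(f"nullable -> required: {name}")
        elif not was_nullable and is_nullable:
            non_breaking.append(f"required -> nullable: {name}")

    for name, col in new.items():
        if name not in old:
            if col.get("nullable", True):
                non_breaking.append(f"nullable column added: {name}")
            else:
                breaking.append(f"required column added: {name}")

    return breaking, non_breaking
```

A CI step would exit with a non-zero status whenever the `breaking` list is non-empty, which is what blocks the pull request.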

Common Mistakes

Contracts without CI enforcement

A YAML file without a CI check is documentation, not a contract. The enforcement — the diff script in GitHub Actions — is what makes it a contract. Write the workflow before you write your second contract.

Rolling out to every table at once

Start with 3–5 critical datasets at team boundaries. Contracts add overhead. Use them where breaking changes have caused real incidents.

No version bump on changes

Without a version bump, consumers cannot tell from the version number whether a change is safe to adopt. Use patch for descriptions, minor for additions, and major for breaking changes, and enforce this in the CI workflow itself.
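
One way to enforce the bump rule in CI is to compare the old and new `version` values against the kind of change detected. The following is a minimal sketch that assumes plain `MAJOR.MINOR.PATCH` strings with no pre-release suffixes. The function names and the three change kinds (`description`, `addition`, `breaking`) are illustrative.

```python
LEVELS = {"none": 0, "patch": 1, "minor": 2, "major": 3}

# Smallest bump each kind of change must carry, following the rule above.
REQUIRED_BUMP = {"description": "patch", "addition": "minor", "breaking": "major"}


def parse_version(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(old_version, new_version):
    """Return 'major', 'minor', 'patch', or 'none' for the change between two versions."""
    old, new = parse_version(old_version), parse_version(new_version)
    if new[0] > old[0]:
        return "major"
    if new[0] == old[0] and new[1] > old[1]:
        return "minor"
    if new[:2] == old[:2] and new[2] > old[2]:
        return "patch"
    return "none"  # unchanged or downgraded


def bump_is_sufficient(old_version, new_version, change_kind):
    """Check that the version bump is at least as large as the change requires."""
    required = REQUIRED_BUMP[change_kind]
    return LEVELS[bump_kind(old_version, new_version)] >= LEVELS[required]


print(bump_is_sufficient("1.0.0", "1.1.0", "breaking"))
```

Here a breaking change shipped with only a minor bump fails the check, so the workflow can reject the pull request until the version is raised to `2.0.0`.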

FAQ

What is a data contract?
A versioned YAML agreement between a data producer and its consumers defining schema, quality rules, SLAs, and ownership — stored in Git and enforced in CI/CD to prevent breaking changes.
What is ODCS?
Open Data Contract Standard — a vendor-neutral YAML schema for data contracts. Supported by Soda, Great Expectations, and custom CI tooling. No vendor lock-in.
What makes a change "breaking"?
A breaking change causes consumer code to fail: column removal, rename, type narrowing, or a nullable-to-required change. Non-breaking changes include new nullable columns, description updates, and type widening. The contract codifies this distinction, and CI enforces it.
Do I need contracts for every table?
No. Use contracts at team boundaries where breaking changes have caused incidents. Start with 3–5 critical cross-team datasets. Internal tables owned by one team can rely on dbt tests.
