Data Contracts Explained: What They Are and How They Work
A data contract is a versioned YAML agreement between a data producer and its consumers. It defines schema, quality rules, SLAs, and ownership — stored in Git and enforced in CI/CD. When a producer tries to make a breaking change, the CI check blocks the PR until consumers are notified and updated.
Minimal ODCS data contract (YAML)
# contracts/orders.yml
apiVersion: v2.2.2
kind: DataContract
id: orders-v1
dataset: orders
version: 1.0.0
owner:
  team: data-platform
  contact: data-platform@company.com
sla:
  freshness_hours: 1
  min_rows: 1000
schema:
  - name: order_id
    type: integer
    nullable: false
The 5 Components of a Data Contract
Identity
Dataset name, unique ID, semantic version, and ODCS API version. The version follows semver: patch for descriptions, minor for additions, major for breaking changes.
id · dataset · version · apiVersion
Ownership
Team name, contact email, and PagerDuty rotation. Without a named owner, nobody is accountable when the contract is violated.
owner.team · owner.contact · owner.oncall
Schema
Column definitions with name, type, nullability, and PII/sensitivity tags. PII tags drive access control policies and compliance checks.
schema[].name · type · nullable · pii · sensitivity
SLA
Freshness window, minimum row count, uptime target, and alert channel. These values drive both CI/CD checks and runtime quality monitoring.
sla.freshness_hours · min_rows · uptime_percent · alert_channel
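As a rough illustration of how the SLA values could drive a runtime check, here is a minimal Python sketch. The function name `check_sla` and its arguments (`last_loaded_at`, `row_count`) are hypothetical. In practice those values would come from your warehouse metadata, and only `freshness_hours` and `min_rows` are taken from the contract shown earlier.

```python
from datetime import datetime, timedelta, timezone


def check_sla(sla, last_loaded_at, row_count, now=None):
    """Return a list of SLA violations; an empty list means the dataset is within SLA.

    `sla` is the parsed `sla` block of a contract, e.g.
    {"freshness_hours": 1, "min_rows": 1000}.
    """
    now = now or datetime.now(timezone.utc)
    violations = []

    # Freshness: the most recent load must be newer than the allowed window.
    max_age = timedelta(hours=sla["freshness_hours"])
    age = now - last_loaded_at
    if age > max_age:
        violations.append(f"stale: last load was {age} ago, limit is {max_age}")

    # Volume: the dataset must contain at least the agreed number of rows.
    if row_count < sla["min_rows"]:
        violations.append(f"too few rows: {row_count} < {sla['min_rows']}")

    return violations
```

A monitoring job could call this on a schedule and send any non-empty result to the configured alert channel.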
Compatibility Policy
Explicit list of breaking vs non-breaking changes. This is what the CI/CD diff script uses to decide whether to block a pull request.
compatibility.breaking_changes · non_breaking
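One way to keep contracts complete is to check that each of the five components is present before a contract is merged. The sketch below assumes the contract has already been parsed into a Python dict (for example with a YAML library). Which fields count as mandatory is an assumption made for illustration, not something the standard or this article prescribes.

```python
# Dotted paths that this sketch treats as mandatory (an assumption).
REQUIRED_PATHS = [
    "id", "dataset", "version", "apiVersion",  # identity
    "owner.team", "owner.contact",             # ownership
    "schema",                                  # schema
    "sla.freshness_hours", "sla.min_rows",     # SLA
    "compatibility.breaking_changes",          # compatibility policy
]


def missing_fields(contract):
    """Return the required dotted paths that are absent from a parsed contract."""
    missing = []
    for path in REQUIRED_PATHS:
        node = contract
        for key in path.split("."):
            # Stop walking as soon as one level of the path is absent.
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing
```

Run against the minimal contract shown earlier, this would report `compatibility.breaking_changes` as missing, since that example omits a compatibility policy.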
Breaking vs Non-Breaking Changes
The compatibility policy is the heart of the contract — it defines what the CI/CD diff script will block.
Breaking (CI blocks PR)
- ✗ Column removal
- ✗ Column rename
- ✗ Type narrowing (bigint → int)
- ✗ Nullable → required (existing rows containing NULLs fail)
- ✗ Primary key change
Non-Breaking (CI passes)
- ✓ Adding a nullable column
- ✓ Updating a description
- ✓ Type widening (int → bigint)
- ✓ Required → nullable
- ✓ Adding a PII tag
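The rules above can be sketched as a small schema diff. This is a simplified illustration, not the actual CI script. `classify_changes` and `TYPE_RANK` are hypothetical names, the type ordering covers only a few integer types, and primary key, description, and PII tag changes are left out. A rename shows up as a removal plus an addition, because a plain diff cannot tell the two apart.

```python
# Assumed widening order for a few integer types. Types that are not listed
# are treated conservatively: any change to them counts as breaking.
TYPE_RANK = {"smallint": 1, "int": 2, "integer": 2, "bigint": 3}


def classify_changes(old_schema, new_schema):
    """Compare two lists of column dicts and return (breaking, non_breaking)."""
    old = {col["name"]: col for col in old_schema}
    new = {col["name"]: col for col in new_schema}
    breaking, non_breaking = [], []

    for name, old_col in old.items():
        if name not in new:
            breaking.append(f"column removed or renamed: {name}")
            continue
        new_col = new[name]

        if old_col["type"] != new_col["type"]:
            old_rank = TYPE_RANK.get(old_col["type"])
            new_rank = TYPE_RANK.get(new_col["type"])
            if old_rank is None or new_rank is None or new_rank < old_rank:
                breaking.append(f"type narrowed or changed: {name}")
            elif new_rank > old_rank:
                non_breaking.append(f"type widened: {name}")
            # Equal rank means an alias (int vs integer), so nothing to report.

        was_nullable = old_col.get("nullable", True)
        is_nullable = new_col.get("nullable", True)
        if was_nullable and not is_nullable:
            breaking.append(f"nullable -> required: {name}")
        elif not was_nullable and is_nullable:
            non_breaking.append(f"required -> nullable: {name}")

    for name, new_col in new.items():
        if name not in old:
            if new_col.get("nullable", True):
                non_breaking.append(f"nullable column added: {name}")
            else:
                # Only nullable additions are listed as safe above, so this
                # sketch flags required additions (an assumption).
                breaking.append(f"required column added: {name}")

    return breaking, non_breaking
```

A CI job could load the contract from the base branch and from the pull request, pass both `schema` lists to this function, and fail the check whenever the first returned list is non-empty.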
Common Mistakes
Contracts without CI enforcement
A YAML file without a CI check is documentation, not a contract. The enforcement — the diff script in GitHub Actions — is what makes it a contract. Write the workflow before you write your second contract.
Rolling out to every table at once
Start with 3–5 critical datasets at team boundaries. Contracts add overhead. Use them where breaking changes have caused real incidents.
No version bump on changes
Without a version bump, consumers cannot tell from the version number how significant a change is. Patch for descriptions, minor for additions, major for breaking changes: enforce this in the CI workflow itself.
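A version-bump rule like this could be enforced with a check along the following lines. It is a minimal sketch that assumes plain `MAJOR.MINOR.PATCH` version strings with no pre-release suffixes. The function names are hypothetical, and the `has_breaking` and `has_additions` flags are assumed to come from whatever diff step the workflow already runs.

```python
def parse_version(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def actual_bump(old_version, new_version):
    """Return 'major', 'minor', 'patch', or None if the version did not increase."""
    old, new = parse_version(old_version), parse_version(new_version)
    if new[0] > old[0]:
        return "major"
    if new[0] == old[0] and new[1] > old[1]:
        return "minor"
    if new[:2] == old[:2] and new[2] > old[2]:
        return "patch"
    return None


def required_bump(has_breaking, has_additions):
    """Map the kind of change to the smallest acceptable version bump."""
    if has_breaking:
        return "major"
    if has_additions:
        return "minor"
    return "patch"


def version_bump_ok(old_version, new_version, has_breaking, has_additions):
    """True if the declared bump is at least as large as the change requires."""
    rank = {None: 0, "patch": 1, "minor": 2, "major": 3}
    declared = actual_bump(old_version, new_version)
    needed = required_bump(has_breaking, has_additions)
    return rank[declared] >= rank[needed]
```

For example, a breaking change released as `1.0.0` to `1.1.0` would fail this check, while the same change released as `2.0.0` would pass.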
FAQ
- What is a data contract?
- A versioned YAML agreement between a data producer and its consumers defining schema, quality rules, SLAs, and ownership — stored in Git and enforced in CI/CD to prevent breaking changes.
- What is ODCS?
- Open Data Contract Standard — a vendor-neutral YAML schema for data contracts. Supported by Soda, Great Expectations, and custom CI tooling. No vendor lock-in.
- What makes a change "breaking"?
- Breaking changes cause consumer code to fail: column removal, rename, type narrowing, or a nullable-to-required change. Non-breaking: new nullable columns, description updates, type widening. Contracts codify this and enforce it in CI.
- Do I need contracts for every table?
- No. Use contracts at team boundaries where breaking changes have caused incidents. Start with 3–5 critical cross-team datasets. Internal tables owned by one team can rely on dbt tests.