What is a Data Contract?
The complete guide for data engineers — what data contracts are, how they work, and how to enforce them in production.
Quick Answer
A data contract is a versioned YAML file that defines what a dataset promises — its schema, quality rules, freshness SLA, and owner. Stored in source control and enforced in CI/CD, contracts prevent breaking changes from reaching downstream consumers and create accountability at team boundaries.
What is a Data Contract?
A data contract is a formal, versioned agreement between a data producer (the team that owns and publishes a dataset) and its consumers (the teams or systems that read it). Contracts define not just the structure of the data, but what it promises — quality levels, SLAs, ownership, and breaking-change policy. They transform implicit assumptions into explicit, enforceable agreements.
Schema (without a contract)
Defines column names and types. No SLA, no quality rules, no owner. A column rename in the source breaks five downstream dashboards with no warning.
Data Contract (ODCS format)
Schema + quality rules + freshness SLA + owner + compatibility policy. A column rename is flagged as a breaking change in CI/CD, blocking the PR until consumers are notified and updated.
Before vs. After Data Contracts
Before
- ✗ Column renamed upstream — 4 dashboards break silently
- ✗ No owner listed — on-call spends 2 hours finding who to page
- ✗ PII column added without sensitivity tag — compliance audit fails
- ✗ No SLA — nobody knows if missing data is a bug or expected
After
- ✓ Breaking change blocked at PR — consumer team notified automatically
- ✓ Owner field in contract — PagerDuty fires to the right rotation
- ✓ PII scanner enforces sensitivity tags — compliance check passes in CI
- ✓ Freshness SLA: 1 hour — alert fires if table goes stale
What Data Contracts Cover
🔒
Schema Change Protection
Block breaking changes at CI/CD before they reach production and break downstream consumers.
🤝
Cross-Team SLAs
Codify freshness windows, row count bounds, and quality thresholds between producer and consumer teams.
🏷
PII Classification
Tag columns with sensitivity tiers and enforce role-based access control via policy-as-code.
📋
Compliance Documentation
Auto-generate data lineage reports for SOC2, GDPR, and HIPAA audits from contract metadata.
🔄
Backward Compatibility
Enforce Avro schema evolution rules and Confluent Schema Registry compatibility checks in CI.
📚
Contract Registry
Version and publish contracts to a central registry so consumers can discover and subscribe to datasets.
How Data Contracts Work
A production data contract system has four layers — define, validate, enforce, and monitor — applied across the full data lifecycle.
DEFINE
VALIDATE
ENFORCE
MONITOR
ODCS data contract (YAML)
# contracts/orders.yml — ODCS format
apiVersion: v2.2.2
kind: DataContract
id: orders-v1
dataset: orders
version: 1.2.0
owner:
  team: data-platform
  contact: data-platform@company.com
sla:
  freshness_hours: 1
  uptime_percent: 99.5
schema:
  - name: order_id
    type: integer
    nullable: false
  - name: customer_email
    type: string
    pii: true
    sensitivity: HIGH
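As a sketch of how a contract like this might be consumed once parsed (e.g. with PyYAML) into a dict, the snippet below checks a few required fields and enforces the rule that any `pii: true` column must carry a sensitivity tier. The required-field set and validation logic are illustrative assumptions, not official ODCS tooling.

```python
# Minimal contract checks — an illustrative sketch, not an official ODCS validator.
# Assumes the YAML has already been parsed (e.g. with PyYAML) into a dict.

REQUIRED_TOP_LEVEL = {"apiVersion", "kind", "id", "dataset", "version", "owner", "schema"}

def validate_contract(contract: dict) -> None:
    """Fail fast if required fields are missing or a PII column is untagged."""
    missing = REQUIRED_TOP_LEVEL - contract.keys()
    if missing:
        raise ValueError(f"contract missing required fields: {sorted(missing)}")
    for col in contract["schema"]:
        if col.get("pii") and "sensitivity" not in col:
            raise ValueError(f"PII column {col['name']} has no sensitivity tag")

def pii_columns(contract: dict) -> list[str]:
    """Names of all columns flagged pii: true."""
    return [c["name"] for c in contract["schema"] if c.get("pii")]
```

A check like this typically runs both in CI (on every contract PR) and as a startup guard in the pipeline that publishes the dataset.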
CI/CD contract validation (GitHub Actions)
# .github/workflows/contract-check.yml
on: [pull_request]
jobs:
  validate-contracts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history so origin/main is available to diff against
      - name: Check breaking changes
        run: |
          python scripts/contract_diff.py \
            --base origin/main \
            --head HEAD \
            --fail-on-breaking
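The `contract_diff.py` script above is referenced but not shown; a minimal version of its core check might look like the sketch below. The classification rules — removed or retyped columns are breaking, newly added nullable columns are not — are assumptions you would adapt to your own compatibility policy.

```python
# Sketch of a breaking-change check between two contract schema versions.
# "base" and "head" are the schema lists from the old and new contracts.

def breaking_changes(base: list[dict], head: list[dict]) -> list[str]:
    base_cols = {c["name"]: c for c in base}
    head_cols = {c["name"]: c for c in head}
    problems = []
    for name, col in base_cols.items():
        if name not in head_cols:
            problems.append(f"column removed: {name}")      # consumers reading it break
        elif head_cols[name]["type"] != col["type"]:
            problems.append(f"type changed: {name}")        # downstream casts may fail
    for name, col in head_cols.items():
        if name not in base_cols and col.get("nullable") is False:
            problems.append(f"new NOT NULL column: {name}") # historical rows can't satisfy it
    return problems
```

In CI, a non-empty result would cause the script to exit non-zero, which is what blocks the PR.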
Data Contracts vs Schemas vs dbt Tests
vs Schemas
A schema is a structural description. A data contract is a promise — it includes the schema plus SLAs, quality rules, owner, and change compatibility policy. Schemas tell you what data looks like; contracts tell you what it guarantees.
vs dbt Tests
dbt tests run assertions at pipeline time inside your dbt project. Data contracts are external documents that define what the pipeline must guarantee — they can drive dbt test generation, but also enforce pre-merge breaking change detection and PII classification that dbt tests don't cover.
| Dimension | Data Contract | Schema Only | dbt Tests |
|---|---|---|---|
| Scope | Schema + quality + SLA + ownership | Column names and types only | Pipeline-time test assertions |
| When enforced | CI/CD + pipeline time + registry | At read/write time | When pipeline runs |
| Ownership | Explicit — named team and on-call | Implicit | In dbt project |
| Breaking change protection | Blocking CI check on PR | None | Post-hoc test failure |
| PII / compliance | Column-level sensitivity tags | None | Via custom meta tags |
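As the comparison suggests, contracts can drive dbt tests rather than replace them. The sketch below derives dbt-style column test entries from contract columns; the mapping (`nullable: false` → `not_null`, a hypothetical `unique` flag → `unique`) is an assumed convention, not part of any standard.

```python
# Sketch: derive dbt schema.yml column entries from a contract's schema list.
# The nullable/unique -> test mapping is an assumed convention.

def dbt_columns(contract_schema: list[dict]) -> list[dict]:
    out = []
    for col in contract_schema:
        tests = []
        if col.get("nullable") is False:
            tests.append("not_null")
        if col.get("unique"):
            tests.append("unique")
        entry = {"name": col["name"]}
        if tests:
            entry["tests"] = tests
        out.append(entry)
    return out
```

The resulting list can be dumped to YAML and committed alongside the dbt model, keeping the contract as the single source of truth.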
Common Mistakes
- ✗ Starting with too many contracts — begin with 3–5 critical datasets at team boundaries, not every table in the warehouse.
- ✗ Contracts without enforcement — a YAML file nobody checks is just documentation; wire it into CI/CD from day one.
- ✗ No versioning policy — define what counts as breaking vs. non-breaking before you write your first contract, not after the first incident.
- ✗ Treating contracts as a governance team's job — contracts work when the producing team owns them like production code.
Who Should Learn Data Contracts?
Junior Data Engineers
Learn to write ODCS contracts, run validation scripts, and add quality checks. Understanding contracts is increasingly expected in mid-level interviews.
Senior Data Engineers
Design contract frameworks, build CI/CD enforcement pipelines, implement Schema Registry compatibility rules, and own producer-consumer SLAs end-to-end.
Staff / Platform Engineers
Define org-wide contract standards, build contract registries, set PII classification policy, and ensure contracts satisfy SOC2, GDPR, and HIPAA audit requirements.
Frequently Asked Questions
What is a data contract?
A data contract is a formal, versioned agreement between a data producer and its consumers that defines: the schema (columns, types, nullability), quality rules (freshness SLAs, row count bounds, value constraints), ownership (team, PagerDuty rotation), and compatibility policy (breaking vs non-breaking change rules). Contracts are stored as YAML files in source control and enforced in CI/CD pipelines.
What is the Open Data Contract Standard (ODCS)?
ODCS (Open Data Contract Standard) is a vendor-neutral YAML schema for defining data contracts. It standardizes fields for dataset identity, schema definitions, quality expectations, SLAs, and ownership. ODCS contracts can be validated by tools like Soda, Great Expectations, and custom CI scripts, making them portable across platforms and teams.
What is the difference between a data contract and a data schema?
A schema defines the structure of data (column names and types). A data contract includes the schema plus quality rules, SLAs, ownership, and versioning policy. A schema tells you what the data looks like; a contract tells you what the data promises — and creates accountability when those promises are broken.
How are data contracts enforced?
Data contracts are enforced at three layers: (1) CI/CD validation — a GitHub Actions workflow runs schema compatibility checks on every PR, blocking merges that introduce breaking changes; (2) pipeline-time validation — Soda or Great Expectations run contract checks after each pipeline run; (3) Schema Registry — for Kafka-based producers, Avro schema evolution rules are enforced by the Confluent Schema Registry.
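For layer (2), a pipeline-time freshness check can be as simple as comparing the table's newest load timestamp against the contract's SLA. The function below is a minimal sketch with the clock injected for testability; in practice the timestamp would come from a warehouse query and the SLA from the contract's `freshness_hours` field.

```python
from datetime import datetime, timedelta, timezone

def freshness_ok(latest_loaded_at: datetime, sla_hours: float, now=None) -> bool:
    """True if the newest row is within the contract's freshness window."""
    now = now or datetime.now(timezone.utc)
    return now - latest_loaded_at <= timedelta(hours=sla_hours)
```

A failing check would page the owner named in the contract rather than whoever happens to notice the stale dashboard.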
When should I use data contracts?
Use data contracts when: multiple teams consume the same dataset, schema changes break downstream pipelines regularly, you need PII classification and access control enforced at the column level, or you are subject to compliance requirements (SOC2, GDPR, HIPAA) that require data lineage and ownership documentation. Contracts are most valuable at team boundaries — where the producer and consumer are different engineering teams.
What You Will Build
In the Data Governance learning path, you will build a production data governance platform — YAML contracts, CI/CD enforcement, PII classification, and a full audit trail.
- → YAML data contracts in ODCS format with schema versioning
- → CI/CD breaking change detection (GitHub Actions)
- → Great Expectations + Soda validation suites
- → Avro schema evolution + Confluent Schema Registry
- → PII detection pipeline with sensitivity classification
- → Role-based access control via policy-as-code
- → SOC2 / GDPR audit trail generation