How to Write a Data Contract
Write a data contract in 5 steps: define identity and ownership → define the schema with PII tags → set SLAs and quality rules → specify compatibility policy → wire enforcement into CI/CD. Use ODCS YAML format and store contracts in source control alongside your pipeline code.
Define Identity and Ownership
Start with the contract header — API version, kind, a unique ID, the dataset name, semantic version, and the owning team. Ownership is non-negotiable: without a named owner, nobody is accountable when the contract is violated.
contracts/orders.yml — header
# ODCS v2.2.2 data contract
apiVersion: v2.2.2
kind: DataContract
id: orders-v1
dataset: orders
version: 1.2.0
owner:
  team: data-platform
  contact: data-platform@company.com
  oncall: pagerduty://data-platform
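A CI step can refuse any contract that lacks a named owner. The sketch below assumes the contract has already been parsed into a Python dict, for example with PyYAML's `yaml.safe_load`. `validate_header` and the required-field lists are hypothetical. They mirror the example header above rather than the full ODCS schema.

```python
import re

# Hypothetical required fields, mirroring the example header (not the full ODCS schema).
REQUIRED_FIELDS = ("apiVersion", "kind", "id", "dataset", "version", "owner")
REQUIRED_OWNER_FIELDS = ("team", "contact")
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def validate_header(contract: dict) -> list[str]:
    """Return a list of problems with the contract header; an empty list means it passes."""
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in contract]
    if "version" in contract and not SEMVER.match(str(contract["version"])):
        problems.append(f"version {contract['version']!r} is not MAJOR.MINOR.PATCH")
    owner = contract.get("owner")
    if not isinstance(owner, dict):
        owner = {}  # a missing or malformed owner block fails every owner check below
    for field in REQUIRED_OWNER_FIELDS:
        if not owner.get(field):
            problems.append(f"owner.{field} is required")
    return problems


if __name__ == "__main__":
    header = {
        "apiVersion": "v2.2.2",
        "kind": "DataContract",
        "id": "orders-v1",
        "dataset": "orders",
        "version": "1.2.0",
        "owner": {"team": "data-platform", "contact": "data-platform@company.com"},
    }
    print(validate_header(header))  # → []
    print(validate_header({**header, "owner": {"team": ""}}))
    # → ['owner.team is required', 'owner.contact is required']
```

Returning a list of problems instead of raising on the first one lets a single CI run report every gap in the header at once.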
Define the Schema
List every column with name, type, nullable flag, and any PII or sensitivity tags. PII tags are critical — they drive access control policies and compliance checks downstream.
contracts/orders.yml — schema block
schema:
  - name: order_id
    type: integer
    nullable: false
    description: Primary key, unique per order
  - name: customer_email
    type: string
    nullable: false
    pii: true
    sensitivity: HIGH
  - name: amount
    type: decimal(10,2)
    nullable: false
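PII tags drive downstream access control, so a lint step can catch columns that look sensitive but were left untagged. This is a minimal sketch. `pii_columns`, `lint_pii` and the `SUSPICIOUS` name patterns are all hypothetical. A name match is only a prompt for human review, not a reliable PII detector.

```python
import re

# Hypothetical column-name patterns that often indicate personal data; tune per organization.
SUSPICIOUS = re.compile(r"email|phone|ssn|address|birth", re.IGNORECASE)


def pii_columns(schema: list[dict]) -> list[str]:
    """Columns explicitly tagged pii: true, e.g. to feed a masking or access policy."""
    return [col["name"] for col in schema if col.get("pii") is True]


def lint_pii(schema: list[dict]) -> list[str]:
    """Flag PII columns without a sensitivity level, and sensitive-looking columns without a pii tag."""
    warnings = []
    for col in schema:
        name = col["name"]
        if col.get("pii") is True and "sensitivity" not in col:
            warnings.append(f"{name}: pii column has no sensitivity level")
        elif "pii" not in col and SUSPICIOUS.search(name):
            warnings.append(f"{name}: looks like PII but has no pii tag")
    return warnings


if __name__ == "__main__":
    schema = [
        {"name": "order_id", "type": "integer", "nullable": False},
        {"name": "customer_email", "type": "string", "nullable": False, "pii": True, "sensitivity": "HIGH"},
        {"name": "shipping_address", "type": "string", "nullable": True},  # hypothetical untagged column
    ]
    print(pii_columns(schema))  # → ['customer_email']
    print(lint_pii(schema))  # → ['shipping_address: looks like PII but has no pii tag']
```

A column explicitly tagged `pii: false` is treated as a deliberate decision and is not flagged.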
Set SLAs and Quality Rules
Define what "healthy" means for this dataset: freshness window, minimum row count, uptime target, and column-level quality rules. These values drive both CI/CD checks and runtime quality monitoring.
contracts/orders.yml — sla block
sla:
  freshness_hours: 1
  min_rows: 1000
  uptime_percent: 99.5
  alert_channel: "#data-on-call"
quality:
  - column: amount
    check: min >= 0
  - column: order_id
    check: unique and not null
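At runtime, a monitor compares each observation of the dataset against the sla block and evaluates the quality rules. The contract does not define a grammar for `check` strings. This sketch therefore maps the two example strings to hypothetical predicates in `CHECKS`. A real setup would use the rule syntax of whichever quality tool you run. `uptime_percent` and `alert_channel` are left out because uptime is measured over time rather than per run, and alert routing happens outside the check.

```python
from datetime import datetime, timedelta, timezone


def check_sla(sla: dict, last_loaded_at: datetime, row_count: int, now: datetime) -> list[str]:
    """Compare one observation of the dataset against the freshness and row-count targets."""
    failures = []
    age = now - last_loaded_at
    max_age = timedelta(hours=sla["freshness_hours"])
    if age > max_age:
        failures.append(f"stale: last load {age} ago, limit {max_age}")
    if row_count < sla["min_rows"]:
        failures.append(f"too few rows: {row_count} < {sla['min_rows']}")
    return failures


# Hypothetical interpretation of the contract's check strings as predicates over column values.
CHECKS = {
    "min >= 0": lambda values: all(v >= 0 for v in values if v is not None),
    "unique and not null": lambda values: None not in values and len(set(values)) == len(values),
}


def check_quality(rules: list[dict], rows: list[dict]) -> list[str]:
    """Run each rule against the rows; an unrecognized check is reported rather than skipped."""
    failures = []
    for rule in rules:
        predicate = CHECKS.get(rule["check"])
        values = [row.get(rule["column"]) for row in rows]
        if predicate is None:
            failures.append(f"{rule['column']}: unsupported check {rule['check']!r}")
        elif not predicate(values):
            failures.append(f"{rule['column']}: failed {rule['check']!r}")
    return failures


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    sla = {"freshness_hours": 1, "min_rows": 1000}
    print(check_sla(sla, now - timedelta(minutes=90), 1500, now))
    # → ['stale: last load 1:30:00 ago, limit 1:00:00']
```

Reporting unsupported checks as failures keeps a typo in the contract from silently disabling a rule.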
Specify Compatibility Policy
Define what changes are breaking vs non-breaking. This is the most important section for CI/CD enforcement — it tells the diff script which changes to block.
contracts/orders.yml — compatibility block
compatibility:
  backward_compatible: true
  breaking_changes:
    - column_removal
    - type_narrowing
    - nullable_to_required
  non_breaking:
    - column_addition
    - description_update
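Of the change types listed, `type_narrowing` is the least obvious to detect. This sketch assumes one definition, not taken from ODCS: a new type narrows an old one if it cannot hold every value the old type could. Examples are a smaller integer width, or a decimal with fewer integer or fractional digits. `is_type_narrowing` and the `INT_WIDTHS` table are hypothetical. Any type pair the function does not recognize is treated as narrowing, so unknown changes fail closed.

```python
import re

# Hypothetical integer widths in bits; a wider type can hold every value of a narrower one.
INT_WIDTHS = {"smallint": 16, "int": 32, "integer": 32, "bigint": 64}
DECIMAL = re.compile(r"^decimal\((\d+),\s*(\d+)\)$")


def is_type_narrowing(old: str, new: str) -> bool:
    """True if `new` may be unable to represent some values of `old`."""
    old, new = old.strip().lower(), new.strip().lower()
    if old == new:
        return False
    if old in INT_WIDTHS and new in INT_WIDTHS:
        return INT_WIDTHS[new] < INT_WIDTHS[old]
    old_dec, new_dec = DECIMAL.match(old), DECIMAL.match(new)
    if old_dec and new_dec:
        old_p, old_s = map(int, old_dec.groups())
        new_p, new_s = map(int, new_dec.groups())
        # Narrowing if the integer digits (precision - scale) or the fractional digits shrink.
        return new_p - new_s < old_p - old_s or new_s < old_s
    return True  # unrecognized pair: fail closed and treat the change as breaking
```

With `amount: decimal(10,2)`, widening to `decimal(12,2)` passes. `decimal(8,2)` and `decimal(10,4)` are both flagged, because each leaves fewer digits before the decimal point.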
Wire Enforcement into CI/CD
Add a GitHub Actions workflow that runs a contract diff script on every pull request. The script compares the contract YAML on the PR branch against the main branch and fails the build if breaking changes are detected.
.github/workflows/contract-check.yml
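name: contract-check
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so the script can read the contract on main
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install dependencies
        run: pip install pyyaml  # assumes the diff script parses YAML with PyYAML
      - name: Check breaking changes
        run: python scripts/contract_diff.py --fail-on-breaking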
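The workflow calls `scripts/contract_diff.py`. Below is a minimal sketch of what such a script could look like. It assumes PyYAML for parsing, the contract layout used in this guide, and a fetched `origin/main`, which `fetch-depth: 0` provides. The script reports column removals, additions, nullability tightening, type changes and description edits. It then blocks only the change types listed under the main branch's `compatibility.breaking_changes`. It makes two deliberate simplifications:

- any type change is reported as `type_narrowing`
- deleted contract files are not checked

The `--base` flag and this handling of `--fail-on-breaking` belong to the sketch, not to an existing tool.

```python
"""Sketch of scripts/contract_diff.py: block pull requests that make breaking contract changes."""
import argparse
import subprocess
import sys
from pathlib import Path


def parse_yaml(text: str):
    import yaml  # PyYAML; imported lazily so the diff logic below is dependency-free
    return yaml.safe_load(text)


def load_at_ref(ref: str, path: str):
    """Parsed contract at a git ref, or None if the file does not exist there."""
    result = subprocess.run(["git", "show", f"{ref}:{path}"], capture_output=True, text=True)
    return parse_yaml(result.stdout) if result.returncode == 0 else None


def detect_changes(base: dict, head: dict) -> list[tuple[str, str]]:
    """(change_type, column) pairs between two versions of a contract's schema."""
    old = {c["name"]: c for c in base.get("schema", [])}
    new = {c["name"]: c for c in head.get("schema", [])}
    changes = [("column_removal", name) for name in old if name not in new]
    changes += [("column_addition", name) for name in new if name not in old]
    for name in sorted(old.keys() & new.keys()):
        before, after = old[name], new[name]
        if before.get("nullable", True) and not after.get("nullable", True):
            changes.append(("nullable_to_required", name))
        if before.get("type") != after.get("type"):
            # Conservative: any type change is reported as narrowing.
            changes.append(("type_narrowing", name))
        if before.get("description") != after.get("description"):
            changes.append(("description_update", name))
    return changes


def blocking(changes: list[tuple[str, str]], base: dict) -> list[tuple[str, str]]:
    """Keep only changes that the main-branch policy lists under breaking_changes."""
    policy = base.get("compatibility", {}).get("breaking_changes", [])
    return [change for change in changes if change[0] in policy]


def main() -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--base", default="origin/main", help="git ref to compare against")
    parser.add_argument("--fail-on-breaking", action="store_true")
    args = parser.parse_args()

    found = False
    for path in sorted(Path("contracts").glob("*.yml")):
        base = load_at_ref(args.base, path.as_posix())
        if base is None:
            continue  # new contract: nothing on the base branch to compare against
        head = parse_yaml(path.read_text())
        for change, column in blocking(detect_changes(base, head), base):
            print(f"BREAKING {path}: {change} on column {column}")
            found = True
    return 1 if found and args.fail_on_breaking else 0


if __name__ == "__main__":
    exit_code = main()
    if exit_code:
        sys.exit(exit_code)
```

The script reads the policy from main rather than from the PR branch. That way, a pull request cannot loosen its own compatibility rules to get a breaking change through.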
When to Write a Data Contract
- A dataset is consumed by more than one team
- Breaking schema changes have caused incidents in the past
- The dataset contains PII that needs classification and access control
- Compliance (SOC2, GDPR, HIPAA) requires documented ownership and lineage
Common Issues
Contract not enforced in CI — just documentation
A YAML file without a CI check is a suggestion, not a contract. Add the GitHub Actions workflow before writing your second contract.
Version not bumped on changes
Use semantic versioning: bump the patch version for description updates, the minor version for column additions, and the major version for breaking changes. If the version never moves, consumers can't tell from it whether an update is safe to adopt.
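The versioning rule can itself be checked in CI. This sketch assumes you already have the list of change types for a pull request, using the names from the compatibility block. It maps each change type to the minimum bump described above. `required_bump`, `actual_bump` and `bump_is_sufficient` are hypothetical helpers. Any change type not in the map is treated as needing a major bump.

```python
# Minimum semver bump per change type, following the rule above; unknown types default to major.
BUMP_FOR = {"description_update": "patch", "column_addition": "minor"}
RANK = {"none": 0, "patch": 1, "minor": 2, "major": 3}


def parse_version(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def required_bump(change_types: list[str]) -> str:
    """Largest bump demanded by any change; 'none' if nothing changed."""
    bumps = [BUMP_FOR.get(t, "major") for t in change_types]
    return max(bumps, key=RANK.__getitem__, default="none")


def actual_bump(old: str, new: str) -> str:
    o, n = parse_version(old), parse_version(new)
    if n[0] > o[0]:
        return "major"
    if n[0] == o[0] and n[1] > o[1]:
        return "minor"
    if n[:2] == o[:2] and n[2] > o[2]:
        return "patch"
    return "none"  # unchanged, or the version went backwards


def bump_is_sufficient(old: str, new: str, change_types: list[str]) -> bool:
    return RANK[actual_bump(old, new)] >= RANK[required_bump(change_types)]
```

For example, adding a column to the `orders` contract at version 1.2.0 requires at least 1.3.0. A bump to 1.2.1 would fail the check.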
No PII tags on sensitive columns
PII tags must be set at contract creation time. Retroactively scanning 200 tables for PII is a painful audit exercise. Tag from the start.
FAQ
- What format should I use to write a data contract?
- Use ODCS (Open Data Contract Standard) YAML. It's vendor-neutral, supported by Soda and Great Expectations, and can be validated with standard YAML tooling. Store contracts in a contracts/ directory in your data platform repo.
- What should a data contract include?
- Identity (name, version, ID), ownership (team, contact, on-call), schema (columns, types, nullability, PII tags), SLAs (freshness, row bounds, uptime), and compatibility policy (breaking vs non-breaking change rules).
- How do I enforce a data contract in CI/CD?
- Add a GitHub Actions workflow triggered on pull_request that runs a contract diff script. The script compares the PR branch contract against main and fails the build on breaking changes.