How to Write a Data Contract

Write a data contract in 5 steps: define identity and ownership → define the schema with PII tags → set SLAs and quality rules → specify compatibility policy → wire enforcement into CI/CD. Use ODCS YAML format and store contracts in source control alongside your pipeline code.

1

Define Identity and Ownership

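Start with the contract header: the API version, kind, a unique ID, the dataset name, a semantic version, and the owning team. Ownership is non-negotiable; without a named owner, nobody is accountable when the contract is violated. The blocks in this guide use a simplified, ODCS-style layout, so check exact field names against the ODCS version you target.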

contracts/orders.yml — header

# ODCS v2.2.2 data contract
apiVersion: v2.2.2
kind: DataContract
id: orders-v1
dataset: orders
version: 1.2.0
owner:
  team: data-platform
  contact: data-platform@company.com
  oncall: pagerduty://data-platform
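A CI job can refuse contracts whose header is incomplete. The following is a minimal sketch, assuming the YAML has already been parsed into a Python dict (for example with PyYAML). The function name `check_header` and the required-field list are hypothetical and simply mirror the example header above; they are not defined by ODCS.

```python
import re

# Hypothetical required fields, mirroring the example header above
REQUIRED_FIELDS = ("apiVersion", "kind", "id", "dataset", "version", "owner")
# Plain MAJOR.MINOR.PATCH; pre-release or build suffixes are rejected here
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def check_header(contract: dict) -> list[str]:
    """Return a list of header problems (empty when the header looks complete)."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in contract]
    version = str(contract.get("version", ""))
    if "version" in contract and not SEMVER.match(version):
        problems.append(f"version {version!r} is not MAJOR.MINOR.PATCH")
    owner = contract.get("owner") or {}
    # An owner block without a team or contact leaves nobody accountable
    for key in ("team", "contact"):
        if not owner.get(key):
            problems.append(f"owner.{key} is empty")
    return problems


header = {
    "apiVersion": "v2.2.2",
    "kind": "DataContract",
    "id": "orders-v1",
    "dataset": "orders",
    "version": "1.2.0",
    "owner": {"team": "data-platform", "contact": "data-platform@company.com"},
}
print(check_header(header))  # []
```

Note that YAML parses `version: 1.2` as a float, which this check reports, so quoting versions in the contract is a safe habit.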

2

Define the Schema

List every column with name, type, nullable flag, and any PII or sensitivity tags. PII tags are critical — they drive access control policies and compliance checks downstream.

contracts/orders.yml — schema block

schema:
  - name: order_id
    type: integer
    nullable: false
    description: Primary key — unique per order
  - name: customer_email
    type: string
    nullable: false
    pii: true
    sensitivity: HIGH
  - name: amount
    type: decimal(10,2)
    nullable: false
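Because PII tags feed access control, it helps to validate the schema block mechanically and extract the tagged columns. A minimal sketch, assuming the `schema` list has been parsed into dicts; `check_schema` is a hypothetical helper, and the rule that every PII column must also declare `sensitivity` is an assumed house rule, not part of any standard.

```python
REQUIRED_COLUMN_KEYS = ("name", "type", "nullable")


def check_schema(columns: list[dict]) -> tuple[list[str], list[str]]:
    """Validate column entries and return (problems, pii_column_names)."""
    problems: list[str] = []
    seen: set[str] = set()
    pii: list[str] = []
    for i, col in enumerate(columns):
        name = col.get("name", f"<column {i}>")
        for key in REQUIRED_COLUMN_KEYS:
            if key not in col:
                problems.append(f"{name}: missing {key}")
        if name in seen:
            problems.append(f"{name}: duplicate column name")
        seen.add(name)
        if col.get("pii"):
            pii.append(name)
            # Assumed house rule: PII columns must also declare a sensitivity level
            if "sensitivity" not in col:
                problems.append(f"{name}: pii column has no sensitivity")
    return problems, pii


schema = [
    {"name": "order_id", "type": "integer", "nullable": False},
    {"name": "customer_email", "type": "string", "nullable": False,
     "pii": True, "sensitivity": "HIGH"},
    {"name": "amount", "type": "decimal(10,2)", "nullable": False},
]
problems, pii = check_schema(schema)
print(problems, pii)  # [] ['customer_email']
```

The returned PII list is what a downstream access-control job would consume, for example to mask or restrict `customer_email`.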

3

Set SLAs and Quality Rules

Define what "healthy" means for this dataset — freshness window, minimum row count, and uptime target. These values drive both CI/CD checks and runtime quality monitoring.

contracts/orders.yml — sla block

sla:
  freshness_hours: 1
  min_rows: 1000
  uptime_percent: 99.5
  alert_channel: "#data-on-call"
quality:
  - column: amount
    check: min >= 0
  - column: order_id
    check: unique and not null
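At runtime, a monitor compares observed load metadata against the `sla` block. A minimal sketch covering only `freshness_hours` and `min_rows`; `check_sla` is hypothetical, and the last-load timestamp and row count are assumed to come from your warehouse's load metadata. Uptime and the `quality` rules would need separate checks.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def check_sla(sla: dict, last_loaded_at: datetime, row_count: int,
              now: Optional[datetime] = None) -> list[str]:
    """Compare observed freshness and row count with the contract's sla block.

    last_loaded_at must be timezone-aware so it can be compared with UTC now.
    """
    now = now or datetime.now(timezone.utc)
    violations = []
    age = now - last_loaded_at
    if age > timedelta(hours=sla["freshness_hours"]):
        violations.append(
            f"stale: last load {age} ago, limit is {sla['freshness_hours']}h")
    if row_count < sla["min_rows"]:
        violations.append(f"row count {row_count} is below min_rows {sla['min_rows']}")
    return violations


# Example inputs; in practice they would come from warehouse load metadata
sla = {"freshness_hours": 1, "min_rows": 1000}
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(check_sla(sla, now - timedelta(minutes=30), 1500, now=now))  # []
```

A non-empty result would be posted to the `alert_channel` named in the contract.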

4

Specify Compatibility Policy

Define what changes are breaking vs non-breaking. This is the most important section for CI/CD enforcement — it tells the diff script which changes to block.

contracts/orders.yml — compatibility block

compatibility:
  backward_compatible: true
  breaking_changes:
    - column_removal
    - type_narrowing
    - nullable_to_required
  non_breaking:
    - column_addition
    - description_update
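The policy above only names change kinds; the diff script still has to detect them. A minimal sketch of that classification, assuming both schema versions are parsed into lists of column dicts. The helpers are hypothetical, and because this sketch has no type hierarchy it conservatively reports any type change as `type_narrowing`.

```python
def classify_changes(old: list[dict], new: list[dict]) -> set[str]:
    """Name the kinds of schema change between two column lists."""
    old_cols = {c["name"]: c for c in old}
    new_cols = {c["name"]: c for c in new}
    changes = set()
    if old_cols.keys() - new_cols.keys():
        changes.add("column_removal")
    if new_cols.keys() - old_cols.keys():
        changes.add("column_addition")
    for name in old_cols.keys() & new_cols.keys():
        before, after = old_cols[name], new_cols[name]
        if before.get("nullable") and not after.get("nullable"):
            changes.add("nullable_to_required")
        if before.get("type") != after.get("type"):
            # No type hierarchy here, so every type change is treated as breaking
            changes.add("type_narrowing")
        if before.get("description") != after.get("description"):
            changes.add("description_update")
    return changes


def breaking(changes: set[str], policy: dict) -> set[str]:
    """Detected changes that the compatibility policy lists as breaking."""
    return changes & set(policy.get("breaking_changes", []))


policy = {"breaking_changes": ["column_removal", "type_narrowing", "nullable_to_required"]}
old = [{"name": "order_id", "type": "integer", "nullable": False},
       {"name": "amount", "type": "decimal(10,2)", "nullable": False}]
new = [{"name": "order_id", "type": "integer", "nullable": False},
       {"name": "coupon_code", "type": "string", "nullable": True}]
print(sorted(breaking(classify_changes(old, new), policy)))  # ['column_removal']
```

Keeping the policy in the contract rather than in the script lets each dataset tighten or relax its rules without a code change.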

5

Wire Enforcement into CI/CD

Add a GitHub Actions workflow that runs a contract diff script on every pull request. The script compares the contract YAML on the PR branch against the main branch and fails the build if breaking changes are detected.

.github/workflows/contract-check.yml

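name: Contract check
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      # Full history so the script can read the contracts as they are on main
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      # Assumes the diff script parses YAML with PyYAML
      - name: Install dependencies
        run: pip install pyyaml
      - name: Check breaking changes
        run: python scripts/contract_diff.py --fail-on-breaking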
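The workflow assumes a `scripts/contract_diff.py` exists. Here is a minimal sketch of what it might contain. The file name and `--fail-on-breaking` come from the workflow; everything else (the `--contracts-dir` and `--base-ref` flags, the helper names, reading the main-branch copy with `git show`) is an assumption. To stay short it only flags removed columns, skips contracts that are new on the branch, and needs PyYAML when `main()` runs.

```python
"""Hypothetical sketch of scripts/contract_diff.py (detects column removals only)."""
import argparse
import subprocess
from pathlib import Path
from typing import Optional


def read_base(path: str, base_ref: str) -> Optional[str]:
    """Return the file's text on base_ref, or None if it does not exist there."""
    result = subprocess.run(["git", "show", f"{base_ref}:{path}"],
                            capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


def _column_names(contract: Optional[dict]) -> set[str]:
    """Names in the contract's schema block (empty for a missing or empty file)."""
    return {col["name"] for col in ((contract or {}).get("schema") or [])}


def removed_columns(old: Optional[dict], new: Optional[dict]) -> set[str]:
    """Columns in the base contract that the PR's contract no longer lists."""
    return _column_names(old) - _column_names(new)


def exit_code(breaking_by_file: dict, fail_on_breaking: bool) -> int:
    """Non-zero only when breaking changes exist and --fail-on-breaking is set."""
    return 1 if fail_on_breaking and any(breaking_by_file.values()) else 0


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Compare data contracts with a base branch.")
    parser.add_argument("--fail-on-breaking", action="store_true")
    # Hypothetical flags; defaults match the layout used in this guide
    parser.add_argument("--contracts-dir", default="contracts")
    parser.add_argument("--base-ref", default="origin/main")
    args = parser.parse_args(argv)

    import yaml  # PyYAML (third-party), installed by the workflow step

    breaking_by_file = {}
    for path in sorted(Path(args.contracts_dir).glob("*.yml")):
        base_text = read_base(path.as_posix(), args.base_ref)
        if base_text is None:
            continue  # contract is new on this branch; nothing to compare
        removed = removed_columns(yaml.safe_load(base_text),
                                  yaml.safe_load(path.read_text()))
        if removed:
            print(f"{path}: breaking change, removed columns {sorted(removed)}")
        breaking_by_file[path.as_posix()] = removed
    return exit_code(breaking_by_file, args.fail_on_breaking)
```

Saved as a script, the file would end with `if __name__ == "__main__": sys.exit(main())` (plus `import sys`), so a return value of 1 fails the workflow step and blocks the merge.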

When to Write a Data Contract

  • A dataset is consumed by more than one team
  • Breaking schema changes have caused incidents in the past
  • The dataset contains PII that needs classification and access control
  • Compliance (SOC2, GDPR, HIPAA) requires documented ownership and lineage

Common Issues

Contract not enforced in CI — just documentation

A YAML file without a CI check is a suggestion, not a contract. Add the GitHub Actions workflow before writing your second contract.

Version not bumped on changes

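Use semantic versioning: patch for description updates, minor for column additions, major for breaking changes. Without disciplined version bumps, consumers can't tell from the version number whether a change is safe, and CI can't check that the bump matches the kind of change the diff script detected.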
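The versioning rule above can be enforced alongside the diff. A minimal sketch, assuming the change kinds use the names from the compatibility block; the mapping table and helper names are hypothetical, and unknown change kinds are conservatively treated as requiring a major bump.

```python
# Assumed mapping from change kind to the minimum semver bump it requires
BUMP_FOR_CHANGE = {
    "description_update": "patch",
    "column_addition": "minor",
    "column_removal": "major",
    "type_narrowing": "major",
    "nullable_to_required": "major",
}
RANK = {"none": 0, "patch": 1, "minor": 2, "major": 3}


def actual_bump(old: str, new: str) -> str:
    """Classify the bump between two MAJOR.MINOR.PATCH strings."""
    o = tuple(int(p) for p in old.split("."))
    n = tuple(int(p) for p in new.split("."))
    if n[0] > o[0]:
        return "major"
    if n[0] == o[0] and n[1] > o[1]:
        return "minor"
    if n[:2] == o[:2] and n[2] > o[2]:
        return "patch"
    return "none"  # unchanged or lowered


def bump_ok(old: str, new: str, changes: set[str]) -> bool:
    """True if the version moved at least as far as the largest change requires."""
    needed = max((RANK[BUMP_FOR_CHANGE.get(c, "major")] for c in changes), default=0)
    return RANK[actual_bump(old, new)] >= needed


print(bump_ok("1.2.0", "1.3.0", {"column_addition"}))  # True
print(bump_ok("1.2.0", "1.3.0", {"column_removal"}))   # False
```

Bumping further than required (a major bump for a column addition) is allowed, which matches semantic versioning.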

No PII tags on sensitive columns

PII tags must be set at contract creation time. Retroactively scanning 200 tables for PII is a painful audit exercise. Tag from the start.
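A cheap safeguard at creation time is a lint that flags columns whose names look like PII but lack a tag. This is a name-based heuristic swapped in for illustration, not a substitute for review; the pattern list and `untagged_pii` helper are assumptions to tune for your own naming scheme.

```python
import re

# Assumed name fragments that often indicate PII; extend for your own schemas
LIKELY_PII = re.compile(
    r"(email|phone|ssn|birth|address|first_name|last_name|ip_addr)", re.IGNORECASE)


def untagged_pii(columns: list[dict]) -> list[str]:
    """Columns whose names look like PII but carry no `pii: true` tag."""
    return [c["name"] for c in columns
            if LIKELY_PII.search(c["name"]) and not c.get("pii")]


columns = [
    {"name": "customer_email"},
    {"name": "phone_number", "pii": True},
    {"name": "amount"},
]
print(untagged_pii(columns))  # ['customer_email']
```

Run in CI, a non-empty result can fail the build until someone either tags the column or confirms it is not PII.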

FAQ

What format should I use to write a data contract?
Use ODCS (Open Data Contract Standard) YAML. It's vendor-neutral, supported by Soda and Great Expectations, and can be checked with standard YAML tooling plus the JSON Schema that ODCS publishes. Store contracts in a contracts/ directory in your data platform repo.
What should a data contract include?
Identity (name, version, ID), ownership (team, contact, on-call), schema (columns, types, nullability, PII tags), SLAs (freshness, row bounds, uptime), and compatibility policy (breaking vs non-breaking change rules).
How do I enforce a data contract in CI/CD?
Add a GitHub Actions workflow triggered on pull_request that runs a contract diff script. The script compares the PR branch contract against main and fails the build on breaking changes.