Data Governance & Contracts

Name: Data Governance & Contracts
Price: 29 USD
Availability: InStock
Author: AI-DE Engineering Team

Upstream changed a column. Your dashboard showed zeros at 2am. Contracts, drift detection, and lineage prevent that — and make the blast radius visible before anyone merges.

Without contracts, every upstream schema change is a midnight pager. The mature data teams (Spotify, GoCardless, Convoy, Airbnb) ship faster precisely because producers and consumers agree in writing, drift gets caught in CI, and lineage tells you the blast radius before you merge. This path teaches the contract, lineage, and policy automation those teams run.

What you’ll be able to do

Write and version ODCS data contracts with CI-gated backward / forward / full compatibility checks that block breaking changes before they ship
Build drift detection that fails closed — alerting on column renames, type widening, null-rate spikes, and cardinality collapses before downstream consumers notice
Instrument column-level lineage with OpenLineage, propagate PII tags, and run blast-radius analysis on any proposed schema change
Roll out a governance program real producers adopt — RACI, escalation paths, governance KPIs, and the change-management playbook for a multi-team org

Curriculum

Phase 1: Foundations: When Governance Fails

Why ungoverned pipelines break. Walk through a real schema-change incident, then learn the producer-side causes and the drift-detection patterns that catch them before downstream consumers do.

Your Pipeline Broke Because of a Schema Change

A realistic incident walkthrough — an upstream rename quietly turns a dashboard into garbage, then PII into a leak. The forensic timeline that motivates every other module in the path.

Why Schema Changes Break Pipelines

Backward / forward / full compatibility modes, how schema registries (Confluent, Apicurio) gate producer changes, evolution strategies that keep consumers green, and the cost of skipping the contract step.
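The three compatibility modes can be sketched as a reader/writer check. This is a minimal illustration, assuming a simplified schema model (field name, type, and whether the field has a default) with no type promotion; `Field`, `can_read`, and `is_compatible` are hypothetical names, not the API of Confluent or Apicurio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    type: str
    has_default: bool = False


Schema = dict[str, Field]


def can_read(reader: Schema, writer: Schema) -> bool:
    """True if a consumer on `reader` can decode data produced with `writer`."""
    for name, field in reader.items():
        if name in writer:
            if writer[name].type != field.type:
                return False  # simplified model: no type promotion
        elif not field.has_default:
            return False  # reader needs a value the writer never sent
    return True  # extra writer fields are ignored by the reader


def is_compatible(old: Schema, new: Schema, mode: str) -> bool:
    backward = can_read(reader=new, writer=old)  # new consumers, old data
    forward = can_read(reader=old, writer=new)   # old consumers, new data
    if mode == "backward":
        return backward
    if mode == "forward":
        return forward
    if mode == "full":
        return backward and forward
    raise ValueError(f"unknown compatibility mode: {mode}")


old = {"id": Field("long"), "email": Field("string")}
add_optional = {**old, "country": Field("string", has_default=True)}
dropped = {"id": Field("long")}
renamed = {"id": Field("long"), "email_address": Field("string")}
```

In this model, adding a field with a default passes all three modes, dropping a required field passes only backward, and a rename fails both directions because each side is missing a field the other requires.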

Detect Schema Drift Before It Breaks You

Drift detection patterns — column-level type / cardinality / null-rate alarms, automated schema diffs in CI, fail-closed gates on incoming data, and the runbook for triaging a real drift event.
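A column-level null-rate and cardinality alarm might look like the sketch below. The thresholds and the function names `column_stats` and `drift_alarms` are assumptions for illustration; real thresholds would come from the contract for each column.

```python
def column_stats(rows: list[dict], column: str) -> dict:
    """Null rate and distinct count for one column of a batch."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    # An empty batch is reported as fully null so the gate fails closed.
    null_rate = 1 - len(non_null) / len(values) if values else 1.0
    return {"null_rate": null_rate, "distinct": len(set(non_null))}


def drift_alarms(
    baseline: dict,
    current: dict,
    max_null_rate_increase: float = 0.10,  # assumed threshold
    min_cardinality_ratio: float = 0.5,    # assumed threshold
) -> list[str]:
    """Compare a batch against the last-known-good baseline."""
    alarms = []
    if current["null_rate"] - baseline["null_rate"] > max_null_rate_increase:
        alarms.append("null_rate_spike")
    if (
        baseline["distinct"] > 0
        and current["distinct"] / baseline["distinct"] < min_cardinality_ratio
    ):
        alarms.append("cardinality_collapse")
    return alarms


yesterday = [{"country": code} for code in "ABCDEFGHIJ"]
today = [{"country": "US"}] * 6 + [{"country": None}] * 4

alarms = drift_alarms(
    column_stats(yesterday, "country"), column_stats(today, "country")
)
```

Here `today` has a 40% null rate and a single distinct value against a baseline of ten, so both alarms fire.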

Phase 2: Contracts & Lineage

Data contracts as the producer–consumer interface, plus the lineage and access controls that make a contract enforceable across teams.

Design Data Contracts That Don't Break

Open Data Contract Standard (ODCS), schema + freshness + quality + ownership clauses, contract versioning, compatibility-mode selection, the producer-consumer negotiation pattern, and contract test fixtures.
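The four clause types (schema, freshness, quality, ownership) can be pictured as one versioned object plus a check against observed state. The sketch below is an in-memory illustration only: the `DataContract` class and its field names are hypothetical and do not reproduce the ODCS YAML keys.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataContract:
    name: str
    version: str                    # contract version, e.g. semver
    owner: str                      # ownership clause: who gets paged
    columns: dict[str, str]         # schema clause: column -> logical type
    max_staleness: timedelta        # freshness clause
    max_null_rate: dict[str, float] = field(default_factory=dict)  # quality


def violations(
    contract: DataContract,
    observed_columns: dict[str, str],
    last_loaded_at: datetime,
    null_rates: dict[str, float],
    now: datetime,
) -> list[str]:
    """Return the clauses the observed table currently breaks."""
    found = []
    if observed_columns != contract.columns:
        found.append("schema")
    if now - last_loaded_at > contract.max_staleness:
        found.append("freshness")
    for column, limit in contract.max_null_rate.items():
        # A missing measurement counts as a violation (fail closed).
        if null_rates.get(column, 1.0) > limit:
            found.append(f"quality:{column}")
    return found


orders = DataContract(
    name="orders",
    version="1.2.0",
    owner="checkout-team",
    columns={"order_id": "bigint", "amount": "decimal"},
    max_staleness=timedelta(hours=1),
    max_null_rate={"amount": 0.01},
)
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
```

The same function can back a contract test fixture: feed it a known-good snapshot and a deliberately broken one, and assert on the clauses it reports.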

Lineage, Debugging & Access Control

Column-level lineage capture (OpenLineage, dbt exposures), impact analysis for proposed schema changes, PII classification + tagging propagation, and access-control models that ride on top of lineage metadata.
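Impact analysis and tag propagation both reduce to a graph walk over column-level edges. The sketch below assumes the edges have already been extracted into a plain dictionary; it does not use the OpenLineage client, and the table and column names are made up for illustration.

```python
from collections import deque

# Hypothetical lineage: upstream column -> columns derived from it.
LINEAGE = {
    "raw.users.email": ["staging.users.email_hash", "staging.users.email_domain"],
    "staging.users.email_domain": ["marts.signups_by_domain.domain"],
    "raw.users.id": ["staging.users.user_id"],
}


def downstream(column: str, lineage: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk: every column affected by a change to `column`."""
    seen: set[str] = set()
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def propagate_tags(
    tags: dict[str, set[str]], lineage: dict[str, list[str]]
) -> dict[str, set[str]]:
    """Copy each tag to everything downstream of the tagged column."""
    result = {column: set(column_tags) for column, column_tags in tags.items()}
    for column, column_tags in tags.items():
        for affected in downstream(column, lineage):
            result.setdefault(affected, set()).update(column_tags)
    return result


blast_radius = downstream("raw.users.email", LINEAGE)
tagged = propagate_tags({"raw.users.email": {"pii"}}, LINEAGE)
```

Propagation here is deliberately conservative: a hashed or aggregated column still inherits the `pii` tag until someone explicitly reviews and clears it.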

Phase 3: Production Governance Programs

Enforce contracts in production, run a governance program a real data team will actually use, and scale that program to a multi-domain data mesh.

Enforce Contracts in Production

CI-gated compatibility checks, producer-side schema validation (DLQs, outbox pattern), runtime contract enforcement at ingestion + transform layers, escalation paths, and SLA scoring for contract violations.
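Runtime enforcement at ingestion usually means validating each record and diverting failures to a dead-letter queue instead of dropping or loading them. A minimal sketch, assuming in-memory lists stand in for the real sink and DLQ, with hypothetical `validate` and `route` helpers:

```python
def validate(record: dict, required: dict[str, type]) -> list[str]:
    """Return the contract errors for one record (empty list means valid)."""
    errors = []
    for name, expected_type in required.items():
        if record.get(name) is None:
            errors.append(f"missing:{name}")
        elif not isinstance(record[name], expected_type):
            errors.append(f"type:{name}")
    return errors


def route(
    records: list[dict], required: dict[str, type]
) -> tuple[list[dict], list[dict]]:
    """Split a batch into accepted records and dead letters with reasons."""
    accepted, dead_letters = [], []
    for record in records:
        errors = validate(record, required)
        if errors:
            # Keep the original payload so it can be replayed after a fix.
            dead_letters.append({"record": record, "errors": errors})
        else:
            accepted.append(record)
    return accepted, dead_letters


REQUIRED = {"order_id": str, "amount": float}
batch = [
    {"order_id": "o-1", "amount": 19.99},
    {"order_id": "o-2", "amount": "19.99"},  # wrong type
    {"amount": 5.0},                         # missing key
]
accepted, dead_letters = route(batch, REQUIRED)
```

Storing the error list next to the payload is what makes SLA scoring possible later: violations can be counted per producer and per clause.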

Governance for Real Data Teams

Stakeholder map (producers, consumers, platform, security, legal), RACI for schema decisions, change-management process, governance KPIs that actually move, and how to roll out a program without killing velocity.

Scaling Governance with Data Mesh

Federated governance model, per-domain ownership + cross-domain contracts, computational policies (OPA, Privacera, Immuta), data product certification levels, and the platform team's role as enabler not gatekeeper.
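A computational policy is a rule evaluated by code rather than by a reviewer. The sketch below is a Python stand-in for the kind of tag-based column rule that would normally live in a policy engine (OPA policies, for instance, are written in Rego); the tag and role tables are hypothetical.

```python
# Hypothetical classification and role data; in practice these would come
# from the catalog and the identity provider.
COLUMN_TAGS = {
    "users.email": {"pii"},
    "users.country": set(),
    "payments.card_last4": {"pii", "pci"},
}
ROLE_CLEARANCES = {
    "analyst": set(),
    "privacy_engineer": {"pii"},
}


def allowed_columns(role: str, requested: list[str]) -> list[str]:
    """Default-deny: allow a column only if the role holds all of its tags."""
    clearances = ROLE_CLEARANCES.get(role)
    if clearances is None:
        return []  # unknown role
    allowed = []
    for column in requested:
        tags = COLUMN_TAGS.get(column)
        if tags is None:
            continue  # unclassified column stays denied until tagged
        if tags <= clearances:
            allowed.append(column)
    return allowed


requested = ["users.email", "users.country", "payments.card_last4", "users.ssn"]
```

Denying unclassified columns is the design choice that makes classification enforceable: a new column is invisible until its owner tags it.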

Phase 4: Frontier & Capstone

The AI/LLM governance frontier, then a capstone that puts every layer together to rescue a broken production data platform.

AI & LLM Data Governance

Training-data provenance + lineage, prompt + response logging, PII redaction in eval datasets, model + dataset cards, RAG corpus governance, and audit requirements for AI systems under EU AI Act / SOC 2.
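PII redaction in an eval dataset can be illustrated with a single pattern. This is a sketch under a strong assumption: only plain email addresses are detected, using a simple regular expression, and the record layout (`prompt` / `response` keys) is hypothetical. Production redaction needs broader detection than this.

```python
import re

# Simple email pattern for illustration; it will miss unusual addresses.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> str:
    """Replace every matched email address with a fixed placeholder."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def redact_record(record: dict) -> dict:
    """Redact the string fields of one eval example, leaving others as-is."""
    return {
        key: redact(value) if isinstance(value, str) else value
        for key, value in record.items()
    }


example = {
    "prompt": "Summarise the ticket from jane.doe@example.com",
    "response": "Jane (jane.doe@example.com) reports a login failure.",
    "score": 4,
}
clean = redact_record(example)
```

Redacting before the dataset is written, rather than at read time, keeps the raw addresses out of every downstream copy.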

Capstone: Fix a Broken Data Platform

An end-to-end remediation: triage a degraded platform, ship a contract-spec PR, add the drift-detection job, instrument lineage, define the governance program, and present the rollout plan to leadership.

What you’ll build

Data contract spec (ODCS) with CI-gated backward-compatibility checks that block breaking changes at PR time
Drift-detection job that fails closed on schema / type / null-rate anomalies with a paging runbook
Column-level lineage + PII classification with an impact-analysis tool for proposed schema changes
Production contract-enforcement pipeline (DLQ + outbox + escalation) plus a governance program rollout doc

Your dashboards looked fine yesterday… and today half of them silently show garbage.

Without governance + contracts, you risk:

An upstream rename quietly breaks 40 downstream dashboards and nobody notices for a week
PII leaks because classification was a Notion doc and nobody enforced it at the data layer
A data scientist fights stale tables for 40% of their week because no one owns freshness
EU AI Act / SOC 2 auditor asks for training-data provenance and nobody can produce the lineage

What is Data Governance & Contracts?

Data governance is the set of practices, policies, and tools that ensure data is reliable, discoverable, and compliant across an organization. Data contracts formalize the interface between data producers and consumers. Together, they prevent the schema drift, quality degradation, and compliance failures that plague growing data teams.

Why this matters in production

Without governance, data teams spend 40-60% of their time on data quality issues. At companies like Spotify and GoCardless, data contracts reduced pipeline failures by over 80%. Production governance means automating schema validation, drift detection, and access control so teams can ship confidently.

Common use cases

Implementing schema evolution strategies that prevent breaking downstream consumers
Building drift detection systems that alert on unexpected schema or data changes
Defining data contracts between producers and consumers with SLA enforcement
Automating governance policies for access control and compliance
Managing metadata catalogs for data discovery and documentation
Scaling governance practices across multiple teams and data domains

Data Governance vs alternatives

Data Governance vs Data Quality

Data quality tests (Great Expectations, dbt tests, Soda) validate that arrived data meets a rule — they fire after the fact, downstream. Data contracts are a producer-side agreement that blocks the bad data from shipping in the first place. Mature teams run both: contracts gate the producer boundary, quality tests are the safety net when something slips through. Governance owns the organizational layer that decides which rules exist, who enforces them, and what escalates when SLAs breach.

Data Governance vs Data Mesh

Data mesh is an organizational model where domain teams own their data products. Governance is the policy layer that makes decentralized ownership safe. Without cross-domain contracts, mesh devolves into a siloed lake where nobody agrees on naming, PII handling, or freshness SLAs. Federated governance sets the cross-domain standards, and policy engines (OPA, Privacera, Immuta) enforce them computationally, so the platform team is an enabler, not the bottleneck.

Data Governance vs Data Catalogs

Catalogs (DataHub, Amundsen, Atlan) are discovery and metadata UIs. Governance is the enforcement layer that catalogs surface but don't implement. A catalog tells you what tables exist; a contract tells you what a table promises and who is accountable when a column disappears. Catalogs become useful inputs to governance automation (lineage, ownership, classification), but a catalog alone does not prevent 2am pages.

Related skills

Governance policies are monitored through observability practices in Data Observability.
Event schema governance applies the same contract principles from Event-Driven Design.
dbt enforces data contracts through tests and documentation in dbt & Analytics Engineering.
Iceberg's native schema evolution, snapshot isolation, and hidden-partitioning model implement the table-level contract guarantees taught in Apache Iceberg.

Why this skill matters

Governance + contracts is the platform-engineering specialty that maps to senior and staff DE roles at data-mature orgs. Spotify, GoCardless, Convoy, Stitch Fix, and Airbnb hire specifically for engineers who can defend contract enforcement strategy, lineage scope decisions, and policy-automation tradeoffs. Those are the decisions this path prepares you to defend.

Common questions about Data Governance

What are data contracts?

Data contracts are versioned, machine-readable agreements between data producers and consumers that specify schema, freshness SLAs, quality thresholds, and ownership. The Open Data Contract Standard (ODCS) is the emerging spec. Contracts live in source control alongside the pipeline code, get validated in CI on every PR, and block incompatible schema changes before they reach downstream consumers. When a contract is violated in production — a column drops, a type widens unexpectedly — the contract owner is paged, not a random on-call engineer who didn't write the upstream table.

Why is data governance important?

Without governance, schema changes are oral agreements that evaporate the moment the engineer who made them leaves the team. Data teams at Spotify, GoCardless, and Convoy found that ungoverned pipelines spent 40–60% of engineering time on data-quality fires. Governance reduces that by making the producer-consumer interface explicit: contracts define what's promised, drift detection catches deviations in CI, lineage maps the blast radius before anyone merges, and access control ensures PII never lands somewhere it shouldn't. The result is that teams ship faster, not slower — because they stop debugging mystery data issues at 2am.

How long does it take to implement data governance?

A maturity ladder is more useful than a single timeline.

Weeks 1–2: schema validation at ingestion and a contract spec for the highest-traffic table.
Month 1: CI-gated compatibility checks on all producer schemas, a DLQ for contract violations, and a schema registry (Confluent or Apicurio) that blocks incompatible publishes.
Months 2–3: column-level lineage with OpenLineage, PII classification and tag propagation, and impact-analysis tooling for proposed schema changes.
Months 4–6: federated governance across 3+ domains with OPA or Privacera policies, data product certification levels, and governance KPIs that leadership reviews.

Full multi-domain governance with AI/LLM provenance coverage typically takes 6–9 months at orgs that are starting from scratch.

Do data engineers need governance skills?

Yes — governance has moved from "data steward job" to "senior DE expectation" at data-mature orgs. Engineers are now expected to write ODCS contract specs, instrument OpenLineage in their pipelines, configure schema registry compatibility modes (backward / forward / full), author OPA policies for column-level access control, and defend their contract enforcement strategy in architecture reviews. Job descriptions at Spotify, Airbnb, Convoy, and Stitch Fix explicitly list data contracts and lineage as required skills for senior and staff DE roles — not nice-to-haves.

What is schema drift?

Schema drift is when an upstream data source changes its structure without notifying downstream consumers: a column is renamed, a type is widened from INT to BIGINT, a nullable field becomes required, a high-cardinality string column suddenly contains only two values, or a column is quietly dropped. Each change can silently corrupt downstream aggregations, break type casts, or cause joins to produce wrong results. Drift detection catches these by diffing the incoming schema against the last-known-good contract and failing closed (rejecting the batch or alerting on the stream) rather than letting garbage land in production tables. Tools like Great Expectations schema expectations, dbt tests, and Soda checks provide the detection layer; the contract defines what "drift" means for each field. Freshness checks such as dbt source freshness cover staleness, which is a separate clause from schema.
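Diffing an incoming schema against the last-known-good contract and failing closed can be sketched in a few lines. The names `SchemaDriftError`, `diff_schema`, and `gate` are hypothetical, and schemas are modeled as plain column-to-type dictionaries.

```python
class SchemaDriftError(Exception):
    """Raised when an incoming schema deviates from the contract."""


def diff_schema(expected: dict[str, str], observed: dict[str, str]) -> dict:
    """Classify differences between the contract and the incoming schema."""
    shared = set(expected) & set(observed)
    return {
        "dropped": sorted(set(expected) - set(observed)),
        "added": sorted(set(observed) - set(expected)),
        "type_changed": sorted(c for c in shared if expected[c] != observed[c]),
    }


def gate(expected: dict[str, str], observed: dict[str, str]) -> None:
    """Fail closed: any difference rejects the batch."""
    drift = diff_schema(expected, observed)
    if any(drift.values()):
        raise SchemaDriftError(drift)


contract = {"id": "INT", "email": "STRING"}
incoming = {"id": "BIGINT", "email_address": "STRING"}
drift = diff_schema(contract, incoming)
```

A rename surfaces as one dropped column plus one added column, which is why a rename with no contract change looks identical to data loss from the consumer's side.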

What is the difference between data contracts and data quality tests?

Data quality tests run after data has already landed — they catch problems in the consumer's table. Data contracts are agreements at the producer side that block incompatible changes before they ship. Quality tests detect symptoms; contracts prevent causes. Mature data orgs run both — contracts at the producer boundary, tests as the safety net downstream.

How do data contracts work in a data mesh?

In a data mesh, each domain owns its data products and the contracts that describe them. Federated governance defines the cross-domain standards (naming, PII tagging, freshness SLAs, compatibility modes) and computational policies (OPA, Privacera) enforce them automatically. The platform team builds the contract registry and CI gates; the domain teams write and version their own contracts.

Quality | Phase 1 free | Full access in Professional

Last updated 2026-05-22 by AI-DE Engineering Team

Time: ~28h video + labs

Phase roadmap

Phase 1: Foundations: When Governance Fails
Used in: P11 (Data Governance & Contracts), P10 (DataGuard Observability)

Phase 2: Contracts & Lineage
Used in: P11 (Data Governance & Contracts), P10 (DataGuard Observability)

Phase 3: Production Governance Programs
Used in: P11 (Data Governance & Contracts), P27 (Data Access Control), P12 (CI/CD Data Platform)

Phase 4: Frontier & Capstone
Used in: P27 (Data Access Control), P12 (CI/CD Data Platform)

Build real systems with this skill: Data Governance & Contracts, DataGuard Observability, Data Access Control, CI/CD Data Platform
