ai-de.net/Projects/P23 · Schema Evolution & Data Contracts

Last updated 2026-05-22 · By AI-DE Engineering Team

PRO · part 01 free preview · Streaming track · P23

Build a schema governance platform from registry to incident runbook

FastAPI + PostgreSQL + Redis schema registry with canonical hashing. Runtime enforcement via API middleware + Great Expectations + dbt contracts. GitHub Actions PR gate with diff bot. Column-level lineage with NetworkX + OpenLineage driving auto-block. Chaos-injecting incident simulator with a 5-stage runbook — applied to one orders + user-events domain.

Timeline: 20-26 hours
Difficulty: Senior+
Stack: FastAPI · Avro · Kafka · dbt · NetworkX

This is the platform-design question asked at Confluent, Spotify, GoCardless, Netflix and any company running schema-heavy data on Kafka or dbt.

By the end you will have

A FastAPI schema registry with PostgreSQL storage, Redis caching, and canonical-hash deduplication
Runtime enforcement layered across API middleware, Great Expectations checkpoints, dbt v1.5+ contracts, and a Kafka consumer with DLQ
A GitHub Actions PR gate: schema diff, breaking-change detection, and a Markdown PR-comment bot
Column-level lineage built with NetworkX, ingested from OpenLineage facets, with severity-based auto-block
A pytest-asyncio chaos test harness with a 5-stage incident runbook (Detect → Assess → Contain → Recover → Post-mortem) and a platform SLA spec

PREREQ · Built for senior+ engineers who are comfortable with event design, FastAPI middleware, Kafka producers/consumers, and dbt models. This is not a tutorial: it assumes you’ve shipped pipelines before.

[Diagram: schema-registry · orders.v3 · contract enforced · lineage]

Schema (JSON Schema + Avro): orders.v3.avsc, users.v2.json, contract.yaml, ownership.yaml
Registry (content-addressed): producer-sdk; fastapi-registry (POST /schemas/{topic} · canonical-hash · v3); postgres (source of truth); redis (cached lookup)
Enforce (4 enforcement points): api-middleware, ge-checkpoint, dbt-contract, kafka + DLQ
Govern (governance spine): gh-actions / pr-bot; networkx-lineage (column-level · OpenLineage); severity → auto-block; 5-stage runbook

# Canonical-hash storage

POST /schemas/orders { schema_json }

sha256(canonicalize(json)) → 8c4f2e

identical payload deduped · version auto-incremented

→ Redis-cached lookup, Postgres source of truth
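
The storage flow above can be sketched in a few lines. This is a minimal illustration, not the project's registry code: `canonicalize`, `canonical_hash`, and `InMemoryRegistry` are hypothetical names, the canonical form is assumed to be sorted-key compact JSON (Avro's stricter Parsing Canonical Form is not implemented here), and a plain dict stands in for PostgreSQL.

```python
import hashlib
import json


def canonicalize(schema: dict) -> str:
    """Serialize with sorted object keys and no insignificant whitespace."""
    return json.dumps(schema, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def canonical_hash(schema: dict) -> str:
    """SHA-256 of the canonical form; key order in the input does not matter."""
    return hashlib.sha256(canonicalize(schema).encode("utf-8")).hexdigest()


class InMemoryRegistry:
    """Hypothetical stand-in for the Postgres-backed store."""

    def __init__(self) -> None:
        self._versions: dict[str, list[str]] = {}  # topic -> hashes in version order

    def register(self, topic: str, schema: dict) -> tuple[int, bool]:
        """Return (version, created). An identical payload reuses its version."""
        digest = canonical_hash(schema)
        hashes = self._versions.setdefault(topic, [])
        if digest in hashes:
            return hashes.index(digest) + 1, False  # deduplicated
        hashes.append(digest)
        return len(hashes), True  # version auto-incremented
```

Array order (for example the order of Avro fields) is preserved by this canonical form, so reordering fields yields a new version while reordering object keys does not.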

● Lineage-driven auto-block

nx.descendants("orders.user_id")

→ 3 dbt models · 2 dashboards

severity = CRITICAL · PR blocked + approval routed

→ Slack/PagerDuty notify · 5-stage runbook armed
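
To make the blast-radius lookup concrete, here is a small sketch under stated assumptions. It uses a plain dict and a breadth-first search as a dependency-free stand-in for `nx.descendants` on a NetworkX DiGraph; the edge list, the `dashboard.` naming prefix, and the `blast_radius` helper are all hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: node -> direct downstream consumers.
EDGES = {
    "orders.user_id": ["stg_orders.user_id"],
    "stg_orders.user_id": ["fct_orders.user_id", "dim_users.user_id"],
    "fct_orders.user_id": ["dashboard.revenue"],
    "dim_users.user_id": ["dashboard.retention"],
}


def descendants(edges: dict, source: str) -> set:
    """All nodes reachable from source, excluding source itself."""
    seen = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child != source and child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def blast_radius(edges: dict, column: str) -> dict:
    """Split downstream nodes into dbt models and dashboards (by assumed prefix)."""
    hits = descendants(edges, column)
    dashboards = sorted(n for n in hits if n.startswith("dashboard."))
    models = sorted(n for n in hits if not n.startswith("dashboard."))
    return {"models": models, "dashboards": dashboards}
```

With these example edges, `blast_radius(EDGES, "orders.user_id")` reports three downstream models and two dashboards, matching the counts in the callout.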


Why this matters in 2026

Schema incidents are among the most common causes of production data outages.

The patterns you ship in this project — registry as source of truth, PR-gated diffs, lineage-driven blast-radius checks — are the ones every platform team is now expected to operate.

Confluent Schema Registry is table stakes

Every Kafka shop runs one. Building your own teaches you the canonical-hash + content-addressable-version model that makes the real one debuggable.

Shift-left for data contracts

Breaking changes need to fail in PR review, not at 3am when the dashboard goes blank. A diff-bot in CI is the difference between a 5-minute review and a 5-hour incident.

Lineage drives enforcement, not docs

Column-level lineage stopped being a wiki page and became a control plane. A PR that removes a field is auto-blocked when, for example, 3 dbt models depend on it.

Data mesh needs cross-team contracts

Domain teams own their schemas; platforms own the contract layer. The registry + PR gate + ownership YAML is the contract a platform team signs with every domain.

Curriculum · 6 parts · 20-26 hours

Part 01 is free. The rest unlocks with PRO.

Try the first 2-3 hours — define schemas as code, write your first contract with an ownership model, and learn the breaking-change taxonomy. If it clicks, upgrade to unlock the registry, enforcement, CI/CD, lineage, and incident-simulation parts.

Part 01 is free — no card required. Get a feel for the schema + contract patterns before paying.

M01 · Schema as code: JSON Schema, Avro, and your first contract (2-3h · FREE PREVIEW)

Define schemas with JSON Schema and Avro (union types + defaults), model the breaking-change taxonomy (required-add, drop, type change), implement semantic versioning, and write a contract YAML with quality rules and ownership.
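
As a rough illustration of how the breaking-change taxonomy can drive a semantic-version bump, consider the sketch below. The function names and the bump policy are assumptions for illustration, not the course's implementation. It treats every type change as breaking, although Avro permits some promotions (such as int to long), and it compares only top-level record fields.

```python
BREAKING = {"drop", "required-add", "type-change"}


def classify_changes(old: dict, new: dict) -> list:
    """Compare two Avro-style record schemas field by field."""
    old_fields = {f["name"]: f for f in old["fields"]}
    new_fields = {f["name"]: f for f in new["fields"]}
    changes = []
    for name in old_fields.keys() - new_fields.keys():
        changes.append(("drop", name))
    for name in new_fields.keys() - old_fields.keys():
        # A new field with a default is safe for existing data; one without is not.
        kind = "optional-add" if "default" in new_fields[name] else "required-add"
        changes.append((kind, name))
    for name in old_fields.keys() & new_fields.keys():
        if old_fields[name]["type"] != new_fields[name]["type"]:
            changes.append(("type-change", name))
    return sorted(changes)


def next_version(current: str, changes: list) -> str:
    """Assumed policy: breaking -> major, additive -> minor, otherwise patch."""
    major, minor, patch = (int(part) for part in current.split("."))
    if any(kind in BREAKING for kind, _ in changes):
        return f"{major + 1}.0.0"
    if changes:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"  # no structural change, e.g. a doc-only edit
```

Adding a nullable field with a default would move 1.0.0 to 1.1.0 under this policy, while dropping a field or adding a required one would move it to 2.0.0.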

M02 · FastAPI registry: canonical hashing, Redis cache, producer SDK (3-4h · PRO TIER)

Build a FastAPI schema registry with PostgreSQL storage, Redis caching, and canonical-JSON content addressing. Wire a Pydantic-typed producer SDK with edge enforcement and Kafka integration. Compatibility checks at write time.
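
The cache-in-front-of-store lookup described here might look roughly like the read-through sketch below. Plain dicts stand in for Redis and PostgreSQL, and `SchemaLookup` and its key format are hypothetical; the sketch also assumes registered versions are immutable, which is what lets it skip cache invalidation.

```python
class SchemaLookup:
    """Read-through lookup: try the cache first, fall back to the store."""

    def __init__(self, cache: dict, store: dict) -> None:
        self.cache = cache  # stand-in for Redis
        self.store = store  # stand-in for Postgres, keyed by (topic, version)
        self.store_reads = 0  # counts how often the source of truth is hit

    def get(self, topic: str, version: int):
        key = f"schema:{topic}:{version}"
        if key in self.cache:
            return self.cache[key]
        self.store_reads += 1
        schema = self.store.get((topic, version))
        if schema is not None:
            # Assumed immutable once registered, so the entry never goes stale.
            self.cache[key] = schema
        return schema
```

Misses are deliberately not cached in this sketch, so a version registered later becomes visible on the next lookup.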

M03 · Runtime enforcement: middleware, Great Expectations, dbt contracts (3-4h · PRO TIER)

Reject invalid payloads with FastAPI middleware. Run Great Expectations SimpleCheckpoints with batch requests. Enforce column-level constraints with dbt v1.5+ model contracts. Add a Kafka consumer with a dead-letter queue.
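
The dead-letter pattern in the last step can be sketched without a broker. In the example below a list of raw strings stands in for a Kafka topic and another list stands in for the DLQ; the `REQUIRED` contract, `validate`, and `consume` are hypothetical names, and a real consumer would also handle offsets and commits.

```python
import json

# Hypothetical contract: required field -> expected Python type.
REQUIRED = {"order_id": str, "user_id": str, "amount": float}


def validate(event: dict) -> list:
    """Return a list of human-readable contract violations (empty if valid)."""
    errors = []
    for field, expected in REQUIRED.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors


def consume(raw_messages, handle, dead_letters) -> None:
    """Process each message; route failures to dead_letters with a reason."""
    for raw in raw_messages:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError as exc:
            dead_letters.append({"raw": raw, "errors": [f"deserialization: {exc.msg}"]})
            continue
        errors = validate(event) if isinstance(event, dict) else ["payload is not an object"]
        if errors:
            dead_letters.append({"raw": raw, "errors": errors})
        else:
            handle(event)
```

Keeping the original payload next to the error list is a design choice that makes replay possible once the producer or the contract is fixed.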

M04 · CI/CD gate: schema diff, PR-comment bot, multi-env promotion (3-4h · PRO TIER)

Build a git-based schema diff tool (uses git show ref:path for fast history). Block PRs with breaking changes via a GitHub Actions quality gate. Post a Markdown PR-comment with batch + streaming impact tables. Add a promotion CLI for dev → staging → prod.
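
A possible shape for the two halves of this gate is shown below: reading a schema at a given ref with `git show ref:path`, and rendering a Markdown comment from a list of changes. The function names and the `(kind, field, breaking)` tuple format are assumptions, and the single table here is a simplification of the batch + streaming impact tables mentioned above.

```python
import json
import subprocess


def load_schema_at(ref: str, path: str) -> dict:
    """Read a JSON schema file as it existed at a git ref, without a checkout."""
    result = subprocess.run(
        ["git", "show", f"{ref}:{path}"],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the ref or path is missing
    )
    return json.loads(result.stdout)


def render_comment(path: str, changes: list) -> str:
    """Render (kind, field, breaking) tuples as a Markdown PR comment."""
    header = f"### Schema check: `{path}`"
    if not changes:
        return f"{header}\n\nNo schema changes detected."
    lines = [header, "", "| Change | Field | Breaking |", "| --- | --- | --- |"]
    for kind, field, breaking in changes:
        lines.append(f"| {kind} | `{field}` | {'yes' if breaking else 'no'} |")
    verdict = "blocked" if any(b for _, _, b in changes) else "ok"
    lines += ["", f"Gate: **{verdict}**"]
    return "\n".join(lines)
```

In a CI job, the base and head schemas would come from two `load_schema_at` calls (for example with `origin/main` and `HEAD`), and a "blocked" verdict would translate into a non-zero exit code.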

M05 · Column-level lineage: NetworkX + OpenLineage + impact engine (3-4h · PRO TIER)

Build a NetworkX DiGraph indexed by column FQN. Ingest OpenLineage events (columnLineage facet). Traverse downstream/upstream with nx.descendants/ancestors. Classify impact severity (CRITICAL / HIGH / MEDIUM / LOW), auto-block on critical assets, route notifications via Jinja2 templates.
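
The ingest step can be illustrated by flattening an OpenLineage event's `columnLineage` facet into edges. The sketch below relies on the facet's `fields` → `inputFields` layout from the OpenLineage spec; the function names, the `dataset.column` FQN format (namespace dropped for brevity), and the severity thresholds are assumptions chosen for the example.

```python
def edges_from_event(event: dict) -> list:
    """Flatten columnLineage facets into (upstream_column, downstream_column) edges."""
    edges = []
    for output in event.get("outputs", []):
        facet = output.get("facets", {}).get("columnLineage", {})
        for out_col, spec in facet.get("fields", {}).items():
            target = f"{output['name']}.{out_col}"
            for src in spec.get("inputFields", []):
                edges.append((f"{src['name']}.{src['field']}", target))
    return edges


def classify(downstream_count: int, critical_hit: bool) -> str:
    """Hypothetical severity policy; the thresholds are illustrative only."""
    if critical_hit:
        return "CRITICAL"  # at least one downstream asset is tagged critical
    if downstream_count >= 5:
        return "HIGH"
    if downstream_count >= 1:
        return "MEDIUM"
    return "LOW"
```

The resulting edge list is in the shape that `DiGraph.add_edges_from` accepts, so it could be loaded into a NetworkX graph directly.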

M06 · Incident simulation + capstone: chaos, runbook, SLA spec (3-4h · PRO TIER)

Run pytest-asyncio chaos tests injecting Kafka lag, registry unavailability, and deserialization errors. Write a 5-stage incident runbook (Detect / Assess / Contain / Recover / Post-mortem). Ship a platform SLA YAML and the staff-level design doc that ties all 6 parts together.
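
One of the injected failures, registry unavailability, could be simulated along the lines below. This is a sketch with plain asyncio rather than the project's harness: `FlakyRegistry`, `fetch_with_fallback`, and the fall-back-to-cache behaviour are assumptions made for illustration.

```python
import asyncio


class RegistryUnavailable(Exception):
    """Raised when the (simulated) registry cannot be reached."""


class FlakyRegistry:
    """Hypothetical chaos wrapper: the first `failures` calls raise."""

    def __init__(self, schema: dict, failures: int) -> None:
        self.schema = schema
        self.failures = failures
        self.calls = 0

    async def fetch(self, topic: str) -> dict:
        self.calls += 1
        await asyncio.sleep(0)  # yield control, as a network call would
        if self.calls <= self.failures:
            raise RegistryUnavailable(topic)
        return self.schema


async def fetch_with_fallback(registry, topic: str, cache: dict, retries: int = 2):
    """Try the registry a few times, then fall back to the last cached schema."""
    for _ in range(retries):
        try:
            schema = await registry.fetch(topic)
            cache[topic] = schema
            return schema, "registry"
        except RegistryUnavailable:
            continue
    if topic in cache:
        return cache[topic], "cache"
    raise RegistryUnavailable(topic)
```

Under pytest-asyncio, each scenario would typically be an `async def` test marked with `@pytest.mark.asyncio`; `asyncio.run` works for a quick standalone check.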

5 parts locked · Unlock all PRO content for $29/mo

Backed by curriculum

Event Design & Data Contracts

8 modules · 7.5 hours · event modeling · schema contracts · validation · governance · evolution

This curriculum is the spine of the project. PRO subscribers get full access to every module — and the project applies what the curriculum teaches in working code.

The build, in 3 phases

Three sprints. Three checkpoints. One governance platform.

Each phase ends with a tagged commit and a working artifact. Registry first, then enforcement + CI gate, then lineage + incident simulation.

Phase 01 · ~6h · Stand up the registry

Schemas defined as code (JSON Schema + Avro). Contracts with ownership. FastAPI registry with PostgreSQL storage, Redis caching, and a Kafka-integrated producer SDK.

✓ Versioned .avsc + .json + contract YAML artifacts
✓ FastAPI registry API with canonical-hash storage
✓ Producer SDK + Kafka integration scaffold

Phase 02 · ~7h · Enforce + gate

Runtime enforcement chain across API + batch + streaming + dbt. GitHub Actions PR gate that blocks breaking changes and posts a Markdown impact comment. Multi-env promotion CLI.

✓ FastAPI middleware + GE checkpoint + dbt contract + Kafka DLQ
✓ GitHub Actions schema-check workflow + diff/gate/bot scripts
✓ promote.py CLI + env_config.yaml for dev → staging → prod

Phase 03 · ~7h · Lineage + simulate

Column-level lineage graph driving auto-block + notifications. Chaos-injected incident simulations. 5-stage runbook + SLA spec + the staff-level capstone design doc.

✓ NetworkX lineage graph + OpenLineage ingest + severity router
✓ pytest-asyncio chaos test harness + IncidentSimulator
✓ 5-stage runbook YAML + platform_sla.yaml + design doc

Project setup · 10 minutes

One command. Local registry + Kafka + Postgres + Redis.

You get a real stack on day one — FastAPI registry, PostgreSQL for schema storage, Redis for cache, and Kafka for the producer/consumer examples used in parts 02–05.

What lives in the repo

Everything you need to stand up a schema-governance reference on your laptop, plus the seed schemas, sample contracts, and CI workflow used across parts 01–06.

schemas/ — versioned .avsc + .json schemas (orders, user_events)
contracts/ — contract YAML with quality rules + ownership
schema_registry/ — FastAPI app: registry, compat, cache, SDK, CLI
.github/workflows/ — schema-check.yml + diff/gate/bot/promote scripts
src/lineage/ — NetworkX lineage + OpenLineage ingest + impact engine
simulate/ — chaos injector + 5-stage runbook + SLA spec

Download · Starter Kit

Schema Governance Starter Kit

Pre-configured FastAPI registry, sample schemas, contract YAMLs, and the CI workflow scaffolds. Skip the boilerplate, start on part 01.

65 files · ~140 KB · 1,050 sample events · PRO required

~/projects/schema-evolution-contracts — zsh

1. Clone and start the stack

$ git clone https://github.com/ai-de/p23-schema-evolution-contracts

$ cd p23-schema-evolution-contracts && make up

2. Apply migrations + seed schemas

$ make migrate && make seed

3. Register a schema via the CLI

$ python -m schema_registry.cli register schemas/orders/v1.0.0.avsc --topic orders

4. Run the CI diff against main

$ python scripts/diff.py --base origin/main --head HEAD

Production hardening

The same governance layer — but built for the 10x case.

Most schema-governance tutorials show you the POST /schemas. This one shows what changes when there are 50+ topics, 200+ dbt models, and the registry itself is in the critical path.

Reference scaffold · What you ship in parts 02–06

Compatibility rules: Canonical hash + interface stub
Producer SDK: Edge-enforcement interface
Multi-env promotion: CLI + YAML config
Approval workflow: State-machine scaffold + templates
Drift detection: Single function, single window
Incident response: Simulator + 5-stage runbook YAML
Registry availability: Single FastAPI process

Production registry · What this prepares you for

✓ Compatibility rules: Full Avro spec rules (field-removal, type-widening, nullable + default semantics)
✓ Producer SDK: Async batching, retry, backoff, in-process registry cache with ETag
✓ Multi-env promotion: Live registry sync via PUT /schemas/{id} + signed approval webhooks
✓ Approval workflow: State persisted in Postgres + Slack interactive callbacks + timeout escalation
✓ Drift detection: Continuous monitoring with sliding-window stats + per-field distribution tests
✓ Incident response: Runbook tied to live oncall paging + executable rollback.sh
✓ Registry availability: HA replicas behind a load balancer with p95 < 50ms and 99.9% target

PRO benefit · code review

Real review from senior engineers who shipped this stack.

Submit your repo, get line-by-line feedback within 48 hours. The kind of review that's quietly worth thousands of dollars in time-to-staff.

4 reviews / month

Submit a repo, a PR, or a refactor proposal. Reviewer is matched to your domain — schema governance, Kafka, and dbt for this project. Async, comments inline, average turnaround 31 hours.

31h avg turnaround · 9.2/10 helpfulness · 94% return next month

2 office hours / month

Live 30-min sessions with a senior data engineer. Architecture questions, whiteboard a registry rollout, mock a system-design round on data contracts. Group sessions also available.

30 min per session · 2 / mo included · + group sessions unlimited

What PRO unlocks

One subscription. 15+ projects, all curriculum, code review.

PRO is built for senior+ engineers who want production-pattern builds and feedback loops — not more tutorials.

What you get | FREE | PRO | EXPERT
Projects (production-pattern builds) | | 15+ |
Curriculum modules (all 7 tracks) | Phase 1 only | All | All + bonus
Code review credits (senior engineer review) | | 4 / month | Unlimited
Career path access (5 paths × full plans) | 1 path | All 5 | All 5 + 1:1
Certificate (verifiable on LinkedIn) | No | Yes | Yes + portfolio review
Community (Discord + office hours) | Read-only | Full + 2/mo | Full + 4/mo

$29/mo · billed monthly · cancel anytime
or annual: $249/yr (save 28%)

Who this is for

Pick this if you’re operating a streaming + dbt stack, not learning to.

Senior data engineers

You ship Kafka producers and dbt models. You've debugged a schema mismatch at 2am and want the registry + PR gate that keeps it from happening again.

Staff / tech leads

You're driving the data-contracts initiative. You need to understand the failure modes, the rollout sequence, and what 'governance' actually means in code before signing off.

Platform engineers

You run the registry and the lineage graph for 10+ teams. You need to see how column-level lineage drives auto-block — and what the runbook looks like when it doesn't.

Data architects

You design the contract layer between domains. This is the contract a platform team signs with every domain team — registry, ownership, approval, escalation.

Related curriculum

Going deeper? Three tracks back this project.

Event design is the spine. These three curriculums let you go deeper on the layers around it — Kafka internals, scheduling, and the dbt model layer the contracts gate.

FAQ

Quick answers.

How is part 01 different from a free schema-design tutorial?

Part 01 (free) walks you through the breaking-change taxonomy, JSON Schema vs Avro tradeoffs, semantic versioning for schemas, and writing your first contract YAML with ownership. Most free tutorials show you a syntax cheat-sheet; this one builds the mental model you'll need across the next 5 parts.

What's NOT in scope?

This is a layered reference implementation — not a deployed distributed platform. There's no Docker Compose orchestrating the full stack, no benchmarked p95 latency under load, no live HA registry, and no real Slack/PagerDuty webhooks firing. Compatibility rule engine, multi-env registry sync, and approval workflows ship as scaffolds with the production patterns called out in the hardening section.

How is this different from P11 data-governance-contracts?

P11 narrows in on contract testing in CI — Buf, protobuf compatibility, breaking-change detection at PR-time. P23 (this project) is the full layered platform: registry as source of truth, runtime enforcement chain, CI gate, column-level lineage driving auto-block, and incident simulation. Pick P11 if you want CI depth on protobuf; pick P23 if you want the platform-architecture story end-to-end.

Does this include Confluent Schema Registry, Buf, or protobuf?

No, and the catalog used to advertise that, which was misleading. The registry you build here is FastAPI + PostgreSQL + Redis with canonical-hash content addressing, similar in spirit to how Confluent's registry deduplicates identical schemas. Schemas are JSON Schema + Avro, not protobuf. The patterns transfer if you swap implementations.

Do I need AWS / Confluent Cloud credentials to do this?

No. Everything runs locally with Docker — FastAPI, PostgreSQL, Redis, Kafka. The patterns transfer cleanly to managed services with config changes only. The starter kit ships 1,050 sample events with seeded contract violations so you can run the enforcement chain end-to-end without external data.

What does PRO actually unlock for $29/mo?

All 15+ PRO projects, 4 code-review credits per month, 2 office-hours sessions, full curriculum across all 7 tracks (including the Event Design path that backs this project), all 5 career paths, certificate of completion, and full community access. Cancel anytime.

Will this help with senior+ data engineering interviews?

Yes. System-design rounds for senior+ DE and platform roles increasingly assume schema governance — registry design, breaking-change semantics, CI gate architecture, column-level lineage, and incident response. After this you can whiteboard all five and reference the runbook stages.

Related projects

Paired with this project

P11 · PAID · quality

Data governance & contracts

ODCS contracts, GE + Soda validation, Avro + Schema Registry PR gate, 4-tier PII + RBAC + hashed audit, SOC2 + GDPR engines.

Ready to ship the governance layer?

Start with part 01 — free, no card. About 2-3 hours. By the end you'll have versioned schemas (JSON Schema + Avro), a contract YAML with ownership, and the breaking-change taxonomy that drives every part after.

Related curriculum tracks: Kafka Streams Learning Path · Airflow · dbt & Analytics Engineering
