ai-de.net/Projects/P23 · Schema Evolution & Data Contracts
PRO · part 01 free preview · Streaming track · P23

Build a schema governance platform from registry to incident runbook

FastAPI + PostgreSQL + Redis schema registry with canonical hashing. Runtime enforcement via API middleware + Great Expectations + dbt contracts. GitHub Actions PR gate with diff bot. Column-level lineage with NetworkX + OpenLineage driving auto-block. Chaos-injecting incident simulator with a 5-stage runbook — applied to one orders + user-events domain.

Timeline: 20-26 hours
Difficulty: Senior+
Stack: FastAPI · Avro · Kafka · dbt · NetworkX

This is the platform-design question asked at Confluent, Spotify, GoCardless, Netflix and any company running schema-heavy data on Kafka or dbt.

By the end you will have
  • A FastAPI schema registry with PostgreSQL storage, Redis caching, and canonical-hash deduplication
  • Runtime enforcement layered across API middleware, Great Expectations checkpoints, dbt v1.5+ contracts, and a Kafka consumer with DLQ
  • A GitHub Actions PR gate: schema diff, breaking-change detection, and a Markdown PR-comment bot
  • Column-level lineage built with NetworkX, ingested from OpenLineage facets, with severity-based auto-block
  • A pytest-asyncio chaos test harness with a 5-stage incident runbook (Detect → Assess → Contain → Recover → Post-mortem) and a platform SLA spec
PREREQ · Built for senior+ engineers who are comfortable with event design, FastAPI middleware, Kafka producers/consumers, and dbt models. Not a tutorial: it assumes you’ve shipped pipelines before.
[Diagram: schema-registry · orders.v3 · contract enforced]
  • Schema (JSON Schema + Avro): orders.v3.avsc, users.v2.json, contract.yaml, ownership.yaml
  • Registry (content-addressed): producer-sdk; fastapi-registry (POST /schemas/{topic} · canonical-hash · v3); postgres (source of truth); redis (cached lookup)
  • Enforce (4 enforcement points): api-middleware, ge-checkpoint, dbt-contract, kafka + DLQ
  • Govern (governance spine): gh-actions / pr-bot; networkx-lineage (column-level · OpenLineage); severity → auto-block; 5-stage runbook
# Canonical-hash storage
POST /schemas/orders { schema_json }
sha256(canonicalize(json)) → 8c4f2e
identical payload deduped · version auto-incremented
→ Redis-cached lookup, Postgres source of truth
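
The canonical-hash card above can be sketched in a few lines of Python. This is a minimal illustration, not the project's registry code: `canonicalize`, `canonical_hash`, and `InMemoryRegistry` are hypothetical names, the in-memory dict stands in for PostgreSQL, and sorted-key JSON is only one possible canonical form (Avro defines its own Parsing Canonical Form, which this sketch does not implement).

```python
import hashlib
import json


def canonicalize(schema: dict) -> str:
    """Serialize with sorted keys and no insignificant whitespace."""
    return json.dumps(schema, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def canonical_hash(schema: dict) -> str:
    """SHA-256 over the canonical form, so key order cannot change the digest."""
    return hashlib.sha256(canonicalize(schema).encode("utf-8")).hexdigest()


class InMemoryRegistry:
    """Hypothetical stand-in for the Postgres-backed store."""

    def __init__(self) -> None:
        # topic -> list of digests; list index + 1 is the version number
        self._versions: dict[str, list[str]] = {}

    def register(self, topic: str, schema: dict) -> int:
        digest = canonical_hash(schema)
        hashes = self._versions.setdefault(topic, [])
        if digest in hashes:
            # Identical payload: return the existing version instead of a new one.
            return hashes.index(digest) + 1
        hashes.append(digest)
        return len(hashes)


registry = InMemoryRegistry()
v1 = registry.register("orders", {"type": "record", "name": "Order", "fields": []})
# Same schema with keys in a different order: deduplicated.
v1_again = registry.register("orders", {"name": "Order", "fields": [], "type": "record"})
# A real change gets the next version.
v2 = registry.register(
    "orders",
    {"type": "record", "name": "Order", "fields": [{"name": "id", "type": "string"}]},
)
```

Because versions are derived from content, re-posting an unchanged schema is idempotent, which is what makes retries from CI or a producer SDK safe.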
● Lineage-driven auto-block
nx.descendants("orders.user_id")
→ 3 dbt models · 2 dashboards
severity = CRITICAL · PR blocked + approval routed
→ Slack/PagerDuty notify · 5-stage runbook armed
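
The auto-block card above boils down to a downstream traversal plus a severity rule. The sketch below uses a plain breadth-first search from the standard library so it runs without dependencies; on a NetworkX `DiGraph`, `nx.descendants(graph, source)` returns the same set of reachable nodes. The node names, the `CRITICAL_ASSETS` set, and the severity thresholds are all assumptions made up for illustration.

```python
from collections import deque

# Edges point downstream: column -> assets that read from it (hypothetical names).
LINEAGE = {
    "orders.user_id": ["stg_orders.user_id"],
    "stg_orders.user_id": ["fct_orders.user_id", "dim_users.user_id"],
    "fct_orders.user_id": ["dashboard.revenue"],
    "dim_users.user_id": ["dashboard.retention"],
}
CRITICAL_ASSETS = {"dashboard.revenue"}  # assumed tagging of critical assets


def descendants(graph: dict, source: str) -> set:
    """All nodes reachable from source, excluding source itself."""
    seen = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    seen.discard(source)  # only matters if the graph contains a cycle
    return seen


def classify(impacted: set, critical_assets: set) -> str:
    """Assumed severity rule: any critical asset wins, then size of blast radius."""
    if impacted & critical_assets:
        return "CRITICAL"
    if len(impacted) >= 3:
        return "HIGH"
    if impacted:
        return "MEDIUM"
    return "LOW"


def should_block(severity: str) -> bool:
    return severity == "CRITICAL"


impacted = descendants(LINEAGE, "orders.user_id")
severity = classify(impacted, CRITICAL_ASSETS)
```

Here the traversal finds three models and two dashboards downstream of `orders.user_id`, one of which is tagged critical, so the change would be blocked and routed for approval.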
6 layered parts · 14 real tools · 5-stage runbook
Why this matters in 2026

Schema incidents are among the most common causes of production data outages.

The patterns you ship in this project — registry as source of truth, PR-gated diffs, lineage-driven blast-radius checks — are the ones every platform team is now expected to operate.

Confluent Schema Registry is table stakes

Every Kafka shop runs one. Building your own teaches you the canonical-hash + content-addressable-version model that makes the real one debuggable.

Shift-left for data contracts

Breaking changes need to fail in PR review, not at 3am when the dashboard goes blank. A diff-bot in CI is the difference between a 5-minute review and a 5-hour incident.
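
As a rough idea of what a CI diff step checks, the sketch below compares two JSON Schema objects and flags the three breaking changes named in the curriculum (dropped field, required field added, type change). The function names and the version-bump convention are assumptions for illustration; a real gate would also handle nested objects, enums, and Avro.

```python
def diff_schemas(old: dict, new: dict) -> list:
    """Return (field, change, is_breaking) tuples for two JSON Schema objects."""
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    new_required = set(new.get("required", []))
    changes = []
    for name in sorted(old_props.keys() - new_props.keys()):
        changes.append((name, "field dropped", True))
    for name in sorted(new_props.keys() - old_props.keys()):
        breaking = name in new_required
        label = "required field added" if breaking else "optional field added"
        changes.append((name, label, breaking))
    for name in sorted(old_props.keys() & new_props.keys()):
        if old_props[name].get("type") != new_props[name].get("type"):
            changes.append((name, "type changed", True))
    return changes


def next_version(current: str, changes: list) -> str:
    """Assumed convention: breaking -> major, additive -> minor, otherwise patch."""
    major, minor, patch = (int(part) for part in current.split("."))
    if any(is_breaking for _, _, is_breaking in changes):
        return f"{major + 1}.0.0"
    if changes:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


old_schema = {
    "properties": {"id": {"type": "string"}, "amount": {"type": "number"}},
    "required": ["id"],
}
new_schema = {
    "properties": {
        "id": {"type": "integer"},
        "currency": {"type": "string"},
        "note": {"type": "string"},
    },
    "required": ["id", "currency"],
}
changes = diff_schemas(old_schema, new_schema)
gate_fails = any(is_breaking for _, _, is_breaking in changes)
```

A PR gate built on this would fail the check when `gate_fails` is true and render `changes` as the rows of the review comment.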

Lineage drives enforcement, not docs

Column-level lineage stopped being a wiki page and became a control plane. Removing a field auto-blocks if 3 dbt models depend on it.

Data mesh needs cross-team contracts

Domain teams own their schemas; platforms own the contract layer. The registry + PR gate + ownership YAML is the contract a platform team signs with every domain.

Curriculum · 6 parts · 20-26 hours

Part 01 is free. The rest unlocks with PRO.

Try the first 2-3 hours — define schemas as code, write your first contract with an ownership model, and learn the breaking-change taxonomy. If it clicks, upgrade to unlock the registry, enforcement, CI/CD, lineage, and incident-simulation parts.

Part 01 is free — no card required. Get a feel for the schema + contract patterns before paying.
M01
Schema as code: JSON Schema, Avro, and your first contract
Define schemas with JSON Schema and Avro (union types + defaults), model the breaking-change taxonomy (required-add, drop, type change), implement semantic versioning, and write a contract YAML with quality rules and ownership.
2-3h · FREE PREVIEW
M02
FastAPI registry: canonical hashing, Redis cache, producer SDK
Build a FastAPI schema registry with PostgreSQL storage, Redis caching, and canonical-JSON content addressing. Wire a Pydantic-typed producer SDK with edge enforcement and Kafka integration. Compatibility checks at write time.
3-4h · PRO TIER
M03
Runtime enforcement: middleware, Great Expectations, dbt contracts
Reject invalid payloads with FastAPI middleware. Run Great Expectations SimpleCheckpoints with batch requests. Enforce column-level constraints with dbt v1.5+ model contracts. Add a Kafka consumer with a dead-letter queue.
3-4h · PRO TIER
M04
CI/CD gate: schema diff, PR-comment bot, multi-env promotion
Build a git-based schema diff tool (uses git show ref:path for fast history). Block PRs with breaking changes via a GitHub Actions quality gate. Post a Markdown PR-comment with batch + streaming impact tables. Add a promotion CLI for dev → staging → prod.
3-4h · PRO TIER
M05
Column-level lineage: NetworkX + OpenLineage + impact engine
Build a NetworkX DiGraph indexed by column FQN. Ingest OpenLineage events (columnLineage facet). Traverse downstream/upstream with nx.descendants/ancestors. Classify impact severity (CRITICAL / HIGH / MEDIUM / LOW), auto-block on critical assets, route notifications via Jinja2 templates.
3-4h · PRO TIER
M06
Incident simulation + capstone: chaos, runbook, SLA spec
Run pytest-asyncio chaos tests injecting Kafka lag, registry unavailability, and deserialization errors. Write a 5-stage incident runbook (Detect / Assess / Contain / Recover / Post-mortem). Ship a platform SLA YAML and the staff-level design doc that ties all 6 parts together.
3-4h · PRO TIER
5 parts locked · Unlock all PRO content for $29/mo
Upgrade to PRO →
Backed by curriculum

Event Design & Data Contracts

8 modules · 7.5 hours · event modeling · schema contracts · validation · governance · evolution
Open curriculum

This curriculum is the spine of the project. PRO subscribers get full access to every module — and the project applies what the curriculum teaches in working code.

The build, in 3 phases

Three sprints. Three checkpoints. One governance platform.

Each phase ends with a tagged commit and a working artifact. Registry first, then enforcement + CI gate, then lineage + incident simulation.

01 · ~6h
Stand up the registry

Schemas defined as code (JSON Schema + Avro). Contracts with ownership. FastAPI registry with PostgreSQL storage, Redis caching, and a Kafka-integrated producer SDK.

  • Versioned .avsc + .json + contract YAML artifacts
  • FastAPI registry API with canonical-hash storage
  • Producer SDK + Kafka integration scaffold
02 · ~7h
Enforce + gate

Runtime enforcement chain across API + batch + streaming + dbt. GitHub Actions PR gate that blocks breaking changes and posts a Markdown impact comment. Multi-env promotion CLI.

  • FastAPI middleware + GE checkpoint + dbt contract + Kafka DLQ
  • GitHub Actions schema-check workflow + diff/gate/bot scripts
  • promote.py CLI + env_config.yaml for dev → staging → prod
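
The streaming leg of the enforcement chain can be pictured as a consumer loop that validates each message and diverts failures to a dead-letter queue. The sketch below is an assumption-heavy stand-in: plain lists replace Kafka topics, `ORDER_RULES` is a made-up contract, and the type checks are far simpler than a JSON Schema or Avro validator.

```python
import json

# Hypothetical contract: field name -> expected Python type after JSON decoding.
ORDER_RULES = {"order_id": str, "user_id": str, "amount": float}


def validate(event: dict, rules: dict) -> list:
    """Return a list of human-readable violations (empty means valid)."""
    errors = []
    for field, expected in rules.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            actual = type(event[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors


def consume(raw_messages: list, rules: dict) -> tuple:
    """Split raw messages into valid events and dead-letter records."""
    valid, dead_letters = [], []
    for raw in raw_messages:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError as exc:
            dead_letters.append({"raw": raw, "errors": [f"deserialization error: {exc.msg}"]})
            continue
        if not isinstance(event, dict):
            dead_letters.append({"raw": raw, "errors": ["expected a JSON object"]})
            continue
        errors = validate(event, rules)
        if errors:
            # Keep the original payload next to the reasons so it can be replayed.
            dead_letters.append({"raw": raw, "errors": errors})
        else:
            valid.append(event)
    return valid, dead_letters


messages = [
    '{"order_id": "o1", "user_id": "u1", "amount": 9.5}',
    '{"order_id": "o2", "amount": "9.5"}',
    "not json",
]
valid, dlq = consume(messages, ORDER_RULES)
```

Storing the raw payload alongside the violation list is the design choice that matters: it lets a later replay job reprocess dead letters once the schema or the producer is fixed.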
03 · ~7h
Lineage + simulate

Column-level lineage graph driving auto-block + notifications. Chaos-injected incident simulations. 5-stage runbook + SLA spec + the staff-level capstone design doc.

  • NetworkX lineage graph + OpenLineage ingest + severity router
  • pytest-asyncio chaos test harness + IncidentSimulator
  • 5-stage runbook YAML + platform_sla.yaml + design doc
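
One of the chaos scenarios named in the curriculum is registry unavailability. A minimal sketch of that kind of test, using only `asyncio`, might look like the following; `FlakyRegistry` and `CachingClient` are hypothetical classes, and the project itself uses pytest-asyncio rather than a bare `asyncio.run`.

```python
import asyncio


class FlakyRegistry:
    """Hypothetical registry client with an injectable outage."""

    def __init__(self) -> None:
        self.available = True
        self._latest = {"orders": 3}

    async def latest_version(self, topic: str) -> int:
        await asyncio.sleep(0)  # yield to the event loop, as a network call would
        if not self.available:
            raise ConnectionError("registry unavailable")
        return self._latest[topic]


class CachingClient:
    """Serves the last known version from a local cache during an outage."""

    def __init__(self, registry: FlakyRegistry) -> None:
        self._registry = registry
        self._cache = {}

    async def latest_version(self, topic: str) -> tuple:
        try:
            version = await self._registry.latest_version(topic)
        except ConnectionError:
            if topic in self._cache:
                return self._cache[topic], "cache"
            raise  # nothing cached: surface the outage to the caller
        self._cache[topic] = version
        return version, "registry"


async def run_chaos_scenario() -> tuple:
    registry = FlakyRegistry()
    client = CachingClient(registry)
    warm = await client.latest_version("orders")    # populates the cache
    registry.available = False                      # inject the outage
    during = await client.latest_version("orders")  # should come from cache
    try:
        await client.latest_version("user_events")  # never cached
        cold = "served"
    except ConnectionError:
        cold = "failed"
    return warm, during, cold


result = asyncio.run(run_chaos_scenario())
```

The scenario separates the two outcomes a runbook needs to distinguish in the Assess stage: topics that degrade gracefully on cached schemas and topics that fail outright.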
Project setup · 10 minutes

One command. Local registry + Kafka + Postgres + Redis.

You get a real stack on day one — FastAPI registry, PostgreSQL for schema storage, Redis for cache, and Kafka for the producer/consumer examples used in parts 02–05.

What lives in the repo

Everything you need to stand up a schema-governance reference on your laptop, plus the seed schemas, sample contracts, and CI workflow used across parts 01–06.

  • schemas/ — versioned .avsc + .json schemas (orders, user_events)
  • contracts/ — contract YAML with quality rules + ownership
  • schema_registry/ — FastAPI app: registry, compat, cache, SDK, CLI
  • .github/workflows/ — schema-check.yml + diff/gate/bot/promote scripts
  • src/lineage/ — NetworkX lineage + OpenLineage ingest + impact engine
  • simulate/ — chaos injector + 5-stage runbook + SLA spec
Download · Starter Kit

Schema Governance Starter Kit

Pre-configured FastAPI registry, sample schemas, contract YAMLs, and the CI workflow scaffolds. Skip the boilerplate, start on part 01.

65 files · ~140 KB · 1,050 sample events · PRO required
~/projects/schema-evolution-contracts — zsh
1. Clone and start the stack
$ git clone https://github.com/ai-de/p23-schema-evolution-contracts
$ cd p23-schema-evolution-contracts && make up
2. Apply migrations + seed schemas
$ make migrate && make seed
3. Register a schema via the SDK
$ python -m schema_registry.cli register schemas/orders/v1.0.0.avsc --topic orders
4. Run the CI diff against main
$ python scripts/diff.py --base origin/main --head HEAD
1,050 sample events · 5+ versioned schemas · 2 domains (orders, users) · 65 files in starter
Production hardening

The same governance layer — but built for the 10x case.

Most schema-governance tutorials show you the POST /schemas. This one shows what changes when there are 50+ topics, 200+ dbt models, and the registry itself is in the critical path.

Reference scaffold (what you ship in parts 02–06)
  • Compatibility rules: Canonical hash + interface stub
  • Producer SDK: Edge-enforcement interface
  • Multi-env promotion: CLI + YAML config
  • Approval workflow: State-machine scaffold + templates
  • Drift detection: Single function, single window
  • Incident response: Simulator + 5-stage runbook YAML
  • Registry availability: Single FastAPI process

Production registry (what this prepares you for)
  • Compatibility rules: Full Avro spec rules: field-removal, type-widening, nullable + default semantics
  • Producer SDK: Async batching, retry, backoff, in-process registry cache with ETag
  • Multi-env promotion: Live registry sync via PUT /schemas/{id} + signed approval webhooks
  • Approval workflow: State persisted in Postgres + Slack interactive callbacks + timeout escalation
  • Drift detection: Continuous monitoring with sliding-window stats + per-field distribution tests
  • Incident response: Runbook tied to live oncall paging + executable rollback.sh
  • Registry availability: HA replicas behind a load balancer with p95 < 50ms and 99.9% target
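
To make the "full Avro spec rules" row concrete, here is a hedged sketch of a backward-compatibility check for flat Avro records: can a reader on the new schema decode data written with the old one? The promotion table follows the Avro specification's schema-resolution rules for primitives; everything else (function names, the simplified union handling, the lack of support for nested records, aliases, enums, and logical types) is an assumption of this sketch.

```python
# Writer type -> reader types it may be promoted to (Avro schema resolution).
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def readable(writer_type, reader_type) -> bool:
    """True if data written as writer_type can be read as reader_type."""
    if isinstance(writer_type, list):  # writer union: every branch must be readable
        return all(readable(branch, reader_type) for branch in writer_type)
    if isinstance(reader_type, list):  # reader union: any branch may match
        return any(readable(writer_type, branch) for branch in reader_type)
    if writer_type == reader_type:
        return True
    if isinstance(writer_type, str) and isinstance(reader_type, str):
        return reader_type in PROMOTIONS.get(writer_type, set())
    return False


def backward_incompatibilities(writer: dict, reader: dict) -> list:
    """Problems a reader on the new schema would hit with old data."""
    writer_fields = {field["name"]: field for field in writer["fields"]}
    problems = []
    for field in reader["fields"]:
        name = field["name"]
        if name not in writer_fields:
            if "default" not in field:
                problems.append(f"{name}: added without a default")
        elif not readable(writer_fields[name]["type"], field["type"]):
            old_type = writer_fields[name]["type"]
            problems.append(f"{name}: {old_type} cannot be read as {field['type']}")
    return problems


orders_v1 = {"type": "record", "name": "Order", "fields": [
    {"name": "id", "type": "string"},
    {"name": "qty", "type": "int"},
    {"name": "note", "type": "string"},
]}
orders_v2 = {"type": "record", "name": "Order", "fields": [
    {"name": "id", "type": "string"},
    {"name": "qty", "type": "long"},                                   # widened
    {"name": "note", "type": ["null", "string"]},                      # made nullable
    {"name": "coupon", "type": ["null", "string"], "default": None},   # added with default
    {"name": "region", "type": "string"},                              # added, no default
]}
problems = backward_incompatibilities(orders_v1, orders_v2)
```

Widening `qty`, making `note` nullable, and adding `coupon` with a default all pass; only `region` is reported, because old records carry no value for it and the reader has no default to fall back on.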
PRO benefit · code review

Real review from senior engineers who shipped this stack.

Submit your repo, get line-by-line feedback within 48 hours. The kind of review that's quietly worth thousands of dollars in time-to-staff.

4 reviews / month

Submit a repo, a PR, or a refactor proposal. Reviewer is matched to your domain — schema governance, Kafka, and dbt for this project. Async, comments inline, average turnaround 31 hours.

31h avg turnaround · 9.2/10 helpfulness · 94% return next month
2 office hours / month

Live 30-min sessions with a senior data engineer. Architecture questions, whiteboard a registry rollout, mock a system-design round on data contracts. Group sessions also available.

30 min per session · 2 / mo included · + group: unlimited
What PRO unlocks

One subscription. 15+ projects, all curriculum, code review.

PRO is built for senior+ engineers who want production-pattern builds and feedback loops — not more tutorials.

What you get (FREE · PRO · EXPERT)
  • Projects (production-pattern builds): FREE 2 · PRO 15+ · EXPERT 8
  • Curriculum modules (all 7 tracks): FREE Phase 1 only · PRO All · EXPERT All + bonus
  • Code review credits (senior engineer review): FREE 0 · PRO 4 / month · EXPERT Unlimited
  • Career path access (5 paths × full plans): FREE 1 path · PRO All 5 · EXPERT All 5 + 1:1
  • Certificate (verifiable on LinkedIn): PRO Yes · EXPERT Yes + portfolio review
  • Community (Discord + office hours): FREE Read-only · PRO Full + 2/mo · EXPERT Full + 4/mo
$29/mo billed monthly · cancel anytime, or $249/yr billed annually (save 28%)
Upgrade to PRO
Who this is for

Pick this if you’re operating a streaming + dbt stack, not learning to.

Senior data engineers

You ship Kafka producers and dbt models. You've debugged a schema mismatch at 2am and want the registry + PR gate that keeps it from happening again.

Staff / tech leads

You're driving the data-contracts initiative. You need to understand the failure modes, the rollout sequence, and what 'governance' actually means in code before signing off.

Platform engineers

You run the registry and the lineage graph for 10+ teams. You need to see how column-level lineage drives auto-block — and what the runbook looks like when it doesn't.

Data architects

You design the contract layer between domains. This is the contract a platform team signs with every domain team — registry, ownership, approval, escalation.

FAQ

Quick answers.

Part 01 (free) walks you through the breaking-change taxonomy, JSON Schema vs Avro tradeoffs, semantic versioning for schemas, and writing your first contract YAML with ownership. Most free tutorials show you a syntax cheat-sheet; this one builds the mental model you'll need across the next 5 parts.
This is a layered reference implementation — not a deployed distributed platform. There's no Docker Compose orchestrating the full stack, no benchmarked p95 latency under load, no live HA registry, and no real Slack/PagerDuty webhooks firing. Compatibility rule engine, multi-env registry sync, and approval workflows ship as scaffolds with the production patterns called out in the hardening section.
P11 narrows in on contract testing in CI — Buf, protobuf compatibility, breaking-change detection at PR-time. P23 (this project) is the full layered platform: registry as source of truth, runtime enforcement chain, CI gate, column-level lineage driving auto-block, and incident simulation. Pick P11 if you want CI depth on protobuf; pick P23 if you want the platform-architecture story end-to-end.
No — and the catalog used to advertise that, which was misleading. The registry you build here is FastAPI + PostgreSQL + Redis with canonical-hash content addressing — the same model Confluent's registry uses internally. Schemas are JSON Schema + Avro, not protobuf. The patterns transfer if you swap implementations.
No. Everything runs locally with Docker — FastAPI, PostgreSQL, Redis, Kafka. The patterns transfer cleanly to managed services with config changes only. The starter kit ships 1,050 sample events with seeded contract violations so you can run the enforcement chain end-to-end without external data.
All 15+ PRO projects, 4 code-review credits per month, 2 office-hours sessions, full curriculum across all 7 tracks (including the Event Design path that backs this project), all 5 career paths, certificate of completion, and full community access. Cancel anytime.
Yes. System-design rounds for senior+ DE and platform roles increasingly assume schema governance — registry design, breaking-change semantics, CI gate architecture, column-level lineage, and incident response. After this you can whiteboard all five and reference the runbook stages.

Ready to ship the governance layer?

Start with part 01 — free, no card. About 2-3 hours. By the end you'll have versioned schemas (JSON Schema + Avro), a contract YAML with ownership, and the breaking-change taxonomy that drives every part after.
