Build a schema governance platform from registry to incident runbook
FastAPI + PostgreSQL + Redis schema registry with canonical hashing. Runtime enforcement via API middleware + Great Expectations + dbt contracts. GitHub Actions PR gate with diff bot. Column-level lineage with NetworkX + OpenLineage driving auto-block. Chaos-injecting incident simulator with a 5-stage runbook — applied to one orders + user-events domain.
This is the platform-design question asked at Confluent, Spotify, GoCardless, Netflix and any company running schema-heavy data on Kafka or dbt.
- A FastAPI schema registry with PostgreSQL storage, Redis caching, and canonical-hash deduplication
- Runtime enforcement layered across API middleware, Great Expectations checkpoints, dbt v1.5+ contracts, and a Kafka consumer with DLQ
- A GitHub Actions PR gate: schema diff, breaking-change detection, and a Markdown PR-comment bot
- Column-level lineage built with NetworkX, ingested from OpenLineage facets, with severity-based auto-block
- A pytest-asyncio chaos test harness with a 5-stage incident runbook (Detect → Assess → Contain → Recover → Post-mortem) and a platform SLA spec
Schema incidents are among the most common causes of production data outages.
The patterns you ship in this project — registry as source of truth, PR-gated diffs, lineage-driven blast-radius checks — are the ones every platform team is now expected to operate.
Confluent Schema Registry is table stakes
Every Kafka shop runs one. Building your own teaches you the canonical-hash + content-addressable-version model that makes the real one debuggable.
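To make the canonical-hash idea concrete, here is a minimal sketch, assuming JSON-serializable schemas and an in-memory store. `canonical_hash` and `InMemoryRegistry` are hypothetical names, not the project's code, and the normalization shown (sorted keys, compact separators) is simpler than Avro's Parsing Canonical Form.

```python
import hashlib
import json


def canonical_hash(schema: dict) -> str:
    """Hash a schema so that key order and whitespace do not matter."""
    # Sorted keys and compact separators give one byte string per logical schema.
    # List order (e.g. the order of fields) is preserved, so it still affects the hash.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemoryRegistry:
    """Hypothetical stand-in for the PostgreSQL-backed store."""

    def __init__(self) -> None:
        self._versions = {}  # subject -> ordered list of content hashes

    def register(self, subject: str, schema: dict) -> int:
        """Return the 1-based version, reusing it when the content is identical."""
        digest = canonical_hash(schema)
        hashes = self._versions.setdefault(subject, [])
        if digest in hashes:
            return hashes.index(digest) + 1
        hashes.append(digest)
        return len(hashes)
```

Because the version is derived from content, re-registering a reformatted copy of the same schema returns the existing version instead of creating a new one, which is what makes duplicate registrations easy to trace.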
Shift-left for data contracts
Breaking changes need to fail in PR review, not at 3am when the dashboard goes blank. A diff-bot in CI is the difference between a 5-minute review and a 5-hour incident.
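A rough sketch of what such a diff step might look like, assuming schemas are flattened to a `{field_name: {"type": ..., "default": ...}}` mapping. The taxonomy here (removals, type changes, and new fields without a default count as breaking) is an assumption for illustration, as are the names `Change`, `diff_fields`, `render_comment`, and `gate`; the real rules depend on the compatibility mode you enforce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    field: str
    kind: str       # "removed", "type_changed", "added_required", "added_optional"
    breaking: bool


def diff_fields(old: dict, new: dict) -> list:
    """Compare two {field_name: {"type": ..., "default": ...}} mappings."""
    changes = []
    for name, spec in old.items():
        if name not in new:
            changes.append(Change(name, "removed", True))
        elif new[name]["type"] != spec["type"]:
            changes.append(Change(name, "type_changed", True))
    for name, spec in new.items():
        if name not in old:
            # Assumed rule: a new field is only safe when it carries a default.
            has_default = "default" in spec
            kind = "added_optional" if has_default else "added_required"
            changes.append(Change(name, kind, not has_default))
    return changes


def render_comment(changes: list) -> str:
    """Build the Markdown body a PR bot could post."""
    if not changes:
        return "No schema changes detected."
    rows = ["| Field | Change | Breaking |", "| --- | --- | --- |"]
    for c in changes:
        rows.append(f"| `{c.field}` | {c.kind} | {'yes' if c.breaking else 'no'} |")
    return "\n".join(rows)


def gate(changes: list) -> int:
    """Exit code for the CI step: non-zero fails the check."""
    return 1 if any(c.breaking for c in changes) else 0
```

In a CI job, the script would post `render_comment(...)` to the pull request and exit with `gate(...)`, so a breaking change fails the check while the reviewer still sees the full table.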
Lineage drives enforcement, not docs
Column-level lineage stopped being a wiki page and became a control plane. Removing a field auto-blocks if 3 dbt models depend on it.
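The auto-block rule can be sketched as a reachability query over the lineage graph. To stay dependency-free, this example uses a plain adjacency dict and a breadth-first search rather than NetworkX; with a `networkx.DiGraph`, `networkx.descendants` would play the same role as `downstream` here. The column names, the `LINEAGE` graph, and the threshold of 3 are illustrative assumptions.

```python
from collections import deque

# Edges point downstream: column -> columns that read it. All names are hypothetical.
LINEAGE = {
    "orders.customer_id": ["stg_orders.customer_id"],
    "stg_orders.customer_id": ["fct_orders.customer_id", "dim_customers.customer_id"],
    "fct_orders.customer_id": ["rpt_revenue.customer_id"],
    "orders.note": [],
}


def downstream(graph: dict, column: str) -> set:
    """Every node reachable from `column`, excluding the column itself."""
    seen = set()
    queue = deque([column])
    while queue:
        for child in graph.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    seen.discard(column)  # only matters if the graph contains a cycle
    return seen


def should_block(graph: dict, column: str, threshold: int = 3):
    """Block the change when at least `threshold` downstream models are affected."""
    models = {node.split(".")[0] for node in downstream(graph, column)}
    return len(models) >= threshold, sorted(models)
```

With this graph, removing `orders.customer_id` reaches four downstream models and is blocked, while removing `orders.note` reaches none and passes.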
Data mesh needs cross-team contracts
Domain teams own their schemas; platforms own the contract layer. The registry + PR gate + ownership YAML is the contract a platform team signs with every domain.
Part 01 is free. The rest unlocks with PRO.
Try the first 2-3 hours — define schemas as code, write your first contract with an ownership model, and learn the breaking-change taxonomy. If it clicks, upgrade to unlock the registry, enforcement, CI/CD, lineage, and incident-simulation parts.
Event Design & Data Contracts
This curriculum is the spine of the project. PRO subscribers get full access to every module — and the project applies what the curriculum teaches in working code.
Three sprints. Three checkpoints. One governance platform.
Each phase ends with a tagged commit and a working artifact. Registry first, then enforcement + CI gate, then lineage + incident simulation.
Schemas defined as code (JSON Schema + Avro). Contracts with ownership. FastAPI registry with PostgreSQL storage, Redis caching, and a Kafka-integrated producer SDK.
- ✓ Versioned .avsc + .json + contract YAML artifacts
- ✓ FastAPI registry API with canonical-hash storage
- ✓ Producer SDK + Kafka integration scaffold
Runtime enforcement chain across API + batch + streaming + dbt. GitHub Actions PR gate that blocks breaking changes and posts a Markdown impact comment. Multi-env promotion CLI.
- ✓ FastAPI middleware + GE checkpoint + dbt contract + Kafka DLQ
- ✓ GitHub Actions schema-check workflow + diff/gate/bot scripts
- ✓ promote.py CLI + env_config.yaml for dev → staging → prod
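The streaming leg of the enforcement chain boils down to "validate, then accept or dead-letter". Below is a minimal sketch of that routing, assuming a contract expressed as field name to Python type. `ORDER_CONTRACT`, `violations`, and `Router` are hypothetical names, and plain lists stand in for the Kafka topics so the example runs without a broker.

```python
from dataclasses import dataclass, field

# Hypothetical contract for an order event: field name -> required Python type.
ORDER_CONTRACT = {"order_id": str, "amount": float, "currency": str}


def violations(record: dict, contract: dict) -> list:
    """Return human-readable contract violations; empty means the record is valid."""
    problems = []
    for name, expected in contract.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    return problems


@dataclass
class Router:
    """Stand-in for a consumer loop; the lists play the role of Kafka topics."""
    accepted: list = field(default_factory=list)
    dead_letter: list = field(default_factory=list)

    def handle(self, record: dict) -> bool:
        errors = violations(record, ORDER_CONTRACT)
        if errors:
            # Keep the original payload plus the reasons so it can be replayed later.
            self.dead_letter.append({"payload": record, "errors": errors})
            return False
        self.accepted.append(record)
        return True
```

Storing the violation reasons alongside the payload is a design choice worth keeping in a real consumer: it lets whoever drains the DLQ tell a producer bug from a contract that needs a new version.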
Column-level lineage graph driving auto-block + notifications. Chaos-injected incident simulations. 5-stage runbook + SLA spec + the staff-level capstone design doc.
- ✓ NetworkX lineage graph + OpenLineage ingest + severity router
- ✓ pytest-asyncio chaos test harness + IncidentSimulator
- ✓ 5-stage runbook YAML + platform_sla.yaml + design doc
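One way to picture the 5-stage runbook in code is as an ordered state machine that an incident moves through one stage at a time. This is a sketch under that assumption; the `Stage` and `Incident` names are hypothetical and are not the project's `IncidentSimulator`.

```python
from enum import Enum


class Stage(Enum):
    DETECT = 1
    ASSESS = 2
    CONTAIN = 3
    RECOVER = 4
    POST_MORTEM = 5


class Incident:
    """Tracks the current runbook stage and only ever moves to the next one."""

    def __init__(self, incident_id: str) -> None:
        self.incident_id = incident_id
        self.stage = Stage.DETECT
        self.log = [Stage.DETECT.name]  # ordered history of stages entered

    def advance(self) -> Stage:
        """Move to the following stage; fail once the post-mortem is reached."""
        if self.stage is Stage.POST_MORTEM:
            raise RuntimeError("incident already closed out")
        self.stage = Stage(self.stage.value + 1)
        self.log.append(self.stage.name)
        return self.stage
```

A chaos test could inject a fault, call `advance()` as each stage's exit condition is met, and assert that the log ends at `POST_MORTEM`.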
One command. Local registry + Kafka + Postgres + Redis.
You get a real stack on day one — FastAPI registry, PostgreSQL for schema storage, Redis for cache, and Kafka for the producer/consumer examples used in parts 02–05.
What lives in the repo
Everything you need to stand up a schema-governance reference on your laptop, plus the seed schemas, sample contracts, and CI workflow used across parts 01–06.
- schemas/ — versioned .avsc + .json schemas (orders, user_events)
- contracts/ — contract YAML with quality rules + ownership
- schema_registry/ — FastAPI app: registry, compat, cache, SDK, CLI
- .github/workflows/ — schema-check.yml + diff/gate/bot/promote scripts
- src/lineage/ — NetworkX lineage + OpenLineage ingest + impact engine
- simulate/ — chaos injector + 5-stage runbook + SLA spec
Schema Governance Starter Kit
Pre-configured FastAPI registry, sample schemas, contract YAMLs, and the CI workflow scaffolds. Skip the boilerplate, start on part 01.
The same governance layer — but built for the 10x case.
Most schema-governance tutorials show you the POST /schemas. This one shows what changes when there are 50+ topics, 200+ dbt models, and the registry itself is in the critical path.
- nullable + default semantics
- ETag
- PUT /schemas/{id} + signed approval webhooks
- rollback.sh
- p95 < 50ms and 99.9% target
Real review from senior engineers who shipped this stack.
Submit your repo, get line-by-line feedback within 48 hours. The kind of review that's quietly worth thousands of dollars in time-to-staff.
4 reviews / month
Submit a repo, a PR, or a refactor proposal. Reviewer is matched to your domain — schema governance, Kafka, and dbt for this project. Async, comments inline, average turnaround 31 hours.
2 office hours / month
Live 30-min sessions with a senior data engineer. Architecture questions, whiteboard a registry rollout, mock a system-design round on data contracts. Group sessions also available.
One subscription. 15+ projects, all curriculum, code review.
PRO is built for senior+ engineers who want production-pattern builds and feedback loops — not more tutorials.
Pick this if you’re operating a streaming + dbt stack, not learning to.
Senior data engineers
You ship Kafka producers and dbt models. You've debugged a schema mismatch at 2am and want the registry + PR gate that keeps it from happening again.
Staff / tech leads
You're driving the data-contracts initiative. You need to understand the failure modes, the rollout sequence, and what 'governance' actually means in code before signing off.
Platform engineers
You run the registry and the lineage graph for 10+ teams. You need to see how column-level lineage drives auto-block — and what the runbook looks like when it doesn't.
Data architects
You design the contract layer between domains. This is the contract a platform team signs with every domain team — registry, ownership, approval, escalation.
Going deeper? Three tracks back this project.
Event design is the spine. These three curriculums let you go deeper on the layers around it — Kafka internals, scheduling, and the dbt model layer the contracts gate.
Quick answers.
Ready to ship the governance layer?
Start with part 01 — free, no card. About 2-3 hours. By the end you'll have versioned schemas (JSON Schema + Avro), a contract YAML with ownership, and the breaking-change taxonomy that drives every part after.