Build a production-grade data-contract platform on ODCS + Schema Registry
ODCS v2.2 contracts with semantic versioning, dual validation (Great Expectations + Soda), Avro + Confluent Schema Registry with BACKWARD/FORWARD/FULL compatibility enforced in GitHub Actions, PII detection feeding a 4-tier classification model, RBAC policy-as-code with row-level security, an append-only audit log with integrity hashing, and SOC2 + GDPR check engines wired to a governance bot — all on a payments + risk-assessments domain across 3 teams.
This is the platform-design question asked at Stripe, Airbnb, Spotify, GoCardless and any company running shared data across producer/consumer teams under SOC2 or GDPR.
- Two ODCS v2.2 YAML contracts (payments_events, risk_assessments) with semantic versioning and a contract registry
- Dual validation pipeline: Great Expectations expectations + Soda checks producing a single PASS/WARN/FAIL gate decision
- Avro + Confluent Schema Registry with BACKWARD/FORWARD/FULL compatibility, plus a CDC schema-evolution handler with a dead-letter queue
- GitHub Actions PR gate that blocks breaking changes before merge — same pattern Confluent and Spotify run
- RBAC policy-as-code (4 roles × 4 sensitivity tiers) with row-level filtering, plus an append-only audit log with cryptographic integrity hashing
- SOC2 (CC6.1/CC7.2/CC7.3/CC6.5) + GDPR (Art 5/7/15/17) check engines, a governance bot that reviews PRs, and a written governance charter with a domain-ownership map
Schema breaks and PII leaks are the top two incidents platform teams have to guard against in 2026.
The patterns you wire here — contracts as code, dual validation, compatibility-gated CI, classification-driven RBAC, hashed audit logs, compliance bots — are what every senior data-engineering rubric now checks for.
ODCS is the emerging contract standard
Bitol's Open Data Contract Standard (v2.2 in this project) is the format Spotify, Airbnb, and GoCardless converged on. Building one teaches you what the spec actually solves.
Compatibility-gated CI is table stakes
Confluent Schema Registry with BACKWARD compatibility blocks breaking changes at the registry, not at 3am. The CI gate is the difference between a 5-min PR review and a 5-hour incident.
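To make the BACKWARD rule concrete, here is a minimal sketch of the check in plain Python. It is not the Schema Registry's implementation: it covers only flat record schemas, treats every type change as breaking (real Avro allows some promotions), and the `PaymentEvent` schema and its field names are hypothetical.

```python
from typing import Any


def backward_breaks(old: dict[str, Any], new: dict[str, Any]) -> list[str]:
    """Reasons the new (reader) schema cannot read data written with the old one."""
    old_fields = {f["name"]: f for f in old["fields"]}
    problems = []
    for field in new["fields"]:
        name = field["name"]
        if name not in old_fields:
            # Old records lack this field, so the reader needs a default to fill in.
            if "default" not in field:
                problems.append(f"added field '{name}' has no default")
        elif field["type"] != old_fields[name]["type"]:
            # Simplification: any type change counts as breaking.
            problems.append(f"field '{name}' changed type")
    # Fields removed from the new schema are fine under BACKWARD: the reader ignores them.
    return problems


# Hypothetical schemas for illustration only.
old_schema = {"type": "record", "name": "PaymentEvent", "fields": [
    {"name": "payment_id", "type": "string"},
    {"name": "amount", "type": "double"},
]}
ok_schema = {"type": "record", "name": "PaymentEvent", "fields": [
    {"name": "payment_id", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "currency", "type": "string", "default": "USD"},
]}
bad_schema = {"type": "record", "name": "PaymentEvent", "fields": [
    {"name": "payment_id", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "currency", "type": "string"},
]}

print(backward_breaks(old_schema, ok_schema))
print(backward_breaks(old_schema, bad_schema))
```

A CI job could run a check like this on every changed schema and fail the PR when the list is non-empty, with the registry's own compatibility check remaining the authority.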
Classification drives access, not docs
A column tagged RESTRICTED is automatically denied to the analyst role — without a Jira ticket, without a wiki page. Policy-as-code is what shifts governance left.
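A rough sketch of how a tier tag can drive access without a ticket is shown below. Only the RESTRICTED tier and the analyst role come from the text above; the other tier names, the other roles, the role ceilings, and the column tags are assumptions made for illustration.

```python
from enum import IntEnum


class Tier(IntEnum):
    # Only RESTRICTED is named in the text; the other three names are assumed.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Highest tier each role may read (hypothetical roles and ceilings).
MAX_TIER = {
    "analyst": Tier.INTERNAL,
    "data_engineer": Tier.CONFIDENTIAL,
    "risk_officer": Tier.RESTRICTED,
    "auditor": Tier.RESTRICTED,
}

# Hypothetical column tags, as a classifier might emit them.
COLUMN_TIERS = {
    "payment_id": Tier.INTERNAL,
    "amount": Tier.INTERNAL,
    "card_number": Tier.RESTRICTED,
}


def allowed_columns(role: str, column_tiers: dict[str, Tier]) -> list[str]:
    """Columns the role may read; unknown roles are denied everything."""
    ceiling = MAX_TIER.get(role)
    if ceiling is None:
        return []
    return [col for col, tier in column_tiers.items() if tier <= ceiling]


print(allowed_columns("analyst", COLUMN_TIERS))
print(allowed_columns("risk_officer", COLUMN_TIERS))
```

Because the decision is computed from the tag, retagging a column changes who can read it on the next evaluation, with no separate approval document to update.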
SOC2 + GDPR are no longer optional
Auditors now ask for evidence, not promises. A hashed audit log + automated CC6.1 / Art 17 checks is the evidence they want to see in your Type II report.
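One common way to make an append-only log tamper-evident is to chain each entry's hash to the previous one. The sketch below assumes that design; the project's AuditLogger may hash differently, and the event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys) so the same event always hashes the same way.
    payload = json.dumps(event, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list[dict], event: dict) -> None:
    prev = log[-1]["integrity_hash"] if log else GENESIS
    log.append({
        "event": event,
        "prev_hash": prev,
        "integrity_hash": _digest(prev, event),
    })


def verify(log: list[dict]) -> bool:
    """Recompute every hash in order; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev:
            return False
        if entry["integrity_hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["integrity_hash"]
    return True


audit_log: list[dict] = []
append_entry(audit_log, {"actor": "analyst_1", "action": "read", "column": "amount"})
append_entry(audit_log, {"actor": "analyst_1", "action": "denied", "column": "card_number"})
print(verify(audit_log))
```

Editing any stored event afterwards makes `verify` return `False`, which is the property an auditor can test for themselves.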
Module 01 is free. The rest unlocks with PRO.
Try the first 3-4 hours — author your first ODCS contract, wire the Great Expectations + Soda dual gate, and watch the drift detector catch a renamed column. If it clicks, upgrade to unlock enforcement, classification, and compliance modules.
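For a sense of what catching a renamed column involves, here is a small sketch of a drift check. It assumes the contract and the observed data can each be reduced to a column-to-type mapping; the rename heuristic, the HIGH/LOW severity labels, and the column names are illustrative assumptions, not the project's SchemaDriftDetector.

```python
def detect_drift(expected: dict[str, str], observed: dict[str, str]) -> list[dict]:
    """Compare contract columns with observed columns and label each difference."""
    missing = [c for c in expected if c not in observed]
    extra = [c for c in observed if c not in expected]
    findings = []
    for col in missing:
        # Heuristic: a missing column plus a new column of the same type looks like a rename.
        match = next((e for e in extra if observed[e] == expected[col]), None)
        if match is not None:
            extra.remove(match)
            findings.append({"kind": "possible_rename", "column": col,
                             "new_name": match, "severity": "HIGH"})
        else:
            findings.append({"kind": "removed", "column": col, "severity": "HIGH"})
    for col in extra:
        # New columns do not break existing readers, so they rank low.
        findings.append({"kind": "added", "column": col, "severity": "LOW"})
    for col, col_type in expected.items():
        if col in observed and observed[col] != col_type:
            findings.append({"kind": "type_changed", "column": col, "severity": "HIGH"})
    return findings


# Hypothetical contract vs. observed file header.
contract_columns = {"payment_id": "string", "amount": "double"}
observed_columns = {"payment_id": "string", "amount_usd": "double", "channel": "string"}

for finding in detect_drift(contract_columns, observed_columns):
    print(finding)
```

The HIGH findings are the ones that would plausibly feed an alert, while LOW findings can be logged for review.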
Governance & Data Contracts
This curriculum is the foundation for the project — not a sales add-on. PRO subscribers get full access to every module.
Three sprints. Three checkpoints. One governance platform.
Each phase ships runnable artifacts, not slides. Tagged commits at every checkpoint.
Two ODCS contracts with semantic versioning. ContractRegistry, VersionManager, MigrationManager. GE + Soda dual gate emitting a single PASS/WARN/FAIL decision. SchemaDriftDetector with severity classification.
- ✓ contracts/payments_events.yaml + risk_assessments.yaml (ODCS v2.2)
- ✓ ContractRegistry + SemanticVersion + ChangeType bump recommender
- ✓ PipelineValidator (GE + Soda) → GateDecision + DriftAlerter to Slack
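The single PASS/WARN/FAIL decision can be thought of as "worst result wins" across both validators. The sketch below assumes that rule and takes plain values as input (an overall success flag from Great Expectations and a list of per-check outcomes from Soda); it calls neither library, and the `combine` function is hypothetical.

```python
from enum import IntEnum


class GateDecision(IntEnum):
    # Ordered so that max() picks the most severe outcome.
    PASS = 0
    WARN = 1
    FAIL = 2


def combine(ge_success: bool, soda_outcomes: list[str]) -> GateDecision:
    """Merge both validators into one decision.

    ge_success: overall Great Expectations result, already extracted by the caller.
    soda_outcomes: per-check outcomes as 'pass', 'warn', or 'fail'.
    """
    decision = GateDecision.PASS if ge_success else GateDecision.FAIL
    for outcome in soda_outcomes:
        decision = max(decision, GateDecision[outcome.upper()])
    return decision


print(combine(True, ["pass", "pass"]).name)
print(combine(True, ["pass", "warn"]).name)
print(combine(False, ["pass"]).name)
```

Keeping the merge rule in one small function makes it easy to change the policy later, for example treating a Soda warning on a RESTRICTED column as a failure.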
Confluent Schema Registry stack via docker-compose. Avro evolution with BACKWARD/FORWARD/FULL compatibility. GitHub Actions PR gate. Blast-radius analysis across Risk + Analytics consumers. CDC handler with DLQ for unsafe transformations.
- ✓ docker-compose.yml + RegistryClient + AvroSchemaEvolution
- ✓ .github/workflows/contract-check.yml (PR gate)
- ✓ BreakingChangeSimulator + ConsumerImpactAnalyzer + CDCSchemaEvolutionHandler
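The dead-letter-queue idea in the CDC handler can be sketched as follows: apply a change event only when every value either already has the target type or can be converted without loss, and park everything else with a reason. The cast table, the in-memory `sink` and `dlq` lists, and the column names are assumptions for illustration, not the project's CDCSchemaEvolutionHandler.

```python
from typing import Any, Callable

# Hypothetical table of conversions treated as lossless.
SAFE_CASTS: dict[tuple[type, type], Callable[[Any], Any]] = {
    (int, float): float,
    (int, str): str,
}


def handle_event(event: dict[str, Any], target_types: dict[str, type],
                 sink: list[dict], dlq: list[dict]) -> None:
    """Write the event to the sink, or route it to the DLQ with a reason."""
    converted = {}
    for column, value in event.items():
        target = target_types.get(column)
        if target is None:
            dlq.append({"event": event, "reason": f"unknown column '{column}'"})
            return
        if isinstance(value, target):
            converted[column] = value
            continue
        cast = SAFE_CASTS.get((type(value), target))
        if cast is None:
            reason = (f"unsafe cast for '{column}': "
                      f"{type(value).__name__} -> {target.__name__}")
            dlq.append({"event": event, "reason": reason})
            return
        converted[column] = cast(value)
    sink.append(converted)


sink: list[dict] = []
dlq: list[dict] = []
targets = {"payment_id": str, "amount": float}
handle_event({"payment_id": "p1", "amount": 10}, targets, sink, dlq)
handle_event({"payment_id": "p2", "amount": "ten"}, targets, sink, dlq)
print(sink)
print(dlq)
```

Nothing is dropped silently: the failed event stays intact in the DLQ, so it can be replayed once the schema or the cast table is fixed.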
4-tier PII classification with RBAC + row-level security. Hashed audit log. SOC2 + GDPR check engines (8 controls). Governance bot reviewing PRs. Auto-remediation router. Slack Block Kit notifier. Grafana KPI dashboard. Written governance charter + domain-ownership map.
- ✓ PIIDetector + DataClassifier + ColumnLineageGraph + BlastRadiusCalculator
- ✓ PolicyEngine + RowSecurityEngine + AuditLogger (with integrity hash)
- ✓ UnifiedComplianceEngine + GovernanceBot + AutoRemediationEngine + governance_charter.md
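As a rough illustration of detection feeding classification, the sketch below samples a column, matches values against two simple patterns, and maps the detected PII type to a tier. The patterns are deliberately naive, and the 0.8 threshold, the tier names other than RESTRICTED, and the PII-to-tier mapping are all assumptions rather than the project's PIIDetector or DataClassifier.

```python
import re
from typing import Optional

# Naive illustrative patterns; a real detector would need far more care.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "card_number": re.compile(r"^\d{13,19}$"),
}

# Hypothetical mapping from PII type to sensitivity tier.
TIER_FOR_PII = {"email": "CONFIDENTIAL", "card_number": "RESTRICTED"}
DEFAULT_TIER = "INTERNAL"  # assumed tier for columns with no detected PII


def classify_column(samples: list[str], threshold: float = 0.8) -> tuple[Optional[str], str]:
    """Return (pii_type, tier) based on the share of sample values matching a pattern."""
    values = [s.strip() for s in samples if s and s.strip()]
    if not values:
        return None, DEFAULT_TIER
    for pii_type, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            return pii_type, TIER_FOR_PII[pii_type]
    return None, DEFAULT_TIER


print(classify_column(["a@example.com", "b@example.org", ""]))
print(classify_column(["4111111111111111", "5500000000000004"]))
print(classify_column(["completed", "pending"]))
```

The tier returned here is the kind of tag a policy engine can then evaluate per role, which is what keeps classification from being documentation only.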
One starter kit. 67 pre-built files. Sample data with intentional quality issues.
The starter kit ships every module wired and importable — ODCS contracts, Python modules for registry / validation / drift / classification / compliance, an Avro schema, GitHub Actions workflow, docker-compose stack, and synthetic CSVs with planted quality bugs so the validators have something to fail.
What lives in the repo
Everything you need to run the four modules locally — including the Confluent Platform docker stack, the GitHub Actions workflow, and the sample datasets that exercise drift, PII, and compatibility paths.
- contracts/ — 2 ODCS v2.2 YAML contracts (payments_events, risk_assessments)
- src/ — ContractRegistry, VersionManager, drift_detector, break_simulator, impact_analyzer, compliance_engine, governance_bot
- governance/ — PIIDetector, DataClassifier, LineageGraph, RBAC PolicyEngine, AuditLogger, policies as YAML
- validation/ — Great Expectations suite + SodaCL checks + PipelineValidator
- schemas/ + docker/ — Avro schema + docker-compose for Confluent Platform 7.5
- .github/workflows/ — contract-check.yml: schema discovery, compatibility check, GE validation on every PR
Data Governance & Contracts Starter Kit
Pre-built repo with all 4 modules wired — 2 ODCS contracts, the Python registry + validators + drift detector + PII detector + RBAC engine + compliance checkers, the Avro schema, the GitHub Actions workflow, the docker-compose stack, and 4 synthetic CSVs with intentional quality bugs.
The same governance — but built for the cross-team case.
Most governance tutorials show you a great_expectations.yml in isolation. This one shows what changes when three teams share the same contract and an auditor wants evidence — not promises.
- contracts/*.yaml ODCS v2.2 + ContractRegistry: the registry is the spec, not the README
- BACKWARD compat enforced at Schema Registry + GH Actions PR gate: caught at git push, not at 3am
- PIIScanner + 4-tier DataClassifier + tier-aware policy: evaluated, not documented
- PolicyEngine + RowSecurityEngine from YAML: policy-as-code, evaluated per query
- AuditLogger append-only with cryptographic integrity_hash: tamper-evident
- UnifiedComplianceEngine emits timestamped JSON for CC6.1/CC7.2/CC7.3/CC6.5 + Art 5/7/15/17 on every run
Real review from senior engineers who shipped this stack.
Submit your repo, get line-by-line feedback within 48 hours. The kind of review that's quietly worth thousands of dollars in time-to-staff.
4 reviews / month
Submit a repo, a PR, or a contract design proposal. Reviewer is matched to your domain — governance / contracts / compliance for this project. Async, comments inline, average turnaround 31 hours.
2 office hours / month
Live 30-min sessions with a senior platform engineer. Walk a tricky contract design, mock a SOC2 readiness review, whiteboard an incident-runbook with policy-as-code. Group sessions also available.
One subscription. 15+ projects, all curriculum, code review.
PRO is built for engineers who want production-grade builds and feedback loops — not more tutorials.
Pick this if you're the engineer both the auditor and the producer team end up asking.
Platform / data-platform engineers
You own the contract layer between domain teams. This gives you the registry, PR gate, lineage, and compliance bot — the four levers a platform team actually pulls.
Senior data engineers prepping interviews
Cross-team contracts and compliance show up in every senior+ system-design round. After this you can defend a contract architecture with evidence instead of hand-waving.
Compliance / governance engineers
You're trying to get out of spreadsheets. Module 04 wires SOC2 (CC6.1/CC7.2/CC7.3/CC6.5) and GDPR (Art 5/7/15/17) into automated checks — the evidence Type II reviewers actually want.
Tech leads scaling data teams
Three teams, one shared dataset, no contract → outage. The governance charter + domain-ownership YAML in Module 04 is the operating model you can adopt verbatim.
Going deeper? Four tracks back this project.
Governance is the spine. These four curriculums let you go deeper on the layers that show up in modules 02-04 — modeling for the contract shape, observability for the gate signals, dbt for the producer side, and cloud-fundamentals for the production deploy.
Ready to build a real governance platform?
Start with module 01 — free, no card. About 3-4 hours. By the end you'll have two ODCS contracts authored, the GE + Soda dual gate emitting decisions, and the drift detector catching a renamed column with a Slack alert.