Data Governance · ~15 hours
Enterprise Project — Implement Data Contracts Across 3 Teams

Enterprise Data Governance & Contracts

Implement data contracts between software engineers and data engineers to prevent upstream schema changes from breaking downstream pipelines.

CONTRACTS: Schema Definitions
VALIDATION: Quality Gates
ENFORCEMENT: CI/CD + Registry
COMPLIANCE: PII + RBAC + Audit
DataFlow Corp Governance: Progressive Build

  • Part 1: Schema Contracts & Validation (6/24 items)
  • Part 2: Cross-Team Contract Enforcement (12/24 items)
  • Part 3: PII Classification & Access Control (18/24 items)
  • Part 4: Enterprise Governance & Compliance (LIVE Governance Platform)

What You'll Build

A complete data governance platform for DataFlow Corp, a fintech company with 3 data teams (Payments, Risk, Analytics) that need contract enforcement.

Schema Contracts

YAML-based data contracts using the Open Data Contract Standard with semantic versioning and validation rules

ODCS Spec
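The semantic-versioning rule behind contract versions can be expressed in a few lines. This is a minimal sketch, assuming each contract revision has already been classified into hypothetical `ChangeKind` categories; it is not part of the ODCS spec itself.

```python
from enum import Enum


class ChangeKind(Enum):
    """Hypothetical categories for a contract revision."""
    BREAKING = "breaking"   # e.g. column removed or type changed
    ADDITIVE = "additive"   # e.g. new optional column
    COSMETIC = "cosmetic"   # e.g. description or tag edits only


def next_version(current: str, changes: list[ChangeKind]) -> str:
    """Return the next MAJOR.MINOR.PATCH version for a contract revision."""
    major, minor, patch = (int(part) for part in current.split("."))
    if ChangeKind.BREAKING in changes:
        return f"{major + 1}.0.0"        # consumers must opt in to the new major
    if ChangeKind.ADDITIVE in changes:
        return f"{major}.{minor + 1}.0"  # safe for existing consumers
    return f"{major}.{minor}.{patch + 1}"
```

With this rule, `next_version("1.4.2", [ChangeKind.ADDITIVE])` yields `"1.5.0"`, and any breaking change forces a major bump regardless of what else changed.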

Breaking Change Detection

Simulate and catch destructive schema changes before they reach production with automated compatibility checks

94% fewer incidents
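A compatibility check of this kind boils down to diffing two contract versions from the consumer's point of view. The sketch below assumes a simplified, hypothetical `Column` model (type plus required flag) and treats removed columns, type changes, and required-to-nullable changes as breaking, while new columns are additive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """One column in a contract's schema (simplified, hypothetical model)."""
    type: str
    required: bool = True


def breaking_changes(old: dict[str, Column], new: dict[str, Column]) -> list[str]:
    """List changes in `new` that could break consumers built against `old`."""
    problems = []
    for name, old_col in old.items():
        new_col = new.get(name)
        if new_col is None:
            problems.append(f"{name}: column removed")
            continue
        if new_col.type != old_col.type:
            problems.append(f"{name}: type changed {old_col.type} -> {new_col.type}")
        if old_col.required and not new_col.required:
            problems.append(f"{name}: now nullable")
    # Columns present only in `new` are additive and are not reported.
    return problems
```

For example, changing a payments contract's `amount` from `decimal` to `float` and dropping `currency` returns two findings, which a CI job could use to block the merge.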

PII Detection & Classification

Scan columns for sensitive data patterns, classify by sensitivity tier, and enforce data handling policies

4-tier system
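A first-pass scanner can combine column-name hints with pattern matches on sampled values. This sketch assumes a hypothetical four-tier scheme (`public`, `internal`, `confidential`, `restricted`) and deliberately simple regexes; real detectors typically add checksums (such as Luhn for card numbers) and NLP-based entity recognition.

```python
import re

# Hypothetical 4-tier scheme, ordered from least to most sensitive.
TIERS = ["public", "internal", "confidential", "restricted"]

# Illustrative patterns only: no Luhn check, so long numeric IDs will false-positive.
VALUE_PATTERNS = {
    "email": (re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"), "confidential"),
    "us_ssn": (re.compile(r"^\d{3}-\d{2}-\d{4}$"), "restricted"),
    "card_number": (re.compile(r"^\d{13,19}$"), "restricted"),
}
NAME_HINTS = {"email": "confidential", "phone": "confidential", "ssn": "restricted", "dob": "confidential"}


def classify_column(name, samples, min_match_ratio=0.8):
    """Return (tier, detected_labels) for one column from its name and sample values."""
    tier = "internal"  # assumed default for unlabeled business data
    labels = []
    for hint, hint_tier in NAME_HINTS.items():
        if hint in name.lower():
            labels.append(f"name:{hint}")
            tier = max(tier, hint_tier, key=TIERS.index)
    values = [str(v) for v in samples if v is not None]
    for label, (pattern, pattern_tier) in VALUE_PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        # Require most sampled values to match before flagging the column.
        if values and hits / len(values) >= min_match_ratio:
            labels.append(f"value:{label}")
            tier = max(tier, pattern_tier, key=TIERS.index)
    return tier, labels
```

The column's tier is always the most sensitive signal found, so a column named `tax_id` whose values look like SSNs is still classified as `restricted`.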

RBAC & Policy-as-Code

Role-based access control with policies defined in code, not spreadsheets. Automated enforcement across all datasets

Zero manual gates
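Policy-as-code means access rules live in version-controlled code and are evaluated automatically, with anything not explicitly allowed denied. A minimal sketch, assuming hypothetical role names and the same four sensitivity tiers used for classification:

```python
from dataclasses import dataclass

TIERS = ["public", "internal", "confidential", "restricted"]


@dataclass(frozen=True)
class Policy:
    """A single allow rule: `role` may perform `action` on datasets up to `max_tier`."""
    role: str
    action: str
    max_tier: str


# Hypothetical policies for DataFlow Corp's teams, reviewed and versioned like code.
POLICIES = [
    Policy("analytics_reader", "read", "internal"),
    Policy("risk_analyst", "read", "confidential"),
    Policy("payments_engineer", "read", "restricted"),
    Policy("payments_engineer", "write", "restricted"),
]


def is_allowed(roles, action, dataset_tier, policies=POLICIES):
    """Default-deny check: allow only if a policy for one of the roles covers the request."""
    for p in policies:
        if (
            p.role in roles
            and p.action == action
            and TIERS.index(dataset_tier) <= TIERS.index(p.max_tier)
        ):
            return True
    return False
```

Because the rules are plain data, a policy change is a reviewed pull request rather than an edit to a spreadsheet.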

Column-Level Lineage

Track data flow at the column level using OpenLineage to understand the blast radius of any schema change

OpenLineage
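Once column-to-column edges have been extracted (for example from OpenLineage run events), the blast radius of a change is the set of columns reachable downstream. This sketch assumes the edges are already flattened into a plain dict keyed by hypothetical `"dataset.column"` names and walks them breadth-first.

```python
from collections import deque


def blast_radius(edges, changed):
    """Return every downstream column reachable from `changed`.

    `edges` maps "dataset.column" to the list of columns derived from it.
    """
    impacted = set()
    queue = deque([changed])
    while queue:
        column = queue.popleft()
        for child in edges.get(column, []):
            if child not in impacted:  # visited set also guards against cycles
                impacted.add(child)
                queue.append(child)
    return impacted
```

For a change to `payments_events.amount`, the result lists every Risk and Analytics column that would need review before the change ships.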

SOC2 & GDPR Compliance

Automated compliance checks for audit logging, encryption, right-to-deletion, and data subject access requests

Audit-ready
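One common way to make audit logs tamper-evident is to hash-chain the entries, so each record commits to the one before it. The sketch below is a simplified illustration of that idea, not a specific SOC2 requirement; in practice the latest hash would also be anchored somewhere the log writer cannot modify.

```python
import hashlib
import json
from datetime import datetime, timezone


class AuditLog:
    """Append-only, hash-chained audit log: editing any past entry breaks verification."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def append(self, actor, action, resource, timestamp=None):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        record = {
            "ts": timestamp or datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "resource": resource,
            "prev_hash": prev_hash,
        }
        record["hash"] = self._digest(record)
        self.entries.append(record)
        return record

    @staticmethod
    def _digest(record):
        # Hash a canonical JSON form of every field except the hash itself.
        body = {k: v for k, v in record.items() if k != "hash"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def verify(self):
        prev_hash = self.GENESIS
        for record in self.entries:
            if record["prev_hash"] != prev_hash or record["hash"] != self._digest(record):
                return False
            prev_hash = record["hash"]
        return True
```

An auditor only needs `verify()` and the trusted latest hash to confirm that no access or deletion record was silently altered.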

Progressive Build Path

Each part builds on the previous. You'll go from defining contracts to running a production governance platform.

Part 1 · 3–4 hours

Foundation — Schema Contracts & Validation Framework

Define YAML data contracts using the ODCS spec, build a schema validation framework with Great Expectations and Soda, and implement automated drift detection across DataFlow Corp’s three team boundaries.

  • YAML data contracts (ODCS format)
  • Schema versioning with semantic rules
  • Great Expectations validation suite
  • Soda quality checks pipeline
  • Plus 2 more
6/6 items complete — Validation framework ready
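Great Expectations and Soda express checks in their own formats, but the underlying idea of a quality gate is simple: validate each record against the contract's rules and fail the batch if too many records violate them. This plain-Python stand-in assumes a hypothetical slice of a `payments_events` contract; it does not use either library's API.

```python
from decimal import Decimal

# Hypothetical slice of a payments_events contract: column -> rules.
CONTRACT = {
    "payment_id": {"type": str, "required": True},
    "amount": {"type": Decimal, "required": True, "min": Decimal("0")},
    "currency": {"type": str, "required": True, "allowed": {"USD", "EUR", "GBP"}},
    "merchant_id": {"type": str, "required": False},
}


def validate_row(row, contract=CONTRACT):
    """Return a list of violations for one record; an empty list means it passes."""
    errors = []
    for column, rules in contract.items():
        value = row.get(column)
        if value is None:
            if rules["required"]:
                errors.append(f"{column}: missing")
            continue
        if not isinstance(value, rules["type"]):
            errors.append(f"{column}: expected {rules['type'].__name__}")
            continue
        if "min" in rules and value < rules["min"]:
            errors.append(f"{column}: below {rules['min']}")
        if "allowed" in rules and value not in rules["allowed"]:
            errors.append(f"{column}: {value!r} not allowed")
    # Undeclared columns signal schema drift from the producer.
    errors += [f"{column}: not in contract" for column in row if column not in contract]
    return errors


def quality_gate(rows, max_failure_rate=0.0):
    """Pass a batch only if its share of failing rows is within the threshold."""
    if not rows:
        return False  # treat an empty batch as a failed gate
    failures = sum(1 for row in rows if validate_row(row))
    return failures / len(rows) <= max_failure_rate
```

Flagging undeclared columns doubles as lightweight drift detection: a producer that starts emitting a new field fails the gate until the contract is updated.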
Part 2 · 4–5 hours

Enforcement — Breaking Changes & Cross-Team Contracts

Simulate breaking schema changes across producer-consumer boundaries, enforce backward compatibility with Avro and the Confluent Schema Registry, and build CI/CD contract validation in GitHub Actions.

  • Breaking change simulation engine
  • Avro schema evolution rules
  • Schema Registry compatibility checks
  • CI/CD contract validation pipeline
  • Plus 2 more
12/12 items complete — Cross-team enforcement live
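The Schema Registry enforces compatibility server-side, but the BACKWARD rule is worth understanding on its own: consumers using the new schema must be able to read data written with the old one. Per Avro's schema-resolution rules, that means added fields need defaults and changed types must be legal promotions. This local pre-check is a simplified sketch covering top-level record fields only; unions and nested records must match exactly.

```python
import json

# Avro's allowed writer -> reader promotions for primitive types.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def backward_compatible(old_schema_json, new_schema_json):
    """List reasons consumers on the new schema could not read old data (empty = compatible)."""
    writer = {f["name"]: f for f in json.loads(old_schema_json)["fields"]}
    reader = {f["name"]: f for f in json.loads(new_schema_json)["fields"]}
    problems = []
    for name, field in reader.items():
        if name not in writer:
            # The reader fills missing fields from the default, so one is required.
            if "default" not in field:
                problems.append(f"{name}: added without a default")
            continue
        w_type, r_type = writer[name]["type"], field["type"]
        promotable = (
            isinstance(w_type, str)
            and isinstance(r_type, str)
            and r_type in PROMOTIONS.get(w_type, set())
        )
        if w_type != r_type and not promotable:
            problems.append(f"{name}: {w_type} cannot be read as {r_type}")
    # Fields only in the old schema are ignored by the reader, so deletions pass.
    return problems
```

Running a check like this in a pull request surfaces the same class of failure the registry would reject at registration time, before anyone deploys.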
Part 3 · 3–4 hours

Classification — PII Tagging & Access Control

Build a PII detection pipeline that scans column metadata and content patterns, implement sensitivity classification tiers, and enforce role-based access control with policy-as-code.

  • PII detection engine (regex + NLP)
  • Data classification framework (4 tiers)
  • Column-level lineage graph
  • RBAC policy engine
  • Plus 2 more
18/18 items complete — PII governance operational
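Classification tiers pay off when they drive column-level handling. This sketch assumes hypothetical tier assignments and a hypothetical masking key: columns within the caller's clearance pass through, `confidential` columns are replaced by a keyed hash so they stay joinable, and anything more sensitive (or unclassified) is nulled.

```python
import hashlib
import hmac

TIERS = ["public", "internal", "confidential", "restricted"]

# Hypothetical column classifications, e.g. produced by a PII scan.
COLUMN_TIERS = {"customer_id": "internal", "country": "public", "email": "confidential", "ssn": "restricted"}

# Hypothetical secret; in practice load it from a secrets manager, never from code.
MASKING_KEY = b"hypothetical-key-load-from-secret-store"


def mask_row(row, clearance, column_tiers=COLUMN_TIERS):
    """Return a copy of `row` with every column above the caller's clearance masked."""
    limit = TIERS.index(clearance)
    masked = {}
    for column, value in row.items():
        tier = column_tiers.get(column, "restricted")  # unclassified columns fail closed
        if TIERS.index(tier) <= limit:
            masked[column] = value
        elif tier == "confidential":
            # Keyed hash: deterministic (joinable) but not reversible by dictionary lookup.
            digest = hmac.new(MASKING_KEY, str(value).encode(), hashlib.sha256)
            masked[column] = digest.hexdigest()[:12]
        else:
            masked[column] = None
    return masked
```

Using HMAC instead of a bare hash matters for low-entropy values like emails, which could otherwise be recovered by hashing candidate addresses.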
Part 4 · 3–4 hours

Production — Enterprise Governance & Compliance

Deploy the full governance platform with SOC2/GDPR compliance checks, build a monitoring dashboard, implement governance automation, and ship a production-ready enterprise governance charter.

  • SOC2 & GDPR compliance engine
  • Governance monitoring dashboard
  • Automated policy enforcement bot
  • Enterprise governance charter
  • Plus 2 more
LIVE governance platform deployed
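An automated compliance engine can start as a set of small checks run against catalog metadata, with failures feeding the monitoring dashboard. The `DatasetMeta` fields and the four checks below are hypothetical examples of the kind of controls SOC2 and GDPR reviews ask about (ownership, encryption, retention, deletion), not an authoritative control list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetMeta:
    """Minimal, hypothetical catalog metadata for one dataset."""
    name: str
    owner: Optional[str]
    contains_pii: bool
    encrypted_at_rest: bool
    retention_days: Optional[int]
    tags: set[str] = field(default_factory=set)


# Each check returns None when it passes, or a human-readable finding.
CHECKS = {
    "has_owner": lambda d: None if d.owner else "no accountable owner",
    "encryption": lambda d: None if d.encrypted_at_rest else "not encrypted at rest",
    "retention": lambda d: None if d.retention_days is not None else "no retention period",
    "pii_erasure": lambda d: (
        None if not d.contains_pii or "erasure_supported" in d.tags
        else "PII without a right-to-deletion path"
    ),
}


def compliance_report(datasets):
    """Run every check on every dataset and return {dataset: [findings]} for failures only."""
    report = {}
    for d in datasets:
        findings = [f"{check}: {msg}" for check, fn in CHECKS.items() if (msg := fn(d))]
        if findings:
            report[d.name] = findings
    return report
```

Keeping checks as a registry of small functions lets teams add controls through code review, and a bot can open tickets for each finding instead of relying on manual audits.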
Total Time: ~15 hours

Download Sample Data

DataFlow Corp's data: 410K+ records across 4 datasets

  • payments_events.csv (250K records · 35 MB): Payment transaction events from the Payments team
  • risk_assessments.csv (80K records · 12 MB): Risk scoring and fraud flag data from the Risk team
  • analytics_reports.csv (50K records · 8 MB): Aggregated analytics datasets from the Analytics team
  • customer_profiles.csv (30K records · 5 MB): Customer PII data for classification testing

Or generate synthetic data using our Python script
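The project's own generator script is not reproduced here. As a rough sketch of the approach, the following produces deterministic, obviously fake customer profiles; the column names are assumptions and may not match the provided CSV. Emails use the reserved `example.com` domain, and SSNs start with 9, a range the US Social Security Administration does not issue as SSNs.

```python
import csv
import io
import random
import uuid


def generate_customer_profiles(n, seed=42):
    """Yield fake customer rows with synthetic PII for classification tests."""
    rng = random.Random(seed)  # seeded so test fixtures are reproducible
    first_names = ["Ana", "Ben", "Chen", "Dara", "Eli"]
    countries = ["US", "GB", "DE", "IN", "BR"]
    for i in range(n):
        name = rng.choice(first_names)
        yield {
            "customer_id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
            "full_name": f"{name} Test{i}",
            "email": f"{name.lower()}.{i}@example.com",
            "ssn": f"9{rng.randint(0, 99):02d}-{rng.randint(1, 99):02d}-{rng.randint(1, 9999):04d}",
            "country": rng.choice(countries),
            "signup_date": f"2023-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}",
        }


def to_csv(rows):
    """Render rows as CSV text, ready to write to customer_profiles.csv."""
    rows = list(rows)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Because the seed is fixed, `to_csv(generate_customer_profiles(30_000))` produces the same file on every run, which keeps PII-detection test results comparable.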

Tech Stack You'll Master

  • Python: Language
  • dbt: Contracts
  • Great Expectations: Validation
  • Soda Core: Quality
  • Avro: Serialization
  • Schema Registry: Versioning
  • OpenLineage: Lineage
  • GitHub Actions: CI/CD
  • Docker: Infrastructure
  • Kafka: Streaming
  • Snowflake: Warehouse
  • DataHub: Metadata

Why Data Governance?

Data governance is the #1 gap between "data engineer" and "senior data engineer." Companies like Stripe, Netflix, and Uber invest heavily in contract enforcement.

Staff-Level Differentiator

Governance projects demonstrate enterprise thinking. 78% of staff+ DE job postings mention data quality or governance.

Compliance is Non-Negotiable

SOC2, HIPAA, GDPR, CCPA. Every regulated industry needs engineers who understand compliance architecture.

Cross-Team Leadership

Governance spans team boundaries. This project proves you can design systems that coordinate 3+ teams.

Resume-Ready Portfolio Project

Add these bullet points to your resume after completing the project:

  • Built enterprise data governance platform enforcing schema contracts across 3 teams, reducing breaking changes by 94%
  • Designed PII detection and classification pipeline processing 400K+ records with automated RBAC enforcement
  • Implemented CI/CD contract validation with Avro and the Confluent Schema Registry, achieving zero breaking deployments over 6 months
  • Created SOC2/GDPR compliance engine with automated audit trails, policy-as-code, and real-time governance dashboards
Completion certificate included

Prerequisites

Python & SQL Proficiency

Required

Functions, classes, type hints. Comfortable writing SQL DDL and DML statements.

Basic Pipeline Experience

Required

Familiarity with dbt, Airflow, or Spark. Understanding of batch vs streaming data flows.

Docker & Git

Required

Comfortable running Docker containers and using Git for version control.

Enterprise Compliance Exposure

Helpful

Prior experience with SOC2, HIPAA, or GDPR is helpful but not required.

Ready to Build Enterprise Governance?

Start with Part 1: Schema Contracts & Validation Framework
