Uber Event Platform: Staff Design Portfolio
ai-de.net · Projects · P02 · Streaming track · PRO (module 01 free preview)

Design Uber's event platform from 10K to 1B events/day

A four-part guided redesign: stakeholder requirements → six-layer architecture → tiered SLAs + cost model → observability + DR + mock staff review. You ship a 69-artifact design portfolio you can carry into staff-level system-design rounds.

Timeline: ~12h reading · 8-12h self-design
Difficulty: Staff-level
Format: Whiteboard · 69 artifacts

This is the system-design question asked at Uber, DoorDash, Lyft, Stripe and any company routing 100M+ events/day.

By the end you will have
  • 13 requirements (FR-001..005, NFR-001..008) decomposed from 4 stakeholder interviews
  • Capacity model from 10K → 1B events/day with a $100/mo → $34K/mo per-component breakdown
  • Six-layer architecture blueprint — Kafka topics, Iceberg tiers, Flink, Redis, Trino, observability
  • Schema Registry contracts (Avro + Protobuf) and a hot/warm/cold storage tiering YAML
  • Five-pillar observability with SLO + burn-rate alert rules and a DR plan (RTO 5 min, RPO 0)
  • An INDEX of ~30 staff-interview questions mapped to specific portfolio artifacts
Prerequisites: built for senior+ engineers prepping for the staff loop. You should be comfortable with event-driven architecture, Kafka basics, and at least one warehouse or lakehouse engine. This is not a tutorial; it assumes you’ve shipped data platforms before.
Architecture diagram: uber.event.platform · 6 layers · whiteboard design · 1B/day target
  • Sources (4 domains · 1B/day): rider.app, driver.app, payments, maps.api
  • Ingestion (10 topics · 100 partitions): rider.events.raw, driver.events.raw, maps.location.raw, payments.tx
  • Processing: flink-stream (EOS · windowing), flink-enrich (joins · broadcast), spark-batch (medallion fill), dbt-transform (gold metrics)
  • Storage · serve (5 PB · 3 serving patterns): gold.metrics (Iceberg · GDPR · Glacier), silver.events (dedup · S3 IA), redis · ML (p99 < 100ms), trino · OLAP (p95 < 5s), dbt · metrics (< 30 min)
Scale path · 100,000×
  • 10K events/day → 1M → 100M → 1B/day
  • $100/mo → $1.5K → $8K → $34K/mo
  • 27 brokers · 600 partitions · 5 PB Iceberg
  • Unit economics defended at every phase
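The scale path reduces to two pieces of arithmetic: the average ingest rate, and the cost per million events at each phase. A minimal sketch using only the event rates and monthly costs quoted above; it assumes a 30-day month and a perfectly flat daily rate, so the peak-to-average headroom that actually sizes brokers is deliberately left out.

```python
# Capacity and unit-economics arithmetic for the four scale phases quoted above.
# Assumptions: 30-day month and a uniform event rate over the day (no peak factor).
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30

# (events per day, platform cost in USD per month), copied from the scale path.
PHASES = [
    (10_000, 100),
    (1_000_000, 1_500),
    (100_000_000, 8_000),
    (1_000_000_000, 34_000),
]


def avg_events_per_second(events_per_day: int) -> float:
    """Average ingest rate if events were spread evenly over 24 hours."""
    return events_per_day / SECONDS_PER_DAY


def usd_per_million_events(events_per_day: int, usd_per_month: float) -> float:
    """Monthly platform cost divided by monthly event volume (in millions)."""
    millions_per_month = events_per_day * DAYS_PER_MONTH / 1_000_000
    return usd_per_month / millions_per_month


for per_day, cost in PHASES:
    print(f"{per_day:>13,} events/day  ~{avg_events_per_second(per_day):>9,.1f} events/s  "
          f"${usd_per_million_events(per_day, cost):,.2f} per 1M events")
```

At 1B/day this works out to roughly 11.6K events/s on average and about $1.13 per million events, against about $333 per million at 10K/day: volume grows 100,000× while spend grows 340×. Peak load, not the average, is what sizes brokers and partitions, so treat the events/s figure as a floor.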
Tiered SLA · 3 levels
  • T1: 99.99% · T2: 99.95% · T3: 99.9%
  • Burn-rate alerts: 14.4× · 6× · 1×
  • RTO 5 min · RPO 0 · GDPR < 72h
  • ~$200K/yr saved by not over-provisioning
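Each SLA tier translates directly into an error budget: how much unavailability the tier can absorb before it breaches. A minimal sketch, assuming a 30-day rolling window and a budget measured in minutes of downtime; a real SLO may instead count bad events against total events.

```python
# Error budget implied by each SLA tier over a rolling 30-day window.
# Assumption: the budget is expressed as minutes of allowed unavailability.
WINDOW_MINUTES = 30 * 24 * 60  # 43,200 minutes in 30 days

TIERS = {"T1": 0.9999, "T2": 0.9995, "T3": 0.999}


def error_budget_minutes(slo: float, window_minutes: int = WINDOW_MINUTES) -> float:
    """Minutes per window the service may miss its SLO before the budget is spent."""
    return (1.0 - slo) * window_minutes


for tier, slo in TIERS.items():
    print(f"{tier} ({slo:.2%}): {error_budget_minutes(slo):.1f} min per 30 days")
```

T1 gets roughly 4.3 minutes a month and T3 roughly 43. That 10× spread is why putting every consumer on Tier-1 over-provisions the platform.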
1B/day peak event target · 6 architecture layers · 99.99% Tier-1 SLA
Why this matters in 2026

Senior+ system-design rounds got harder.

Staff-track loops at the streaming-heavy companies now expect capacity math, SLA tiering, cost reasoning, and erasure design, not just a whiteboard sketch. This project ships the vocabulary you’re missing.

System design ≠ box drawing

Uber, DoorDash, Stripe, Confluent loops now ask for partition math, error-budget tradeoffs, and unit economics. Anyone can draw boxes; they pay for the reasoning underneath.

Streaming-first orgs run on event platforms

Uber processes 1T+ events/day; LinkedIn, Stripe, DoorDash all run on Kafka-shaped platforms. The shape of the question is the same — only the constants change.

Cost is the new tradeoff currency

‘Scalable’ is table stakes. The real conversation is unit economics ($/event, $/query, $/SLA-tier) and how you defend the next $20K/mo line item.

GDPR + erasure changed the storage stack

Right-to-be-forgotten at 1B events/day forces row-level deletes into the architecture from day one — Iceberg over Hive, BACKWARD-compatible schemas, lineage on by default.

Curriculum · 4 modules · ~12 hours reading

Module 01 is free. The rest unlocks with PRO.

Try the first 2-3 hours — interview the 4 stakeholders, decompose 13 requirements, run the capacity math, score Lambda vs Kappa vs Lakehouse. If the rigor lands, upgrade for the ingestion, serving, and operations modules.

Module 01 is free — no card required. Get a feel for the rigor before paying.
M01
Requirements & architecture blueprint
Interview 4 stakeholders (VP Data, Head of ML, Head of Analytics, Compliance). Decompose 13 requirements (FR-001..005, NFR-001..008). Run capacity estimation from 10K to 1B events/day. Score Lambda vs Kappa vs Lakehouse on a weighted matrix. Ship a six-layer blueprint with technology selections.
~3h · 13 lessons · FREE PREVIEW
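Scoring Lambda vs Kappa vs Lakehouse on a weighted matrix comes down to a weighted sum: each criterion carries a weight, each pattern gets a score per criterion, and the highest normalized total ranks first. A minimal sketch; the criteria, weights, and 1-5 scores are hypothetical placeholders, not the module's actual matrix, so the ranking it prints says nothing about which pattern the project picks.

```python
from typing import Dict

# HYPOTHETICAL criteria, weights, and 1-5 scores: replace with your own.
WEIGHTS: Dict[str, float] = {"latency": 3, "cost": 2, "operability": 2, "reprocessing": 1}
SCORES: Dict[str, Dict[str, float]] = {
    "Lambda":    {"latency": 4, "cost": 2, "operability": 2, "reprocessing": 5},
    "Kappa":     {"latency": 5, "cost": 3, "operability": 3, "reprocessing": 3},
    "Lakehouse": {"latency": 3, "cost": 4, "operability": 4, "reprocessing": 4},
}


def weighted_totals(weights: Dict[str, float],
                    scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Weighted average score per option; weights are normalized to sum to 1."""
    total_weight = sum(weights.values())
    return {
        option: sum(w * per_criterion[c] for c, w in weights.items()) / total_weight
        for option, per_criterion in scores.items()
    }


totals = weighted_totals(WEIGHTS, SCORES)
for option, value in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{option:<10} {value:.2f}")
```

Keeping the weights explicit is the point in an interview: when a panelist disputes the outcome, you can show exactly which weight would have to change to flip it.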
M02
Ingestion & storage architecture
Design 10 Kafka topics with naming convention + partition strategy. Pick CDC patterns for MySQL/Postgres/DynamoDB/HTTP sources. Write Schema Registry contracts with BACKWARD compatibility. Tier storage hot/warm/cold with lifecycle policies. Bake GDPR erasure into the design via Iceberg row-level deletes.
~3h · 18 lessons · PRO TIER
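A partition strategy usually starts from throughput. A common rule of thumb sets the floor at max(t / p, t / c), where t is the topic's peak throughput and p and c are the measured per-partition producer and consumer throughput. A minimal sketch; the event size, peak factor, and per-partition rates in the example are assumptions you would replace with benchmarks from your own brokers and consumers.

```python
import math

# Rule-of-thumb partition floor: max(t / p, t / c), where t is peak topic
# throughput and p, c are measured per-partition producer / consumer throughput.
# All inputs in the example call are ASSUMPTIONS, not figures from the project.


def peak_mb_per_s(events_per_day: int, avg_event_bytes: int, peak_factor: float) -> float:
    """Peak throughput in MB/s, from a daily volume and a peak-to-average ratio."""
    avg = events_per_day * avg_event_bytes / 86_400 / 1_000_000
    return avg * peak_factor


def min_partitions(target_mb_s: float, producer_mb_s: float, consumer_mb_s: float) -> int:
    """Smallest partition count that lets both producers and consumers keep up."""
    return math.ceil(max(target_mb_s / producer_mb_s, target_mb_s / consumer_mb_s))


# Hypothetical example: 1B events/day, ~1 KB events, 3x daily peak,
# 10 MB/s per partition on the produce side, 2 MB/s on the consume side.
t = peak_mb_per_s(1_000_000_000, 1_000, 3.0)
print(f"peak ~{t:.1f} MB/s -> at least {min_partitions(t, 10.0, 2.0)} partitions")
```

Throughput only gives a floor. Designs typically go well above it for consumer parallelism, key skew, and growth, because adding partitions later changes which partition a key hashes to and breaks per-key ordering across the change.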
M03
Serving layer & scale engineering
Architect three serving patterns: real-time (<100ms via Redis feature store), interactive (<5s via Trino), batch (<30min via Spark + dbt). Design the entity-key schema for ML features. Model the cost from 100K to 1B events/day with per-component breakdown ($34K/mo at 1B/day).
~3h · 20 lessons · PRO TIER
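Choosing among the three serving patterns is, at its simplest, a routing decision on each consumer's latency budget. A minimal sketch using the targets quoted above (Redis < 100 ms, Trino < 5 s, Spark + dbt < 30 min); the function and enum names are hypothetical, and it assumes the less demanding pattern is also the cheaper one to serve from.

```python
from enum import Enum

# Latency targets from the module description; names here are hypothetical.
REALTIME_MS = 100               # Redis feature store, p99 < 100 ms
INTERACTIVE_MS = 5_000          # Trino, p95 < 5 s
BATCH_MS = 30 * 60 * 1_000      # Spark + dbt, < 30 min


class ServingPattern(Enum):
    REALTIME = "redis"
    INTERACTIVE = "trino"
    BATCH = "spark+dbt"


def choose_pattern(latency_budget_ms: float) -> ServingPattern:
    """Return the least demanding pattern whose latency target fits the budget."""
    if latency_budget_ms >= BATCH_MS:
        return ServingPattern.BATCH
    if latency_budget_ms >= INTERACTIVE_MS:
        return ServingPattern.INTERACTIVE
    if latency_budget_ms >= REALTIME_MS:
        return ServingPattern.REALTIME
    raise ValueError(f"{latency_budget_ms} ms is tighter than any serving pattern's target")


print(choose_pattern(250).value)
print(choose_pattern(10_000).value)
print(choose_pattern(3_600_000).value)
```

Budgets under 100 ms raise instead of silently falling back, since none of the three patterns is designed to meet them.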
M04
Production operations & defense
Design the five-pillar observability stack (freshness, volume, schema, distribution, lineage). Define 3 SLOs with burn-rate alerts (14.4x / 6x / 1x). Write the DR plan, incident runbook, and tiered alert matrix. Defend the whole design in a mock staff-level review with anticipated Q&A.
~3h · 18 lessons · PRO TIER
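The 14.4× / 6× / 1× thresholds match the multiwindow, multi-burn-rate pattern from Google's SRE Workbook, where a burn rate of N means errors are consuming the budget N times faster than the SLO allows. A minimal sketch of how such rules evaluate, assuming a 30-day budget; the window pairs (1h/5m, 6h/30m, 3d/6h), the severities, and the observed error ratios are assumptions taken from the workbook's common defaults, not from the project's alert rules YAML.

```python
from dataclasses import dataclass
from typing import Dict, List

WINDOW_HOURS = 30 * 24  # 30-day SLO window


@dataclass(frozen=True)
class BurnRule:
    long_window: str
    short_window: str
    long_hours: float
    threshold: float
    severity: str


# Thresholds from the project; window pairs and severities are ASSUMED defaults.
RULES = [
    BurnRule("1h", "5m", 1, 14.4, "page"),
    BurnRule("6h", "30m", 6, 6.0, "page"),
    BurnRule("3d", "6h", 72, 1.0, "ticket"),
]


def burn_rate(error_ratio: float, slo: float) -> float:
    """How many times faster than 'exactly on budget' errors are accruing."""
    return error_ratio / (1.0 - slo)


def budget_spent(rule: BurnRule) -> float:
    """Fraction of the 30-day budget consumed if the threshold holds for the long window."""
    return rule.threshold * rule.long_hours / WINDOW_HOURS


def firing(slo: float, error_ratios: Dict[str, float]) -> List[BurnRule]:
    """Rules where BOTH the long and the short window exceed the threshold."""
    return [
        r for r in RULES
        if burn_rate(error_ratios[r.long_window], slo) >= r.threshold
        and burn_rate(error_ratios[r.short_window], slo) >= r.threshold
    ]


# Hypothetical error ratios (bad events / total events) per window for a T1 topic.
observed = {"5m": 0.003, "30m": 0.001, "1h": 0.002, "6h": 0.0007, "3d": 0.00005}
for rule in RULES:
    print(f"{rule.threshold:>5}x over {rule.long_window}: {budget_spent(rule):.0%} of budget")
print([r.severity for r in firing(0.9999, observed)])
```

Requiring the short window to agree lets an alert clear soon after an incident ends, instead of firing for the rest of the long window.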
3 modules locked · Unlock all PRO content for $29/mo
Backed by curriculum

System Design for Data Engineers

10 modules · 16 hours · capacity estimation · SLA tiering · decision matrices · six-layer architecture · cost modeling

This curriculum is the design vocabulary for the project — not a sales add-on. PRO subscribers get full access to every module.

The design, in 3 checkpoints

Three sprints. Three checkpoints. One defended platform design.

Each phase ends with a tagged set of artifacts you can hand to a reviewer. No ambiguity about where you are in the redesign.

01 · ~3h · Requirements & architecture

Stakeholder interviews complete, 13 requirements decomposed, capacity math done from 10K to 1B/day, pattern decision matrix scored, six-layer blueprint drawn.

  • Requirements doc (FR-001..005, NFR-001..008)
  • Capacity model with $/event at every phase
  • Architecture decision matrix (Lambda vs Kappa vs Lakehouse)
  • Six-layer blueprint with technology selections
02 · ~6h · Ingestion → storage → serving

Kafka topology designed, schema contracts written, storage tiered, three-pattern serving layer architected, cost model finalized at every scale.

  • 10 named Kafka topics + partition strategy + CDC plan
  • Avro / Protobuf schema contracts with BACKWARD compatibility
  • Hot/warm/cold storage tiering YAML + GDPR erasure design
  • Three-pattern serving design (Redis · Trino · Spark+dbt)
  • Feature store entity-key schema + freshness config
  • $34K/mo cost model at 1B/day with per-component breakdown
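BACKWARD compatibility means consumers on the new schema can still read data written with the previous one. A deliberately simplified sketch of that check for flat Avro record schemas: it covers only two rules (a newly added field needs a default; a kept field must not change type) and ignores type promotions, unions, aliases, and nested records, all of which Schema Registry's own checker handles. The function name and the example schemas are hypothetical.

```python
import json
from typing import List


def backward_violations(old_schema: str, new_schema: str) -> List[str]:
    """Simplified BACKWARD check for flat Avro records: reader = new, writer = old."""
    old_fields = {f["name"]: f for f in json.loads(old_schema)["fields"]}
    new_fields = {f["name"]: f for f in json.loads(new_schema)["fields"]}
    problems = []
    for name, field in new_fields.items():
        if name not in old_fields:
            # Old records lack this field, so the new reader must fall back to a default.
            if "default" not in field:
                problems.append(f"added field '{name}' has no default")
        elif field["type"] != old_fields[name]["type"]:
            # Conservative: real Avro resolution allows promotions such as int -> long.
            problems.append(f"field '{name}' changed type")
    # Fields dropped from the new schema are fine: the reader just skips them.
    return problems


# Hypothetical rider-event schemas (field names are illustrative only).
v1 = json.dumps({"type": "record", "name": "RiderEvent", "fields": [
    {"name": "event_id", "type": "string"},
    {"name": "rider_id", "type": "string"},
    {"name": "ts", "type": "long"},
]})
v2 = json.dumps({"type": "record", "name": "RiderEvent", "fields": [
    {"name": "event_id", "type": "string"},
    {"name": "rider_id", "type": "string"},
    {"name": "city", "type": "string", "default": ""},
    {"name": "surge", "type": "double"},
]})
print(backward_violations(v1, v2))
```

Dropping `ts` passes, adding `city` with a default passes, and `surge` is flagged. In practice the registry enforces this itself when a subject's compatibility level is BACKWARD; a local check like this only catches mistakes earlier.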
03 · ~3h · Operations & defense

Observability designed, SLOs and burn-rate alerts defined, DR plan written, mock staff review presented and defended. INDEX of interview questions complete.

  • Five-pillar observability design
  • SLO + error-budget framework + alert rules YAML
  • DR plan (RTO 5 min, RPO 0) + incident runbook
  • Mock staff-review presentation with anticipated Q&A
  • INDEX of ~30 interview questions → 69 artifacts
Project setup · 5 minutes

Download your 69-artifact design portfolio.

There’s nothing to install: this is whiteboard-only. Grab the case-study bundle, open INDEX.md, then work through the 6-step interview rehearsal.

What lives in the bundle

Every artifact you’ll ship across the four parts, organized by part folder, with an INDEX mapping ~30 staff-level interview questions to the specific files that answer them.

  • part-1/ — 13 artifacts: requirements, capacity model, decision matrix
  • part-2/ — 18 artifacts: topic strategy, schema contracts, storage tiering
  • part-3/ — 20 artifacts: serving design, feature store, cost model
  • part-4/ — 18 artifacts: SLO framework, alert rules, DR plan
  • INDEX.md — ~30 interview questions → artifact map
  • interview-walkthrough.md — 6-step, 60-min staff-level rehearsal
Download · Case-Study Bundle

Uber Event Platform: Design Portfolio

69 artifacts, 11 YAML configs, INDEX of 30 interview questions, 6-step interview walkthrough. Reference material for staff-level system-design rounds.

~80 KB · 69 artifacts · 11 YAML configs · PRO required
```shell
# `open` is the macOS opener; on Linux, xdg-open does the same job.

# 1. Unzip the bundle
unzip kafka-event-routing-case-study.zip
cd kafka-event-routing-case-study

# 2. Open the INDEX (30 interview questions → artifacts)
open INDEX.md

# 3. Walk the interview rehearsal (6-step, 60-min staff drill)
open docs/interview-walkthrough.md

# 4. Browse artifacts by part (69 artifacts across 4 parts)
ls -R part-*/
```
69 artifacts · 11 YAML configs · 30 interview Q&As · 6 walkthrough steps
Production hardening

The same blueprint — built for the real failure modes.

Most Kafka system-design write-ups stop at the happy path. The list below pairs every simplification we made on the whiteboard with what a real implementation would actually need: the answers a staff principal will press you on.

Tutorial design (what we drew on the whiteboard) vs. production add-on (what you’d ship next):
  • Schema compatibility. Whiteboard: BACKWARD on every subject. Production: FULL on Tier-1, with per-subject compatibility mode enforced in CI.
  • Failover. Whiteboard: single Kafka cluster, multi-AZ. Production: MirrorMaker2 to a passive region, plus a DR runbook with a quarterly failover drill.
  • Storage lifecycle. Whiteboard: S3 Standard → IA → Glacier. Production: add cross-region replication and erasure-aware rewrite_data_files.
  • SLO scope. Whiteboard: 3 platform-wide burn-rate alerts. Production: per-tenant SLOs (one Tier-1 contract per consumer) plus budget chargeback.
  • Observability. Whiteboard: five-pillar stack with Prometheus. Production: add OpenTelemetry distributed tracing for end-to-end event lineage.
  • GDPR erasure. Whiteboard: Iceberg row-level delete. Production: add erasure-aware materialized-view rebuild and downstream cache eviction.
PRO benefit · design review

Real review from staff principals who run event platforms.

Submit your portfolio bundle and get the kind of pushback you’d hear in an actual staff loop: partition math, error-budget tradeoffs, vendor decisions, cost defense.


4 design reviews / month

Submit your portfolio bundle, a single artifact, or a redesign proposal. Reviewer is matched to your domain — Kafka / Iceberg / observability for this project. Async, comments inline, average turnaround 31 hours.

31h avg turnaround · 9.2/10 helpfulness · 94% return next month

2 mock staff interviews / month

Live 30-min sessions with a staff-level engineer. Defend your design against the questions you’ll actually hear: partition math, EOS guarantees, cost-vs-latency tradeoffs. Group sessions also available.

30 min per session · 2/mo included · unlimited group sessions
What PRO unlocks

One subscription. 15+ projects, all curriculum, design review.

PRO is built for senior+ engineers who want production-grade builds and feedback loops — not more tutorials.

What you get (FREE · PRO · EXPERT)
  • Projects (production-grade builds + design): FREE 2 · PRO 15+ · EXPERT 8
  • Curriculum modules (all 7 tracks): FREE Phase 1 only · PRO All · EXPERT All + bonus
  • Review credits (senior+ engineer review): FREE 0 · PRO 4/month · EXPERT Unlimited
  • Career path access (5 paths × full plans): FREE 1 path · PRO All 5 · EXPERT All 5 + 1:1
  • Certificate (verifiable on LinkedIn): PRO Yes · EXPERT Yes + portfolio review
  • Community (Discord + office hours): FREE Read-only · PRO Full + 2/mo · EXPERT Full + 4/mo

PRO: $29/mo, billed monthly, cancel anytime; or annual at $249/yr (save 28%).
Who this is for

Pick this if you’re defending designs, not learning them.


Staff-track senior engineers

You’re prepping for the Uber / DoorDash / Stripe staff loop. You can ship a feature; what you need is the design vocabulary the system-design panel expects.


Tech leads driving streaming migration

You need to defend a Kafka platform redesign in front of leadership. Capacity math, SLA tiering, cost model: the parts you can’t afford to fudge.


Platform architects

You run streaming for 10+ teams. You want a reusable framework for event-platform decisions: topic taxonomy, partition counts, schema policies.


Senior engineers crossing batch → streaming

You know the warehouse cold; the streaming side feels like a different planet. This gives you the architecture grammar for routing 1B events/day.

FAQ

Quick answers.

No. This is a design exercise. You’ll ship YAML schema contracts, decision matrices, capacity spreadsheets, alert rule files, and runbook drafts. The accompanying skill toolkits (Kafka Streams, Flink, Iceberg) are where you build the things this project designs.
Yes. Stakeholder interviews, 13 requirements decomposed, capacity estimation, and the architecture decision matrix. About 2-3 hours. By the end you can run the same exercise on a different domain.
P01 is a build project: you write Flink code that runs on Kafka events. This project is the design portfolio — the staff-level reasoning that justifies why a Flink pipeline at all, what topics feed it, and what it costs at 1B/day. They pair: design here, build there.
Each is name-checked in the architecture but not deep-dived: they’re tools you reach for in implementation, and this project’s lane is the platform-shape decisions above them. The hardening section maps where each one would slot in.
That’s the explicit target. The case-study bundle includes an INDEX of ~30 questions ('How do you size partitions for 1B events/day?', 'How do you defend $34K/mo platform cost?') each mapped to a specific artifact you produced. Plus a 6-step, 60-minute interview rehearsal.
All 15+ PRO projects, 4 design-review credits per month, 2 mock-interview sessions per month, full curriculum across all 7 tracks, all 5 career paths, certificate of completion, and full community access. Cancel anytime.

Ready to architect a real event platform?

Start with module 01 — free, no card. Decompose the requirements, run the capacity math, score Lambda vs Kappa vs Lakehouse on a weighted matrix. About 2-3 hours.
