Petabyte-Scale Iceberg Lakehouse
Modernize a traditional data lake by adopting an ACID-compliant table format that supports time-travel queries and schema evolution at petabyte scale.
  PostgreSQL     App Events     External APIs
      |              |                |
      v              v                v
   Debezium    Kafka Producer     REST/Batch
      |              |                |
      +--------------+----------------+
                     |
                     v
               Apache Kafka
             (Message Broker)
                     |
      +--------------+----------------+
      |              |                |
      v              v                v
    Spark          Flink            Kafka
    Stream          SQL            Connect
      |              |                |
      +--------------+----------------+
                     |
                     v
              Apache Iceberg
              (Table Format)
              [REST Catalog]
                     |
                     v
                MinIO / S3
             (Object Storage)
                     ^
      +--------------+----------------+
      |              |                |
    Spark          Trino            Flink
   (Batch)         (OLAP)          (Stream)

Fig 1.1: End-to-end Iceberg lakehouse architecture
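To make the catalog and storage layers of the diagram concrete, here is a minimal sketch of the Spark properties that point a session at an Iceberg REST catalog backed by MinIO. The property keys are standard Iceberg and Spark settings; the catalog name `lake`, the host names, ports, and warehouse path are assumptions for illustration and will differ in your setup.

```python
def iceberg_spark_conf(catalog="lake",
                       rest_uri="http://rest-catalog:8181",
                       warehouse="s3://warehouse/",
                       s3_endpoint="http://minio:9000"):
    """Build Spark properties for an Iceberg REST catalog on S3-compatible storage.

    All default values are hypothetical placeholders, not values from this project.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables Iceberg SQL extensions (CALL procedures, MERGE INTO, etc.)
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog and selects the REST implementation
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": rest_uri,
        f"{prefix}.warehouse": warehouse,
        # S3FileIO with path-style access is the usual choice for MinIO
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.s3.endpoint": s3_endpoint,
        f"{prefix}.s3.path-style-access": "true",
    }


if __name__ == "__main__":
    # Print the properties in spark-submit form
    for key, value in iceberg_spark_conf().items():
        print(f"--conf {key}={value}")
```

Each pair can also be passed to `SparkSession.builder.config(key, value)`. The Iceberg Spark runtime and AWS bundle jars must be on the classpath for these settings to take effect.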
What You'll Build
Lakehouse Architecture
Complete multi-service setup with Iceberg, Spark, Flink, Trino, and Kafka
Real-time Sync
CDC pipeline from PostgreSQL to Iceberg via Debezium with exactly-once semantics
Multi-Engine Queries
Same tables queryable from Spark, Flink, and Trino simultaneously
Auto Maintenance
Automated compaction, snapshot expiration, and orphan file cleanup
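As a rough sketch of what the automated maintenance could look like, the helper below generates the three Iceberg Spark procedure calls that correspond to compaction, snapshot expiration, and orphan file cleanup. The procedure names and arguments are Iceberg's built-in Spark procedures; the function name, catalog name, table name, and retention defaults are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def maintenance_statements(catalog, table, retain_days=7, retain_last=10, now=None):
    """Return Spark SQL CALL statements for routine Iceberg table maintenance.

    retain_days and retain_last are hypothetical defaults; tune them to your
    own retention policy.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retain_days)).strftime("%Y-%m-%d %H:%M:%S")
    return [
        # Compaction: rewrite small data files into larger ones
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        # Snapshot expiration: drop snapshots older than the cutoff,
        # but always keep the most recent retain_last snapshots
        f"CALL {catalog}.system.expire_snapshots(table => '{table}', "
        f"older_than => TIMESTAMP '{cutoff}', retain_last => {retain_last})",
        # Orphan cleanup: delete files no longer referenced by table metadata
        f"CALL {catalog}.system.remove_orphan_files(table => '{table}', "
        f"older_than => TIMESTAMP '{cutoff}')",
    ]


if __name__ == "__main__":
    for stmt in maintenance_statements("lake", "sales.orders"):
        print(stmt + ";")
```

Each statement can be run with `spark.sql(stmt)` from a scheduled job. Keep the orphan-file cutoff well in the past, since a very recent cutoff can remove files belonging to writes that are still in progress.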
Business Scenario
IceLake Commerce
IceLake Commerce is a fast-growing e-commerce platform processing millions of transactions daily. Its data infrastructure team needs to migrate the analytics platform from a legacy Hive warehouse to an Iceberg lakehouse.
Current Challenges
- Legacy Hive warehouse with performance issues
- Batch-only analytics with 24-hour data latency
- No support for updates/deletes (GDPR compliance)
- Separate systems for different query engines
Your Mission
- Real-time inventory and sales analytics
- Multi-engine support for different teams
- ML features for personalization engine
- Cost-effective storage with S3-compatible backend
Progressive Learning Path
Each part builds on the previous. Master Iceberg from foundation to production.
Prerequisites
- Docker Desktop (8GB+ RAM allocated)
- Basic SQL knowledge (SELECT, JOIN, GROUP BY)
- Python familiarity for PySpark scripts
- Conceptual understanding of data lakes (helpful)
Related Learning Path
This capstone project is the culmination of the Iceberg Deep Dive skill toolkit. Complete the prerequisite modules first, or dive straight in if you have prior experience.