Streaming Fundamentals

Name: Streaming Fundamentals
Price: 29 USD
Availability: InStock
Author: AI-DE Engineering Team

Event-driven architecture, message brokers, and real-time processing foundations.

Every streaming engine — Kafka, Flink, Spark, Pulsar — implements the same primitives: partitions, watermarks, state, delivery semantics. Learn the foundations once, apply them everywhere.

What you’ll be able to do

Understand streaming vs batch processing trade-offs
Build event-driven pipelines with message brokers
Implement windowing, watermarks, and late-data handling
Design reliable streaming architectures with exactly-once semantics

Curriculum

Phase 1: Streaming First Steps

Core concepts and streaming foundations

Streaming First: Events vs Batches

Three quick exercises: identify what makes a system "streaming," send your first event to a topic, and contrast event-at-a-time with micro-batch processing.

Streaming vs Batch Architecture

Streaming vs batch trade-offs: latency, throughput, cost, and ordering. Why most production stacks run both side by side, and how to choose per workload.

Phase 2: Processing Patterns

Windowing, state, and delivery guarantees

Kafka Core: Partitions, Brokers, Topics

Partition strategy, replication factor, broker failure modes, and consumer groups. The Kafka primitives every streaming engine inherits.
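Partition strategy rests on one idea. A keyed record is routed by hashing its key modulo the partition count, so every record for a key lands on the same partition and keeps its relative order.

The sketch below assumes string keys. `partition_for` and `route` are hypothetical helpers. CRC32 stands in for the murmur2 hash that Kafka's Java client actually uses, so the partition numbers will not match a real cluster.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Hypothetical helper: map a record key to a partition.

    CRC32 is a stand-in hash (Kafka's Java client uses murmur2). The property
    that matters is the same: a fixed key always maps to the same partition.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    """Append (key, value) events to per-partition logs, preserving arrival order."""
    logs = defaultdict(list)
    for key, value in events:
        logs[partition_for(key, num_partitions)].append((key, value))
    return dict(logs)


events = [("user-1", "login"), ("user-2", "click"), ("user-1", "purchase")]
logs = route(events, num_partitions=6)
```

The mapping depends on `num_partitions`. Adding partitions to a live topic therefore remaps existing keys, which is why per-key ordering only holds while the partition count stays fixed.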

Delivery Guarantees & Semantics

At-most-once vs at-least-once vs exactly-once. Idempotent producers, transactional writes, and the two-phase commit (2PC) protocol that makes exactly-once semantics (EOS) work across systems.
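At-least-once retries can create duplicates, and an idempotent producer prevents them. Below is a simplified model of broker-side deduplication, loosely based on Kafka's idempotent producer.

It assumes each producer has an ID and numbers its writes to a partition sequentially. `PartitionLog` is a hypothetical class, and real brokers track more sequence state than the single last value kept here.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Simplified broker-side dedup for one partition (hypothetical sketch)."""

    records: list = field(default_factory=list)
    last_seq: dict = field(default_factory=dict)  # producer_id -> last accepted sequence number

    def append(self, producer_id: int, seq: int, value: str) -> str:
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            # A retry of a write that already succeeded: acknowledge it, but do not append again.
            return "duplicate"
        if seq != last + 1:
            raise ValueError(f"out-of-order sequence {seq}, expected {last + 1}")
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return "appended"


log = PartitionLog()
first = log.append(producer_id=7, seq=0, value="debit $10")
# The ack for seq=0 is lost, so the producer retries the same record.
retry = log.append(producer_id=7, seq=0, value="debit $10")
log.append(producer_id=7, seq=1, value="credit $5")
```

Without the sequence check, the retried write would be appended twice. That is exactly the double-counting that at-least-once delivery allows.

This covers one producer writing to one partition. Atomic writes across partitions need transactions on top.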

Time, Windows & Watermarks

Event-time vs processing-time, watermark generation, allowed lateness, and tumbling/sliding/session windows. The 4-knob model for late-data handling.
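Here is a single-threaded sketch of event-time tumbling windows with a bounded-out-of-orderness watermark and allowed lateness. It shows one common policy, similar in spirit to Flink's; it is not the course's 4-knob model.

Assumptions:
- `tumbling_counts` and its parameters are hypothetical.
- Timestamps are integer seconds.
- Sliding and session windows are not shown.

```python
from collections import defaultdict


def tumbling_counts(event_times, size, max_delay, allowed_lateness):
    """Count events per tumbling event-time window.

    event_times: integer timestamps (seconds) in arrival order.
    Watermark = highest event time seen so far minus max_delay.
    A window [start, start + size) fires once the watermark reaches its end.
    Later events for a fired window still update it (and re-emit the count)
    until the watermark reaches end + allowed_lateness; after that they are dropped.
    """
    counts = defaultdict(int)
    fired = set()
    emitted = []  # (window_start, count), including late updates
    dropped = []
    watermark = float("-inf")
    for t in event_times:
        start = t - t % size
        if watermark >= start + size + allowed_lateness:
            dropped.append(t)  # past allowed lateness
            continue
        counts[start] += 1
        if start in fired:
            emitted.append((start, counts[start]))  # late update to an already-fired window
        watermark = max(watermark, t - max_delay)
        for window_start in sorted(counts):
            if window_start not in fired and watermark >= window_start + size:
                fired.add(window_start)
                emitted.append((window_start, counts[window_start]))
    return emitted, dropped


emitted, dropped = tumbling_counts(
    [1, 4, 12, 3, 17, 8, 26, 9], size=10, max_delay=5, allowed_lateness=10
)
```

What happens in this run:
- Window [0, 10) first fires with 3 events.
- It re-emits 4 when the late event at t=8 arrives within allowed lateness.
- The event at t=9 arrives after the watermark has passed 10 + 10, so it is dropped.
- Window [20, 30) never fires, because no later event advances the watermark. Idle inputs stall watermarks the same way in real pipelines.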

Stateful Stream Processing

Keyed state, RocksDB-backed stores, checkpoint barriers, and incremental snapshots. How streaming engines survive failover without losing state.
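The core contract of checkpointing is that state and input position are saved together. Below is a toy single-process sketch.

It uses an in-memory list as the partition log and a deep copy as the "durable" snapshot. `CountingProcessor` is hypothetical. Real engines coordinate snapshots with checkpoint barriers and persist them to durable storage instead.

```python
import copy


class CountingProcessor:
    """Keyed running counts with periodic checkpoints of (state, offset)."""

    def __init__(self):
        self.state = {}            # key -> count (the keyed state)
        self.offset = 0            # next position to read in the input log
        self.checkpoint = ({}, 0)  # last snapshot: (state copy, offset)

    def process(self, log, upto, checkpoint_every):
        while self.offset < upto:
            key = log[self.offset]
            self.state[key] = self.state.get(key, 0) + 1
            self.offset += 1
            if self.offset % checkpoint_every == 0:
                # Snapshot state and offset together; the deep copy stands in for durable storage.
                self.checkpoint = (copy.deepcopy(self.state), self.offset)

    def restore(self):
        state, offset = self.checkpoint
        self.state, self.offset = copy.deepcopy(state), offset


log = ["a", "b", "a", "c", "a", "b", "c"]
proc = CountingProcessor()
proc.process(log, upto=5, checkpoint_every=3)  # simulate a crash after 5 records (last checkpoint at offset 3)
proc.restore()                                 # discard in-memory state; roll back to the checkpoint
proc.process(log, upto=len(log), checkpoint_every=3)
```

After the simulated crash, state rolls back to offset 3 and records 3 and 4 are replayed. The final counts match a clean run with no double counting.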

Phase 3: Production Streaming

Scaling, monitoring, and real-world patterns

Event-Driven Architecture at Scale

Multi-cluster topologies, schema evolution, dead-letter queues, and back-pressure. The patterns that keep a pipeline built for 10K events/s from melting at 100K.
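Here is a minimal dead-letter sketch. A record that fails to parse, or that lacks a required field, is set aside with its offset and error. The consumer moves on instead of crashing or blocking the partition.

It assumes JSON-encoded records, and an in-memory list stands in for a real dead-letter topic. `process_batch` and `add_amount` are hypothetical.

```python
import json


def process_batch(raw_records, handle):
    """Apply `handle` to each decoded record; divert failures to a dead-letter list."""
    dead_letters = []
    for offset, raw in enumerate(raw_records):
        try:
            handle(json.loads(raw))
        except (json.JSONDecodeError, KeyError) as err:
            # Keep the original payload and error so the record can be inspected or replayed later.
            dead_letters.append({"offset": offset, "payload": raw, "error": repr(err)})
    return dead_letters


totals = {}


def add_amount(event):
    # Raises KeyError (before touching totals) if a required field is missing.
    totals[event["user"]] = totals.get(event["user"], 0) + event["amount"]


dlq = process_batch(
    ['{"user": "u1", "amount": 5}', "{not json", '{"user": "u1"}', '{"user": "u2", "amount": 3}'],
    add_amount,
)
```

In production the dead-letter records would go to a separate topic for inspection and replay. The design choice trades strict completeness for keeping the rest of the stream flowing.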

Streaming Platform Operations

Lag monitoring, partition rebalancing, capacity planning, and the SLO model for streaming platforms. What an on-call rotation actually does.
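Consumer lag per partition is the log-end offset minus the consumer group's committed offset. The sketch below assumes both offsets have already been fetched into dicts keyed by partition.

`consumer_lag` and `skewed_partitions` are hypothetical. The median-multiple rule for flagging skew is an illustrative heuristic, not a standard threshold.

```python
from statistics import median


def consumer_lag(end_offsets, committed):
    """Lag per partition: log-end offset minus committed offset.

    A partition with no committed offset is treated as committed at 0 here;
    real behavior depends on the consumer's offset-reset policy.
    """
    return {p: end - committed.get(p, 0) for p, end in end_offsets.items()}


def skewed_partitions(lag, factor=3.0):
    """Return partitions whose lag exceeds `factor` times the median lag."""
    threshold = factor * median(list(lag.values()))
    return sorted(p for p, value in lag.items() if value > threshold)


lag = consumer_lag(
    end_offsets={0: 1200, 1: 980, 2: 6000, 3: 1100},
    committed={0: 1080, 1: 900, 2: 1000, 3: 1000},
)
hot = skewed_partitions(lag)
```

Total lag (5,300 here) says the group is behind; the per-partition view says where. A single hot partition usually points to key skew rather than too few consumers.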

What you’ll build

Event-driven pipeline with exactly-once delivery between Kafka and a sink
Windowed aggregation (tumbling + sliding + session) with watermark tuning
Stateful processor with checkpoint + restore against a key-partitioned topic
Operational runbook covering consumer lag, rebalancing, and partition-skew detection

This works in your test cluster… but loses events in production.

Without streaming foundations, you risk:

Pipelines that double-count events under retry, breaking financial dashboards
Late-arriving data silently dropped because watermarks weren't tuned
State stores that grow unbounded until processing nodes OOM mid-shift
Topology changes that lose committed offsets and replay days of traffic

What is Streaming Fundamentals?

Streaming fundamentals covers the core concepts of real-time data processing: event-driven architecture, message brokers, windowing, watermarks, and delivery guarantees. These foundations apply to every streaming technology — Kafka, Flink, Spark Streaming — and are essential for building systems that process data as it arrives rather than in batch.

Why this matters in production

Real-time systems power fraud detection at Stripe, ride matching at Uber, and recommendations at Netflix. Production streaming requires understanding exactly-once semantics, late-data handling, and backpressure — concepts that determine whether your system processes events reliably or loses data silently.

Common use cases

Building event-driven pipelines that process data in real time
Implementing windowed aggregations for real-time dashboards and alerts
Designing message broker architectures with proper delivery guarantees
Handling late-arriving data with watermarks and allowed lateness
Creating exactly-once processing pipelines for financial transactions
Monitoring streaming pipeline health with lag and throughput metrics

Streaming vs alternatives

Streaming vs Batch Processing

Streaming processes events as they arrive, with low latency. Batch processing handles data in scheduled runs, trading latency for higher throughput. Most production systems use both: streaming for real-time needs, batch for historical analysis.

Streaming vs Micro-Batch

True streaming processes each event individually. Micro-batch (like Spark Streaming) processes small batches at short intervals. Micro-batch is simpler but adds latency compared to true event-at-a-time processing.
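This back-of-the-envelope sketch estimates the extra latency that micro-batching adds. Event-at-a-time processing has no batching wait, so every value it computes is latency that approach avoids.

It assumes batches trigger on fixed boundaries and ignores processing time. `micro_batch_wait_ms` is hypothetical.

```python
def micro_batch_wait_ms(arrivals_ms, trigger_interval_ms):
    """Extra wait each event incurs before its micro-batch is triggered.

    Assumes batches fire on fixed boundaries (multiples of the interval) and an
    event arriving exactly on a boundary waits for the next one.
    """
    return [(t // trigger_interval_ms + 1) * trigger_interval_ms - t for t in arrivals_ms]


waits = micro_batch_wait_ms([100, 400, 950, 1200], trigger_interval_ms=1000)
```

With a 1-second trigger, waits range from near zero to almost the full interval. The worst case is bounded by the trigger interval itself.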

Streaming vs CDC Pipelines

Streaming fundamentals provide the foundation for CDC (Change Data Capture) pipelines. CDC captures database changes as events, which streaming systems then process. Understanding streaming concepts is a prerequisite for building CDC pipelines.
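Here is a minimal sketch of the consuming side of CDC: replaying change events to rebuild a table keyed by primary key.

The event shape is an assumption loosely modeled on Debezium's change-event envelope:
- `op` is "c" (create), "u" (update) or "d" (delete).
- Row images are in "before" and "after".

`apply_change` is hypothetical.

```python
def apply_change(table, event):
    """Apply one change event to an in-memory table keyed by primary key `id`."""
    op = event["op"]
    if op in ("c", "u"):
        row = event["after"]
        table[row["id"]] = row  # insert or overwrite with the new row image
    elif op == "d":
        table.pop(event["before"]["id"], None)  # remove the deleted row if present
    else:
        raise ValueError(f"unknown op {op!r}")
    return table


changes = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "pending"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "pending"}},
    {"op": "u", "before": {"id": 1, "status": "pending"}, "after": {"id": 1, "status": "paid"}},
    {"op": "d", "before": {"id": 2, "status": "pending"}, "after": None},
]
orders = {}
for change in changes:
    apply_change(orders, change)
```

Applying a row's events out of order (say, the delete before the create) would leave a row that should be gone. This is why CDC topics are typically keyed by primary key, so all changes to a row land on one partition.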

Related skills

These fundamentals prepare you for hands-on Kafka work in Kafka & Stream Processing.
Advanced streaming patterns are implemented hands-on in Apache Flink.
Streaming systems require well-designed event schemas, covered in Event-Driven Design.

Why this skill matters

Streaming foundations are the dividing line between mid-level and senior data engineers. Once you can reason about partitions, watermarks, and delivery semantics, you can debug any streaming engine in production, not just the one you trained on.

Common questions about Streaming

What is stream processing?

Stream processing analyzes and transforms data continuously as events arrive, rather than waiting for batch intervals. It powers real-time dashboards, fraud detection, and event-driven architectures.

When should I use streaming vs batch?

Use streaming when latency matters — fraud detection, real-time alerts, live dashboards. Use batch for historical analysis, large aggregations, and cost-sensitive workloads. Most teams use both.

How long does it take to learn streaming?

Core concepts like windowing and delivery guarantees take 2-3 weeks. Production-level streaming with state management and exactly-once semantics takes 2-3 months of practice.

What is exactly-once processing?

Exactly-once ensures each event affects the results precisely one time, even when failures force retries or replays. It requires coordination between source, processor, and sink, and it is critical for financial and transactional data.
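One way to coordinate source, processor, and sink is to store the consumed source offset in the same atomic commit as the output. Kafka reaches a similar effect by committing consumer offsets inside a producer transaction.

The toy sketch below assumes a single process and in-memory state. `TransactionalSink` and `run` are hypothetical.

```python
class TransactionalSink:
    """Toy sink whose output rows and source offset change in one step."""

    def __init__(self):
        self.rows = []
        self.offset = 0  # next source position to read, committed with the rows

    def commit(self, new_rows, next_offset):
        # Rows and offset are replaced in the same statement, modeling an atomic commit.
        self.rows, self.offset = self.rows + new_rows, next_offset


def run(source, sink, batch_size, crash_before_commit_at=None):
    """Read-process-write loop that always resumes from the sink's committed offset."""
    while sink.offset < len(source):
        start = sink.offset
        batch = source[start:start + batch_size]
        rows = [value * 2 for value in batch]  # the "processing" step
        if start == crash_before_commit_at:
            return False  # simulated crash: this batch's work is lost, nothing is committed
        sink.commit(rows, start + len(batch))
    return True


source = [1, 2, 3, 4, 5]
sink = TransactionalSink()
first_attempt = run(source, sink, batch_size=2, crash_before_commit_at=2)
second_attempt = run(source, sink, batch_size=2)
```

The crashed batch is simply recomputed on restart. The committed offset moved only together with its output, so nothing is lost or written twice.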

Do data engineers need streaming skills?

Yes. Streaming is expected for mid-to-senior data engineers. Even batch-focused roles require understanding event-driven patterns as companies adopt real-time architectures.

What tools are used for stream processing?

Apache Kafka for messaging, Apache Flink for stateful stream processing and complex event processing, Spark Structured Streaming for unified batch and streaming, and cloud services like Amazon Kinesis and Google Cloud Pub/Sub.

Phase 1 free. Full access in Professional.

Last updated 2026-05-22 by AI-DE Engineering Team

Phases: 3
Modules: 8
Time: ~22h video + labs

Phase roadmap.

Phase 1: Streaming First Steps

1.1 Streaming First: Events vs Batches
1.2 Streaming vs Batch Architecture

Used in:
P02: Uber Event Platform (system design)
P20: Real-time fraud on Kafka Streams

Phase 2: Processing Patterns

2.1 Kafka Core: Partitions, Brokers, Topics
2.2 Delivery Guarantees & Semantics
2.3 Time, Windows & Watermarks
2.4 Stateful Stream Processing

Used in:
P01: Flink fraud detection
P20: Real-time fraud on Kafka Streams
P24: Real-time fraud feature store

Phase 3: Production Streaming

3.1 Event-Driven Architecture at Scale
3.2 Streaming Platform Operations

Used in:
P01: Flink fraud detection
P24: Real-time fraud feature store
