What is Apache Kafka?
The distributed event streaming platform that powers real-time data pipelines at LinkedIn, Netflix, Uber, and thousands of companies — ingesting millions of events per second with fault-tolerant, replay-able storage.
Quick Answer
Apache Kafka is a distributed event streaming platform. Producers publish events to topics; consumers subscribe and read them independently. Kafka stores events in a durable, ordered log — consumers can replay historical events at any time. Built for millions of events per second with horizontal scalability and sub-second latency.
What is Apache Kafka?
Apache Kafka was created at LinkedIn to handle their activity stream — tracking every click, view, and interaction across the platform. It was open-sourced in 2011 and became an Apache top-level project in 2012. Today it processes trillions of events daily across the world's largest data platforms.
Unlike traditional message queues that delete messages after delivery, Kafka retains all events in an append-only log. This makes it both a messaging system and a storage system — consumers can read live events or replay historical ones from any point in time.
Producers
Applications that write events to Kafka topics. Decoupled from consumers — producers don't know or care who reads the data.
Brokers
Kafka servers that store topic partitions and serve reads/writes. A Kafka cluster has multiple brokers for fault tolerance and parallelism.
Consumers
Applications that read events from topics. Consumer groups share partitions for parallel processing. Each group tracks its own offset independently.
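Events with the same key always land on the same partition, which is what preserves per-key ordering across producers and consumers. A simplified sketch of that routing (Kafka's real default partitioner uses murmur2 hashing; the md5-based hash here is only for a stable illustration):

```python
# Simplified sketch of key-based partition routing.
# Kafka's default partitioner uses murmur2; md5 here is just a stable stand-in.
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition deterministically."""
    digest = int.from_bytes(hashlib.md5(key).digest()[:4], "big")
    return digest % num_partitions

# Every event for user 42 lands on the same partition,
# so that user's events are consumed in order.
assert partition_for(b"user-42", 6) == partition_for(b"user-42", 6)
```

Because routing is deterministic, ordering holds per key without any coordination between producers.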
Why Kafka Matters
Without Kafka
- ✗ Direct service-to-service API calls create tight coupling
- ✗ Downstream failures cascade and take down producers
- ✗ No event replay — lost data is gone forever
- ✗ Can't add new consumers without modifying producers
- ✗ Batch jobs only — no real-time event processing
With Kafka
- ✓ Fully decoupled — producers and consumers evolve independently
- ✓ Consumers fail and restart without losing events (offsets track position)
- ✓ Replay any historical window from the retained log
- ✓ New consumers added with zero producer changes
- ✓ Real-time pipelines processing events within milliseconds
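The replay point is where Kafka departs most from a queue: because events stay in the log, a consumer can seek back to any retained timestamp. A hedged sketch with the kafka-python client (the topic name and one-hour window are illustrative, and `replay_window` needs a reachable broker):

```python
# Sketch: rewind a consumer to a point in time with kafka-python.
# Topic name and window size are illustrative assumptions.
import time

def start_of_window_ms(hours_back, now_ms=None):
    """Millisecond timestamp marking the start of the replay window."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    return now_ms - hours_back * 3600 * 1000

def replay_window(topic='user-events', hours_back=1):
    # Needs a live broker; kafka-python is imported lazily so the
    # pure helper above works without one.
    from kafka import KafkaConsumer, TopicPartition
    consumer = KafkaConsumer(bootstrap_servers=['localhost:9092'])
    partitions = [TopicPartition(topic, p)
                  for p in consumer.partitions_for_topic(topic)]
    consumer.assign(partitions)
    # Find the earliest offset at or after the cutoff on each partition...
    cutoff = start_of_window_ms(hours_back)
    offsets = consumer.offsets_for_times({tp: cutoff for tp in partitions})
    # ...and rewind each partition to it before consuming.
    for tp, ot in offsets.items():
        if ot is not None:
            consumer.seek(tp, ot.offset)
    return consumer
```

A restarted consumer could call `replay_window(hours_back=24)` to reprocess yesterday's events without any producer involvement.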
What You Can Do with Kafka
Real-Time Event Streaming
Stream clickstream, user activity, IoT sensor data, and application logs with millisecond latency.
Change Data Capture (CDC)
Capture database changes from Postgres, MySQL, or MongoDB in real time and propagate to downstream systems.
Microservices Decoupling
Replace synchronous API calls between services with async event publishing — producers and consumers evolve independently.
Real-Time Analytics
Feed data warehouses and OLAP stores (ClickHouse, Druid) with live event streams for sub-second dashboards.
Log Aggregation
Collect application and infrastructure logs from thousands of services into a single, searchable stream.
Stream Processing
Connect Kafka to Flink or Spark Streaming for real-time joins, aggregations, fraud detection, and ML inference.
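Of the patterns above, CDC is the most structured: change events typically arrive in an envelope describing the operation and the row state. A hedged sketch that applies a Debezium-style payload to an in-memory replica (the `op`/`before`/`after` field names follow Debezium's envelope convention; the events themselves would come from a connector, not this code):

```python
# Sketch: apply Debezium-style CDC envelopes to a keyed replica.
# Field names follow Debezium's 'op'/'before'/'after' convention.

def apply_change(event: dict, table: dict) -> None:
    """Apply one CDC event to an in-memory replica keyed by id."""
    op = event["op"]
    if op in ("c", "u", "r"):   # create, update, snapshot read
        row = event["after"]
        table[row["id"]] = row
    elif op == "d":             # delete: only 'before' carries the key
        table.pop(event["before"]["id"], None)

replica = {}
apply_change({"op": "c", "after": {"id": 1, "email": "a@x.io"}}, replica)
apply_change({"op": "u", "after": {"id": 1, "email": "b@x.io"}}, replica)
```

Replaying the topic from offset zero rebuilds the replica from scratch — the same replay property that makes Kafka suitable for event sourcing.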
How Kafka Works
Events flow from producers → brokers (partitioned topics) → consumer groups. Each partition is an ordered, immutable log:
- Producer: writes events
- Topic / Partition: durable, ordered log
- Consumer Group: reads at its own offset
- Downstream: DB, stream processor, data lake
Producing and consuming events with the Python kafka-python client:
```python
from kafka import KafkaProducer, KafkaConsumer
import json

# Producer: publish events to a topic
producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    value_serializer=lambda v: json.dumps(v).encode()
)
producer.send('user-events', {'user_id': 42, 'action': 'purchase'})
producer.flush()  # block until buffered events are actually delivered

# Consumer: read events from a topic
consumer = KafkaConsumer(
    'user-events',
    bootstrap_servers=['localhost:9092'],
    group_id='analytics-service',
    auto_offset_reset='earliest'  # replay from the beginning if no committed offset
)
for msg in consumer:
    process_event(json.loads(msg.value))
```

Kafka vs Other Tools
Kafka vs RabbitMQ
Apache Kafka
- Log-based: events are retained and replayable
- Millions of events/sec with horizontal scaling
- Multiple consumer groups read the same events independently
- Designed for streaming and event sourcing
RabbitMQ
- Queue-based: messages deleted after acknowledgement
- Simpler setup, better for low-volume task queues
- Flexible routing with exchanges and bindings
- Better for RPC-style request/response patterns
Kafka vs Apache Pulsar
Apache Kafka
- Massive ecosystem and community (Confluent, MSK, Aiven)
- Battle-tested at extreme scale (LinkedIn, Netflix)
- KRaft mode removes ZooKeeper dependency
- Simpler architecture, easier to operate
Apache Pulsar
- Native multi-tenancy built in from the start
- Tiered storage (BookKeeper + object storage) natively
- Geo-replication out of the box
- Younger ecosystem, fewer managed services
Kafka vs Redis Pub/Sub
Apache Kafka
- Durable: events stored on disk, retained indefinitely
- Consumer offset tracking — resume after failure
- Scales to millions of events/sec across brokers
- Built for production-grade streaming workloads
Redis Pub/Sub
- In-memory: messages lost if consumer is offline
- No offset tracking — fire-and-forget delivery
- Extremely fast for ephemeral notifications
- Zero setup overhead for simple use cases
| Feature | Kafka | RabbitMQ | Redis Pub/Sub |
|---|---|---|---|
| Event durability | ✓ (disk) | ✓ (queue) | ✗ (memory only) |
| Event replay | ✓ | ✗ | ✗ |
| Multiple consumers | ✓ (groups) | Limited | ✓ (broadcast) |
| Throughput | Millions/sec | Thousands/sec | Millions/sec |
| Ordering | Per partition | Per queue | ✗ |
| Setup complexity | Medium | Low | None |
Common Kafka Mistakes
Using too few partitions
Partitions are the unit of parallelism. If you have 1 partition, only 1 consumer in a group can process events. Start with partitions = max consumers you ever expect, and over-partition rather than under-partition.
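Partition count is fixed at topic creation in the sense that it can be increased later but never decreased, so the headroom has to be chosen up front. A sketch with kafka-python's admin client (names, counts, and the ×2 headroom rule are illustrative assumptions):

```python
# Sketch: create a topic with headroom over the expected consumer count.
# Names, counts, and the x2 headroom factor are illustrative assumptions.

def partition_count(max_expected_consumers, headroom=2):
    """Over-partition so the consumer group can scale out later."""
    return max_expected_consumers * headroom

def create_topic(name='user-events', expected_consumers=12):
    # Needs a live broker; kafka-python admin client imported lazily
    from kafka.admin import KafkaAdminClient, NewTopic
    admin = KafkaAdminClient(bootstrap_servers=['localhost:9092'])
    admin.create_topics([NewTopic(
        name=name,
        num_partitions=partition_count(expected_consumers),
        replication_factor=3,
    )])
```

With 12 expected consumers this creates 24 partitions, so a doubled consumer group still has one partition each rather than idle members.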
Not setting retention policy for your use case
Default Kafka retention is 7 days. For event sourcing or audit logs you may need indefinite retention. For ephemeral event routing you may want 1 hour. Set the broker-wide default with log.retention.hours, and override it per topic with the topic-level retention.ms config.
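Per-topic overrides go through the topic-level `retention.ms` config. A sketch using kafka-python's admin client (topic names and windows are illustrative; `-1` means retain indefinitely):

```python
# Sketch: override retention per topic via 'retention.ms'.
# Topic names and retention windows are illustrative assumptions.

def retention_ms(hours):
    """Kafka expects retention.ms as a string of milliseconds."""
    return str(int(hours * 3600 * 1000))

def set_retention():
    # Needs a live broker; kafka-python admin client imported lazily
    from kafka.admin import KafkaAdminClient, ConfigResource, ConfigResourceType
    admin = KafkaAdminClient(bootstrap_servers=['localhost:9092'])
    admin.alter_configs([
        # Ephemeral routing topic: keep events for 1 hour
        ConfigResource(ConfigResourceType.TOPIC, 'click-routing',
                       configs={'retention.ms': retention_ms(1)}),
        # Audit topic: -1 retains events indefinitely
        ConfigResource(ConfigResourceType.TOPIC, 'audit-log',
                       configs={'retention.ms': '-1'}),
    ])
```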
Using Kafka as a database
Kafka is not a query engine. You can't do ad-hoc queries, joins, or aggregations directly on topics. Use Kafka to feed real databases, data warehouses, or stream processors (Flink, ksqlDB) for query workloads.
Committing offsets before processing
If you commit the offset before successfully processing the message, a consumer crash will skip that message permanently. Commit after successful processing, not before.
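With kafka-python this means disabling auto-commit and committing only after the handler returns. A sketch (the topic, group id, and handler are illustrative):

```python
# Sketch: at-least-once consumption — commit only after processing
# succeeds. Topic/group names and the handler are illustrative.
import json

def process_event(event: dict) -> dict:
    """Placeholder handler; a crash here must not lose the event."""
    return {**event, "processed": True}

def consume_forever():
    # Needs a live broker; kafka-python imported lazily
    from kafka import KafkaConsumer
    consumer = KafkaConsumer(
        'user-events',
        bootstrap_servers=['localhost:9092'],
        group_id='analytics-service',
        enable_auto_commit=False,  # commit manually, after processing
    )
    for msg in consumer:
        process_event(json.loads(msg.value))
        consumer.commit()  # offset advances only after success
```

If the process crashes between `process_event` and `commit`, the event is redelivered on restart — duplicate-safe handlers (idempotency) are the usual companion to this pattern.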
One topic for everything
Putting all event types into a single topic makes schema evolution, access control, and consumer filtering difficult. Use separate topics per event type or domain entity.
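A lightweight way to enforce the per-domain split is a shared topic-naming helper. A minimal sketch — the `domain.event-type` convention here is an assumption, not a Kafka requirement:

```python
# Sketch: one topic per domain entity + event type, instead of a
# single catch-all topic. The naming convention is an assumption.

def topic_for(domain: str, event_type: str) -> str:
    """e.g. 'orders.payment-captured' rather than a global 'events' topic."""
    return f"{domain}.{event_type}".lower().replace(" ", "-")

assert topic_for("orders", "payment captured") == "orders.payment-captured"
```

Separate topics let you grant ACLs per domain and evolve each event schema independently.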
Who Should Learn Kafka?
Junior DE
You're building event-driven pipelines for the first time. Learning Kafka topics, producers, and consumer groups gives you the foundation for any real-time data architecture.
Senior DE
You own streaming reliability. Partition strategies, exactly-once semantics, schema registry, consumer lag alerting, and CDC patterns are your toolkit for production Kafka.
Staff DE
You design the event platform. Multi-cluster topology, tiered storage, cross-datacenter replication, and capacity planning for 1B+ events/day is where staff-level impact lives.
Related Concepts
Frequently Asked Questions
- What is Apache Kafka?
- Apache Kafka is a distributed event streaming platform originally built at LinkedIn and open-sourced in 2011. It acts as a durable, high-throughput log that producers write events to and consumers read from. Kafka decouples data producers from consumers, enabling real-time pipelines that can handle millions of events per second.
- What is a Kafka topic?
- A Kafka topic is a named category or feed where events are published. Topics are split into partitions for parallelism — each partition is an ordered, immutable log of events. Consumers read from topics by tracking their offset (position) in each partition. Topics can retain events for hours, days, or indefinitely.
- What is the difference between Kafka and a traditional message queue?
- Traditional message queues (RabbitMQ, ActiveMQ) delete messages after they are consumed. Kafka retains all events in an ordered log for a configurable retention period. This means multiple consumer groups can independently read the same events, you can replay historical data, and events are never lost on consumer failure.
- What is Kafka used for in data engineering?
- Kafka is used for real-time event streaming, change data capture (CDC) from databases, decoupling microservices, log aggregation, clickstream analytics, fraud detection pipelines, and feeding data lakes with real-time events. It is the backbone of most modern streaming data architectures.
- What is the difference between Kafka and Spark Streaming?
- Kafka is the transport layer — it ingests, stores, and delivers events. Spark Streaming (or Flink) is the processing layer — it reads from Kafka and applies transformations, aggregations, and joins. They are complementary: Kafka is the queue, Spark/Flink is the compute engine.
What You'll Build with AI-DE
In the Kafka Event Routing project, you'll design Uber's event platform from first principles:
- Decompose requirements and run capacity estimation (10K → 1B events/day)
- Design Kafka topic strategy, delivery guarantees, and storage tiering
- Architect the serving layer for real-time driver ETA and surge pricing
- Design observability stack, disaster recovery, and SLA framework