Skip to content
ai-de.net/Projects/P20 · Real-time fraud detection on Kafka Streams
PRO · module 01 free previewStreaming trackP20

Build a
real-time
fraud detection topology on Kafka Streams

Ship a stateful Kafka Streams pipeline with KRaft + Avro + Schema Registry, 5-tier risk-score branching, KStream-KTable enrichment joins, Spring Boot Interactive Queries REST, and Strimzi K8s manifests with Exactly-Once Semantics v2 — no Flink cluster required.

Timeline
11-14 hours
Difficulty
Mid → Senior
Stack
Kafka Streams · Strimzi · Avro · Spring Boot

This is the system-design question every streaming team gets asked at Pinterest, Walmart, NYT and any Kafka-centric shop: how do you build a stateful, exactly-once fraud detection pipeline without standing up a separate Flink cluster?

By the end you will have
  • Single-broker KRaft Kafka + Confluent Schema Registry + Avro Transaction schema running in Docker
  • 5-tier risk-score branch topology (RuleFilters + VelocityChecker 5-min tumbling + BranchProcessor) emitting 12+ output topics
  • DLQ pattern with custom Processor for validation errors + alert severity routing (CRITICAL / HIGH / MEDIUM)
  • KStream-KTable enrichment joins (customer profile + merchant reputation) with materialized state stores
  • Spring Boot Interactive Queries REST API querying RocksDB-backed state stores
  • Strimzi 3-broker kafka-cluster.yaml + EOSConfiguration.java (EOS v2) + Grafana dashboard JSON + AlertManager rules
PREREQJava fundamentals (classes, streams API, lambda expressions), Docker basics, Kafka basics (topics, producers, consumers), and Kubernetes concepts (pods, services). Not an introduction to Kafka — assumes you know what a partition is.
streamguard.fraud_detection · run 8c4f2e
EOS v2
Producers
Kafka · Avro
Topology
State · sinks
checkout.svc
auth.svc
merchant.svc
session.svc
Avro · auto-register
kraft-brokerno zookeeper
schema-registry4 Avro subjects
transactionsinput topic
KRaft · TLS · ACL
RuleFiltersstateless
VelocityChk5-min tumb
BranchProc5 risk tiers
DLQHandlervalidation
KStream · KTable
RocksDBKTable · sessions
REST · IQspring-boot
12+ topicsalerts · dlq
Connect · ES · PG
Connect JSON · ES · PG
# EOS v2 · transactional commit
PROCESSING_GUARANTEE_CONFIG=
EXACTLY_ONCE_V2
idempotent producer · read_committed
→ no duplicate fraud alerts on rebalance
● 5-tier risk branch · split()
score > 90 → critical-fraud-alerts
70..89 → manual-review-queue
40..69 → require-authentication
→ 12+ topics · DLQ · severity routing
5-tier
Risk-score branch
12+
Output topics
EOS v2
Transactional commit
Why this matters in 2026

Kafka Streams is the library answer to streaming.

Most teams reach for Flink and inherit a separate cluster, separate scheduler, separate ops surface. Kafka Streams runs in-process — same JAR, same K8s deployment — and that's the choice Pinterest, NYT, and Walmart made.

In-process > separate cluster

Kafka Streams is a library, not a runtime. You package one Java app, deploy it on K8s, and the topology is the program. No JobManager, no TaskManagers, no separate infra to operate.

Stateful processing without Flink

RocksDB state stores + changelog topics give you durable, queryable state. KTable joins, session windows, and tumbling aggregations all work without a checkpoint coordinator.

Exactly-once that ships

EOS v2 = transactional producers + read_committed consumers + idempotent state. PROCESSING_GUARANTEE_CONFIG=EXACTLY_ONCE_V2 is one config line; the rest is correctness.

Strimzi turns the messy parts into YAML

3 brokers, transaction-state replication, TLS listeners, HPA on the streams app — all declarative. The K8s operator pattern is how Kafka actually runs in production now.

Curriculum · 4 modules · 11-14 hours

Module 01 is free. The rest unlocks with PRO.

Try the first 2-3 hours — stand up KRaft Kafka, register an Avro schema, ship a producer + consumer in real Java. If it clicks, upgrade to unlock the fraud topology, KTable joins, and the Strimzi packaging.

P20 · 11-14 hours · 4 modules
Free preview PRO required
Module 01 is free — no card required. KRaft + Schema Registry + your first Kafka Streams topology in Java.
M01
Local setup: KRaft Kafka + Schema Registry + Avro producer
Stand up Kafka in KRaft mode (no ZooKeeper) via Docker Compose with Confluent Schema Registry and Kafka UI. Author the Transaction.avsc Avro schema. Ship a TransactionProducer with auto-registration and a BasicConsumer with a windowed aggregation by category.
2.5h8 lessonsFREE PREVIEW
Start →
M02
Stateful fraud detection: rules, windows, branches, DLQ
Build RuleFilters (stateless predicates), VelocityChecker (5-minute tumbling windows with groupByKey().count()), BranchProcessor (5-tier risk score routing via split().branch()), and a custom DLQ Processor for validation errors. Ship 12+ output topics including critical-fraud-alerts.
2.5h9 lessonsPRO TIER
Unlock with PRO →
M03
Enrichment: KTable joins, sessions, Interactive Queries REST
Build CustomerProfileTable + MerchantReputationTable as KTables backed by changelog topics. Author EnrichmentJoins with KStream-KTable leftJoin for context-aware fraud scoring. Add a Spring Boot REST controller exposing the materialized state stores via Interactive Queries. Ship Kafka Connect JSON configs for JDBC + Elasticsearch sinks (configs ship; live sinks are a guided exercise).
3h9 lessonsPRO TIER
Unlock with PRO →
M04
Production packaging: Strimzi + EOS v2 + Grafana + AlertManager
Author kafka-cluster.yaml as a Strimzi Kafka custom resource (3 brokers, transaction-state replication, TLS). Wire EOSConfiguration.java with PROCESSING_GUARANTEE_CONFIG=EXACTLY_ONCE_V2 + idempotent producer + read_committed consumer. Ship a Grafana dashboard JSON, AlertManager routing rules, and a deploy.yml CI/CD pipeline. Failure scenarios shipped as documented runbooks.
3h8 lessonsPRO TIER
Unlock with PRO →
3 modules locked · Unlock all PRO content for $29/mo
Upgrade to PRO →
Backed by curriculum

Kafka Streams Learning Path

8 modules·~12 hours·KRaft + Avro·windowed aggregations·KTable joins·session windows·EOS v2
Open curriculum

This curriculum is the foundation for the project — not a sales add-on. PRO subscribers get full access to every module.

The build, in 3 phases

Three sprints. Three checkpoints. One fraud detection topology.

Each phase ends with a tagged commit and a working artifact. No ambiguity about where you are in the build.

01~5h
Ingest + detect (parts 1 + 2)

KRaft Kafka + Schema Registry + Avro producer running locally. 5-tier risk-score branch topology with VelocityChecker (5-min tumbling) + DLQ + alert severity routing emitting to 12+ topics.

  • docker-compose.yml + Transaction.avsc + TransactionProducer
  • RuleFilters + VelocityChecker + BranchProcessor + DLQHandler
  • AlertStream merging 3 alert streams by severity
02~3h
Enrich + query (part 3)

KTable joins for customer + merchant context. Spring Boot Interactive Queries REST API exposing materialized state stores. Kafka Connect JSON configs for JDBC + ES sinks.

  • CustomerProfileTable + MerchantReputationTable
  • EnrichmentJoins with leftJoin + combined fraud score
  • Spring REST QueryController + Connect connector JSONs
03~3h
Package for K8s (part 4)

Strimzi 3-broker kafka-cluster.yaml + EOSConfiguration.java with EOS v2 + Grafana dashboard JSON + AlertManager routing rules + deploy.yml CI/CD.

  • Strimzi Kafka custom resource + HPA
  • EOSConfiguration.java (EXACTLY_ONCE_V2)
  • Grafana JSON + AlertManager rules + deploy.yml
Project setup · 10 minutes

One command. Local KRaft Kafka + Schema Registry + sample fixtures.

You get the full streaming stack on day one — KRaft Kafka (no ZooKeeper), Confluent Schema Registry, Kafka UI for inspection, and 1k pre-built sample transactions across the schema you'll author in module 01.

What lives in the repo

Everything you need to run the Kafka Streams fraud topology on your laptop, plus the Avro schemas and verification fixtures used in modules 02–04.

  • docker-compose.yml — KRaft Kafka, Schema Registry, Kafka UI
  • src/main/avro/ — Transaction.avsc + 4 Avro schemas
  • src/main/java/com/streamguard/ — producers, consumers, fraud topology
  • src/main/java/com/streamguard/fraud/ — RuleFilters, VelocityChecker, BranchProcessor, DLQHandler
  • k8s/ — Strimzi kafka-cluster.yaml + HPA + ConfigMap manifests
  • monitoring/ — Grafana dashboard JSON + AlertManager routing rules
Download · Starter Kit

StreamCart Analytics Starter Kit

Kafka Streams DSL scaffolds, 4 Avro schemas, Schema Registry registration helper, 1k sample transactions, K8s + Strimzi manifests, Grafana dashboard JSON, and a smoke-test script. `bash scripts/smoke_test.sh` runs after `docker compose up -d`.

Java sources + Avro + K8s YAML + Grafana JSON · PRO required
~/projects/streamcart-analytics — zsh
1. Clone and start the stack
$ git clone github.com/ai-de/p20-streamcart-analytics
$ cd p20-streamcart-analytics && docker compose up -d
2. Compile + register Avro schemas
$ mvn compile
$ bash scripts/register_schemas.sh
$ # ✓ Transaction, CustomerProfile, MerchantProfile, Alert registered
3. Run the producer + smoke test
$ mvn exec:java -Dexec.mainClass=com.streamguard.TransactionProducer &
$ bash scripts/smoke_test.sh
$ # ✓ 1,000 events produced · 5-tier branching ✓
4. Open Kafka UI / Schema Registry
$ open http://localhost:8080 # Kafka UI
$ open http://localhost:8081/subjects # Schema Registry
1,000
Sample transactions
4
Avro schemas
12+
Output topics
5-tier
Risk-score branch
Production hardening

The same topology — but built for the 10x case.

Most Kafka Streams tutorials hand you a KStream.foreach(). This one shows what changes when EOS v2 has to survive a broker reboot, the schema registry rejects a renamed field, and the alert routing actually pages a human.

Tutorial-gradeWhat you have today
×
Kafka cluster
Single-broker KRaft container
×
Notification channels
Routing rules + adapter skeletons
×
Deploy
K8s YAML applied by hand
×
State store recovery
Single replica, in-memory restore
×
Schema Registry
Local container, no auth
×
Chaos labs
Documented runbooks
Production-gradeModule 03–04
Kafka cluster
3-broker Strimzi MSK with rack-aware partitions and min.insync.replicas=2
Notification channels
PagerDuty / Slack / Email clients with retry + circuit breaker per channel
Deploy
ArgoCD GitOps with progressive delivery and kubectl rollout status gates
State store recovery
Standby replicas (num.standby.replicas=1) + remote checkpoints on EBS
Schema Registry
Confluent Cloud with subject ACLs + FULL_TRANSITIVE compatibility
Chaos labs
Scheduled chaos-mesh experiments: broker kill, network partition, disk full
PRO benefit · code review

Real review from senior engineers who shipped this stack.

Submit your repo, get line-by-line feedback within 48 hours. The kind of review that's quietly worth thousands of dollars in time-to-staff.

CR

4 reviews / month

Submit a repo, a PR, or a refactor proposal. Reviewer is matched to your domain — Kafka Streams / Strimzi / EOS for this project. Async, comments inline, average turnaround 31 hours.

31h
avg turnaround
9.2/10
helpfulness
94%
return next month
OH

2 office hours / month

Live 30-min sessions with a senior data engineer. Architecture questions, whiteboard a tricky migration, mock a system-design interview. Group sessions also available.

30 min
per session
2 / mo
included
+ group
unlimited
What PRO unlocks

One subscription. 15+ projects, all curriculum, code review.

PRO is built for engineers who want production-shaped builds and feedback loops — not more tutorials.

What you getFREEPROEXPERT
Projects
Production-grade builds
2
15+
8
Curriculum modules
All 7 tracks
Phase 1 only
All
All + bonus
Code review credits
Senior engineer review
0
4 / month
Unlimited
Career path access
5 paths × full plans
1 path
All 5
All 5 + 1:1
Certificate
Verifiable on LinkedIn
Yes
Yes + portfolio review
Community
Discord + office hours
Read-only
Full + 2/mo
Full + 4/mo
$29/mo
billed monthly · cancel anytime
or annual
$249/yr save 28%
Upgrade to PRO
Who this is for

Pick this if you’re shipping streaming in Java, not just learning Kafka concepts.

SE

Streaming engineers

You’ve consumed from a Kafka topic, but the stateful side — KTable joins, sessions, EOS — is still abstract. This makes it concrete in real Java.

DE

Data engineers

You run a batch warehouse but the org wants real-time fraud scoring. You need a streaming pattern you can defend without buying into the Flink operator stack.

PE

Platform engineers

You operate Kafka for 5+ teams. You want a Strimzi-on-K8s reference your downstream teams can copy without burning your ops budget on cluster sprawl.

BE

Backend engineers crossing over

You write Java services. The data side is opaque. This bridges from Spring Boot REST → Kafka Streams topology in language you already speak.

FAQ

Quick answers.

Module 01 ships KRaft Kafka + Confluent Schema Registry + Avro Transaction schema + a real Java TransactionProducer with auto-registration + a BasicConsumer with a windowed groupBy aggregation. 8 lessons, real `mvn compile`-able code. Most free Kafka tutorials show you `kafka-console-producer.sh` and stop there.
No live cloud (no real AWS / Confluent Cloud — everything runs in Docker). No Flink (we used to advertise that in metadata, which was misleading; this project is 100% Kafka Streams). No live PagerDuty / Slack / SMS / Email integrations — alert ROUTING ships, channel client implementations are guided exercises. K8s blue/green ships as Strimzi YAML + a deploy.yml; chaos labs ship as documented runbooks, not scripted experiments.
P01 is Flink-on-Kubernetes with the Flink K8s Operator, RocksDB checkpointing, 2PC for exactly-once, and ZK HA — a separate runtime. This project is Kafka Streams (an in-process Java library) on Strimzi-managed Kafka with EOS v2 + transactional producers. Same problem (real-time fraud), opposite ops surface: separate cluster vs library-in-the-app.
No. Everything runs locally — KRaft Kafka + Schema Registry + Kafka UI + Spring Boot REST in Docker. Strimzi YAML is shown for K8s but doesn't require a live cluster to read and reason about. Patterns transfer cleanly to MSK / Confluent Cloud with config changes only.
EOS v2 is real on the producer side: PROCESSING_GUARANTEE_CONFIG=EXACTLY_ONCE_V2 sets idempotent producer + transactional commits + read_committed consumer. Whether the downstream consumer is end-to-end exactly-once depends on its commit semantics (Kafka Connect sinks have their own delivery semantics). We show the producer/topology pattern, document the assumptions, and don't claim end-to-end E2E exactly-once across heterogeneous sinks.
All 15+ PRO projects, 4 code-review credits per month, 2 office-hours sessions, full curriculum across all 7 tracks, all 5 career paths, certificate of completion, and full community access. Cancel anytime.

Ready to ship a real streaming topology?

Start with module 01 — free, no card. About 2-3 hours. By the end you'll have KRaft Kafka + Schema Registry + your first Avro-serialized Kafka Streams topology running locally with Java.

P20 · Real-time fraud detection on Kafka Streams · PRO · module 01 freeUpgrade to PRO →
Press Cmd+K to open