Build the streaming feature spine StreamGuard runs on.
FinServe needs millisecond features for its fraud model. Build the full real-time feature spine: Feast 0.37 with 22 features across customer / account / merchant entities, Spark batch + Parquet for offline, Redis 7 + FastAPI for online (p99 < 10ms validated by Locust), Kafka 7.5 + Confluent Schema Registry + Avro contracts + Spark Structured Streaming with watermarks and 1m / 5m / 1h sliding windows, Evidently AI drift detection, and a Helm / K8s deployment with GitHub Actions feature-validation CI.
The “walk me through how you’d ship features to a fraud model in milliseconds” question — asked at any fintech with real-time risk decisions on the critical path (Stripe, Plaid, Chime, Robinhood).
- A Feast 0.37 project with 22 features across 3 entities (customer / account / merchant)
- Spark batch + Parquet offline store and Redis 7 + FastAPI online (p99 < 10ms validated by Locust)
- Kafka 7.5 + Confluent Schema Registry + Avro contracts + a 6-partition transaction topic
- Spark Structured Streaming with 1m / 5m / 1h sliding windows and 2-minute watermarks for late events
- Evidently AI drift detection (PSI + KS) + Prometheus exporter + Grafana dashboards
- Helm chart + Kubernetes manifests + a GitHub Actions feature-validation CI workflow
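To make the p99 < 10ms target concrete, here is a minimal sketch of how a 99th-percentile latency can be computed from raw samples. It uses the nearest-rank method, which is an assumption of this sketch; Locust reports its own percentile estimates and the project relies on those. The function name and the sample latencies are hypothetical.

```python
def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Integer ceiling of pct * n / 100, kept in integers to avoid float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies: 198 fast requests and 2 slow outliers.
latencies_ms = [2] * 198 + [50] * 2
p99 = percentile(latencies_ms, 99)
meets_budget = p99 < 10
```

A mean would hide the two slow requests entirely, and the max would fail the run on a single outlier. That is why the acceptance gate is stated as a p99.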
Real-time features are now their own engineering specialty.
Fraud, trust-and-safety, and personalization teams all need features in milliseconds. The bar isn't writing Spark jobs — it's owning the streaming spine: Schema Registry contracts that don't break, watermarks that handle late events, sliding windows that match the fraud model's expectations, and drift detection that catches feature decay before the model does. That's the engineer companies pay senior+ for in 2026.
Real-time feature engineering is its own discipline
It's not the data engineer who writes the dbt model and not the ML engineer who trains the classifier. It's the platform engineer who owns the streaming spine in between — and that's the role hiring managers actually call out.
Schema governance stops streaming feature outages
Confluent Schema Registry + Avro + BACKWARD compatibility is what reviewers look for. Hand-typed Kafka payloads aren't — they're the production incident that takes the fraud model offline at 3 AM.
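As a rough illustration of what BACKWARD compatibility protects, the sketch below checks whether a new Avro record schema can still read data written with the old one. It is a deliberately simplified stand-in, not the Schema Registry's actual algorithm: it only looks at top-level record fields and treats any type change as breaking, although Avro permits some promotions such as int to long. The schemas and field names are hypothetical.

```python
def backward_compat_problems(old_schema, new_schema):
    """Return reasons the new schema cannot read records written with the old schema."""
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    problems = []
    for field in new_schema["fields"]:
        old = old_fields.get(field["name"])
        if old is None:
            # A field unknown to old writers needs a default for the reader to fill in.
            if "default" not in field:
                problems.append(f"new field '{field['name']}' has no default")
        elif old["type"] != field["type"]:
            # Stricter than Avro itself: any type change is flagged.
            problems.append(f"field '{field['name']}' changed type")
    # Fields removed in the new schema are fine: the reader simply ignores them.
    return problems


# Hypothetical transaction contract, version 1.
v1 = {"type": "record", "name": "Transaction", "fields": [
    {"name": "txn_id", "type": "string"},
    {"name": "amount", "type": "double"},
]}
# Version 2 adds an optional field with a default: compatible.
v2_ok = {"type": "record", "name": "Transaction", "fields": v1["fields"] + [
    {"name": "channel", "type": "string", "default": "unknown"},
]}
# Version 2 adds a required field with no default: breaking.
v2_bad = {"type": "record", "name": "Transaction", "fields": v1["fields"] + [
    {"name": "channel", "type": "string"},
]}
```

In practice the registry runs this kind of check when a schema is registered, so a breaking change is rejected before any consumer sees it.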
Watermarks + windowing > batch every 5 minutes
If a fraud event arrives 90 seconds late, batch jobs miss it. Spark Structured Streaming with watermarks emits the right window even when the data shows up out-of-order. That's the difference between catching fraud in real-time and catching it in tomorrow's report.
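The late-event behavior described above can be sketched in plain Python without a Spark cluster. This is a simplified model, assuming a 5-minute window sliding every minute and a 2-minute watermark. Spark advances its watermark between micro-batches and manages state differently, so treat this as an illustration of the idea rather than Spark's implementation. The class and function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)           # window length
SLIDE = timedelta(minutes=1)            # a new window starts every minute
WATERMARK_DELAY = timedelta(minutes=2)  # tolerated lateness
EPOCH = datetime(1970, 1, 1)


def window_starts(event_time):
    """Start times of every sliding window [start, start + WINDOW) containing event_time."""
    start = event_time - (event_time - EPOCH) % SLIDE  # align to the slide grid
    starts = []
    while start > event_time - WINDOW:
        starts.append(start)
        start -= SLIDE
    return starts


class WindowedCounter:
    """Counts events per (key, window) and skips windows the watermark has closed."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.max_event_time = None

    def add(self, key, event_time):
        """Record one event and return how many windows it updated."""
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        watermark = self.max_event_time - WATERMARK_DELAY
        updated = 0
        for start in window_starts(event_time):
            if start + WINDOW <= watermark:
                continue  # window already finalized, so the late update is dropped
            self.counts[(key, start)] += 1
            updated += 1
        return updated


counter = WindowedCounter()
now = datetime(2024, 1, 1, 12, 10, 0)
on_time = counter.add("card_1", now)
late_90s = counter.add("card_1", now - timedelta(seconds=90))    # still counted
late_10m = counter.add("card_1", now - timedelta(minutes=10))    # every window is closed
```

The event that arrives 90 seconds late still lands in all five of its windows, while the event that is ten minutes late is dropped because every window it belongs to has already been finalized.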
Drift detection is now table stakes
Evidently's PSI + KS-test pattern is built to catch feature decay before the model starts misbehaving. Without it, you're flying blind. With it, you have a Prometheus gauge + a Grafana panel + an alert rule the security team will sign off on.
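For intuition on the PSI half of that pattern, here is a minimal sketch of a Population Stability Index over equal-width bins. The bin count, the epsilon floor, and the sample data are assumptions of this sketch. Evidently computes its own drift metrics, and this is not its implementation.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index of `current` against `reference`, using equal-width bins."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clip out-of-range values to edge bins
        # Floor each share at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical feature values: a training-time reference and a shifted live sample.
reference = [float(v) for v in range(100)]
shifted = [v + 50.0 for v in reference]

no_drift = psi(reference, reference)
drift = psi(reference, shifted)
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but thresholds should be tuned per feature. A score like this is the kind of value you would export as a Prometheus gauge.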
Part 01 is free. The rest unlocks with PRO.
Try the first ~3 hours — set up Feast, define entities + feature views, build Spark batch features over Parquet sources, and ship a point-in-time correct training set. If it clicks, upgrade to unlock the Redis serving layer, the Kafka + Schema Registry + Spark Streaming spine, and the production ops layer (Evidently drift + Helm/K8s + GH Actions CI).
Feature Stores for ML
The Feature Stores curriculum covers the primitives — this project shows you how to compose them into a production streaming feature platform on a real fintech dataset.
Three sprints. Three checkpoints. One millisecond feature spine.
Each phase ends with a tagged commit, a passing acceptance gate, and an artifact a senior platform reviewer would actually accept.
Feast registry with 22 features. PIT-correct training set generation. Redis 7 online store with materialization. FastAPI serving validated by Locust at p99 < 10ms.
- ✓ Feast project: 3 entities + 3 feature views + 22 features over Parquet sources
- ✓ PIT-correct training sets via store.get_historical_features()
- ✓ Redis 7 + materialize_incremental + FastAPI + Locust validating p99 < 10ms
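Point-in-time (PIT) correctness is the idea behind the training-set step above: for each labeled event, use only the latest feature value known at or before that event's timestamp, and only if it is still within its TTL. The sketch below shows that lookup for a single entity in plain Python. It illustrates the concept rather than Feast's implementation, and the function name and sample rows are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def point_in_time_value(feature_rows, as_of, ttl):
    """Latest value with event_timestamp <= as_of, or None if missing or older than ttl.

    feature_rows is a list of (event_timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in feature_rows]
    idx = bisect_right(timestamps, as_of) - 1
    if idx < 0:
        return None  # no feature value existed yet at as_of
    ts, value = feature_rows[idx]
    if as_of - ts > ttl:
        return None  # the newest value is too stale to use
    return value


# Hypothetical daily snapshots of a customer's 7-day transaction count.
rows = [
    (datetime(2024, 1, 1), 3),
    (datetime(2024, 1, 2), 5),
    (datetime(2024, 1, 3), 9),
]
ttl = timedelta(days=2)

# A label dated Jan 2 at noon must see the Jan 2 value, never the Jan 3 one.
value_at_label = point_in_time_value(rows, datetime(2024, 1, 2, 12), ttl)
```

A naive join on the latest value would hand the model the Jan 3 count for a Jan 2 label, which leaks future information into training.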
Kafka 7.5 + Confluent Schema Registry + Avro contracts. Transaction producer with fraud-burst injection. Spark Structured Streaming with watermarks + 1m / 5m / 1h sliding windows. Stream-batch consistency tests passing.
- ✓ Kafka 6-partition transaction topic + Confluent Schema Registry + Avro contracts
- ✓ Spark Structured Streaming with withWatermark + sliding windows + Redis sink
- ✓ Feature versioning + schema evolution + deprecation handlers
Evidently AI drift detection (PSI + KS) emitting Prometheus gauges. Grafana dashboards. GitHub Actions feature-validation CI. Helm chart deploying the full stack to K8s with readiness probes + smoke tests.
- ✓ Evidently drift detection + Prometheus exporter + Grafana dashboards + alert rules
- ✓ GitHub Actions feature-validation workflow (feast plan on PR)
- ✓ Helm chart + K8s manifests + smoke tests + readiness probes
One starter kit. Feast + Kafka + Spark + Redis + K8s, wired.
The starter kit ships a complete project skeleton — Feast registry with 22 feature definitions, sample Parquet sources (5K customers / 800 merchants / 6K accounts / 90 days), Kafka producer + Confluent Schema Registry config, Helm chart, and the GitHub Actions feature-validation workflow.
What lives in the repo
Everything you need to walk all 4 parts on your laptop — including the FinServe sample dataset that simulates a multi-merchant retailer at the row counts this project actually uses (not the inflated marketing numbers).
- feast-feature-store/ — Feast project with 3 entities + 3 feature views (22 features) + sources.yml
- data/raw/ — sample Parquet sources (5K customers / 800 merchants / 6K accounts / 90 days)
- src/streaming/ — Kafka producer, Avro schemas, Spark Structured Streaming windowing job
- src/serving/ — FastAPI app + connection pool + TTL cache + Locust load test
- monitoring/ — Evidently drift detector + Prometheus exporter + Grafana dashboard JSON
- helm/ + k8s/ — Helm chart + manifests + readiness probes + smoke tests
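The serving layer listed above pairs a connection pool with an in-process TTL cache. As a rough sketch of the caching half, the class below keeps recently fetched feature vectors in memory for a short time so hot keys skip the Redis round trip. It is a standard-library stand-in for illustration only, not the starter kit's code. The class name, the loader, and the feature name are hypothetical.

```python
import time


class TinyTTLCache:
    """Keeps loaded values in memory until they are ttl_seconds old."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock    # injectable so tests can control time
        self._entries = {}    # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, or call loader(key) and cache the result."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]   # fresh hit, no backend call
        value = loader(key)   # miss or expired entry: go to the online store
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


# Hypothetical loader standing in for a Redis lookup.
backend_calls = []


def load_features(customer_id):
    backend_calls.append(customer_id)
    return {"txn_count_5m": len(backend_calls)}


fake_now = [0.0]
cache = TinyTTLCache(ttl_seconds=1.0, clock=lambda: fake_now[0])
first = cache.get_or_load("c1", load_features)
second = cache.get_or_load("c1", load_features)   # served from memory
fake_now[0] = 1.5                                 # entry has now expired
third = cache.get_or_load("c1", load_features)    # reloaded from the backend
```

The trade-off is freshness: a cached vector can be up to one TTL stale, so the TTL has to stay well below the shortest feature window the fraud model depends on. This sketch never evicts expired keys and is not thread-safe, which a production cache would need to handle.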
StreamGuard Feature Spine Starter Kit
Pre-built Feast project with 22 features, sample fintech data (5K customers / 800 merchants / 6K accounts), Kafka + Schema Registry + Avro config, Helm chart, and GitHub Actions workflows. docker-compose up + dbt-style apply in under 10 minutes.
The same feature spine — but built for the real fraud model.
Tutorials run a single materialization, against a single Redis, with hand-typed Kafka payloads and no late-event handling. Production requires watermarks, schema contracts, drift detection, and a Helm chart that survives a rolling upgrade. Here’s the diff, with the real APIs you reach for.
- store.materialize_incremental() with watermark tracking via Airflow DAG
- BACKWARD compatibility enforced in CI
- withWatermark("event_timestamp", "2 minutes") + sliding windows emit correct results out-of-order
- max_connections=50 + cachetools.TTLCache in-process cache
- p99 < 10ms against the serving endpoint
- DataDriftPreset + PSI computation + Prometheus gauges + alert rule on drift_share > 0.3
- values.yaml + readiness probes + smoke tests in the CI pipeline
Real review from ML platform engineers who’ve owned this spine.
Submit your repo, get line-by-line feedback within 48 hours from someone who has actually owned the streaming feature layer for a fraud model in production. The kind of review that's quietly worth thousands of dollars in time-to-staff.
4 reviews / month
Submit a PR, a refactor proposal, or a full repo. Reviewer is matched to your domain — Feast / Kafka / Spark Streaming for this project. Async, comments inline, average turnaround 31 hours.
2 office hours / month
Live 30-min sessions with a senior ML platform engineer. Architecture questions, whiteboard a streaming windowing decision, mock a system-design interview on real-time feature serving. Group sessions also available.
One subscription. 15+ projects, all curriculum, code review.
PRO is built for engineers who want production-grade builds and feedback loops — not more tutorials.
Pick this if you want to own the millisecond feature layer.
ML platform engineers
You own the layer between data engineering and ML. This project is the streaming feature spine you'll defend in code review and on a system-design whiteboard.
Senior data engineers crossing into ML
You ship dbt + Airflow but the next interview loop wants you to talk Kafka + Spark Streaming + Schema Registry. After this you can defend windowing trade-offs and watermark tuning from first principles.
DEs prepping for streaming + schema interviews
Take-homes increasingly ship a Kafka topic and ask for a real-time feature pipeline. After this you can produce one in an afternoon and walk through schema evolution + late-event handling without hand-waving.
MLOps engineers focusing on the feature layer
You've shipped models. You haven't owned the feature spine that feeds them. This is the project that lets you talk about drift, freshness, and feature CI as concretely as you talk about model rollbacks.
Going deeper? Three tracks back this project.
The Feature Stores curriculum is the spine. These three let you go deeper on the layers it touches — Kafka primitives behind the streaming spine, MLOps platform patterns behind the production layer, and streaming theory behind the watermarking + windowing decisions.
Ready to build the feature spine?
Start with Part 01 — free, no card. About 3 hours. By the end you'll have Feast running with 22 features across 3 entities, point-in-time correct training sets generated, and pytest validating the registry.