StreamGuard Anomaly Detection
Deploy a robust, end-to-end anomaly detection system that handles late-arriving events and out-of-order data streams.
What You'll Build
A complete feature store platform for FinServe, a fintech company that needs real-time features for fraud detection, credit scoring, and personalized offers.
Offline Feature Store
Spark-based batch feature engineering with Parquet storage and point-in-time correct joins for training data
45+ features
Online Feature Store
Redis-backed low-latency serving with materialization pipelines and consistency guarantees
p99 < 10ms
Streaming Features
Kafka + Spark Streaming pipeline computing real-time windowed aggregations for fraud detection
Real-time updates
Point-in-Time Correctness
Prevent data leakage with PIT-correct feature retrieval across 2M+ transaction training sets
Zero leakage
Feature Versioning & Backfill
Schema evolution, version tracking, and automated backfill pipelines for historical features
Full lineage
Production Monitoring
Grafana dashboards, drift detection, quality gates, and SLA tracking for feature freshness
99.9% SLA
Progressive Build Path
Each part builds on the previous one. Watch your feature store grow from a single offline store into a production-grade platform.
Foundation — Offline Feature Store & Point-in-Time Joins
Set up Feast, define entities and feature views, build Spark batch features, and implement point-in-time correct training sets for FinServe’s fraud model.
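To make the point-in-time rule concrete, here is a minimal pure-Python sketch of an as-of join, assuming a hypothetical `txn_count_7d` feature keyed by `customer_id`. For each labeled event it attaches the most recent feature value whose timestamp is at or before the event time, so no value from the future leaks into training. In Feast, `get_historical_features` performs this join for you (and also applies each feature view's TTL, which this sketch omits); the class and function names below are illustrative only.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class FeatureRow:
    customer_id: str
    feature_ts: datetime
    txn_count_7d: int  # hypothetical feature name


@dataclass(frozen=True)
class LabelRow:
    customer_id: str
    event_ts: datetime
    is_fraud: bool


def point_in_time_join(labels, features):
    """Attach to each label the latest feature value with feature_ts <= event_ts."""
    # Group feature rows per entity and sort each group by timestamp once.
    by_entity = {}
    for row in features:
        by_entity.setdefault(row.customer_id, []).append(row)
    for rows in by_entity.values():
        rows.sort(key=lambda r: r.feature_ts)
    ts_index = {cid: [r.feature_ts for r in rows] for cid, rows in by_entity.items()}

    joined = []
    for label in labels:
        rows = by_entity.get(label.customer_id, [])
        timestamps = ts_index.get(label.customer_id, [])
        # bisect_right lands after any equal timestamps, so pos - 1 is the
        # last feature row observed at or before the label's event time.
        pos = bisect_right(timestamps, label.event_ts)
        value: Optional[int] = rows[pos - 1].txn_count_7d if pos > 0 else None
        joined.append((label.customer_id, label.event_ts, value, label.is_fraud))
    return joined


if __name__ == "__main__":
    feats = [
        FeatureRow("c1", datetime(2024, 1, 1), 3),
        FeatureRow("c1", datetime(2024, 1, 5), 8),
    ]
    labs = [
        LabelRow("c1", datetime(2024, 1, 3), False),    # sees the Jan 1 value (3)
        LabelRow("c1", datetime(2024, 1, 6), True),     # sees the Jan 5 value (8)
        LabelRow("c1", datetime(2023, 12, 31), False),  # no earlier value -> None
    ]
    for row in point_in_time_join(labs, feats):
        print(row)
```

A label with no earlier feature row gets `None` rather than a later value; a training pipeline would then impute or drop it explicitly instead of silently leaking future data.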
Online Store — Low-Latency Serving with Redis
Stand up Redis as the online store, build materialization pipelines, and create a REST serving layer that returns features in under 10ms for real-time fraud scoring.
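Materialization copies the latest value of each feature per entity from the offline store into the online store. The sketch below shows that logic with a plain dict standing in for Redis (an assumption made so the example runs without a server); the `feature_view:entity_id` key format and the read-time TTL check are illustrative, not Feast's actual Redis layout. In Feast itself this step is typically run with `feast materialize-incremental` or `FeatureStore.materialize_incremental`, and online reads go through `get_online_features`.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryOnlineStore:
    """Hypothetical stand-in for Redis: key -> (feature dict, event timestamp)."""
    data: dict = field(default_factory=dict)

    def write(self, key, features, event_ts):
        current = self.data.get(key)
        # Only overwrite with an equal or newer event, so re-running a
        # materialization over older rows never regresses the served value.
        if current is None or event_ts >= current[1]:
            self.data[key] = (features, event_ts)

    def read(self, key, now_ts, ttl_s):
        entry = self.data.get(key)
        if entry is None:
            return None
        features, event_ts = entry
        # Values older than the TTL are treated as missing rather than served stale.
        return features if now_ts - event_ts <= ttl_s else None


def materialize(rows, store, feature_view):
    """Push offline rows of (entity_id, event_ts, features) into the online store."""
    for entity_id, event_ts, features in rows:
        store.write(f"{feature_view}:{entity_id}", features, event_ts)


if __name__ == "__main__":
    store = InMemoryOnlineStore()
    offline_rows = [
        ("c1", 100.0, {"txn_count_7d": 3}),
        ("c1", 200.0, {"txn_count_7d": 8}),
        ("c1", 150.0, {"txn_count_7d": 5}),  # older row arrives last; ignored
    ]
    materialize(offline_rows, store, "customer_txn_stats")
    print(store.read("customer_txn_stats:c1", now_ts=250.0, ttl_s=100.0))
    # {'txn_count_7d': 8}
```

The "newest event wins" rule is what keeps materialization idempotent: replaying an overlapping time range cannot roll a key back to an older value. Against real Redis you could instead rely on the server's own key expiry (`EXPIRE`) for staleness.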
Streaming Features — Kafka + Spark Streaming
Build streaming feature pipelines with Kafka and Spark Structured Streaming. Implement feature versioning, backfill strategies, and schema evolution for production resilience.
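Spark Structured Streaming handles late and out-of-order events with event-time windows plus a watermark (`DataFrame.withWatermark` together with `pyspark.sql.functions.window`). The pure-Python sketch below imitates that idea for tumbling-window counts so the behavior is easy to trace: an event is still counted if its window has not yet been closed by the watermark (the maximum event time seen minus the allowed lateness), and is dropped otherwise. It is a simplification: Spark advances the watermark per micro-batch rather than per event, and the class name is hypothetical.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Event-time tumbling-window counts with a watermark for late data."""

    def __init__(self, window_s, allowed_lateness_s):
        self.window_s = window_s
        self.allowed_lateness_s = allowed_lateness_s
        self.max_event_ts = float("-inf")
        self.open_windows = defaultdict(int)  # (key, window_start) -> count
        self.dropped = 0

    def watermark(self):
        return self.max_event_ts - self.allowed_lateness_s

    def process(self, key, event_ts):
        """Add one event; return (key, window_start, count) for windows just closed."""
        window_start = event_ts - (event_ts % self.window_s)
        if window_start + self.window_s <= self.watermark():
            # Too late: the watermark has already closed this event's window.
            self.dropped += 1
            return []
        self.open_windows[(key, window_start)] += 1
        self.max_event_ts = max(self.max_event_ts, event_ts)
        return self._emit_closed()

    def _emit_closed(self):
        wm = self.watermark()
        closed = sorted(k for k in self.open_windows if k[1] + self.window_s <= wm)
        return [(key, start, self.open_windows.pop((key, start))) for key, start in closed]


if __name__ == "__main__":
    counter = TumblingWindowCounter(window_s=60, allowed_lateness_s=30)
    for ts in [10, 70, 50, 95, 20]:  # 50 and 20 arrive out of order
        print(ts, counter.process("card1", ts))
    print("dropped:", counter.dropped)
```

With a 60 s window and 30 s of allowed lateness, the out-of-order event at t=50 is still counted, the window starting at 0 is emitted with a count of 2 once the event at t=95 pushes the watermark to 65, and the event at t=20 is dropped. A larger lateness keeps more state open in exchange for fewer dropped events.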
Production — Monitoring, CI/CD & Deployment
Deploy the feature store to Kubernetes with CI/CD for feature definitions, Grafana dashboards for observability, and automated quality gates.
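An automated quality gate can be as simple as a few threshold checks run before a feature version is promoted. The sketch below combines three, all with hypothetical thresholds: distribution drift measured with the Population Stability Index (PSI, a common drift metric used here as an example rather than the project's prescribed one), freshness against a staleness budget, and a null-rate limit. A CI job or scheduled check could fail the pipeline whenever `passed` is false.

```python
import math
from bisect import bisect_right


def psi(expected, actual, bin_edges, eps=1e-6):
    """Population Stability Index of `actual` against `expected` over fixed bins."""
    n_bins = len(bin_edges) - 1

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            # Values outside the edges are clamped into the first or last bin.
            idx = min(max(bisect_right(bin_edges, v) - 1, 0), n_bins - 1)
            counts[idx] += 1
        total = max(len(values), 1)
        # eps keeps log() and the ratio finite when a bin is empty.
        return [max(c / total, eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def quality_gate(reference, current, bin_edges, last_update_ts, now_ts,
                 max_psi=0.2, max_staleness_s=300, max_null_rate=0.01):
    """Return (passed, per-check results) for one feature column."""
    present = [v for v in current if v is not None]
    null_rate = 1 - len(present) / len(current) if current else 1.0
    checks = {
        "drift": psi(reference, present, bin_edges) <= max_psi,
        "freshness": now_ts - last_update_ts <= max_staleness_s,
        "null_rate": null_rate <= max_null_rate,
    }
    return all(checks.values()), checks
```

Returning the per-check dictionary alongside the overall verdict makes it easy to export each check as its own metric for a Grafana panel. The 0.2 PSI limit and the 300 s staleness budget are placeholders; calibrate them against each feature's normal day-to-day variation.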
Download Sample Data
FinServe's financial data — 2M+ records across 4 files
Or generate synthetic data using our Python script
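If you generate data yourself, a generator along these lines produces a transactions table. The schema (`txn_id`, `customer_id`, `amount`, `merchant_category`, `event_ts`, `ingest_ts`, `is_fraud`), the function names, and the distributions are assumptions for illustration, not the layout of the four provided files. It deliberately makes some events arrive late (`ingest_ts` well after `event_ts`) so that out-of-order handling and point-in-time logic have something to exercise.

```python
import csv
import random
from datetime import datetime, timedelta

# Hypothetical category list used only for this synthetic data.
MERCHANT_CATEGORIES = ["grocery", "electronics", "travel", "dining", "fuel"]


def generate_transactions(n, n_customers=1000, fraud_rate=0.01, seed=42):
    """Return n synthetic transaction dicts; the same seed gives the same rows."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1)
    rows = []
    for i in range(n):
        event_ts = start + timedelta(seconds=rng.randint(0, 30 * 24 * 3600))
        # About 95% of events arrive within 5 s; the rest arrive 1 to 60 minutes late.
        delay_s = rng.randint(0, 5) if rng.random() > 0.05 else rng.randint(60, 3600)
        rows.append({
            "txn_id": f"t{i:07d}",
            "customer_id": f"c{rng.randint(1, n_customers):05d}",
            "amount": round(rng.lognormvariate(3.5, 1.0), 2),  # right-skewed amounts
            "merchant_category": rng.choice(MERCHANT_CATEGORIES),
            "event_ts": event_ts.isoformat(),
            "ingest_ts": (event_ts + timedelta(seconds=delay_s)).isoformat(),
            "is_fraud": int(rng.random() < fraud_rate),
        })
    return rows


def write_csv(rows, path):
    """Write the generated rows to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_csv(generate_transactions(100_000), "transactions.csv")` writes 100,000 rows. Using a private `random.Random(seed)` instance keeps the output reproducible without touching the global random state.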
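Tech Stack You'll Master
Feast, Apache Spark (PySpark and Structured Streaming), Parquet, Redis, Kafka, Docker, Kubernetes, Grafana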
Why Feature Stores?
Feature stores are the critical bridge between data engineering and ML. Companies like Uber, DoorDash, and Spotify all built dedicated feature platforms. This project walks you through the same kind of system that ML platform engineers build at these companies.
Fastest-Growing ML Infra Skill
65% of ML teams struggle to hire engineers with feature store experience. Average ML platform engineer salary: $185K+
Production Best Practices
Point-in-time correctness, training-serving skew prevention, feature versioning, and low-latency serving
Portfolio Differentiator
Most candidates have notebooks. You'll have a deployed feature store with streaming, monitoring, and CI/CD
Resume-Ready Portfolio Project
Add these bullet points to your resume after completing the project:
- Built production feature store with Feast serving 45+ features across offline (Spark/Parquet) and online (Redis) stores with p99 latency < 10ms
- Implemented point-in-time correct feature retrieval preventing data leakage across 2M+ transaction training sets, achieving training-serving parity
- Designed streaming feature pipeline with Kafka + Spark Structured Streaming computing real-time aggregations (1min/5min/1hr windows) for fraud detection
- Deployed feature store to Kubernetes with CI/CD, automated quality gates, drift detection, and Grafana dashboards tracking a 99.9% feature freshness SLA
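The 1min/5min/1hr aggregations in the streaming bullet can be prototyped without Kafka or Spark. This sketch keeps a per-entity buffer of recent transactions and computes a trailing-window count and amount sum as of each new event; the feature names, the `RollingAggregator` class, and the assumption that each entity's events arrive in time order are illustrative. Note the design difference: Spark's `window()` produces fixed tumbling or sliding windows, while this computes an exact trailing lookback per event, which suits per-transaction scoring but keeps more state.

```python
from collections import defaultdict, deque

# Window names and sizes in seconds (hypothetical naming scheme).
WINDOWS_S = {"1min": 60, "5min": 300, "1hr": 3600}


class RollingAggregator:
    """Per-entity transaction count and amount sum over trailing time windows.

    Assumes events for each entity arrive in event-time order.
    """

    def __init__(self, windows=WINDOWS_S):
        self.windows = windows
        self.longest = max(windows.values())
        self.history = defaultdict(deque)  # entity_id -> deque of (ts, amount)

    def update(self, entity_id, ts, amount):
        """Record one transaction and return features over the windows (ts - size, ts]."""
        hist = self.history[entity_id]
        hist.append((ts, amount))
        # Evict events that fall outside even the longest window.
        while hist and hist[0][0] <= ts - self.longest:
            hist.popleft()
        features = {}
        for name, size in self.windows.items():
            in_window = [a for t, a in hist if t > ts - size]
            features[f"txn_count_{name}"] = len(in_window)
            features[f"amount_sum_{name}"] = round(sum(in_window), 2)
        return features
```

For example, after transactions at t=0, 30, and 120 seconds for the same card, the third call reports one transaction in the last minute but three in the last five minutes, a burst pattern a fraud model can pick up on.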
Prerequisites
Python Proficiency
Required: Comfortable with pandas, PySpark basics, and writing data pipelines
Docker & Containers
Required: Can build and run Docker containers and understand docker-compose
ML Fundamentals
Helpful: Understand supervised learning, feature engineering concepts, and train/test splits
Streaming Basics
Helpful: Familiarity with Kafka concepts (producers, consumers, topics)
Ready to Build a Production Feature Store?
Start with Part 1: Offline Feature Store & Point-in-Time Joins