ML Platform · ~15 hours
Capstone Project — Build a Production Feature Store

StreamGuard Anomaly Detection

Deploy an end-to-end robust anomaly detection system capable of handling late-arriving events and out-of-order data streams.

Architecture: finserve / feature-store-pipeline
Sources: Transactions, Customers, Kafka Events
Offline store: Spark Transforms, PIT Joins, Parquet
Online store: Redis, Materialization, < 10ms
Consumers: Fraud Model, Credit Score, Offers
FinServe Feature Store — Progressive Build
Part 1: Offline Store & PIT Joins · 15/45 features
Part 2: Online Store & Serving · 30/45 features
Part 3: Streaming Features · 45/45 features
Part 4: Deploy to Production · LIVE Feature Store on K8s

What You'll Build

A complete feature store platform for FinServe, a fintech company that needs real-time features for fraud detection, credit scoring, and personalized offers.

Offline Feature Store

Spark-based batch feature engineering with Parquet storage and point-in-time correct joins for training data

45+ features

Online Feature Store

Redis-backed low-latency serving with materialization pipelines and consistency guarantees

p99 < 10ms

Streaming Features

Kafka + Spark Streaming pipeline computing real-time windowed aggregations for fraud detection

Real-time updates

Point-in-Time Correctness

Prevent data leakage with PIT-correct feature retrieval across 2M+ transaction training sets

Zero leakage

Feature Versioning & Backfill

Schema evolution, version tracking, and automated backfill pipelines for historical features

Full lineage

Production Monitoring

Grafana dashboards, drift detection, quality gates, and SLA tracking for feature freshness

99.9% SLA

Progressive Build Path

Each part builds on the previous. Watch your feature store grow from a single offline store to a production-grade platform.

Part 1 · 3–4 hours

Foundation — Offline Feature Store & Point-in-Time Joins

Set up Feast, define entities and feature views, build Spark batch features, and implement point-in-time correct training sets for FinServe’s fraud model.

  • Feast project with FileSource offline store
  • Entity definitions (customer, account, merchant)
  • Batch feature views with Spark transforms
  • Point-in-time join pipeline
  • +2 more
Offline store serving 45+ features with PIT-correct training sets
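
To make the point-in-time join concrete, here is a minimal sketch in plain Python rather than Feast or Spark. The function name `point_in_time_join`, the tuple layout, the optional `ttl` argument, and the sample values are illustrative assumptions, not the project's actual code. The idea is that each training label only sees the most recent feature value recorded at or before the label's event time, so nothing from the future leaks into the training set.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def point_in_time_join(labels, feature_rows, ttl=None):
    """Attach to each (entity_id, event_time) label the latest feature value
    with timestamp <= event_time. If ttl is given, values older than ttl
    relative to the label are treated as missing (None)."""
    # Group feature rows by entity, each group sorted by timestamp.
    by_entity = {}
    for entity_id, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        times, values = by_entity.setdefault(entity_id, ([], []))
        times.append(ts)
        values.append(value)

    joined = []
    for entity_id, event_time in labels:
        times, values = by_entity.get(entity_id, ([], []))
        # Index of the last feature row at or before the label's event time.
        idx = bisect_right(times, event_time) - 1
        value = None
        if idx >= 0 and (ttl is None or event_time - times[idx] <= ttl):
            value = values[idx]
        joined.append((entity_id, event_time, value))
    return joined


# Hypothetical sample data: (customer_id, feature_timestamp, avg_txn_amount).
features = [
    ("c1", datetime(2024, 1, 1, 10), 100.0),
    ("c1", datetime(2024, 1, 1, 12), 250.0),
    ("c2", datetime(2024, 1, 1, 9), 40.0),
]
labels = [
    ("c1", datetime(2024, 1, 1, 11)),  # must see 100.0, not the later 250.0
    ("c1", datetime(2024, 1, 1, 13)),
    ("c2", datetime(2024, 1, 1, 8)),   # no feature existed yet
]
training_rows = point_in_time_join(labels, features)
```

In this sketch the label at 11:00 receives the 10:00 value, and the label that predates every feature row receives `None` instead of a later value. A naive join on customer ID alone would have attached future values to both.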
Part 2 · 4–5 hours

Online Store — Low-Latency Serving with Redis

Stand up Redis as the online store, build materialization pipelines, and create a REST serving layer that returns features in under 10ms for real-time fraud scoring.

  • Redis online store configuration
  • Materialization pipeline (offline → online)
  • REST feature serving endpoint
  • Feature freshness monitoring
  • +2 more
Online store serving real-time features at p99 < 10ms
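
The offline-to-online materialization step can be sketched roughly as follows. This is a simplified model and not Feast's implementation: a plain dict stands in for Redis, and the `materialize` function, the `event_ts` field, and the `txn_count_1h` feature name are hypothetical. It shows the core rule that the online store keeps only the newest feature row per entity, and that replaying the same window does not overwrite newer data.

```python
from datetime import datetime


def materialize(offline_rows, online_store, start, end):
    """Copy the newest feature row per entity with start <= ts < end into
    online_store. Returns the number of writes performed."""
    writes = 0
    for entity_id, ts, features in offline_rows:
        if not (start <= ts < end):
            continue
        current = online_store.get(entity_id)
        # Overwrite only with strictly newer data, so replays are idempotent
        # and out-of-order rows cannot roll a value back.
        if current is None or ts > current["event_ts"]:
            online_store[entity_id] = {"event_ts": ts, **features}
            writes += 1
    return writes


# Hypothetical offline rows: (customer_id, timestamp, feature dict).
offline_rows = [
    ("c1", datetime(2024, 1, 1, 10), {"txn_count_1h": 3}),
    ("c1", datetime(2024, 1, 1, 12), {"txn_count_1h": 5}),
    ("c2", datetime(2024, 1, 1, 9), {"txn_count_1h": 1}),
]
online_store = {}  # stand-in for Redis; keys are entity IDs
window = (datetime(2024, 1, 1), datetime(2024, 1, 2))
first_run = materialize(offline_rows, online_store, *window)
second_run = materialize(offline_rows, online_store, *window)  # replay
```

After the first run the store holds the 12:00 row for `c1`. The replay performs no writes because nothing in the window is newer than what is already stored.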
Part 3 · 4–5 hours

Streaming Features — Kafka + Spark Streaming

Build streaming feature pipelines with Kafka and Spark Structured Streaming. Implement feature versioning, backfill strategies, and schema evolution for production resilience.

  • Kafka streaming feature pipeline
  • Spark Structured Streaming transforms
  • Windowed aggregation features (1min, 5min, 1hr)
  • Feature versioning with registry
  • +2 more
Streaming features updating in real-time with backfill support
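
As a rough illustration of the windowed aggregations, the sketch below computes per-customer count and sum over tumbling event-time windows in plain Python. It is a simplified model of watermarking, not Spark Structured Streaming's exact semantics: the `aggregate` function, the `allowed_lateness` parameter, and the sample events are assumptions made for the example. Events arrive in arbitrary order, and an event is dropped only if its window closed before the watermark (the latest event time seen minus the allowed lateness).

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
# The three window sizes named in the project outline.
WINDOW_SIZES = {
    "1min": timedelta(minutes=1),
    "5min": timedelta(minutes=5),
    "1hr": timedelta(hours=1),
}


def window_start(ts, size):
    """Floor a timestamp to the start of its tumbling window."""
    return ts - ((ts - EPOCH) % size)


def aggregate(events, size, allowed_lateness):
    """events: (customer_id, event_time, amount) tuples in arrival order.
    Returns ({(customer_id, window_start): {"count", "sum"}}, dropped_count)."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0.0})
    max_seen = None
    dropped = 0
    for customer_id, ts, amount in events:
        max_seen = ts if max_seen is None else max(max_seen, ts)
        watermark = max_seen - allowed_lateness
        start = window_start(ts, size)
        # A window whose end is at or before the watermark is closed.
        if start + size <= watermark:
            dropped += 1
            continue
        bucket = totals[(customer_id, start)]
        bucket["count"] += 1
        bucket["sum"] += amount
    return dict(totals), dropped


# Hypothetical arrival-ordered events; the 4th is out of order, the 6th too late.
events = [
    ("c1", datetime(2024, 1, 1, 12, 0, 10), 20.0),
    ("c1", datetime(2024, 1, 1, 12, 0, 50), 30.0),
    ("c1", datetime(2024, 1, 1, 12, 2, 5), 5.0),
    ("c1", datetime(2024, 1, 1, 12, 0, 30), 10.0),
    ("c1", datetime(2024, 1, 1, 12, 5, 0), 1.0),
    ("c1", datetime(2024, 1, 1, 12, 0, 40), 99.0),
]
per_minute, late = aggregate(events, WINDOW_SIZES["1min"], timedelta(minutes=2))
```

Here the out-of-order 12:00:30 event still lands in the 12:00 window because it arrives within the two-minute lateness budget, while the 12:00:40 event arriving after 12:05:00 is counted as dropped. Calling `aggregate` with the other entries in `WINDOW_SIZES` yields the 5-minute and 1-hour variants.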
Part 4 · 3–4 hours

Production — Monitoring, CI/CD & Deployment

Deploy the feature store to Kubernetes with CI/CD for feature definitions, Grafana dashboards for observability, and automated quality gates.

  • Feature quality monitoring pipeline
  • CI/CD pipeline for feature definitions
  • Kubernetes deployment with Helm
  • Grafana dashboards for feature health
  • +2 more
LIVE production feature store on Kubernetes
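
One way an automated quality gate for drift could look is sketched below, using the Population Stability Index (PSI) as the drift score. The project page does not say which drift metric it uses, so PSI, the `psi` and `passes_drift_gate` names, the equal-width binning, and the 0.2 threshold (a common rule of thumb) are all assumptions for illustration.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline sample and a new sample,
    using equal-width bins derived from the baseline's range."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range fall into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def passes_drift_gate(expected, actual, threshold=0.2):
    """Return True when drift is at or below the (assumed) PSI threshold."""
    return psi(expected, actual) <= threshold


# Hypothetical feature samples: a training baseline and two serving snapshots.
baseline = [float(i % 100) for i in range(1000)]
same_distribution = list(baseline)
shifted = [v + 50.0 for v in baseline]

gate_ok = passes_drift_gate(baseline, same_distribution)
gate_blocked = not passes_drift_gate(baseline, shifted)
```

In a CI/CD pipeline a check like this might run against each feature before materialization and fail the job when the gate does not pass. The threshold would need tuning per feature.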
Total Time: ~15 hours

Download Sample Data

FinServe's financial data — 2M+ records across 4 files

transactions.csv
2M transactions · 45 MB
Financial transaction events with timestamps and amounts
customers.csv
100K customers · 8 MB
Customer profiles with account metadata
merchants.csv
25K merchants · 2 MB
Merchant details with category and risk scores
fraud_labels.csv
50K labeled events · 1 MB
Ground truth fraud labels for model training

Or generate synthetic data using our Python script

Tech Stack You'll Master

Feast 0.38+ · Feature Store
Apache Spark · Batch Processing
Kafka · Streaming
Redis · Online Store
Python 3.11+ · Language
Docker · Containers
Kubernetes · Orchestration
Grafana · Monitoring
Prometheus · Metrics
Locust · Load Testing
GitHub Actions · CI/CD
Helm · Deployment

Why Feature Stores?

Feature stores are the critical bridge between data engineering and ML. Companies like Uber, DoorDash, and Spotify all built dedicated feature platforms. This project teaches you what ML platform engineers build at these companies.

Fastest-Growing ML Infra Skill

65% of ML teams struggle to hire engineers with feature store experience. Average ML platform engineer salary: $185K+

Production Best Practices

Point-in-time correctness, training-serving skew prevention, feature versioning, and low-latency serving

Portfolio Differentiator

Most candidates have notebooks. You'll have a deployed feature store with streaming, monitoring, and CI/CD

Resume-Ready Portfolio Project

Add these bullet points to your resume after completing the project:

  • Built production feature store with Feast serving 45+ features across offline (Spark/Parquet) and online (Redis) stores with p99 latency < 10ms
  • Implemented point-in-time correct feature retrieval preventing data leakage across 2M+ transaction training sets, achieving training-serving parity
  • Designed streaming feature pipeline with Kafka + Spark Structured Streaming computing real-time aggregations (1min/5min/1hr windows) for fraud detection
  • Deployed feature store to Kubernetes with CI/CD, automated quality gates, drift detection, and Grafana dashboards monitoring 99.9% feature freshness SLA
Completion certificate included

Prerequisites

Python Proficiency

Required

Comfortable with pandas, PySpark basics, and writing data pipelines

Docker & Containers

Required

Can build and run Docker containers, understand docker-compose

ML Fundamentals

Helpful

Understand supervised learning, feature engineering concepts, train/test splits

Streaming Basics

Helpful

Familiarity with Kafka concepts (producers, consumers, topics)

Ready to Build a Production Feature Store?

Start with Part 1: Offline Feature Store & Point-in-Time Joins
