Feature Stores & Feature Engineering
Offline and online feature serving, streaming features, and production feature platforms.
Training-serving skew is the silent killer of production ML. Feature stores are how serious teams ensure the features used at training time match the features served at inference time — every time, on every model.
What you’ll be able to do
- Build offline and online feature stores for ML systems
- Implement streaming feature pipelines for real-time inference
- Design feature quality monitoring and alerting
- Deploy a production feature platform, ending with a fraud-detection capstone
Curriculum
Phase 1: Feature Store Foundations
Core concepts and offline features. The conceptual map plus the batch pipeline pattern every later module builds on.
Feature Store Foundations
Why feature stores exist, the online-vs-offline architecture, what a feature is in production vs a notebook, and the platform landscape — Feast deep-dive plus Tecton vs cloud-native vs DIY tradeoffs. The conceptual map every later module builds on.
Offline Feature Engineering
Batch feature engineering with Spark, SQL feature definitions in dbt, materializing to the offline store, the point-in-time-join pattern that prevents leakage, get_historical_features for training-set generation, and Airflow orchestration.
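A minimal sketch of the point-in-time join mentioned above, using only the Python standard library. The function name, tuple layout, and sample data are illustrative assumptions, not Feast's API: for each label row, it attaches the newest feature value recorded at or before the label's timestamp, so no future value leaks into training.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(labels, feature_rows):
    """Attach to each label the newest feature value with timestamp <= event_ts.

    labels:       iterable of (entity_id, event_ts, label)
    feature_rows: iterable of (entity_id, feature_ts, value)
    """
    # Group feature rows per entity and sort them by timestamp.
    by_entity = defaultdict(list)
    for entity_id, feature_ts, value in feature_rows:
        by_entity[entity_id].append((feature_ts, value))
    for rows in by_entity.values():
        rows.sort(key=lambda row: row[0])

    joined = []
    for entity_id, event_ts, label in labels:
        rows = by_entity.get(entity_id, [])
        timestamps = [ts for ts, _ in rows]
        # Number of feature rows observed at or before the label event.
        idx = bisect_right(timestamps, event_ts)
        value = rows[idx - 1][1] if idx > 0 else None  # None: nothing known yet
        joined.append(
            {"entity": entity_id, "event_ts": event_ts, "feature": value, "label": label}
        )
    return joined


# Hypothetical data: timestamps are plain integers for readability.
feature_rows = [("u1", 10, 1.0), ("u1", 20, 2.0), ("u1", 30, 3.0), ("u2", 15, 7.0)]
labels = [("u1", 25, 0), ("u1", 5, 1), ("u2", 15, 1)]
training_set = point_in_time_join(labels, feature_rows)
```

The label at time 25 picks up the value written at 20, not the later one written at 30, and the label at time 5 gets no feature value because none existed yet.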
Phase 2: Online & Streaming
Real-time features and streaming pipelines. Where the offline store's training accuracy meets a sub-millisecond latency budget at serve time.
Online Feature Serving
Online store architecture (Redis vs DynamoDB vs in-process), materialization pipelines, the get_online_features serving API, online/offline consistency guarantees, caching + sub-millisecond latency optimization, and the end-to-end serving pipeline.
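To make the materialize-then-serve flow concrete, here is a toy in-process online store. The class, its method names, and the TTL rule are assumptions made for illustration; this is not Feast's get_online_features signature, only a sketch of the role it plays: keep the newest row per entity, and refuse to serve rows that are too old.

```python
class InProcessOnlineStore:
    """Toy online store: newest feature row per entity, with a freshness TTL."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._rows = {}  # entity_id -> (event_ts, {feature_name: value})

    def materialize(self, rows):
        """Upsert (entity_id, event_ts, features) rows, keeping the newest per entity."""
        for entity_id, event_ts, features in rows:
            current = self._rows.get(entity_id)
            if current is None or event_ts >= current[0]:
                self._rows[entity_id] = (event_ts, dict(features))

    def read_features(self, entity_id, feature_names, now):
        """Return requested features, or None values if missing or older than the TTL."""
        entry = self._rows.get(entity_id)
        if entry is None or now - entry[0] > self.ttl_seconds:
            return {name: None for name in feature_names}
        _, features = entry
        return {name: features.get(name) for name in feature_names}


# Hypothetical feature names and integer timestamps in seconds.
store = InProcessOnlineStore(ttl_seconds=3600)
store.materialize(
    [
        ("u1", 100, {"txn_count_1h": 3}),
        ("u1", 50, {"txn_count_1h": 1}),  # older row, must not overwrite the newer one
        ("u2", 10, {"txn_count_1h": 9}),
    ]
)
fresh = store.read_features("u1", ["txn_count_1h", "avg_amount"], now=200)
stale = store.read_features("u2", ["txn_count_1h"], now=5000)
```

Passing `now` explicitly keeps the sketch deterministic; a real store such as Redis or DynamoDB would also add network hops, serialization, and expiry handled by the store itself.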
Streaming Feature Pipelines
Kafka as feature event source, Flink for streaming computation, writing streaming features into the online store, the unified-definition pattern (same code for batch + stream), and backfilling streaming features so models can train on history.
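The unified-definition pattern can be sketched without Kafka or Flink. In the example below (all names and the 60-second window are assumptions), one function defines the feature, the batch path calls it over full history to backfill, and the streaming path calls the same function over a bounded buffer. The sketch assumes events arrive in timestamp order.

```python
from collections import deque

WINDOW_SECONDS = 60


def count_in_window(timestamps, as_of, window=WINDOW_SECONDS):
    """The single feature definition: number of events in (as_of - window, as_of]."""
    return sum(1 for ts in timestamps if as_of - window < ts <= as_of)


def backfill(events, as_of_times):
    """Batch path: evaluate the definition over full history for each timestamp."""
    return [count_in_window(events, as_of) for as_of in as_of_times]


class StreamingCounter:
    """Streaming path: bounded buffer of recent events, same definition."""

    def __init__(self):
        self._recent = deque()

    def on_event(self, ts):
        self._recent.append(ts)
        # Evict events that can no longer fall inside the window ending at ts.
        while self._recent and self._recent[0] <= ts - WINDOW_SECONDS:
            self._recent.popleft()
        return count_in_window(self._recent, ts)


events = [0, 10, 50, 65, 130]  # hypothetical event timestamps in seconds
counter = StreamingCounter()
streamed = [counter.on_event(ts) for ts in events]
backfilled = backfill(events, events)
```

Because both paths share `count_in_window`, the backfilled training values agree with what the streaming path would have served at the same moments.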
Phase 3: Production Features
Quality monitoring, architecture, and capstone. Where Feast graduates from a library to a service the on-call team can defend.
Feature Quality Monitoring
Training-serving skew detection and prevention, drift monitoring, validation + data-quality gates, the feature catalog for discoverability, RBAC + governance, and the monitoring dashboard you'd actually put on the wall.
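One common way to quantify drift is the population stability index (PSI), sketched here as an illustration rather than as this course's prescribed method. The bin edges, the sample values, and the 0.25 alert threshold are assumptions; 0.25 is a widely used rule of thumb, not a standard.

```python
import math


def psi(expected, actual, bin_edges, eps=1e-6):
    """Population stability index between two non-empty samples of one feature.

    bin_edges are sorted cut points; values fall into len(bin_edges) + 1 bins.
    """

    def fractions(values):
        counts = [0] * (len(bin_edges) + 1)
        for value in values:
            # Bin index = number of edges the value has reached or passed.
            idx = sum(1 for edge in bin_edges if value >= edge)
            counts[idx] += 1
        total = len(values)
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(count / total, eps) for count in counts]

    exp_frac = fractions(expected)
    act_frac = fractions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(exp_frac, act_frac))


# Hypothetical feature values seen at training time vs serving time.
training_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
serving_values = [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]

psi_same = psi(training_values, training_values, bin_edges=[4, 7])
psi_shifted = psi(training_values, serving_values, bin_edges=[4, 7])
drift_alert = psi_shifted > 0.25  # assumed rule-of-thumb threshold
```

A monitoring job could run this per feature on a schedule and page only when the index crosses the chosen threshold.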
Production Architecture
Architecture patterns, scaling from 1k to 10M QPS, cost optimization, disaster recovery + HA, schema evolution + feature versioning, and Infrastructure-as-Code for feature stores — the platform-engineering layer that turns Feast from a library into a service.
Capstone: Fraud Detection
End-to-end fraud-detection feature store: offline pipeline + streaming pipeline + real-time inference serving + monitoring + cost analysis. The capstone that proves you can ship the platform, not just describe it — plus interview prep on the system-design rounds this skill is tested in.
What you’ll build
- Offline feature pipeline (Spark + dbt) with point-in-time joins
- Online feature serving API with sub-millisecond latency
- Streaming feature pipeline (Kafka + Flink) writing to the online store
- Production feature platform with monitoring, governance, and a fraud-detection capstone
This works in your training notebook… but fails the moment the model goes live.
Without a feature store, you risk:
- Training-serving skew that silently degrades model accuracy in production
- Feature pipelines duplicated across teams, drifting subtly out of sync
- Real-time inference that can't get the right feature in under 10 ms
- Streaming features with no backfill story, so models can't train on the history of features they see at serve time
What are feature stores and feature engineering?
Feature stores are centralized platforms that manage the computation, storage, and serving of ML features for both training and inference. They address training-serving skew by ensuring models use identical feature definitions in training and production. Uber (Michelangelo), Airbnb, and DoorDash use them to serve features at millisecond latency for real-time ML.
Why this matters in production
Training-serving skew is one of the most common ML production failures. At Uber, the feature store inside the Michelangelo ML platform serves millions of features per second for ride pricing and fraud detection. Production feature stores require both offline (batch) and online (real-time) serving with strict consistency guarantees between the two.
Common use cases
- Building offline feature pipelines for batch model training
- Implementing online feature serving with sub-millisecond latency
- Creating streaming feature pipelines for real-time ML inference
- Monitoring feature quality and detecting distribution drift
- Sharing and reusing features across multiple ML models and teams
- Building fraud detection systems with real-time feature computation
Feature stores vs alternatives
Feast vs managed feature stores
Feast is the leading open-source feature store. Managed alternatives like Tecton and Databricks Feature Store add operational tooling on top of the core store. Feast is a good starting point; managed platforms tend to scale better for large teams.
Feature store vs custom pipeline
Feature stores provide standardized serving, versioning, and monitoring. Custom pipelines offer flexibility but risk training-serving skew. Feature stores are worth the investment once you have multiple models in production.
Feature store vs data warehouse
Feature stores serve features at low latency for real-time inference. Data warehouses are optimized for analytical queries. Feature stores often source from warehouses but serve features through dedicated infrastructure.
Why this skill matters
Feature stores are the data-engineering specialty that maps cleanly into ML platform work. This skill proves you can prevent training-serving skew, serve features under SLA, and operate the platform that every production ML model depends on — the role Uber, Airbnb, and DoorDash hire for at staff level.
Common questions about feature stores
What is a feature store?
A feature store manages ML features from computation through serving. It provides offline features for training and online features for inference, ensuring consistency between the two environments.
Why do ML teams need feature stores?
Feature stores prevent training-serving skew, enable feature reuse across models, and provide monitoring. Without them, teams duplicate feature logic and introduce subtle bugs that degrade model performance.
How long does it take to learn feature stores?
Concepts take 1-2 weeks. Building production feature pipelines with offline and online serving typically takes 6-8 weeks including hands-on practice with tools like Feast.
Do data engineers need feature store skills?
Data engineers on ML teams absolutely need these skills. Feature engineering and serving are data infrastructure problems that require data engineering expertise.
What is training-serving skew?
Training-serving skew occurs when features used in production differ from those used in training. This causes silent model degradation. Feature stores solve this by serving identical features in both environments.
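As a rough illustration of how skew can be caught, the sketch below compares feature values recomputed offline against values logged from online serving. The function name, the dictionary layout, and the tolerance are assumptions for the example, not a specific tool's API.

```python
import math


def skew_report(offline, online, rel_tol=1e-6):
    """Compare offline-recomputed and online-logged feature values.

    Both arguments are dicts keyed by (entity_id, feature_name).
    Returns (mismatch_rate, mismatches).
    """
    mismatches = []
    for key, offline_value in offline.items():
        online_value = online.get(key)
        # A value missing online counts as skew, as does a numeric disagreement.
        if online_value is None or not math.isclose(
            offline_value, online_value, rel_tol=rel_tol
        ):
            mismatches.append((key, offline_value, online_value))
    rate = len(mismatches) / len(offline) if offline else 0.0
    return rate, mismatches


# Hypothetical values: u2 disagrees and u4 was never written to the online store.
offline = {
    ("u1", "avg_amount"): 42.0,
    ("u2", "avg_amount"): 10.0,
    ("u3", "avg_amount"): 5.5,
    ("u4", "avg_amount"): 8.0,
}
online = {
    ("u1", "avg_amount"): 42.0,
    ("u2", "avg_amount"): 12.5,
    ("u3", "avg_amount"): 5.5,
}
mismatch_rate, mismatches = skew_report(offline, online)
```

A nonzero mismatch rate on a sample of production requests is an early sign that the training and serving pipelines have diverged.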