Feature Store Explained
What It Is and How It Works
A feature store has two layers: an offline store (Parquet/S3) for batch training and an online store (Redis) for real-time inference. Here is how each layer works and why both are necessary.
Short Answer
A feature store is a data system that manages ML features for both training and serving. It has an offline store (for example, Parquet on S3) that holds the full feature history for building point-in-time correct training datasets, and an online store (for example, Redis) that holds the latest feature values for sub-10ms inference lookups. Materialization is the job, usually scheduled, that copies the latest values from offline to online. Feast is a widely used open-source implementation.
Feast FeatureView Definition
A FeatureView is the core unit in Feast. It connects a data source to an entity, defines the feature schema, and sets the TTL for online store expiry.
```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource, ValueType
from feast.types import Float32, Int64

# --- Data source: Parquet file (offline store) ---
customer_source = FileSource(
    path="data/customer_features.parquet",
    # Called event_timestamp_column in older Feast releases
    timestamp_field="event_timestamp",
)

# --- Entity: the join key ---
# Entity.value_type takes the ValueType enum, not a feast.types dtype
customer = Entity(
    name="customer_id",
    join_keys=["customer_id"],
    value_type=ValueType.INT64,
)

# --- FeatureView: groups features with source + TTL ---
customer_features = FeatureView(
    name="customer_features",
    entities=[customer],
    ttl=timedelta(days=7),
    schema=[
        Field(name="days_since_last_login", dtype=Float32),
        Field(name="transaction_count_7d", dtype=Int64),
        Field(name="avg_transaction_amount_7d", dtype=Float32),
    ],
    source=customer_source,
)

# Register everything with the registry
# Run: feast apply
```

Core Concepts
Offline Store
Parquet / S3
Holds the complete history of feature values. Used by get_historical_features() to perform point-in-time correct joins for training dataset generation. Handles terabytes of data.
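To make "point-in-time correct" concrete, here is a minimal pure-Python sketch of the idea, not Feast's actual implementation. For each training row it picks the most recent feature value whose event timestamp is at or before the label timestamp, so no future data leaks into training. The sample rows and the helper name `point_in_time_lookup` are hypothetical, and details such as TTL handling are left out.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history: (customer_id, event_timestamp, transaction_count_7d)
feature_history = [
    (1001, datetime(2024, 1, 1), 3),
    (1001, datetime(2024, 1, 8), 5),
    (1001, datetime(2024, 1, 15), 9),
    (1002, datetime(2024, 1, 5), 1),
]


def point_in_time_lookup(history, entity_id, as_of):
    """Return the latest value with event_timestamp <= as_of, or None."""
    rows = sorted((ts, value) for eid, ts, value in history if eid == entity_id)
    timestamps = [ts for ts, _ in rows]
    idx = bisect_right(timestamps, as_of)  # count of rows at or before as_of
    return rows[idx - 1][1] if idx else None


# Hypothetical training rows: the entity and the time its label was observed
entity_rows = [
    (1001, datetime(2024, 1, 10)),
    (1001, datetime(2024, 1, 15)),
    (1002, datetime(2024, 1, 2)),
]

training_rows = [
    (eid, ts, point_in_time_lookup(feature_history, eid, ts))
    for eid, ts in entity_rows
]
```

The third row gets no value because customer 1002 had no feature data yet on that date. A naive "latest value" join would have filled it with data from the future.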
Online Store
Redis
Holds only the latest feature value per entity. Used by get_online_features() at inference time. Optimized for <10ms key-value lookup under high QPS load.
Feature Materialization
feast materialize
The process of reading features from the offline store (Parquet) and writing the latest values into the online store (Redis). Run on a schedule or triggered on demand. Without it, online features are stale or missing.
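The following sketch shows roughly what a materialization run does, under simplifying assumptions: a plain dict stands in for Redis, the offline rows are an in-memory list, and the function names `materialize` and `get_online` are illustrative rather than Feast's API.

```python
from datetime import datetime

# Hypothetical offline rows: (customer_id, event_timestamp, feature values)
offline_rows = [
    (1001, datetime(2024, 1, 1), {"transaction_count_7d": 3}),
    (1001, datetime(2024, 1, 8), {"transaction_count_7d": 5}),
    (1002, datetime(2024, 1, 5), {"transaction_count_7d": 1}),
    (1003, datetime(2024, 2, 1), {"transaction_count_7d": 7}),  # outside the window
]


def materialize(rows, online_store, start, end):
    """Write the newest row per entity within [start, end] into online_store."""
    for entity_id, ts, features in rows:
        if not (start <= ts <= end):
            continue
        current = online_store.get(entity_id)
        # Overwrite only when the incoming row is newer than the stored one
        if current is None or ts > current["event_timestamp"]:
            online_store[entity_id] = {"event_timestamp": ts, **features}
    return online_store


def get_online(store, entity_id, feature):
    """Key-value lookup; None when the entity was never materialized."""
    row = store.get(entity_id)
    return None if row is None else row[feature]


online_store = {}  # stand-in for Redis: entity key -> latest feature values
materialize(offline_rows, online_store, datetime(2024, 1, 1), datetime(2024, 1, 31))
```

After this run, customer 1001 resolves to its January 8 value, while customer 1003 returns `None` because its only row falls outside the materialized time range.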
Feast Concepts Reference
| Concept | Description |
|---|---|
| FeatureView | Groups related features with a data source, entity, TTL, and schema. Core unit of feature definition. Registered with feast apply. |
| Entity | The join key that connects features to model inputs (e.g. customer_id, driver_id). Every FeatureView has one or more entities. |
| FeatureService | A named group of features from one or more FeatureViews that a specific model consumes. Enables per-model feature versioning. |
| DataSource | Where raw feature data lives: FileSource (Parquet/S3), BigQuerySource, RedshiftSource, SparkSource. Feast reads from here for offline retrieval and materialization. |
| materialize | The CLI command (feast materialize <start> <end>) that copies feature values from the offline store into the online store (Redis) for a time range. |
Common Mistakes
Calling get_online_features() before running feast materialize: Redis is empty and all features return null.
Omitting the timestamp column in FileSource (timestamp_field in current Feast, event_timestamp_column in older releases): Feast has no explicit event time for point-in-time joins, so get_historical_features() can fail or return incorrect values.
Setting TTL shorter than the materialization interval: features expire from Redis between runs, causing null values at inference time until the next run completes.
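The TTL mistake comes down to simple arithmetic, sketched below. It assumes, as a simplification, that a feature's event timestamp equals the time of the last materialization run; the helper names `is_fresh` and `stale_window` are hypothetical.

```python
from datetime import datetime, timedelta


def is_fresh(event_timestamp, now, ttl):
    """A stored value is servable only while its age is within the TTL."""
    return now - event_timestamp <= ttl


def stale_window(ttl, materialization_interval):
    """Time per cycle during which values have expired before the next run."""
    return max(materialization_interval - ttl, timedelta(0))


ttl = timedelta(hours=1)
interval = timedelta(hours=6)
last_materialized = datetime(2024, 1, 1, 0, 0)

gap = stale_window(ttl, interval)
fresh_at_30m = is_fresh(last_materialized, last_materialized + timedelta(minutes=30), ttl)
fresh_at_3h = is_fresh(last_materialized, last_materialized + timedelta(hours=3), ttl)
```

With a 1-hour TTL and a 6-hour interval, values are unavailable for 5 of every 6 hours. Keeping the TTL at or above the interval brings the stale window to zero.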
Frequently Asked Questions
What is the difference between the offline and online store?
The offline store (Parquet/S3) holds full feature history for point-in-time correct training dataset generation via batch retrieval. The online store (Redis) holds only the latest feature values per entity for sub-10ms lookups at inference time. Materialization is the scheduled job that copies from offline to online.
What is feature materialization?
Feature materialization reads feature values from the offline store (Parquet) and writes the latest values into the online store (Redis). It runs on a schedule (e.g., hourly) or on demand. Without materialization, get_online_features() returns null values because Redis has no data.
What is a FeatureView in Feast?
A FeatureView groups related features, connects them to an Entity (join key) and a DataSource (Parquet or BigQuery), and sets a TTL for online store expiry. It is the core unit of feature definition. You define FeatureViews in Python and run feast apply to register them.
What is a FeatureService in Feast?
A FeatureService groups features from one or more FeatureViews into a named set that a specific model consumes. Instead of listing individual feature references at inference time, you look up the FeatureService by name and pass it to the retrieval call, which lets you version the feature set a model uses independently of the underlying FeatureViews.