Step-by-Step Guide

How to Build a Feature Store
with Feast

Define feature views, materialize features from the offline store to the online store, generate point-in-time correct training datasets, and serve real-time features in under 10ms — step by step.

1

Install Feast and Initialize Project

Install Feast with the Redis extra for production online serving. Then scaffold a new feature store project with feast init.

# Install Feast with Redis support
# (quotes prevent shell glob expansion of the brackets)
pip install 'feast[redis]'

# Initialize a new feature store project
feast init my_feature_store
cd my_feature_store/feature_repo

This creates the following structure:

my_feature_store/
├── feature_repo/
│   ├── feature_store.yaml   # Online/offline store config
│   ├── example_repo.py      # Sample feature definitions
│   └── data/
│       └── driver_stats.parquet  # Sample data
2

Define Data Sources and Entities

A DataSource points to your raw data (Parquet, BigQuery, etc.). An Entity is the join key that connects features to your model inputs — typically a user or item ID.

from feast import Entity, FileSource, ValueType

# Point to your Parquet feature data.
# Feast >= 0.20 calls this parameter timestamp_field;
# older releases used event_timestamp_column.
customer_source = FileSource(
    path="data/customer_features.parquet",
    timestamp_field="event_timestamp",
    # For S3: path="s3://my-bucket/customer_features.parquet"
)

# Define the entity (join key)
customer = Entity(
    name="customer_id",
    join_keys=["customer_id"],
    value_type=ValueType.INT64,
    description="Unique customer identifier",
)
3

Define FeatureViews

A FeatureView groups related features, specifies the data source and entity, and sets a TTL (time-to-live) for online store expiry. Each feature is declared with a Field() including its type.

from feast import FeatureView, Field
from feast.types import Float32, Int64
from datetime import timedelta

customer_features = FeatureView(
    name="customer_features",
    entities=[customer],
    ttl=timedelta(days=7),  # Features expire from online store after 7 days
    schema=[
        Field(name="days_since_last_login", dtype=Float32),
        Field(name="transaction_count_7d", dtype=Int64),
        Field(name="avg_transaction_amount_7d", dtype=Float32),
        Field(name="session_count_30d", dtype=Int64),
        Field(name="support_tickets_90d", dtype=Int64),
    ],
    source=customer_source,
    tags={"team": "ml-platform", "model": "churn"},
)
4

Apply the Feature Store

Running feast apply parses all Python files in the repo, registers feature definitions in the registry, and creates the online store schema. Run this every time you modify FeatureViews.

# From inside the feature_repo/ directory
feast apply

# Expected output:
# Registered entity customer_id
# Registered feature view customer_features
# Deploying infrastructure for customer_features

feast apply creates the online store table schema and updates the local registry file. It does not load any data into the online store — that is done by materialization in the next step.
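The stores that feast apply deploys to are configured in feature_store.yaml. A typical setup pairing the local Parquet offline store with Redis might look like the following sketch — the project name, registry path, and Redis address are assumptions to adapt to your environment:

```yaml
project: my_feature_store
registry: data/registry.db     # Local registry file updated by feast apply
provider: local
online_store:
  type: redis
  connection_string: "localhost:6379"
offline_store:
  type: file                   # Reads FileSource Parquet paths directly
```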

5

Materialize to Online Store and Generate Training Data

Materialization copies the latest feature values from the offline store (Parquet) into the online store (Redis). Run it on a schedule (e.g., hourly) to keep online features fresh. Separately, use get_historical_features() for point-in-time correct training datasets.

# Materialize features from Jan 1 to now into online store
feast materialize 2024-01-01T00:00:00 $(date -u +"%Y-%m-%dT%H:%M:%S")

# Or incrementally materialize since last run
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")

With the offline store in place, build the training set in Python:

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Point-in-time correct training data:
# Feast joins each row to the latest feature values
# at or before its event_timestamp
entity_df = pd.DataFrame({
    "customer_id": [1001, 1002, 1003, 1004],
    "event_timestamp": pd.to_datetime([
        "2024-01-10", "2024-01-15", "2024-01-20", "2024-01-25"
    ]),
    "label": [1, 0, 1, 0],
})

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "customer_features:days_since_last_login",
        "customer_features:transaction_count_7d",
        "customer_features:avg_transaction_amount_7d",
        "customer_features:session_count_30d",
    ],
).to_df()

print(training_df.head())
6

Serve Real-Time Features

In your prediction API, call store.get_online_features() to fetch pre-materialized feature values from Redis. This returns a dictionary you can pass directly to your model.

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Called in your inference API — typically <10ms with Redis
online_features = store.get_online_features(
    features=[
        "customer_features:days_since_last_login",
        "customer_features:transaction_count_7d",
        "customer_features:avg_transaction_amount_7d",
    ],
    entity_rows=[
        {"customer_id": 1001},
        {"customer_id": 1002},
    ],
).to_dict()

# online_features is a dict of parallel lists:
# {
#   "customer_id": [1001, 1002],
#   "days_since_last_login": [3.0, 14.0],
#   "transaction_count_7d": [12, 2],
#   "avg_transaction_amount_7d": [142.5, 38.2]
# }

# Shape it into your model's expected input
# (here: a pandas DataFrame, dropping the join key)
import pandas as pd

feature_df = pd.DataFrame(online_features).drop(columns=["customer_id"])
prediction = model.predict(feature_df)

When to Build a Feature Store

  • You have more than one ML model in production and features overlap between them
  • You notice that model accuracy is lower in production than in offline evaluation (training-serving skew)
  • Your prediction API recomputes features at inference time, causing latency or inconsistency
  • You need auditable, point-in-time correct training data for regulatory or reproducibility reasons

Common Issues

Forgetting to run feast apply after modifying a FeatureView — registry becomes out of sync with your code

Wrong timestamp column name in FileSource (timestamp_field in current Feast, event_timestamp_column in older releases) — causes feast apply to fail or get_historical_features() to return nulls

Online store not materialized before calling get_online_features() — returns null values for all features

TTL set too short — features expire from Redis before the next materialization run, causing missing values at inference time
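The TTL pitfall above can be caught with a quick sanity check: the FeatureView TTL must comfortably exceed your materialization interval, or features expire between runs. A minimal sketch — the 7-day TTL and hourly cadence mirror this guide, while the 2x safety factor is an assumption, not a Feast rule:

```python
from datetime import timedelta

def ttl_covers_cadence(ttl: timedelta, materialize_every: timedelta,
                       safety_factor: float = 2.0) -> bool:
    """Return True if features cannot expire between materialization runs."""
    return ttl >= materialize_every * safety_factor

# 7-day TTL with hourly materialization: plenty of headroom
print(ttl_covers_cadence(timedelta(days=7), timedelta(hours=1)))      # True

# 30-minute TTL with hourly materialization: features go stale
print(ttl_covers_cadence(timedelta(minutes=30), timedelta(hours=1)))  # False
```

Run a check like this whenever you change a FeatureView's ttl or your materialization schedule.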

Frequently Asked Questions

What is the minimum setup for Feast?

The minimum Feast setup requires: pip install feast, feast init to scaffold the project, one FileSource pointing to a Parquet file, one Entity, one FeatureView, and feast apply to register everything. You can use SQLite as the online store locally — no Redis required. Add feast[redis] when you need production online serving.

How does point-in-time correct training data work?

You provide an entity_df with customer_id and event_timestamp columns. Feast performs an as-of join: for each row, it finds the most recent feature values where the feature timestamp is less than or equal to the event_timestamp. This prevents leaking future data into your training set.
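The as-of join described above can be illustrated with plain pandas — a sketch using pd.merge_asof, not Feast's actual implementation, with made-up toy values:

```python
import pandas as pd

# Feature values as they landed in the offline store
features = pd.DataFrame({
    "customer_id": [1001, 1001, 1001],
    "event_timestamp": pd.to_datetime(["2024-01-05", "2024-01-12", "2024-01-18"]),
    "transaction_count_7d": [4, 9, 2],
})

# Training rows with their label timestamps
entity_df = pd.DataFrame({
    "customer_id": [1001, 1001],
    "event_timestamp": pd.to_datetime(["2024-01-10", "2024-01-20"]),
})

# As-of join: each row gets the latest feature value whose
# timestamp is <= the row's event_timestamp
training = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    features.sort_values("event_timestamp"),
    on="event_timestamp",
    by="customer_id",
    direction="backward",
)
print(training["transaction_count_7d"].tolist())  # [4, 2]
```

The 2024-01-10 row picks up the 2024-01-05 value (4), never the later 2024-01-12 one — exactly the leakage prevention get_historical_features() provides.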

When should I use Redis vs SQLite for the online store?

Use SQLite (the default) for local development and prototyping — no infrastructure required. Use Redis for production: it handles concurrent requests, horizontal scaling, and sub-millisecond latency. Install feast[redis] and set online_store type: redis in feature_store.yaml.
