Context
The platform must serve point-in-time (PIT) correct features for training and sub-100 ms online lookups for inference, across multiple models and teams. Training-serving skew is the failure mode every previous attempt on this team has hit. Feature reuse across teams is a real ask. The classic options:
- Tecton: managed feature store, opinionated, best-in-class PIT correctness.
- Hopsworks: open-source-with-managed feature store, broader scope (full ML platform).
- Feast: open-source feature-store orchestration layer (BYO offline + online + registry stores).
- DIY: Postgres + Redis + a Python `FeatureView` class.
We are building a reference MLOps platform for tutorial purposes — the choice has to be reproducible by a learner on a laptop in <15 minutes and survive a real production deploy.
Decision
Adopt Feast as the feature-store orchestration layer.
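```yaml
# feature_store.yaml
project: predictflow
provider: local
registry:
  registry_type: sql          # SQL-backed registry; a bare URL string is read as a file path
  path: postgresql://...
online_store:
  type: redis
  connection_string: localhost:6379
offline_store:
  type: file  # parquet on local FS or S3
```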
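```python
# features/customer_features.py
from datetime import timedelta

from feast import Entity, FeatureView, Field, ValueType
from feast.types import Float32

# Entity value types use feast.ValueType; feast.types is for Field dtypes.
customer = Entity(
    name="customer_id",
    join_keys=["customer_id"],
    value_type=ValueType.INT64,
)

churn_features = FeatureView(
    name="churn_features",
    entities=[customer],
    ttl=timedelta(days=30),  # feature rows older than this are ignored at lookup time
    schema=[
        Field(name="tenure_months", dtype=Float32),
        Field(name="monthly_charges", dtype=Float32),
        # ... (remaining fields elided)
    ],
    source=parquet_source,  # parquet-backed source, defined elsewhere in this module
)
```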
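To show how the `churn_features` view would be consumed, here is a minimal sketch of the two read paths. It assumes `store` is a `feast.FeatureStore` (for example `FeatureStore(repo_path=".")`) and calls only `get_historical_features` and `get_online_features`. The helper names `feature_refs`, `training_frame`, and `online_row` are hypothetical, not part of the project.

```python
from typing import Any, Dict, List

# Field names taken from the churn_features view; helper names are hypothetical.
CHURN_FEATURES = ["tenure_months", "monthly_charges"]


def feature_refs(view: str, fields: List[str]) -> List[str]:
    """Build Feast feature references of the form '<view>:<field>'."""
    return [f"{view}:{name}" for name in fields]


def training_frame(store: Any, entity_df: Any) -> Any:
    """Offline path: point-in-time join of features onto labelled rows.

    `entity_df` is expected to carry a `customer_id` column and an
    `event_timestamp` column, one row per training example.
    """
    job = store.get_historical_features(
        entity_df=entity_df,
        features=feature_refs("churn_features", CHURN_FEATURES),
    )
    return job.to_df()


def online_row(store: Any, customer_id: int) -> Dict[str, list]:
    """Online path: latest feature values for one customer."""
    response = store.get_online_features(
        features=feature_refs("churn_features", CHURN_FEATURES),
        entity_rows=[{"customer_id": customer_id}],
    )
    return response.to_dict()
```

Both paths request the same feature references, which is the property that keeps training and serving reading from one definition.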
Tradeoffs we accept
| Lever | Tecton | Hopsworks | DIY | Feast (chosen) |
|---|---|---|---|---|
| Day-1 setup | Vendor onboarding | Self-host or managed | Hours of glue code | `pip install feast` + `feast init` |
| PIT correctness | Native | Native | We build it | Native (`get_historical_features`) |
| Online latency (P99) | <5 ms | <10 ms | <10 ms (Redis) | <10 ms (Redis online store) |
| Cross-team feature reuse | Strong (registry UI) | Strong | Build the registry | Real registry, no UI by default |
| Vendor lock-in | High | Medium | None | None |
| Cost at <100 features | $$$ | $ infra | $ infra | $ infra |
| Streaming sync | Native | Native | We build it | Streaming engine API + Kafka examples |
| Tutorial reproducibility | Cannot (vendor) | Heavy infra | Build everything | One YAML + plain Python definitions |
We optimize for tutorial reproducibility + production portability. Tecton wins on managed polish but a learner can't reproduce a managed SaaS on their laptop. Hopsworks pulls in a full ML platform — more than this project needs. DIY is what every team starts with and regrets. Feast is the open-source-with-real-receipts middle.
Consequences (positive)
- A learner runs `feast init`, edits two Python files, and has a working feature store in <15 minutes (Module 02 ships in this time budget).
- Feature definitions are plain Python declarations (`Entity`, `FeatureView`, `Field`), so a swap to Tecton or Hopsworks is a bounded translation of those definitions, not a rewrite of every feature.
- Feast's PIT correctness is battle-tested, so we don't reimplement the hardest bug class in feature engineering.
- The `feature_store.yaml` config is the single point of vendor swap: online store, offline store, and registry are independent dimensions.
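To make the PIT bug class concrete, the following is a minimal stdlib-only sketch of what a point-in-time lookup has to guarantee for a single entity. It is not Feast's implementation. The function name, the sample log, and the TTL handling are illustrative assumptions.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# (event_timestamp, value) pairs for one entity, sorted by timestamp.
FeatureLog = List[Tuple[datetime, float]]


def point_in_time_value(
    log: FeatureLog, as_of: datetime, ttl: timedelta
) -> Optional[float]:
    """Return the latest value observed at or before `as_of`, within `ttl`.

    Values stamped after `as_of` are never returned: that is the label
    leakage a point-in-time join exists to prevent.
    """
    timestamps = [ts for ts, _ in log]
    idx = bisect_right(timestamps, as_of) - 1  # last row with ts <= as_of
    if idx < 0:
        return None  # nothing was known yet at `as_of`
    ts, value = log[idx]
    if as_of - ts > ttl:
        return None  # newest known value is too stale to use
    return value


# Hypothetical sample data for one customer.
tenure_log: FeatureLog = [
    (datetime(2024, 1, 1), 10.0),
    (datetime(2024, 2, 1), 11.0),
    (datetime(2024, 3, 1), 12.0),
]
ttl = timedelta(days=30)

# A label observed on 15 Feb must see the 1 Feb value, not the 1 Mar one.
print(point_in_time_value(tenure_log, datetime(2024, 2, 15), ttl))  # 11.0
```

A naive join on `customer_id` alone would have picked up the 1 March value for a February label, which is exactly the kind of skew that shows up only after deployment.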
Consequences (negative)
- No managed UI for browsing the registry. We build a thin Python `FeatureRegistry` (`feature_store/registry.py`) for the tutorial; in production, teams typically front it with a web UI or pull into Atlas.
- Streaming sync is BYO orchestration. Feast exposes `write_to_online_store()` but doesn't run a Kafka consumer for you. Module 02 ships the consumer in `feature_store/sync.py`.
- No native lineage UI. The `migrations/feature_registry.sql` schema includes a `feature_lineage` table; querying it is a SQL exercise, not a UI click.
- Backfill orchestration is BYO. `feature_store/backfill.py` shows the pattern; production teams would wrap it in Airflow / Argo.
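The streaming-sync gap is mostly a batching loop around the online write. The sketch below is not the contents of `feature_store/sync.py`. It assumes messages arrive as JSON-encoded bytes keyed by `customer_id`, and it takes the write as an injected callable so the loop can be exercised without Kafka or Feast. `sync_stream` and `write_batch` are hypothetical names.

```python
import json
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def sync_stream(
    messages: Iterable[bytes],
    write_batch: Callable[[List[Row]], None],
    batch_size: int = 100,
) -> int:
    """Decode JSON messages and flush them to the online store in batches.

    Returns the number of rows written. Messages that are not valid JSON
    objects with a `customer_id` key are skipped.
    """
    batch: List[Row] = []
    written = 0
    for raw in messages:
        try:
            row = json.loads(raw)
        except ValueError:  # covers invalid JSON and invalid UTF-8
            continue  # a real consumer would route this to a dead-letter topic
        if not isinstance(row, dict) or "customer_id" not in row:
            continue
        batch.append(row)
        if len(batch) >= batch_size:
            write_batch(batch)
            written += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        write_batch(batch)
        written += len(batch)
    return written
```

With Feast, `write_batch` could wrap `store.write_to_online_store("churn_features", pandas.DataFrame(rows))`, assuming each row also carries the event timestamp column the feature view expects.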
Reversal plan
The `FeatureRegistry` interface (`register_feature`, `get_features`, `record_lineage`) is a thin Python class, and the feature definitions are plain Python declarations. Swap is bounded:
- Replace `feature_store.yaml` with the new tool's config.
- Translate `FeatureView` definitions to the new tool's equivalent (Tecton uses decorators such as `@stream_feature_view`; Hopsworks uses `fs.create_feature_group()`).
- Translate the Kafka sync from `write_to_online_store()` to the new tool's online API.
- Re-run the Module 03 integration suite to confirm online lookups.

Estimated effort: 2-3 engineer-weeks for a tested swap. Reversible.
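The swap stays bounded because callers depend on only three methods. Below is a minimal in-memory sketch of that surface. The method names come from this ADR, but the signatures, the storage layout, and the class name `InMemoryFeatureRegistry` are assumptions rather than the code in `feature_store/registry.py`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class InMemoryFeatureRegistry:
    """Hypothetical stand-in exposing the three FeatureRegistry methods."""

    # feature name -> owning team
    features: Dict[str, str] = field(default_factory=dict)
    # feature name -> upstream sources it is derived from
    lineage: Dict[str, List[str]] = field(default_factory=dict)

    def register_feature(self, name: str, owner: str) -> None:
        if name in self.features:
            raise ValueError(f"feature already registered: {name}")
        self.features[name] = owner
        self.lineage[name] = []

    def get_features(self, owner: Optional[str] = None) -> List[str]:
        """List feature names, optionally filtered by owning team."""
        return sorted(
            name
            for name, team in self.features.items()
            if owner is None or team == owner
        )

    def record_lineage(self, name: str, upstream: str) -> None:
        if name not in self.features:
            raise KeyError(f"unknown feature: {name}")
        self.lineage[name].append(upstream)


# Team names and the upstream file name are made up for illustration.
registry = InMemoryFeatureRegistry()
registry.register_feature("tenure_months", owner="growth")
registry.register_feature("monthly_charges", owner="billing")
registry.record_lineage("tenure_months", upstream="customers.parquet")
print(registry.get_features(owner="growth"))  # ['tenure_months']
```

A replacement backed by another tool only has to honour these three methods for existing callers to keep working.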
References
- `feature_store.yaml`
- `features/{customer_features,behavioral_features}.py`
- `feature_store/{registry,sync,backfill}.py`
- `migrations/feature_registry.sql`
- ADR-002 (Redis online store choice, independent of Feast)
- ADR-003 (MLflow + DVC: tracking + data versioning, complements Feast)