Context
The online store serves features at inference time, on the request
path. The latency budget for the BentoML service's predict() endpoint
is <50 ms P99 end-to-end, covering the gateway, the feature lookup,
the model forward pass, and output serialization. The feature lookup
itself must stay under 10 ms P99 to leave room for model compute.
Feast supports Redis, DynamoDB, Datastore, SQLite, Bigtable, Postgres,
and Cassandra as online-store backends; the practical choices for a
small-team production deploy are:
- Redis (ElastiCache) — in-memory KV, sub-ms gets, well-understood ops.
- DynamoDB — managed, autoscaling, single-digit-ms gets.
- Postgres direct — reuse the registry/offline-store database.
Decision
Adopt Redis (ElastiCache for production, local Docker for the tutorial).
```yaml
# feature_store.yaml
online_store:
  type: redis
  connection_string: localhost:6379  # ElastiCache primary endpoint in prod
```
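```python
# Online lookup (Module 02)
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # assumes feature_store.yaml is in the current directory

features = store.get_online_features(
    features=["churn_features:tenure_months", ...],
    entity_rows=[{"customer_id": 12345}],
).to_dict()
# P99 < 5ms in benchmarks shown in part-2
```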
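The sketch below shows one way to measure a lookup P99 like the 5 ms figure above. It assumes a single sequential client, so it measures per-call latency, not behavior under concurrent load. `benchmark_p99_ms` and `p99_nearest_rank` are hypothetical helpers, not Feast APIs:

```python
import math
import time
from typing import Callable, List


def p99_nearest_rank(samples_ms: List[float]) -> float:
    """99th percentile by the nearest-rank method (no interpolation)."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(99 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def benchmark_p99_ms(
    call: Callable[[], object], iterations: int = 1000, warmup: int = 50
) -> float:
    """Call `call` sequentially and return its P99 latency in milliseconds."""
    for _ in range(warmup):  # warm connections and caches; not recorded
        call()
    samples: List[float] = []
    for _ in range(iterations):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return p99_nearest_rank(samples)
```

To use it, wrap the `store.get_online_features(...).to_dict()` call above in a zero-argument lambda and pass it as `call`. Nearest-rank keeps the reported P99 equal to an actually observed latency. The warmup loop keeps connection setup out of the samples.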
Tradeoffs we accept
| Lever | Redis (chosen) | DynamoDB | Postgres direct |
|---|---|---|---|
| Read latency (P99) | <5 ms | <10 ms | 15-30 ms |
| Write latency | <2 ms | <10 ms | 20-50 ms |
| Throughput ceiling | ~100k ops/sec/node | Autoscaling | ~5k ops/sec |
| Durability | RDB + AOF, replica | Native | WAL |
| Cost at 4M chunks × 4 tenants | ~$54/mo (cache.t4g.small × 2) | Per-RCU pricing, hard to budget | $0 marginal (reuse RDS) |
| Vendor lock-in | None | AWS-only | None |
| Operational complexity | Standard | Zero (managed) | None new |
| Local-dev parity | Single Docker container | LocalStack workaround | Dev DB |
We optimize for read latency, vendor independence, and local-dev parity. Redis wins on raw read latency by 2-3× over DynamoDB. Postgres direct, at 15-30 ms P99, already exceeds the 10 ms feature-lookup budget on its own, before network and model compute are stacked on top.
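The budget arithmetic behind this choice fits in a small sketch. It takes the upper end of each read-latency figure in the table as a worst-case P99. It then subtracts that from the end-to-end budget, which treats stage P99s as additive. That is a common conservative rule of thumb, not an exact bound. The names are illustrative:

```python
# Worst-case read P99 per backend (ms): upper end of each range in the
# tradeoffs table (Postgres direct is listed as 15-30 ms).
READ_P99_MS = {"Redis": 5.0, "DynamoDB": 10.0, "Postgres direct": 30.0}

END_TO_END_P99_MS = 50.0  # gateway + lookup + forward pass + serialization
LOOKUP_P99_MS = 10.0      # share reserved for the feature lookup


def fits_lookup_budget(backend: str) -> bool:
    """True if the backend's worst-case read P99 fits the lookup share."""
    return READ_P99_MS[backend] <= LOOKUP_P99_MS


def remaining_budget_ms(backend: str) -> float:
    """Budget left for the gateway, model forward pass, and serialization,
    assuming stage P99s simply add up."""
    return END_TO_END_P99_MS - READ_P99_MS[backend]
```

With these figures:
- Redis leaves 45 ms for everything else.
- DynamoDB fits only at the edge of the 10 ms lookup share.
- Postgres direct fails the lookup check outright.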
Consequences (positive)
- P99 feature-lookup latency consistently <5 ms in part-3's load tests.
- Local development runs the `redis:7-alpine` Docker image, matching the Redis 7 engine on the ElastiCache primary in prod, so there are no behavioral surprises.
- Redis pub/sub is available for real-time feature freshness signals (used in Module 04's drift detection).
- Reset is one `FLUSHALL`, which keeps iteration fast during Module 02 hacking.
- Cost is bounded: a `cache.t4g.small` primary + replica handles the 4-tenant load with 50% headroom (see cost-model CSV).
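The freshness signal could look like the sketch below. It assumes a hypothetical channel name `feature-freshness` and a JSON payload of our own design; Module 04's actual message format is not specified here. The publisher takes any `publish(channel, message)` callable, so redis-py's `Redis.publish` can be passed in directly:

```python
from __future__ import annotations

import json
import time
from typing import Any, Callable

# Hypothetical channel name; Module 04 may use a different one.
FRESHNESS_CHANNEL = "feature-freshness"


def publish_freshness(
    publish: Callable[[str, str], Any], feature_view: str, materialized_at: float
) -> str:
    """Announce that `feature_view` was written to Redis at `materialized_at`
    (Unix epoch seconds). Returns the JSON payload that was published."""
    payload = json.dumps({"feature_view": feature_view, "materialized_at": materialized_at})
    publish(FRESHNESS_CHANNEL, payload)
    return payload


def staleness_seconds(raw: str | bytes, now: float | None = None) -> float:
    """Subscriber side: seconds elapsed since the announced materialization."""
    message = json.loads(raw)  # json.loads accepts both str and UTF-8 bytes
    current = time.time() if now is None else now
    return current - message["materialized_at"]
```

On the subscriber side, the flow with redis-py is:
1. Create an object with `pubsub()`.
2. Call `subscribe(FRESHNESS_CHANNEL)` on it.
3. Iterate `listen()`.
4. For each entry whose `type` is `"message"`, pass its `data` (bytes) to `staleness_seconds`.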
Consequences (negative)
- Memory cap. Redis keeps the entire working set in RAM, and an instance needs roughly 1-2 GB of RAM per GB of features once key and encoding overhead is counted. At 32M chunks × 1 KB average, the working set is ~32 GB, which needs `cache.t4g.medium` (3.09 GB) at minimum. Mitigation: per-tenant TTL eviction on long-tail features.
- Snapshotting cost. AOF + RDB persistence on a hot Redis can cost 10-15% CPU. Mitigation: the replica handles snapshots, so the primary stays clean.
- No native time-travel. Redis holds current state only; point-in-time (PIT) correctness for training comes from the offline store (Parquet on S3), not Redis. This is by design: see `src/training_features.py` for the offline PIT join.
- Cache miss = stale prediction. If Redis is down, the BentoML service serves stale features (or returns 503s). Mitigation: a liveness probe in `k8s/deployment.yaml` checks Redis reachability; the fallback path is documented in the runbook.
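The stale-or-503 behavior could look like the sketch below inside the service. It rests on these assumptions:
- a per-process last-known-good cache keyed by entity;
- a hypothetical `max_stale_s` cutoff;
- a `FeaturesUnavailable` exception that the API layer maps to HTTP 503.

None of these names come from BentoML or Feast:

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple

Features = Dict[str, Any]


class FeaturesUnavailable(Exception):
    """No fresh or acceptably stale features; the API layer maps this to HTTP 503."""


class StaleFallbackLookup:
    """Wrap an online-store lookup; if it fails, serve the last good result
    for the same entity as long as it is at most `max_stale_s` seconds old."""

    def __init__(
        self,
        fetch: Callable[[Hashable], Features],
        max_stale_s: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._max_stale_s = max_stale_s
        self._clock = clock
        # entity key -> (time fetched, features); unbounded and per process
        self._last_good: Dict[Hashable, Tuple[float, Features]] = {}

    def get(self, key: Hashable) -> Tuple[Features, bool]:
        """Return (features, is_stale)."""
        try:
            features = self._fetch(key)
        except Exception as exc:  # narrow to redis-py connection/timeout errors in real code
            cached = self._last_good.get(key)
            if cached is not None and self._clock() - cached[0] <= self._max_stale_s:
                return cached[1], True
            raise FeaturesUnavailable(f"no usable features for {key!r}") from exc
        self._last_good[key] = (self._clock(), features)
        return features, False
```

In the service, `fetch` would wrap the Feast online lookup for a single `customer_id`. The `is_stale` flag can be logged or exported as a metric, so stale serving is visible rather than silent.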
Reversal plan
feature_store.yaml is the single config knob. To swap online stores:
- Update `online_store.type` in the YAML.
- Re-run `feast materialize` to populate the new store.
- Switch ElastiCache → DynamoDB / Postgres in the deployment manifests.
- Re-run the Module 03 integration tests. The latency assertions in `tests/integration/test_api.py` will fail if the swap blows the <50 ms P99 budget: fail-loud, not fail-silent.
Estimated effort: 2-4 engineer-days for a tested swap. Reversible.
References
- `feature_store.yaml`
- `feature_store/sync.py` (Kafka → Redis materialization)
- `test_online_features.py` (latency benchmark)
- ADR-001 (Feast; the choice of orchestration layer is independent of the online store)