ADR-002: Online store is Redis, not DynamoDB or Postgres direct | PredictFlow Feature Store

Context

The online store serves features at inference time, on the request path. The latency budget for BentoML.predict() is <50 ms P99 end-to-end, covering the gateway, the feature lookup, the model forward pass, and output serialization. The feature lookup itself must stay <10 ms P99 to leave room for model compute. Feast supports Redis, DynamoDB, Datastore, SQLite, Bigtable, Postgres, and Cassandra as online-store backends; the practical choices for a small-team production deploy are:

Redis (ElastiCache) — in-memory KV, sub-ms gets, well-understood ops.
DynamoDB — managed, autoscaling, single-digit-ms gets.
Postgres direct — reuse the registry/offline-store database.

Decision

Adopt Redis (ElastiCache for production, local Docker for the tutorial).

# feature_store.yaml
online_store:
  type: redis
  connection_string: localhost:6379 # ElastiCache primary endpoint in prod

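# Online lookup (Module 02)
from feast import FeatureStore

# Assumes feature_store.yaml lives in the current directory.
store = FeatureStore(repo_path=".")
features = store.get_online_features(
    features=["churn_features:tenure_months", ...],
    entity_rows=[{"customer_id": 12345}],
).to_dict()
# P99 < 5 ms in the benchmarks shown in part-2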
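A minimal sketch of how a P99 figure like the one above could be measured, assuming a nearest-rank percentile over wall-clock samples. `percentile_nearest_rank`, `measure_p99_ms`, and `fake_lookup` are hypothetical names, not part of the repo; in practice the callable would wrap the `store.get_online_features(...)` call.

```python
import math
import time
from typing import Callable, List


def percentile_nearest_rank(samples: List[float], pct: float) -> float:
    """Return the pct-th percentile (0 < pct <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


def measure_p99_ms(lookup: Callable[[], object], iterations: int = 1000) -> float:
    """Call `lookup` repeatedly and return the P99 wall-clock latency in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        lookup()
        samples.append((time.perf_counter() - start) * 1000.0)
    return percentile_nearest_rank(samples, 99)


def fake_lookup() -> dict:
    # Hypothetical stand-in for store.get_online_features(...).to_dict()
    return {"customer_id": [12345], "tenure_months": [18]}


if __name__ == "__main__":
    print(f"P99: {measure_p99_ms(fake_lookup):.3f} ms")
```

Timing an in-process stub says nothing about Redis itself; the point is only the measurement shape. A real benchmark should run against the deployed store, over the network path the service will actually use.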

Tradeoffs we accept

Lever	Redis (chosen)	DynamoDB	Postgres direct
Read latency (P99)	<5 ms	<10 ms	15-30 ms
Write latency	<2 ms	<10 ms	20-50 ms
Throughput ceiling	~100k ops/sec/node	Autoscaling	~5k ops/sec
Durability	RDB + AOF, replica	Native	WAL
Cost at 4M chunks × 4 tenants	~$54/mo (cache.t4g.small × 2)	Per-RCU pricing, hard to budget	$0 marginal (reuse RDS)
Vendor lock-in	None	AWS-only	None
Operational complexity	Standard	Zero (managed)	None new
Local-dev parity	Single Docker container	LocalStack workaround	Dev DB

We optimize for read latency + vendor independence + local-dev parity. Redis beats DynamoDB on raw read latency by 2-3×. Postgres direct adds enough latency to blow our P99 budget once you stack network + model compute on top.
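The budget argument above can be checked with simple arithmetic. The sketch below takes the upper-bound P99 read latencies from the table and the two budgets from the Context section; the dictionary and function names are illustrative only. Subtracting a per-stage P99 from the end-to-end P99 is a conservative approximation, not an exact decomposition.

```python
END_TO_END_BUDGET_MS = 50.0  # P99 budget for the whole predict() path
LOOKUP_BUDGET_MS = 10.0      # P99 budget for the feature lookup alone

# Upper-bound P99 read latencies taken from the tradeoff table (ms).
READ_P99_MS = {
    "redis": 5.0,
    "dynamodb": 10.0,
    "postgres_direct": 30.0,
}


def remaining_budget_ms(backend: str) -> float:
    """Budget left for gateway, model forward pass, and serialization."""
    return END_TO_END_BUDGET_MS - READ_P99_MS[backend]


def fits_lookup_budget(backend: str) -> bool:
    """True if the table's upper bound does not exceed the lookup budget."""
    return READ_P99_MS[backend] <= LOOKUP_BUDGET_MS


if __name__ == "__main__":
    for name in READ_P99_MS:
        print(name, remaining_budget_ms(name), fits_lookup_budget(name))
```

On these figures DynamoDB sits right at the edge of the lookup budget, while the Postgres upper bound is three times over it.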

Consequences (positive)

P99 feature-lookup latency consistently <5 ms in part-3's load tests.
Local development runs the redis:7-alpine Docker image, which speaks the same Redis protocol and commands as the ElastiCache primary in prod, so behavioral surprises are minimal.
Redis pubsub is available for real-time feature freshness signals (used in Module 04's drift detection).
Reset is one FLUSHALL — fast iteration during Module 02 hacking.
Cost is bounded: a cache.t4g.small primary + replica handles the 4-tenant load with 50% headroom (see cost-model CSV).
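As a rough illustration of the pubsub freshness-signal idea mentioned above, the sketch below publishes a small JSON message when a feature view is refreshed. The channel name, payload shape, and function name are assumptions made for this example, not Module 04's actual protocol. The `client` argument is any object with a redis-py style `publish(channel, message)` method, such as `redis.Redis(host="localhost", port=6379)`.

```python
import json
import time
from typing import Any, Optional

FRESHNESS_CHANNEL = "feature_freshness"  # hypothetical channel name


def publish_freshness(
    client: Any, feature_view: str, refreshed_at: Optional[float] = None
) -> str:
    """Publish a freshness signal and return the JSON payload that was sent."""
    payload = json.dumps(
        {
            "feature_view": feature_view,
            "refreshed_at": time.time() if refreshed_at is None else refreshed_at,
        },
        sort_keys=True,
    )
    client.publish(FRESHNESS_CHANNEL, payload)
    return payload


class RecordingClient:
    """In-memory stand-in for a Redis client, used only to exercise the sketch."""

    def __init__(self) -> None:
        self.sent = []

    def publish(self, channel: str, message: str) -> int:
        self.sent.append((channel, message))
        return 0  # a real client returns the number of subscribers reached


if __name__ == "__main__":
    demo = RecordingClient()
    print(publish_freshness(demo, "churn_features", refreshed_at=1700000000.0))
```

Redis pub/sub is fire-and-forget: a subscriber that is disconnected when a message is published never sees it, so a consumer of these signals should tolerate gaps.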

Consequences (negative)

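Memory cap. Redis keeps the whole working set in RAM, so capacity is bounded by node memory, less per-key overhead. At 32M chunks × 1 KB average, the working set is ~32 GB, about 10× the 3.09 GB of a cache.t4g.medium, so it needs a larger node class or sharding across nodes. Mitigation: per-tenant TTL eviction on long-tail features to shrink the resident set.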
Snapshotting cost. AOF + RDB on a hot Redis can cost 10-15% CPU. Mitigation: replica handles snapshots; primary stays clean.
No native time-travel. Redis is current-state; PIT correctness for training uses the offline store (Parquet on S3), not Redis. This is by design — see src/training_features.py for the offline PIT join.
Redis outage = stale prediction or 503. If Redis is down, the BentoML service either serves stale features or returns a 503. Mitigation: liveness probe in k8s/deployment.yaml checks Redis reachability; fallback path is documented in the runbook.
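The memory-cap arithmetic in the first item of this list can be made explicit. The sketch assumes decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes) and ignores Redis per-key overhead, so it gives a lower bound on the memory required; the function names are illustrative.

```python
import math


def working_set_gb(num_items: int, avg_item_bytes: int) -> float:
    """Raw payload size in decimal GB, excluding Redis per-key overhead."""
    return num_items * avg_item_bytes / 1e9


def min_nodes_needed(working_set: float, node_memory_gb: float) -> int:
    """Smallest number of equally sized nodes whose combined memory covers the set."""
    return math.ceil(working_set / node_memory_gb)


if __name__ == "__main__":
    ws = working_set_gb(32_000_000, 1_000)  # 32M chunks at 1 KB average
    print(f"working set: {ws:.1f} GB")
    print(f"cache.t4g.medium nodes (3.09 GB each): {min_nodes_needed(ws, 3.09)}")
```

With these inputs the working set is 32 GB and would need at least 11 nodes of 3.09 GB each before overhead, which is why TTL eviction or a larger node class matters at that scale.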

Reversal plan

feature_store.yaml is the single config knob. To swap online stores:

Update online_store.type in YAML.
Re-run feast materialize to populate the new store.
Switch ElastiCache → DynamoDB / Postgres in the deployment manifests.
Re-run Module 03 integration tests (latency assertions in tests/integration/test_api.py will fail if the swap blows the <50 ms P99 budget — fail-loud, not fail-silent).

Estimated effort: 2-4 engineer-days for a tested swap. Reversible.
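A fail-loud latency assertion of the kind the last step relies on might look like the sketch below. It is not the contents of tests/integration/test_api.py; `assert_p99_within_budget` and the sample data are hypothetical, and the percentile comes from `statistics.quantiles` in the standard library (Python 3.8+).

```python
import statistics
from typing import Sequence

P99_BUDGET_MS = 50.0  # end-to-end budget from the Context section


def assert_p99_within_budget(
    samples_ms: Sequence[float], budget_ms: float = P99_BUDGET_MS
) -> float:
    """Raise AssertionError if the P99 of the samples is not below the budget."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    p99 = statistics.quantiles(samples_ms, n=100, method="inclusive")[98]
    assert p99 < budget_ms, f"P99 {p99:.1f} ms exceeds budget {budget_ms:.1f} ms"
    return p99


if __name__ == "__main__":
    # Hypothetical samples; a real test would record timings of API requests.
    print(assert_p99_within_budget([12.0, 14.5, 13.2, 18.9, 11.7]))
```

Wired into an integration test, a backend swap that pushes P99 past the budget would then fail the test run instead of degrading silently in production.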

References

feature_store.yaml
feature_store/sync.py (Kafka → Redis materialization)
test_online_features.py (latency benchmark)
ADR-001 (Feast — choice of orchestration layer is independent of online store)