ADR-004: Approval queue is Redis with 24h TTL, not Temporal workflow | Enterprise AI Platform

Context

When the policy engine returns require_approval, the request is parked until a human reviewer accepts or rejects it. We need durability (a restart must not lose pending approvals), a clear TTL (stale approvals must auto-reject), and per-tenant scoping. The options considered:

Temporal workflow: a durable workflow per pending approval; timeouts, signals, and retries are first-class.
AWS Step Functions / Argo Workflows: similar shape, with more platform coupling (AWS for Step Functions, Kubernetes for Argo).
Redis with TTL plus a polling worker: store each pending approval as a key with a 24h TTL; reviewers read it (GET) and write accept/reject back (SET), and the agent executor polls for the decision.

Decision

Adopt Redis with 24h TTL for v1.

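```python
# api/approvals.py
import json
from datetime import datetime, timezone
import redis

r = redis.Redis()  # connection settings omitted
APPROVAL_TTL_SECONDS = 86400  # 24h; an expired key counts as auto-rejected

def submit(tenant_id: str, request_id: str, action: dict, context: dict) -> str:
    key = f"approval:{tenant_id}:{request_id}"
    r.setex(key, APPROVAL_TTL_SECONDS, json.dumps({
        "status": "pending",
        "agent_action": action,
        "context": context,
        "submitted_at": datetime.now(timezone.utc).isoformat(),
    }))
    return key
# reviewer endpoint mutates status; agent_executor checks status
```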
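
The reviewer/agent split in the snippet's last comment amounts to a small state machine. The sketch below is a minimal illustration, not the shipped code: InMemoryApprovalStore is a hypothetical stand-in for Redis with an injectable clock so the logic runs without a server, and decide()/check() are hypothetical method names. It encodes three rules from this ADR: keys are scoped per tenant, the first reviewer decision wins, and an expired key reads as an automatic rejection.

```python
import json
from typing import Callable, Optional


class InMemoryApprovalStore:
    """Hypothetical stand-in for the Redis keyspace: key -> (json_value, expires_at)."""

    def __init__(self, clock: Callable[[], float], ttl_seconds: int = 86400):
        self.clock = clock  # injectable so tests can fast-forward past the TTL
        self.ttl_seconds = ttl_seconds
        self._data: dict[str, tuple[str, float]] = {}

    @staticmethod
    def key(tenant_id: str, request_id: str) -> str:
        # Same per-tenant key scheme as api/approvals.py
        return f"approval:{tenant_id}:{request_id}"

    def _get(self, key: str) -> Optional[dict]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expire on access; Redis also expires keys in the background
            return None
        return json.loads(value)

    def submit(self, tenant_id: str, request_id: str, action: dict) -> None:
        # Equivalent of SETEX: write the pending record with a fixed expiry.
        record = {"status": "pending", "agent_action": action}
        expires_at = self.clock() + self.ttl_seconds
        self._data[self.key(tenant_id, request_id)] = (json.dumps(record), expires_at)

    def decide(self, tenant_id: str, request_id: str, decision: str) -> bool:
        """Reviewer endpoint. Returns False if already decided or expired."""
        if decision not in ("accepted", "rejected"):
            raise ValueError(f"unknown decision: {decision}")
        key = self.key(tenant_id, request_id)
        record = self._get(key)
        if record is None or record["status"] != "pending":
            return False  # first reviewer wins; an expired key cannot be revived
        record["status"] = decision
        expires_at = self._data[key][1]
        # Keep the original expiry, as SET ... KEEPTTL would in Redis.
        self._data[key] = (json.dumps(record), expires_at)
        return True

    def check(self, tenant_id: str, request_id: str) -> str:
        """Agent executor. A missing or expired key reads as an auto-reject."""
        record = self._get(self.key(tenant_id, request_id))
        return "rejected" if record is None else record["status"]
```

In this sketch a recorded decision still lapses at the original expiry, so the agent executor has to read it before then. Against real Redis, the read-check-write in decide() has to be atomic (WATCH/MULTI or a Lua script) to enforce first-reviewer-wins under concurrent reviewers, and the write should use SET with KEEPTTL (Redis 6.0+), because a plain SET discards the key's TTL.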

This decision is Accepted, with known replacement conditions (see §Reversal plan). We expect Temporal to become the right answer once any of these triggers fires:

Approval workflows exceed 24h (e.g. weekend approvals, multi-step reviewer chains)
Workflows need compensating actions on rejection (rollback DB changes, revoke tokens, notify downstream systems)
Approvals need fan-out (multiple required approvers, quorum logic)
Cross-region durability becomes a requirement

Tradeoffs we accept

Lever	Temporal	Redis + TTL (chosen)
Day-1 cost	Temporal cluster (server + persistence DB) + UI	Zero new infra (reuses the ADR-003 Redis instance)
Durability across long timeouts	Native (months OK)	Capped at the key TTL (24h default, per-tenant override)
Compensating actions on reject	First-class workflow primitive	Application code
Multi-step approval chains	First-class	Application code (clunky)
Audit trail	Workflow event history	Application audit log + expiry logging (e.g. Redis keyspace notifications)
Visibility (UI for in-flight workflows)	Built-in Temporal UI	Build it ourselves
Operator skill required	Workflow programming model	Redis, already familiar to the team
At-most-once vs at-least-once delivery	Configurable	Application-level semantics

Consequences (positive)

v1 ships in <100 lines of FastAPI + Redis code.
The 24h TTL is itself a feature: stale approvals auto-reject, so the "ghost approval" problem (a request blocked indefinitely because its reviewer is on vacation) cannot happen.
Audit trail uses the same logger as everything else.
Reviewers and agents see the same Redis state — no cross-system consistency questions.

Consequences (negative)

The 24h cap is real. An approval submitted Friday evening auto-rejects before Monday morning. Mitigation: the TTL is configurable per tenant; some tenants get 72h.
No compensating actions. If a rejected agent action already pre-fetched data, those side effects are not rolled back automatically. Mitigation: agent actions in v1 are read-only; mutating actions are out of scope until ADR-???-future.
No multi-approver workflows: the first reviewer wins. Mitigation: acceptable for v1, because mutation-class actions don't ship.
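No visibility UI. Operators inspect state with redis-cli plus the FastAPI health endpoint (use redis-cli --scan --pattern 'approval:*' rather than KEYS, which blocks the Redis instance shared with the policy store). Mitigation: the Module 03 Grafana panel shows pending-approval count and an age histogram per tenant.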
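
The per-tenant TTL override and the pending-age panel both reduce to small pure functions. A sketch under stated assumptions: TENANT_TTL_OVERRIDES, ttl_for(), expires_at(), age_histogram() and the bucket edges are hypothetical, and pending records are assumed to carry the ISO-8601 submitted_at value written at submit time. The expiry arithmetic is the weekend case above: an approval submitted Friday 18:00 UTC expires Saturday 18:00 under the 24h default, and Monday 18:00 under a 72h override.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_TTL_SECONDS = 24 * 3600
# Hypothetical override table; the ADR only says some tenants get 72h.
TENANT_TTL_OVERRIDES = {"tenant-weekend": 72 * 3600}


def ttl_for(tenant_id: str) -> int:
    """Seconds to pass to SETEX when this tenant submits an approval."""
    return TENANT_TTL_OVERRIDES.get(tenant_id, DEFAULT_TTL_SECONDS)


def expires_at(tenant_id: str, submitted_at: datetime) -> datetime:
    return submitted_at + timedelta(seconds=ttl_for(tenant_id))


# Illustrative bucket upper bounds, in hours, for the pending-age panel.
BUCKET_EDGES_HOURS = (1, 4, 12, 24, 72)


def age_histogram(
    pending: list[tuple[str, str]], now: Optional[datetime] = None
) -> dict[str, Counter]:
    """Map (tenant_id, submitted_at ISO-8601) pairs to per-tenant age-bucket counts."""
    now = now or datetime.now(timezone.utc)
    result: dict[str, Counter] = {}
    for tenant_id, submitted_iso in pending:
        age_h = (now - datetime.fromisoformat(submitted_iso)).total_seconds() / 3600
        bucket = next(
            (f"<={edge}h" for edge in BUCKET_EDGES_HOURS if age_h <= edge),
            f">{BUCKET_EDGES_HOURS[-1]}h",
        )
        result.setdefault(tenant_id, Counter())[bucket] += 1
    return result
```

The per-tenant pending count for the panel is then sum(counts.values()) over each tenant's Counter.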

Reversal plan

The replacement is well-scoped because the application contract is small:

Implement temporal_approvals.py with submit(), accept(), reject(), and wait_for_decision() matching the current Redis interface.
Stand up a Temporal cluster (managed Temporal Cloud is the path of least resistance, at ~$0.30/k actions).
Switch the approval worker over behind a feature flag.
Migrate in-flight Redis approvals: walk approval:* keys, replay into Temporal as workflow signals.
Cut over after a 1-week soak period.
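
The migration step has one subtle point: each in-flight approval should carry its remaining TTL into Temporal rather than a fresh 24h window, or a migrated approval outlives the deadline it was submitted under. A minimal sketch of the planning half, assuming the key layout from api/approvals.py and (key, value, TTL) tuples already read with SCAN, GET and TTL; MigrationItem, parse_key(), plan_migration() and the workflow-id scheme are hypothetical, and the Temporal client calls are deliberately left out.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class MigrationItem:
    """Hypothetical hand-off record for the Temporal side of the cutover."""
    workflow_id: str
    tenant_id: str
    request_id: str
    status: str  # decided approvals would be replayed as a decision signal (assumption)
    remaining_ttl_seconds: int


def parse_key(key: str) -> Optional[tuple[str, str]]:
    # Layout from api/approvals.py: approval:{tenant_id}:{request_id}.
    # Assumes tenant_id contains no ':'; request_id may.
    prefix, sep, rest = key.partition(":")
    tenant_id, sep2, request_id = rest.partition(":")
    if prefix != "approval" or not (sep and sep2 and tenant_id and request_id):
        return None
    return tenant_id, request_id


def plan_migration(entries: Iterable[tuple[str, str, int]]) -> list[MigrationItem]:
    """entries: (key, json_value, ttl_seconds) as read with SCAN + GET + TTL."""
    plan = []
    for key, raw, ttl in entries:
        parsed = parse_key(key)
        if parsed is None:
            continue  # another keyspace on the shared instance (e.g. ADR-003 policies)
        if ttl <= 0:
            continue  # TTL -2: already gone; -1: no expiry (should not occur); 0: lapsing
        tenant_id, request_id = parsed
        plan.append(MigrationItem(
            workflow_id=f"approval-{tenant_id}-{request_id}",  # hypothetical id scheme
            tenant_id=tenant_id,
            request_id=request_id,
            status=json.loads(raw)["status"],
            remaining_ttl_seconds=ttl,  # carry over what is left, not a fresh 24h
        ))
    return plan
```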

Estimated effort: 3-4 engineer-weeks including Temporal infra setup. Reversible — both engines can run in parallel during the soak.
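
Running both engines side by side works because the application depends only on the four-method contract listed in the reversal plan. A sketch of that seam, with assumptions flagged: the method signatures, ApprovalBackend, make_router() and the per-tenant flag callable are all hypothetical (the ADR only says a feature flag switches the worker).

```python
from typing import Callable, Optional, Protocol


class ApprovalBackend(Protocol):
    """The four-method contract from the reversal plan; signatures are assumptions."""

    def submit(self, tenant_id: str, request_id: str, action: dict) -> None: ...
    def accept(self, tenant_id: str, request_id: str) -> bool: ...
    def reject(self, tenant_id: str, request_id: str) -> bool: ...
    def wait_for_decision(
        self, tenant_id: str, request_id: str, timeout_s: float
    ) -> Optional[str]: ...


def make_router(
    use_temporal: Callable[[str], bool],  # hypothetical per-tenant feature flag
    redis_backend: ApprovalBackend,
    temporal_backend: ApprovalBackend,
) -> Callable[[str], ApprovalBackend]:
    """Pick a backend per tenant from the flag; flipping it back is the rollback."""

    def backend_for(tenant_id: str) -> ApprovalBackend:
        return temporal_backend if use_temporal(tenant_id) else redis_backend

    return backend_for
```

Rolling back means turning use_temporal off. Approvals submitted to Temporal while the flag was on would not exist in Redis after a rollback; this sketch does not cover that reverse path.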

References

apps/web/public/downloads/enterprise-ai-platform-starter.zip!/api/approvals.py
apps/web/public/downloads/enterprise-ai-platform-starter.zip!/governance/agent_executor.py
ADR-003 (Redis policy store — uses the same Redis instance, separate keyspace)