Approval queue is Redis with 24h TTL, not Temporal workflow

✓ Accepted · Enterprise AI Platform · 02: Lineage, Policy, Agent Approvals
By AI-DE Engineering Team · Stakeholders: platform engineer, agent platform reviewer, on-call

Context

When a policy engine returns require_approval, the request is parked until a human reviewer accepts or rejects it. We need durability (restarts must not lose pending approvals), a clear TTL (stale approvals must auto-reject), and per-tenant scoping. The classic options:

  1. Temporal workflow — a durable workflow per pending approval; timeouts, signals, and retries are first-class.
  2. AWS Step Functions / Argo Workflows — similar shape, more vendor coupling.
  3. Redis with TTL + a polling worker — store pending approvals as keys with a 24h TTL; reviewers GET + SET accept/reject.

Decision

Adopt Redis with 24h TTL for v1.

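# api/approvals.py
import json
import time

import redis

r = redis.Redis()  # connection settings omitted
APPROVAL_TTL_SECONDS = 86400  # 24h

def submit(tenant_id, request_id, action, context):
    key = f"approval:{tenant_id}:{request_id}"  # per-tenant keyspace
    r.setex(key, APPROVAL_TTL_SECONDS, json.dumps({
        "status": "pending",
        "agent_action": action,
        "context": context,
        "submitted_at": time.time(),  # epoch seconds (assumed format)
    }))
# reviewer endpoint mutates status; agent_executor checks status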
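
The reviewer and agent halves mentioned in the comment above are not shown in this ADR. The following is a minimal sketch of how they could look, under two assumptions: the reviewer write keeps the original expiry (redis-py's `set(..., keepttl=True)`), and the agent reads a missing key as an expired, auto-rejected approval. `decide`, `current_status`, `approval_key`, and the `InMemoryRedis` stand-in are hypothetical names used for illustration; a real deployment would pass a `redis.Redis` client instead of the stand-in.

```python
import json
import time


class InMemoryRedis:
    """Tiny stand-in for the few redis-py calls used here (setex, get, set)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        value, expires_at = self._data.get(key, (None, None))
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # expire lazily on access
            return None
        return value

    def set(self, key, value, keepttl=False):
        _, expires_at = self._data.get(key, (None, None))
        self._data[key] = (value, expires_at if keepttl else None)


def approval_key(tenant_id, request_id):
    return f"approval:{tenant_id}:{request_id}"


def decide(client, tenant_id, request_id, decision):
    """Reviewer side: record a decision without resetting the TTL."""
    if decision not in ("accepted", "rejected"):
        raise ValueError(f"unknown decision: {decision!r}")
    key = approval_key(tenant_id, request_id)
    raw = client.get(key)
    if raw is None:
        return False  # expired or never submitted; nothing to decide
    record = json.loads(raw)
    if record["status"] != "pending":
        return False  # already decided: first reviewer wins
    record["status"] = decision
    client.set(key, json.dumps(record), keepttl=True)  # keep the original expiry
    return True


def current_status(client, tenant_id, request_id):
    """Agent side: a missing key is treated as an auto-rejected approval."""
    raw = client.get(approval_key(tenant_id, request_id))
    return "rejected" if raw is None else json.loads(raw)["status"]
```

Two caveats on this sketch. The read-then-write in `decide` is not atomic, so two reviewers racing on the same key would need a WATCH/MULTI transaction or a Lua script in a real implementation. A plain `SET` without `keepttl` would also clear the expiry and reintroduce unbounded pending entries, which is why the flag matters here.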

This decision is Accepted with known replacement conditions (see Reversal plan). The expectation is that Temporal becomes the right answer once any of these triggers fires:

  • Approval workflows exceed 24h (e.g. weekend approvals, multi-step reviewer chains)
  • Workflows need compensating actions on rejection (rollback DB changes, revoke tokens, notify downstream systems)
  • Approvals need fan-out (multiple required approvers, quorum logic)
  • Cross-region durability becomes a requirement

Tradeoffs we accept

| Lever | Temporal | Redis + TTL (chosen) |
| --- | --- | --- |
| Day-1 cost | Temporal cluster + UI + RPC | Zero new infra |
| Durability across long timeouts | Native (months OK) | 24h hard cap |
| Compensating actions on reject | First-class workflow primitive | Application code |
| Multi-step approval chains | First-class | Application code (clunky) |
| Audit trail | Workflow event history | Application audit log + Redis TTL log |
| Visibility (UI for in-flight workflows) | Built-in Temporal UI | Build it ourselves |
| Operator skill required | Workflow programming model | Redis already known by every engineer |
| At-most-once vs at-least-once delivery | Configurable | Application-level semantics |

Consequences (positive)

  • v1 ships in <100 lines of FastAPI + Redis code.
  • The 24h TTL is itself a feature — stale approvals auto-reject, so the "ghost approvals" problem (a reviewer who left for vacation blocking a request indefinitely) cannot happen.
  • Audit trail uses the same logger as everything else.
  • Reviewers and agents see the same Redis state — no cross-system consistency questions.

Consequences (negative)

  • 24h hard cap is real. With the default TTL, an approval submitted Friday evening auto-rejects on Saturday evening, well before any reviewer is back on Monday morning. Mitigation: the TTL is configurable per tenant; some tenants get 72h.
  • No compensating actions. A rejected agent action that already pre-fetched data does not roll back the side effects automatically. Mitigation: agent actions in v1 are read-only; mutating actions are out of scope until ADR-???-future.
  • No multi-approver workflows. First reviewer wins. Mitigation: acceptable at v1 — mutation-class actions don't ship.
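  • No visibility UI. Operators inspect via redis-cli plus the FastAPI health endpoint. Because the instance is shared with the policy store, prefer SCAN (redis-cli --scan --pattern 'approval:*') over KEYS, which blocks the server while it walks the whole keyspace. Mitigation: Module 03 Grafana panel shows pending-approval count + age histogram per tenant.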
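
The weekend arithmetic behind the first bullet above, and the per-tenant override that mitigates it, can be sketched as follows. The override table, the tenant ids, and the helper names (`ttl_for`, `expires_at`) are hypothetical; this ADR does not say where the overrides are stored.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL = timedelta(hours=24)
# Hypothetical override table; real storage and tenant ids are not specified here.
TENANT_TTL_OVERRIDES = {"tenant-with-weekend-reviews": timedelta(hours=72)}


def ttl_for(tenant_id: str) -> timedelta:
    """Per-tenant TTL, falling back to the 24h default."""
    return TENANT_TTL_OVERRIDES.get(tenant_id, DEFAULT_TTL)


def ttl_seconds_for(tenant_id: str) -> int:
    """Value to pass as the expiry argument of SETEX."""
    return int(ttl_for(tenant_id).total_seconds())


def expires_at(tenant_id: str, submitted_at: datetime) -> datetime:
    return submitted_at + ttl_for(tenant_id)


friday_evening = datetime(2024, 1, 5, 18, 0, tzinfo=timezone.utc)  # a Friday
monday_morning = datetime(2024, 1, 8, 9, 0, tzinfo=timezone.utc)

# Default tenant: expires Saturday 18:00, so it is gone by Monday 09:00.
print(expires_at("some-tenant", friday_evening) < monday_morning)  # True
# 72h tenant: expires Monday 18:00, so it is still pending on Monday 09:00.
print(expires_at("tenant-with-weekend-reviews", friday_evening) > monday_morning)  # True
```

A 72h override covers a Friday-evening submission through Monday 18:00. A long weekend would still exceed it, which is one of the triggers listed under Decision.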

Reversal plan

The replacement is well-scoped because the application contract is small:

  1. Implement temporal_approvals.py with submit(), accept(), reject(), and wait_for_decision() matching the current Redis interface.
  2. Stand up Temporal cluster (managed Temporal Cloud is the path of least resistance — ~$0.30/k actions).
  3. Switch the approval worker via feature flag.
  4. Migrate in-flight Redis approvals: walk approval:* keys, replay into Temporal as workflow signals.
  5. Cut over after a 1-week soak period.
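
Step 4 needs to decide which Redis entries are still worth carrying over. A minimal sketch of that filter is below, assuming the caller has already collected `(key, raw_json, ttl_seconds)` tuples (for example with redis-py's `scan_iter(match="approval:*")`, `get`, and `ttl`). `PendingApproval` and `pending_for_migration` are hypothetical names, and the sketch assumes tenant ids contain no `:`.

```python
import json
from typing import Iterable, Iterator, NamedTuple, Tuple


class PendingApproval(NamedTuple):
    tenant_id: str
    request_id: str
    record: dict
    remaining_seconds: int  # candidate timeout for the replacement workflow


def pending_for_migration(
    entries: Iterable[Tuple[str, str, int]],
) -> Iterator[PendingApproval]:
    """Yield only approvals that are still pending and still have time left."""
    for key, raw, ttl_seconds in entries:
        parts = key.split(":", 2)  # approval:{tenant_id}:{request_id}
        if len(parts) != 3 or parts[0] != "approval":
            continue  # other keyspaces share this Redis instance
        if ttl_seconds <= 0:
            # Redis TTL returns -2 for a vanished key and -1 for no expiry;
            # neither is expected here, so they are left for manual review.
            continue
        record = json.loads(raw)
        if record.get("status") != "pending":
            continue  # already decided; nothing left to wait for
        yield PendingApproval(parts[1], parts[2], record, ttl_seconds)
```

Carrying the remaining TTL forward keeps the auto-reject deadline unchanged across the cutover, so an approval does not gain extra lifetime by being migrated.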

Estimated effort: 3-4 engineer-weeks including Temporal infra setup. Reversible — both engines can run in parallel during the soak.
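
The small application contract is what makes the parallel run feasible. A minimal sketch of that contract and the feature-flag switch from steps 1 and 3 follows. The four method names come from step 1; the argument lists, the `ApprovalEngine` and `select_engine` names, and the in-memory stand-in are assumptions for illustration, not the shipped interface.

```python
from typing import Dict, Protocol, Tuple


class ApprovalEngine(Protocol):
    """The four calls named in step 1; argument lists are assumed."""

    def submit(self, tenant_id: str, request_id: str, action: dict, context: dict) -> None: ...
    def accept(self, tenant_id: str, request_id: str) -> None: ...
    def reject(self, tenant_id: str, request_id: str) -> None: ...
    def wait_for_decision(self, tenant_id: str, request_id: str) -> str: ...


class InMemoryEngine:
    """Stand-in backend used only to exercise the interface."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._status: Dict[Tuple[str, str], str] = {}

    def submit(self, tenant_id, request_id, action, context):
        self._status[(tenant_id, request_id)] = "pending"

    def accept(self, tenant_id, request_id):
        self._decide(tenant_id, request_id, "accepted")

    def reject(self, tenant_id, request_id):
        self._decide(tenant_id, request_id, "rejected")

    def wait_for_decision(self, tenant_id, request_id):
        # A real engine would block or poll; this returns the current state.
        return self._status.get((tenant_id, request_id), "rejected")

    def _decide(self, tenant_id, request_id, decision):
        if self._status.get((tenant_id, request_id)) == "pending":
            self._status[(tenant_id, request_id)] = decision  # first reviewer wins


def select_engine(use_temporal: bool, redis_engine: ApprovalEngine,
                  temporal_engine: ApprovalEngine) -> ApprovalEngine:
    """Feature-flag switch: callers never learn which backend they got."""
    return temporal_engine if use_temporal else redis_engine
```

Because callers depend only on the protocol, flipping the flag back during the soak is a configuration change with no code rollback.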

References

  • apps/web/public/downloads/enterprise-ai-platform-starter.zip!/api/approvals.py
  • apps/web/public/downloads/enterprise-ai-platform-starter.zip!/governance/agent_executor.py
  • ADR-003 (Redis policy store — uses the same Redis instance, separate keyspace)