Build the
agent execution layer
of modern data platforms
Ship a production agentic pipeline with a LangGraph supervisor + 4 worker agents (Ingestion / Quality / Transform / Loading), Redis checkpointing for time-travel replay, HITL via interrupt_before with Slack actionable approval, ToolCallGuard budget enforcement, RestrictedPython sandbox, and 5 committed ADRs. Modules 01-03 unlock with PRO; the platform unlocks with EXPERT.
The agent-platform system-design portfolio piece for staff AI roles — 5 committed ADRs (one Deprecated documenting a real RBAC reversal), a working LangGraph supervisor-worker pipeline, and a cost model that defends the cascade vs Sonnet-only baseline.
- LangGraph supervisor + 4 worker agents with conditional-edge routing and Redis checkpointing
- RBAC-aware ToolRegistry with per-agent scoped views (post-ADR-005 reversal)
- Production observability: Prometheus + OpenTelemetry + LangSmith + per-agent cost attribution
- HITL approval flow via interrupt_before + Slack actionable buttons (Approve / Deny / Escalate)
- FailureDetector + ToolCallGuard + ContextWindowManager + 24h TimeTravel checkpoint replay
- 5 ADRs (one Deprecated) + cost-model CSV + RestrictedPython sandbox + GitHub PR bot
Modules 01-03 unlock with PRO. The full platform with EXPERT.
Modules 01-03 (~10h) ship a complete working multi-agent pipeline — foundation, 4 production tools, and a LangGraph supervisor + worker system with Redis checkpointing. Included with PRO. Modules 04-06 (~7.5h additional) layer on production observability, hardening (HITL + failure detection + time-travel), and a multi-tenant platform-design capstone. Unlock with EXPERT.
Foundation. Production. Platform.
Each phase ends with a tagged release, a passing integration test, and a passing red-team failure-injection run. No ambiguity about where you are.
Working multi-agent pipeline running locally. LangGraph supervisor + 4 worker agents + 4 tools + Redis checkpointing, end-to-end execution on the seeded sample data.
- ✓RBAC-aware ToolRegistry with 4 tools (post-ADR-005 reversal)
- ✓LangGraph supervisor + workers with conditional-edge routing
- ✓RedisSaver checkpointing + recovery on restart
Observable, deployable, costable. Prometheus + OTel + LangSmith tracing, per-agent cost attribution, FastAPI service, Docker + Kubernetes manifests, agent eval framework.
- ✓Prometheus + OpenTelemetry + LangSmith integration
- ✓Per-agent + per-tool cost attribution
- ✓Docker + Kubernetes deploy + agent eval suite
Hardening + multi-tenant platform design. HITL approval flow, failure detection + recovery, time-travel replay, multi-tenant TenantContext, RestrictedPython sandbox, capstone design.
- ✓HITL via interrupt_before + Slack actionable buttons (ADR-004)
- ✓FailureDetector + ToolCallGuard + TimeTravel replay
- ✓Multi-tenant + RestrictedPython sandbox + GitHub PR bot
One command. Local LangGraph + Postgres + Redis + LangSmith.
What lives in the repo
You get the real agent platform on day one — LangGraph supervisor + 4 worker agents, RBAC-aware ToolRegistry with 4 tools, RedisSaver checkpointing, FastAPI gateway, Prometheus + OpenTelemetry + LangSmith instrumentation, Docker + Kubernetes manifests, plus the M05 hardening (HITL / FailureDetector / TimeTravel) and M06 platform features.
- src/agents/ — supervisor + 4 workers + HITL + FailureDetector + ContextManager
- src/tools/ — 4 production tools + RBAC-aware registry (post-ADR-005)
- src/orchestration/ + src/memory/ — LangGraph StateGraph + RedisSaver + semantic memory
- src/observability/ — Prometheus + OTel + LangSmith + cost-tracking + Slack alerts
- src/api/ + src/scaling/ + Dockerfile + k8s/ — FastAPI + task queue + container + Kubernetes manifests
- docs/adr/ + docs/cost-model/ — 5 committed ADRs (one Deprecated) + the runnable cost-model CSV
Agentic Data Pipeline Starter Kit
Pre-built LangGraph supervisor + 4 worker agents + RBAC tool registry + Redis checkpointing + FastAPI + Docker + Kubernetes. Now bundled: 5 ADR markdown files (docs/adr/) and the runnable cost-model CSV (docs/cost-model/) — unzip and read them straight from the repo.
The same agent demo — but built for the auditable case.
Most agent tutorials show you a single LLM call in a loop. This shows what changes when the supervisor decides for 4 workers, every tool call hits an audit log, the cost model is defensible to a CFO, and a compliance reviewer asks which agent decided that.
LangGraph StateGraph + supervisor-worker conditional edges (ADR-001 + ADR-003)ScopedToolView per agent role (RBAC at registry, post-ADR-005 reversal)FailureDetector + ToolCallGuard + budget cap; cascade detection in M05interrupt_before + Slack approve / deny / escalate (ADR-004)docs/cost-model/TimeTravel reads any checkpoint in last 24h via RedisSaver (ADR-002)Write the ADRs staff engineers actually get judged on.
Five ADRs ship inside the starter-kit zip at docs/adr/, one per major decision in the build, including a real Deprecated ADR documenting the v0 ToolRegistry → RBAC reversal after a real Quality-agent overreach incident. The kind of doc that travels with you to your next role. Preview ADR-001 →
LangGraph chosen over CrewAI / AutoGen / custom orchestrator
WorkerAgent) — orchestrator-replaceableRedis for orchestrator checkpoints; Postgres only for business data
RedisSaver for checkpoints (24h TTL), Postgres for business dataHierarchical supervisor-worker topology, not peer-to-peer agents
add_conditional_edgesHITL via LangGraph `interrupt_before` + Slack actionable buttons
interrupt_before=[...] at compile time + Slack Approve / Deny / Escalate buttons → graph.aupdate_state resumesSingle global ToolRegistry without RBAC scoping (v0)
ScopedToolView with per-agent-role scopes (default-deny) at the registry layerRead the FinOps story, not just the latency one.
Module 04 ships a runnable cost-model CSV inside the starter-kit zip at docs/cost-model/. 5,000 agent-runs/mo load, real Anthropic + AWS RDS + ElastiCache + LangSmith list prices, with model-cascade and reserved-instance levers wired up. The version you’ll defend to a CFO. Preview the CSV →
Optimization levers
Async architecture review with a staff-level reviewer (cohort beta).
Submit your repo, your ADR draft, or your supervisor-routing prompt. A staff or principal-level reviewer who has shipped this exact stack responds within 7 days with line-by-line comments. Cohort capped at 12 members.
Bring a diff, an ADR draft, or an HITL flow.
The cohort beta runs as async architecture review — pick a reviewer by topic, send the artifact, get inline comments + a Loom walkthrough back. No back-and-forth scheduling. No 30-minute slot pressure.
PRO unlocks Modules 01-03. EXPERT unlocks the full platform.
PRO is the entry point — Modules 01-03 (a working multi-agent pipeline) plus the rest of the PRO catalog. EXPERT unlocks Modules 04-06 of this build, the 5 ADRs, the cost-model CSV, and the cohort-beta async review.
Pick this if you own the supervisor, not just a feature.
Staff / principal engineers · agent platform
You own the supervisor prompt, the RBAC boundary, and the answer to 'which agent decided this?' that your VP asks at the next incident review.
Engineering managers · AI
You need a reference architecture for the agent platform your CTO will ask about before the AI team gets headcount or a budget for production deployment.
Platform / infra leads
You absorb LangGraph without absorbing 4 new vendors. Postgres, Redis, Prometheus, Slack — tools you already operate. This is the playbook.
Founding engineers · AI startups
Your investors will ask 'how do you know agents are safe to ship?' before they ask about scale. The 5 ADRs + HITL gate + RBAC registry is the answer.
Going deeper? Four tracks back this project.
The Agentic Workflows curriculum is the foundation. These four tracks let you go deeper on agent eval, retrieval, production ops, and the platform-design discipline you'll need at staff level.
Quick answers.
Ready to ship the system that runs agents in production?
Start with PRO ($29/mo) for Modules 01-03 — the working multi-agent pipeline. Or unlock the full 6-module platform plus 5 ADRs, the cost-model CSV, and cohort-beta architecture review with EXPERT ($79/mo).