Build the
full-stack RAG platform
— ingest, retrieve, serve, eval, harden
Ship a production RAG platform with hybrid retrieval (pgvector + BM25 + RRF + cross-encoder rerank), a 4-class query router with confidence threshold, a 3-level failure cascade (RAG → LLM-only → cached), per-tenant index isolation, eval gates that block bad deploys, cost guardrails with auto-downgrade, and 5 committed ADRs. Modules 01-03 unlock with PRO; the platform unlocks with EXPERT.
The full-stack-AI system-design portfolio piece for staff AI roles — 5 committed ADRs (one Deprecated documenting a real cross-tenant leak reversal), a working hybrid-retrieval pipeline with eval gates, and a cost model that defends the cascade vs Sonnet-only baseline.
- Document ingestion DAG (Airflow + dbt + content-hash dedup + PII redaction) feeding a per-tenant pgvector index
- Hybrid retrieval — semantic + BM25 + reciprocal rank fusion + cross-encoder rerank with cited context
- 4-class query router (factual / analytical / open / ambiguous) with confidence threshold + ambiguous fallback
- FastAPI gateway with token-by-token SSE streaming + 3-level failure cascade + circuit breaker + per-tenant isolation
- Eval pipeline (offline + online + faithfulness + safety) wired to a release gate that blocks bad deploys
- Cost guardrail with model-cascade auto-downgrade + 6-mode incident simulator + 5 ADRs (one Deprecated)
Modules 01-03 unlock with PRO. The full platform with EXPERT.
Modules 01-03 (~9h) ship a complete RAG system — ingestion DAG with system contract, hybrid retrieval with rerank, 4-class query router with structured prompts and grounding score. Included with PRO. Modules 04-06 (~11h additional) layer on production serving (failure cascade, multi-tenant), eval + release gates, and reliability/cost/capstone. Unlock with EXPERT.
Foundation. Production. Capstone.
Each phase ends with a tagged release, a passing eval suite, and a passing failure-injection drill. No ambiguity about where you are.
Working RAG system running locally. SystemContract + ingestion DAG + hybrid retrieval + query router + LLM orchestration with grounding score, end-to-end on the 150-doc / 2k-chunk corpus.
Production serving. Per-tenant isolation, SSE streaming, 3-level failure cascade, circuit breaker, frontend React hook consuming the streaming endpoint.
Eval gates + reliability + cost + design narrative. Release gate blocks bad deploys, cost guardrail auto-downgrades on budget breach, incident simulator runs 6 failure modes, capstone doc ready for staff interview.
One command. Local FastAPI + Postgres + pgvector + Redis. No API key.
What lives in the repo
You get the real platform on day one — FastAPI gateway, Postgres + pgvector for the vector index, Redis for sessions / cache / cost tracker, sentence-transformers for embeddings, cross-encoder for rerank. The local-core demo runs without API keys via a MockLLMJudge; swap to Anthropic via requirements-llm.txt.
- system_contract.py — freshness / latency / correctness / coverage SLAs (per ADR-001)
- nexus/ingestion.py + dags/ — DocumentIngester + Airflow DAG + dbt staging models
- embeddings/ + retrieval/ — pgvector schema + hybrid + RRF + cross-encoder rerank
- routing/ + orchestration/ + prompts/ — QueryRouter + RAGOrchestrator + prompt_registry + tool framework
- serving/ + frontend/ — FastAPI gateway + FailureRouter + CircuitBreaker + SSE + React hook
- evaluation/ + observability/ — offline + online eval + ReleaseGate + TraceRecorder
- docs/adr/ + docs/cost-model/ — 5 committed ADRs (one Deprecated) + the runnable cost-model CSV
Full-stack AI Platform Starter Kit
Pre-built RAG platform with seeded 3-tenant Postgres + pgvector, 150 documents / ~2,268 chunks, Redis cache + cost tracker, FastAPI gateway with SSE streaming, 5 pytest gates. Now bundled: 5 ADR markdown files (docs/adr/) and the runnable cost-model CSV (docs/cost-model/) — unzip and read them straight from the repo.
The same RAG demo — but built for the multi-tenant case.
Most RAG tutorials show you a notebook with a single embedding call and a single LLM call. This shows what changes when 3 tenants share infrastructure, the retriever fails 0.4% of the time, eval gates block bad deploys, and the cost model has to defend itself to a CFO.
chunks_* tables · TenantAwareRetriever (post-ADR-005 reversal)pgvector + BM25 + RRF + cross-encoder rerank (ADR-002)FailureRouter 3-level cascade + CircuitBreaker (ADR-004)ReleaseGate blocks bad mergescost_guardrail.py; CSV in docs/cost-model/IncidentSimulator + design doc reviewWrite the ADRs staff engineers actually get judged on.
Five ADRs ship inside the starter-kit zip at docs/adr/, one per major decision in the build, including a real Deprecated ADR documenting the v0 row-filter multi-tenant design that was reverted to per-tenant index isolation after a real cross-tenant chunk leak. The kind of doc that travels with you to your next role. Preview ADR-001 →
SystemContract as the platform's north star (declared upfront)
SystemContract dataclass with freshness / latency / correctness / coverage floors; CI gates against it; M05 ReleaseGate reads from it directlypgvector + HNSW + hybrid RRF + rerank, not a dedicated vector DB
chunks(vector(384) + tsv) with HNSW index + Postgres FTS; semantic + BM25 → reciprocal rank fusion → top-20 → cross-encoder rerank → top-54-class query router with confidence threshold + ambiguous fallback
QueryRouter classifies into FACTUAL / ANALYTICAL / OPEN_ENDED / AMBIGUOUS; confidence < 0.6 → AMBIGUOUS clarification pathThree-level failure cascade: RAG → LLM-only → cached → error
FailureRouter with explicit is_degraded + disclaimer per response; CircuitBreaker per component skips broken paths for ~30sMulti-tenant retrieval via row-filter on shared index (v0)
chunks_<tenant> tables with TenantAwareRetriever reading from the right table by configRead the FinOps story, not just the latency one.
Module 06 ships a runnable cost-model CSV inside the starter-kit zip at docs/cost-model/. 3 tenants × 10k queries/mo load, real Anthropic + OpenAI + AWS list prices, with model-cascade and reserved-instance levers wired up. The version you’ll defend to a CFO. Preview the CSV →
Optimization levers
Async architecture review with a staff-level reviewer (cohort beta).
Submit your repo, your ADR draft, or your release-gate config. A staff or principal-level reviewer who has shipped this exact stack responds within 7 days with line-by-line comments. Cohort capped at 12 members.
Bring a diff, an ADR draft, or a release-gate config.
The cohort beta runs as async architecture review — pick a reviewer by topic, send the artifact, get inline comments + a Loom walkthrough back. No back-and-forth scheduling. No 30-minute slot pressure.
PRO unlocks Modules 01-03. EXPERT unlocks the full platform.
PRO is the entry point — Modules 01-03 (a working RAG system) plus the rest of the PRO catalog. EXPERT unlocks Modules 04-06 of this build, the 5 ADRs, the cost-model CSV, and the cohort-beta async review.
Pick this if you own the release gate, not just a feature.
Staff / principal engineers · AI platform
You own the release gate, the failure cascade, and the answer to 'why are we shipping this?' that your VP asks before launch.
Engineering managers · AI
You need a reference architecture for the RAG platform your CTO will ask about before the AI team gets headcount or a budget for production deployment.
Platform / infra leads
You absorb RAG without absorbing 4 new vendors. Postgres, Redis, Prometheus, Slack — tools you already operate. This is the playbook.
Founding engineers · AI startups
Your investors will ask 'how do you know your model is getting better?' before they ask about scale. The 5 ADRs + ReleaseGate + cost model is the answer.
Going deeper? Four tracks back this project.
The RAG curriculum is the foundation. These four tracks let you go deeper on eval, agents, production ops, and the platform-design discipline you'll need at staff level.
Quick answers.
Paired with this project
EXPERT-tier inference build: vLLM continuous batching + PagedAttention, Ray Serve autoscale (market-hours min=2), Redis semantic cache (35% hit), ServingCircuitBreaker, 5 chaos scenarios + runbook, runnable cost-model CSV with break-even-vs-OpenAI math. Module 01 with PRO.
EXPERT-tier retrieval build: pgvector + HNSW, BM25 + RRF, cross-encoder reranker, OpenAI function-calling agent, semantic cache, drift detection, multi-region replication code. Modules 01-02 with PRO; full platform with EXPERT.
Ready to ship the system that retrieves, generates, and evaluates?
Start with PRO ($29/mo) for Modules 01-03 — the working RAG system. Or unlock the full 6-module platform plus 5 ADRs, the cost-model CSV, and cohort-beta architecture review with EXPERT ($79/mo).