ai-de.net/Projects/P17 · Full-stack AI platform — full RAG system + production hardening

Last updated 2026-05-22 · By AI-DE Engineering Team

EXPERT-tier · PRO unlocks Modules 01-03 · AI & vectors track · P17

Build the full-stack RAG platform: ingest, retrieve, serve, eval, harden

Ship a production RAG platform with hybrid retrieval (pgvector + BM25 + RRF + cross-encoder rerank), a 4-class query router with confidence threshold, a 3-level failure cascade (RAG → LLM-only → cached), per-tenant index isolation, eval gates that block bad deploys, cost guardrails with auto-downgrade, and 5 committed ADRs. Modules 01-03 unlock with PRO; the platform unlocks with EXPERT.

Timeline

20-22 hours

Difficulty

Senior+

Stack

FastAPI · pgvector · Anthropic · Postgres · Redis · dbt

See EXPERT benefits

The full-stack-AI system-design portfolio piece for staff AI roles — 5 committed ADRs (one Deprecated documenting a real cross-tenant leak reversal), a working hybrid-retrieval pipeline with eval gates, and a cost model that defends the cascade vs Sonnet-only baseline.

By the end you will have wired

Document ingestion DAG (Airflow + dbt + content-hash dedup + PII redaction) feeding a per-tenant pgvector index
Hybrid retrieval — semantic + BM25 + reciprocal rank fusion + cross-encoder rerank with cited context
4-class query router (factual / analytical / open / ambiguous) with confidence threshold + ambiguous fallback
FastAPI gateway with token-by-token SSE streaming + 3-level failure cascade + circuit breaker + per-tenant isolation
Eval pipeline (offline + online + faithfulness + safety) wired to a release gate that blocks bad deploys
Cost guardrail with model-cascade auto-downgrade + 6-mode incident simulator + 5 ADRs (one Deprecated)

PREREQ · SENIOR+Built for engineers shipping RAG in production. Comfortable with Python services, async / asyncio, Postgres + SQL basics, and at least one of: vector search, vendor LLM APIs, or production observability. Not a “what is RAG” course.

fsap.platform · 6 modules · 3 tenants seeded · 2k chunks indexed · pgvector + Postgres + Redis · release gate ✓

Ingest (system contract, see ADR-001)
SystemContract: freshness · latency · correctness · coverage
DocumentIngester: S3 / wiki / JSON · SHA-256 dedup
Airflow DAG: ingest → normalize → chunk
dbt staging: stg_document_chunks

Retrieve (hybrid retrieval, see ADR-002)
pgvector + HNSW: 384-d · m=16 · ef_construction=64
BM25 (Postgres FTS): tsv column + GIN index
RRF fusion: k=60 · top-50 → top-20
Cross-encoder rerank: ms-marco-MiniLM-L-6-v2

Serve (failure cascade, see ADR-004)
QueryRouter (4-class): confidence ≥ 0.6 · AMBIGUOUS fallback
FastAPI gateway: /v1/query · SSE streaming
FailureRouter: RAG → LLM-only → cached → error
TenantAwareRetriever: per-tenant index · ADR-005

Operate (release gate as CI/CD, see release_gate.py)
ReleaseGate: eval thresholds · regression cap
CostGuardrail: auto-downgrade on budget breach
IncidentSimulator: 6 failure modes · playbooks
TraceRecorder: request_id · component · latency

# Model cascade: 39% cost cut

Haiku handles routing + 70% of gen calls (USD 0.80/M in)

Sonnet only on complex / high-stakes queries

Cascade saves ~USD 76/mo at 10k queries/mo

→ ~USD 0.012 per query at optimized load

# Failure cascade — never blank-error

RAG fails → LLM-only with disclaimer

LLM-only fails → cached answer (24h max age)

Cache miss → honest error with retry guidance

→ is_degraded flag on every response · M05 eval gates on rate
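The cascade above can be sketched in a few lines. This is a minimal illustration that assumes each level is a plain callable; the names `Answer` and `answer_with_cascade` and the disclaimer strings are hypothetical and are not taken from the starter kit's FailureRouter.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Answer:
    text: str
    source: str                      # "rag", "llm_only", "cache", or "error"
    is_degraded: bool
    disclaimer: Optional[str] = None


def answer_with_cascade(
    query: str,
    rag: Callable[[str], str],
    llm_only: Callable[[str], str],
    cache_lookup: Callable[[str], Optional[str]],
) -> Answer:
    """Try RAG, then LLM-only, then the cache; return an honest error last."""
    try:
        return Answer(rag(query), "rag", is_degraded=False)
    except Exception:
        pass  # level 1 failed, fall through to LLM-only

    try:
        return Answer(llm_only(query), "llm_only", True,
                      "Answered without retrieved sources.")
    except Exception:
        pass  # level 2 failed, fall through to the cache

    try:
        cached = cache_lookup(query)  # assumed to enforce the 24h max age itself
    except Exception:
        cached = None
    if cached is not None:
        return Answer(cached, "cache", True, "Cached answer, up to 24h old.")

    return Answer("", "error", True, "All paths failed. Please retry shortly.")
```

Every return path sets `is_degraded` explicitly, so a downstream eval gate can count degraded responses without parsing the answer text.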

3 tenants: isolated indexes · 2k chunks

5 ADRs: committed in starter kit

−39%: cost vs Sonnet-only baseline

Curriculum · 6 modules · 20-22 hours · 3 phases

Modules 01-03 unlock with PRO. The full platform with EXPERT.

Modules 01-03 (~9h) ship a complete RAG system — ingestion DAG with system contract, hybrid retrieval with rerank, 4-class query router with structured prompts and grounding score. Included with PRO. Modules 04-06 (~11h additional) layer on production serving (failure cascade, multi-tenant), eval + release gates, and reliability/cost/capstone. Unlock with EXPERT.

P17 · 6 modules · 20-22 hours · 60+ lessons

Free preview EXPERT required

M01

Data Foundation & Ingestion

SystemContract dataclass (freshness / latency / correctness / coverage SLAs), DocumentIngester for S3 + wiki + JSON with SHA-256 dedup, Airflow DAG orchestrating ingest → normalize → chunk, dbt staging models, PII masking + tenant isolation at the data layer.

Phase 1 · 2.5h · 8 lessons · PRO TIER

Unlock with PRO →
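As an illustration of the SHA-256 dedup step named in M01, here is a minimal sketch. It assumes documents arrive as `(tenant_id, doc_id, text)` tuples and that whitespace normalization happens before hashing; both are assumptions, and `content_hash` and `dedup` are hypothetical names rather than the DocumentIngester API.

```python
import hashlib
from typing import Iterable, List, Set, Tuple

Doc = Tuple[str, str, str]  # (tenant_id, doc_id, text)


def content_hash(text: str) -> str:
    """SHA-256 over whitespace-normalized text (the normalization is an assumption)."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedup(docs: Iterable[Doc]) -> List[Doc]:
    """Keep the first document per (tenant_id, content hash) pair."""
    seen: Set[Tuple[str, str]] = set()
    kept: List[Doc] = []
    for tenant_id, doc_id, text in docs:
        key = (tenant_id, content_hash(text))
        if key in seen:
            continue  # same content already ingested for this tenant
        seen.add(key)
        kept.append((tenant_id, doc_id, text))
    return kept
```

Keying on the tenant as well as the hash keeps identical documents owned by different tenants separate, which matches the data-layer isolation goal.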

M02

Retrieval System & Knowledge Layer

pgvector + HNSW + sentence-transformers embeddings, BM25 via Postgres FTS, reciprocal rank fusion to merge, cross-encoder rerank for precision, CitedContext with numbered citations, FastAPI /retrieve endpoint with debug timings, hit_rate + MRR evaluation harness.

Phase 1 · 3h · 9 lessons · PRO TIER

Unlock with PRO →
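The hit_rate + MRR harness mentioned in M02 rests on two small metrics. The sketch below assumes each query has a ranked list of chunk ids and a set of relevant ids; the function names and input shapes are illustrative, not the module's actual interface.

```python
from typing import Sequence, Set


def hit_rate_at_k(results: Sequence[Sequence[str]],
                  relevant: Sequence[Set[str]], k: int) -> float:
    """Fraction of queries with at least one relevant chunk in the top k."""
    if not results:
        return 0.0
    hits = sum(
        1 for ranked, rel in zip(results, relevant)
        if any(chunk_id in rel for chunk_id in ranked[:k])
    )
    return hits / len(results)


def mean_reciprocal_rank(results: Sequence[Sequence[str]],
                         relevant: Sequence[Set[str]]) -> float:
    """Mean of 1/rank of the first relevant chunk; 0 for queries with no hit."""
    if not results:
        return 0.0
    total = 0.0
    for ranked, rel in zip(results, relevant):
        for rank, chunk_id in enumerate(ranked, start=1):
            if chunk_id in rel:
                total += 1.0 / rank
                break
    return total / len(results)
```

Hit rate shows whether the right chunk is retrieved at all, while MRR shows how close to the top it lands, which is the part a reranker is meant to improve.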

M03

LLM Orchestration & Intelligent Routing

4-class QueryRouter (factual / analytical / open_ended / ambiguous) with confidence threshold + ambiguous fallback, RAGOrchestrator with latency tracking, structured prompt registry with A/B experiments, CitationExtractor with grounding score, session memory with TTL, tool framework with read-only SQL execution.

Phase 1 · 3.5h · 11 lessons · PRO TIER

Unlock with PRO →

M04

Serving Layer & Production API

FastAPI gateway with auth + rate-limit + sync/async paths, token-by-token SSE streaming with timeout protection, 3-level FailureRouter (RAG → LLM-only → cached → honest error), CircuitBreaker with CLOSED/OPEN/HALF_OPEN states, per-tenant index isolation via TenantAwareRetriever (post-ADR-005 reversal), frontend useStreamQuery React hook.

Phase 2 · 3h · 9 lessons · EXPERT TIER

Unlock with EXPERT →
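For the token-by-token SSE streaming with timeout protection listed in M04, the following is a rough, framework-free sketch of the framing logic. It assumes tokens arrive from an async iterator and that a per-token timeout is the protection mechanism; `sse_frames`, the event names, and the JSON payload shape are hypothetical. In a FastAPI app a generator like this would typically be passed to a streaming response with the `text/event-stream` media type.

```python
import asyncio
import json
from typing import AsyncIterator


async def sse_frames(tokens: AsyncIterator[str],
                     per_token_timeout_s: float = 5.0) -> AsyncIterator[str]:
    """Wrap a token stream in SSE frames, ending with a done or error event."""
    iterator = tokens.__aiter__()
    while True:
        try:
            # Bound the wait for each token so a stalled upstream cannot hang the stream.
            token = await asyncio.wait_for(iterator.__anext__(), per_token_timeout_s)
        except StopAsyncIteration:
            yield "event: done\ndata: {}\n\n"
            return
        except asyncio.TimeoutError:
            yield 'event: error\ndata: {"reason": "timeout"}\n\n'
            return
        yield f"data: {json.dumps({'token': token})}\n\n"
```

Each frame ends with a blank line, which is what tells an SSE client that the event is complete.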

M05

Evaluation, Feedback & Release Gates

Offline eval (BLEU / ROUGE / exact / semantic similarity / faithfulness / safety), online eval from production traces, FeedbackLoop aggregating user ratings, ReleaseGate with pass/fail criteria, TraceRecorder for request lineage, design doc + tradeoff memo + launch checklist + rollout plan + oncall runbook.

Phase 3 · 4h · 11 lessons · EXPERT TIER

Unlock with EXPERT →
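The ReleaseGate idea in M05 (eval thresholds plus a regression cap) can be illustrated as below. This is a sketch under assumptions: the metric names, the 0.02 regression cap, and the `release_gate` signature are invented for the example and do not describe release_gate.py.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GateResult:
    passed: bool
    reasons: List[str] = field(default_factory=list)


def release_gate(candidate: Dict[str, float], baseline: Dict[str, float],
                 floors: Dict[str, float], max_regression: float = 0.02) -> GateResult:
    """Fail if any metric is below its floor or regressed more than the cap."""
    reasons: List[str] = []
    for metric, floor in floors.items():
        value = candidate.get(metric)
        if value is None:
            reasons.append(f"{metric}: missing from candidate run")
            continue
        if value < floor:
            reasons.append(f"{metric}: {value:.3f} is below floor {floor:.3f}")
        previous = baseline.get(metric)
        if previous is not None and previous - value > max_regression:
            reasons.append(f"{metric}: regressed {previous - value:.3f} vs baseline")
    return GateResult(passed=not reasons, reasons=reasons)
```

In a CI job, a non-passing result would normally translate into a non-zero exit code so the deploy step never runs.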

M06

Reliability, Cost & Capstone

CostModel + CostGuardrail (auto-downgrade on budget breach), ModelRouter (cheap / expensive / special tiers), LatencyBudget with p50/p95/p99 tracking, SafetyGuardrail (moderation / injection / PII), IncidentSimulator (6 failure modes with playbooks), SpendTracker, capstone design doc + interview narrative.

Phase 3 · 4h · 12 lessons · EXPERT TIER

Unlock with EXPERT →
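Auto-downgrade on budget breach, as described for CostGuardrail in M06, might look roughly like this. The 80% trigger, the field names, and `choose_tier` are assumptions made for the sketch; only the cheap / expensive tier names come from the module description.

```python
from dataclasses import dataclass


@dataclass
class CostGuardrail:
    monthly_budget_usd: float
    downgrade_at: float = 0.8   # hypothetical: downgrade once 80% of budget is spent
    spent_usd: float = 0.0

    def record(self, usd: float) -> None:
        """Add the cost of one completed call to the running total."""
        self.spent_usd += usd

    @property
    def breached(self) -> bool:
        return self.spent_usd >= self.downgrade_at * self.monthly_budget_usd

    def choose_tier(self, requested: str) -> str:
        """Downgrade 'expensive' to 'cheap' after a breach; pass other tiers through."""
        if requested == "expensive" and self.breached:
            return "cheap"
        return requested
```

Downgrading before the budget is fully spent leaves headroom for the cheaper tier to keep serving for the rest of the month.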

Modules 01-03 with PRO ($29/mo) · Modules 04-06 with EXPERT ($79/mo)

See plans →

Backed by curriculum

RAG Learning Path

8 modules · 35 hours · pgvector · Hybrid retrieval · Rerank · Eval gates

Open curriculum

This curriculum is the foundation for the project, not a sales add-on. EXPERT subscribers get full access to all modules.

The build, in 3 phases

Foundation. Production. Capstone.

Each phase ends with a tagged release, a passing eval suite, and a passing failure-injection drill. No ambiguity about where you are.

01 · ~9h

Foundation (M01 · M02 · M03)

Working RAG system running locally. SystemContract + ingestion DAG + hybrid retrieval + query router + LLM orchestration with grounding score, end-to-end on the 150-doc / 2k-chunk corpus.

✓ Hybrid retriever with rerank serving from FastAPI /retrieve (M02)
✓ 4-class QueryRouter with confidence threshold + AMBIGUOUS fallback (M03)
✓ RAGOrchestrator with prompt registry + A/B experiments + grounding score (M03)

02 · ~3h

Production (M04)

Production serving. Per-tenant isolation, SSE streaming, 3-level failure cascade, circuit breaker, frontend React hook consuming the streaming endpoint.

✓ Per-tenant TenantAwareRetriever (post-ADR-005 reversal, M04)
✓ FailureRouter cascade + CircuitBreaker per component (M04)
✓ Token-by-token SSE streaming + frontend useStreamQuery hook (M04)

03 · ~8h

Capstone (M05 · M06)

Eval gates + reliability + cost + design narrative. Release gate blocks bad deploys, cost guardrail auto-downgrades on budget breach, incident simulator runs 6 failure modes, capstone doc ready for staff interview.

✓ Offline + online eval pipeline + ReleaseGate as CI/CD (M05)
✓ CostGuardrail + ModelRouter + 6-mode IncidentSimulator (M06)
✓ 5 ADRs (one Deprecated) + cost-model CSV + capstone design doc (M06)

Project setup · 10 minutes

One command. Local FastAPI + Postgres + pgvector + Redis. No API key.

What lives in the repo

You get the real platform on day one — FastAPI gateway, Postgres + pgvector for the vector index, Redis for sessions / cache / cost tracker, sentence-transformers for embeddings, cross-encoder for rerank. The local-core demo runs without API keys via a MockLLMJudge; swap to Anthropic via requirements-llm.txt.

system_contract.py — freshness / latency / correctness / coverage SLAs (per ADR-001)
nexus/ingestion.py + dags/ — DocumentIngester + Airflow DAG + dbt staging models
embeddings/ + retrieval/ — pgvector schema + hybrid + RRF + cross-encoder rerank
routing/ + orchestration/ + prompts/ — QueryRouter + RAGOrchestrator + prompt_registry + tool framework
serving/ + frontend/ — FastAPI gateway + FailureRouter + CircuitBreaker + SSE + React hook
evaluation/ + observability/ — offline + online eval + ReleaseGate + TraceRecorder
docs/adr/ + docs/cost-model/ — 5 committed ADRs (one Deprecated) + the runnable cost-model CSV

Download · Starter Kit · 115 files · 3.0 MB

Full-stack AI Platform Starter Kit

Pre-built RAG platform with seeded 3-tenant Postgres + pgvector, 150 documents / ~2,268 chunks, Redis cache + cost tracker, FastAPI gateway with SSE streaming, 5 pytest gates. Now bundled: 5 ADR markdown files (docs/adr/) and the runnable cost-model CSV (docs/cost-model/) — unzip and read them straight from the repo.

EXPERT project · 115 files · ADRs + cost model bundled · last updated 2026-05-08

~/projects/full-stack-ai-platform — zsh

1. Boot Postgres + pgvector + Redis + Prometheus

$ unzip full-stack-ai-platform-starter.zip

$ cd full-stack-ai-platform-starter && cp .env.example .env

$ docker compose up -d

2. Apply migrations + seed 150 docs / 2k chunks

$ python3 -m venv .venv && source .venv/bin/activate

$ pip install -r requirements-core.txt

$ python scripts/run_local_demo.py

3. Run tenant isolation gate (5 tests, ~0.1s)

$ pytest scripts/test_tenant_isolation.py -v

4. Open ADR-001 + the cost model

$ less docs/adr/001-system-contract-as-north-star.md

$ open docs/cost-model/full-stack-ai-platform-cost-model.csv

3 tenants seeded · 150 raw documents · ~2,268 chunks indexed · 5 ADRs + cost-model CSV

Production hardening

The same RAG demo — but built for the multi-tenant case.

Most RAG tutorials show you a notebook with a single embedding call and a single LLM call. This shows what changes when 3 tenants share infrastructure, the retriever fails 0.4% of the time, eval gates block bad deploys, and the cost model has to defend itself to a CFO.

Dimension | Notebook RAG demo (what most teams ship) | Your full-stack platform (Modules 04–06)
Tenant isolation | WHERE tenant_id = ? on every query, hopefully | ✓ Per-tenant chunks_* tables · TenantAwareRetriever (post-ADR-005 reversal)
Retrieval | Semantic-only ANN search | ✓ Hybrid pgvector + BM25 + RRF + cross-encoder rerank (ADR-002)
Failure mode | HTTP 500 with no body | ✓ FailureRouter 3-level cascade + CircuitBreaker (ADR-004)
Eval | Spot-check with 10 queries | ✓ Offline + online eval · faithfulness · grounding score · M05 ReleaseGate blocks bad merges
Cost | Whatever the bill says next month | ✓ Model cascade −39% vs Sonnet-only · auto-downgrade in cost_guardrail.py; CSV in docs/cost-model/
Deploy gate | Push and pray | ✓ Eval gates + 6-mode IncidentSimulator + design doc review

EXPERT-only · architecture decision records

Write the ADRs staff engineers actually get judged on.

Five ADRs ship inside the starter-kit zip at docs/adr/, one per major decision in the build, including a real Deprecated ADR documenting the v0 row-filter multi-tenant design that was reverted to per-tenant index isolation after a real cross-tenant chunk leak. The kind of doc that travels with you to your next role. Preview ADR-001 →

ADR-001 · Accepted

SystemContract as the platform's north star (declared upfront)

Context

Without a single source of truth for SLAs, every decision gets made locally and the platform optimizes for nothing in particular

Decision

SystemContract dataclass with freshness / latency / correctness / coverage floors; CI gates against it; M05 ReleaseGate reads from it directly

Tradeoff

Contract amendments are PRs (friction by design); requires honest defaults, not permissive ones

Reversal

Drop to YAML SLO list (~1 engineer-day) if amendments cross 5/quarter or the team defaults floors to permissive
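To make the ADR-001 decision concrete, here is one way a SystemContract dataclass with four floors could be expressed. The field names, the measured-metric keys, and the numeric values are illustrative assumptions, not the contents of system_contract.py.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SystemContract:
    max_freshness_hours: float
    max_p95_latency_ms: float
    min_correctness: float
    min_coverage: float

    def violations(self, measured: Dict[str, float]) -> List[str]:
        """Return the names of the SLAs that the measured values break."""
        problems: List[str] = []
        if measured["freshness_hours"] > self.max_freshness_hours:
            problems.append("freshness")
        if measured["p95_latency_ms"] > self.max_p95_latency_ms:
            problems.append("latency")
        if measured["correctness"] < self.min_correctness:
            problems.append("correctness")
        if measured["coverage"] < self.min_coverage:
            problems.append("coverage")
        return problems


# Illustrative floors only; real values belong in the contract PR.
CONTRACT = SystemContract(
    max_freshness_hours=24.0,
    max_p95_latency_ms=2000.0,
    min_correctness=0.85,
    min_coverage=0.90,
)
```

Freezing the dataclass means the floors cannot be mutated at runtime, so changing them requires a code change, which fits the "amendments are PRs" tradeoff.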

ADR-002 · Accepted

pgvector + HNSW + hybrid RRF + rerank, not a dedicated vector DB

Context

One database (Postgres we already operate) over Pinecone managed / Weaviate / Qdrant; compliance keeps documents in our VPC

Decision

chunks(vector(384) + tsv) with HNSW index + Postgres FTS; semantic + BM25 → reciprocal rank fusion → top-20 → cross-encoder rerank → top-5

Tradeoff

pgvector recall@1 is 2-3pp behind Pinecone on adversarial queries; cross-encoder rerank closes the gap on top-5

Reversal

Pinecone swap is ~2 engineer-weeks via the Retriever Protocol when corpus crosses 5M chunks per tenant
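The fusion step in ADR-002 is standard reciprocal rank fusion: each chunk scores the sum of 1 / (k + rank) over the rankings it appears in. The sketch below uses k=60 and a top-20 cut as stated on this page; the function name and the id-based tie-break are assumptions for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60,
             top_n: int = 20) -> List[str]:
    """Merge ranked id lists with reciprocal rank fusion and keep the top_n ids."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties on id so output is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [chunk_id for chunk_id, _ in ordered[:top_n]]


semantic = ["a", "b", "c"]   # e.g. ids from the vector search
keyword = ["b", "d", "a"]    # e.g. ids from the full-text search
print(rrf_fuse([semantic, keyword]))
```

Because RRF only looks at ranks, it does not need the vector distances and the text-search scores to be on a comparable scale.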

ADR-003 · Accepted

4-class query router with confidence threshold + ambiguous fallback

Context

RAG-everywhere wastes tokens + raises hallucination on open-ended queries; hard regex rules break on paraphrases

Decision

QueryRouter classifies into FACTUAL / ANALYTICAL / OPEN_ENDED / AMBIGUOUS; confidence < 0.6 → AMBIGUOUS clarification path

Tradeoff

Router on critical path (~150-300ms median); accept ~8% clarification rate over confidently-wrong answers

Reversal

Drop router (~2 engineer-days) when AMBIGUOUS rate > 25% or router latency dominates the budget
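The confidence rule in ADR-003 reduces to a small decision function. This sketch assumes some upstream classifier has already produced a per-class score; the `route` function and the shape of `scores` are hypothetical, while the four class names and the 0.6 threshold come from the ADR.

```python
from enum import Enum
from typing import Dict, Tuple


class QueryClass(Enum):
    FACTUAL = "factual"
    ANALYTICAL = "analytical"
    OPEN_ENDED = "open_ended"
    AMBIGUOUS = "ambiguous"


CONFIDENCE_THRESHOLD = 0.6  # per ADR-003


def route(scores: Dict[QueryClass, float],
          threshold: float = CONFIDENCE_THRESHOLD) -> Tuple[QueryClass, float]:
    """Pick the top-scoring class, or AMBIGUOUS when confidence is below threshold."""
    if not scores:
        return QueryClass.AMBIGUOUS, 0.0
    best, confidence = max(scores.items(), key=lambda item: item[1])
    if confidence < threshold:
        return QueryClass.AMBIGUOUS, confidence
    return best, confidence
```

A score of exactly 0.6 passes, which matches the "confidence ≥ 0.6" wording used elsewhere on this page.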

ADR-004 · Accepted

Three-level failure cascade: RAG → LLM-only → cached → error

Context

Components fail in production; HTTP 500 + retries + confidently-wrong-on-stale-data are all worse than degraded-with-disclaimer

Decision

FailureRouter with explicit is_degraded + disclaimer per response; CircuitBreaker per component skips broken paths for ~30s

Tradeoff

+50% latency on degraded path; cache fallback can show a 24h-old answer; users see honest disclaimers

Reversal

Drop cascade (~2 engineer-days) when degradation rate > 15% sustained: the cascade is masking a real systemic failure
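The per-component CircuitBreaker in ADR-004 follows the usual CLOSED / OPEN / HALF_OPEN state machine. Below is a minimal sketch assuming a 3-failure trip threshold (an invented value) and the ~30s skip window from the ADR; the method names are hypothetical, and the clock is injectable so the behavior can be tested without sleeping.

```python
import time
from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.state = State.CLOSED
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0

    def allow(self) -> bool:
        """Return True if a call may proceed; OPEN becomes HALF_OPEN after cooldown."""
        if self.state is State.OPEN and self._clock() - self._opened_at >= self.cooldown_s:
            self.state = State.HALF_OPEN  # probe calls are allowed until one reports back
        return self.state is not State.OPEN

    def record_success(self) -> None:
        self._failures = 0
        self.state = State.CLOSED

    def record_failure(self) -> None:
        self._failures += 1
        # A failed probe re-opens immediately; otherwise trip at the threshold.
        if self.state is State.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = State.OPEN
            self._opened_at = self._clock()
```

A cascade would check `allow()` before each level and skip straight to the next level while that component's breaker is open.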

ADR-005 · Deprecated

Multi-tenant retrieval via row-filter on shared index (v0)

Context

Day-4 MVP: shared chunks table + WHERE tenant_id = ? filter on every retrieval query

Decision

Reverted in M04 — moved to per-tenant chunks_<tenant> tables with TenantAwareRetriever reading from the right table by config

Why reversed

Cross-tenant leak through the BM25 path on 2026-04-21 (one query path was missing the WHERE clause); 2-week dwell time before compliance caught it

Replaced by

Per-tenant indexes; ~4.5 engineer-day reversal cost; tenant boundary is now schema-level
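One way to picture the schema-level boundary that replaced the row filter: the tenant id selects the table, so no query path can forget a WHERE clause. The sketch below is an assumption-heavy illustration. `TenantTableResolver`, the allowlist approach, and the id pattern are hypothetical; only the `chunks_<tenant>` naming and the `tsv` column come from this page. The SQL uses standard Postgres full-text functions with driver-style `%s` placeholders.

```python
import re
from typing import FrozenSet, Iterable

_TENANT_ID = re.compile(r"^[a-z][a-z0-9_]{0,30}$")


class UnknownTenantError(LookupError):
    pass


class TenantTableResolver:
    def __init__(self, known_tenants: Iterable[str]) -> None:
        tenants = frozenset(known_tenants)
        for tenant_id in tenants:
            if not _TENANT_ID.match(tenant_id):
                raise ValueError(f"unsafe tenant id: {tenant_id!r}")
        self._known: FrozenSet[str] = tenants

    def table_for(self, tenant_id: str) -> str:
        """Map a tenant to its own chunks table; unknown tenants are rejected."""
        if tenant_id not in self._known:
            raise UnknownTenantError(tenant_id)
        return f"chunks_{tenant_id}"

    def keyword_sql(self, tenant_id: str) -> str:
        """Full-text query against the tenant's table; values stay parameterized."""
        table = self.table_for(tenant_id)  # identifier comes from the allowlist only
        return (
            "SELECT id, ts_rank(tsv, plainto_tsquery('english', %s)) AS score "
            f"FROM {table} "
            "WHERE tsv @@ plainto_tsquery('english', %s) "
            "ORDER BY score DESC LIMIT %s"
        )
```

Since table names cannot be bound as query parameters, the allowlist plus the id pattern is what keeps the interpolated identifier safe.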

EXPERT-only · cost model

Read the FinOps story, not just the latency one.

Module 06 ships a runnable cost-model CSV inside the starter-kit zip at docs/cost-model/. 3 tenants × 10k queries/mo load, real Anthropic + OpenAI + AWS list prices, with model-cascade and reserved-instance levers wired up. The version you’ll defend to a CFO. Preview the CSV →

Component | Baseline / mo | Optimized / mo | Delta
Anthropic Claude Sonnet (LLM gen; 100% baseline → 30% optimized · ~24M in / ~6M out tok/mo) | $96 | $29 | −$67
Anthropic Claude Haiku (router + 70% gen; router on every query + 70% of gen calls in optimized) | | $17 | —
OpenAI text-embedding-3-small (M01 ingest + M02 query · ~5M tok/mo · 67% cache hit) | | | −$2
AWS RDS Postgres + pgvector (db.t4g.medium; 100GB gp3 · vector index + chunks + traces, per ADR-002) | $50 | $35 | −$15
AWS ElastiCache Redis (cache.t4g.small; session memory + L4 cache + cost tracker, per ADR-004) | $35 | $26 | −$9
GitHub Actions + container registry (~150 PR runs/mo × 6 min × Linux + GHCR) | $10 | | —
Total · 3 tenants · 10k queries/mo (~$0.019 per query at baseline · ~$0.012 optimized) | $194 | $118 | −$76 (−39%)
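The headline figures in the total row can be reproduced with a few lines of arithmetic. This sketch uses only the totals shown above and assumes the 10k queries/mo figure is the combined volume across all 3 tenants, which is the reading under which the per-query numbers reconcile.

```python
BASELINE_USD_PER_MONTH = 194
OPTIMIZED_USD_PER_MONTH = 118
QUERIES_PER_MONTH = 10_000  # assumption: total across all 3 tenants


def cost_per_query(monthly_usd: float, queries: int) -> float:
    """Average cost of one query at the given monthly spend and volume."""
    return monthly_usd / queries


saving_usd = BASELINE_USD_PER_MONTH - OPTIMIZED_USD_PER_MONTH
saving_pct = 100 * saving_usd / BASELINE_USD_PER_MONTH

print(f"baseline:  ${cost_per_query(BASELINE_USD_PER_MONTH, QUERIES_PER_MONTH):.3f} per query")
print(f"optimized: ${cost_per_query(OPTIMIZED_USD_PER_MONTH, QUERIES_PER_MONTH):.3f} per query")
print(f"saving:    ${saving_usd} per month ({saving_pct:.0f}%)")
```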

Optimization levers

Model cascade (Haiku for router + 70% gen · Sonnet for hard queries)

Haiku router on every query ($0.80/M in). 70% of gen calls go to Haiku; only complex / high-stakes queries escalate to Sonnet. Per ADR-003 confidence threshold + ADR-001 SystemContract correctness floor.

−$67 / mo · −35%

L1 query-result cache (1-hour TTL)

Per ADR-004 cascade — cache complete responses for repeated queries within 1h. Drops re-execution + re-LLM cost. ~12% hit rate on factual-class queries.

−$11 / mo · grows with question repetition

RDS + ElastiCache 1-yr reserved

Commit to 12-month reserved capacity once load is stable for 30 days. ~30% off RDS, ~26% off ElastiCache. Break-even at month 4.

−$24 / mo · −28% on store cost

EXPERT benefit · cohort beta

Async architecture review with a staff-level reviewer (cohort beta).

Submit your repo, your ADR draft, or your release-gate config. A staff or principal-level reviewer who has shipped this exact stack responds within 7 days with line-by-line comments. Cohort capped at 12 members.

Bring a diff, an ADR draft, or a release-gate config.

The cohort beta runs as async architecture review — pick a reviewer by topic, send the artifact, get inline comments + a Loom walkthrough back. No back-and-forth scheduling. No 30-minute slot pressure.

Mira R.

Ex-staff · RAG platform · top-3 cloud

Hybrid retrieval design, RRF tuning, cross-encoder rerank tradeoffs, multi-tenant index topology

“Send the diff. I'll go line-by-line through your retriever and the per-tenant table boundaries and pick out the joins that leak across.”

Daniel K.

Principal · LLM platform · enterprise SaaS

Failure cascade design, eval pipeline tuning, release-gate threshold calibration, cost model defense

“Send your worst stuck-deploy. We'll walk it backwards from the eval-gate failure to whether retrieval, generation, or grounding broke first.”

Anya S.

Eng manager · AI platform · public Series-D

Org design for AI platform teams, hiring rubrics, staff-engineer interview prep, ADR review

“If you're prepping for staff promo, send your ADR draft. We'll work backwards from the rubric.”

Format

Async

Turnaround

7 days

Cohort

12 members

Scope

ADR + arch review

Request a slot →

What your tier unlocks

PRO unlocks Modules 01-03. EXPERT unlocks the full platform.

PRO is the entry point — Modules 01-03 (a working RAG system) plus the rest of the PRO catalog. EXPERT unlocks Modules 04-06 of this build, the 5 ADRs, the cost-model CSV, and the cohort-beta async review.

What you get | FREE | PRO | EXPERT
Modules 01-03 of P17 (Foundation + retrieval + LLM orchestration, ~9h) | — | Included | Included
Modules 04-06 of P17 (Serving + eval + reliability + cost + capstone, ~11h) | — | — | Included
5 committed ADRs + cost-model CSV (Starter kit docs/adr/ + docs/cost-model/) | — | — | Included
PRO project catalog (Production-grade builds) | | All current | All current + this one
Curriculum (All 7 tracks) | Phase 1 only | All | All + bonus modules
Code review (Senior+ reviewers) | — | 4 / month | Unlimited
Cohort-beta architecture review (Async · 7-day turnaround · 12-member cap) | — | — | Included
Certificate (Verifiable on LinkedIn) | — | Yes | Yes + LinkedIn rec

$79/mo

billed monthly · open enrollment · cancel anytime

or annual

$699/yr save 26%

Unlock EXPERT →

Who this is for

Pick this if you own the release gate, not just a feature.

Staff / principal engineers · AI platform

You own the release gate, the failure cascade, and the answer to 'why are we shipping this?' that your VP asks before launch.

Engineering managers · AI

You need a reference architecture for the RAG platform your CTO will ask about before the AI team gets headcount or a budget for production deployment.

Platform / infra leads

You absorb RAG without absorbing 4 new vendors. Postgres, Redis, Prometheus, Slack — tools you already operate. This is the playbook.

Founding engineers · AI startups

Your investors will ask 'how do you know your model is getting better?' before they ask about scale. The 5 ADRs + ReleaseGate + cost model is the answer.

Related curriculum

Going deeper? Four tracks back this project.

The RAG curriculum is the foundation. These four tracks let you go deeper on eval, agents, production ops, and the platform-design discipline you'll need at staff level.

FAQ · EXPERT tier

Quick answers.

How is this different from PRO?

Modules 01-03 (the working RAG system — ingestion DAG, hybrid retrieval, query router + LLM orchestration with grounding score) are included with PRO at $29/mo. Modules 04-06 (production serving, eval + release gates, reliability + cost + capstone), the 5 committed ADRs, the runnable cost-model CSV, and the cohort-beta async architecture review unlock with EXPERT at $79/mo. PRO gets you a working RAG system; EXPERT gets you the platform you'd defend in an architecture review.

Is this still useful if I'm using OpenAI / Pinecone instead of Anthropic / pgvector?

Yes — most of the value is in the design decisions, not the vendor stack. ADR-002 lays out exactly when pgvector wins vs Pinecone / Weaviate / Qdrant; the hybrid + RRF + rerank pattern is vendor-agnostic. The Retriever interface is a Protocol — swapping pgvector for Pinecone is documented as ~2 engineer-weeks. The cost-model CSV uses Anthropic prices but the cascade pattern works on any cheap-model + premium-model pair.

What are the 5 ADRs that ship with this project?

ADR-001 SystemContract as the platform's north star (declared upfront, gates CI); ADR-002 pgvector + HNSW + hybrid RRF + cross-encoder rerank over dedicated vector DB; ADR-003 4-class query router with confidence threshold + ambiguous fallback; ADR-004 3-level failure cascade (RAG → LLM-only → cached → honest error); ADR-005 (Deprecated) v0 row-filter shared index that was reverted to per-tenant index isolation after a real cross-tenant chunk leak with 2-week dwell time.

Is the cohort-beta mentor program 1:1 video calls?

Not for v1. The cohort beta runs as async review: you submit a diff / ADR / release-gate config / failure-cascade design, a staff-level reviewer responds within 7 days with inline comments + a Loom walkthrough. Cohort is capped at 12 members so reviewers can keep the SLA. We'll evaluate live 1:1 sessions once the cohort signal is solid.

How long until I can finish this EXPERT project?

20-22 hours of focused work across 6 modules. Most learners spread it across 5-7 weeks alongside a day job. Modules 01-03 alone are ~9 hours and ship a working hybrid-retrieval RAG system you can deploy locally with the 150-doc / 2k-chunk seed corpus.

Is this enough to interview for staff AI / RAG-platform roles?

It's a strong forcing function. Staff RAG-platform interviews lean heavily on system design (multi-tenant isolation, retrieval topology, failure handling, cost) and on having opinions backed by real tradeoffs. The 5 ADRs you commit (one Deprecated, with the cross-tenant leak incident) are exactly the artifacts a panel asks about. Pair with the cohort-beta async review on your final repo and you have the portfolio piece.

What is NOT in scope?

Model fine-tuning. Pre-training. Custom embedding model training. Agent execution platforms (we use a tool framework in M03; we don't build the full agent loop — see /projects/agentic-data-pipeline for that). This is a RAG platform — you ship the system that retrieves + generates + evaluates, not the system that creates the models.

Related projects

Paired with this project

P15·PAID·ai

AI serving platform — vLLM + Ray Serve under SLA

EXPERT-tier inference build: vLLM continuous batching + PagedAttention, Ray Serve autoscale (market-hours min=2), Redis semantic cache (35% hit), ServingCircuitBreaker, 5 chaos scenarios + runbook, runnable cost-model CSV with break-even-vs-OpenAI math. Module 01 with PRO.

Explore project →

P14·PAID·ai

AI retrieval platform — pgvector + hybrid + RRF + cross-encoder

EXPERT-tier retrieval build: pgvector + HNSW, BM25 + RRF, cross-encoder reranker, OpenAI function-calling agent, semantic cache, drift detection, multi-region replication code. Modules 01-02 with PRO; full platform with EXPERT.

Explore project →

Ready to ship the system that retrieves, generates, and evaluates?

Start with PRO ($29/mo) for Modules 01-03 — the working RAG system. Or unlock the full 6-module platform plus 5 ADRs, the cost-model CSV, and cohort-beta architecture review with EXPERT ($79/mo).

See EXPERT benefits

Related curriculum tracks: LLM Evaluation · Agentic Workflows · MLOps for Data Engineers · System Design for Data Engineers
