ai-de.net/Projects/P06 · Enterprise RAG — retrieval-quality build

Last updated 2026-05-22By AI-DE Engineering Team

EXPERT-tier · PRO unlocks Module 01AI & vectors trackP06

Build a
retrieval-quality
RAG platform — that survives a relevance review

Ship a production RAG platform anchored on retrieval quality: 4 chunking strategies with measured 62/78/85% accuracy A/B, hybrid BM25 + dense + RRF retrieval, cross-encoder reranking, RAGAS 4-metric eval harness, and a multi-provider LLM gateway. Module 01 unlocks with PRO; the retrieval stack unlocks with EXPERT.

Timeline

16.5 hours

Difficulty

Senior+

Stack

FastAPI · OpenAI · Pinecone · Qdrant · Redis · Prometheus

See EXPERT benefits

The retrieval-quality portfolio piece for staff AI / ML platform roles — 5 committed ADRs, a runnable cost-model CSV, a chunking-A/B benchmark with cited numbers, and a hybrid + reranker pipeline you can defend in a relevance-review.

By the end you will have wired

Multi-format parsers (PDF / DOCX / Markdown / Text) + 4-strategy ChunkingFactory
Pinecone HNSW index over text-embedding-3-small (1536-dim) with namespace per tenant
Hybrid retriever — BM25 + dense fused via Reciprocal Rank Fusion (k=60)
Cross-encoder reranker (top-50 → top-10) with disable-able score floor
RAGAS 4-metric harness (faithfulness / answer_relevancy / context_precision / context_recall)
LLM gateway with fallback chain, response cache (1h TTL), per-tenant cost tracker
5 ADRs (one Deprecated) committed alongside the code, plus a runnable cost-model CSV

PREREQ · STAFF+Built for engineers who debug retrieval failures, not RAG demos. Comfortable with Python services, vector indexes, and at least one of: BM25 / hybrid retrieval, cross-encoders, or RAGAS-style eval harnesses. Not a “hello LangChain” tutorial.

enterprise-rag.platform · tenant=acme · · 4-stage pipeline

RAGAS canary green

Ingest

Retrieve

Generate

Operate

PDFParserpypdf · page markers

DOCXParserpython-docx · tables

ChunkingFactoryRecursive default · Semantic opt-in

/upload + workerFastAPI background task

Multi-format parsers + 4-strategy chunker

Pinecone HNSW1536-dim · namespace/tenant

BM25 indexin-process · lexical

RRF mergek=60 (paper default)

Cross-encodertop-50 → top-10

Hybrid BM25 + dense + reranker

LLMGatewaygpt-4o → gpt-4o-mini fallback

Response cache1h TTL · 20% hit rate

Query rewriterHyDE + multi-query

/chat streamcitations + scores

Gateway + fallback chain + streaming

RAGAS canary4 metrics · faithfulness > 0.85

Prometheusrag_query_latency · cache_hits

TenantManagernamespace + quota

Circuit breakerdegraded mode runbook

RAGAS + Prometheus + SLA guardrails

# Chunking A/B is the headline retrieval-quality artifact

Fixed-size (1000c, 200 overlap) → 62% accuracy@10 — cuts mid-sentence (ADR-005 Deprecated)

Recursive (paragraph → sentence → word) → 78% — +16pp default

Semantic (embedding-boundary detection) → 85% — opt-in for legal/medical/contract docs

→ ChunkingFactory routes by document tag; defaults are honest, upgrades are paid

# 5 ADRs + cost-model CSV bundled in the kit

docs/adr/001-hybrid-retrieval-rrf.md — BM25 + dense via RRF k=60

docs/adr/005-fixed-size-chunking-deprecated.md — the chunking-A/B reversal

docs/cost-model/enterprise-rag-cost-model.csv — 5 tenants × 300k req/mo

→ commit the artifacts a relevance review actually opens

70 files

in starter zip · ADRs bundled

−75%

model spend at the reference scenario

RAGAS × 4

metrics wired in CI

Curriculum · 5 modules · 16.5 hours · 3 phases

Module 01 unlocks with PRO. The retrieval stack with EXPERT.

Module 01 (~3h) ships multi-format parsers, the 4-strategy ChunkingFactory, the FastAPI upload API, and a React drag-and-drop uploader — included with PRO. Modules 02–05 (~13.5h) layer the Pinecone HNSW index, hybrid retriever, cross-encoder reranker, RAGAS harness, multi-provider LLM gateway, and multi-tenant blueprint — and unlock with EXPERT.

P06 · 5 modules · 16.5 hours · 35+ lessons

Free preview EXPERT required

M01

⊘Document Ingestion Pipeline

Multi-format parsers (PDF / DOCX / Markdown / Text via pypdf + python-docx) + 4 chunking strategies on a factory pattern (Fixed / Recursive / Semantic / Paragraph) with measured retrieval-accuracy A/B (62/78/85%). FastAPI upload route with background processing + React drag-and-drop uploader.

Phase 1: Foundation3h6 lessonsPRO TIER

Unlock with PRO →

M02

⊘Vector Database & Embeddings

Pinecone HNSW index over text-embedding-3-small (1536-dim, cosine), namespace-per-tenant isolation, batch embedding service with cost tracking, streaming embedder with content hashing + dead-letter queue, HNSW vs IVF deep-dive (95-99% recall, 100-1000× faster than brute force).

Phase 2: Retrieval quality3h7 lessonsEXPERT TIER

Unlock with EXPERT →

M03

⊘Querying, Prompting & Generation

Hybrid retrieval (BM25 + dense) fused via Reciprocal Rank Fusion, cross-encoder reranker (top-50 → top-10), multi-provider LLM client (OpenAI + Anthropic) with token counting, query rewriter + HyDE + multi-query expansion, streaming chat with citations + score breakdowns.

Phase 2: Retrieval quality3.5h8 lessonsEXPERT TIER

Unlock with EXPERT →

M04

⊘Production Hardening & Deployment

RAGAS evaluation harness (faithfulness / answer_relevancy / context_precision / context_recall), Prometheus metrics, multi-stage Dockerfile + non-root user + health checks, retrieval explainability route, API-key auth + per-tenant rate limiting, render.yaml + GitHub Actions deploy.

Phase 3: Production3.5h7 lessonsEXPERT TIER

Unlock with EXPERT →

M05

⊘AI Platform Orchestration & Multi-Tenancy

LLM Gateway with fallback chain (gpt-4o → gpt-4o-mini), response caching (1h TTL), per-tenant cost tracker. TenantManager with namespace-per-tenant + defense-in-depth filtering. SLA + circuit breaker + degraded-mode runbook. Event-driven sync via Kafka. Agentic RAG scaffold + feedback loop.

Phase 3: Production3.5h8 lessonsEXPERT TIER

Unlock with EXPERT →

Module 01 with PRO ($29/mo) · Modules 02–05 with EXPERT ($79/mo)

See plans →

Backed by curriculum

RAG Learning Path

8 modules35 hoursRAG patterns · Retrieval · Reranking · Eval

Open curriculum

iThe RAG curriculum is the foundation for the project — same stack you build here, taught from chunking and retrieval-quality first principles. EXPERT subscribers get full access to all modules.

The build, in 3 phases

Foundation. Retrieval quality. Platform.

Each phase ends with a tagged release, a passing RAGAS canary on the 4-metric harness, and a measurable retrieval-quality delta on the seed corpus. No ambiguity about where you are.

01~3h

Foundation (Module 01)

Multi-format ingest live locally. ChunkingFactory routing by document tag; recursive default benchmarked at +16pp over fixed-size.

✓PDF + DOCX + Markdown + Text parsers with page-marker / table-aware extraction
✓ChunkingFactory with 4 strategies + measured A/B vs golden set
✓FastAPI upload + React uploader + background processing with status polling

02~6.5h

Retrieval quality (Modules 02–04)

Hybrid retrieval + cross-encoder reranker live. RAGAS 4-metric canary green; faithfulness > 0.85 enforced as quality floor.

✓Pinecone HNSW (1536-dim) + BM25 + RRF merge with k=60 fusion constant
✓Cross-encoder rerank (top-50 → top-10) with disable-able score floor
✓RAGAS canary in CI on the 40-question golden set + Prometheus metrics

03~3.5h

Platform (Module 05)

LLM Gateway with fallback chain + per-tenant cost tracker. Multi-tenant namespace isolation, SLA + circuit breaker, runbook drilled.

✓LLM gateway library with response cache (1h TTL) + per-tenant rate limit
✓TenantManager with namespace-per-tenant + defense-in-depth filtering
✓Failure-mode runbook + circuit-breaker degraded mode in M05 capstone

Project setup · 15 minutes

One command. Local FastAPI + Qdrant + Redis (no API key needed).

What lives in the repo

You get the unified production code on day one — FastAPI as the gateway, Qdrant for local vector storage (Pinecone-compatible interface), Redis for caching + rate-limit fallback, sentence-transformers cross-encoder for reranking, plus the lazy-init OpenAI/Anthropic clients so imports stay clean even without keys.

docker-compose.yml — Qdrant + Redis + API container with health checks
app/services/document_parser.py + chunking.py — pypdf + python-docx + 4-strategy ChunkingFactory
app/services/vector_store.py + rag_service.py — Pinecone/Qdrant + hybrid retrieve + cross-encoder reranker
scripts/ragas_eval.py — RAGAS canary harness with 4-metric scoring
gateway/llm_gateway.py + tenant/tenant_manager.py — fallback chain + response cache + per-tenant quota
docs/adr/ + docs/cost-model/ — 5 ADRs (one Deprecated) + the runnable cost-model CSV

Download · Starter Kit · 70 files · 75 KB

Enterprise RAG Starter Kit

Pre-built RAG platform with parsers, ChunkingFactory, hybrid retriever, RAGAS harness, LLM gateway, and the 4-document seed corpus that powered the chunking A/B benchmark. Now bundled: 5 committed ADR markdown files (docs/adr/) and the runnable cost-model CSV (docs/cost-model/) — unzip and read them straight from the repo.

EXPERT project · 70 files · ADRs + cost model bundled · last updated 2026-05-09

~/projects/enterprise-rag — zsh

1. Unzip and bring up Qdrant + Redis (no API key needed)

$ unzip enterprise-rag-starter.zip -d enterprise-rag

$ cd enterprise-rag && cp .env.example .env

$ docker-compose up -d

2. Install Python deps and boot the FastAPI app

$ python -m venv .venv && source .venv/bin/activate

$ pip install -r requirements.txt

$ uvicorn app.main:app --reload --port 8000

3. Upload a sample doc and run a hybrid query

$ curl -F file=@data/sample_docs/long_handbook.md http://localhost:8000/upload

$ curl -X POST http://localhost:8000/chat \

$ -d '{"query":"what is the parental leave policy"}'

4. Read the ADRs and re-run the cost model

$ cat docs/adr/001-hybrid-retrieval-rrf.md

$ open docs/cost-model/enterprise-rag-cost-model.csv

seed docs · long_handbook + 3

~80

chunks · recursive default

golden questions · A/B set

RAGAS × 4

metrics in CI

Production hardening

The same RAG demo — but built for the relevance-review case.

Most RAG tutorials stop at vector search + a prompt template. This shows what changes when retrieval failures land in your inbox, not the LLM’s — and a relevance reviewer wants to see the chunking-A/B numbers behind your defaults.

Notebook RAGWhat most teams ship

Chunking

Fixed-size 1000 chars, hope for the best

Retrieval

Vector similarity only, top-K from one index

Reranking

None — trust the bi-encoder ordering

Eval

Eyeball a few outputs, ship

LLM call

Hardcoded gpt-4o, no fallback

Failure mode

Bad answer, no signal, no recovery

Your retrieval-quality platformModules 02–05

✓

Chunking

ChunkingFactory routes by tag; recursive default (78%), semantic opt-in for legal/medical (85%) — ADR-003

✓

Retrieval

BM25 + Pinecone HNSW fused via RRF k=60 — ADR-001

✓

Reranking

Cross-encoder (ms-marco-MiniLM-L-6-v2) over top-50 → top-10 with disable-able score floor — ADR-002

✓

Eval

RAGAS canary on 40-question golden set — faithfulness floor > 0.85 from sla.yaml

✓

LLM call

LLMGateway with gpt-4o → gpt-4o-mini fallback + 1h response cache — ADR-004

✓

Failure mode

Circuit breaker + degraded mode (LLM down → chunks-only; vector-store down → BM25-only) + on-call runbook

EXPERT-only · architecture decision records

Write the ADRs staff retrieval engineers actually get judged on.

Five ADRs ship inside the starter-kit zip at docs/adr/, one per major decision in the build, including a real Deprecated ADR documenting the fixed-size-chunking → recursive reversal that the chunking-A/B benchmark forced. Preview ADR-001 →

ADR-001Accepted

Hybrid retrieval (BM25 + dense) with Reciprocal Rank Fusion

Context

Dense-only misses exact-match queries (SKUs, function names, headings) by ~10-15pp; BM25-only loses paraphrase recall

Decision

Run both retrievers in parallel; fuse rankings via RRF k=60; feed top-50 to the reranker

Tradeoff

Two indexes to keep consistent + ~30-40ms parallel latency vs hit-rate@10 lift from ~74% to ~88%

Reversal

Drop BM25 when its marginal hit-rate < 5pp; ~3 engineer-days to disable

ADR-002Accepted

Cross-encoder reranking (top-50 → top-10) precision lever

Context

Bi-encoders score chunks independently — never compare query and chunk in same forward pass; LLM-as-reranker is too expensive on every query

Decision

ms-marco-MiniLM-L-6-v2 cross-encoder on dedicated CPU pool; rerank top-50 → top-10 with score floor

Tradeoff

+150-200ms p95 latency + CPU compute vs +14pp hit-rate@10 (74% → 88%) on the chunking-A/B bench

Reversal

Disable per-endpoint via .env flag if precision lift drops < 5pp on RAGAS canary

ADR-003Accepted

Recursive chunking default; semantic for high-value docs only

Context

Chunking A/B on 4-doc seed: Fixed 62% / Paragraph 71% / Recursive 78% / Semantic 85% retrieval accuracy@10

Decision

ChunkingFactory.from_metadata — recursive default; semantic for legal/medical/contract tags; paragraph for clean Markdown

Tradeoff

Operational complexity (factory + tag policy + 2 thresholds) vs avoiding either-or (~$30k/yr semantic-everywhere or 16pp accuracy regression)

Reversal

ENV flag drops factory to recursive-only if mix stops earning its complexity; ~3 engineer-days

ADR-004Accepted

LLM Gateway with fallback chain (gpt-4o → gpt-4o-mini)

Context

Per-handler LLM calls fail vendor outages, repeat per-tenant cost-attribution code, and miss 20% cache opportunity

Decision

LLMGateway singleton — fallback chain, 1h response cache, per-tenant rate-limit + cost cap, in-process v1

Tradeoff

Single point of failure + test-state cleanup vs centralised outage / cost / cache surface

Reversal

Extract as out-of-process service (~5 engineer-days, interface preserved)

EXPERT-only · cost model

Read the per-query story, not just the latency one.

Module 04 ships a runnable cost-model CSV inside the starter-kit zip at docs/cost-model/. 5-tenant reference load (10k qpd · 300k req/mo), real OpenAI + Pinecone serverless + AWS list prices, with the model-cascade and 20%-cache-hit levers wired up. The version you defend to a CFO. Preview the CSV →

ComponentBaseline / moOptimized / moDelta

OpenAI GPT-4o (premium tier)

100% baseline · 20% optimized via router · 2K in / 300 out / req

$2,400

$384

−$2,016

OpenAI GPT-4o-mini (mini tier)

80% optimized via router · ~17× cheaper than premium

—

$92

—

OpenAI text-embedding-3-small

query embeddings + amortised ingest

—

Pinecone serverless · HNSW

2M vectors · 5 namespaces (one per tenant)

$70

—

ElastiCache Redis (cache.t4g.small + replica)

response cache + BM25 cache + rate-limit fallback

$54

—

Reranker compute (cross-encoder, self-hosted)

ms-marco-MiniLM on t4g.medium · ~150ms p95

$30

—

Total · 5 tenants · 300k req/mo

$0.0085 per req baseline → $0.0021 optimized

$2,559

$635

−$1,924 (−75%)

Optimization levers

Model cascade (gpt-4o-mini default + gpt-4o escalation)

ADR-004. LLM gateway routes 80% of traffic to gpt-4o-mini; only escalates the 20% that need premium reasoning. Largest single lever.

−$1,901 / mo (alone)

Semantic response cache (1h TTL)

ADR-004 again — gateway hashes prompt + tenant + retrieved-chunks fingerprint. 20% hit rate at the reference workload. Stacks with cascade.

−$480 / mo (alone)

Cross-encoder reranking → tighter prompts

ADR-002. Top-50 → top-10 means the LLM sees only highest-precision chunks, reducing average input ~5%.

−$30 / mo (additional)

EXPERT benefit · cohort beta

Async architecture review with a staff-level reviewer (cohort beta).

Submit your repo, your ADR draft, or your RAGAS canary numbers. A staff or principal-level reviewer who has shipped this exact stack responds within 7 days with line-by-line comments. Cohort capped at 12 members.

Bring a diff, an ADR draft, or a RAGAS regression report.

The cohort beta runs as async architecture review — pick a reviewer by topic, send the artifact, get inline comments + a Loom walkthrough back. No back-and-forth scheduling. No 30-minute slot pressure.

Priya R.

Ex-staff · retrieval platform · top-3 search co.

Hybrid retrieval, reranker tuning, chunking strategy A/B, eval-driven retrieval design

“Send the diff. I'll go line-by-line through your RRF fusion and your reranker score floor and pick out the cases where lexical and semantic disagree.”

Jamal K.

Principal · AI infra · public Series-D

LLM gateway design, per-tenant cost attribution, fallback chain tradeoffs, cache strategy

“Send your CSV and your gateway code. We'll walk it backwards from the totals row to the assumption that breaks first when traffic 3×s.”

Rachel W.

Eng manager · AI · public mid-cap

Org design for retrieval teams, RAGAS regression gates, hiring rubrics, staff promo packets

“If you're prepping for staff, send your ADR-005 + your chunking A/B writeup. We'll work backwards from the rubric.”

Format

Async

Turnaround

7 days

Cohort

12 members

Scope

ADR + RAGAS review

Request a slot →

What your tier unlocks

PRO unlocks Module 01. EXPERT unlocks the full retrieval stack.

PRO is the entry point — Module 01 (multi-format parsers + 4-strategy chunker + upload API + uploader UI) plus the rest of the PRO catalog. EXPERT unlocks Modules 02–05 of this build, the 5 ADRs, the cost-model CSV, and the cohort-beta async architecture review.

What you getFREEPROEXPERT

Module 01 of P06

Document Ingestion + ChunkingFactory (~3h)

—

Included

Modules 02–05 of P06

Retrieval / Production / Platform (~13.5h)

—

Included

5 committed ADRs + cost-model CSV

Starter kit docs/adr/ + docs/cost-model/

—

Included

PRO project catalog

Production-grade builds

All current

All current + this one

Curriculum

All 7 tracks

Phase 1 only

All

All + bonus modules

Code review

Senior+ reviewers

—

4 / month

Unlimited

Cohort-beta architecture review

Async · 7-day turnaround · 12-member cap

—

Included

Certificate

Verifiable on LinkedIn

—

Yes

Yes + LinkedIn rec

$79/mo

billed monthly · open enrollment · cancel anytime

or annual

$699/yr save 26%

Unlock EXPERT →

Who this is for

Pick this if you own retrieval quality, not just the prompt.

AI engineers shipping RAG to staff/principal

You debug retrieval failures, you tune reranker thresholds, and your manager asks 'what's our hit-rate@10?' on review day. The chunking A/B + RAGAS harness is the answer.

ML platform leads building retrieval infra

You absorb retrieval as a platform service. Pinecone + BM25 + reranker + LLM gateway as a single library that the rest of the org calls — that's the playbook.

Founding engineers · AI startups

Your first paying customer will ask 'why did the bot say that?' before they ask about scale. Citations + explainability + RAGAS faithfulness is the answer in one repo.

Engineering managers · AI

You need a relevance-review rubric for the AI roadmap your VP will ask about. The 5 ADRs (one Deprecated) are exactly the artifacts a panel actually opens.

Related curriculum

Going deeper? Four tracks back this project.

The RAG curriculum is the foundation for the project. These four tracks let you go deeper on the parts that matter most for your role.

FAQ · EXPERT tier

Quick answers.

How is this different from P30 enterprise-ai-platform?+

Different lanes. P06 owns retrieval *quality* — hybrid BM25 + dense + RRF, cross-encoder reranking, 4-strategy chunking A/B (62/78/85% accuracy), RAGAS 4-metric eval harness, HNSW vs IVF tradeoffs, query rewriting / HyDE / multi-query. P30 owns retrieval *governance* — Postgres RLS multi-tenancy, Presidio PII redaction, prompt-injection / jailbreak guardrails, GDPR data-deletion, audit logs. Pick P06 if your manager asks 'what's our hit-rate@10?' Pick P30 if your CISO asks 'what about the audit log?'

How is this different from PRO?+

Module 01 (multi-format parsers + 4-strategy ChunkingFactory + upload API + React uploader) is included with PRO at $29/mo — a working ingestion pipeline you can ship a demo on. Modules 02–05 (Pinecone HNSW + hybrid retriever + cross-encoder reranker + RAGAS canary + LLM gateway + multi-tenant blueprint), plus the 5 committed ADRs, the runnable cost-model CSV, and the cohort-beta async architecture review, unlock with EXPERT at $79/mo. PRO gets you the foundation; EXPERT gets you the retrieval-quality system you'd defend in a relevance review.

Are the chunking-A/B numbers (62 / 78 / 85%) real?+

Real and reproducible on the seed corpus. 4 documents (long_handbook + short_faq + structured_pricing + noisy_release_notes) × 40 hand-built golden questions × 4 strategies = the benchmark in the kit. ADR-005 documents the reversal from fixed-size (62%) to recursive (78%) when the benchmark exposed mid-sentence cuts in the long-handbook content. Real workloads will move these numbers; the methodology is what travels.

Does this work without a real Pinecone account?+

Yes. The starter kit ships with Qdrant in docker-compose as the default vector store — Pinecone-compatible interface, no API key needed. The Pinecone code path is enabled by an env flag if you want to run against the managed service. The chunking, retrieval, rerank, and RAGAS code are vector-store-agnostic.

Can my company expense it?+

Yes — receipts and a learning-budget letter are downloadable on subscription. Many EXPERT learners are reimbursed under engineering training or AI upskilling budgets. The cost-model CSV is itself a defensible business artifact for that conversation.

Is this enough to interview for staff retrieval / RAG roles?+

It's a strong forcing function. Staff AI / ML platform interviews lean heavily on retrieval system design (chunking, hybrid retrieval, reranking, eval) and on having opinions backed by real tradeoffs. The 5 ADRs you commit (one Deprecated, with the chunking A/B receipts) are exactly the artifacts a panel asks about. Pair with the cohort-beta async review on your final repo and you have a portfolio piece that survives a staff promo packet.

Related projects

Paired with this project

P30·PAID·ai

Enterprise AI platform — multi-tenant governed RAG

EXPERT-tier governance build: pgvector + Postgres RLS, Presidio + jailbreak guardrails, lineage + policy in Redis, per-tenant cost tracker, OTel + Prometheus. 4 modules · 20-26h. Module 01 with PRO.

Explore project →

P14·PAID·ai

AI retrieval platform — pgvector + hybrid + RRF + cross-encoder

EXPERT-tier retrieval build: pgvector + HNSW, BM25 + RRF, cross-encoder reranker, OpenAI function-calling agent, semantic cache, drift detection, multi-region replication code. Modules 01-02 with PRO; full platform with EXPERT.

Explore project →

Ready to ship a retrieval-quality RAG platform?

Start with PRO ($29/mo) for Module 01 — multi-format parsers + 4-strategy ChunkingFactory. Or unlock the full 5-module retrieval stack plus 5 ADRs, the cost-model CSV, and cohort-beta architecture review with EXPERT ($79/mo).

See EXPERT benefits

P06 · Enterprise RAG · EXPERT · PRO unlocks M01Unlock EXPERT →

Build aretrieval-qualityRAG platform — that survives a relevance review

Module 01 unlocks with PRO. The retrieval stack with EXPERT.

Foundation. Retrieval quality. Platform.

One command. Local FastAPI + Qdrant + Redis (no API key needed).

What lives in the repo

Enterprise RAG Starter Kit

The same RAG demo — but built for the relevance-review case.

Write the ADRs staff retrieval engineers actually get judged on.

Hybrid retrieval (BM25 + dense) with Reciprocal Rank Fusion

Cross-encoder reranking (top-50 → top-10) precision lever

Recursive chunking default; semantic for high-value docs only

LLM Gateway with fallback chain (gpt-4o → gpt-4o-mini)

Read the per-query story, not just the latency one.

Optimization levers

Async architecture review with a staff-level reviewer (cohort beta).

Bring a diff, an ADR draft, or a RAGAS regression report.

PRO unlocks Module 01. EXPERT unlocks the full retrieval stack.

Pick this if you own retrieval quality, not just the prompt.

AI engineers shipping RAG to staff/principal

ML platform leads building retrieval infra

Founding engineers · AI startups

Engineering managers · AI

Going deeper? Four tracks back this project.

Vector Databases & Retrieval Infrastructure

LLM Evaluation

AI Inference & Serving Systems

Agentic Workflows

Quick answers.

Paired with this project

Ready to ship a retrieval-quality RAG platform?

Build a
retrieval-quality
RAG platform — that survives a relevance review