Build a
production retrieval
platform with pgvector + RRF
Ship a real retrieval platform with pgvector + HNSW, BM25 + GIN, Reciprocal Rank Fusion, a cross-encoder reranker, hash-based incremental updates, an OpenAI function-calling agent, Redis semantic cache, and an SLA verifier that runs 1k queries with p50/p95/p99 latency and recall@10 checks. Modules 01-02 unlock with PRO; the full platform with EXPERT.
The retrieval system-design portfolio piece for staff AI infra roles — 5 committed ADRs, a runnable cost-model CSV, a multi- region replication design, and a runbook with 6 P1/P2 failure scenarios you can defend in an architecture review.
- pgvector + HNSW index with M=16, ef_construction=64, ef_search tuned via a recall-vs-latency sweep
- Hybrid retriever with BM25 + RRF (k=60) + cross-encoder reranker hitting ~0.81 recall@10 on 50 golden queries
- Hash-based incremental embedding pipeline (~1% daily churn vs 100% nightly) with versioned model migration
- OpenAI function-calling agent with short-term + long-term memory + cosine-similarity context retrieval
- Prometheus + Grafana observability with semantic cache (50%+ hit-rate), A/B router, drift diagnosis
- 5 ADRs (one Deprecated) committed alongside the code, plus a runnable cost-model CSV
Modules 01-02 unlock with PRO. Modules 03-05 with EXPERT.
Modules 01-02 (~6h) ship a working hybrid retriever with BM25 + RRF + cross-encoder hitting ~0.81 recall@10 on the bundled 50 golden queries — included with PRO. Modules 03-05 (~10h additional) layer on the scale, observability, and production-platform story and unlock with EXPERT.
Embed. Retrieve. Scale.
Each phase ends with a tagged release, a passing eval suite, and a runbook drill. No ambiguity about where you are.
Working hybrid retriever with BM25 + RRF + cross-encoder hitting ~0.81 recall@10 on 50 golden queries. HNSW tuned, eval harness green.
- ✓pgvector + HNSW index (recall-vs-latency tuned)
- ✓Hybrid /search endpoint with RRF + reranker
- ✓Eval harness · recall@10 + MRR + nDCG
1M-capable pipeline with hash-based incremental + versioned embeddings + OpenAI agent. Prometheus + Grafana + semantic cache + drift diagnosis + SLA verifier all green.
- ✓Incremental embedding · resume-on-failure
- ✓Agent with function calling + memory
- ✓Semantic cache · A/B router · drift diagnosis
Multi-region replication code, GDPR + PII RBAC, sharding, DESIGN.md, runbook YAML with 6 P1/P2 failure scenarios.
- ✓Multi-region async replication + lag monitoring
- ✓GDPR cascade delete + compliance report
- ✓Runbook YAML · 6 scenarios · P1/P2
One command. Local pgvector + Redis + FastAPI + cross-encoder.
What lives in the repo
You get the real platform on day one — pgvector + HNSW + GIN indexes inside Postgres, a FastAPI service for semantic + hybrid + agent endpoints, Redis for the semantic cache, sentence-transformers cross-encoder for reranking, and Prometheus + Grafana for the dashboards.
- seed/ + migrations/ — 5-table schema, BM25 GENERATED column, HNSW tune migrations
- api/ + scripts/ — FastAPI endpoints, RRF fuser, reranker, ingest, eval, HNSW benchmark
- embedding_pipeline.py + incremental_manager.py + embedding_version.py — async batch + hash-incremental + zero-DT model migration
- retrieval_agent.py + agent_memory.py — OpenAI function calling + short/long-term memory
- metrics.py + semantic_cache.py + ab_router.py + drift_diagnosis.py + sla_verifier.py — Prometheus instrumentation + cache + A/B + drift + SLA
- infra/ + runbook/ + DESIGN.md — multi-region + GDPR + PII RBAC + sharding + 6-scenario runbook YAML
- docs/adr/ + docs/cost-model/ — 5 ADRs (one Deprecated) + the runnable cost-model CSV
AI Retrieval Platform Starter Kit
Pre-built retrieval platform: 5 tutorial modules of source, Docker compose (pgvector + Redis), 5K seeded docs, 50 golden queries, 2K labeled pairs, 1K drift corpus, 7 pytest gates. Now bundled: 5 committed ADR markdown files (docs/adr/) and the runnable cost-model CSV (docs/cost-model/) — unzip and read them straight from the repo.
The same RAG retrieval — but built for the production case.
Most retrieval tutorials show you a flat NumPy scan against a pickled index. This shows what changes when 4 tenants share infrastructure, on-call owns the recall dashboard, and finance asks for cost-per-1k-queries.
pgvector HNSW with M / ef_search tuned + recall benchmarkRRF + cross-encoder reranker (+23% over BM25-only)incremental_manager.py, ~1%/day)agent_memory short/long-termsla_verifier (p50/p95/p99 + recall + cost)Write the ADRs staff engineers actually get judged on.
Five ADRs ship inside the starter-kit zip at docs/adr/, one per major decision in the build, including a real Deprecated ADR documenting the single-fixed-embedding-model reversal. The kind of doc that travels with you to your next role. Preview ADR-001 →
pgvector + HNSW over Qdrant / Pinecone / Weaviate
CREATE EXTENSION vector + HNSW(M=16, ef_construction=64); Qdrant kept as alternativeHybrid retrieval is RRF, not score averaging or learned-sparse
1.0 / (k + rank), k=60 — rank-only fusion, score-freeReranker is a CPU cross-encoder, not LLM-as-judge
cross-encoder/ms-marco-MiniLM-L-6-v2, batch 32 on CPU, <50ms top-50→top-10Index updates use hash-based incremental, not nightly full re-embed
compute_hash(content) SHA-256 · re-embed only when hash changed (~1%/day)Single fixed embedding model assumption
embedding_version + embedding_status + zero-downtime re-embedRead the FinOps story for the platform you actually ship.
Module 04 ships a runnable cost-model CSV inside the starter-kit zip at docs/cost-model/. 1-tenant beta load (~1M queries/mo, 1M-vector corpus), real AWS RDS + EC2 + ElastiCache + OpenAI list prices, with the 1-yr Reserved Instance, semantic-cache hit-rate, and hash-based incremental embedding levers wired up. Preview the CSV →
Optimization levers
Async architecture review with a staff-level reviewer (cohort beta).
Submit your repo, your ADR draft, or your recall-vs-latency benchmark. A staff or principal-level reviewer who has shipped this exact stack at scale responds within 7 days with line-by-line comments + a Loom walkthrough. Cohort capped at 12 members.
Bring a diff, an ADR draft, or a recall benchmark.
The cohort beta runs as async architecture review — pick a reviewer by topic, send the artifact, get inline comments + a Loom walkthrough back. No back-and-forth scheduling. No 30-minute slot pressure.
PRO unlocks Modules 01-02. EXPERT unlocks the full platform.
PRO is the entry point — Modules 01-02 plus the rest of the PRO catalog. EXPERT unlocks Modules 03-05 of this build, the 5 ADRs, the cost-model CSV, and the cohort-beta async review.
Pick this if you own the recall dashboard, not just a query.
Senior retrieval engineers
You've shipped semantic search. Now you own the eval harness, the recall dashboard, the embedding-version migration plan, and the architecture review with platform.
AI infra engineers
You absorb new RAG features without absorbing new vendors. pgvector + Redis + FastAPI + Prometheus — tools your platform team already operates.
Engineering managers · search / RAG
You need a reference architecture for the retrieval-quality + cost questions your CTO will ask before the AI team gets headcount or a model-serving budget.
Founding engineers · AI startups
Your investors will ask about retrieval quality and unit economics before they ask about scale. The 5 ADRs + cost model + recall benchmark is the answer.
Going deeper? Four tracks back this project.
The Vector Databases curriculum is the foundation. These four tracks let you go deeper on the parts that matter most for your role.
Quick answers.
Paired with this project
EXPERT-tier retrieval-quality RAG: 4-strategy chunking A/B (62/78/85%), hybrid BM25 + dense + RRF, cross-encoder reranker, RAGAS 4-metric canary, LLM gateway with fallback. 5 ADRs + cost-model CSV bundled.
EXPERT-tier full-stack RAG: pgvector + HNSW + hybrid retrieval + RRF + cross-encoder rerank, 4-class query router with confidence threshold, 3-level failure cascade (RAG → LLM-only → cached), per-tenant index isolation, eval gates, cost guardrails, 6-mode incident simulator, 5 committed ADRs (one Deprecated), runnable cost-model CSV. 6 modules · 20-22h. Modules 01-03 with PRO.
Ready to ship a real retrieval platform?
Start with PRO ($29/mo) for Modules 01-02 — pgvector + hybrid retrieval + reranker + eval harness. Or unlock the full 5-module platform plus 5 ADRs, the cost-model CSV, and cohort-beta architecture review with EXPERT ($79/mo).