Production Decisions

Architecture Decision Records

Every EXPERT project ships 5 ADRs that document the real engineering tradeoffs behind it: chunking strategies, retrieval fusion, caching tiers, exactly-once delivery, judge cascades. Each project includes one Deprecated decision, with the receipts for why it was reverted. 50 ADRs across 10 projects.

Enterprise RAG

ADR-001 · Hybrid retrieval (BM25 + dense) with Reciprocal Rank Fusion
Accepted
A RAG system's retrieval layer is the single largest determinant of answer quality. Module 02 ships a Pinecone HNSW index over text-embedding-3-small (1536-dim, cosine). On the seed corpus (4 document
ADR-002 · Cross-encoder reranking on top-K (precision lever vs latency cost)
Accepted
The hybrid retriever (ADR-001) returns the fused top-50 chunks. The LLM's context window can hold maybe top-10 of those without saturating cost or diluting attention. The question is: which 10?
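A minimal sketch of the two stages these ADRs describe, fusing BM25 and dense rankings with Reciprocal Rank Fusion and then cutting the fused list down with a reranker. The function names, the constant `k=60` (the value commonly used for RRF), the chunk ids, and the toy scorer are all assumptions for illustration. The project's actual retriever and cross-encoder are not shown in this listing.

```python
from collections import defaultdict
from typing import Callable, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60, top_n: int = 50
) -> List[str]:
    """Fuse ranked lists of chunk ids: score(d) = sum of 1 / (k + rank_i(d))."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk id for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [chunk_id for chunk_id, _ in fused[:top_n]]


def rerank(
    query: str,
    chunk_ids: Sequence[str],
    score: Callable[[str, str], float],
    top_k: int = 10,
) -> List[str]:
    """Keep the top_k chunks by a pairwise (query, chunk) relevance score.

    `score` stands in for a cross-encoder; any callable with this shape works.
    """
    ranked = sorted(chunk_ids, key=lambda cid: score(query, cid), reverse=True)
    return ranked[:top_k]


# Hypothetical result lists from the two retrievers.
bm25_hits = ["c3", "c1", "c7"]
dense_hits = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])

# Toy relevance scores in place of a real cross-encoder.
toy_scores = {"c1": 0.2, "c3": 0.9, "c9": 0.5, "c7": 0.1}
top = rerank("example query", fused, lambda q, cid: toy_scores[cid], top_k=2)
```

Because RRF uses only ranks, it needs no score normalization between BM25 and cosine similarity. The reranker then spends its per-pair cost on the fused candidates only, rather than on the whole corpus.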
ADR-003 · Recursive chunking as default; semantic for high-value docs
Accepted
Chunking is the most consequential decision in a RAG system. The chunk you emit at ingest time is the chunk the LLM sees at query time — there is no recovery from a bad cut. We benchmarked 4 strategie
ADR-004 · LLM gateway with fallback chain (gpt-4o → gpt-4o-mini)
Accepted
By Module 04 every API handler in the codebase calls the LLM directly via OpenAIClient or AnthropicClient (Module 03's multi-provider client). That works locally but fails three real production needs:
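The fallback chain named in the ADR-004 title can be sketched roughly as follows. This is an illustration under stated assumptions, not the gateway itself: `complete_with_fallback`, `AllModelsFailed`, and the `call_model` callable are hypothetical names, and the real gateway's client interface is not shown in this listing.

```python
from typing import Callable, List, Sequence, Tuple


class AllModelsFailed(RuntimeError):
    """Raised when every model in the chain has failed."""


def complete_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in order; return (model_used, completion) on first success."""
    errors: List[str] = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real gateway would catch narrower error types
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed("; ".join(errors))


def fake_call(model: str, prompt: str) -> str:
    """Stand-in provider call that simulates the primary model being unavailable."""
    if model == "gpt-4o":
        raise TimeoutError("simulated timeout")
    return f"ok from {model}"


result = complete_with_fallback("hello", ["gpt-4o", "gpt-4o-mini"], fake_call)
```

Routing every handler through one function like this puts the retry and fallback policy in a single place instead of in each API handler.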
ADR-005 · Fixed-size chunking (DEPRECATED)
Deprecated
When the M01 ingestion pipeline first shipped, the chunker was a single fixed-size strategy: split every document into 1000-character windows with 200-character overlap. The rationale was reasonable o
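For reference, the fixed-size strategy described above (1000-character windows with 200-character overlap) can be sketched in a few lines. The function name and edge-case handling are assumptions. The original M01 chunker is not shown in this listing.

```python
from typing import List


def fixed_size_chunks(text: str, size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of `size` characters, each sharing `overlap`
    characters with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap  # each window starts 800 characters after the last
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# A 2500-character sample document made of repeating digits.
sample = "".join(str(i % 10) for i in range(2500))
chunks = fixed_size_chunks(sample)
```

The cut points depend only on character offsets, so a window can end in the middle of a sentence or a table row.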

AI Cost Optimization

AI Serving Platform

Agentic Data Pipeline

AI Retrieval Platform

Full-Stack AI Platform

LLM Evaluation Framework

LLM Ingestion Pipeline

PredictFlow Feature Store

Enterprise AI Platform
