
RAG Systems in Production

How Notion and Intercom built context-aware AI with Retrieval-Augmented Generation

Why These Case Studies Matter

RAG (Retrieval-Augmented Generation) has become the dominant pattern for building production LLM applications. Instead of fine-tuning models on your data, RAG retrieves relevant context and includes it in the prompt. This approach is faster, cheaper, and more maintainable.
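The core loop is simple: embed the query, rank stored documents by similarity, and splice the top matches into the prompt. The sketch below illustrates that flow with a toy term-frequency embedding and cosine similarity; a production system like those described here would use a learned embedding model and a vector database instead, and all names are illustrative.

```python
import math
from collections import Counter

# Toy embedding: term-frequency vectors. Production systems use
# learned embedding models (e.g., 1536-dim OpenAI embeddings).
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank all documents by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    # Splice retrieved context into the prompt sent to the LLM.
    context = "\n---\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Notion AI searches your workspace pages before answering.",
    "Fine-tuning bakes knowledge into model weights.",
    "RAG retrieves relevant context at query time.",
]
print(build_prompt("How does RAG get its context?", docs))
```

Because the model's knowledge lives in the document store rather than its weights, updating the system is an index refresh, not a retraining run.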

These case studies reveal how Notion and Intercom built RAG systems that serve millions of users daily. You'll learn retrieval strategies, chunking techniques, permission handling, and production considerations that apply to any RAG application.
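Chunking is the first of those techniques: documents are split into retrieval-sized pieces before indexing. A minimal sketch of fixed-size chunking with overlap is below; the sizes are illustrative, and real pipelines often chunk by tokens or document structure (headings, blocks) rather than raw characters.

```python
def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    The overlap keeps a sentence that straddles a chunk boundary
    retrievable from at least one chunk. Sizes are illustrative.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks
```

For example, a 500-character document with `size=200, overlap=50` yields three chunks, each sharing its last 50 characters with the next chunk's first 50.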

Learning Path: After reading these case studies, build your own RAG system with the RAG Enterprise Project, then follow the step-by-step walkthrough.

Note on Metrics: These case studies are based on publicly available information from engineering blogs, conference talks, and open-source documentation. While we've verified core architectural patterns and technologies, some specific numbers (especially cost figures and exact scale metrics) are estimates for educational purposes. Where possible, we've updated unverified claims to reflect documented information or general ranges.

Featured Case Studies

Deep dives into Notion AI and Intercom's RAG architectures

Notion AI

Case Study #1


The Problem

Users wanted AI that understood their workspace context (documents, databases, wikis), not generic responses. The system had to search across millions of pages in real time while enforcing permission boundaries and generating contextually relevant answers.
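The permission-boundary requirement is usually met by filtering candidates by access control *before* ranking, so an AI answer can never surface a page the asking user cannot open. A minimal sketch of that pattern (not Notion's documented implementation; the `Page` structure and ACL model are assumptions) looks like this:

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    title: str
    text: str
    allowed_users: set[str] = field(default_factory=set)  # simplistic ACL

def visible_pages(pages: list[Page], user: str) -> list[Page]:
    # Filter by permissions BEFORE retrieval/ranking, so restricted
    # content never enters the candidate set or the LLM prompt.
    return [p for p in pages if user in p.allowed_users]

pages = [
    Page("Public roadmap", "Q3 plans ...", {"alice", "bob"}),
    Page("Salary bands", "Confidential ...", {"alice"}),
]
candidates = visible_pages(pages, "bob")  # only "Public roadmap"
```

Filtering pre-retrieval (rather than post-generation) is the safer design: a leak cannot be prompted out of context that was never retrieved.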

Scale

- Active Users: 30 million+
- Documents Indexed: 1 billion+
- AI Queries/Day: 10 million+
- Vector Dimensions: 1536 (OpenAI)
- Latency SLA: <2 seconds
- Accuracy Target: >90% relevance

Intercom

Case Study #2


The Problem

Support teams needed AI that could answer customer questions using knowledge base articles, past conversations, and product documentation. The system required real-time retrieval across 100M+ messages while maintaining conversation context and handling multi-turn dialogues.
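Multi-turn dialogue complicates retrieval because follow-ups like "how do I cancel it?" are meaningless in isolation. A common fix is to rewrite the follow-up into a standalone query before embedding it. The sketch below just concatenates recent user turns; production systems (Intercom's included, presumably) typically use an LLM rewrite step instead, so treat this as an assumption-laden simplification:

```python
def standalone_query(history: list[tuple[str, str]], followup: str) -> str:
    """Fold recent user turns into the retrieval query so that
    pronoun-heavy follow-ups still retrieve the right articles.

    `history` is a list of (role, message) pairs. Only the last few
    user turns are kept, to bound query length.
    """
    recent = [msg for role, msg in history[-4:] if role == "user"]
    return " ".join(recent + [followup])

history = [
    ("user", "I upgraded to the Pro plan"),
    ("assistant", "Great, your Pro plan is active."),
]
query = standalone_query(history, "how do I cancel it?")
# The combined query now mentions "Pro plan", so retrieval can find
# the cancellation article for that plan.
```

The rewritten query, not the raw follow-up, is what gets embedded and sent to the vector store.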

Scale

- Messages/Day: 100 million+
- Knowledge Articles: 500,000+
- Conversations Indexed: 1 billion+
- Languages Supported: 45+
- Response Time: <3 seconds
- Resolution Rate: 60% automated