RAG Enterprise Project
Step-by-Step Walkthrough: Build a Production RAG System
Total Time: ~2 hours
Difficulty: Advanced
Tools: LangChain, ChromaDB, OpenAI
What You'll Build
In this walkthrough, you'll build a production RAG (Retrieval-Augmented Generation) system that enhances LLM responses with your own documents:
- Parse and chunk documents (PDF, TXT, Markdown)
- Generate embeddings and store in vector database
- Implement semantic search with similarity scoring
- Build question-answering with retrieved context
- Test retrieval quality and answer accuracy
Prerequisites
Python 3.8+ installed
OpenAI API key (or other LLM provider)
Understanding of embeddings and vector search
Basic knowledge of LLMs and prompting
1
Set Up RAG Environment
25 min
1.1 Create Project Structure
# Create project directory
mkdir rag-enterprise
cd rag-enterprise
# Create subdirectories
mkdir -p documents/raw documents/processed src vectorstore
1.2 Install RAG Stack
# Create virtual environment
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install langchain==0.1.0 chromadb==0.4.18 \
    openai==1.6.1 tiktoken==0.5.2 \
    pypdf==3.17.4 python-dotenv==1.0.0
# Save requirements
pip freeze > requirements.txt
1.3 Configure API Keys
# Create .env file
cat > .env <<EOF
OPENAI_API_KEY=your-api-key-here
CHROMA_PERSIST_DIRECTORY=./vectorstore
EMBEDDING_MODEL=text-embedding-3-small
LLM_MODEL=gpt-3.5-turbo
EOF
API Key Security
Never commit .env files to Git! Add .env to your .gitignore file immediately.
1.4 Create Sample Documents
Add a sample document to test with:
# Create sample document
cat > documents/raw/company_policy.txt <<EOF
Company Remote Work Policy
1. Eligibility: All full-time employees are eligible for remote work
after 90 days of employment.
2. Schedule: Remote employees must maintain core hours of 10am-3pm EST
for team collaboration.
3. Equipment: Company provides laptop, monitor, and $500 home office
stipend for eligible employees.
4. Communication: Daily standup at 10am via Zoom. Slack response
within 2 hours during core hours.
5. Performance: Quarterly reviews assess output quality and
collaboration effectiveness, not hours worked.
EOF
1.5 Test OpenAI Connection
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Test embedding generation
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello, world!"
)
print(f"Embedding dimension: {len(response.data[0].embedding)}")
Expected Output
Embedding dimension: 1536
2
Build Document Parser and Chunker
40 min
2.1 Create Document Loader
# src/document_loader.py
from langchain.document_loaders import TextLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from pathlib import Path
def load_documents(directory):
    """Load all documents from a directory."""
    documents = []
    path = Path(directory)
    for file in path.glob("**/*"):
        if not file.is_file():
            continue
        if file.suffix == ".txt":
            loader = TextLoader(str(file))
        elif file.suffix == ".pdf":
            loader = PyPDFLoader(str(file))
        else:
            continue
        documents.extend(loader.load())
    print(f"Loaded {len(documents)} documents")
    return documents
2.2 Implement Smart Chunking
Chunk documents with overlap for better retrieval:
def chunk_documents(documents, chunk_size=500, chunk_overlap=50):
    """Split documents into chunks with overlap."""
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len,
        separators=["\n\n", "\n", " ", ""]
    )
    chunks = text_splitter.split_documents(documents)
    print(f"Created {len(chunks)} chunks")
    return chunks
Why Chunk Overlap?
Overlap ensures context isn't lost at chunk boundaries. A 50-character overlap prevents splitting important information across chunks.
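The effect of overlap can be seen without LangChain. This toy splitter is a simplified character-window sketch, not `RecursiveCharacterTextSplitter`'s actual algorithm (which prefers splitting at the separators listed above); it advances by `chunk_size - chunk_overlap` characters each step:

```python
def naive_chunk(text, chunk_size=20, chunk_overlap=5):
    """Toy splitter: fixed-size character windows, each sharing
    chunk_overlap characters with the previous window."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = naive_chunk("Remote employees must maintain core hours.")
for prev, nxt in zip(chunks, chunks[1:]):
    # Each chunk begins with the tail of the previous one
    print(repr(prev[-5:]), "==", repr(nxt[:5]))
```

With `chunk_size=500, chunk_overlap=50` as in `chunk_documents` above, the same principle applies: roughly the last 50 characters of one chunk reappear at the start of the next, modulo separator-aware splitting.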
2.3 Test Loading and Chunking
# Test the loader
docs = load_documents("documents/raw")
chunks = chunk_documents(docs)
# Inspect first chunk
print(f"\nFirst chunk:")
print(chunks[0].page_content)
print(f"\nMetadata:")
print(chunks[0].metadata)
Expected Output
Loaded 1 documents
Created 3 chunks
First chunk:
Company Remote Work Policy
1. Eligibility: All full-time employees...
3
Create Vector Store with Embeddings
30 min
3.1 Build Vector Store
# src/vector_store.py
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
import os
def create_vector_store(chunks):
    """Create vector store from chunks."""
    embeddings = OpenAIEmbeddings(
        model=os.getenv("EMBEDDING_MODEL")
    )
    # Create vector store with persistence
    vector_store = Chroma.from_documents(
        documents=chunks,
        embedding=embeddings,
        persist_directory=os.getenv("CHROMA_PERSIST_DIRECTORY")
    )
    print(f"Created vector store with {len(chunks)} chunks")
    return vector_store
3.2 Test Semantic Search
# Create vector store
vector_store = create_vector_store(chunks)
# Test similarity search
query = "What are the core hours for remote work?"
results = vector_store.similarity_search(query, k=2)
print(f"\nQuery: {query}")
print(f"\nTop {len(results)} results:")
for i, doc in enumerate(results, 1):
    print(f"\n{i}. {doc.page_content[:200]}...")
Expected Behavior
The search should return chunks mentioning "10am-3pm EST" core hours, demonstrating semantic understanding beyond keyword matching.
3.3 Search with Similarity Scores
# Search with scores
results_with_scores = vector_store.similarity_search_with_score(query, k=2)
for doc, score in results_with_scores:
    print(f"\nScore (distance): {score:.4f}")
    print(f"Content: {doc.page_content[:150]}...")
Similarity Scores
Lower scores = higher similarity (distance metric). Typical good matches have scores < 0.5. Use scores to filter low-quality retrievals.
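Assuming a cosine-distance metric (what Chroma actually returns depends on the collection's configured distance function, so treat this as a sketch), the score math and a threshold filter look like this in plain Python; the toy vectors and the 0.5 cutoff are illustrative:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0 = same direction, higher = less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def filter_by_score(results_with_scores, max_distance=0.5):
    """Drop (doc, score) pairs whose distance exceeds the threshold."""
    return [(doc, s) for doc, s in results_with_scores if s < max_distance]

# Toy 3-dimensional "embeddings" (real ones are 1536-dimensional)
query = [0.9, 0.1, 0.0]
on_topic = [0.8, 0.2, 0.1]
off_topic = [0.0, 0.1, 0.9]
print(cosine_distance(query, on_topic))   # small: good match
print(cosine_distance(query, off_topic))  # near 1.0: poor match
```

In practice you would pass the output of `similarity_search_with_score` through a filter like `filter_by_score` before handing chunks to the LLM.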
4
Test Retrieval and Generation
25 min
4.1 Create RAG Chain
# src/rag_chain.py
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
import os

# Custom prompt template
template = """Use the following context to answer the question.
If you cannot answer from the context, say "I don't have that information."
Context: {context}
Question: {question}
Answer:"""

prompt = PromptTemplate(
    template=template,
    input_variables=["context", "question"]
)

# Create QA chain (vector_store comes from Step 3)
llm = ChatOpenAI(model_name=os.getenv("LLM_MODEL"), temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_store.as_retriever(search_kwargs={"k": 3}),
    chain_type_kwargs={"prompt": prompt}
)
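The `chain_type="stuff"` setting means retrieved chunks are simply concatenated ("stuffed") into the prompt's `{context}` slot. A rough pure-Python sketch of that assembly, using the same template; the chunk texts here are illustrative:

```python
TEMPLATE = """Use the following context to answer the question.
If you cannot answer from the context, say "I don't have that information."
Context: {context}
Question: {question}
Answer:"""

def build_stuff_prompt(chunk_texts, question):
    """Join retrieved chunk texts and fill the prompt template."""
    context = "\n\n".join(chunk_texts)
    return TEMPLATE.format(context=context, question=question)

prompt_text = build_stuff_prompt(
    ["2. Schedule: Remote employees must maintain core hours of 10am-3pm EST",
     "3. Equipment: Company provides laptop, monitor, and stipend"],
    "What are the core hours for remote employees?",
)
print(prompt_text)
```

Because all k=3 chunks go into a single prompt, "stuff" is the simplest chain type but can overflow the context window when chunks are many or large; LangChain's "map_reduce" and "refine" chain types exist for that case.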
4.2 Ask Questions
# Test questions
questions = [
"What equipment does the company provide for remote work?",
"How long must employees work before being eligible for remote work?",
"What are the core hours for remote employees?"
]
for question in questions:
    result = qa_chain({"query": question})
    print(f"\nQ: {question}")
    print(f"A: {result['result']}")
Expected Answers
1. "Laptop, monitor, and $500 home office stipend"
2. "90 days of employment"
3. "10am-3pm EST"
4.3 Evaluate Answer Quality
# Test with out-of-context question
question = "What is the company's vacation policy?"
result = qa_chain({"query": question})
print(f"\nQ: {question}")
print(f"A: {result['result']}")
Quality Check
The system should respond "I don't have that information" for questions not covered in the documents. This prevents hallucination!
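This check can be automated by scanning answers for the refusal phrase from the prompt template. A minimal sketch, where `SAMPLE_ANSWERS` stands in for real `qa_chain` output:

```python
REFUSAL = "I don't have that information"

def is_refusal(answer):
    """True if the model declined to answer instead of guessing."""
    return REFUSAL.lower() in answer.lower()

# Stand-in answers; in practice, collect these from qa_chain
SAMPLE_ANSWERS = {
    "What is the company's vacation policy?": "I don't have that information.",
    "What are the core hours?": "Core hours are 10am-3pm EST.",
}
for question, answer in SAMPLE_ANSWERS.items():
    status = "refused (good)" if is_refusal(answer) else "answered"
    print(f"{question} -> {status}")
```

A substring check like this is crude (it misses paraphrased refusals), but it is a cheap first regression test for out-of-context questions.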
Troubleshooting
- Poor retrieval: Adjust chunk_size and chunk_overlap parameters
- Hallucinations: Strengthen prompt instructions to stick to context
- API errors: Check OpenAI API key and rate limits
See the RAG Troubleshooting Guide for more solutions.
Walkthrough Complete!
You've built a production RAG system with document parsing, embeddings, vector search, and LLM-powered question answering. You're ready for Part 2!
What You've Learned:
Document loading and parsing
Smart chunking with overlap
Embedding generation with OpenAI
Vector store with ChromaDB
Semantic search and similarity scoring
RAG chain with custom prompts
Question answering with context
Hallucination prevention techniques