RAG Chunking Explained: What It Is and How It Works
Chunking splits source documents into smaller text segments before embedding. Each chunk becomes a retrievable unit in the vector database. Chunk size and strategy are the single biggest lever for RAG retrieval quality: chunks that are too large dilute relevance, while chunks that are too small lose sentence context. The sweet spot for most corpora is 256–512 tokens with 10–20% overlap.
The Chunk Size Trade-off
Too small (< 128 tokens):
[chunk: "The policy"] [chunk: "states that"] [chunk: "refunds are"]
→ Fragments sentence context → low generation quality
Sweet spot (256-512 tokens):
[chunk: "The refund policy states that all purchases are eligible
for a 30-day refund if returned in original condition."]
→ Complete thought, precise retrieval ✓
Too large (> 1024 tokens):
[chunk: "Section 3. Returns and Exchanges. 3.1 General Policy...
3.2 International Orders... 3.3 Damaged Items..."]
→ Multiple topics → diluted embedding → noisy retrieval
Chunking Strategies
Fixed-Size
Fastest & Simplest
Split every N tokens with overlap. Fast, predictable, easy to tune. Can split mid-sentence. Good default for most use cases.
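A minimal sketch of the idea, using whitespace splitting as a stand-in for a real tokenizer (such as tiktoken):

```python
def fixed_size_chunks(text, chunk_size=512, overlap=64):
    """Split text into chunks of chunk_size tokens, with overlap tokens shared between neighbours."""
    tokens = text.split()  # stand-in for a real tokenizer
    step = max(1, chunk_size - overlap)
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With chunk_size=512 and overlap=64, each chunk repeats the last 64 tokens of the previous one, so a fact that straddles a boundary still appears whole in at least one chunk.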
Recursive
Best Default
Splits on paragraphs → sentences → words. Respects natural document structure. LangChain's RecursiveCharacterTextSplitter implements this.
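A simplified sketch of the recursive idea (the real RecursiveCharacterTextSplitter also merges with configurable length functions and finer fallbacks):

```python
def recursive_split(text, max_len=400, separators=("\n\n", ". ", " ")):
    """Recursively split on the coarsest separator that yields pieces under max_len."""
    if len(text) <= max_len:
        return [text]
    if not separators:
        # no separator left: hard-split at max_len
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_len:
            current = candidate
        else:
            if current:
                chunks.append(current)
            if len(piece) > max_len:
                # piece itself is too long: recurse with finer separators
                chunks.extend(recursive_split(piece, max_len, rest))
                current = ""
            else:
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Because paragraphs are tried before sentences, and sentences before words, chunks tend to end at natural boundaries instead of mid-sentence.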
Semantic
Highest Quality
Embeds sentences and splits where embedding cosine distance jumps (topic change). Slower but produces the most coherent, topically consistent chunks.
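A minimal sketch of the boundary-detection step. It assumes you have already embedded each sentence with some embedding model; the threshold value is illustrative, not a fixed API:

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (na * nb)

def semantic_chunks(sentences, embeddings, threshold=0.5):
    """Group consecutive sentences; start a new chunk when the distance between adjacent sentence embeddings jumps above threshold."""
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine_distance(embeddings[i - 1], embeddings[i]) > threshold:
            chunks.append([])  # distance jump = likely topic change
        chunks[-1].append(sentences[i])
    return [" ".join(c) for c in chunks]
```

In practice the threshold is often set from a percentile of the observed distances rather than a fixed constant.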
Strategy Comparison
| Strategy | Speed | Quality | Use when |
|---|---|---|---|
| Fixed-size | Fast | Good | Prototyping, homogeneous docs |
| Recursive character | Fast | Better | Most production cases |
| Semantic | Slow | Best | High-stakes, heterogeneous docs |
| Parent-document | Medium | Best for generation | When context window matters |
| Markdown/HTML aware | Fast | Better | Structured web/wiki content |
Parent-Document Chunking
Index small chunks for precise retrieval, but return the full parent paragraph (or page) to the LLM for generation. Gives the precision of small-chunk retrieval with the context richness of large-chunk generation.
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Small chunks for retrieval (~128 tokens; chunk_size counts characters by
# default, so use .from_tiktoken_encoder for token-based sizing)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=128)
# Large chunks returned to the LLM (~512 tokens)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=512)
retriever = ParentDocumentRetriever(
    vectorstore=vectordb,  # any initialized LangChain vector store
    docstore=InMemoryStore(),
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)
Common Mistakes
Using one chunk size for all document types
PDFs, markdown files, HTML pages, and code files all have different structure. Use a document-type-aware splitter (MarkdownTextSplitter, HTMLHeaderTextSplitter) for non-prose content.
Not using overlap
Without overlap, answers that span a chunk boundary get split across two chunks — neither contains the full answer. Use 10–20% overlap (e.g. 64 tokens overlap on a 512-token chunk).
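A toy illustration of the failure mode, assuming whitespace tokens and a chunk size of 4:

```python
tokens = "refunds are allowed within 30 days of purchase".split()

# No overlap: the fact "within 30 days" is split across the boundary,
# so neither chunk contains the full answer
no_overlap = [tokens[0:4], tokens[4:8]]

# 1-token (25%) overlap: boundary tokens appear in two chunks,
# so "within 30 days" survives intact in the middle chunk
step = 3  # chunk_size 4 - overlap 1
with_overlap = [tokens[i:i + 4] for i in range(0, len(tokens), step)]
```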
Never benchmarking chunk strategy
Most teams pick a chunk size and never measure if it is optimal. Track hit rate (was the correct chunk in top-K?) on a golden eval set. Even a 10-question eval catches obvious problems.
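Hit rate takes only a few lines to measure. Here `retrieve` is a stand-in for your retriever, and each eval item pairs a question with the id of the chunk that contains the answer (both names are illustrative):

```python
def hit_rate_at_k(eval_set, retrieve, k=5):
    """Fraction of questions whose gold chunk id appears in the top-k retrieved ids."""
    hits = 0
    for question, gold_chunk_id in eval_set:
        if gold_chunk_id in retrieve(question)[:k]:
            hits += 1
    return hits / len(eval_set)
```

Rerun this after every chunking change; a drop in hit rate tells you the new strategy is burying the right chunks before the LLM ever sees them.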
Chunking code the same as prose
Code has function and class boundaries that are semantically meaningful. Use code-aware chunking (split on function definitions, not characters) to preserve code context.
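For Python sources, one sketch is to split on top-level definitions with the standard ast module (a simplification: production splitters also handle nested defs, module-level code, and oversized functions):

```python
import ast

def chunk_python_source(source):
    """Split Python source into one chunk per top-level function/class definition."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno are 1-based, inclusive
            chunks.append("\n".join(lines[node.lineno - 1:node.end_lineno]))
    return chunks
```

Each chunk is now a complete, syntactically valid unit, so its embedding represents one function's behavior instead of two half-functions.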
FAQ
- What is chunking in RAG?
- Chunking splits source documents into smaller text segments before embedding. Each chunk becomes a retrievable unit in the vector database. Chunk size and strategy directly determine retrieval quality.
- What is the best chunk size for RAG?
- 256–512 tokens with 10–20% overlap is a good default. Smaller chunks improve precision for factual Q&A; larger chunks preserve context for complex reasoning. Always benchmark on your corpus.
- What is the difference between fixed-size and semantic chunking?
- Fixed-size chunking splits at a fixed token count — fast but can split mid-sentence. Semantic chunking detects topic boundaries using embeddings and splits at natural breaks — slower but produces more coherent chunks.
- What is parent-document chunking in RAG?
- Parent-document chunking indexes small chunks for precise retrieval but returns the larger parent segment to the LLM. This combines the precision of small-chunk retrieval with the context richness of large-chunk generation.