Evaluation & Trade-offs
Understanding RAG parameters, failure modes, and tuning strategies is crucial for production deployment.
Key Trade-offs
Chunk Size
How many characters per document chunk
- Low value (200-500): high precision, but little context per chunk
- High value (1000-2000): more context, but may include irrelevant information
Recommendation: Start with 1000 chars, adjust based on document type
Impacts
- Retrieval quality
- Context richness
- Token usage
Chunk Overlap
Characters shared between adjacent chunks
- Low value (0-50): may lose context at chunk boundaries
- High value (200-400): better boundary handling, but more redundancy
Recommendation: Use 10-20% of chunk size for overlap
Impacts
- Context continuity
- Storage size
- Processing time
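The two parameters above interact: overlap should be a fraction of chunk size. A minimal sketch of fixed-size chunking with overlap, assuming plain character counts (the function name and defaults are illustrative, not part of RagXGen):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 150) -> list[str]:
    """Split text into fixed-size chunks, repeating `overlap` characters
    between adjacent chunks so content at boundaries keeps its context."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # how far the window advances each iteration
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# 150 overlap on a 1000-char chunk follows the 10-20% recommendation above
docs = chunk_text("x" * 2500, chunk_size=1000, overlap=150)
```

Each chunk's last 150 characters are repeated as the next chunk's first 150, which is the redundancy the storage-size impact refers to.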
Top-K Retrieval
Number of chunks to retrieve per query
- Low value (1-2): focused, but may miss relevant information
- High value (6-10): more comprehensive, but noisier
Recommendation: Default to 4, adjust based on document complexity
Impacts
- Answer completeness
- Latency
- Cost
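Top-K retrieval is essentially a cosine-similarity sort over chunk embeddings. A minimal NumPy sketch (the embedding dimension and the `k=4` default mirror the recommendation above; the function name is illustrative):

```python
import numpy as np

def top_k_chunks(query_vec: np.ndarray, chunk_vecs: np.ndarray, k: int = 4) -> list[int]:
    """Return indices of the k chunks whose embeddings have the highest
    cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    sims = c @ q  # cosine similarity of every chunk against the query
    return np.argsort(sims)[::-1][:k].tolist()

# Toy example: 5 random "chunk" embeddings, retrieve the 2 closest to one of them
rng = np.random.default_rng(0)
vecs = rng.normal(size=(5, 8))
idx = top_k_chunks(vecs[3], vecs, k=2)
```

Raising `k` widens this slice of the similarity ranking, which is exactly where the completeness-versus-noise trade-off comes from.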
RAG vs Non-RAG: Real Examples
Knowledge freshness question
Question: "What is the latest feature released in Q4 2024?"
Without RAG: "I don't have information about events after my training cutoff in early 2024. Please check the official documentation or announcements."
- No access to recent data
- Cannot help with current info
With RAG: "According to the Q4 2024 release notes, the latest feature is the AI-powered search enhancement, launched on December 15th. This includes semantic search and auto-complete functionality. (Source: release-notes-q4.pdf)"
- Access to current data
- Cites specific sources
- Provides accurate details
Company-specific information
Question: "What is our company's refund policy?"
Without RAG: "Typically, refund policies vary by company. Most companies offer 30-day return windows. Please check your specific company's policy documentation for accurate information."
- Generic answer
- No specific details
- User must look elsewhere
With RAG: "According to the customer policy document, refunds are available within 14 days of purchase for digital products and 30 days for physical items. Refund requests must be submitted through the support portal. (Source: customer-policies.pdf, Chunk 12)"
- Company-specific answer
- Accurate policy details
- Actionable information
Common Failure Cases
Insufficient Context
When retrieved chunks don't contain the answer
Cause: Document doesn't contain relevant information, or chunks are too small/large
Solution: Adjust chunk size, improve document coverage, or acknowledge uncertainty
Example: "What's the CEO's phone number?" → if the answer is not in the documents, RAG can't help
Wrong Document Retrieved
Similar embeddings but irrelevant content
Cause: Embedding model captures wrong semantic aspects, or documents have similar vocabulary
Solution: Use reranking, hybrid search, or better embedding models
Example: "Python snake care" → retrieves Python programming docs instead
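The hybrid-search fix above can be sketched by fusing a vector ranking with a keyword ranking. A minimal reciprocal rank fusion example (the `k=60` constant is the commonly used RRF default; the document IDs and function name are illustrative):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs (e.g. one from vector search,
    one from keyword/BM25 search) into one ranking. A document ranked highly
    by either retriever floats toward the top of the fused list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Vector search was fooled by "Python"; keyword search found the snake doc
vector_hits = ["python-lang.md", "snake-care.md", "pip-guide.md"]
keyword_hits = ["snake-care.md", "reptile-diet.md"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Because the snake-care document appears in both rankings, its fused score is highest, correcting the embedding-only mistake.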
Hallucination Despite Context
LLM generates incorrect info even with good context
Cause: LLM may still rely on training data or misinterpret context
Solution: Lower temperature, use structured prompts, add verification steps
Example: the context says "up to 30 days" but the LLM answers "exactly 30 days"
Context Window Overflow
Too much retrieved context exceeds model limits
Cause: High top-K or large chunks with long queries
Solution: Limit context length, use summarization, or chunked generation
Example: 10 chunks × 1000 characters plus a long prompt approaches the model's context limit
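One simple guard against overflow is trimming retrieved chunks to a token budget before building the prompt. A hedged sketch, assuming the rough heuristic of ~4 characters per token for English text rather than an exact tokenizer:

```python
def fit_to_budget(chunks: list[str], max_tokens: int = 2000) -> list[str]:
    """Keep chunks in retrieval order until an approximate token budget is
    reached; ~4 characters per token is a rough English-text heuristic."""
    budget_chars = max_tokens * 4
    kept, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > budget_chars:
            break  # adding this chunk would exceed the budget
        kept.append(chunk)
        used += len(chunk)
    return kept

# 10 chunks of 1000 chars against a 2000-token (~8000-char) budget:
# only the first 8 chunks survive
kept = fit_to_budget(["x" * 1000] * 10, max_tokens=2000)
```

Dropping the lowest-ranked chunks first preserves the most relevant context; summarization, mentioned above, is the alternative when every chunk matters.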