Evaluation & Trade-offs

Understanding RAG parameters, failure modes, and tuning strategies is crucial for production deployment.

Key Trade-offs

Chunk Size

How many characters per document chunk.

  • Low (200-500): high precision, but little context
  • High (1000-2000): more context, but may include irrelevant info
  • Recommendation: start with 1000 characters and adjust based on document type
  • Impacts: retrieval quality, context richness, token usage

Chunk Overlap

Characters shared between adjacent chunks.

  • Low (0-50): may lose context at chunk boundaries
  • High (200-400): better boundary handling, but more redundancy
  • Recommendation: use 10-20% of the chunk size as overlap
  • Impacts: context continuity, storage size, processing time
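The chunk-size and overlap parameters above can be sketched together. `chunk_text` is a hypothetical helper for illustration, not RagXGen's actual implementation:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks whose edges overlap by `overlap`
    characters (the recommended 10-20% of chunk_size)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window already reached the end of the text
    return chunks

# 2500 characters with size 1000 / overlap 200 -> windows at 0, 800, 1600
parts = chunk_text("x" * 2500, chunk_size=1000, overlap=200)
print(len(parts))  # → 3
```

A larger overlap keeps sentences that straddle a boundary intact in at least one chunk, at the cost of storing (and embedding) the shared characters twice.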

Top-K Retrieval

Number of chunks to retrieve per query.

  • Low (1-2): focused, but may miss relevant info
  • High (6-10): more comprehensive, but noisier
  • Recommendation: default to 4 and adjust based on document complexity
  • Impacts: answer completeness, latency, cost
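Top-K selection itself is simple once embeddings exist. A minimal sketch, assuming query and chunk vectors have already been computed by some embedding model (the function names here are illustrative, not RagXGen's API):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 2-d embeddings: chunks 0 and 2 point roughly the same way as the query.
print(top_k([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]], k=2))  # → [0, 2]
```

Raising `k` adds more candidates to the prompt, which directly increases latency and token cost, so the completeness/noise trade-off above is also a cost trade-off.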

RAG vs Non-RAG: Real Examples

Scenario: Knowledge Freshness

Question: "What is the latest feature released in Q4 2024?"

Without RAG

"I don't have information about events after my training cutoff in early 2024. Please check the official documentation or announcements."

  • No access to recent data
  • Cannot help with current info

With RAG

"According to the Q4 2024 release notes, the latest feature is the AI-powered search enhancement, launched on December 15th. This includes semantic search and auto-complete functionality. (Source: release-notes-q4.pdf)"

  • Access to current data
  • Cites specific sources
  • Provides accurate details

Scenario: Company-Specific Information

Question: "What is our company's refund policy?"

Without RAG

"Typically, refund policies vary by company. Most companies offer 30-day return windows. Please check your specific company's policy documentation for accurate information."

  • Generic answer
  • No specific details
  • User must look elsewhere

With RAG

"According to the customer policy document, refunds are available within 14 days of purchase for digital products and 30 days for physical items. Refund requests must be submitted through the support portal. (Source: customer-policies.pdf, Chunk 12)"

  • Company-specific answer
  • Accurate policy details
  • Actionable information

Common Failure Cases

Insufficient Context

When retrieved chunks don't contain the answer

Cause: Document doesn't contain relevant information, or chunks are too small/large

Solution: Adjust chunk size, improve document coverage, or acknowledge uncertainty

Example: "What's the CEO's phone number?" → if the answer isn't in the documents, RAG can't help
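One way to "acknowledge uncertainty" is to instruct the model explicitly in the prompt. A minimal sketch of such a grounded prompt template (the template text is an assumption, not RagXGen's actual prompt):

```python
# Hypothetical grounded-prompt template: forces the model to refuse
# rather than guess when the retrieved context lacks the answer.
GROUNDED_PROMPT = """Answer the question using ONLY the context below.
If the context does not contain the answer, reply exactly:
"I could not find this in the provided documents."

Context:
{context}

Question: {question}
Answer:"""

prompt = GROUNDED_PROMPT.format(
    context="(retrieved chunks would go here)",
    question="What's the CEO's phone number?",
)
print(prompt)
```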

Wrong Document Retrieved

Similar embeddings but irrelevant content

Cause: Embedding model captures wrong semantic aspects, or documents have similar vocabulary

Solution: Use reranking, hybrid search, or better embedding models

Example: "Python snake care" → retrieves Python programming docs instead
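Hybrid search mitigates this by blending the dense (embedding) score with a sparse keyword signal, so exact terms like "snake" and "care" still count. A minimal sketch with a crude term-overlap score standing in for BM25 (function names are illustrative):

```python
def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms that appear in the document --
    a crude stand-in for a proper sparse scorer such as BM25."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

def hybrid_score(vec_sim: float, kw: float, alpha: float = 0.5) -> float:
    """Linear blend of dense (vector) and sparse (keyword) signals."""
    return alpha * vec_sim + (1 - alpha) * kw

query = "python snake care"
# Both docs may look similar in embedding space, so give them the same
# dense score (0.8) and let the keyword signal break the tie.
programming_doc = "python programming language tutorial"
reptile_doc = "ball python snake care guide"
print(hybrid_score(0.8, keyword_score(query, programming_doc)))  # lower
print(hybrid_score(0.8, keyword_score(query, reptile_doc)))      # higher
```

Reranking works similarly but applies a second, stronger model to reorder the dense-retrieval candidates instead of blending scores.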

Hallucination Despite Context

LLM generates incorrect info even with good context

Cause: LLM may still rely on training data or misinterpret context

Solution: Lower temperature, use structured prompts, add verification steps

Example: the context says "up to 30 days" but the LLM answers "exactly 30 days"
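Beyond lowering temperature, a cheap verification step is to check that specifics in the answer actually appear in the retrieved context. A minimal sketch that flags numbers the context never mentions (a hypothetical post-processing check, not part of RagXGen):

```python
import re

def unsupported_numbers(answer: str, context: str) -> list[str]:
    """Return numbers that appear in the answer but nowhere in the
    retrieved context -- likely hallucinated specifics."""
    ctx_nums = set(re.findall(r"\d+", context))
    return [n for n in re.findall(r"\d+", answer) if n not in ctx_nums]

context = "Refunds are available up to 30 days after purchase."
print(unsupported_numbers("Refunds within 45 days.", context))  # → ['45']
print(unsupported_numbers("Refunds within 30 days.", context))  # → []
```

This catches invented figures but not subtler distortions (like "up to" becoming "exactly"), which still require structured prompting or a verification pass by a second model.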

Context Window Overflow

Too much retrieved context exceeds model limits

Cause: High top-K or large chunks with long queries

Solution: Limit context length, use summarization, or chunked generation

Example: 10 chunks × 1000 characters approaches the context limit once a long prompt is added
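Limiting context length can be as simple as filling a budget with the best-ranked chunks and dropping the rest. A minimal sketch using a character budget as a rough proxy for tokens (`fit_context` is a hypothetical helper):

```python
def fit_context(chunks: list[str], max_chars: int = 6000) -> list[str]:
    """Keep the highest-ranked chunks (input assumed sorted by relevance)
    until the character budget is exhausted, dropping the remainder."""
    kept: list[str] = []
    used = 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break  # next chunk would overflow the budget
        kept.append(chunk)
        used += len(chunk)
    return kept

# Ten 1000-char chunks against a 2500-char budget -> only the top 2 survive.
print(len(fit_context(["a" * 1000] * 10, max_chars=2500)))  # → 2
```

Summarization is the complementary strategy: instead of dropping low-ranked chunks, compress them so more sources fit within the same budget.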

Ready to see how this was built?

Explore the design decisions and technical choices behind RagXGen.