About RagXGen
A deep dive into the design decisions, technical choices, and lessons learned from building this RAG implementation.
Project Overview
RagXGen is a portfolio project designed to demonstrate deep understanding of Retrieval-Augmented Generation. It's not just a demo—it's a complete, production-style implementation with clear architecture and honest discussion of trade-offs.
The goal is to show potential employers, collaborators, and AI leads that I understand RAG beyond tutorials—I can build, evaluate, and improve real systems.
Tech Stack
Frontend
- Next.js 14: App Router for modern React patterns
- TypeScript: type safety and better DX
- Tailwind CSS: utility-first styling
- Framer Motion: smooth animations
Backend
- FastAPI: high-performance async Python API
- Pydantic: data validation and settings
- LangChain: RAG pipeline orchestration
- FAISS: vector similarity search
AI/ML
- OpenAI GPT-4o-mini: response generation
- text-embedding-3-small: document embeddings
- RecursiveCharacterTextSplitter: smart chunking
Design Decisions
Session-based Vector Stores
- Why: each user gets an isolated FAISS index
- Trade-off: uses more memory, but prevents data leakage between users
- Alternative: a shared store with tenant filtering (more complex)
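The isolation pattern is simple enough to sketch. Below is a pure-Python stand-in for the real per-session FAISS indices (the class name and methods are illustrative, not the project's actual API); the key point is that each session ID owns a private index, so a search can only ever see that session's documents.

```python
import math
from collections import defaultdict

class SessionVectorStore:
    """Toy per-session vector store illustrating the isolation pattern.
    (The real implementation uses one FAISS index per session; this
    stand-in uses brute-force cosine similarity so it runs anywhere.)"""

    def __init__(self):
        # Each session ID maps to its own private list of (vector, text) pairs.
        self._indices = defaultdict(list)

    def add(self, session_id, vector, text):
        self._indices[session_id].append((vector, text))

    def search(self, session_id, query, k=3):
        # Only this session's vectors are scanned, so one user's
        # documents can never surface in another user's results.
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0

        scored = [(cosine(query, v), t) for v, t in self._indices[session_id]]
        return sorted(scored, reverse=True)[:k]
```

The memory trade-off is visible here too: identical documents uploaded by two users are stored twice, once per session.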
In-memory Storage
- Why: simplicity for demo purposes
- Trade-off: data is lost on server restart
- Alternative: persist FAISS indices to disk or use a managed vector DB
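The disk-persistence alternative can be sketched as a simple save/load round trip. This toy version pickles a plain Python object; with real FAISS you would use `faiss.write_index` / `faiss.read_index`, or `save_local` / `load_local` on LangChain's FAISS wrapper.

```python
import os
import pickle
import tempfile

def save_store(store, path):
    # Serialize the store to disk so it survives server restarts.
    with open(path, "wb") as f:
        pickle.dump(store, f)

def load_store(path):
    # Rebuild the store from disk on startup.
    with open(path, "rb") as f:
        return pickle.load(f)
```

Even this minimal pattern removes the "restart wipes everything" failure mode, at the cost of managing files (or a managed vector DB) per session.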
RecursiveCharacterTextSplitter
- Why: handles a variety of content types well
- Trade-off: may not be optimal for highly structured documents
- Alternative: semantic chunking or document-specific splitters
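The core idea behind recursive splitting is worth making concrete. This minimal sketch tries the coarsest separator first (paragraphs), then falls back to finer ones only for pieces that are still too long; LangChain's RecursiveCharacterTextSplitter also merges small pieces and supports overlap, which is omitted here for brevity.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Minimal sketch of recursive character splitting: prefer natural
    boundaries (paragraphs, then lines, then words) and only hard-cut
    at the character level as a last resort."""
    if len(text) <= chunk_size:
        return [text]
    sep = separators[0]
    if sep == "":
        # Last resort: hard cut at the character level.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks = []
    for part in (p for p in text.split(sep) if p):
        if len(part) <= chunk_size:
            chunks.append(part)
        else:
            # Part is still too long: retry with the next-finer separator.
            chunks.extend(recursive_split(part, chunk_size, separators[1:]))
    return chunks
```

This is also why the splitter struggles with highly structured documents: tables and code do not break cleanly on paragraph or whitespace boundaries.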
Low Temperature (0.1)
- Why: prioritizes factual, consistent responses
- Trade-off: less creative and may be repetitive
- Alternative: a higher temperature for creative tasks
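Why a low temperature produces consistent answers is easy to show with the underlying math: temperature rescales logits before the softmax, and values near 0 push nearly all probability mass onto the most likely token. This is an illustrative sketch, not code from the project; in practice the value is just passed as the `temperature` parameter to the OpenAI API.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature before softmax: T near 0 sharpens
    # the distribution toward the top token (factual, repeatable output),
    # while T near 1 or above flattens it (more varied, more creative).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With logits `[2.0, 1.0, 0.5]`, a temperature of 0.1 gives the top token essentially all of the probability, while 1.0 leaves a meaningful spread across alternatives.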
Lessons Learned
Start Simple
Basic RAG works surprisingly well. Add complexity only when needed.
Chunk Size Matters
Chunks that are too small lose context; chunks that are too large add noise. Around 1,000 characters is a good starting point.
Prompt Engineering is Key
The system prompt significantly affects response quality and grounding.
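To make "grounding" concrete, here is one possible shape for such a prompt (the wording is illustrative, not the project's exact prompt): number the retrieved chunks, restrict the model to them, and give it an explicit out when the answer is missing.

```python
def build_system_prompt(context_chunks):
    """Assemble a grounded system prompt from retrieved chunks.
    Numbering the chunks lets the model cite its sources, which makes
    hallucinations much easier to spot during evaluation."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context_chunks))
    return (
        "You are a helpful assistant. Answer using ONLY the context below.\n"
        "If the context does not contain the answer, say you don't know.\n"
        "Cite chunk numbers like [1] when you use them.\n\n"
        f"Context:\n{context}"
    )
```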
Observability is Essential
Log retrieved chunks and scores to debug and improve the pipeline.
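A minimal version of that logging might look like the following sketch (the logger name and record shape are my own choices, not the project's): emit one structured JSON line per query, with the rank, score, and a preview of each retrieved chunk.

```python
import json
import logging

logger = logging.getLogger("rag.retrieval")

def log_retrieval(query, results):
    """Log each retrieved chunk with its similarity score as one JSON
    line, so bad retrievals can be spotted and replayed later.
    `results` is assumed to be a list of (score, chunk_text) pairs."""
    record = {
        "event": "retrieval",
        "query": query,
        "chunks": [
            {"rank": i + 1, "score": round(score, 4), "preview": text[:80]}
            for i, (score, text) in enumerate(results)
        ],
    }
    logger.info(json.dumps(record))
    return record  # returned so callers (and tests) can inspect it
```

When an answer goes wrong, the first question is almost always "what did retrieval actually return?", and this record answers it.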
Production Improvements
If deploying this to production, here's what I'd add:
Retrieval Quality
- Add reranking with a cross-encoder model
- Implement hybrid search (semantic + keyword)
- Use HyDE (Hypothetical Document Embeddings)
- Add query expansion/reformulation
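Hybrid search needs a way to merge the semantic and keyword result lists, and Reciprocal Rank Fusion is a common choice. This sketch is generic (not tied to any particular search backend): each document earns `1 / (k + rank)` from every list it appears in, and the sums are ranked.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists (e.g. semantic and keyword
    search) with Reciprocal Rank Fusion. k=60 is the conventional
    constant from the original RRF paper; it damps the influence of
    any single list's top ranks."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that ranks decently in both lists beats one that tops a single list, which is exactly the behavior hybrid search is after.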
Scalability
- Use managed vector DB (Pinecone, Weaviate, Qdrant)
- Add Redis caching for frequent queries
- Implement connection pooling
- Deploy with auto-scaling
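The caching idea can be sketched without Redis: a small in-memory TTL cache keyed by the query, so an identical repeat query skips the embedding, retrieval, and generation round trip. (Redis would add the same semantics across processes via `SETEX`-style expiring keys.)

```python
import time

class TTLCache:
    """In-memory stand-in for a Redis query cache: entries expire
    after `ttl` seconds and are lazily evicted on read."""

    def __init__(self, ttl=300.0):
        self.ttl = ttl
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)
```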
Reliability
- Add comprehensive error handling
- Implement retry logic with exponential backoff
- Add health checks and monitoring
- Set up structured logging
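Retry with exponential backoff is standard enough to sketch directly. This generic helper (names are mine) doubles the delay on each failed attempt and adds jitter so many clients retrying at once do not stampede the API in lockstep.

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on exception with exponential backoff plus
    jitter: roughly 0.5s, 1s, 2s, ... The `sleep` function is
    injectable so tests can run without actually waiting."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))
```

In a real deployment you would catch only transient errors (rate limits, timeouts) rather than bare `Exception`.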
User Experience
- Streaming responses for faster perceived latency
- Add conversation history and memory
- Implement feedback collection
- Support more file formats (DOCX, HTML, etc.)
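Of these, streaming is the simplest to sketch: the server yields the answer in small pieces so the client can render text as it arrives. This toy generator chunks a finished string; a real implementation would forward chunks from a `stream=True` OpenAI completion, served through FastAPI's `StreamingResponse`.

```python
def stream_tokens(answer, chunk_size=8):
    """Toy server-side generator: yield the answer in small pieces
    instead of returning it all at once, cutting perceived latency
    even when total generation time is unchanged."""
    for i in range(0, len(answer), chunk_size):
        yield answer[i:i + chunk_size]
```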
Let's Connect
Interested in discussing RAG, AI/ML engineering, or potential opportunities? I'd love to hear from you.