# Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval with language model generation to produce more accurate and grounded responses.
## How RAG Works
The RAG pipeline consists of three main stages:
1. **Indexing**: Documents are split into chunks, embedded into vectors, and stored in a vector database like Qdrant, Pinecone, or Weaviate.
2. **Retrieval**: When a user query arrives, it is embedded using the same model, and the most similar document chunks are retrieved via approximate nearest neighbor (ANN) search.
3. **Generation**: The retrieved chunks are passed as context to a large language model (LLM) like GPT-4 or Claude, which generates a response grounded in the provided information.
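The three stages above can be sketched end to end in a few lines. This is a minimal toy, not a real system: the embedding is a bag-of-words counter standing in for a learned embedding model, and the generation step is stubbed as a prompt string rather than an actual LLM call.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())

def embed(text: str) -> Counter:
    # Toy embedding: a bag-of-words vector. A real pipeline would call
    # an embedding model here; this only stands in for the idea.
    return Counter(tokenize(text))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Indexing: split documents into chunks and store their vectors.
chunks = [
    "RAG combines retrieval with language model generation.",
    "Vector databases store high-dimensional embeddings.",
    "Bananas are a yellow fruit.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Retrieval: embed the query with the same model and rank chunks.
query = "How does retrieval augmented generation work?"
query_vec = embed(query)
best_chunk, _ = max(index, key=lambda item: cosine(query_vec, item[1]))

# 3. Generation: pass the retrieved chunk to an LLM (stubbed as a prompt).
prompt = f"Context: {best_chunk}\n\nQuestion: {query}"
```

Note that retrieval and indexing must use the same embedding function; mixing models produces vectors that are not comparable.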
## Key Benefits
- **Reduced hallucination**: By grounding responses in retrieved documents, RAG significantly reduces the tendency of LLMs to generate incorrect information.
- **Up-to-date knowledge**: Unlike fine-tuning, RAG allows models to access the latest information without retraining.
- **Source attribution**: Each response can be traced back to specific source documents, enabling citation and verification.
- **Domain specificity**: Organizations can build RAG systems over their proprietary knowledge bases.
## Chunking Strategies
Effective chunking is critical for RAG performance:
- **Fixed-size chunking**: Split text into chunks of N tokens with overlap. Simple but may break semantic boundaries.
- **Semantic chunking**: Split by paragraphs, sections, or sentences, respecting document structure.
- **Recursive chunking**: Start with large chunks, recursively split those exceeding the size limit.
- **Agentic chunking**: Use an LLM to determine optimal split points based on content.
The optimal chunk size depends on the embedding model and use case, typically ranging from 256 to 1024 tokens.
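Fixed-size chunking with overlap, the first strategy above, is simple enough to sketch directly. The defaults here (256 tokens with 32 of overlap) are illustrative choices within the range the text mentions, not prescribed values.

```python
def chunk_fixed(tokens: list[str], size: int = 256, overlap: int = 32) -> list[list[str]]:
    # Slide a window of `size` tokens, stepping by (size - overlap)
    # so consecutive chunks share `overlap` tokens of context.
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks
```

The overlap means a sentence cut at a chunk boundary still appears whole in the neighboring chunk, which is the usual mitigation for fixed-size chunking breaking semantic boundaries.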
## Vector Databases
Vector databases are purpose-built for storing and searching high-dimensional embeddings:
- **Qdrant**: Open-source, supports filtering and payload storage, written in Rust.
- **Pinecone**: Managed cloud service with simple API, scales automatically.
- **Weaviate**: Open-source with built-in vectorization modules and GraphQL API.
- **Milvus**: Open-source, designed for billion-scale vector search.
- **ChromaDB**: Lightweight, designed for AI application development.
## Advanced RAG Techniques
### Hybrid Search
Combining dense vector search with sparse keyword search (BM25) improves recall by capturing both semantic similarity and exact term matches.
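One common way to merge the dense and sparse result lists (not the only one; weighted score sums are also used) is reciprocal rank fusion, which needs only each document's rank in each list:

```python
def rrf(dense_ranking: list[str], sparse_ranking: list[str], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: each list contributes 1 / (k + rank)
    # per document, so items ranked highly by either retriever rise.
    scores: dict[str, float] = {}
    for ranking in (dense_ranking, sparse_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

Because RRF works on ranks rather than raw scores, it sidesteps the problem that cosine similarities and BM25 scores live on incomparable scales.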
### Re-ranking
After initial retrieval, a cross-encoder model re-ranks the results for better precision. Models like Cohere Rerank or BGE Reranker are commonly used.
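The re-ranking step itself is just a scored sort over the retrieved candidates. In this sketch `score_fn` is a placeholder for the cross-encoder, which in practice scores each query/passage pair jointly (e.g. via Cohere Rerank's API or a local BGE Reranker model); the toy scorer in the test is only for illustration.

```python
def rerank(query: str, candidates: list[str], score_fn, top_k: int = 5) -> list[str]:
    # Score each (query, candidate) pair jointly, then keep the best.
    # `score_fn` stands in for a cross-encoder model.
    scored = [(score_fn(query, cand), cand) for cand in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored[:top_k]]
```

The usual pattern is to retrieve a generous candidate set (say 50) with fast ANN search, then pay the cross-encoder's per-pair cost only on that shortlist.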
### Query Expansion
Generating multiple reformulations of the user query helps retrieve a broader set of relevant documents.
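A sketch of the retrieve-and-merge half of query expansion, assuming the reformulations have already been produced (in practice by prompting an LLM) and `retrieve` is whatever retrieval function the pipeline uses:

```python
def expand_and_retrieve(query: str, variants: list[str], retrieve, top_k: int = 5) -> list[str]:
    # Run retrieval for the original query plus each reformulation,
    # then deduplicate while preserving first-seen order.
    seen: set[str] = set()
    merged: list[str] = []
    for q in [query, *variants]:
        for doc in retrieve(q):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged[:top_k]
```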
### Knowledge Graphs
Integrating knowledge graphs with RAG adds structured relationships between entities, enabling multi-hop reasoning and better context understanding.
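The multi-hop part can be sketched as a breadth-first walk over an adjacency-list graph of `entity -> [(relation, entity)]` edges, collecting the facts reachable within a fixed number of hops; those facts would then be added to the LLM's context alongside retrieved chunks. The graph representation here is a simplification, not a specific graph database's model.

```python
def multi_hop(graph: dict, start: str, hops: int = 2) -> list[tuple]:
    # Breadth-first expansion: collect (entity, relation, entity)
    # facts reachable within `hops` steps of the start entity.
    frontier, facts, seen = {start}, [], {start}
    for _ in range(hops):
        nxt = set()
        for entity in frontier:
            for relation, other in graph.get(entity, []):
                facts.append((entity, relation, other))
                if other not in seen:
                    seen.add(other)
                    nxt.add(other)
        frontier = nxt
    return facts
```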