# Tutorial 10: CRAG (Corrective RAG)
CRAG extends RAG with corrective capabilities: when local retrieval fails, it falls back to web search to find answers.
## Overview

CRAG (Corrective RAG) adds a corrective mechanism:

1. Retrieve from local documents
2. Grade document relevance
3. If insufficient, search the web
4. Combine knowledge sources
5. Generate the answer
## Architecture

The flow, as built in the graph below: `START → retrieve_local → grade_documents`, then either straight to `generate` (enough relevant local documents) or through `web_search` to `generate`, and finally `END`.
## When to Use CRAG

- The document corpus may not cover all topics
- Users ask about recent events
- You need to supplement local knowledge with external sources
- You are building research assistants
## State Definition

```python
from typing import List

from langchain_core.documents import Document
from typing_extensions import TypedDict


class CRAGState(TypedDict):
    question: str                       # User's question
    documents: List[Document]           # Local documents
    web_results: List[Document]         # Web search results
    combined_documents: List[Document]  # Merged for generation
    knowledge_source: str               # "local", "web", or "combined"
    generation: str                     # Final answer
```

## Web Search Integration
### Using Tavily (Recommended)
```python
import os
from typing import List

from langchain_core.documents import Document
from tavily import TavilyClient


def web_search(query: str, max_results: int = 3) -> List[Document]:
    """Search the web with Tavily and wrap results as Documents."""
    client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
    response = client.search(query, max_results=max_results)
    return [
        Document(
            page_content=r["content"],
            metadata={"source": r["url"], "title": r["title"], "type": "web"},
        )
        for r in response["results"]
    ]
```

### Using DuckDuckGo (Free)
```python
from typing import List

from duckduckgo_search import DDGS
from langchain_core.documents import Document


def web_search(query: str, max_results: int = 3) -> List[Document]:
    """Search the web with DuckDuckGo and wrap results as Documents."""
    with DDGS() as ddgs:
        results = list(ddgs.text(query, max_results=max_results))
    return [
        Document(
            page_content=r["body"],
            metadata={"source": r["href"], "title": r["title"], "type": "web"},
        )
        for r in results
    ]
```
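Both clients ship as third-party packages; assuming the standard PyPI names, install with `pip install tavily-python` (requires a `TAVILY_API_KEY`) or `pip install duckduckgo-search` (no API key needed).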
## Node Functions

### Grade and Decide
```python
def grade_documents(state: CRAGState) -> dict:
    """Grade documents and decide the knowledge source."""
    # doc_grader returns (relevant_docs, irrelevant_docs)
    relevant, _ = doc_grader.grade_documents(
        state["documents"],
        state["question"],
    )
    if len(relevant) >= 2:
        # Enough local evidence: answer from local documents only
        return {"combined_documents": relevant, "knowledge_source": "local"}
    elif len(relevant) == 1:
        # Thin local evidence: keep it, but supplement with web results
        return {"combined_documents": relevant, "knowledge_source": "combined"}
    else:
        # No relevant local documents: fall back to web search
        return {"combined_documents": [], "knowledge_source": "web"}
```
### Web Search Node

```python
def search_web(state: CRAGState) -> dict:
    """Search the web for additional information."""
    web_docs = web_search(state["question"], max_results=3)
    # Combine with any existing relevant docs
    combined = state["combined_documents"] + web_docs
    return {
        "web_results": web_docs,
        "combined_documents": combined,
    }
```

### Routing Logic
```python
def route_after_grading(state: CRAGState) -> str:
    """Route based on the knowledge-source decision."""
    if state["knowledge_source"] == "local":
        # Enough relevant local docs: go straight to generation
        return "generate"
    else:
        # "combined" or "web": supplement with a web search first
        return "web_search"
```
## Graph Construction

```python
from langgraph.graph import StateGraph, START, END

graph = StateGraph(CRAGState)

# Nodes
graph.add_node("retrieve_local", retrieve_local)
graph.add_node("grade_documents", grade_documents)
graph.add_node("web_search", search_web)
graph.add_node("generate", generate)

# Edges
graph.add_edge(START, "retrieve_local")
graph.add_edge("retrieve_local", "grade_documents")
graph.add_conditional_edges(
    "grade_documents",
    route_after_grading,
    {"generate": "generate", "web_search": "web_search"},
)
graph.add_edge("web_search", "generate")
graph.add_edge("generate", END)

crag = graph.compile()
```
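Running the compiled graph follows the standard LangGraph pattern; the question below is only an example:

```python
result = crag.invoke({"question": "What changed in the latest LangGraph release?"})

print(result["knowledge_source"])  # "local", "web", or "combined"
print(result["generation"])        # Final answer
```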
## Source Attribution

Track where answers come from:
```python
def generate(state: CRAGState) -> dict:
    """Generate with source attribution."""
    context_parts = []
    for i, doc in enumerate(state["combined_documents"], 1):
        source_type = doc.metadata.get("type", "local")
        source_name = doc.metadata.get("filename", doc.metadata.get("title", "Unknown"))
        context_parts.append(f"[Source {i} ({source_type}): {source_name}]\n{doc.page_content}")
    context = "\n\n".join(context_parts)
    # Generate with context...
```
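The generation call itself is elided above. One possible completion, assuming a LangChain `ChatOpenAI` model; the model name and prompt wording are illustrative:

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)


def generate(state: CRAGState) -> dict:
    """Generate with source attribution."""
    context_parts = []
    for i, doc in enumerate(state["combined_documents"], 1):
        source_type = doc.metadata.get("type", "local")
        source_name = doc.metadata.get("filename", doc.metadata.get("title", "Unknown"))
        context_parts.append(f"[Source {i} ({source_type}): {source_name}]\n{doc.page_content}")
    context = "\n\n".join(context_parts)
    # Ask the model to answer from the labeled sources and cite them
    answer = llm.invoke(
        f"Answer the question using only the sources below. "
        f"Cite sources as [Source N].\n\n{context}\n\nQuestion: {state['question']}"
    )
    return {"generation": answer.content}
```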
## Configuration

```bash
# Environment variables
TAVILY_API_KEY=your-key-here
CRAG_MIN_RELEVANT_DOCS=2
CRAG_WEB_RESULTS_COUNT=3
```
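The two `CRAG_*` variables are not consumed anywhere in the code above. A sketch of how they could be wired in (the wiring is an assumption, not part of the tutorial's code):

```python
import os

# Read thresholds from the environment, falling back to the documented defaults
MIN_RELEVANT_DOCS = int(os.getenv("CRAG_MIN_RELEVANT_DOCS", "2"))
WEB_RESULTS_COUNT = int(os.getenv("CRAG_WEB_RESULTS_COUNT", "3"))

# In grade_documents:
#     if len(relevant) >= MIN_RELEVANT_DOCS: ...
# In search_web:
#     web_docs = web_search(state["question"], max_results=WEB_RESULTS_COUNT)
```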
## Best Practices

- Rate limiting: Respect web search API limits
- Caching: Cache web results for repeated queries (see the sketch after this list)
- Source diversity: Balance local and web sources
- Freshness: Prefer web search for time-sensitive queries
- Attribution: Always cite web sources
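A minimal cache sketch for the caching point above, assuming queries repeat verbatim; a TTL or persistent cache would be more robust in production:

```python
from typing import Dict, List

from langchain_core.documents import Document

# In-process cache keyed by query and result count
_web_cache: Dict[str, List[Document]] = {}


def cached_web_search(query: str, max_results: int = 3) -> List[Document]:
    """Memoize web_search (defined earlier) per (query, max_results) pair."""
    key = f"{query}|{max_results}"
    if key not in _web_cache:
        _web_cache[key] = web_search(query, max_results=max_results)
    return _web_cache[key]
```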
## Comparison
| Aspect | Self-RAG | CRAG |
|---|---|---|
| Primary focus | Quality | Coverage |
| Failure handling | Retry | Fallback |
| External dependencies | None | Web search API |
| Best for | Accuracy | Comprehensiveness |
## Quiz

Test your understanding of CRAG (Corrective RAG):

1. What does CRAG do when local document retrieval is insufficient?
2. Which web search API is recommended in the tutorial for CRAG?
3. What is the primary focus difference between Self-RAG and CRAG?
4. What are the three possible values for the `knowledge_source` field in `CRAGState`?
5. True or False: CRAG requires an external web search API to function.