
Tutorial 10: CRAG (Corrective RAG)

CRAG extends RAG with a corrective fallback: when local retrieval fails to produce relevant documents, it searches the web for answers instead.

Overview

CRAG (Corrective RAG) adds a corrective mechanism:

  1. Retrieve from local documents
  2. Grade document relevance
  3. If insufficient, search the web
  4. Combine knowledge sources
  5. Generate answer
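
The five steps above can be sketched as plain-Python control flow, independent of any framework. The retriever, grader, searcher, and generator below are hypothetical stand-ins for the components built later in this tutorial:

```python
from typing import Callable, List

def crag_pipeline(
    question: str,
    retrieve: Callable[[str], List[str]],       # step 1: local retrieval
    grade: Callable[[str, str], bool],          # step 2: relevance grading
    web_search: Callable[[str], List[str]],     # step 3: web fallback
    generate: Callable[[str, List[str]], str],  # step 5: answer generation
    min_relevant: int = 2,
) -> str:
    docs = retrieve(question)
    relevant = [d for d in docs if grade(question, d)]
    if len(relevant) < min_relevant:
        # steps 3-4: supplement local knowledge with web results
        relevant = relevant + web_search(question)
    return generate(question, relevant)
```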

Architecture

(Diagram omitted. The flow is: retrieve_local → grade_documents → generate, with a detour through web_search when local retrieval is insufficient.)

When to Use CRAG

  • Document corpus may not cover all topics
  • Users ask about recent events
  • Need to supplement local with external knowledge
  • Building research assistants

State Definition

python
from typing import List
from typing_extensions import TypedDict

from langchain_core.documents import Document

class CRAGState(TypedDict):
    question: str                      # User's question
    documents: List[Document]          # Local documents
    web_results: List[Document]        # Web search results
    combined_documents: List[Document] # Merged for generation
    knowledge_source: str              # "local", "web", "combined"
    generation: str                    # Final answer

Web Search Integration

python
import os
from typing import List

from langchain_core.documents import Document
from tavily import TavilyClient

def web_search(query: str, max_results: int = 3) -> List[Document]:
    client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
    response = client.search(query, max_results=max_results)

    return [
        Document(
            page_content=r["content"],
            metadata={"source": r["url"], "title": r["title"], "type": "web"}
        )
        for r in response["results"]
    ]

Using DuckDuckGo (Free)

python
from typing import List

from duckduckgo_search import DDGS
from langchain_core.documents import Document

def web_search(query: str, max_results: int = 3) -> List[Document]:
    with DDGS() as ddgs:
        results = list(ddgs.text(query, max_results=max_results))
        return [
            Document(
                page_content=r["body"],
                metadata={"source": r["href"], "title": r["title"], "type": "web"}
            )
            for r in results
        ]

Node Functions

Grade and Decide

python
def grade_documents(state: CRAGState) -> dict:
    """Grade documents and decide knowledge source."""
    # doc_grader: the document relevance grader built in the earlier tutorials
    relevant, _ = doc_grader.grade_documents(
        state["documents"],
        state["question"]
    )

    if len(relevant) >= 2:
        return {"combined_documents": relevant, "knowledge_source": "local"}
    elif len(relevant) == 1:
        return {"combined_documents": relevant, "knowledge_source": "combined"}
    else:
        return {"combined_documents": [], "knowledge_source": "web"}
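
The thresholds above can be isolated into a small helper to make the three branches easy to test. This helper is illustrative (the `min_local` knob is an assumption, not part of the tutorial's graph):

```python
def decide_source(num_relevant: int, min_local: int = 2) -> str:
    """Mirror the branching in grade_documents: enough local docs -> "local",
    exactly one -> "combined" (local plus web), none -> "web" only."""
    if num_relevant >= min_local:
        return "local"
    elif num_relevant == 1:
        return "combined"
    return "web"
```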

Web Search Node

python
def search_web(state: CRAGState) -> dict:
    """Search the web for additional information."""
    web_docs = web_search(state["question"], max_results=3)

    # Combine with any existing relevant docs
    combined = state["combined_documents"] + web_docs

    return {
        "web_results": web_docs,
        "combined_documents": combined,
    }
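
Local and web documents can overlap (for example, a web result pointing at a page already in the local corpus). One option is a dedup-by-source pass before generation; this helper is an illustration, not part of the tutorial's graph:

```python
def dedup_by_source(docs):
    """Keep only the first document seen for each metadata['source']."""
    seen = set()
    unique = []
    for doc in docs:
        source = doc.metadata.get("source")
        if source in seen:
            continue
        seen.add(source)
        unique.append(doc)
    return unique
```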

Routing Logic

python
def route_after_grading(state: CRAGState) -> str:
    """Route based on knowledge source decision."""
    if state["knowledge_source"] == "local":
        return "generate"
    else:
        return "web_search"

Graph Construction

python
from langgraph.graph import StateGraph, START, END

graph = StateGraph(CRAGState)

# Nodes
graph.add_node("retrieve_local", retrieve_local)
graph.add_node("grade_documents", grade_documents)
graph.add_node("web_search", search_web)
graph.add_node("generate", generate)

# Edges
graph.add_edge(START, "retrieve_local")
graph.add_edge("retrieve_local", "grade_documents")
graph.add_conditional_edges(
    "grade_documents",
    route_after_grading,
    {"generate": "generate", "web_search": "web_search"}
)
graph.add_edge("web_search", "generate")
graph.add_edge("generate", END)

crag = graph.compile()

Source Attribution

Track where answers come from:

python
def generate(state: CRAGState) -> dict:
    """Generate with source attribution."""
    context_parts = []
    for i, doc in enumerate(state["combined_documents"], 1):
        source_type = doc.metadata.get("type", "local")
        source_name = doc.metadata.get("filename", doc.metadata.get("title", "Unknown"))
        context_parts.append(f"[Source {i} ({source_type}): {source_name}]\n{doc.page_content}")

    context = "\n\n".join(context_parts)
    # Generate with context...
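
To see the context format this produces, here is a self-contained run with a minimal stand-in for langchain's Document (same `.page_content`/`.metadata` attributes; the sample documents are invented for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class Doc:  # stand-in for langchain_core.documents.Document
    page_content: str
    metadata: dict = field(default_factory=dict)

docs = [
    Doc("Local fact.", {"type": "local", "filename": "notes.md"}),
    Doc("Web fact.", {"type": "web", "title": "Some Page", "source": "https://example.com"}),
]

context_parts = []
for i, doc in enumerate(docs, 1):
    source_type = doc.metadata.get("type", "local")
    source_name = doc.metadata.get("filename", doc.metadata.get("title", "Unknown"))
    context_parts.append(f"[Source {i} ({source_type}): {source_name}]\n{doc.page_content}")

context = "\n\n".join(context_parts)
# context begins with "[Source 1 (local): notes.md]"
```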

Configuration

bash
# Environment variables
TAVILY_API_KEY=your-key-here
CRAG_MIN_RELEVANT_DOCS=2
CRAG_WEB_RESULTS_COUNT=3
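
One way to read these variables in Python, with the values above as defaults (the constant names on the left are illustrative):

```python
import os

# Required only when using the Tavily backend
TAVILY_API_KEY = os.environ.get("TAVILY_API_KEY", "")

# Thresholds with the tutorial's defaults as fallbacks
MIN_RELEVANT_DOCS = int(os.environ.get("CRAG_MIN_RELEVANT_DOCS", "2"))
WEB_RESULTS_COUNT = int(os.environ.get("CRAG_WEB_RESULTS_COUNT", "3"))
```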

Best Practices

  1. Rate limiting: Respect web search API limits
  2. Caching: Cache web results for repeated queries
  3. Source diversity: Balance local and web sources
  4. Freshness: Prefer web for time-sensitive queries
  5. Attribution: Always cite web sources
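
Points 1 and 2 can be addressed together: a small TTL cache in front of the search call avoids repeated hits against the API for the same query. The wrapper and the TTL value below are illustrative:

```python
import time
from typing import Callable, List

def cached_search(search_fn: Callable[[str], List], ttl_seconds: float = 300.0):
    """Wrap a search function with a simple per-query TTL cache."""
    cache = {}  # query -> (timestamp, results)

    def wrapper(query: str):
        now = time.monotonic()
        hit = cache.get(query)
        if hit is not None and now - hit[0] < ttl_seconds:
            return hit[1]  # fresh cached result: no API call made
        results = search_fn(query)
        cache[query] = (now, results)
        return results

    return wrapper
```

Used as `web_search = cached_search(web_search)`, repeated queries within the TTL window are served from memory.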

Comparison

| Aspect                | Self-RAG | CRAG              |
|-----------------------|----------|-------------------|
| Primary focus         | Quality  | Coverage          |
| Failure handling      | Retry    | Fallback          |
| External dependencies | None     | Web search API    |
| Best for              | Accuracy | Comprehensiveness |

Quiz

Test your understanding of CRAG (Corrective RAG):

Knowledge Check

What does CRAG do when local document retrieval is insufficient?

A. Returns an error message
B. Falls back to web search
C. Retries local retrieval with different parameters
D. Uses a default pre-written answer

Knowledge Check

Which web search API is recommended in the tutorial for CRAG?

A. Google Custom Search API
B. Bing Web Search API
C. Tavily
D. SerpAPI

Knowledge Check

What is the primary focus difference between Self-RAG and CRAG?

A. Self-RAG focuses on speed, CRAG on accuracy
B. Self-RAG focuses on quality, CRAG on coverage
C. Self-RAG uses local docs, CRAG only uses web
D. Self-RAG is for questions, CRAG is for summarization

Knowledge Check

What are the three possible values for the 'knowledge_source' field in CRAGState?

A. primary, secondary, fallback
B. local, web, combined
C. fast, medium, slow
D. cached, fresh, mixed

Knowledge Check (True/False)

True or False: CRAG requires an external web search API to function.

T. True
F. False