
Tutorial 12: Agentic RAG

Agentic RAG gives an LLM agent control over the retrieval process - deciding when, what, and how to retrieve.

Overview

Previous patterns use fixed retrieval flows. Agentic RAG:

  • Agent decides when to retrieve
  • Multiple retrieval rounds possible
  • Query decomposition for complex questions
  • Iterative refinement

Architecture

Retrieval as a Tool

python
from langchain_core.tools import tool

@tool
def search_documents(query: str) -> str:
    """Search the document database for information.

    Args:
        query: The search query.

    Returns:
        Retrieved document contents.
    """
    docs = retriever.retrieve_documents(query, k=3)
    return "\n\n".join([doc.page_content for doc in docs])

tools = [search_documents]
llm_with_tools = llm.bind_tools(tools)

Agent System Prompt

python
SYSTEM_PROMPT = """You are a research assistant with document search.

Strategy:
1. Break complex questions into sub-questions
2. Search for each aspect separately
3. Synthesize information from multiple searches
4. Provide comprehensive answers with sources

You can search multiple times if needed."""

State Definition

python
from typing import Annotated, List, TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

class AgenticRAGState(TypedDict):
    # add_messages appends rather than overwrites, so history accumulates
    messages: Annotated[List[BaseMessage], add_messages]

Node Functions

Agent Node

python
from langchain_core.messages import SystemMessage

def agent(state: AgenticRAGState) -> dict:
    """Agent decides the next action: answer directly or call a tool."""
    messages = [SystemMessage(content=SYSTEM_PROMPT)] + state["messages"]
    response = llm_with_tools.invoke(messages)
    return {"messages": [response]}

Tool Execution Node

python
from langchain_core.messages import ToolMessage
from langgraph.prebuilt import ToolNode

# Create tool execution node
tool_node = ToolNode(tools)

# Or implement custom execution (this version assumes search_documents is the only tool)
def execute_tools(state: AgenticRAGState) -> dict:
    """Execute tool calls from agent."""
    last_message = state["messages"][-1]
    tool_calls = last_message.tool_calls

    tool_messages = []
    for tool_call in tool_calls:
        tool_result = search_documents.invoke(tool_call["args"])
        tool_messages.append(
            ToolMessage(
                content=tool_result,
                tool_call_id=tool_call["id"]
            )
        )

    return {"messages": tool_messages}

ReAct Loop

python
def should_continue(state: AgenticRAGState) -> str:
    last_message = state["messages"][-1]

    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    return "end"

graph.add_conditional_edges(
    "agent",
    should_continue,
    {"tools": "tools", "end": END}
)
graph.add_edge("tools", "agent")  # Loop back

Graph Construction

python
from langgraph.graph import StateGraph, START, END

graph = StateGraph(AgenticRAGState)

# Nodes
graph.add_node("agent", agent)
graph.add_node("tools", execute_tools)

# Edges
graph.add_edge(START, "agent")
graph.add_conditional_edges(
    "agent",
    should_continue,
    {"tools": "tools", "end": END}
)
graph.add_edge("tools", "agent")

agentic_rag = graph.compile()

Usage

python
from langchain_core.messages import HumanMessage

# Simple question - single retrieval
result = agentic_rag.invoke({
    "messages": [HumanMessage(content="What is Self-RAG?")]
})

# Complex question - multiple retrievals
result = agentic_rag.invoke({
    "messages": [HumanMessage(content="Compare Self-RAG and CRAG")]
})

print(result["messages"][-1].content)

Advanced: Multiple Tools

Provide multiple retrieval strategies:

python
@tool
def search_documents(query: str) -> str:
    """Search local documents."""
    docs = retriever.retrieve_documents(query, k=3)
    return format_docs(docs)

@tool
def search_web(query: str) -> str:
    """Search the web for current information."""
    results = web_search_api(query)
    return format_web_results(results)

@tool
def list_available_documents() -> str:
    """List all available documents in the database."""
    return retriever.list_documents()

tools = [search_documents, search_web, list_available_documents]
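
If you keep the custom execute_tools node instead of ToolNode, it must dispatch by tool name once more than one tool is registered. A minimal sketch, reusing the ToolMessage pattern from above (the TOOLS_BY_NAME lookup table is an illustrative addition, not part of the original code):

python
# Map tool names to tool objects; ToolNode performs this dispatch automatically
TOOLS_BY_NAME = {t.name: t for t in tools}

def execute_tools(state: AgenticRAGState) -> dict:
    """Execute every tool call requested by the agent's last message."""
    last_message = state["messages"][-1]

    tool_messages = []
    for tool_call in last_message.tool_calls:
        tool = TOOLS_BY_NAME[tool_call["name"]]
        result = tool.invoke(tool_call["args"])
        tool_messages.append(
            ToolMessage(content=result, tool_call_id=tool_call["id"])
        )

    return {"messages": tool_messages}

In practice, passing the full tools list to ToolNode(tools) gives the same behavior with less code.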

Query Decomposition Example

The agent can break down complex queries:

User: "Compare Self-RAG and CRAG, and explain which is better for current events."

Agent Reasoning:
1. Search for "Self-RAG" → Gets Self-RAG info
2. Search for "CRAG" → Gets CRAG info
3. Search for "current events RAG" → Gets info on temporal queries
4. Synthesizes comparison and recommendation
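
You can confirm this behavior by inspecting the intermediate messages the graph returns: each AIMessage carrying tool_calls records one search the agent chose to run. A quick sketch using the message attributes from the LangChain classes above:

python
result = agentic_rag.invoke({
    "messages": [HumanMessage(
        content="Compare Self-RAG and CRAG, and explain which is better for current events."
    )]
})

# Print each retrieval the agent decided to perform
for message in result["messages"]:
    for tool_call in getattr(message, "tool_calls", []) or []:
        print(tool_call["name"], "->", tool_call["args"])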

Benefits

Aspect         Standard RAG    Agentic RAG
Control        Fixed flow      Agent decides
Queries        Single          Multiple
Complexity     Simple          Complex supported
Adaptability   Predefined      Dynamic

Best Practices

  1. Clear tool descriptions: Help agent choose right tool
  2. Max iterations: Prevent infinite loops (see the sketch after this list)
  3. Cost monitoring: Multiple LLM calls can add up
  4. Tool result formatting: Make results easy for agent to parse
  5. Error handling: Handle tool failures gracefully
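
For the max-iterations practice, LangGraph exposes a per-invocation recursion_limit that caps the total number of graph steps before raising an error. A minimal sketch:

python
# recursion_limit bounds the agent/tool steps, stopping runaway ReAct loops
result = agentic_rag.invoke(
    {"messages": [HumanMessage(content="What is Self-RAG?")]},
    config={"recursion_limit": 10},
)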

Configuration

bash
# Environment variables
AGENTIC_RAG_MAX_ITERATIONS=10
AGENTIC_RAG_AGENT_MODEL=llama3.2:3b
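
These variables are not read automatically; a minimal sketch for wiring them in at startup, assuming an Ollama-served model (which the default llama3.2:3b suggests) via the langchain_ollama integration:

python
import os

from langchain_ollama import ChatOllama

# Iteration cap fed to recursion_limit (see the Best Practices sketch above)
max_iterations = int(os.getenv("AGENTIC_RAG_MAX_ITERATIONS", "10"))

# Agent model; assumes an Ollama backend serving the configured model
llm = ChatOllama(model=os.getenv("AGENTIC_RAG_AGENT_MODEL", "llama3.2:3b"))
llm_with_tools = llm.bind_tools(tools)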

Comparison with Previous Patterns

Pattern        Retrieval Control     Best For
Basic RAG      None                  Simple Q&A
Self-RAG       Quality checks        Accuracy
CRAG           Fallback logic        Coverage
Adaptive RAG   Query routing         Efficiency
Agentic RAG    Full agent control    Complex research

Quiz

Test your understanding of Agentic RAG:

Knowledge Check

What is the key difference between Agentic RAG and previous RAG patterns?

A. Agentic RAG is faster than other patterns
B. The agent decides when, what, and how to retrieve
C. Agentic RAG only uses web search
D. Agentic RAG does not support multiple sources

Knowledge Check

What is the ReAct loop in Agentic RAG?

A. A retrieval optimization technique
B. A reasoning and acting cycle where the agent decides and executes tools
C. A type of neural network architecture
D. A caching mechanism for faster responses

Knowledge Check

Why might Agentic RAG perform multiple retrievals for a single question?

A. To increase response speed
B. To handle complex questions by breaking them into sub-questions
C. To reduce API costs
D. Multiple retrievals are always required

Knowledge Check

How is retrieval implemented in Agentic RAG?

A. As a fixed pipeline step
B. As a tool that the agent can invoke
C. As a background process
D. As a separate microservice

Knowledge Check

What best practice helps prevent infinite loops in Agentic RAG?

A. Using smaller models
B. Setting a max iterations limit
C. Caching all results
D. Disabling web search