# Tutorial 13: Perplexity-Style Research Assistant
Build a full-featured research assistant with in-text citations, source metadata, and follow-up suggestions.
## Overview
This tutorial combines all RAG patterns into a polished research experience:
- In-text citations [1], [2]
- Source cards with metadata
- Multi-source synthesis
- Follow-up question suggestions
## Architecture
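The assistant is a linear three-node LangGraph pipeline (built in full under Graph Construction below):

```
START → gather_sources → generate_answer → generate_followups → END
```

`gather_sources` pulls excerpts from local documents and the web, `generate_answer` writes a cited answer from them, and `generate_followups` suggests related questions.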
### Source Data Model

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    index: int              # Citation number [1], [2], etc.
    title: str              # Source title
    url: str                # URL or file path
    content: str            # Relevant excerpt
    source_type: str        # "local" or "web"
    page: Optional[int]     # Page number, if applicable
    relevance_score: float  # Similarity score
```
### State Definition

```python
from typing import List, TypedDict


class ResearchState(TypedDict):
    question: str                   # User's question
    sources: List[Source]           # All gathered sources
    answer: str                     # Answer with citations
    follow_up_questions: List[str]  # Suggested questions
```
### Citation Prompt

```python
RESEARCH_PROMPT = """You are a research assistant.

IMPORTANT: Cite sources using [1], [2], etc. inline.
Every factual claim should have a citation.

Sources:
{sources}

Question: {question}

Answer with inline citations:"""
```
## Web Search Setup

### Tavily API (Recommended)

- Sign up at https://tavily.com
- Get your free API key
- Add it to `.env`: `TAVILY_API_KEY=tvly-your-key-here`
### Usage in Code

```python
import os

from tavily import TavilyClient

client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

query = "What is Self-RAG and how does it differ from CRAG?"
results = client.search(query, max_results=3)
```
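The node functions below call a `web_search` helper that isn't defined in the snippets above. Here is a minimal sketch of what it might look like, assuming Tavily's response shape (a `"results"` list of dicts with `title`, `url`, `content`, and `score` keys):

```python
def web_search(query: str, max_results: int = 3) -> list:
    """Run a Tavily search and normalize results to plain dicts."""
    response = client.search(query, max_results=max_results)
    return [
        {
            "title": r.get("title", "Untitled"),
            "url": r.get("url", ""),
            "content": r.get("content", ""),
            "score": r.get("score", 0.0),
        }
        for r in response.get("results", [])
    ]
```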
## Node Functions

### Gather Sources

```python
def gather_sources(state: ResearchState) -> dict:
    """Gather sources from local documents and web search."""
    sources = []

    # Local documents (retriever comes from the earlier tutorials;
    # see the sketch below)
    local_docs = retriever.retrieve_documents(state["question"], k=3)
    for i, doc in enumerate(local_docs, 1):
        sources.append(Source(
            index=i,
            title=doc.metadata.get("filename", "Unknown"),
            url=doc.metadata.get("source", ""),
            content=doc.page_content,
            source_type="local",
            page=doc.metadata.get("page"),
            relevance_score=doc.metadata.get("score", 0.0),
        ))

    # Web search (citation numbering continues after the local sources)
    web_results = web_search(state["question"], max_results=3)
    for j, result in enumerate(web_results, len(sources) + 1):
        sources.append(Source(
            index=j,
            title=result["title"],
            url=result["url"],
            content=result["content"],
            source_type="web",
            page=None,
            relevance_score=result.get("score", 0.0),
        ))

    return {"sources": sources}
```
### Generate Answer with Citations

```python
def generate_answer(state: ResearchState) -> dict:
    """Generate an answer with inline citations."""
    # Format sources for the prompt
    sources_text = "\n\n".join([
        f"[{s.index}] {s.title}\n{s.content}"
        for s in state["sources"]
    ])

    # Generate the answer
    prompt = RESEARCH_PROMPT.format(
        sources=sources_text,
        question=state["question"],
    )
    response = llm.invoke(prompt)
    return {"answer": response.content}
```
### Generate Follow-up Questions

```python
FOLLOWUP_PROMPT = """Based on this question and answer, suggest 3 related follow-up questions.

Original Question: {question}

Answer: {answer}

Follow-up questions (one per line):"""


def generate_followups(state: ResearchState) -> dict:
    """Generate follow-up question suggestions."""
    prompt = FOLLOWUP_PROMPT.format(
        question=state["question"],
        answer=state["answer"],
    )
    response = llm.invoke(prompt)

    # Parse questions (one per line), stripping list markers like "1." or "-"
    questions = [
        q.strip().lstrip("0123456789.-) ")
        for q in response.content.split("\n")
        if q.strip()
    ][:3]
    return {"follow_up_questions": questions}
```
## Graph Construction

```python
from langgraph.graph import StateGraph, START, END

graph = StateGraph(ResearchState)

# Nodes
graph.add_node("gather_sources", gather_sources)
graph.add_node("generate_answer", generate_answer)
graph.add_node("generate_followups", generate_followups)

# Edges: a simple linear pipeline
graph.add_edge(START, "gather_sources")
graph.add_edge("gather_sources", "generate_answer")
graph.add_edge("generate_answer", "generate_followups")
graph.add_edge("generate_followups", END)

research_assistant = graph.compile()
```
## Output Formatting

```python
def format_response(result: dict) -> str:
    output = []

    # Answer with citations
    output.append(result["answer"])

    # Sources section
    output.append("\n" + "─" * 50)
    output.append("Sources:")
    for src in result["sources"]:
        icon = "🌐" if src.source_type == "web" else "📄"
        score = f"[{src.relevance_score * 100:.0f}%]"
        if src.page is not None:
            output.append(f"[{src.index}] {icon} {src.title} (page {src.page}) {score}")
        else:
            output.append(f"[{src.index}] {icon} {src.title} {score}")
        if src.url:
            output.append(f"    {src.url}")

    # Follow-ups
    output.append("\n" + "─" * 50)
    output.append("Related Questions:")
    for q in result["follow_up_questions"]:
        output.append(f"• {q}")

    return "\n".join(output)
```
## Usage

```python
result = research_assistant.invoke({
    "question": "What is Self-RAG and how does it differ from CRAG?"
})
print(format_response(result))
```
## Example Output

```
Self-RAG is a framework that enhances LLMs with self-reflection [1].
It grades retrieved documents for relevance and checks answers for
hallucinations [1][2]. CRAG differs by adding web search as a
fallback mechanism [3].

──────────────────────────────────────────────────
Sources:
[1] 📄 self_rag_paper.pdf (page 3) [92%]
    /path/to/self_rag_paper.pdf
[2] 📄 rag_survey.pdf (page 12) [87%]
    /path/to/rag_survey.pdf
[3] 🌐 "Corrective RAG Explained" [85%]
    https://example.com/crag-explained

──────────────────────────────────────────────────
Related Questions:
• What are the performance benchmarks for Self-RAG?
• How does CRAG handle web search failures?
• Can Self-RAG and CRAG be combined?
```
## Advanced Features

### Source Ranking

```python
def rank_sources(sources: List[Source]) -> List[Source]:
    """Rank sources by relevance score, preferring web sources on ties."""
    return sorted(
        sources,
        key=lambda s: (s.relevance_score, s.source_type == "web"),
        reverse=True,
    )
```
### Source Deduplication

```python
def deduplicate_sources(sources: List[Source]) -> List[Source]:
    """Remove duplicate or very similar sources."""
    unique_sources = []
    seen_content = set()
    for source in sources:
        # Simple dedup based on the first 100 characters of content
        content_hash = hash(source.content[:100])
        if content_hash not in seen_content:
            unique_sources.append(source)
            seen_content.add(content_hash)
    return unique_sources
```
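Both helpers reorder or drop sources, which would orphan the [n] citations if applied after the answer is written. If you use them, call them inside `gather_sources` before `generate_answer` runs, and reassign the indices afterwards. A minimal sketch, using a hypothetical `renumber_sources` helper:

```python
from dataclasses import replace


def renumber_sources(sources: List[Source]) -> List[Source]:
    """Reassign contiguous citation numbers after ranking/deduplication."""
    return [replace(s, index=i) for i, s in enumerate(sources, 1)]


# e.g. at the end of gather_sources:
# sources = renumber_sources(rank_sources(deduplicate_sources(sources)))
```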
## Configuration

```bash
# .env
TAVILY_API_KEY=tvly-your-key-here
RAG_COLLECTION_NAME=documents
EMBEDDING_MODEL_NAME=all-mpnet-base-v2
RESEARCH_MAX_LOCAL_SOURCES=3
RESEARCH_MAX_WEB_SOURCES=3
```
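The snippets above hardcode `k=3` and `max_results=3`; for the two `RESEARCH_MAX_*` variables to take effect, read them at startup. A small sketch, assuming the `.env` file has already been loaded (e.g. with `python-dotenv`):

```python
import os

MAX_LOCAL_SOURCES = int(os.getenv("RESEARCH_MAX_LOCAL_SOURCES", "3"))
MAX_WEB_SOURCES = int(os.getenv("RESEARCH_MAX_WEB_SOURCES", "3"))

# then in gather_sources:
# local_docs = retriever.retrieve_documents(state["question"], k=MAX_LOCAL_SOURCES)
# web_results = web_search(state["question"], max_results=MAX_WEB_SOURCES)
```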
## Best Practices

- Source diversity: Balance local and web sources
- Citation verification: Ensure all claims are cited (see the sketch below)
- Source quality: Filter low-relevance sources
- User experience: Format output for readability
- Rate limiting: Respect API limits for web search
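For citation verification, one cheap check is to confirm that every [n] marker in the answer refers to a gathered source, and to flag answers with no citations at all. A minimal sketch, using a hypothetical `check_citations` helper:

```python
import re


def check_citations(answer: str, sources: List[Source]) -> List[int]:
    """Return citation numbers in the answer that match no gathered source."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    known = {s.index for s in sources}
    if not cited:
        print("Warning: answer contains no citations")
    return sorted(cited - known)
```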
## Congratulations!
You've completed the RAG Patterns tutorial series. You can now build:
- Basic RAG systems
- Self-reflective RAG with quality grading
- Corrective RAG with web fallback
- Adaptive RAG with query routing
- Agentic RAG with agent control
- Full research assistants with citations
All running locally with Ollama!
## Quiz

Test your understanding of the Perplexity-Style Research Assistant:

1. What are the three main components of the research assistant's output?
2. How are citations formatted in the research assistant's answers?
3. What fields does the Source data model include?
4. What is the purpose of the follow-up questions feature?
5. True or False: The research assistant only uses local documents, not web search.