Tutorial 22: Reflection Pattern

Overview

The Reflection Pattern is an iterative improvement technique where an LLM generates output, critiques it, and revises it repeatedly until the output meets quality standards. This pattern mirrors how humans improve their work through drafts and revisions.

Architecture

Core Components

1. State Schema

python
from typing import Annotated, TypedDict

from langgraph.graph.message import add_messages

class ReflectionState(TypedDict):
    """State for reflection pattern."""
    messages: Annotated[list, add_messages]  # Conversation history
    task: str                                 # Original task
    draft: str                                # Current draft
    critique: str                             # Latest critique
    iteration: int                            # Current iteration
    max_iterations: int                       # Safety limit

Key fields:

  • task: The original request (immutable)
  • draft: Current version being refined
  • critique: Feedback from the critic
  • iteration: Tracks refinement cycles
  • max_iterations: Prevents infinite loops
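
For reference, a fresh run starts from a state shaped like this (a minimal sketch; the field values are illustrative, and the run_reflection_task helper shown later builds this for you):

python
# Illustrative initial state for a fresh reflection run
initial_state = {
    "messages": [],
    "task": "Write a brief explanation of neural networks",
    "draft": "",
    "critique": "",
    "iteration": 0,
    "max_iterations": 3,
}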

2. Generator Node

The generator creates initial drafts and revisions:

python
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import HumanMessage

def create_generator_node(llm: BaseChatModel):
    """Create generator that creates or revises drafts."""
    def generator(state: ReflectionState) -> dict:
        iteration = state.get("iteration", 0)

        if iteration == 0:
            # First draft
            prompt = f"Write a response to: {state['task']}"
        else:
            # Revision based on critique
            prompt = f"""Revise this draft:

Draft: {state['draft']}
Critique: {state['critique']}

Provide improved version addressing all feedback."""

        response = llm.invoke([HumanMessage(content=prompt)])

        return {
            "draft": response.content,
            "iteration": iteration + 1,
        }

    return generator

Generator behavior:

  • First iteration: Creates complete initial draft
  • Subsequent iterations: Revises based on critique feedback
  • Focus: Address specific concerns raised by critic
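
As a quick illustration, the generator node can be exercised in isolation (a sketch, assuming llm is any chat model, such as the ChatOllama instance used later in this tutorial):

python
# Smoke-test the generator node on its own (illustrative values)
generator = create_generator_node(llm)

update = generator({
    "task": "Explain overfitting in one paragraph",
    "draft": "",
    "critique": "",
    "iteration": 0,
    "max_iterations": 3,
})
print(update["iteration"])   # 1 (first draft produced)
print(update["draft"][:80])  # opening of the generated draft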

3. Critic Node

The critic evaluates drafts and provides feedback:

python
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import HumanMessage, SystemMessage

CRITIC_PROMPT = """You are a thoughtful editor.

Evaluate the draft on:
1. Completeness: Does it fully address the task?
2. Clarity: Is it clear and well-organized?
3. Accuracy: Is the information correct?
4. Tone: Is the tone appropriate?

If the draft is excellent, respond with exactly: "APPROVED"
Otherwise, provide specific, actionable feedback."""

def create_critic_node(llm: BaseChatModel):
    """Create critic that evaluates drafts."""
    def critic(state: ReflectionState) -> dict:
        prompt = f"""Review this draft:

Task: {state['task']}
Draft: {state['draft']}

Provide critique or "APPROVED" if excellent."""

        response = llm.invoke([
            SystemMessage(content=CRITIC_PROMPT),
            HumanMessage(content=prompt)
        ])

        return {"critique": response.content}

    return critic

Critic responsibilities:

  • Evaluate against multiple criteria
  • Provide specific, actionable feedback
  • Use explicit approval signal ("APPROVED")

4. Routing Logic

python
from langgraph.graph import END

def should_continue(state: ReflectionState) -> str:
    """Determine whether to continue refining or finish."""
    iteration = state["iteration"]
    max_iterations = state["max_iterations"]
    critique = state.get("critique", "")

    # Stop if approved
    if "APPROVED" in critique.upper():
        return END

    # Stop if max iterations reached
    if iteration >= max_iterations:
        return END

    # Continue refining
    return "generator"

Termination conditions:

  1. Critic approves (responds with "APPROVED")
  2. Maximum iterations reached (safety limit)
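
Putting the pieces together, here is a simplified sketch of how these components could be wired into a graph. The packaged create_reflection_graph also handles multi-criteria options and other parameters, so treat this as illustrative rather than the library's implementation:

python
from langgraph.graph import END, START, StateGraph

def build_reflection_graph(llm, checkpointer=None):
    """Sketch: wire the generator, critic, and router into a loop."""
    builder = StateGraph(ReflectionState)
    builder.add_node("generator", create_generator_node(llm))
    builder.add_node("critic", create_critic_node(llm))
    builder.add_edge(START, "generator")
    builder.add_edge("generator", "critic")
    # After each critique, either loop back to the generator or stop.
    builder.add_conditional_edges(
        "critic",
        should_continue,
        {"generator": "generator", END: END},
    )
    return builder.compile(checkpointer=checkpointer)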

Usage Examples

Basic Reflection

python
from langgraph_ollama_local.patterns.reflection import (
    create_reflection_graph,
    run_reflection_task,
)
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.1:8b")

# Create graph
graph = create_reflection_graph(
    llm=llm,
    max_iterations=3,
)

# Run reflection
result = run_reflection_task(
    graph=graph,
    task="Write a brief explanation of neural networks",
    max_iterations=3,
)

print(result["draft"])

Multi-Criteria Reflection

Use structured scoring across multiple dimensions:

python
# Create multi-criteria graph
graph = create_reflection_graph(
    llm=llm,
    max_iterations=4,
    use_multi_criteria=True,
    approval_threshold=8,  # Require 8/10 or higher
)

result = run_reflection_task(
    graph=graph,
    task="Write a technical blog post introduction",
    max_iterations=4,
)

# Critique includes structured scores
print(result["critique"])
# Output:
# Clarity: 8/10
# Accuracy: 9/10
# Completeness: 7/10
# Overall: Needs more concrete examples

Multi-criteria benefits:

  • Structured feedback across dimensions
  • Quantitative scoring (1-10)
  • Clearer improvement targets
  • Auto-approval when all scores meet threshold
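
The packaged create_multi_criteria_critic_node handles the scoring; below is a rough sketch of the parsing idea behind auto-approval, assuming the critic emits lines like "Clarity: 8/10":

python
import re

def parse_scores(critique: str) -> dict[str, int]:
    """Extract 'Criterion: N/10' lines from a critique (illustrative)."""
    return {
        name: int(score)
        for name, score in re.findall(r"(\w[\w ]*):\s*(\d+)/10", critique)
    }

def meets_threshold(critique: str, threshold: int = 8) -> bool:
    """Approve only when every parsed score meets the threshold."""
    scores = parse_scores(critique)
    return bool(scores) and min(scores.values()) >= threshold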

Multi-Model Reflection

Use different models for generation and critique:

python
from langgraph_ollama_local.patterns.reflection import (
    create_multi_model_reflection_graph,
)

# Fast model for generation
generator_llm = ChatOllama(model="llama3.2:3b", temperature=0.8)

# Stronger model for critique
critic_llm = ChatOllama(model="llama3.1:70b", temperature=0.3)

graph = create_multi_model_reflection_graph(
    generator_llm=generator_llm,
    critic_llm=critic_llm,
    max_iterations=3,
)

result = run_reflection_task(
    graph=graph,
    task="Write a creative product description",
)

Multi-model benefits:

  • Generation: Use fast, creative model
  • Critique: Use stronger, more analytical model
  • Cost/Speed: Optimize for performance
  • Quality: Better critiques improve output

Advanced Patterns

Custom Critic Criteria

Define domain-specific evaluation criteria:

python
TECHNICAL_CRITIC_PROMPT = """Evaluate technical writing on:
1. Accuracy: Are technical details correct?
2. Completeness: Are all concepts explained?
3. Code Examples: Are examples correct and relevant?
4. Clarity: Is it accessible to the target audience?

If all criteria are met, respond with "APPROVED"."""

def create_technical_critic(llm):
    def critic(state):
        # Evaluate the draft against the domain-specific criteria above
        response = llm.invoke([
            SystemMessage(content=TECHNICAL_CRITIC_PROMPT),
            HumanMessage(content=f"Task: {state['task']}\n\nDraft: {state['draft']}"),
        ])
        return {"critique": response.content}
    return critic

Iterative Refinement Tracking

Track improvements across iterations:

python
def run_tracked_reflection(graph, task):
    """Run reflection with iteration tracking."""
    drafts = []
    critiques = []

    # Stream graph execution
    for event in graph.stream({
        "task": task,
        "draft": "",
        "critique": "",
        "iteration": 0,
        "max_iterations": 3,
    }):
        if "generator" in event:
            drafts.append(event["generator"]["draft"])
        if "critic" in event:
            critiques.append(event["critic"]["critique"])

    return {
        "drafts": drafts,
        "critiques": critiques,
        "final": drafts[-1],
    }
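
For example, to inspect how the critique evolves across iterations (using any reflection graph built earlier):

python
tracked = run_tracked_reflection(graph, "Explain gradient descent simply")

# Print each round of feedback, then the final draft
for i, critique in enumerate(tracked["critiques"], start=1):
    print(f"--- Critique after iteration {i} ---\n{critique}\n")
print(tracked["final"])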

API Reference

create_reflection_graph()

Create a reflection graph with iterative improvement loop.

Parameters:

  • llm (BaseChatModel): Language model for generator and critic
  • max_iterations (int): Maximum refinement iterations (default: 3)
  • use_multi_criteria (bool): Use structured scoring (default: False)
  • approval_threshold (int): Score threshold for approval (default: 7)
  • checkpointer (Any): Optional checkpointer for persistence

Returns: Compiled StateGraph

create_multi_model_reflection_graph()

Create reflection graph with different models for generation and critique.

Parameters:

  • generator_llm (BaseChatModel): Model for generation
  • critic_llm (BaseChatModel): Model for critique
  • max_iterations (int): Maximum iterations (default: 3)
  • checkpointer (Any): Optional checkpointer

Returns: Compiled StateGraph

run_reflection_task()

Run a reflection task through the graph.

Parameters:

  • graph (CompiledStateGraph): Compiled reflection graph
  • task (str): Task description
  • max_iterations (int): Maximum iterations (default: 3)
  • thread_id (str): Thread ID for checkpointing (default: "default")

Returns: Final state dict with draft, critique, iteration

Node Creators

  • create_generator_node(llm): Create generator node
  • create_critic_node(llm): Create basic critic node
  • create_multi_criteria_critic_node(llm, approval_threshold): Create structured critic

Routing Functions

  • should_continue(state): Routing logic for reflection loop

When to Use Reflection

Ideal Use Cases

Content Creation

  • Essays, reports, documentation
  • Professional emails and communications
  • Marketing copy and product descriptions
  • Technical writing and explanations

Code Generation

  • Generate, review, refactor code
  • Documentation generation with quality checks
  • Test case generation with coverage review

Analysis Tasks

  • Initial assessment followed by deeper analysis
  • Multi-stage research with refinement
  • Data analysis with iterative insights

When NOT to Use Reflection

  ❌ Simple Queries: One-shot questions don't need iteration
  ❌ Time-Critical Tasks: Reflection adds latency
  ❌ Token-Constrained: Multiple iterations consume more tokens
  ❌ Learning Across Attempts: Use Reflexion instead for episodic memory

Reflection vs Reflexion

Aspect     | Reflection             | Reflexion
-----------|------------------------|----------------------------------
Goal       | Improve single output  | Learn across attempts
Memory     | Current draft only     | Episodic memory of past attempts
Iterations | Refine one draft       | Multiple fresh attempts
Tools      | Not required           | Often includes search/tools
Use Case   | Writing, content       | Problem-solving, research

Key difference: Reflection refines a single draft iteratively, while Reflexion makes multiple fresh attempts and learns from failures.

Performance Considerations

Iteration Limits

Recommended max iterations by model size:

Model Size | Max Iterations | Time/Iteration | Total Time
-----------|----------------|----------------|-----------
3B-8B      | 3              | 3-5s           | 9-15s
13B-34B    | 3-4            | 5-8s           | 15-32s
70B+       | 2-3            | 10-15s         | 20-45s

Token Usage

Token consumption per iteration:

Iteration 1: Task + Draft Generation + Critique
Iteration 2: Task + Old Draft + Old Critique + New Draft + New Critique
Iteration 3: Task + Old Draft + Old Critique + New Draft + New Critique
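
As a back-of-envelope illustration (the token counts below are assumptions, not measurements):

python
def estimate_tokens(iterations: int, task: int = 50,
                    draft: int = 400, critique: int = 150) -> int:
    """Rough estimate: every iteration re-sends the task (plus the
    previous draft and critique after the first pass), then produces
    a new draft and a new critique. All sizes are assumed averages."""
    total = 0
    for i in range(iterations):
        prompt = task + (draft + critique if i > 0 else 0)
        total += prompt + draft + critique  # input + new draft + new critique
    return total

print(estimate_tokens(3))  # ~2900 tokens with these assumptions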

Optimization strategies:

  1. Limit max iterations (3-5 typically sufficient)
  2. Use concise prompts
  3. Consider multi-model (fast for generation, strong for critique)
  4. Cache common critique patterns

Best Practices

  1. Set Reasonable Iteration Limits

    • 3-5 iterations usually sufficient
    • Beyond a point, more iterations don't improve quality
  2. Use Clear Approval Signals

    • Explicit "APPROVED" keyword in critique
    • Avoid ambiguous approval language
  3. Provide Specific Critique Prompts

    • Define clear evaluation criteria
    • Ask for actionable, specific feedback
    • Avoid generic "make it better" feedback
  4. Consider Multi-Model Approach

    • Fast model for generation (creative)
    • Strong model for critique (analytical)
    • Balance quality and speed
  5. Track Improvements

    • Log drafts across iterations
    • Monitor critique patterns
    • Identify common failure modes
  6. Use Multi-Criteria for Complex Tasks

    • Structured scoring across dimensions
    • Clearer feedback signals
    • Better improvement tracking

Troubleshooting

Issue: Infinite Approval/Rejection Loop

Problem: Critic never approves or always approves

Solution:

python
# Make approval criteria explicit in prompt
CRITIC_PROMPT = """Evaluate on these EXACT criteria:
1. Addresses all parts of task: Yes/No
2. Clear and well-organized: Yes/No
3. No errors: Yes/No

Respond "APPROVED" ONLY if all three are Yes."""

Issue: Minimal Changes Between Iterations

Problem: Generator makes cosmetic changes, not substantive improvements

Solution:

python
GENERATOR_PROMPT = """When revising, focus on SUBSTANTIVE improvements:
- Address the specific concerns in the critique
- Don't just rephrase, improve the content
- Add missing information or remove errors
- Improve structure if needed"""

Issue: High Token Consumption

Problem: Too many tokens used per iteration

Solution:

  1. Reduce max iterations
  2. Use concise prompts
  3. Consider multi-model approach
  4. Summarize critique feedback
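
A sketch of strategy 4, condensing the critique before it is fed back to the generator (summarize_critique is a hypothetical helper, not part of the package):

python
from langchain_core.messages import HumanMessage

def summarize_critique(llm, critique: str, max_words: int = 60) -> str:
    """Hypothetical helper: condense a long critique to its
    actionable points before the next revision pass."""
    response = llm.invoke([HumanMessage(content=(
        f"Summarize this critique in at most {max_words} words, "
        f"keeping only actionable feedback:\n\n{critique}"
    ))])
    return response.content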

Examples

Example 1: Essay Writing

python
graph = create_reflection_graph(llm, max_iterations=3)

result = run_reflection_task(
    graph=graph,
    task="Write a 300-word essay on the importance of AI safety research"
)

print(f"Final essay ({result['iteration']} iterations):")
print(result['draft'])

Example 2: Professional Email

python
result = run_reflection_task(
    graph=graph,
    task="""Write a professional email declining a meeting due to conflict.
    Include: acknowledgment, brief reason, alternative suggestion."""
)

Example 3: Technical Documentation

python
graph = create_reflection_graph(
    llm=llm,
    max_iterations=4,
    use_multi_criteria=True,
    approval_threshold=8,
)

result = run_reflection_task(
    graph=graph,
    task="Write API documentation for a REST endpoint with examples"
)

Further Reading

  • Tutorial 23: Reflexion - Learning across multiple attempts
  • Tutorial 21: Plan-and-Execute - Break down tasks before execution
  • Tutorial 20: Multi-Agent Patterns - Multiple agents with specialization

Summary

The Reflection Pattern enables iterative improvement of LLM outputs through:

  • Generate → Critique → Revise loop
  • Explicit approval signals
  • Configurable iteration limits
  • Multi-criteria scoring options
  • Multi-model flexibility

Use Reflection for content creation tasks where quality improvement through iteration adds value without requiring episodic memory or learning across attempts.

Quiz

Test your understanding of the Reflection pattern:

Knowledge Check

What are the two main termination conditions for the Reflection pattern?

A. Timeout or user cancellation
B. Critic explicitly approves (APPROVED) or maximum iterations reached
C. Perfect score achieved or no more changes possible
D. Context limit exceeded or memory exhausted

Knowledge Check

What is the key architectural difference between Reflection and Reflexion patterns?

A. Reflection uses external tools while Reflexion does not
B. Reflection is always faster than Reflexion
C. Reflection refines a single draft iteratively, while Reflexion makes multiple fresh attempts with episodic memory of all past attempts
D. They are the same pattern with different naming conventions

Knowledge Check

Why would you use a multi-model approach in Reflection?

A. To reduce costs only without any quality benefits
B. To use a fast, creative model for generation and a stronger, analytical model for critique
C. Because single models are incapable of performing reflection
D. To intentionally slow down the process for more thorough results

Knowledge Check

What should the Critic prompt include to ensure effective evaluation?

A. Only a generic request to make the draft better
B. Multiple specific evaluation criteria (completeness, clarity, accuracy, tone) and an explicit approval signal keyword
C. A numerical score without any explanation
D. Just the original task without the current draft

Knowledge Check

When should you NOT use the Reflection pattern?

A. For content creation tasks like essays and documentation
B. For code generation with quality reviews
C. For simple one-shot queries, time-critical tasks, or when you need to learn from multiple independent attempts
D. For professional email writing

Knowledge Check (Fill In)

What is the recommended maximum number of iterations for the Reflection pattern?