Setup Guide
This guide will help you set up your development environment for building LangGraph agents with local LLMs via Ollama. Follow these steps to ensure everything works before starting the tutorials.
Prerequisites
Before you begin, make sure you have:
1. Python 3.12 or Higher
Check your Python version:
python --version
# or
python3 --version
If you need to install or upgrade Python:
- macOS: brew install python@3.12
- Ubuntu/Debian: sudo apt install python3.12 python3.12-venv python3-pip
- Windows: Download from python.org
Why Python 3.12+?
This project relies on modern Python typing and async features and requires Python 3.12 or newer. Python 3.13 is also supported.
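If you want your own scripts to fail fast on an older interpreter, a minimal runtime guard works too (a sketch, not part of the tutorials):

```python
import sys

# Abort early if the interpreter is older than 3.12
assert sys.version_info >= (3, 12), f"Python 3.12+ required, found {sys.version}"
```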
2. Ollama Setup Options
You have two options for running Ollama with your agents. Choose based on your needs:
Option 1: Ollama Local Serve (Recommended)
Recommended for Production & Learning
ollama-local-serve provides production-grade observability with real-time monitoring, request/response logging, GPU metrics, and LangChain integration. Perfect for development, debugging, and production deployments.
ollama-local-serve is a complete Ollama wrapper with built-in monitoring and observability:
Key Features:
- Real-time monitoring dashboard with GPU metrics
- Request/response logging and tracing
- Prometheus metrics for production monitoring
- Redis-backed response caching
- Docker and Kubernetes ready
- Built-in LangChain integration
- LAN/network deployment support
Quick Start with pip:
# Install with all features
pip install ollama-local-serve[all]
# Initialize configuration and start services
make init && make up
Quick Start with Docker:
# Clone the repository
git clone https://github.com/AbhinaavRamesh/ollama-local-serve.git
cd ollama-local-serve
# Start all services
docker-compose up -d
Access Points:
- Dashboard: http://localhost:3000 - Real-time monitoring UI
- API Server: http://localhost:8000 - FastAPI with logging
- Ollama: http://localhost:11434 - Standard Ollama API
LangChain Integration Example:
from langchain_community.llms import Ollama

# Use the monitored endpoint instead of direct Ollama
llm = Ollama(
    base_url="http://localhost:8000",  # ollama-local-serve endpoint
    model="llama3.2:3b"
)

# All requests are now logged and monitored
response = llm.invoke("Explain LangGraph")

# View request/response in dashboard at localhost:3000
Architecture Overview:
See the ollama-local-serve GitHub repository for full documentation.
Option 2: Basic Ollama Setup
Quick Prototyping Only
Basic Ollama setup is great for quick experiments but lacks monitoring, logging, and production features. Consider using ollama-local-serve for serious development.
For minimal setup without monitoring features:
Install Ollama:
macOS and Linux:
curl -fsSL https://ollama.com/install.sh | sh
Windows: Download the installer from ollama.com
Verify Ollama is Running:
ollama --version
# Should output: ollama version 0.x.x
# Check if the service is running
curl http://localhost:11434/api/version
# Should return JSON with version info
If Ollama isn't running, start it:
- macOS/Windows: Ollama starts automatically on login
- Linux: systemctl start ollama or run ollama serve in a terminal
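If you prefer to check from Python rather than curl, here is a minimal sketch using only the standard library; it queries the same /api/version endpoint shown above:

```python
import json
from urllib.request import urlopen

# Query the standard Ollama version endpoint; raises URLError if the server is unreachable
with urlopen("http://localhost:11434/api/version", timeout=5) as resp:
    print(json.load(resp))  # e.g. {"version": "0.x.x"}
```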
Comparison: When to Use Each Option
| Feature | ollama-local-serve | Basic Ollama |
|---|---|---|
| Setup Time | 2-3 minutes | 1 minute |
| Monitoring Dashboard | ✅ Real-time UI | ❌ None |
| Request Logging | ✅ Full tracing | ❌ None |
| GPU Metrics | ✅ Yes | ❌ No |
| Response Caching | ✅ Redis-backed | ❌ No |
| Production Ready | ✅ Yes | ⚠️ Limited |
| Docker Support | ✅ Full support | ⚠️ Manual |
| Prometheus Metrics | ✅ Built-in | ❌ None |
| LAN Deployment | ✅ Easy | ⚠️ Manual |
| Debugging Tools | ✅ Extensive | ❌ None |
Use ollama-local-serve if:
- You want to monitor your agent's LLM calls
- You're building production applications
- You need request/response debugging
- You want GPU utilization metrics
- You're deploying to Docker/Kubernetes
- You need response caching for performance
Use basic Ollama if:
- You're doing quick one-off experiments
- You don't need any monitoring
- You want the absolute minimal setup
3. Git (for cloning the repository)
git --version
If not installed:
- macOS: brew install git
- Ubuntu/Debian: sudo apt install git
- Windows: Download from git-scm.com
Installation
Option 1: Install from PyPI (Recommended)
The easiest way to get started is to install the package directly from PyPI:
# Create and activate a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install with all features
pip install langgraph-ollama-local[all]
# Verify installation
langgraph-local check
Install only what you need:
# Core patterns only (tutorials 1-7)
pip install langgraph-ollama-local
# Add RAG dependencies (tutorials 8-13)
pip install langgraph-ollama-local[rag]
# Add notebook support
pip install langgraph-ollama-local[notebooks]
# All features
pip install langgraph-ollama-local[all]
Option 2: Install from Source (For Development)
If you want to modify the code or contribute:
git clone https://github.com/AbhinaavRamesh/langgraph-ollama-tutorial.git
cd langgraph-ollama-tutorial
# Create virtual environment
python -m venv venv
# Activate it
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
# Install in development mode with all dependencies
pip install -e ".[all]"You should see (venv) in your terminal prompt when activated.
What Gets Installed
The [all] option installs:
- Core LangGraph and LangChain dependencies
- RAG dependencies (ChromaDB, FAISS, embeddings)
- Persistence backends (SQLite, Redis support)
- Development tools (pytest, linting)
- Jupyter notebook support
Verify Installation
langgraph-local check
This command will:
- Verify Ollama connection
- Check if required models are available
- Display your current configuration
Expected output:
Checking Ollama connection...
✓ Connected to Ollama at http://127.0.0.1:11434
✓ Ollama version: 0.x.x
Checking for models...
✗ Recommended model 'llama3.2:3b' not found
Configuration:
OLLAMA_HOST: 127.0.0.1
OLLAMA_PORT: 11434
OLLAMA_MODEL: llama3.2:3b
Ollama Model Setup
Recommended Model: llama3.2:3b
For the best balance of speed and quality on consumer hardware, we recommend llama3.2:3b:
ollama pull llama3.2:3b
This model:
- Size: ~2GB download
- RAM: Requires ~4GB
- Speed: Fast responses on CPU
- Quality: Good for most tutorial tasks
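If you prefer to manage models from Python instead of the CLI, a sketch using the optional ollama client library (an extra dependency, not installed by this project: pip install ollama):

```python
import ollama  # optional client library: pip install ollama

# Download the recommended model (a no-op if it is already present)
ollama.pull("llama3.2:3b")

# Show what is available locally (equivalent to `ollama list`)
print(ollama.list())
```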
Verify the Model
ollama list
You should see llama3.2:3b in the output:
NAME ID SIZE MODIFIED
llama3.2:3b    a80c4f17acd5    2.0 GB    2 days ago
Alternative Models
Depending on your hardware, you might prefer:
| Model | Size | RAM | Use Case |
|---|---|---|---|
| llama3.2:1b | 1.3GB | 2GB | Fastest, limited capability |
| llama3.2:3b | 2.0GB | 4GB | Recommended - best balance |
| llama3.1:8b | 4.7GB | 8GB | Better quality, slower |
| qwen2.5:7b | 4.7GB | 8GB | Good for code tasks |
| llama3.1:70b | 40GB | 64GB | Highest quality, GPU recommended |
To use a different model:
# Pull the model
ollama pull llama3.1:8b
# Set it as default in .env (see Configuration below)
OLLAMA_MODEL=llama3.1:8b
Model Performance
Smaller models (1b, 3b) work well for learning but may struggle with complex multi-agent scenarios. For production use, consider 7b or larger models.
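If you want to switch models per script rather than project-wide via .env, the model name can also be passed directly to the LangChain client. A sketch using the same langchain_community client shown earlier (the .env route remains the tutorials' default):

```python
from langchain_community.llms import Ollama

# Override the model for this client only; .env still sets the project-wide default
llm = Ollama(base_url="http://localhost:11434", model="llama3.1:8b")
print(llm.invoke("Reply with OK if you can read this."))
```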
Configuration
Create Environment File
cp .env.example .env
Edit .env with your settings based on which Ollama option you chose:
For ollama-local-serve (Option 1):
# Ollama Configuration
OLLAMA_HOST=127.0.0.1 # localhost or LAN server IP
OLLAMA_PORT=8000 # ollama-local-serve API port (for monitoring)
OLLAMA_MODEL=llama3.2:3b # model to use by default
# Monitoring & Caching (ollama-local-serve)
MONITORING_ENABLED=true
REDIS_HOST=localhost
REDIS_PORT=6379
# Optional: Web Search (for CRAG tutorials 10, 13)
TAVILY_API_KEY=          # Get free key at tavily.com
For basic Ollama (Option 2):
# Ollama Configuration
OLLAMA_HOST=127.0.0.1 # localhost or LAN server IP
OLLAMA_PORT=11434 # default Ollama port
OLLAMA_MODEL=llama3.2:3b # model to use by default
# Optional: Web Search (for CRAG tutorials 10, 13)
TAVILY_API_KEY= # Get free key at tavily.com
# Monitoring disabled for basic setup
MONITORING_ENABLED=false
Configuration Options
| Variable | Default | Description |
|---|---|---|
| OLLAMA_HOST | 127.0.0.1 | Ollama server IP address |
| OLLAMA_PORT | 11434 or 8000 | Use 8000 for ollama-local-serve, 11434 for basic Ollama |
| OLLAMA_MODEL | llama3.2:3b | Default model for all agents |
| MONITORING_ENABLED | false | Enable monitoring features (ollama-local-serve only) |
| REDIS_HOST | localhost | Redis server for caching (ollama-local-serve only) |
| REDIS_PORT | 6379 | Redis port (ollama-local-serve only) |
| TAVILY_API_KEY | (optional) | Web search API key for CRAG tutorials |
| LOG_LEVEL | INFO | Logging level: DEBUG, INFO, WARNING |
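As a rough illustration of how these variables map onto a client, here is a sketch that reads them with python-dotenv (an extra dependency assumed only for illustration; the project's own tooling handles configuration for the tutorials):

```python
import os
from dotenv import load_dotenv  # assumption: python-dotenv is installed

load_dotenv()  # copy values from .env into the process environment

host = os.getenv("OLLAMA_HOST", "127.0.0.1")
port = os.getenv("OLLAMA_PORT", "11434")
model = os.getenv("OLLAMA_MODEL", "llama3.2:3b")

print(f"Using {model} at http://{host}:{port}")
```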
Verify Configuration
langgraph-local config
Expected output (with ollama-local-serve):
Current Configuration:
Ollama Host: http://127.0.0.1:8000
Default Model: llama3.2:3b
Web Search: Disabled (no Tavily key)
Monitoring: Enabled
Redis: localhost:6379
Expected output (basic Ollama):
Current Configuration:
Ollama Host: http://127.0.0.1:11434
Default Model: llama3.2:3b
Web Search: Disabled (no Tavily key)
Monitoring: Disabled
Optional: Web Search Setup
Tutorials 10 (CRAG) and 13 (Perplexity Clone) use web search for retrieving external information. This is optional but recommended for the full experience.
Get a Tavily API Key
- Sign up at tavily.com
- Get your free API key (1000 searches/month)
- Add to .env:
TAVILY_API_KEY=tvly-your-key-here
Test Web Search
from tavily import TavilyClient
client = TavilyClient(api_key="tvly-your-key")
results = client.search("What is LangGraph?")
print(results)
Alternative Search Tools
You can also use DuckDuckGo (no API key required) or Google Custom Search. See the CRAG tutorial for alternatives.
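For a keyless option, a sketch using the DuckDuckGo tool from langchain_community (assumes the duckduckgo-search package is installed):

```python
from langchain_community.tools import DuckDuckGoSearchRun  # needs: pip install duckduckgo-search

search = DuckDuckGoSearchRun()
print(search.invoke("What is LangGraph?"))
```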
Optional: LAN Server Setup
Running Ollama on a GPU-equipped server and accessing it from laptops/workstations? This is a common setup for teams or when you want to centralize GPU resources.
Using ollama-local-serve (Recommended)
Best for LAN Deployments
ollama-local-serve is specifically designed for LAN deployments with built-in monitoring and multi-client support. See Option 1 above for features and benefits.
On your GPU server:
# Install ollama-local-serve
pip install ollama-local-serve[all]
# Configure for LAN access
# Edit .env to set OLLAMA_HOST=0.0.0.0
make init
# Start all services (Ollama, FastAPI, Dashboard, Redis)
make up
# Access dashboard at http://your-server-ip:3000
On your development machine(s):
Update your project's .env:
OLLAMA_HOST=192.168.1.100 # your GPU server IP
OLLAMA_PORT=8000 # ollama-local-serve API port (for monitoring)
# OR use port 11434 for direct Ollama access (no monitoring)
MONITORING_ENABLED=true
REDIS_HOST=192.168.1.100      # for shared caching
Benefits of LAN Setup with ollama-local-serve:
- All team members can monitor LLM usage in real-time
- Shared Redis cache improves response times
- Centralized logging and metrics
- GPU utilization visible to everyone
- No configuration changes needed in your LangGraph code
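Pointing the tutorials at the server is purely an environment change, but for reference, here is a sketch of the equivalent explicit client (the IP is the example server address above; passing base_url directly is an alternative to the .env route):

```python
from langchain_community.llms import Ollama

# Talk to the ollama-local-serve API on the GPU server
# (use port 11434 instead for direct, unmonitored Ollama access)
llm = Ollama(base_url="http://192.168.1.100:8000", model="llama3.2:3b")
print(llm.invoke("Hello from a LAN client"))
```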
Manual Ollama LAN Setup
If you prefer the basic Ollama setup without monitoring:
On the server:
# Configure Ollama to bind to all interfaces
export OLLAMA_HOST=0.0.0.0:11434
ollama serve
On your development machine:
# Test connection
curl http://your-server-ip:11434/api/version
# Update .env
OLLAMA_HOST=your-server-ip
OLLAMA_PORT=11434
Security Note
Ollama and ollama-local-serve have no built-in authentication. Only expose on trusted networks or use a VPN/SSH tunnel for remote access. Consider using a reverse proxy with authentication for production deployments.
Verifying Installation
Quick Test
Create a simple test file test_setup.py:
from langgraph_ollama_local import LocalAgentConfig
from langchain_core.messages import HumanMessage
# Initialize
config = LocalAgentConfig()
llm = config.create_chat_client()
# Test basic call
response = llm.invoke([HumanMessage(content="Say hello!")])
print(response.content)
Run it:
python test_setup.py
Expected output:
Hello! How can I help you today?
Run Example Notebook
# Start Jupyter
jupyter lab
# Open examples/core_patterns/01_chatbot_basics.ipynb
Run all cells. If everything works, you're ready to start the tutorials!
Troubleshooting
Ollama Connection Issues
Problem: Connection refused to localhost:11434
Solutions:
- Check if Ollama is running: curl http://localhost:11434/api/version
- Restart Ollama: ollama serve (Linux) or restart the app (macOS/Windows)
- Check firewall settings if using a remote server
Model Not Found
Problem: Model 'llama3.2:3b' not found
Solutions:
# List available models
ollama list
# Pull the model
ollama pull llama3.2:3b
# Or use a different model
OLLAMA_MODEL=llama3.1:8b
Import Errors
Problem: ModuleNotFoundError: No module named 'langgraph'
Solutions:
# Make sure virtual environment is activated
source venv/bin/activate # or venv\Scripts\activate on Windows
# Reinstall
pip install -e ".[all]"
# Or install specific dependencies
pip install langgraph langchain-core
Slow Generation
Problem: Model takes too long to respond
Solutions:
- Use a smaller model: ollama pull llama3.2:1b
- Check CPU/memory usage: top or htop
- Consider using a GPU server with LAN setup
- Reduce context length in configuration (see the sketch below)
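For the context-length suggestion, a sketch using the same langchain_community client as earlier; num_ctx and num_predict are Ollama options it exposes, and the exact values are examples to tune for your hardware:

```python
from langchain_community.llms import Ollama

# Smaller model, shorter context window, and a cap on generated tokens
llm = Ollama(
    base_url="http://localhost:11434",
    model="llama3.2:1b",
    num_ctx=2048,      # shrink the context window
    num_predict=256,   # limit tokens generated per call
)
print(llm.invoke("Summarize LangGraph in one sentence."))
```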
Memory Errors
Problem: RuntimeError: Out of memory
Solutions:
- Switch to a smaller model (1b instead of 3b)
- Close other applications
- Increase system swap/page file
- Use quantized models:
ollama pull llama3.2:3b-q4_0
Jupyter Kernel Issues
Problem: Jupyter can't find the installed packages
Solutions:
# Install ipykernel in your venv
pip install ipykernel
# Create kernel spec
python -m ipykernel install --user --name=langgraph-env
# Select 'langgraph-env' as kernel in Jupyter
Next Steps
Once everything is installed and verified:
- Start with Tutorial 01: Chatbot Basics
- Browse all tutorials: Tutorial Index
- Join the community: GitHub Discussions
Additional Resources
Need Help?
- Check GitHub Issues for known problems
- Ask in GitHub Discussions
- FAQ section (coming soon)
Ready to start building? Head to Tutorial 01: Chatbot Basics to begin your LangGraph journey!