An agent automates the tool calling loop from the previous lesson. Instead of manually building [HumanMessage → AIMessage → ToolMessage → AIMessage] sequences, you call agent.invoke({"messages": question}) and the agent handles reasoning, tool dispatch, result injection, and final response generation autonomously. Under the hood, LangChain v1 agents are compiled LangGraph StateGraph objects.
Prerequisites: langchain, langchain-community, langchain-core, langgraph, ddgs, python-dotenv installed. Ollama running with qwen3 and llama3.2 (or gemma3) pulled.
pip install -U langchain langchain-community langchain-core langgraph
pip install -U ddgs python-dotenv
ollama pull qwen3
ollama pull gemma3
The tools.py Module
The tools.py file in the project directory defines a reusable web_search tool backed by DuckDuckGo via the ddgs library. This pattern separates tool definitions from agent logic — import the module in any notebook without re-defining tools:
# tools.py
from dotenv import load_dotenv
load_dotenv()
from langchain_core.tools import tool
from ddgs import DDGS
@tool
def web_search(query: str, num_results: int = 10) -> str:
"""Search the web using DuckDuckGo.
Args:
query: Search query string
num_results: Number of results to return (default: 10)
Returns:
Formatted search results with titles, descriptions, and URLs
"""
try:
results = list(DDGS().text(
query=query,
max_results=num_results,
region="us-en",
timelimit="d",
backend="google, bing, brave, yahoo, wikipedia, duckduckgo"
))
if not results:
return f"No results found for '{query}'"
formatted_results = [f"Search Results for '{query}':\n"]
for i, result in enumerate(results, 1):
title = result.get('title', 'No title')
body = result.get('body', 'No description available')
href = result.get('href', '')
formatted_results.append(f"{i}. **{title}**\n {body}\n {href}")
return "\n\n".join(formatted_results)
except Exception as e:
return f"Search error: {str(e)}"
Test the tool directly:
import tools
print(tools.web_search.invoke({"query": "What is Langchain?", "num_results": 1}))
Search Results for 'What is Langchain?':
1. **What is Langchain? - Analytics Vidhya**
This is where LangChain comes into play, a powerful open-source Python framework designed to simplify the development of LLM-powered applications.
https://www.analyticsvidhya.com/blog/2024/06/langchain-guide/
Agent with Explicit Model Instance
Model Parameters
Configure ChatOllama with explicit parameters for precise control over model behaviour:
| Parameter | Description |
|---|---|
temperature |
Randomness (0.0 = deterministic, 1.0 = very creative) |
num_predict |
Maximum tokens to generate (equivalent to max_tokens) |
top_k |
Number of highest-probability tokens to consider at each step |
top_p |
Cumulative probability threshold for nucleus sampling |
repeat_penalty |
Penalty multiplier for repeating tokens |
num_ctx |
Context window size in tokens |
reasoning |
Enable chain-of-thought reasoning (Qwen3 specific) |
Your First Agent
from langchain_ollama import ChatOllama
from langchain.agents import create_agent
import tools
system_prompt = """You are a helpful AI assistant.
Use the available tools when needed to answer questions accurately.
If you need to search for information, use the web_search tool.
Always provide clear and concise answers.
"""
model = ChatOllama(model="qwen3", base_url="http://localhost:11434")
agent = create_agent(model=model, tools=[tools.web_search], system_prompt=system_prompt)
create_agent returns a compiled LangGraph StateGraph. It automatically manages the message loop — calling the model, detecting tool calls, executing them, and continuing until the model produces a final answer with no pending tool calls.
Invoking the Agent
result = agent.invoke({"messages": "What is the top 10 global news right now?"})
result
{'messages': [
HumanMessage(content='What is the top 10 global news right now?', ...),
AIMessage(content='', ..., tool_calls=[{'name': 'web_search', 'args': {'num_results': 10, 'query': 'top 10 global news'}, ...}], ...),
ToolMessage(content="Search Results for 'top 10 global news':\n\n1. **What are the top 10 global news stories that have made**...", name='web_search', ...),
AIMessage(content='Here are the top 10 global news stories...', ...)
]}
The messages list shows the full reasoning trajectory: the agent searched the web, received results, and synthesized an answer.
Passing Existing Results Back to the Agent
You can re-invoke the agent with the previous result to continue the conversation:
result1 = agent.invoke(result)
The agent resumes from the existing message history — useful for multi-turn follow-ups.
Experimenting with Model Settings
Compare different model configurations to observe how parameters affect output quality and reasoning style:
question = "What is the capital of France? Provide a brief explanation."
model1 = ChatOllama(
model="qwen3",
base_url="http://localhost:11434",
temperature=0,
top_p=1,
repeat_penalty=1.2,
num_predict=1000,
num_ctx=4096,
reasoning=True
)
agent1 = create_agent(model=model1, tools=[tools.web_search], system_prompt=system_prompt)
result1 = agent1.invoke({"messages": question})
With reasoning=True, the model includes its chain-of-thought in additional_kwargs['reasoning_content'] before calling the tool:
reasoning_content: "Okay, the user is asking for the capital of France and a brief explanation.
Let me start by recalling what I know. France is a country in Europe, and I believe the capital
is Paris. But wait, I should make sure that's correct. Maybe I should check using the web_search
tool to confirm..."
Final answer after web search:
The capital of France is **Paris**. It has served as the political and administrative center of
France since the 3rd century, though it became the official capital after being liberated in 1944.
Paris is renowned for its cultural landmarks, historical significance, and role as a global hub
for art, fashion, and commerce.
Tip
reasoning=True enables Qwen3's extended thinking mode — the model reasons step-by-step before committing to a tool call or final answer. This improves accuracy for complex queries but increases latency and token usage.
Dynamic Model Selection
For cost-optimized deployments, automatically switch between a fast and a capable model based on conversation complexity. LangChain v1 supports middleware via wrap_model_call that intercepts each model call to inspect and modify the request:
Selection Logic
- < 3 messages →
qwen3(fast, efficient for simple queries) - ≥ 3 messages →
llama3.2(better reasoning, longer context)
Real-World Applications
- Customer service bots — simple queries use the fast model; complex escalations switch to the advanced model
- Research assistants — quick fact lookups stay on Qwen3; multi-step analysis moves to a larger model
from langchain_ollama import ChatOllama
from langchain.agents import create_agent, AgentState
from langgraph.runtime import Runtime
import tools
from langchain.agents.middleware import wrap_model_call, ModelRequest, ModelResponse
basic_model = ChatOllama(model="qwen3", base_url="http://localhost:11434", num_predict=1000)
advanced_model = ChatOllama(model="llama3.2", base_url="http://localhost:11434", num_predict=1000)
@wrap_model_call
def dynamic_model_selection(request: ModelRequest, handler) -> ModelResponse:
message_count = len(request.state["messages"])
if message_count < 3:
print(f"Using Qwen3 for {message_count} messages")
request.model = basic_model
else:
print(f"Using llama3.2 for {message_count} messages")
request.model = advanced_model
return handler(request)
Attach the middleware to the agent:
agent = create_agent(
model=basic_model,
tools=[tools.web_search],
system_prompt=system_prompt,
middleware=[dynamic_model_selection]
)
Verify the agent is a compiled graph:
agent
<langgraph.graph.state.CompiledStateGraph object at 0x0000018E9E7D54D0>
Helper Function
def get_agent_output(messages: list):
messages = {'messages': messages}
result = agent.invoke(messages)
return result
Run with two messages:
messages = ["How are you?", "What's the weather in Mumbai today?"]
result = get_agent_output(messages)
Using Qwen3 for 2 messages
Using llama3.2 for 4 messages
The first call starts with 2 messages → Qwen3 is used. After the tool call adds a ToolMessage, the count reaches 4 → switches to llama3.2 for the final synthesis.
Full result trajectory:
result
{'messages': [
HumanMessage(content='How are you?', ...),
HumanMessage(content="What's the weather in Mumbai today?", ...),
AIMessage(content='', ..., tool_calls=[{'name': 'web_search', 'args': {'num_results': 10, 'query': 'current weather in Mumbai'}, ...}], ...),
ToolMessage(content="Search Results for 'current weather in Mumbai':\n\n1. Mumbai weather: Sunny skies...\n https://timesofindia.indiatimes.com/...\n\n2. Mumbai weather update: IMD predicts partly cloudy skies...", ...),
AIMessage(content='The weather in Mumbai today is sunny with temperatures reaching 32.9°C...', ...)
]}
Streaming Agent Responses
Instead of waiting for the full result, stream the agent's output as it generates. Three modes are available:
stream_mode="values" — Full State at Each Step
Returns the complete message list after every step. Best for displaying incremental progress:
for chunk in agent.stream({"messages": messages}, stream_mode="values"):
print(chunk['messages'][-1].content, end='', flush=True)
print("\n\n------")
What's the weather in Mumbai today?
------
Using Qwen3 for 2 messages
------
Search Results for 'weather in Mumbai today':
1. **Mumbai Weather Today 21 October 2025, Tomorrow & Weekly IMD**
Mumbai Weather Today: temperatures will be between 27 C and 37 C...
https://www.timesnownews.com/weather/mumbai
...
------
stream_mode="updates" — Only Changed State
Returns only the delta at each step — more efficient for processing:
for chunk in agent.stream({"messages": messages}, stream_mode="updates"):
if 'model' in chunk:
chunk['model']['messages'][-1].pretty_print()
if 'tools' in chunk:
chunk['tools']['messages'][-1].pretty_print()
print("\n\n------")
Using Qwen3 for 2 messages
================================== Ai Message ==================================
I'm just a helpful AI assistant, but I'm here and ready to help! 😊
For the weather in Mumbai today, let me check that for you:
Tool Calls:
web_search (5baa36dd-...)
Args:
num_results: 5
query: weather in Mumbai today
------
================================= Tool Message =================================
Name: web_search
Search Results for 'weather in Mumbai today':
1. **Mumbai Weather Today 21 October 2025...**
Mumbai Weather Today: temperatures will be between 27 C and 37 C...
...
------
Tip
Use stream_mode="values" for building chat UIs where you display the growing conversation. Use stream_mode="updates" for logging pipelines where you only care about what changed. Use stream_mode="messages" (not shown) for token-by-token streaming of the final answer only.
Agent Streaming Modes & Configurations
| Streaming Mode | Output Deltas | Best Use Case |
|---|---|---|
stream_mode="values" |
Full conversation history at each step | Building real-time chat UIs to render state progression |
stream_mode="updates" |
Only the raw state change of the current step | Backend logging, auditing agent trajectories, or routing pipelines |
stream_mode="messages" |
Token-by-token generation of the final assistant response | Displaying live typing effects for user interfaces |
What You Built
In this lesson you built fully autonomous LangChain agents using create_agent:
- Basic agent —
create_agent(model, tools, system_prompt)+.invoke()runs the complete tool loop automatically - Model parameters —
temperature,num_predict,top_p,reasoningdocumented with their effects - tools.py — reusable
web_searchtool backed by DuckDuckGo, importable across notebooks - Dynamic model selection —
wrap_model_callmiddleware switches betweenqwen3andllama3.2based on message count - Streaming — three modes (
values,updates,messages) for different consumption patterns
The agent is a compiled CompiledStateGraph — a LangGraph state machine that automatically manages the reasoning loop until completion.