#langchain#agents#langgraph#create-agent#tools#web-search#streaming#dynamic-model-selection#qwen3#ollama#python

LangChain Agents with create_agent

Build autonomous LangChain v1 agents using create_agent — wire web search tools, configure model parameters, implement dynamic model switching, and stream agent responses.

Jun 4, 2026 at 10:30 AM8 min readFollowFollow (Hindi)

Topics You Will Master

Understanding how create_agent abstracts the tool calling loop into an autonomous agent
Creating an agent with explicit ChatOllama model configuration and a web_search tool
Key ChatOllama model parameters: temperature, num_predict, top_k, top_p, repeat_penalty, num_ctx, reasoning
Building a reusable tools.py module with a DuckDuckGo-backed web_search tool
Invoking the agent and inspecting the full message trajectory (HumanMessageAIMessage with tool calls → ToolMessage → final AIMessage)
Implementing dynamic model selection middleware with wrap_model_call — switching between qwen3 (fast) and llama3.2 (advanced) based on conversation length
Streaming agent responses with three stream_mode options: "values", "updates", and "messages"
Best For

Developers who understand tool calling and want to automate the full reasoning-and-action loop with LangChain's create_agent abstraction.

Expected Outcome

A fully functional LangChain agent that autonomously searches the web, handles multi-turn conversations, and selects the optimal LLM dynamically — with live streaming output.

An agent automates the tool calling loop from the previous lesson. Instead of manually building [HumanMessage → AIMessage → ToolMessage → AIMessage] sequences, you call agent.invoke({"messages": question}) and the agent handles reasoning, tool dispatch, result injection, and final response generation autonomously. Under the hood, LangChain v1 agents are compiled LangGraph StateGraph objects.

Prerequisites: langchain, langchain-community, langchain-core, langgraph, ddgs, python-dotenv installed. Ollama running with qwen3 and llama3.2 (or gemma3) pulled.

BASH
pip install -U langchain langchain-community langchain-core langgraph
pip install -U ddgs python-dotenv
ollama pull qwen3
ollama pull gemma3

LangChain & Ollama — Local AI Development

Build production-ready LLM apps entirely on your own hardware. No API keys, no cloud costs.

Enroll on Udemy →

The tools.py Module

The tools.py file in the project directory defines a reusable web_search tool backed by DuckDuckGo via the ddgs library. This pattern separates tool definitions from agent logic — import the module in any notebook without re-defining tools:

PYTHON
# tools.py
from dotenv import load_dotenv
load_dotenv()

from langchain_core.tools import tool
from ddgs import DDGS

@tool
def web_search(query: str, num_results: int = 10) -> str:
    """Search the web using DuckDuckGo.

    Args:
        query: Search query string
        num_results: Number of results to return (default: 10)

    Returns:
        Formatted search results with titles, descriptions, and URLs
    """
    try:
        results = list(DDGS().text(
            query=query,
            max_results=num_results,
            region="us-en",
            timelimit="d",
            backend="google, bing, brave, yahoo, wikipedia, duckduckgo"
        ))

        if not results:
            return f"No results found for '{query}'"

        formatted_results = [f"Search Results for '{query}':\n"]
        for i, result in enumerate(results, 1):
            title = result.get('title', 'No title')
            body = result.get('body', 'No description available')
            href = result.get('href', '')
            formatted_results.append(f"{i}. **{title}**\n   {body}\n   {href}")

        return "\n\n".join(formatted_results)

    except Exception as e:
        return f"Search error: {str(e)}"

Test the tool directly:

PYTHON
import tools

print(tools.web_search.invoke({"query": "What is Langchain?", "num_results": 1}))
OUTPUT
Search Results for 'What is Langchain?':

1. **What is Langchain? - Analytics Vidhya**
   This is where LangChain comes into play, a powerful open-source Python framework designed to simplify the development of LLM-powered applications.
   https://www.analyticsvidhya.com/blog/2024/06/langchain-guide/

Agent with Explicit Model Instance

Model Parameters

Configure ChatOllama with explicit parameters for precise control over model behaviour:

Parameter Description
temperature Randomness (0.0 = deterministic, 1.0 = very creative)
num_predict Maximum tokens to generate (equivalent to max_tokens)
top_k Number of highest-probability tokens to consider at each step
top_p Cumulative probability threshold for nucleus sampling
repeat_penalty Penalty multiplier for repeating tokens
num_ctx Context window size in tokens
reasoning Enable chain-of-thought reasoning (Qwen3 specific)

Your First Agent

PYTHON
from langchain_ollama import ChatOllama
from langchain.agents import create_agent
import tools

system_prompt = """You are a helpful AI assistant.
Use the available tools when needed to answer questions accurately.
If you need to search for information, use the web_search tool.
Always provide clear and concise answers.
"""

model = ChatOllama(model="qwen3", base_url="http://localhost:11434")

agent = create_agent(model=model, tools=[tools.web_search], system_prompt=system_prompt)

create_agent returns a compiled LangGraph StateGraph. It automatically manages the message loop — calling the model, detecting tool calls, executing them, and continuing until the model produces a final answer with no pending tool calls.

Invoking the Agent

PYTHON
result = agent.invoke({"messages": "What is the top 10 global news right now?"})
result
PYTHON
{'messages': [
  HumanMessage(content='What is the top 10 global news right now?', ...),
  AIMessage(content='', ..., tool_calls=[{'name': 'web_search', 'args': {'num_results': 10, 'query': 'top 10 global news'}, ...}], ...),
  ToolMessage(content="Search Results for 'top 10 global news':\n\n1. **What are the top 10 global news stories that have made**...", name='web_search', ...),
  AIMessage(content='Here are the top 10 global news stories...', ...)
]}

The messages list shows the full reasoning trajectory: the agent searched the web, received results, and synthesized an answer.

Passing Existing Results Back to the Agent

You can re-invoke the agent with the previous result to continue the conversation:

PYTHON
result1 = agent.invoke(result)

The agent resumes from the existing message history — useful for multi-turn follow-ups.


Experimenting with Model Settings

Compare different model configurations to observe how parameters affect output quality and reasoning style:

PYTHON
question = "What is the capital of France? Provide a brief explanation."

model1 = ChatOllama(
    model="qwen3",
    base_url="http://localhost:11434",
    temperature=0,
    top_p=1,
    repeat_penalty=1.2,
    num_predict=1000,
    num_ctx=4096,
    reasoning=True
)

agent1 = create_agent(model=model1, tools=[tools.web_search], system_prompt=system_prompt)
result1 = agent1.invoke({"messages": question})

With reasoning=True, the model includes its chain-of-thought in additional_kwargs['reasoning_content'] before calling the tool:

PLAINTEXT
reasoning_content: "Okay, the user is asking for the capital of France and a brief explanation.
Let me start by recalling what I know. France is a country in Europe, and I believe the capital
is Paris. But wait, I should make sure that's correct. Maybe I should check using the web_search
tool to confirm..."

Final answer after web search:

PYTHON
The capital of France is **Paris**. It has served as the political and administrative center of
France since the 3rd century, though it became the official capital after being liberated in 1944.
Paris is renowned for its cultural landmarks, historical significance, and role as a global hub
for art, fashion, and commerce.

Tip

reasoning=True enables Qwen3's extended thinking mode — the model reasons step-by-step before committing to a tool call or final answer. This improves accuracy for complex queries but increases latency and token usage.


Dynamic Model Selection

For cost-optimized deployments, automatically switch between a fast and a capable model based on conversation complexity. LangChain v1 supports middleware via wrap_model_call that intercepts each model call to inspect and modify the request:

Selection Logic

  • < 3 messagesqwen3 (fast, efficient for simple queries)
  • ≥ 3 messagesllama3.2 (better reasoning, longer context)

Real-World Applications

  • Customer service bots — simple queries use the fast model; complex escalations switch to the advanced model
  • Research assistants — quick fact lookups stay on Qwen3; multi-step analysis moves to a larger model
PYTHON
from langchain_ollama import ChatOllama
from langchain.agents import create_agent, AgentState
from langgraph.runtime import Runtime
import tools
from langchain.agents.middleware import wrap_model_call, ModelRequest, ModelResponse

basic_model = ChatOllama(model="qwen3", base_url="http://localhost:11434", num_predict=1000)
advanced_model = ChatOllama(model="llama3.2", base_url="http://localhost:11434", num_predict=1000)

@wrap_model_call
def dynamic_model_selection(request: ModelRequest, handler) -> ModelResponse:
    message_count = len(request.state["messages"])

    if message_count < 3:
        print(f"Using Qwen3 for {message_count} messages")
        request.model = basic_model
    else:
        print(f"Using llama3.2 for {message_count} messages")
        request.model = advanced_model

    return handler(request)

Attach the middleware to the agent:

PYTHON
agent = create_agent(
    model=basic_model,
    tools=[tools.web_search],
    system_prompt=system_prompt,
    middleware=[dynamic_model_selection]
)

Verify the agent is a compiled graph:

PLAINTEXT
agent
PLAINTEXT
<langgraph.graph.state.CompiledStateGraph object at 0x0000018E9E7D54D0>

Helper Function

PYTHON
def get_agent_output(messages: list):
    messages = {'messages': messages}
    result = agent.invoke(messages)
    return result

Run with two messages:

PYTHON
messages = ["How are you?", "What's the weather in Mumbai today?"]
result = get_agent_output(messages)
OUTPUT
Using Qwen3 for 2 messages
Using llama3.2 for 4 messages

The first call starts with 2 messages → Qwen3 is used. After the tool call adds a ToolMessage, the count reaches 4 → switches to llama3.2 for the final synthesis.

Full result trajectory:

PLAINTEXT
result
PYTHON
{'messages': [
  HumanMessage(content='How are you?', ...),
  HumanMessage(content="What's the weather in Mumbai today?", ...),
  AIMessage(content='', ..., tool_calls=[{'name': 'web_search', 'args': {'num_results': 10, 'query': 'current weather in Mumbai'}, ...}], ...),
  ToolMessage(content="Search Results for 'current weather in Mumbai':\n\n1. Mumbai weather: Sunny skies...\n   https://timesofindia.indiatimes.com/...\n\n2. Mumbai weather update: IMD predicts partly cloudy skies...", ...),
  AIMessage(content='The weather in Mumbai today is sunny with temperatures reaching 32.9°C...', ...)
]}

Streaming Agent Responses

Instead of waiting for the full result, stream the agent's output as it generates. Three modes are available:

stream_mode="values" — Full State at Each Step

Returns the complete message list after every step. Best for displaying incremental progress:

PYTHON
for chunk in agent.stream({"messages": messages}, stream_mode="values"):
    print(chunk['messages'][-1].content, end='', flush=True)
    print("\n\n------")
OUTPUT
What's the weather in Mumbai today?

------
Using Qwen3 for 2 messages

------
Search Results for 'weather in Mumbai today':

1. **Mumbai Weather Today 21 October 2025, Tomorrow & Weekly IMD**
   Mumbai Weather Today: temperatures will be between 27 C and 37 C...
   https://www.timesnownews.com/weather/mumbai
...

------

stream_mode="updates" — Only Changed State

Returns only the delta at each step — more efficient for processing:

PYTHON
for chunk in agent.stream({"messages": messages}, stream_mode="updates"):
    if 'model' in chunk:
        chunk['model']['messages'][-1].pretty_print()
    if 'tools' in chunk:
        chunk['tools']['messages'][-1].pretty_print()
    print("\n\n------")
PYTHON
Using Qwen3 for 2 messages
================================== Ai Message ==================================

I'm just a helpful AI assistant, but I'm here and ready to help! 😊

For the weather in Mumbai today, let me check that for you:
Tool Calls:
  web_search (5baa36dd-...)
   Args:
     num_results: 5
     query: weather in Mumbai today

------
================================= Tool Message =================================
Name: web_search

Search Results for 'weather in Mumbai today':

1. **Mumbai Weather Today 21 October 2025...**
   Mumbai Weather Today: temperatures will be between 27 C and 37 C...
...
------

Tip

Use stream_mode="values" for building chat UIs where you display the growing conversation. Use stream_mode="updates" for logging pipelines where you only care about what changed. Use stream_mode="messages" (not shown) for token-by-token streaming of the final answer only.

Agent Streaming Modes & Configurations

Streaming Mode Output Deltas Best Use Case
stream_mode="values" Full conversation history at each step Building real-time chat UIs to render state progression
stream_mode="updates" Only the raw state change of the current step Backend logging, auditing agent trajectories, or routing pipelines
stream_mode="messages" Token-by-token generation of the final assistant response Displaying live typing effects for user interfaces

What You Built

In this lesson you built fully autonomous LangChain agents using create_agent:

  • Basic agentcreate_agent(model, tools, system_prompt) + .invoke() runs the complete tool loop automatically
  • Model parameterstemperature, num_predict, top_p, reasoning documented with their effects
  • tools.py — reusable web_search tool backed by DuckDuckGo, importable across notebooks
  • Dynamic model selectionwrap_model_call middleware switches between qwen3 and llama3.2 based on message count
  • Streaming — three modes (values, updates, messages) for different consumption patterns

The agent is a compiled CompiledStateGraph — a LangGraph state machine that automatically manages the reasoning loop until completion.

Find this tutorial useful?

Subscribe to our YouTube channels for more practical production walk-throughs.

Discussion & Comments