Agentic Memory and Streaming in LangGraph

Learn how to implement thread-based conversation memory with MemorySaver checkpointer and stream graph outputs in LangGraph with local Ollama models.

Jun 15, 2026 · 20 min read

Topics You Will Master

Implementing conversation memory using LangGraph checkpointers
Configuring thread-based session isolation for multi-user chat
Prompting and routing the agent to avoid redundant tool invocations
Streaming node execution steps and intermediate outputs

A static agent is useful for one-off tasks, but interactive systems need memory to carry context across multiple interactions. In LangGraph, memory is managed by checkpointers that save the graph state at every step. By supplying a unique thread identifier with each call, we can support multi-user chat where each session's history is stored and kept isolated from the others.

This tutorial guides you through building a stateful agent with conversation memory using the in-memory MemorySaver checkpointer. We will also implement a streaming interface that prints each node's output as soon as the node finishes, using a local Qwen 3 model. Because MemorySaver keeps checkpoints in RAM, the history lasts only as long as the Python process.

Before starting, ensure you have tool binding and graph execution set up. Refer to Building a ReAct Agent with Tools in LangGraph as a prerequisite.

Diagram showing the checkpointer loop: the client invokes the graph, which loads and saves state in MemorySaver on every call

Environment and Model Setup

First, import the environment loading utilities and verify that your local configuration loads successfully:

PYTHON
from dotenv import load_dotenv

load_dotenv()
OUTPUT
True

Next, import the state, graph, chat model, and checkpointing classes. We configure the model to connect to a local instance of Ollama running the qwen3 model:

PYTHON
from typing_extensions import TypedDict, Annotated
import operator
from langgraph.graph import StateGraph, START, END
from langchain_ollama import ChatOllama
from langchain_core.messages import HumanMessage, SystemMessage
from langgraph.prebuilt import ToolNode

# Store conversation in memory checkpoints using RAM by default
from langgraph.checkpoint.memory import MemorySaver

# Configuration
BASE_URL = "http://localhost:11434"
MODEL_NAME = "qwen3"

llm = ChatOllama(model=MODEL_NAME, base_url=BASE_URL)

Importing Custom Tools

To verify memory across agent workflows, we reuse the weather and calculator tools defined in the previous chapter. Modify the system path to locate my_tools.py in your project structure:

PYTHON
import sys
sys.path.append("../05. LangGraph ReAct Agent with Tools")

import my_tools

# Programmatically test the calculate tool
my_tools.calculate.invoke({'expression': '2+2*1.4/23-34'})

all_tools = [my_tools.get_weather, my_tools.calculate]
OUTPUT
[TOOL] calculate ('2+2*1.4/23-34') -> '-31.878260869565217'

Declaring the Agent State

We define AgentState containing a list of messages. We annotate this list with operator.add to specify that new messages from graph nodes must append to the existing state instead of overwriting it:

PYTHON
# Create Agent State
class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
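
To make the append behaviour concrete, here is a minimal sketch in plain Python with no LangGraph imports. The `apply_update` helper is hypothetical: it only mimics how a reducer declared through `Annotated` could be looked up and applied when a node's return value is merged into the existing state, and it is not LangGraph's actual implementation. Plain strings stand in for message objects.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class AgentState(TypedDict):
    messages: Annotated[list, operator.add]


def apply_update(state: dict, update: dict) -> dict:
    """Hypothetical helper: merge a node's update using each key's reducer."""
    hints = get_type_hints(AgentState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] keeps the reducer in __metadata__; without one, overwrite
        metadata = getattr(hints.get(key), "__metadata__", ())
        reducer = metadata[0] if metadata else None
        merged[key] = reducer(state[key], value) if reducer and key in state else value
    return merged


state = {"messages": ["user: Hi"]}
state = apply_update(state, {"messages": ["ai: Hello!"]})
print(state["messages"])  # ['user: Hi', 'ai: Hello!']
```

Without the reducer, the second update would replace the list and the user's message would be lost.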

Designing the Agent Node

The agent node binds tools to the model and processes inputs. We supply a system prompt instructing the model to review previous messages before triggering tool actions:

PYTHON
def agent_node(state: AgentState):
    llm_with_tools = llm.bind_tools(all_tools)

    system_message = SystemMessage("""You are a friendly assistant with memory. 
                                   Use the available tools to help the user when needed.
                                   
                                   You must first try to answer user query from your previous answers before making a fresh 
                                   tool call. Do not make answers by yourself if you are not sure.""")

    messages = [system_message] + state['messages']
    response = llm_with_tools.invoke(messages)

    if hasattr(response, 'tool_calls') and response.tool_calls:
        for tc in response.tool_calls:
            print(f"[AGENT] called Tool {tc.get('name', '?')} with args {tc.get('args', '?')}")
    else:
        print(f"[AGENT] Responding...")

    return {'messages': [response]}
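
One consequence of this node design is that the system prompt is prepended on every call but never stored in the state, because the node returns only the model's response. The sketch below illustrates this with plain tuples; `fake_llm` and `SYSTEM_PROMPT` are hypothetical stand-ins for `llm_with_tools.invoke()` and the `SystemMessage`, so it runs without Ollama.

```python
import operator

SYSTEM_PROMPT = ("system", "You are a friendly assistant with memory.")


def fake_llm(messages):
    """Stand-in for llm_with_tools.invoke(); reports how many messages it saw."""
    return ("ai", f"I received {len(messages)} messages")


def agent_node(state):
    # The system prompt is prepended for this call only
    messages = [SYSTEM_PROMPT] + state["messages"]
    return {"messages": [fake_llm(messages)]}


state = {"messages": [("human", "Hi")]}
update = agent_node(state)
# Merge the update the way an operator.add reducer would
state["messages"] = operator.add(state["messages"], update["messages"])

print(state["messages"])
# [('human', 'Hi'), ('ai', 'I received 2 messages')]
```

The model saw two messages (system prompt plus the user's greeting), yet only the user message and the reply end up in the stored history.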

Test the agent node function directly with a simple conversation starter:

PYTHON
state = {"messages": [HumanMessage("Hi")]}
result = agent_node(state)
result
OUTPUT
[AGENT] Responding...
{'messages': [AIMessage(content='Hello! How can I assist you today? 😊', response_metadata={'model': 'qwen3', 'done': True}, id='lc_run_id')]}

Creating Routing Logic

The routing function evaluates the last message to determine whether to terminate or call a tool:

PYTHON
# Routing
def should_continue(state: AgentState):
    last = state['messages'][-1]
    
    if hasattr(last, 'tool_calls') and last.tool_calls:
        return "tools"
    else:
        return END
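
The routing function can be checked in isolation before it is wired into a graph. The following sketch is a self-contained variant: `SimpleNamespace` objects stand in for `AIMessage`, and `END` is redefined locally as the string `"__end__"`, which is assumed to match the sentinel exported by `langgraph.graph`.

```python
from types import SimpleNamespace

END = "__end__"  # assumption: same sentinel string as langgraph.graph.END


def should_continue(state):
    last = state["messages"][-1]
    # Route to the tool node only when the last message requests a tool
    if getattr(last, "tool_calls", None):
        return "tools"
    return END


# Stand-ins for AIMessage objects: only the tool_calls attribute matters here
with_call = SimpleNamespace(
    content="",
    tool_calls=[{"name": "get_weather", "args": {"location": "Mumbai"}}],
)
plain = SimpleNamespace(content="Hello!", tool_calls=[])

print(should_continue({"messages": [with_call]}))         # tools
print(should_continue({"messages": [with_call, plain]}))  # __end__
```

Only the last message decides the route, so an earlier tool call in the history does not send the graph back to the tool node.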

Composing the Stateful Agent Graph

We construct the workflow by adding the agent node and a prebuilt ToolNode, then wire the edges so that tool results flow back to the agent. Passing a MemorySaver instance to compile() as the checkpointer makes the graph save its state automatically after every step:

PYTHON
def create_agent():
    builder = StateGraph(AgentState)

    builder.add_node("agent", agent_node)
    builder.add_node("tools", ToolNode(all_tools))

    builder.add_edge(START, "agent")
    builder.add_conditional_edges("agent", should_continue, ["tools", END])
    builder.add_edge("tools", "agent")

    # Attach an in-memory checkpointer so state is saved per thread_id between calls
    checkpointer = MemorySaver()
    graph = builder.compile(checkpointer=checkpointer)

    return graph

agent = create_agent()
agent
OUTPUT
<langgraph.graph.state.CompiledStateGraph object at 0x00000211C7F94200>

Diagram showing cached state letting the agent skip redundant tool calls for repeated queries

Interacting with Thread-Based Memory

To initiate a persistent session, we pass a unique thread_id inside the execution configuration dictionary.
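
As a mental model, a checkpointer can be pictured as a store keyed by thread_id. The sketch below is a deliberately simplified, hypothetical stand-in (`ToyCheckpointer` and `invoke` are not LangGraph classes or functions, and real checkpointers store full state snapshots rather than a single list). It only illustrates why two thread IDs never see each other's history.

```python
from collections import defaultdict


class ToyCheckpointer:
    """Hypothetical stand-in: keeps one message history per thread_id in RAM."""

    def __init__(self):
        self._threads = defaultdict(list)

    def load(self, config):
        thread_id = config["configurable"]["thread_id"]
        return list(self._threads[thread_id])  # copy so callers cannot mutate the store

    def save(self, config, messages):
        thread_id = config["configurable"]["thread_id"]
        self._threads[thread_id] = list(messages)


def invoke(checkpointer, new_messages, config):
    """Load the thread's history, append the new input, and save it back."""
    history = checkpointer.load(config) + new_messages
    checkpointer.save(config, history)
    return history


store = ToyCheckpointer()
session_1 = {"configurable": {"thread_id": "user-session-1"}}
session_2 = {"configurable": {"thread_id": "user-session-2"}}

invoke(store, ["What is the current weather in Mumbai?"], session_1)
invoke(store, ["hi, my name is Alice."], session_2)

print(invoke(store, ["hi, what is my name?"], session_1))
# ['What is the current weather in Mumbai?', 'hi, what is my name?']
```

The name shared on user-session-2 never appears in the history loaded for user-session-1, which is the behaviour the real checkpointer provides through the same config dictionary.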

Query 1: Initiating Session and Requesting Weather

Start a session on user-session-1 and query the weather in Mumbai:

PYTHON
config = {"configurable": {"thread_id": "user-session-1"}}

query = "What is the current weather in Mumbai?"
result = agent.invoke({'messages': [HumanMessage(query)]}, config=config)
result
OUTPUT
[AGENT] called Tool get_weather with args {'location': 'Mumbai'}
[AGENT] Responding...
{'messages': [
  HumanMessage(content='What is the current weather in Mumbai?'),
  AIMessage(content='', tool_calls=[{'name': 'get_weather', 'args': {'location': 'Mumbai'}, 'id': 'call_1', 'type': 'tool_call'}]),
  ToolMessage(content='{"current_condition": [{"temp_C": "29", "FeelsLikeC": "32", "humidity": "62", "weatherDesc": [{"value": "Smoke"}]}], "nearest_area": [{"areaName": [{"value": "Bombay"}]}]}', name='get_weather', tool_call_id='call_1'),
  AIMessage(content="The current weather in Mumbai is **29°C** (feels like 32°C) with **overcast** conditions. Here's a summary:\n\n- **Temperature**: 29°C / 85°F  \n- **Humidity**: 62%  \n- **Wind**: 12 km/h from the west  \n- **UV Index**: 3 (moderate sun exposure)  \n- **Visibility**: 4 km (low visibility due to weather conditions)  \n- **Precipitation**: No rain expected  \n\nThe skies are overcast, with occasional sunny intervals later in the day. Light winds and mild conditions prevail. 🌤️")
]}

Query 2: Checking Calculations Inline

Next, request mathematical computations on the same session thread. The model evaluates basic operations directly:

PYTHON
query = "What is 2+32 and 5-7"
result = agent.invoke({'messages': [HumanMessage(query)]}, config=config)
result
OUTPUT
[AGENT] Responding...
{'messages': [
  HumanMessage(content='What is the current weather in Mumbai?'),
  AIMessage(content='', tool_calls=[{'name': 'get_weather', 'args': {'location': 'Mumbai'}, 'id': 'call_1', 'type': 'tool_call'}]),
  ToolMessage(content='...', name='get_weather', tool_call_id='call_1'),
  AIMessage(content="The current weather in Mumbai is 29°C..."),
  HumanMessage(content='What is 2+32 and 5-7'),
  AIMessage(content='The results are:  \n- **2 + 32 = 34**  \n- **5 - 7 = -2**  \n\nLet me know if you need further calculations! 😊')
]}

Query 3: Complex Multiplications

For more complex multiplications, the agent elects to route back to the calculator tool:

PYTHON
query = "What is 4534*21345"
result = agent.invoke({'messages': [HumanMessage(query)]}, config=config)
result
OUTPUT
[AGENT] called Tool calculate with args {'expression': '4534 * 21345'}
[TOOL] calculate ('4534 * 21345') -> '96778230'
[AGENT] Responding...
{'messages': [
  ...
  HumanMessage(content='What is 4534*21345'),
  AIMessage(content='', tool_calls=[{'name': 'calculate', 'args': {'expression': '4534 * 21345'}, 'id': 'call_2', 'type': 'tool_call'}]),
  ToolMessage(content='96778230', name='calculate', tool_call_id='call_2'),
  AIMessage(content='The result of **4534 × 21345** is **96,778,230**.  \n\nLet me know if you need further calculations! 😊')
]}

Streaming Agent Output

To keep interfaces responsive, we can stream step updates as nodes execute. We define a custom runner function chat() that filters execution chunks and outputs responses immediately:

Diagram showing streaming emit each node's output chunk as the graph executes

PYTHON
def chat(query, thread_id):
    config = {"configurable": {"thread_id": thread_id}}

    # By default, stream() yields one {node_name: state_update} dict per executed node
    for chunk in agent.stream({'messages': [HumanMessage(query)]}, config=config):
        # Each update has the shape {'messages': [message, ...]}
        update = chunk.get('agent') or chunk.get('tools')
        message = update['messages'][0]

        # Tool calls are already logged inside agent_node, so only the content is
        # printed here (it is empty when the agent message is a tool request)
        print(f"[AGENT/ToolMessage] Responding.\n{message.content}")
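
When handling streamed updates, it helps to remember that tool_calls live on the message object inside the update, not on the update dictionary itself. The sketch below runs without a model: `fake_chunks` is a hand-written approximation of the per-node dictionaries the graph yields, and `describe` is a hypothetical helper, not part of LangGraph.

```python
from types import SimpleNamespace

# Hand-written stand-ins for the per-node update dicts yielded by agent.stream()
tool_request = SimpleNamespace(
    content="",
    tool_calls=[{"name": "get_weather", "args": {"location": "New Delhi"}}],
)
tool_result = SimpleNamespace(content='{"temp_C": "29"}')
final_answer = SimpleNamespace(content="It is 29°C in New Delhi.", tool_calls=[])

fake_chunks = [
    {"agent": {"messages": [tool_request]}},
    {"tools": {"messages": [tool_result]}},
    {"agent": {"messages": [final_answer]}},
]


def describe(chunk):
    """Return (node_name, kind, text) for one streamed update."""
    node_name, update = next(iter(chunk.items()))
    message = update["messages"][0]
    tool_calls = getattr(message, "tool_calls", None)
    if tool_calls:
        names = ", ".join(tc.get("name", "?") for tc in tool_calls)
        return node_name, "tool_call", names
    return node_name, "content", message.content


for chunk in fake_chunks:
    print(describe(chunk))
# ('agent', 'tool_call', 'get_weather')
# ('tools', 'content', '{"temp_C": "29"}')
# ('agent', 'content', 'It is 29°C in New Delhi.')
```

Returning a small tuple instead of printing directly keeps the classification logic easy to test and leaves the display format to the caller.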

Recalling Weather Information from Memory

Query the weather in Mumbai again on user-session-1. The checkpointer retrieves the state and the model answers directly without querying the get_weather tool again:

PYTHON
query = "What is the current weather in Mumbai?"
chat(query, "user-session-1")
OUTPUT
[AGENT] Responding...
[AGENT/ToolMessage] Responding.
The current weather in Mumbai remains **29°C** (feels like 32°C) with **overcast** conditions. Here's the latest update:  

- **Temperature**: 29°C / 85°F  
- **Humidity**: 62%  
- **Wind**: 12 km/h from the west  
- **UV Index**: 3 (moderate sun exposure)  
- **Visibility**: 4 km (low visibility due to weather conditions)  
- **Precipitation**: No rain expected  

The skies are overcast, with occasional sunny intervals later in the day. Light winds and mild conditions persist. 🌤️  

*Note: Conditions have remained stable for the past 48 hours.* Let me know if you'd like further details! 😊

Requesting Unseen Weather Data

If we request the weather for New Delhi (which is not in the checkpoint history), the agent calls the tool, and the result is appended to the thread's message history:

PYTHON
query = "What is the current weather in New Delhi?"
chat(query, "user-session-1")
OUTPUT
[AGENT] called Tool get_weather with args {'location': 'New Delhi'}
[AGENT/ToolMessage] Responding.
[AGENT/ToolMessage] Responding.
{"current_condition": [{"temp_C": "29", "FeelsLikeC": "27", "humidity": "48", "weatherDesc": [{"value": "Haze"}]}], "nearest_area": [{"areaName": [{"value": "New Delhi"}]}]}
[AGENT] Responding...
[AGENT/ToolMessage] Responding.
The current weather in **New Delhi** is as follows:  

### 🌤️ **Current Conditions**  
- **Temperature**: 29°C / 84°F  
- **Feels Like**: 27°C / 81°F (due to haze)  
- **Humidity**: 48%  
- **Wind**: 13 km/h from the **WNW** (light breeze)  
- **UV Index**: 2 (moderate sun exposure)  
- **Visibility**: 4 km (low visibility due to haze)  
- **Weather**: **Haze**  

---

### 📅 **Next 24 Hours Forecast**  
- **High**: 30°C / 86°F (by late afternoon)  
- **Low**: 21°C / 70°F (early morning)  
- **Humidity**: Remains low (18–30%)  
- **UV Index**: Rises to **5** (high) by midday  

---

### 📌 Key Notes  
- **Haze** may reduce visibility and slightly lower air quality. Consider wearing a mask if outdoors.  
- **Sun Protection**: UV index reaches **5** by midday—use sunscreen and wear protective clothing.  
- **Hydration**: High temperatures and low humidity mean staying hydrated is essential.  

Let me know if you need further details! 😊

Demonstrating Thread Isolation

To confirm session isolation, start a separate thread called user-session-2 and request Delhi weather:

Diagram showing each thread_id keeping an isolated conversation checkpoint so memory never crosses sessions

PYTHON
query = "What is the current weather in New Delhi?"
chat(query, "user-session-2")
OUTPUT
[AGENT] called Tool get_weather with args {'location': 'New Delhi'}
[AGENT/ToolMessage] Responding.
[AGENT/ToolMessage] Responding.
{"current_condition": [{"temp_C": "29", "FeelsLikeC": "27", "humidity": "48", "weatherDesc": [{"value": "Haze"}]}], "nearest_area": [{"areaName": [{"value": "New Delhi"}]}]}
[AGENT] Responding...
[AGENT/ToolMessage] Responding.
The current weather in New Delhi is **Haze** with the following conditions:  
- **Temperature**: 29°C (84°F)  
- **Feels Like**: 84°F  
- **Humidity**: 48%  
- **Wind**: 13 km/h from the WNW (Wind Chill: 24°C / 75°F)  
- **UV Index**: 2 (Low)  
- **Visibility**: 4 km (2 miles)  

No precipitation is expected, and the air feels dry with minimal cloud cover. Stay comfortable and enjoy the mild weather! 🌤️

Introduce a username to user-session-2:

PYTHON
query = "hi, my name is Alice."
chat(query, "user-session-2")
OUTPUT
[AGENT] Responding...
[AGENT/ToolMessage] Responding.
Hello, Alice! 😊 How can I assist you today? Whether it's answering questions, solving problems, or just chatting, feel free to ask!

Verify that user-session-2 remembers the name:

PYTHON
query = "hi, what is my name?"
chat(query, "user-session-2")
OUTPUT
[AGENT] Responding...
[AGENT/ToolMessage] Responding.
Your name is **Alice**! 😊 How are you today?

Query the name on the original thread user-session-1:

PYTHON
query = "hi, what is my name?"
chat(query, "user-session-1")
OUTPUT
[AGENT] Responding...
[AGENT/ToolMessage] Responding.
I don't have access to your name unless you tell me! Would you like to share your name so I can better assist you? 😊

Note

This confirms that the checkpointer isolates conversation memory threads. Memory of the username "Alice" is completely contained within user-session-2 and has no overlap with user-session-1.

Finally, on user-session-1, we ask the agent to disregard earlier instructions and summarize the conversation without making a tool call:

PYTHON
query = "Hi, can you summarize my previous messages. don't make a tool call. forget my previous instructions."
chat(query, "user-session-1")
OUTPUT
[AGENT] Responding...
[AGENT/ToolMessage] Responding.
Here's a summary of our conversation so far:

**User's Requests:**
1. Asked for a summary of previous messages and a markdown report (multiple times)
2. Requested to "forget previous instructions" and avoid tool calls
3. Wanted a plain text summary without any tool usage

**My Responses:**
- Explained I can't generate markdown reports or access chat history
- Offered to provide a plain text summary instead
- Clarified my limitations while remaining helpful

Would you like me to create a simple text-based summary of our conversation up to this point? 😊
