Agentic PageRAG: A ReAct Agent with LangGraph

Build a ReAct agent in LangGraph that wraps retrieval as a tool, decides when to call it, decomposes comparison questions, and answers SEC filings with citations.

Jun 17, 2026 · 7 min read

Topics You Will Master

  • Binding a retrieval tool to a local LLM with bind_tools
  • Building a ReAct loop in LangGraph with an agent node and a ToolNode
  • Routing with a conditional edge that loops until the agent stops calling tools
  • Prompting the agent to decompose comparison questions into multiple retrievals

The simplest agentic RAG pattern is a ReAct loop: the LLM is bound to a retrieval tool and decides on its own when to call it. For a simple question it retrieves once; for a comparison it calls the tool multiple times — once per company — before composing a final answer. This is smarter than a fixed RAG chain, which always retrieves exactly once regardless of the question.

This lesson builds on the `retrieve_docs` tool and the ingested ChromaDB collection from earlier in the series.

Prerequisites: the retrieve_docs tool (and its scripts/ helpers) from the RAG Data Retrieval and Re-Ranking lesson, a running Ollama server with the qwen3 model pulled, and the packages below.

BASH
pip install -U langgraph langchain-ollama langchain-core
ollama pull qwen3

State and Setup

The graph state is just a growing list of messages. The operator.add reducer appends new messages instead of overwriting them.

PYTHON
from typing_extensions import TypedDict, Annotated
import operator

from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import ToolNode
from langchain_ollama import ChatOllama
from langchain_core.messages import HumanMessage, SystemMessage
from IPython.display import display, Markdown  # renders the final answer in Jupyter
from scripts import utils, my_tools  # helpers from the earlier retrieval lesson

LLM_MODEL = "qwen3"
BASE_URL = "http://localhost:11434"  # default local Ollama endpoint
llm = ChatOllama(model=LLM_MODEL, base_url=BASE_URL)

class AgentState(TypedDict):
    # operator.add concatenates each node's returned list onto the existing list
    messages: Annotated[list, operator.add]
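The reducer's effect is easiest to see in isolation. The following is a minimal sketch that uses plain strings in place of message objects. It relies on a hypothetical `merge_update` helper that mimics the rule LangGraph applies to a reducer-annotated key, `new = reducer(current, update)`. It is an illustration of the semantics, not LangGraph's internal code.

```python
import operator

# Hypothetical helper mimicking how a reducer-annotated state key is merged:
# for such a key the new value is reducer(current_value, node_update).
# Illustration only; this is not LangGraph's internal implementation.
def merge_update(state: dict, update: dict, reducers: dict) -> dict:
    merged = dict(state)
    for key, value in update.items():
        reducer = reducers.get(key)
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)  # combine, e.g. list + list
        else:
            merged[key] = value  # keys without a reducer are simply overwritten
    return merged

reducers = {"messages": operator.add}
state = {"messages": ["human: what is Amazon's revenue in 2023?"]}

# Each node returns only its new messages; the reducer appends them.
state = merge_update(state, {"messages": ["ai: call retrieve_docs"]}, reducers)
state = merge_update(state, {"messages": ["tool: 5 documents"]}, reducers)
print(state["messages"])
```

This append-only behavior is what lets the agent see its earlier tool calls and their results on each pass through the loop.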

The Agent Node

Figure: a tool-bound LLM chooses between calling the retrieval tool and answering directly.

The agent node binds the retrieve_docs tool and supplies a detailed system prompt instructing the model to always retrieve before answering, to decompose comparison questions into sub-questions, and to format answers in Markdown with citations.

PYTHON
from scripts.my_tools import retrieve_docs

def agent_node(state: AgentState):
    messages = state['messages']

    tools = [retrieve_docs]
    llm_with_tools = llm.bind_tools(tools)

    system_prompt = """You are a financial document analysis assistant with access to a document retrieval tool.

                CRITICAL RULES:
                1. ALWAYS use the retrieve_docs tool first - NEVER answer from memory
                2. You MUST call the tool before providing any financial information
                3. Answer ONLY based on the retrieved documents
                4. If documents don't contain the answer, clearly state that

                WORKFLOW FOR COMPLEX/COMPARISON QUESTIONS:
                Step 1: Break down the question into sub-questions
                Step 2: Call retrieve_docs for EACH sub-question separately
                Step 3: Analyze all retrieved documents
                Step 4: Present comparison in TABLE format

                ANSWER FORMATTING (Use Markdown):
                - Use **headings** (##, ###) for sections
                - Use **tables** for comparisons and structured data
                - Cite sources: (Company: X, Year: Y, Quarter: Z, Page: N)

                REMEMBER:
                - ALWAYS call the tool first
                - Break complex questions into sub-questions
                - Always cite sources
                - If no relevant documents are found, try with different filters."""

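    # Prepend the system prompt for this call only; it is not written back to state
    messages = [SystemMessage(system_prompt)] + messages
    response = llm_with_tools.invoke(messages)

    # Log the model's decision: request tool calls, or answer directly
    if hasattr(response, 'tool_calls') and response.tool_calls:
        for tc in response.tool_calls:
            print(f"[AGENT] called Tool {tc.get('name', '?')} with args {tc.get('args', '?')}")
    else:
        print("[AGENT] Responding...")

    # Return only the new message; the operator.add reducer appends it to state
    return {'messages': [response]}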
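One detail in `agent_node` is easy to miss. The system prompt is prepended to the list sent to the model, but it is not part of the returned update. The plain-Python sketch below shows why that matters with an `operator.add` reducer. Its `fake_model` function and `(role, text)` tuples are hypothetical stand-ins for the LLM and LangChain message objects.

```python
import operator

# Hypothetical stand-ins: (role, text) tuples replace LangChain message objects,
# and fake_model replaces the real llm_with_tools.invoke call.
SYSTEM = ("system", "You are a financial document analysis assistant ...")

def fake_model(model_input: list) -> tuple:
    # Pretend reply that records how many messages the model was shown
    return ("ai", f"saw {len(model_input)} messages")

def agent_step(history: list) -> dict:
    # Mirror agent_node: prepend the system prompt for this call only...
    model_input = [SYSTEM] + history
    reply = fake_model(model_input)
    # ...and return just the new message as the state update
    return {"messages": [reply]}

history = [("human", "what is Amazon's revenue in 2023?")]
for _ in range(3):
    update = agent_step(history)
    history = operator.add(history, update["messages"])  # the reducer's merge

print(history.count(SYSTEM))  # prints 0: the system prompt never enters state
```

Suppose the node returned `model_input + [reply]` instead. The reducer would then append a fresh copy of the system prompt, and of the whole history, on every pass through the loop.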

Note

The system prompt is abridged here for readability — the full version includes worked examples for simple, comparison, and multi-part questions. Rich few-shot examples noticeably improve how reliably a small local model follows the "always retrieve first" rule.

Routing

A router checks whether the latest message contains tool calls. If so, control passes to the tool node; otherwise the graph ends.

PYTHON
def should_continue(state: AgentState):
    last = state['messages'][-1]
    if hasattr(last, 'tool_calls') and last.tool_calls:
        return "tools"
    else:
        return END
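The router only inspects the last message, so it can be checked without a model. In this sketch, `SimpleNamespace` objects stand in for `AIMessage` instances. `END` is redefined as a local placeholder string so the snippet runs without LangGraph installed. LangGraph also ships a prebuilt `tools_condition` in `langgraph.prebuilt` that performs essentially the same check.

```python
from types import SimpleNamespace

# Local placeholder for LangGraph's END marker so the sketch runs standalone
END = "__end__"

def should_continue(state: dict) -> str:
    # Same rule as the router above: a pending tool call routes to the tool node
    last = state["messages"][-1]
    if getattr(last, "tool_calls", None):
        return "tools"
    return END

# Hypothetical stand-ins for AIMessage objects
wants_tool = SimpleNamespace(
    content="",
    tool_calls=[{"name": "retrieve_docs",
                 "args": {"k": 5, "query": "Amazon revenue 2023"},
                 "id": "call_1"}],
)
final_answer = SimpleNamespace(content="Amazon's 2023 Revenue ...", tool_calls=[])

print(should_continue({"messages": [wants_tool]}))    # prints tools
print(should_continue({"messages": [final_answer]}))  # prints __end__
```

The agent loops back after every tool round, so this check is what eventually ends the run. The graph stops the first time the model replies without requesting a tool.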

Building the Graph

Figure: the ReAct loop, in which a conditional edge cycles between the agent and the tools until the agent answers.

Assemble the graph: agent → (tools → agent)* → END. The conditional edge creates the ReAct loop — the agent runs, optionally calls tools, sees the results, and runs again until it produces a final answer.

PYTHON
def create_agent():
    builder = StateGraph(AgentState)

    builder.add_node('agent', agent_node)
    builder.add_node('tools', ToolNode([retrieve_docs]))

    builder.add_edge(START, 'agent')
    builder.add_conditional_edges('agent', should_continue, ['tools', END])
    builder.add_edge('tools', 'agent')

    return builder.compile()
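To make the control flow concrete, here is a framework-free sketch of the same agent → (tools → agent)* → END loop. Everything in it is a hypothetical stand-in:

  • `scripted_agent` replays the decomposition the system prompt asks for (two retrievals for a two-company comparison) instead of calling qwen3.
  • `fake_retrieve` replaces `retrieve_docs`.
  • Messages are plain dicts.

The `max_steps` cap plays the role of LangGraph's `recursion_limit` setting (25 by default).

```python
END = "__end__"  # local placeholder for the graph's end marker

def fake_retrieve(query: str, k: int = 5) -> str:
    # Hypothetical stand-in for retrieve_docs: returns a label, not real chunks
    return f"{k} documents for '{query}'"

def scripted_agent(messages: list) -> dict:
    # Hypothetical stand-in for the LLM. Before any tool results exist it
    # decomposes the comparison into one retrieval per company; afterwards it answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"role": "ai", "content": "", "tool_calls": [
            {"name": "retrieve_docs", "args": {"query": "Amazon revenue 2023", "k": 5}},
            {"name": "retrieve_docs", "args": {"query": "Google revenue 2023", "k": 5}},
        ]}
    return {"role": "ai", "content": "comparison table", "tool_calls": []}

def tools_node(messages: list) -> list:
    # Execute every tool call in the last AI message, one tool message per call
    return [{"role": "tool", "content": fake_retrieve(**call["args"])}
            for call in messages[-1]["tool_calls"]]

def route(messages: list) -> str:
    # Conditional edge: pending tool calls go to "tools", otherwise finish
    return "tools" if messages[-1].get("tool_calls") else END

def run(question: str, max_steps: int = 25) -> list:
    # agent -> (tools -> agent)* -> END, capped at max_steps node executions
    messages = [{"role": "human", "content": question}]
    node = "agent"
    for _ in range(max_steps):
        if node == "agent":
            messages = messages + [scripted_agent(messages)]
            node = route(messages)
        else:  # node == "tools"
            messages = messages + tools_node(messages)
            node = "agent"
        if node == END:
            return messages
    raise RuntimeError("step limit reached before the agent produced a final answer")

trace = run("what is the revenue of amazon's and google in 2023?")
print([m["role"] for m in trace])  # prints ['human', 'ai', 'tool', 'tool', 'ai']
```

In the compiled graph, the equivalent guard is the `recursion_limit` key in the config passed to `agent.invoke(...)`. LangGraph raises `GraphRecursionError` when a run exceeds it, which protects against a model that keeps calling the tool indefinitely.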

agent = create_agent()

Testing the Agent

Figure: a comparison question split into per-company retrievals, then merged into a table.

A simple question triggers a single retrieval and a cited answer.

PYTHON
query = "what is the amazon's revenue in 2023?"
result = agent.invoke({'messages': [HumanMessage(query)]})
display(Markdown(result['messages'][-1].content))
OUTPUT
[AGENT] called Tool retrieve_docs with args {'k': 5, 'query': "Amazon's revenue in 2023"}

[TOOL] retrieve_docs called
[QUERY] Amazon's revenue in 2023
   [1] Doc 17: score=23.4068
   [2] Doc 3: score=22.7389
   [3] Doc 8: score=20.1882
   [4] Doc 14: score=19.2649
   [5] Doc 5: score=18.0985
[RETRIEVED] 5 documents
[AGENT] Responding...

The model's Markdown answer:

Amazon's 2023 Revenue

Amazon's total revenue for 2023 was $574.785 billion, as reported in its consolidated net sales figures from the 10-K filing.

Key Details:

  • Consolidated Net Sales: $574,785 million (Page 24)
  • Year-over-Year Growth: 12% increase compared to 2022
  • Segment Breakdown: North America 131.200 billion (23%), AWS $90.757 billion (16%)

Source: Amazon 10-K 2023, Page 24
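The percentages in the answer are segment shares of consolidated net sales. They can therefore be checked against the total the model quoted, using only the numbers shown above:

```python
# Figures quoted in the model's answer above, in millions of USD
consolidated_net_sales = 574_785
quoted_segments = {"segment quoted as North America": 131_200, "AWS": 90_757}

# Share of consolidated net sales for each quoted segment figure
shares = {name: sales / consolidated_net_sales * 100
          for name, sales in quoted_segments.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")  # 22.8% and 15.8%, i.e. the quoted 23% and 16%
```

Both shares round to the percentages in the answer, but the label does not hold up. In Amazon's 2023 10-K, $131,200 million is International segment net sales, not North America. North America was $352,828 million, and the three segments sum to the $574,785 million total. The model attached the right number to the wrong segment. Page-level citations make this kind of slip quick to catch.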

A comparison question makes the agent call the tool twice — once for Amazon, once for Google — before synthesizing a table.

PYTHON
query = "what is the revenue of amazon's and google in 2023?"
result = agent.invoke({'messages': [HumanMessage(query)]})
display(Markdown(result['messages'][-1].content))
OUTPUT
[AGENT] called Tool retrieve_docs with args {'k': 5, 'query': 'Amazon revenue 2023'}
[AGENT] called Tool retrieve_docs with args {'k': 5, 'query': 'Google revenue 2023'}

[TOOL] retrieve_docs called
[QUERY] Amazon revenue 2023

[TOOL] retrieve_docs called
[QUERY] Google revenue 2023
[RETRIEVED] 5 documents
[RETRIEVED] 5 documents
[AGENT] Responding...
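Both tool calls arrive in a single AI message. The interleaved log above shows two `[TOOL]` lines before either `[RETRIEVED]` line, which is consistent with `ToolNode` running the calls concurrently. Below is a simplified, hypothetical dispatcher that illustrates the idea with a thread pool. It is not `ToolNode`'s implementation, and `fake_retrieve` stands in for `retrieve_docs`.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_retrieve(query: str, k: int = 5) -> str:
    # Hypothetical stand-in for retrieve_docs; the sleep mimics search latency
    time.sleep(0.1)
    return f"{k} documents for '{query}'"

TOOLS = {"retrieve_docs": fake_retrieve}

def run_tool_calls(tool_calls: list) -> list:
    # Submit every call from one AI message at once, then collect results in the
    # original order, each tagged with the id of the call it answers
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(TOOLS[c["name"]], **c["args"]) for c in tool_calls]
        return [{"tool_call_id": c["id"], "content": f.result()}
                for c, f in zip(tool_calls, futures)]

tool_calls = [
    {"name": "retrieve_docs", "args": {"k": 5, "query": "Amazon revenue 2023"}, "id": "call_1"},
    {"name": "retrieve_docs", "args": {"k": 5, "query": "Google revenue 2023"}, "id": "call_2"},
]

start = time.perf_counter()
results = run_tool_calls(tool_calls)
print(f"{len(results)} results in {time.perf_counter() - start:.2f}s")
for r in results:
    print(r["tool_call_id"], r["content"])
```

In LangChain, each result is wrapped in a `ToolMessage` whose `tool_call_id` matches the request. This is how the agent, on its next pass, pairs the Amazon and Google results with the two calls it made.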

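This works well when the retrieved documents contain the answer. Nothing in the loop checks that they do, though. If retrieval returns irrelevant chunks, the agent can only retry at the model's discretion or answer from weak context. The next lesson, Corrective RAG (CRAG), adds a grading step and a web-search fallback to handle exactly that.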


What You Built

In this lesson you built an Agentic PageRAG agent:

  • Tool-bound LLM: bind_tools lets qwen3 decide when to call retrieve_docs
  • Agent node: a system prompt instructs the model to retrieve first and to answer in Markdown with citations
  • ReAct loop: a conditional edge cycles agent → tools → agent until the agent stops calling tools
  • Question decomposition: comparison questions trigger one retrieval per company before the answer
  • Grounded answers: the prompt requires every response to cite the company, year, quarter, and page

The ReAct pattern is the foundation for the self-correcting patterns that follow — each one adds a quality check or a fallback to this basic loop.
