Corrective RAG (CRAG) with LangGraph and Ollama

Build a self-correcting CRAG workflow in LangGraph that grades retrieved documents, rewrites weak queries, and falls back to web search before answering.

Jun 17, 2026 · 8 min read

Topics You Will Master

Grading retrieved documents for relevance with a structured-output LLM
Rewriting a weak query to be more specific and keyword-rich
Falling back to web search when the vector store has no answer
Wiring a conditional edge that branches between answering and correcting

Corrective RAG (CRAG) (Yan et al., 2024) adds a self-correction step to RAG. After retrieval, an LLM grades whether the documents can actually answer the question. If they can, the agent generates the answer from them. If they cannot, the agent rewrites the query and falls back to web search before answering. The correction path runs at most once, so the graph cannot loop indefinitely.

This lesson builds on the `retrieve_docs` and `web_search` tools from earlier in the series.

Prerequisites: the tools in scripts/my_tools.py from RAG Data Retrieval and Re-Ranking, a running Ollama server with the qwen3 model pulled, and the packages below.

BASH
pip install -U langgraph langchain-ollama langchain-core pydantic ddgs
ollama pull qwen3

State and Setup

The shared CRAG state: messages, retrieved documents, the relevance grade, and the rewritten query

The state tracks the retrieved documents, the graded relevance, and the rewritten query alongside the message list.

PYTHON
from typing_extensions import TypedDict, Annotated
import operator, os

from langgraph.graph import StateGraph, START, END
from langchain_ollama import ChatOllama
from langchain_core.messages import HumanMessage, SystemMessage
from pydantic import BaseModel, Field
from scripts import my_tools

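# The retrieve and web-search nodes write debug files here, so create the folder up front
os.makedirs('debug_logs', exist_ok=True)

llm = ChatOllama(model="qwen3", base_url="http://localhost:11434")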

class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
    retrieved_docs: str
    is_relevant: bool
    rewritten_query: str
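
The `Annotated[list, operator.add]` annotation tells LangGraph to combine each node's `messages` update with the existing list instead of overwriting it, while the plain fields (`retrieved_docs`, `is_relevant`, `rewritten_query`) simply take the latest value. The sketch below imitates that merge rule in plain Python so the behavior is easy to see. `REDUCERS` and `apply_update` are hypothetical names for illustration only, not LangGraph APIs.

```python
import operator

# Hypothetical stand-in for the reducers declared on AgentState:
# only `messages` has one (operator.add); every other key is overwritten.
REDUCERS = {"messages": operator.add}

def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state (illustration only)."""
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)  # list + list appends
        else:
            merged[key] = value  # last write wins
    return merged

state = {"messages": ["user question"], "retrieved_docs": "vector-store docs"}
state = apply_update(state, {"retrieved_docs": "web docs"})    # replaced
state = apply_update(state, {"messages": ["model answer"]})    # appended
print(state)
```

This is why a node that adds a message returns a one-element list: the new message is appended after the user's question rather than replacing it.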

Retrieve Node

The first node fetches documents from the vector store using the user's question.

PYTHON
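def retrieve_node(state: AgentState):
    print("[RETRIEVE NODE] Fetching documents")
    # At this point in the graph the latest message is the user's question
    user_question = state['messages'][-1].content
    result = my_tools.retrieve_docs.invoke({'query': user_question, 'k': 5})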

    with open('debug_logs/crag_retrieved_docs.md', 'w', encoding='utf-8') as f:
        f.write(f"Query: {user_question}\n\n")
        f.write(result)

    return {'retrieved_docs': result}

Document Grading

An LLM grading whether the retrieved documents can answer the question

A structured-output grader returns a boolean plus its reasoning — the heart of the "corrective" step.

PYTHON
class GradeDecision(BaseModel):
    is_relevant: bool = Field(description="True if documents are relevant to answer the question, False if irrelevant")
    reasoning: str = Field(description="Brief explanation of why documents are relevant or not.")

def grade_node(state: AgentState):
    llm_structured = llm.with_structured_output(GradeDecision)
    user_question = state['messages'][-1].content
    retrieved_docs = state.get('retrieved_docs', '')

    prompt = f"""You are a document relevance grader.

                TASK: Evaluate if the retrieved documents are relevant to answer the user's question.

                USER QUESTION: {user_question}

                RETRIEVED DOCUMENTS:
                {retrieved_docs}

                GRADING CRITERIA:
                - is_relevant = True: If documents contain information that can answer the question
                - is_relevant = False: If documents are completely irrelevant or off-topic"""

    response = llm_structured.invoke(prompt)
    print(f"[GRADE] Relevant: {response.is_relevant}\nReasoning: {response.reasoning}")
    return {'is_relevant': response.is_relevant}
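
Structured output from a local model is not guaranteed to parse on every call. One possible safeguard, sketched below with a stub in place of the real grader, is to treat a failed or empty grade as "not relevant" so the graph takes the correction path instead of crashing. `Grade`, `grade_with_fallback`, and `broken_grader` are hypothetical names, and failing toward web search is an assumed design choice rather than something this lesson prescribes.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Grade:
    """Plain stand-in for GradeDecision so the sketch runs without pydantic."""
    is_relevant: bool
    reasoning: str

def grade_with_fallback(grader: Callable[[str], Optional[Grade]], prompt: str) -> Grade:
    """Call the grader; treat an exception or empty result as 'not relevant'."""
    try:
        decision = grader(prompt)
    except Exception as exc:  # e.g. the model's output failed schema validation
        return Grade(False, f"grader failed: {exc}")
    if decision is None:
        return Grade(False, "grader returned no structured output")
    return decision

def broken_grader(prompt: str) -> Grade:
    # Stub that simulates a parsing failure
    raise ValueError("could not parse model output")

result = grade_with_fallback(broken_grader, "any prompt")
print(result.is_relevant, "-", result.reasoning)
# False - grader failed: could not parse model output
```

Inside `grade_node`, the same wrapper could be called with `llm_structured.invoke` as the grader, assuming you want grading failures to trigger the web-search fallback.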

Rewrite and Web-Search Nodes

The correction path: rewrite the query, search the web, then answer

When grading fails, the rewrite node sharpens the query with financial keywords and the web-search node runs it through DuckDuckGo.

PYTHON
def rewrite_query_node(state: AgentState):
    user_question = state['messages'][-1].content

    prompt = f"""You are a query rewriting expert.

TASK: Rewrite the user's question to make it more specific and targeted for document retrieval.

ORIGINAL QUESTION: {user_question}

INSTRUCTIONS:
- Make the query more specific with keywords
- Add relevant financial terms (revenue, profit, earnings, etc.)
- Include company names, years, quarters if mentioned

Output ONLY the rewritten query, nothing else."""

    rewritten_query = llm.invoke(prompt).content.strip()
    print(f"[REWRITE] New: {rewritten_query}")
    return {'rewritten_query': rewritten_query}

def web_search_node(state: AgentState):
    user_question = state['messages'][-1].content
    rewritten_query = state.get("rewritten_query", user_question)
    result = my_tools.web_search.invoke({'query': rewritten_query})

    with open('debug_logs/crag_retry_websearch_docs.md', 'w', encoding='utf-8') as f:
        f.write(f"Rewritten Query: {rewritten_query}\n\n")
        f.write(result)

    return {'retrieved_docs': result}
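
qwen3 is a reasoning model, and depending on the langchain-ollama version and the `reasoning` setting, `.content` may include a `<think>...</think>` block ahead of the actual text. If that block reaches `web_search`, the search query is polluted with the model's scratch work. The helper below is a minimal sketch of cleaning the rewrite before it is stored; `clean_query` is a hypothetical name, and whether you need it depends on your setup.

```python
import re

def clean_query(raw: str) -> str:
    """Strip <think>...</think> blocks, extra whitespace, and wrapping quotes."""
    without_thinking = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    # Collapse newlines and repeated spaces into single spaces
    collapsed = " ".join(without_thinking.split())
    # Remove quotes that may wrap the whole query
    return collapsed.strip("\"'")

raw = '<think>\nThe user wants revenue.\n</think>\n\n"Tesla Inc. annual revenue 2023"'
print(clean_query(raw))
# Tesla Inc. annual revenue 2023
```

In `rewrite_query_node`, this would replace the bare `.strip()` call on the model's response.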

Answer Node and Router

The answer node writes a detailed, cited response from whatever documents are in state — vector-store or web. The router decides between answering and rewriting based on the grade.

PYTHON
def answer_node(state: AgentState):
    user_question = state['messages'][-1].content
    retrieved_docs = state.get('retrieved_docs', '')

    prompt = f"""You are an expert financial analyst.

            TASK: Provide a detailed answer (200-300 words) using the retrieved documents.
            Use MARKDOWN, inline citations [1], [2], and a References section at the end.

            User Question: {user_question}

            Retrieved Documents:
            {retrieved_docs}"""

    response = llm.invoke(prompt)
    return {'messages': [response]}

def should_rewrite(state: AgentState):
    if state.get('is_relevant', True):
        print("[ROUTER] Document is relevant - proceeding to answer")
        return "answer"
    else:
        print("[ROUTER] Documents are not relevant - rewriting the user query")
        return 'rewrite'
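
Because the router only reads state and returns a node name, it can be checked without running the model. The copy below drops the print calls and is named `route` (a hypothetical name) so it does not clash with `should_rewrite`.

```python
def route(state: dict) -> str:
    """Same decision as should_rewrite, without the logging."""
    return "answer" if state.get("is_relevant", True) else "rewrite"

print(route({"is_relevant": True}))   # answer
print(route({"is_relevant": False}))  # rewrite
print(route({}))                      # answer
```

The last case shows a consequence of the `True` default: a state with no grade at all goes straight to the answer node. If you would rather fail toward correction, default to `False` instead.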

Building the Graph

The CRAG workflow: relevant documents answer directly, irrelevant ones trigger a rewrite and web search

The conditional edge after grading branches to either answer or the rewrite → web_search → answer correction path.

PYTHON
def create_crag_agent():
    builder = StateGraph(AgentState)

    builder.add_node('retriever', retrieve_node)
    builder.add_node('grade', grade_node)
    builder.add_node('rewrite', rewrite_query_node)
    builder.add_node('web_search', web_search_node)
    builder.add_node('answer', answer_node)

    builder.add_edge(START, 'retriever')
    builder.add_edge('retriever', 'grade')
    builder.add_conditional_edges('grade', should_rewrite, ['rewrite', 'answer'])
    builder.add_edge('rewrite', 'web_search')
    builder.add_edge('web_search', 'answer')
    builder.add_edge('answer', END)

    return builder.compile()

agent = create_crag_agent()
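
To see which nodes each branch visits, the edges above can be walked by hand. This is a plain-Python trace of the control flow only: it uses no LangGraph, runs no nodes, and `trace_path` is a hypothetical helper.

```python
def trace_path(is_relevant: bool) -> list[str]:
    """Walk the CRAG edges by hand and return the nodes visited, in order."""
    edges = {"retriever": "grade", "rewrite": "web_search", "web_search": "answer"}
    path, node = [], "retriever"
    while node != "END":
        path.append(node)
        if node == "grade":
            node = "answer" if is_relevant else "rewrite"  # conditional edge
        elif node == "answer":
            node = "END"
        else:
            node = edges[node]
    return path

print(trace_path(True))   # ['retriever', 'grade', 'answer']
print(trace_path(False))  # ['retriever', 'grade', 'rewrite', 'web_search', 'answer']
```

No edge leads back to `retriever` or `grade`, so the web results are never re-graded and the correction path runs at most once per question.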

Testing CRAG

When the documents are present, the grader passes and the agent answers directly:

PYTHON
query = "what is amazon's revenue in 2023?"
result = agent.invoke({'messages': [HumanMessage(query)]})
OUTPUT
[RETRIEVE NODE] Fetching documents
[TOOL] retrieve_docs called
[QUERY] what is amazon's revenue in 2023?
[RETRIEVED] 5 documents
[GRADE] Relevant: True
Reasoning: Documents 3 and 4 explicitly state Amazon's consolidated net sales (revenue) for 2023 as $574,785 million, directly answering the question.
[ROUTER] Document is relevant - proceeding to answer

For a company not in the vector store (Tesla), retrieval finds nothing, the grader marks it irrelevant, and CRAG corrects by rewriting and searching the web:

PYTHON
query = "what is Tesla revenue in 2023?"
result = agent.invoke({'messages': [HumanMessage(query)]})
OUTPUT
[RETRIEVE NODE] Fetching documents
[TOOL] retrieve_docs called
[QUERY] what is Tesla revenue in 2023?
Either No doc or keywords found!
[RETRIEVED] 0 documents
[GRADE] Relevant: False
Reasoning: No documents were retrieved to answer the question about Tesla's 2023 revenue.
[ROUTER] Documents are not relevant - rewriting the user query
[REWRITE] New: What is Tesla Inc.'s annual revenue for 2023 as reported in its 2023 annual financial report?

Warning

Web-sourced answers can include figures the model assembled from general knowledge rather than the cited page. CRAG fills coverage gaps, but always treat web-fallback numbers as less authoritative than the filings in your vector store.
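
One rough way to act on this warning is to check whether the figures in an answer actually appear in the retrieved text. The sketch below is a simple heuristic, not part of the lesson's agent: `unsupported_numbers` is a hypothetical helper, the sample strings are made up, and a missing match only flags a number for manual review.

```python
import re

NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")

def unsupported_numbers(answer: str, sources: str) -> list[str]:
    """Return numbers that appear in the answer but not in the source text."""
    # Drop inline citation markers such as [1] so they are not treated as figures
    body = re.sub(r"\[\d+\]", "", answer)
    source_numbers = {m.replace(",", "") for m in NUMBER.findall(sources)}
    missing = []
    for match in NUMBER.findall(body):
        token = match.rstrip(",")
        if token.replace(",", "") not in source_numbers and token not in missing:
            missing.append(token)
    return missing

# Made-up sample text for illustration
sources = "Example Corp reported revenue of $1,250 million in 2023."
answer = "Revenue was $1,250 million in 2023 [1], up from $980 million."
print(unsupported_numbers(answer, sources))
# ['980']
```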

The next lesson, Reflexion Agentic RAG, replaces the single retry with an iterative draft-and-revise loop.


What You Built

In this lesson you built a Corrective RAG agent:

  • Relevance grader: a structured-output `GradeDecision` decides whether retrievals can answer the question
  • Query rewriter: sharpens weak queries with specific financial keywords
  • Web-search fallback: DuckDuckGo fills gaps the vector store cannot
  • Branching router: `should_rewrite` sends relevant docs straight to the answer, irrelevant ones to correction
  • Single retry cycle: retriever → grade → rewrite → web_search → answer runs once, which avoids infinite loops

CRAG turns a fragile RAG pipeline into one that detects its own failures and recovers from them.
