Agentic RAG with LangChain, FAISS, and Ollama

Standard RAG always retrieves: it passes every query through the vector store and feeds the top-k chunks to the LLM, whether or not they are relevant. Agentic RAG hands that decision to the agent, which decides whether to call the retrieval tool at all. If the query is off-topic (e.g., "tell me 3 facts about Earth"), the agent answers from its own knowledge without touching the vector store. If the query is health-related, the agent calls retrieve_context, gets the relevant chunks, cites the sources, and writes a grounded answer. In this lesson, we will build exactly this: an agent that owns its retriever.

Prerequisites: the health_supplements/ FAISS vector store saved in the Vector Stores and Retrievals lesson; the packages langchain, langchain-ollama, langchain-community, langchain-core, faiss-cpu, and python-dotenv installed; and Ollama running with the qwen3 and nomic-embed-text models pulled.

BASH

pip install -U langchain langchain-ollama langchain-community langchain-core faiss-cpu python-dotenv
ollama pull nomic-embed-text
ollama pull qwen3

Setup

PYTHON

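import os
import warnings

# Work around "OMP: Error #15" when FAISS and another library each load their own OpenMP runtime
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'
warnings.filterwarnings("ignore")

from langchain_core.tools import tool
from langchain.agents import create_agent
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from dotenv import load_dotenv

# Load environment variables from a .env file, if present (the notebook echoes the return value)
load_dotenv()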

OUTPUT

True

LLM and Vector Store Setup

PYTHON

llm = ChatOllama(
    model="qwen3",
    base_url="http://localhost:11434"
)

embeddings = OllamaEmbeddings(
    model="nomic-embed-text",
    base_url="http://localhost:11434"
)

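# Path to the index saved in the Vector Stores and Retrievals lesson
db_name = "./../09. Vector Stores and Retrievals/health_supplements"
vector_store = FAISS.load_local(
    db_name,
    embeddings,  # must be the same embedding model that built the index
    allow_dangerous_deserialization=True  # the docstore is pickled; only load files you created
)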

Let's test the LLM connection:

PYTHON

llm.invoke("Hello")

OUTPUT

AIMessage(content='Hello! How can I assist you today? 😊', ...)

Is the Vector Store Ready?

Before building the agent, let's verify the store contains the expected documents and responds to queries:

PYTHON

print("\n🔍 Testing Vector Store Connection...")

doc_count = vector_store.index.ntotal
print(f"✔ Vector store found with {doc_count} documents")

test_query = "creatine"
results = vector_store.similarity_search(test_query, k=3)
print(f"\n✔ Sample search for '{test_query}':")
for i, doc in enumerate(results, 1):
    source = doc.metadata.get('source', '?')
    page = doc.metadata.get('page', '?')
    preview = doc.page_content[:100]
    print(f"  {i}. Source: {source} (Page {page}): {preview}...")

OUTPUT

🔍 Testing Vector Store Connection...
✔ Vector store found with 311 documents

✔ Sample search for 'creatine':
  1. Source: rag-dataset\gym supplements\1. Analysis of Actual Fitness Supplement.pdf (Page 0): acids than traditional protein sources. Its numerous benefits have made it a popular choice for snac...
  2. Source: rag-dataset\gym supplements\1. Analysis of Actual Fitness Supplement.pdf (Page 1): Foods 2024, 13, 1424\n2 of 21\nand sports industry, evidence suggests that creatine can benefit not on...
  3. Source: rag-dataset\gym supplements\2. High Prevalence of Supplement Intake.pdf (Page 10): supplements such as creatine or beta-alanine were used only once a week, which cannot be effective...

Here, we can see 311 chunks indexed (index.ntotal counts the stored vectors, one per chunk), and all three results come from the gym supplements research papers.
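Under the hood, a flat FAISS index does exact search. It compares the query embedding with every stored vector and keeps the k closest. As far as we know, LangChain's FAISS wrapper builds an IndexFlatL2 (Euclidean distance) by default, assuming the store was created with FAISS.from_documents defaults. The pure-Python sketch below mimics that brute-force search. The hypothetical 3-dimensional toy vectors stand in for real nomic-embed-text embeddings, which are much longer.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flat_search(index, query_vec, k=3):
    """Brute-force k-nearest-neighbour search over every stored vector.

    `index` is a list of (vector, text) pairs; returns the k closest
    (distance, text) pairs, nearest first.
    """
    scored = [(l2_distance(vec, query_vec), text) for vec, text in index]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]

# Hypothetical toy "embeddings" (real ones come from nomic-embed-text)
toy_index = [
    ([0.9, 0.1, 0.0], "creatine supports strength gains"),
    ([0.8, 0.2, 0.1], "creatine and beta-alanine usage"),
    ([0.1, 0.9, 0.0], "vitamin D and sunlight"),
    ([0.0, 0.1, 0.9], "DMAA side effects"),
]

print(len(toy_index))  # analogous to index.ntotal
for dist, text in flat_search(toy_index, [1.0, 0.0, 0.0], k=2):
    print(f"{dist:.3f}  {text}")
```

Exact search scales linearly with the number of vectors. This is fine for a few hundred chunks like ours. Larger stores usually switch to an approximate FAISS index.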

How Do We Turn the Retriever into a Tool?

The retrieve_context tool wraps the FAISS similarity search as an agent-callable function. Its docstring becomes the tool description the LLM sees, and it tells the agent when to use the tool: specifically, for health-related queries:

The FAISS retriever is wrapped as a @tool, so the agent can call it autonomously when needed.

PYTHON

@tool
def retrieve_context(query: str) -> str:
    """Retrieve relevant information from the documents to answer health-related queries."""
    print(f"🔍 Searching: '{query}'")

    # Top-4 nearest chunks by embedding similarity
    docs = vector_store.similarity_search(query, k=4)

    # Tag each chunk with its source file and page so the LLM can cite them
    content = "\n\n".join(
        f"Source: {doc.metadata.get('source', '?')} (Page {doc.metadata.get('page', '?')}): {doc.page_content}"
        for doc in docs
    )

    print(f"✔ Found {len(docs)} relevant chunks")
    return content

Let's test the retrieval tool directly:

PYTHON

result = retrieve_context.invoke("What is the use of BCAA?")

OUTPUT

🔍 Searching: 'What is the use of BCAA?'
✔ Found 4 relevant chunks

The tool returns a formatted string of 4 source-tagged chunks. This is exactly what the agent will add to the conversation as a ToolMessage.
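Here is a small sketch of the same formatting step that shows the exact shape of that string without Ollama or FAISS. `StubDoc` is a hypothetical stand-in for LangChain's Document. It only mimics the page_content and metadata attributes the tool reads, and the sample text is made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class StubDoc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_chunks(docs):
    """Join chunks into one string, prefixing each with its source and page."""
    return "\n\n".join(
        f"Source: {d.metadata.get('source', '?')} (Page {d.metadata.get('page', '?')}): {d.page_content}"
        for d in docs
    )

sample_docs = [
    StubDoc("BCAAs are stored directly in muscles...", {"source": "paper_a.pdf", "page": 1}),
    StubDoc("Chunk with no metadata"),
]
print(format_chunks(sample_docs))
```

The '?' fallback matters: a chunk without metadata still reaches the agent, just without a citable source or page.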

How Do We Create the Agentic RAG?

Now, we define the system prompt and create the agent with the single retrieval tool:

Embeddings + persisted FAISS + a @tool retriever + create_agent together form the agentic RAG stack.

PYTHON

tools = [retrieve_context]

system_prompt = """You are a research assistant with a document retrieval tool.

                    Tool:
                    - retrieve_context: Search the document for the health related question

                    Cite page numbers and reference document while writing the answer and be thorough."""

rag_agent = create_agent(llm, tools, system_prompt=system_prompt)

PYTHON

rag_agent

OUTPUT

<langgraph.graph.state.CompiledStateGraph object at 0x0000020D5CD74350>
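The compiled graph runs a loop. It calls the model, executes any tool calls the model requested, appends the results, and repeats until the model answers without a tool call. The sketch below imitates that control flow in plain Python to make the decision point visible. It is not LangGraph's implementation. `fake_model` is a hypothetical keyword rule standing in for qwen3's judgement, and messages are plain dicts.

```python
def fake_model(messages):
    """Hypothetical stand-in for the LLM: request retrieval once for
    health-sounding questions, otherwise answer directly."""
    question = messages[0]["content"].lower()
    already_retrieved = any(m["role"] == "tool" for m in messages)
    if not already_retrieved and any(w in question for w in ("bcaa", "protein", "muscle")):
        return {"role": "ai", "content": "",
                "tool_calls": [{"name": "retrieve_context", "args": {"query": question}}]}
    return {"role": "ai", "content": f"Answer to: {question}", "tool_calls": []}

def run_agent(question, tools, model=fake_model, max_steps=5):
    """Model -> tools -> model ... until the model stops requesting tools."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(reply)
        if not reply["tool_calls"]:
            return messages  # final answer reached
        for call in reply["tool_calls"]:
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    raise RuntimeError("agent did not finish within max_steps")

# Hypothetical tool registry; the real agent calls retrieve_context instead
toy_tools = {"retrieve_context": lambda query: f"Source: stub.pdf (Page 1): text about {query}"}

for q in ("What is the use of BCAA?", "tell me 3 facts about Earth?"):
    print(q, "->", [m["role"] for m in run_agent(q, toy_tools)])
```

A health question produces the user → ai (tool call) → tool → ai sequence. An off-topic question stops after the first ai reply. The max_steps guard keeps a model that never stops calling tools from looping forever.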

Invoking the Agent

PYTHON

result = rag_agent.invoke({"messages": [{"role": "user", "content": "What is the use of BCAA?"}]})

OUTPUT

🔍 Searching: 'use of BCAA'
✔ Found 4 relevant chunks

The agent decided on its own to call retrieve_context, rephrasing the query as 'use of BCAA' before running the search. Let's inspect the full message journey:

PYTHON

result

OUTPUT

{'messages': [
  HumanMessage(content='What is the use of BCAA?', ...),
  AIMessage(content='', ..., tool_calls=[{'name': 'retrieve_context', 'args': {'query': 'use of BCAA'}, ...}], ...),
  ToolMessage(content='Source: rag-dataset\\gym supplements\\1. Analysis of Actual Fitness Supplement.pdf (Page 1):
    Foods 2024, 13, 1424\n2 of 21\nand sports industry, evidence suggests that creatine can benefit not only athletes
    but also the elderly and the general population [6]. Branched-chain amino acids (BCAA) also offer a plethora
    of benefits for consumers. As explained by Sanz et al. [7], BCAAs are stored directly in muscles and serve
    as the raw materials needed to build new muscle...
    \n\nSource: rag-dataset\\health supplements\\3.health_supplements_side_effects.pdf (Page 7):
    ...DMAA-containing supplements...', name='retrieve_context', ...),
  AIMessage(content='Branched-Chain Amino Acids (BCAA) are primarily used for their role in muscle metabolism and recovery...', ...)
]}
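Reading this nested structure by eye gets tedious. A small helper can summarize it: the sequence of message types, which tools were called, and the final answer. This is a sketch that relies only on attributes LangChain messages expose (`.type`, `.tool_calls` on AI messages, `.content`). The SimpleNamespace objects are hypothetical stand-ins shaped like the result above. With the real agent, you would pass `result["messages"]`.

```python
from types import SimpleNamespace

def summarize_journey(messages):
    """Return (message types, tool names called, final answer text)."""
    types = [m.type for m in messages]
    tool_names = [
        call["name"]
        for m in messages
        for call in (getattr(m, "tool_calls", None) or [])  # only AI messages carry tool_calls
    ]
    final = messages[-1].content if messages else ""
    return types, tool_names, final

# Hypothetical stand-ins mirroring the run above
demo = [
    SimpleNamespace(type="human", content="What is the use of BCAA?"),
    SimpleNamespace(type="ai", content="", tool_calls=[{"name": "retrieve_context", "args": {"query": "use of BCAA"}}]),
    SimpleNamespace(type="tool", content="Source: ...", name="retrieve_context"),
    SimpleNamespace(type="ai", content="BCAAs are used for ...", tool_calls=[]),
]
print(summarize_journey(demo))
```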

Let's print the final answer:

PYTHON

result['messages'][-1].pretty_print()

OUTPUT

================================== Ai Message ==================================

Branched-Chain Amino Acids (BCAA) are primarily used for their role in muscle metabolism and recovery. According to the document **"1. Analysis of Actual Fitness Supplement.pdf" (Page 1)**, BCAAs are stored directly in muscles and serve as raw materials for building new muscle tissue. This process supports muscle strengthening and reduces post-workout soreness. Consumers often incorporate BCAA supplements into their routines to optimize fitness outcomes and enhance overall well-being [1].

The document also highlights that BCAAs are part of the sports supplement industry, which underscores their popularity among athletes and fitness enthusiasts aiming to improve performance and recovery [7].

**References:**
- Page 1 of *1. Analysis of Actual Fitness Supplement.pdf* (Foods 2024, 13, 1424).

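Here, we can see the answer cites the page number and the document source, as the system prompt asked. The bracketed markers [1] and [7] are not tied to the retrieved sources, though. [7] appears to echo the paper's own citation of Sanz et al., so check such markers before trusting them.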

Streaming `ask()` Helper

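Now, we build a small helper that streams the agent's progress step by step. With stream_mode="values", every event carries the full message list, so the helper inspects only the latest message. It prints tool calls as they happen and prints any message with text content under "💼 Answer":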

PYTHON

def ask(question: str):
    """Ask the agentic RAG a question."""
    print(f"\n{'='*60}")
    print(f"Question: {question}")
    print('='*60)

    for event in rag_agent.stream(
        {"messages": [{"role": "user", "content": question}]},
        stream_mode="values"
    ):
        msg = event["messages"][-1]

        # Show tool usage
        if hasattr(msg, 'tool_calls') and msg.tool_calls:
            for tc in msg.tool_calls:
                print("Tool Call: ")
                print(f"\n🔺 Using: {tc['name']} with {tc['args']}")

        # Show final answer
        elif hasattr(msg, 'content') and msg.content:
            print(f"\n💼 Answer:\n{msg.content}")

Let's test it with a health query:

PYTHON

ask("how to gain muscle mass?")

OUTPUT

============================================================
Question: how to gain muscle mass?
============================================================

💼 Answer:
how to gain muscle mass?
Tool Call:

🔺 Using: retrieve_context with {'query': 'how to gain muscle mass'}
🔍 Searching: 'how to gain muscle mass'
✔ Found 4 relevant chunks

💼 Answer:
Source: rag-dataset\gym supplements\2. High Prevalence of Supplement Intake.pdf (Page 8):
and strength gain among men. We detected more prevalent protein and creatine supplementation
among younger compared to older fitness center users...
Creatine monohydrate is another well-known supplement used to gain muscle mass
and support performance and recovery. It is known not to increase fat mass...

[Final synthesized answer from retrieved chunks with source citations]
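In the run above, the question itself and the raw tool output were also printed under "💼 Answer". This happens because stream_mode="values" yields the full state after every step, and ask() prints any latest message that has content, including the HumanMessage and the ToolMessage. The sketch below routes messages more strictly based on each message's `.type` ("human", "ai", "tool"). `describe_message` is a hypothetical helper, and the SimpleNamespace objects only mimic LangChain messages.

```python
from types import SimpleNamespace

def describe_message(msg, preview=80):
    """Return a display line for the latest streamed message, or None to skip it."""
    if msg.type == "ai" and getattr(msg, "tool_calls", None):
        calls = ", ".join(f"{tc['name']} with {tc['args']}" for tc in msg.tool_calls)
        return f"🔺 Tool call: {calls}"
    if msg.type == "tool":
        text = msg.content
        if len(text) > preview:
            text = text[:preview] + "..."  # keep long retrieval results short
        return f"📄 {getattr(msg, 'name', 'tool')} returned: {text}"
    if msg.type == "ai" and msg.content:
        return f"💼 Answer:\n{msg.content}"
    return None  # e.g. the user's own question echoed back in the state

samples = [
    SimpleNamespace(type="human", content="how to gain muscle mass?"),
    SimpleNamespace(type="ai", content="", tool_calls=[{"name": "retrieve_context", "args": {"query": "how to gain muscle mass"}}]),
    SimpleNamespace(type="tool", name="retrieve_context", content="Source: ... creatine monohydrate ..."),
    SimpleNamespace(type="ai", content="Creatine and protein ...", tool_calls=[]),
]
for m in samples:
    line = describe_message(m)
    if line:
        print(line)
```

Inside ask(), the if/elif block could then become `line = describe_message(msg)` followed by `if line: print(line)`.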

What Happens with Off-Topic Questions?

Here comes the key advantage of Agentic RAG: the agent calls the retrieval tool only when it judges the query to be health-related. For off-topic questions, it answers directly:

On-topic queries trigger retrieval; off-topic queries are answered directly from training data.

PYTHON

ask("tell me 3 facts about Earth?")

OUTPUT

============================================================
Question: tell me 3 facts about Earth?
============================================================

💼 Answer:
The query about Earth is not related to health, so I cannot use the provided tool to retrieve information. Here are three general facts about Earth:

1. **Third Planet from the Sun**: Earth is the third planet in our solar system and the only known celestial body to support life.
2. **Unique Atmosphere**: Its atmosphere, rich in nitrogen and oxygen, along with the presence of liquid water, creates conditions suitable for life.
3. **Dynamic Geology**: Earth is geologically active, with processes like plate tectonics, volcanism, and erosion shaping its surface over time.

Let me know if you'd like health-related information!

Here, we can see no retrieval tool was called. The agent answered from its general knowledge and even explained why it did not use the tool.
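The LLM makes this decision, so it is worth re-checking it on a handful of known queries, for example whenever you swap models or edit the docstring. The sketch below assumes a hypothetical `run_fn(question)` that returns the final message list. For the real agent, that could be something like `lambda q: rag_agent.invoke({"messages": [{"role": "user", "content": q}]})["messages"]`. Here a scripted fake stands in so the check runs on its own. The fake deliberately misses one case to show what a failure report looks like.

```python
from types import SimpleNamespace

def retrieval_was_used(messages, tool_name="retrieve_context"):
    """True if any AI message in the run requested `tool_name`."""
    return any(
        call["name"] == tool_name
        for m in messages
        for call in (getattr(m, "tool_calls", None) or [])
    )

def check_boundaries(run_fn, cases):
    """Return the (question, expected, actual) triples that did not match."""
    failures = []
    for question, expect_retrieval in cases:
        used = retrieval_was_used(run_fn(question))
        if used != expect_retrieval:
            failures.append((question, expect_retrieval, used))
    return failures

def scripted_run(question):
    """Hypothetical fake agent run: retrieves only for 'protein' questions."""
    calls = [{"name": "retrieve_context", "args": {"query": question}}] if "protein" in question else []
    return [SimpleNamespace(type="human", content=question),
            SimpleNamespace(type="ai", content="...", tool_calls=calls)]

cases = [
    ("tell me about the protein?", True),
    ("tell me 3 facts about Earth?", False),
    ("how to gain muscle mass?", True),
]
print(check_boundaries(scripted_run, cases))
```

LLM outputs vary between runs, so a real check is more informative when each case is repeated a few times.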

Interactive Chat Loop

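To ask several questions in a row, we wrap ask() in a loop. Each call to ask() sends only the new question, so the agent does not remember earlier turns: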

PYTHON

def chat():
    """Start interactive chat with the agentic RAG."""
    print("\n🤖 Agentic RAG Chat - Type 'quit', 'q', or 'exit' to exit")

    while True:
        question = input("\nYour question: ").strip()
        if question.lower() in ['quit', 'exit', 'q']:
            break
        if question:
            ask(question)

chat()
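chat() blocks on input(), and in a terminal Ctrl+D or Ctrl+C ends it with an exception rather than a clean exit. One way to handle that, and to make the loop testable, is to inject the input function. `chat_loop` and its parameters are illustrative names, not part of LangChain. With the real agent, you would call `chat_loop(ask)`.

```python
def chat_loop(ask_fn, input_fn=input, exit_words=("quit", "exit", "q")):
    """Read questions until an exit word, EOF or Ctrl+C; return how many were asked."""
    print("\n🤖 Agentic RAG Chat - Type 'quit', 'q', or 'exit' to exit")
    asked = 0
    while True:
        try:
            question = input_fn("\nYour question: ").strip()
        except (EOFError, KeyboardInterrupt):
            break  # Ctrl+D / Ctrl+C ends the session cleanly
        if question.lower() in exit_words:
            break
        if question:  # skip empty lines
            ask_fn(question)
            asked += 1
    return asked

# Scripted inputs instead of a keyboard; `seen` records what ask_fn received
scripted = iter(["tell me about sun?", "", "tell me about the protein?", "quit"])
seen = []
count = chat_loop(seen.append, input_fn=lambda prompt: next(scripted))
print(count, seen)
```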

Example session output:

OUTPUT

🤖 Agentic RAG Chat - Type 'quit', 'q', or 'exit' to exit

============================================================
Question: tell me about sun?
============================================================

💼 Answer:
The sun is a star at the center of our solar system, composed primarily of hydrogen (about 75%)
and helium (about 25%)... If you were asking about health-related aspects of the sun
(e.g., sunlight's role in vitamin D synthesis or skin cancer risks), I could retrieve specific
health information. Let me know if you'd like to focus on a specific aspect!

============================================================
Question: tell me about the protein?
============================================================

💼 Answer:
tell me about the protein?
Tool Call:

🔺 Using: retrieve_context with {'query': 'protein'}
🔍 Searching: 'protein'
✔ Found 4 relevant chunks

💼 Answer:
Source: rag-dataset\health supplements\3.health_supplements_side_effects.pdf (Page 3):
PROTEIN POWDERS AND INFANT FORMULA
Protein powders consisting of the dairy proteins casein, whey and of vegetable proteins in soy
protein isolate (SPI) are popular supplements among athletes and body builders...

Here, we can see the agent distinguishing general questions ("tell me about the sun") from health questions ("tell me about the protein") and calling the retrieval tool only for the second kind.

Standard RAG vs. Agentic RAG

Agentic RAG calls the retriever only when the query is relevant to the documents; standard RAG always retrieves.

Feature	Standard RAG	Agentic RAG
Retrieval	Always retrieves for every query	Retrieves only when query is relevant
Off-topic handling	May return irrelevant context	Answers from general knowledge
Architecture	LCEL chain, fixed retrieval → prompt → LLM	Agent loop, LLM decides whether to retrieve
Source citation	Manual (format_docs + prompt instructions)	Agent-driven (system prompt instructs citation; tool output carries source tags)
Flexibility	Fixed pipeline around one retriever	Can combine multiple tools (retriever + web search + calculator)
Complexity	Simple, predictable	Autonomous, adaptive
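For contrast, the standard RAG column boils down to a fixed pipeline: retrieve, format, prompt, generate, every time. Below is a plain-Python sketch of that shape. `retrieve` and `generate` are hypothetical callables standing in for the retriever and the LLM. In LangChain, the table's LCEL chain with format_docs plays this role.

```python
def format_docs(docs):
    """Join retrieved chunk texts into one context block."""
    return "\n\n".join(docs)

def standard_rag(question, retrieve, generate):
    """Fixed pipeline: the retriever runs for every question, relevant or not."""
    context = format_docs(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)

calls = []
def retrieve(q):
    """Hypothetical retriever that records every call and returns fixed chunks."""
    calls.append(q)
    return ["creatine chunk", "protein chunk"]

# generate simply echoes the prompt so we can inspect what the LLM would see
answer = standard_rag("tell me 3 facts about Earth?", retrieve, generate=lambda p: p)
print(len(calls))  # retrieval ran even for an off-topic question
```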

What You Built

In this lesson, we built a complete Agentic RAG system:

Vector store verification: index.ntotal check + sample similarity search before building the agent
Retrieval tool: @tool-decorated retrieve_context() that wraps FAISS similarity search into an agent-callable function
Agentic RAG: create_agent(llm, [retrieve_context], system_prompt=...); the agent autonomously decides when to retrieve
Boundary testing: health queries trigger retrieval with source citations; off-topic queries are answered directly
Streaming ask(): step-by-step streaming helper showing tool calls and final answers
Interactive chat(): a question loop that runs until quit/exit (each question is answered independently, without conversation memory)

This is how Agentic RAG works. The retriever becomes a tool, the agent decides when to reach for it, and every answer that uses the documents carries its sources.
