Standard RAG always retrieves: it passes every query through the vector store and feeds the top-k chunks to the LLM, whether or not those chunks are relevant. Agentic RAG makes retrieval optional: the retriever is exposed as a tool, and the agent decides per query whether to call it. If the query is off-topic (e.g., "tell me 3 facts about Earth"), the agent answers from its own knowledge without touching the vector store. If the query is health-related, the agent calls retrieve_context, gets the relevant chunks, cites the sources, and synthesizes a grounded answer.
Prerequisites: the health_supplements/ FAISS vector store saved in the Vector Stores and Retrievals lesson; the langchain, langchain-ollama, langchain-community, langchain-core, and faiss-cpu packages; python-dotenv, because the setup code imports load_dotenv; and Ollama running with the qwen3 and nomic-embed-text models pulled.
pip install -U langchain langchain-ollama langchain-community langchain-core faiss-cpu python-dotenv
ollama pull nomic-embed-text
ollama pull qwen3
Setup
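import os
import warnings
# Work around the duplicate OpenMP runtime error that faiss-cpu can trigger on some setups
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'
# Silence library warnings to keep the output readable
warnings.filterwarnings("ignore")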
from langchain_core.tools import tool
from langchain.agents import create_agent
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from dotenv import load_dotenv
load_dotenv()
True
LLM and Vector Store Setup
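# Chat model served by the local Ollama instance
llm = ChatOllama(
    model="qwen3",
    base_url="http://localhost:11434"
)
# Must be the same embedding model that was used to build the store
embeddings = OllamaEmbeddings(
    model="nomic-embed-text",
    base_url="http://localhost:11434"
)
db_name = "./../09. Vector Stores and Retrievals/health_supplements"
# The store includes a pickle file, so loading requires this explicit opt-in
vector_store = FAISS.load_local(
    db_name,
    embeddings,
    allow_dangerous_deserialization=True
)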
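A wrong relative path is the most likely failure at this step. The following is a minimal pre-flight sketch, assuming the store was saved with the default index name so that the folder holds index.faiss and index.pkl. The helper name missing_index_files is hypothetical and not part of LangChain.

```python
from pathlib import Path


def missing_index_files(folder: str, index_name: str = "index") -> list:
    """Return the FAISS store files that are absent from `folder`.

    Assumes the layout written by FAISS.save_local with its default
    index name: `<index_name>.faiss` plus `<index_name>.pkl`.
    """
    expected = [f"{index_name}.faiss", f"{index_name}.pkl"]
    return [name for name in expected if not (Path(folder) / name).is_file()]


# Path taken from the lesson; adjust it to where your store actually lives
db_name = "./../09. Vector Stores and Retrievals/health_supplements"
missing = missing_index_files(db_name)
if missing:
    print(f"Missing in {db_name}: {missing}")
else:
    print("Vector store files found")
```

The .pkl file is a pickle, which is why load_local asks for allow_dangerous_deserialization=True. Only load stores you created yourself.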
Test the LLM connection:
llm.invoke("Hello")
AIMessage(content='Hello! How can I assist you today? 😊', ...)
Verifying the Vector Store
Before building the agent, verify the store contains the expected documents and responds to queries:
print("\n🔍 Testing Vector Store Connection...")
doc_count = vector_store.index.ntotal
print(f"✔ Vector store found with {doc_count} documents")
test_query = "creatine"
results = vector_store.similarity_search(test_query, k=3)
print(f"\n✔ Sample search for '{test_query}':")
for i, doc in enumerate(results, 1):
    source = doc.metadata.get('source', '?')
    page = doc.metadata.get('page', '?')
    preview = doc.page_content[:100]
    print(f" {i}. Source: {source} (Page {page}): {preview}...")
🔍 Testing Vector Store Connection...
✔ Vector store found with 311 documents
✔ Sample search for 'creatine':
1. Source: rag-dataset\gym supplements\1. Analysis of Actual Fitness Supplement.pdf (Page 0): acids than traditional protein sources. Its numerous benefits have made it a popular choice for snac...
2. Source: rag-dataset\gym supplements\1. Analysis of Actual Fitness Supplement.pdf (Page 1): Foods 2024, 13, 1424\n2 of 21\nand sports industry, evidence suggests that creatine can benefit not on...
3. Source: rag-dataset\gym supplements\2. High Prevalence of Supplement Intake.pdf (Page 10): supplements such as creatine or beta-alanine were used only once a week, which cannot be effective...
The index holds 311 vectors, one per document chunk. All three results come from the gym supplements research papers.
Retrieval Tool
The retrieve_context tool wraps the FAISS similarity search as an agent-callable function. The docstring tells the agent when to use it — specifically for health-related queries:
@tool
def retrieve_context(query: str) -> str:
    """Retrieve relevant information for health related queries from the document to answer the query."""
    print(f"🔍 Searching: '{query}'")
    # Fetch the 4 most similar chunks from the FAISS store
    docs = vector_store.similarity_search(query, k=4)
    # Prefix each chunk with its source file and page so the agent can cite them
    content = "\n\n".join(
        f"Source: {doc.metadata.get('source', '?')} (Page {doc.metadata.get('page', '?')}): {doc.page_content}"
        for doc in docs
    )
    print(f"✔ Found {len(docs)} relevant chunks")
    return content
Test the retrieval tool directly:
result = retrieve_context.invoke("What is the use of BCAA?")
🔍 Searching: 'What is the use of BCAA?'
✔ Found 4 relevant chunks
The tool returns a formatted string of 4 source-annotated chunks — exactly what the agent will inject into the conversation as a ToolMessage.
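To see the exact shape of that string without running a search, the formatting step can be pulled out into a plain function. This is only a sketch: FakeDoc is a hypothetical stand-in that mimics just the page_content and metadata attributes of a LangChain Document, format_chunks is a hypothetical name, and the sample text is placeholder data.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LangChain Document (illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_chunks(docs) -> str:
    """Join chunks into the source-annotated string that retrieve_context returns."""
    return "\n\n".join(
        f"Source: {doc.metadata.get('source', '?')} "
        f"(Page {doc.metadata.get('page', '?')}): {doc.page_content}"
        for doc in docs
    )


docs = [
    FakeDoc("first placeholder chunk", {"source": "a.pdf", "page": 1}),
    FakeDoc("second placeholder chunk"),  # no metadata, so the '?' fallback is used
]
formatted = format_chunks(docs)
print(formatted)
```

Chunks are separated by a blank line, and a chunk with no metadata falls back to "?" for both source and page, so the agent always receives a citation slot for each chunk.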
Creating the Agentic RAG
Define the system prompt and create the agent with the single retrieval tool:
tools = [retrieve_context]
system_prompt = """You are a research assistant with a document retrieval tool.
Tool:
- retrieve_context: Search the document for the health related question
Cite page numbers and reference document while writing the answer and be thorough."""
rag_agent = create_agent(llm, tools, system_prompt=system_prompt)
rag_agent
<langgraph.graph.state.CompiledStateGraph object at 0x0000020D5CD74350>
Invoking the Agent
result = rag_agent.invoke({'messages': "What is the use of BCAA?"})
🔍 Searching: 'use of BCAA'
✔ Found 4 relevant chunks
The agent chose to call retrieve_context and rewrote the question into the shorter search query 'use of BCAA' before running the search. Inspect the full message trajectory:
result
{'messages': [
HumanMessage(content='What is the use of BCAA?', ...),
AIMessage(content='', ..., tool_calls=[{'name': 'retrieve_context', 'args': {'query': 'use of BCAA'}, ...}], ...),
ToolMessage(content='Source: rag-dataset\\gym supplements\\1. Analysis of Actual Fitness Supplement.pdf (Page 1):
Foods 2024, 13, 1424\n2 of 21\nand sports industry, evidence suggests that creatine can benefit not only athletes
but also the elderly and the general population [6]. Branched-chain amino acids (BCAA) also offer a plethora
of benefits for consumers. As explained by Sanz et al. [7], BCAAs are stored directly in muscles and serve
as the raw materials needed to build new muscle...
\n\nSource: rag-dataset\\health supplements\\3.health_supplements_side_effects.pdf (Page 7):
...DMAA-containing supplements...', name='retrieve_context', ...),
AIMessage(content='Branched-Chain Amino Acids (BCAA) are primarily used for their role in muscle metabolism and recovery...', ...)
]}
Print the final answer:
result['messages'][-1].pretty_print()
================================== Ai Message ==================================
Branched-Chain Amino Acids (BCAA) are primarily used for their role in muscle metabolism and recovery. According to the document **"1. Analysis of Actual Fitness Supplement.pdf" (Page 1)**, BCAAs are stored directly in muscles and serve as raw materials for building new muscle tissue. This process supports muscle strengthening and reduces post-workout soreness. Consumers often incorporate BCAA supplements into their routines to optimize fitness outcomes and enhance overall well-being [1].
The document also highlights that BCAAs are part of the sports supplement industry, which underscores their popularity among athletes and fitness enthusiasts aiming to improve performance and recovery [7].
**References:**
- Page 1 of *1. Analysis of Actual Fitness Supplement.pdf* (Foods 2024, 13, 1424).
The answer correctly cites the page number and document source — exactly as instructed by the system prompt.
Testing Agent Boundary Behaviour
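The key advantage of Agentic RAG is that the agent calls the retrieval tool only when it judges the query to be health-related. For off-topic questions, it answers directly. The example below uses the ask() helper defined in the Streaming ask() Helper section that follows, so run that definition first: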
ask("tell me 3 facts about Earth?")
============================================================
Question: tell me 3 facts about Earth?
============================================================
💼 Answer:
The query about Earth is not related to health, so I cannot use the provided tool to retrieve information. Here are three general facts about Earth:
1. **Third Planet from the Sun**: Earth is the third planet in our solar system and the only known celestial body to support life.
2. **Unique Atmosphere**: Its atmosphere, rich in nitrogen and oxygen, along with the presence of liquid water, creates conditions suitable for life.
3. **Dynamic Geology**: Earth is geologically active, with processes like plate tectonics, volcanism, and erosion shaping its surface over time.
Let me know if you'd like health-related information!
No retrieval tool was called. The agent answered from its general knowledge and correctly explained why it couldn't use the tool for this query.
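The printed text suggests that no retrieval happened, but the message list is the more reliable record. The sketch below checks it, assuming messages expose a tool_calls list of dicts with a 'name' key, as the AIMessage in the trajectory shown earlier does. tools_called is a hypothetical helper, and the SimpleNamespace objects are stand-ins so the snippet runs without a model.

```python
from types import SimpleNamespace


def tools_called(messages) -> list:
    """Collect the names of all tools requested across a message list."""
    names = []
    for msg in messages:
        # Messages without tool calls either lack the attribute or leave it empty
        for call in getattr(msg, "tool_calls", None) or []:
            names.append(call["name"])
    return names


# Stand-ins shaped like the two trajectories in this lesson
off_topic = [
    SimpleNamespace(content="tell me 3 facts about Earth?"),
    SimpleNamespace(content="Here are three general facts...", tool_calls=[]),
]
health = [
    SimpleNamespace(content="What is the use of BCAA?"),
    SimpleNamespace(
        content="",
        tool_calls=[{"name": "retrieve_context", "args": {"query": "use of BCAA"}}],
    ),
    SimpleNamespace(content="Source: ...", name="retrieve_context"),
    SimpleNamespace(content="Branched-Chain Amino Acids (BCAA) are..."),
]

print(tools_called(off_topic))
print(tools_called(health))
```

With a real run, pass result['messages'] from rag_agent.invoke(...) instead of the stand-in lists. An empty list means the agent answered without retrieval.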
Streaming ask() Helper
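Build a helper that streams the agent's run step by step. With stream_mode="values", each event carries the full message list after a step, so the helper inspects the newest message and prints tool calls and message content as they appear: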
def ask(question: str):
    """Ask the agentic RAG a question."""
    print(f"\n{'='*60}")
    print(f"Question: {question}")
    print('='*60)
    for event in rag_agent.stream(
        {"messages": [{"role": "user", "content": question}]},
        stream_mode="values"
    ):
        # Each event is the full state; the newest message is last
        msg = event["messages"][-1]
        # Show tool usage
        if hasattr(msg, 'tool_calls') and msg.tool_calls:
            for tc in msg.tool_calls:
                print("Tool Call: ")
                print(f"\n🔺 Using: {tc['name']} with {tc['args']}")
        # Print any other message that has content: the final answer,
        # but also the echoed user question and the raw tool output
        elif hasattr(msg, 'content') and msg.content:
            print(f"\n💼 Answer:\n{msg.content}")
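Test with a health query. Because the helper prints every message that has content, the echoed question and the raw tool output also appear under the "💼 Answer" label before the final synthesized answer: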
ask("how to gain muscle mass?")
============================================================
Question: how to gain muscle mass?
============================================================
💼 Answer:
how to gain muscle mass?
Tool Call:
🔺 Using: retrieve_context with {'query': 'how to gain muscle mass'}
🔍 Searching: 'how to gain muscle mass'
✔ Found 4 relevant chunks
💼 Answer:
Source: rag-dataset\gym supplements\2. High Prevalence of Supplement Intake.pdf (Page 8):
and strength gain among men. We detected more prevalent protein and creatine supplementation
among younger compared to older fitness center users...
Creatine monohydrate is another well-known supplement used to gain muscle mass
and support performance and recovery. It is known not to increase fat mass...
[Final synthesized answer from retrieved chunks with source citations]
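The helper labels any message with content as an answer, including the user's own question and the tool output. One possible refinement is to branch on the message's type attribute, which LangChain messages set to "human", "ai", or "tool". The sketch below is an assumption about how you might label events, not part of the original helper: label_message is a hypothetical name, and the stand-in objects mimic only the attributes the function reads.

```python
from types import SimpleNamespace
from typing import Optional


def label_message(msg) -> Optional[str]:
    """Return a display label for a message, or None if it should be skipped."""
    kind = getattr(msg, "type", None)
    if kind == "human":
        return None  # the question is already printed in the header
    if kind == "ai" and getattr(msg, "tool_calls", None):
        return "tool_call"
    if kind == "tool":
        return "retrieved_context"
    if kind == "ai" and getattr(msg, "content", ""):
        return "final_answer"
    return None


# Stand-ins for the four messages of a health-related run
events = [
    SimpleNamespace(type="human", content="how to gain muscle mass?"),
    SimpleNamespace(type="ai", content="", tool_calls=[{"name": "retrieve_context", "args": {}}]),
    SimpleNamespace(type="tool", content="Source: ..."),
    SimpleNamespace(type="ai", content="(final answer text)", tool_calls=[]),
]
labels = [label_message(m) for m in events]
print(labels)
```

Inside ask(), the returned label could decide which prefix to print, so that only the last AI message is shown as the answer.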
Interactive Chat Loop
For an interactive session, wrap ask() in a loop. Each question is sent on its own, so the agent does not see earlier turns:
def chat():
    """Start interactive chat with the agentic RAG."""
    print("\n🤖 Agentic RAG Chat - Type 'quit', 'q', or 'exit' to exit")
    while True:
        question = input("\nYour question: ").strip()
        if question.lower() in ['quit', 'exit', 'q']:
            break
        if question:
            ask(question)
chat()
Example session output:
🤖 Agentic RAG Chat - Type 'quit', 'q', or 'exit' to exit
============================================================
Question: tell me about sun?
============================================================
💼 Answer:
The sun is a star at the center of our solar system, composed primarily of hydrogen (about 75%)
and helium (about 25%)... If you were asking about health-related aspects of the sun
(e.g., sunlight's role in vitamin D synthesis or skin cancer risks), I could retrieve specific
health information. Let me know if you'd like to focus on a specific aspect!
============================================================
Question: tell me about the protein?
============================================================
💼 Answer:
tell me about the protein?
Tool Call:
🔺 Using: retrieve_context with {'query': 'protein'}
🔍 Searching: 'protein'
✔ Found 4 relevant chunks
💼 Answer:
Source: rag-dataset\health supplements\3.health_supplements_side_effects.pdf (Page 3):
PROTEIN POWDERS AND INFANT FORMULA
Protein powders consisting of the dairy proteins casein, whey and of vegetable proteins in soy
protein isolate (SPI) are popular supplements among athletes and body builders...
The agent correctly distinguishes between general ("tell me about the sun") and health-related ("tell me about the protein") queries — only calling the retrieval tool for the latter.
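Each call to ask() sends only the current question, so a follow-up such as "and its side effects?" would reach the agent without context. One way to carry history, sketched under assumptions, is to keep the running message list yourself and pass it back on every turn. ChatSession is a hypothetical class. It accepts any callable shaped like rag_agent.invoke (a dict with a "messages" list in, a dict with the full updated "messages" list out), and echo_agent is a fake used only so the snippet runs without a model.

```python
class ChatSession:
    """Carry the conversation history across turns (hypothetical helper)."""

    def __init__(self, invoke):
        self.invoke = invoke   # e.g. rag_agent.invoke
        self.messages = []     # full history, oldest first

    def ask(self, question: str):
        pending = self.messages + [{"role": "user", "content": question}]
        result = self.invoke({"messages": pending})
        # The agent returns the whole trajectory, so it replaces the stored history
        self.messages = list(result["messages"])
        return self.messages[-1]


def echo_agent(state):
    """Fake agent: replies with the number of user turns it has seen."""
    msgs = list(state["messages"])
    turns = sum(1 for m in msgs if m["role"] == "user")
    return {"messages": msgs + [{"role": "assistant", "content": f"turn {turns}"}]}


session = ChatSession(echo_agent)
first = session.ask("tell me about protein")
second = session.ask("and its side effects?")
print(first["content"], "|", second["content"])
```

With the real agent the returned items are message objects, not dicts, so read the reply with .content instead of ["content"].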
Standard RAG vs. Agentic RAG
| Feature | Standard RAG | Agentic RAG |
|---|---|---|
| Retrieval | Always retrieves for every query | Retrieves only when query is relevant |
| Off-topic handling | May return irrelevant context | Answers from general knowledge |
| Architecture | LCEL chain — fixed retrieval → prompt → LLM | Agent loop — LLM decides whether to retrieve |
| Source citation | Manual (format_docs + prompt instructions) | Agent-driven (tool output carries source and page; system prompt instructs citation) |
| Flexibility | Single tool (retriever) | Can combine multiple tools (retriever + web search + calculator) |
| Complexity | Simple, predictable | Autonomous, adaptive |
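The architectural row in the table comes down to a single branch. The toy sketch below contrasts the two control flows with plain functions. Everything in it is a hypothetical stand-in: retrieve and generate fake the retriever and the LLM, and the keyword-based wants_retrieval check replaces the decision that a real agent makes through tool calling.

```python
def retrieve(query: str) -> list:
    """Stand-in retriever: pretends to return one chunk per query."""
    return [f"chunk about {query}"]


def generate(query: str, context: list) -> str:
    """Stand-in LLM: reports whether it was given any context."""
    return f"answer({query!r}, grounded={bool(context)})"


def standard_rag(query: str) -> str:
    # Fixed pipeline: retrieval always runs
    return generate(query, retrieve(query))


def wants_retrieval(query: str) -> bool:
    # Toy rule standing in for the LLM's tool-calling decision
    keywords = ("supplement", "protein", "bcaa", "creatine")
    return any(word in query.lower() for word in keywords)


def agentic_rag(query: str) -> str:
    # Agent loop reduced to its key branch: retrieve only when needed
    context = retrieve(query) if wants_retrieval(query) else []
    return generate(query, context)


for q in ("What is the use of BCAA?", "tell me 3 facts about Earth"):
    print(standard_rag(q), "|", agentic_rag(q))
```

For the off-topic query, the standard pipeline still passes retrieved chunks to the model, while the agentic version skips retrieval and answers without context.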
What You Built
In this lesson you built a complete Agentic RAG system:
- Vector store verification: index.ntotal check plus a sample similarity search before building the agent
- Retrieval tool: @tool-decorated retrieve_context() that wraps FAISS similarity search into an agent-callable function
- Agentic RAG: create_agent(llm, [retrieve_context], system_prompt=...), where the agent decides on its own when to retrieve
- Boundary testing: health queries trigger retrieval with source citations; off-topic queries are answered directly
- Streaming ask(): step-by-step streaming helper showing tool calls and final answers
- Interactive chat(): input loop around ask() that ends on quit, q, or exit