Building AI agents in LangChain involves more than binding tools to models. Production-grade systems require robust conversation persistence, execution boundaries, data privacy guards, structured data mapping, and real-time streaming interfaces.
This bootcamp walks through building a financial analysis agent with SQLite-backed memory, call limits and PII guardrails, planning middleware, multiple streaming modes, and structured output.
Building a Basic Agent
First, initialize the chat model, register the tools, and create a stateless agent:
import os
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.agents import create_agent
from langchain.messages import HumanMessage
from scripts import base_tools
load_dotenv()
model = ChatGoogleGenerativeAI(model='gemini-2.5-flash')
system_prompt = (
    "You are a financial analyst specializing in tech stocks.\n"
    "Provide data-driven analysis with clear insights. "
    "You have access to web_search tools and get_weather tools."
)
agent = create_agent(
    model=model,
    tools=[base_tools.web_search, base_tools.get_weather],
    system_prompt=system_prompt
)
Test stateless tool execution:
query = "what's apple's current stock price? and what is the latest weather in Mumbai?"
response = agent.invoke({'messages': [HumanMessage(query)]})
print(response['messages'][-1].text)
Apple's current stock price is $277.89, with an after-hours price of $277.16 (as of December 8, 2025).
The latest weather in Mumbai is overcast, with a temperature of 34.2°C (93.6°F) and 18% humidity.
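The scripts.base_tools module is imported above but not shown in this walkthrough. As a rough, hypothetical stand-in for experimenting offline, the two tools could be plain typed functions with docstrings, which create_agent can accept as tools. The bodies below are placeholders that make no network calls; the real module presumably wraps a search API and a weather API.

```python
# Hypothetical offline stand-ins for scripts/base_tools.py (not the author's code).
# The docstring and type hints are what the model sees as the tool description.


def web_search(query: str) -> str:
    """Search the web for recent information about the query."""
    # Placeholder: a real tool would call a search API here.
    return f"[stub] no live search backend configured for: {query}"


def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # Placeholder: a real tool would call a weather API here.
    return f"[stub] no live weather backend configured for: {city}"


if __name__ == "__main__":
    print(web_search("AAPL stock price"))
    print(get_weather("Mumbai"))
```

With stubs like these the agent can still exercise its tool-calling loop, but its answers will only echo the placeholder text rather than live data.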
Short-Term Memory with SQLite
To carry context across multiple conversational turns, attach a SQLite checkpointer, which saves each thread's state between calls. SqliteSaver ships in the separate langgraph-checkpoint-sqlite package.
from langgraph.checkpoint.sqlite import SqliteSaver
import sqlite3
# Set up the SQLite database (sqlite3 does not create missing directories)
os.makedirs("data", exist_ok=True)
conn = sqlite3.connect("data/financial_agent.db", check_same_thread=False)
checkpointer = SqliteSaver(conn=conn)
agent_memory = create_agent(
    model=model,
    tools=[base_tools.web_search, base_tools.get_weather],
    system_prompt=system_prompt,
    checkpointer=checkpointer
)
Run a persistent session by providing a thread configuration:
config = {"configurable": {"thread_id": "memory_session"}}
# Initial query
agent_memory.invoke({'messages': [HumanMessage("what's apple's current stock price?")]}, config=config)
# Follow-up query
response = agent_memory.invoke({'messages': [HumanMessage("tell me about the weather in Mumbai too")]}, config=config)
print(response['messages'][-1].text)
The latest weather in Mumbai is overcast, with a temperature of 34.2°C (93.6°F) and 18% humidity.
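To make the role of thread_id more concrete, here is a minimal sketch of thread-scoped persistence using only the standard library. The turns table and the append_turn and load_history helpers are hypothetical and much simpler than what SqliteSaver actually stores; the point is only that each thread_id selects its own history.

```python
import json
import sqlite3

# Toy table for illustration; this is NOT the schema SqliteSaver uses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE turns (thread_id TEXT, seq INTEGER, message TEXT)")


def append_turn(thread_id: str, role: str, content: str) -> None:
    """Append one message to the history of the given thread."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM turns WHERE thread_id = ?", (thread_id,)
    ).fetchone()
    conn.execute(
        "INSERT INTO turns VALUES (?, ?, ?)",
        (thread_id, count, json.dumps({"role": role, "content": content})),
    )
    conn.commit()


def load_history(thread_id: str) -> list[dict]:
    """Return the thread's messages in insertion order."""
    rows = conn.execute(
        "SELECT message FROM turns WHERE thread_id = ? ORDER BY seq",
        (thread_id,),
    ).fetchall()
    return [json.loads(message) for (message,) in rows]


append_turn("memory_session", "user", "what's apple's current stock price?")
append_turn("memory_session", "user", "tell me about the weather in Mumbai too")
append_turn("other_session", "user", "hello")

print(len(load_history("memory_session")))  # 2
print(len(load_history("other_session")))   # 1
```

Reusing the same thread_id resumes a conversation, while a new thread_id starts from an empty history.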
Conversational Middleware
LangChain middleware intercepts the agent's execution loop, letting developers inject summarizers, call budgets, fallback strategies, and data scrubbers.
Summarization Middleware
When the chat history becomes too long, this middleware automatically condenses past messages using a secondary summarization pass:
from langchain.agents.middleware import SummarizationMiddleware
agent_summary = create_agent(
    model=model,
    tools=[base_tools.web_search, base_tools.get_weather],
    system_prompt=system_prompt,
    checkpointer=checkpointer,
    middleware=[
        SummarizationMiddleware(
            model=ChatGoogleGenerativeAI(model='gemini-2.5-flash'),
            trigger=[("messages", 15)],  # Summarize once the message count reaches 15
            keep=("messages", 5)  # Keep the last 5 messages in raw form
        )
    ]
)
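The trigger and keep settings above can be pictured with a small standalone sketch. The compact_history and naive_summarize functions are hypothetical and operate on plain strings; the real middleware works on message objects and calls a model for the summary, so treat this only as an illustration of the thresholds.

```python
from typing import Callable


def compact_history(
    messages: list[str],
    summarize: Callable[[list[str]], str],
    trigger: int = 15,
    keep: int = 5,
) -> list[str]:
    """Fold older messages into one summary once `trigger` messages exist."""
    if len(messages) < trigger:
        return list(messages)  # below the threshold: history is left untouched
    split = max(len(messages) - keep, 0)
    older, recent = messages[:split], messages[split:]
    return [f"[summary] {summarize(older)}"] + recent


def naive_summarize(older: list[str]) -> str:
    # Stand-in for the secondary model call.
    return f"{len(older)} earlier messages condensed"


history = [f"msg {i}" for i in range(15)]
compacted = compact_history(history, naive_summarize)
print(len(compacted))  # 6: one summary plus the last 5 messages
print(compacted[0])    # [summary] 10 earlier messages condensed
```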
Execution and Call Limits
To prevent runaway agent loops and unexpected API bills, set call limits and define fallback models:
from langchain.agents.middleware import ModelCallLimitMiddleware
from langchain.agents.middleware import ToolCallLimitMiddleware
from langchain.agents.middleware import ModelFallbackMiddleware
agent_limit = create_agent(
    model=model,
    tools=[base_tools.web_search, base_tools.get_weather],
    system_prompt=system_prompt,
    checkpointer=checkpointer,
    middleware=[
        # Hard cap on model calls per run; 'end' stops the run when the cap is hit
        ModelCallLimitMiddleware(run_limit=2, exit_behavior="end"),
        # Cap on tool calls per run; 'continue' blocks calls over the limit
        # with an error message and lets the model carry on
        ToolCallLimitMiddleware(run_limit=2, exit_behavior='continue'),
        # Fallback model if the primary model call raises an exception
        ModelFallbackMiddleware(ChatGoogleGenerativeAI(model='gemini-3-pro-preview'))
    ]
)
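The two ideas in this configuration, a per-run call budget and an ordered list of fallback models, can be sketched without any framework. The run_with_budget and call_with_fallback helpers and the toy model functions below are hypothetical; they only mirror the general control flow, not the middleware internals.

```python
from typing import Callable, Optional


def run_with_budget(step: Callable[[int], Optional[str]], run_limit: int = 2) -> dict:
    """Call step(n) until it returns an answer or the budget is spent."""
    for call_number in range(1, run_limit + 1):
        answer = step(call_number)
        if answer is not None:
            return {"answer": answer, "calls": call_number, "limit_hit": False}
    return {"answer": None, "calls": run_limit, "limit_hit": True}


def call_with_fallback(prompt: str, *models: Callable[[str], str]) -> str:
    """Try each model in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model(prompt)
        except Exception as error:  # any failure moves on to the next model
            last_error = error
    raise RuntimeError("all models failed") from last_error


def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary model unavailable")


def backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"


# A step that never finishes exhausts the budget after two calls.
print(run_with_budget(lambda call_number: None))
# {'answer': None, 'calls': 2, 'limit_hit': True}
print(call_with_fallback("AAPL outlook", flaky_primary, backup))
# backup answer to: AAPL outlook
```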
PII Guardrails and Redaction
Identify and redact sensitive user data (emails, credit card numbers, or custom patterns such as API keys) before it reaches the model:
from langchain.agents.middleware import PIIMiddleware
agent_pii = create_agent(
    model=model,
    tools=[base_tools.web_search, base_tools.get_weather],
    system_prompt=system_prompt,
    checkpointer=checkpointer,
    middleware=[
        # Blocks the entire execution if an OpenAI-style (sk-...) API key is detected
        PIIMiddleware("api_key", detector=r"sk-[a-zA-Z0-9]{32}", strategy="block"),
        # Redacts email addresses from inputs
        PIIMiddleware("email", strategy="redact", apply_to_input=True),
        # Partially masks credit card numbers in inputs
        PIIMiddleware("credit_card", strategy="mask", apply_to_input=True),
        # Redacts URLs from inputs
        PIIMiddleware("url", strategy="redact", apply_to_input=True)
    ]
)
Verify PII redaction at runtime:
config = {'configurable': {'thread_id': 'pii_session'}}
query = "Hi, my name is John. Here is my email: info@kgptalkie.com"
response = agent_pii.invoke({'messages': [HumanMessage(query)]}, config=config)
print(response['messages'][0].content)
Hi, my name is John. Here is my email: [REDACTED_EMAIL]
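For intuition, the block, redact, and mask strategies can be approximated with plain regular expressions. The scrub function and the EMAIL and CARD patterns below are simplified assumptions, not the detectors PIIMiddleware ships with; a production detector would likely validate candidates more strictly (for example with a Luhn check on card numbers).

```python
import re

# Simplified, hypothetical patterns for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
CARD = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")
API_KEY = re.compile(r"sk-[a-zA-Z0-9]{32}")  # same pattern as the custom detector above


def mask_card(match: re.Match) -> str:
    """Replace all but the last four digits with asterisks."""
    digits = re.sub(r"\D", "", match.group())
    return "*" * (len(digits) - 4) + digits[-4:]


def scrub(text: str) -> str:
    """Apply block, redact, and mask strategies to a single input string."""
    if API_KEY.search(text):
        raise ValueError("blocked: API key detected")  # "block" strategy
    text = EMAIL.sub("[REDACTED_EMAIL]", text)        # "redact" strategy
    return CARD.sub(mask_card, text)                   # "mask" strategy


print(scrub("Hi, my name is John. Here is my email: info@kgptalkie.com"))
# Hi, my name is John. Here is my email: [REDACTED_EMAIL]
print(scrub("Pay with 4111 1111 1111 1111 please"))
# Pay with ************1111 please
```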
Todo List Planner
The TodoListMiddleware equips the agent with a write_todos tool for breaking complex instructions into a checklist, tracking each task through the 'pending', 'in_progress', and 'completed' states.
from langchain.agents.middleware import TodoListMiddleware
agent_todo = create_agent(
    model=model,
    tools=[base_tools.web_search, base_tools.get_weather],
    system_prompt=system_prompt,
    checkpointer=checkpointer,
    middleware=[TodoListMiddleware()]
)
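The task lifecycle can be sketched as a tiny state machine. The Todo type and the advance helper below are assumptions made for illustration (the item shape with content and status fields is a guess at what the planner tracks); in practice the model itself rewrites the list through its planning tool.

```python
from typing import Literal, TypedDict

Status = Literal["pending", "in_progress", "completed"]


class Todo(TypedDict):
    content: str
    status: Status


def advance(todos: list[Todo]) -> list[Todo]:
    """Complete the in-progress task, if any, then start the next pending one."""
    updated: list[Todo] = [{**todo} for todo in todos]  # copy; inputs stay unchanged
    for todo in updated:
        if todo["status"] == "in_progress":
            todo["status"] = "completed"
            break
    for todo in updated:
        if todo["status"] == "pending":
            todo["status"] = "in_progress"
            break
    return updated


plan: list[Todo] = [
    {"content": "Fetch the current AAPL price", "status": "pending"},
    {"content": "Search for recent Apple news", "status": "pending"},
    {"content": "Write the analysis", "status": "pending"},
]

plan = advance(plan)
print([todo["status"] for todo in plan])  # ['in_progress', 'pending', 'pending']
plan = advance(plan)
print([todo["status"] for todo in plan])  # ['completed', 'in_progress', 'pending']
```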
Output Formats and Streaming Modes
Streaming Execution Modes
Three commonly used stream_mode values are:
messages: Yields message chunks as the model generates them.
updates: Yields the state update produced by each model or tool step once it completes.
values: Yields the full state after every transition.
config = {'configurable': {'thread_id': 'stream_session'}}
for chunk in agent.stream({'messages': ['tell me about apple news']}, stream_mode='messages', config=config):
    print(chunk)
    print("------\n")
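To compare what each mode hands back, the following toy generator imitates the three output shapes on a fake two-step run. Everything here (toy_stream, the node names, the langgraph_node metadata key) is a simplified assumption for illustration and does not call LangChain; real chunks are message objects, and real streams carry more metadata.

```python
from typing import Iterator


def toy_stream(steps: list[tuple[str, list[str]]], stream_mode: str) -> Iterator:
    """Imitate the output shape of each stream mode for (node, tokens) steps."""
    state = {"messages": []}
    for node, tokens in steps:
        if stream_mode == "messages":
            for token in tokens:
                # One (chunk, metadata) pair per generated token.
                yield token, {"langgraph_node": node}
        message = "".join(tokens)
        state = {"messages": state["messages"] + [message]}
        if stream_mode == "updates":
            yield {node: {"messages": [message]}}  # only what this step added
        elif stream_mode == "values":
            yield state  # the full accumulated state


steps = [("model", ["Apple ", "news ", "today"]), ("tools", ["search results"])]

for mode in ("messages", "updates", "values"):
    print(mode, list(toy_stream(steps, mode)))
```

The messages mode suits token-by-token UIs, while updates and values are usually easier to work with when logging or debugging individual steps.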
Structured Schema Responses
Enforce type-safe structured JSON outputs from your agent using Pydantic models:
from pydantic import BaseModel, Field
from typing import Optional
class FinancialAnalysis(BaseModel):
    company: str = Field(description="Company Name")
    stock_symbol: str = Field(description="Company Stock Symbol")
    current_price: Optional[str] = Field(description="Company's current stock price")
    analysis: str = Field(description="Company's brief analysis")
    recommendation: str = Field(description="Action recommendation: Buy/Hold/Sell")
agent_structured = create_agent(
    model=model,
    tools=[base_tools.web_search, base_tools.get_weather],
    system_prompt=system_prompt,
    response_format=FinancialAnalysis
)
Invoke the agent and dump the structured response as a dictionary:
response = agent_structured.invoke({'messages': [HumanMessage('Analyze the apple stock')]})
print(response['structured_response'].model_dump())
{'company': 'Apple Inc.', 'stock_symbol': 'AAPL', 'current_price': '$283.10', 'analysis': 'Apple Inc. exhibits a strong ecosystem of hardware, software, and services...', 'recommendation': 'Hold'}
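Because current_price is an optional string and recommendation is free text, downstream code may want to normalize both before using them. The helpers below are a hypothetical post-processing sketch, not part of the agent API; they assume prices look like '$283.10' and that only Buy, Hold, and Sell are acceptable recommendations.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional

ALLOWED_RECOMMENDATIONS = {"buy", "hold", "sell"}


def parse_price(value: Optional[str]) -> Optional[Decimal]:
    """Convert a price string such as '$283.10' to Decimal; None if unparseable."""
    if value is None:
        return None
    cleaned = re.sub(r"[^\d.]", "", value)  # drop currency symbols and commas
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def normalize_recommendation(value: str) -> Optional[str]:
    """Return 'Buy', 'Hold', or 'Sell', or None for anything else."""
    lowered = value.strip().lower()
    return lowered.capitalize() if lowered in ALLOWED_RECOMMENDATIONS else None


record = {"current_price": "$283.10", "recommendation": "Hold"}
print(parse_price(record["current_price"]))                # 283.10
print(normalize_recommendation(record["recommendation"]))  # Hold
```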