LangChain Agent Fundamentals

An AI agent combines a language model with tools to build a system that reasons, decides, and works toward a solution step by step. In this blog, we cover the fundamentals for the series: the building blocks we need before assembling real projects.

We move from a one-line agent to tools, short- and long-term memory, streaming, production middleware, guardrails, human-in-the-loop approval, and prompt engineering. All examples use create_agent from LangChain with Google's Gemini models.

Note

This lesson assumes your environment is set up. If you have not configured your Gemini and LangSmith keys yet, start with Getting Started with Gemini 3 & LangChain.

What Is an AI Agent?

An agent has three core components:

Model / LLM: the reasoning engine.
System prompt: instructions that guide behavior.
Message history: the conversation context.

An agent combines a model, a system prompt, tools, and message history

Create your first agent with a model and a system prompt:

PYTHON

import os
from dotenv import load_dotenv
load_dotenv()

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.agents import create_agent
from langchain.messages import HumanMessage

model = ChatGoogleGenerativeAI(model='gemini-2.5-flash')

system_prompt = "You are a helpful assistant that provides concise and accurate responses."

agent = create_agent(model=model, system_prompt=system_prompt)

Invoke the agent with a messages key. The agent returns the full message list; the last message holds the answer:

PYTHON

query = "Tell me 3 facts about the earth?"
response = agent.invoke({'messages': HumanMessage(query)})
print(response['messages'][-1].text)

OUTPUT

Here are 3 facts about Earth:
1. It is the only planet known to harbor life.
2. Approximately 71% of its surface is covered by water.
3. It is an oblate spheroid, bulging at the equator and flattened at the poles.

Model and System Prompt Configuration

The system prompt defines the agent's role. A detailed prompt produces sharper, more consistent answers. Here we configure a Gemini 3 model with low thinking depth and a financial-analyst persona:

PYTHON

system_prompt = """You are a financial analyst specializing in tech stocks.

Guidelines:
- Provide data-driven analysis
- Keep responses concise (2-3 paragraphs max)
- Present numbers with proper formatting ($XXX.XX)
- Avoid speculation without data
"""

model = ChatGoogleGenerativeAI(model='gemini-3-flash-preview',
                               thinking_level='low',
                               include_thoughts=True)

agent = create_agent(model=model, system_prompt=system_prompt)

response = agent.invoke({'messages': "What was Apple's earning in 2020?"})
print(response['messages'][-1].text)

OUTPUT

For the fiscal year ending September 26, 2020, Apple reported total revenue of
$274.52 billion, a 6% increase over $260.17 billion in 2019. Net income was
$57.41 billion, with diluted EPS of $3.28 and a gross margin near 38.2%. Growth
was driven by Services ($53.77B) and Wearables (+25% to $30.62B).

Role-Based Agents

The same model behaves very differently depending on its role. Compare a support agent with a technical expert on the identical question:

PYTHON

support_prompt = """You are a friendly customer support agent.
- Use simple language (avoid jargon)
- Ask clarifying questions when needed
- Maintain a warm, empathetic tone
"""

support_agent = create_agent(model=model, system_prompt=support_prompt)
response = support_agent.invoke({'messages': [HumanMessage('I can not login into my account')]})
print(response['messages'][-1].text)

OUTPUT

Hi there! I'm sorry you're having trouble getting into your account. To help, could
you tell me what you see on screen (for example "incorrect password")? Are you on our
website or the mobile app? Have you tried the "Forgot Password" link yet?

PYTHON

tech_prompt = """You are a technical expert.
- Provide detailed technical responses
- Use precise terminology
- Include code examples when relevant
"""

tech_agent = create_agent(model=model, system_prompt=tech_prompt)
response = tech_agent.invoke({'messages': [HumanMessage('I can not login into my account')]})
print(response['messages'][-1].text)

PYTHON

First identify the error state: 401 Unauthorized (bad credentials/expired token),
403 Forbidden (account locked), 429 Too Many Requests (rate limiting), or 500/503
(auth service down). For web logins, clear stale cookies/LocalStorage or use an
incognito window, and verify NTP clock sync for TOTP/JWT validation...

Giving Agents Tools

Tools let agents take actions. This series ships two reusable tools in scripts/base_tools.py, web_search (live web search) and get_weather (current weather). The agent picks a tool based on its docstring, so clear descriptions matter.

PYTHON

import warnings
warnings.filterwarnings('ignore')
import sys
sys.path.append('../')

from scripts import base_tools

model = ChatGoogleGenerativeAI(model='gemini-2.5-flash')

agent = create_agent(model=model,
                     tools=[base_tools.web_search, base_tools.get_weather],
                     system_prompt='You are a helpful AI assistant.')

response = agent.invoke({'messages': [HumanMessage('what is the current weather in Mumbai?')]})
print(response['messages'][-1].text)

OUTPUT

The current weather in Mumbai is around 30°C with overcast/misty skies, light
north-westerly winds, and moderate humidity.

Tip

Invoke a tool directly with a dict argument to test it, base_tools.web_search.invoke({'query': 'kgp talkie'}). Do not call the tool object like a function (base_tools.web_search({'query': ...})); that is the wrong calling convention.

Sequential vs Parallel Tool Calls

The model decides whether tools run one after another or together based on how the request is phrased. "...then..." implies a sequence; "...also..." implies parallel calls.

PYTHON

# Sequential, "then"
agent.invoke({'messages': [HumanMessage('Tell me news about the Apple stock then tell me weather in Mumbai')]})

# Parallel, "also"
response = agent.invoke({'messages': [HumanMessage('Tell me news about the Apple stock also tell me weather in Mumbai')]})
response['messages'][1].tool_calls

OUTPUT

[{'name': 'web_search', 'args': {'query': 'Apple stock news'}, 'id': '...', 'type': 'tool_call'}, {'name': 'get_weather', 'args': {'location': 'Mumbai'}, 'id': '...', 'type': 'tool_call'}]

Tool Error Handling

Wrap tool calls with @wrap_tool_call middleware to catch exceptions and return a graceful message instead of crashing the agent:

PYTHON

from langchain.tools import tool
from langchain.agents.middleware import wrap_tool_call
from langchain.messages import ToolMessage

@tool
def divide(a: float, b: float):
    """Divide the two numbers"""
    return a / b

@wrap_tool_call
def handle_tool_errors(request, handler):
    try:
        return handler(request)
    except Exception as e:
        return ToolMessage(
            content=f"Error: {str(e)}. Try different Input.",
            tool_call_id=request.tool_call['id']
        )

agent = create_agent(model=model,
                     tools=[base_tools.web_search, base_tools.get_weather, divide],
                     system_prompt='You are a helpful AI assistant.',
                     middleware=[handle_tool_errors])

agent.invoke({'messages': [HumanMessage('what is the current weather in Mumbai and what is 1/0?')]})

Here, we can see the 1/0 division raise an exception that the middleware converts into a recoverable ToolMessage, so the agent still answers the weather part.

Accessing Agent State from a Tool

A tool can read the running agent state and an immutable user context using ToolRuntime:

PYTHON

from langchain.tools import ToolRuntime
from dataclasses import dataclass

@tool
def get_message_count(runtime: ToolRuntime):
    """Get the total number of messages exchanged in the conversation."""
    messages = runtime.state['messages']
    context = runtime.context
    return f"User '{context.user_id}' with Session '{context.session_id}' has '{len(messages)}' messages."

@dataclass
class UserContext:
    user_id: str
    session_id: str

agent = create_agent(model=model,
                     tools=[base_tools.get_weather, get_message_count],
                     system_prompt='You are a helpful AI assistant.',
                     context_schema=UserContext)

user_context = UserContext(user_id='kgptalkie', session_id='session_1')
agent.invoke({'messages': [HumanMessage('weather in Mumbai then how many messages are in this conversation')]},
             context=user_context)

Short-Term Memory

Without a checkpointer, an agent forgets everything between calls:

PYTHON

agent = create_agent(model=model, system_prompt=system_prompt)

agent.invoke({'messages': [HumanMessage("My name is Laxmi Kant")]})
response = agent.invoke({'messages': [HumanMessage("What's my name?")]})
print(response['messages'][-1].content)

OUTPUT

I don't know your name. You haven't told it to me yet!

A checkpointer persists conversation history per thread. Use SQLite for development and PostgreSQL for production.

Type	Use Case	Setup
SQLite	Development, testing	Simple file-based
PostgreSQL	Production, multi-user	Database connection

SQLite Checkpointer

PYTHON

from langgraph.checkpoint.sqlite import SqliteSaver
import sqlite3

os.makedirs('db', exist_ok=True)
conn = sqlite3.connect("db/31_checkpoints.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)
checkpointer.setup()

config = {"configurable": {"thread_id": "user_123"}}

agent = create_agent(model=model, system_prompt=system_prompt, checkpointer=checkpointer)

agent.invoke({'messages': [HumanMessage("My name is Laxmi Kant")]}, config=config)
response = agent.invoke({'messages': [HumanMessage("What's my name?")]}, config=config)
print(response['messages'][-1].content)

OUTPUT

Your name is Laxmi Kant.

The thread_id isolates sessions. A different thread_id starts a fresh conversation, and we can inspect any thread's saved state with agent.get_state(config=config).

Note

On Linux/macOS: the same code runs unchanged. Only the database file path differs by convention, use forward slashes like db/31_checkpoints.db.

PostgreSQL Checkpointer

For production, swap SqliteSaver for PostgresSaver. Set a POSTGRESQL_URL in your .env:

PYTHON

from langgraph.checkpoint.postgres import PostgresSaver
import psycopg

pg_conn = psycopg.connect(os.getenv("POSTGRESQL_URL"), autocommit=True)
checkpointer = PostgresSaver(pg_conn)
checkpointer.setup()

config = {"configurable": {"thread_id": "user_123"}}
agent = create_agent(model=model, system_prompt=system_prompt, checkpointer=checkpointer)

Context Offloading: Read and Modify State from Tools

For very long conversations we can offload context to disk. A tool reads the running state and writes a summary, scoped per user and thread:

PYTHON

from langchain.tools import tool, ToolRuntime
from pathlib import Path

@tool
def save_conversation_summary(summary: str, runtime: ToolRuntime):
    """Save conversation summary to disk for context offloading."""
    user_id = runtime.context.user_id
    thread_id = runtime.context.thread_id

    summary_dir = Path(f"data/{user_id}/{thread_id}")
    summary_dir.mkdir(parents=True, exist_ok=True)
    summary_path = summary_dir / "summary.md"
    summary_path.write_text(summary)
    return f"Summary saved to {summary_path}"

A second tool loads a saved summary back into state. It returns a Command that clears existing messages and injects the summary as fresh context:

PYTHON

from langchain.messages import RemoveMessage, ToolMessage
from langgraph.graph.message import REMOVE_ALL_MESSAGES
from langgraph.types import Command

@tool
def load_conversation_summary(runtime: ToolRuntime):
    """Load previous conversation summary from disk."""
    user_id = runtime.context.user_id
    thread_id = runtime.config['configurable']['thread_id']
    summary_path = Path(f"data/{user_id}/{thread_id}/summary.md")

    if not summary_path.exists():
        return Command(update={'messages': [
            ToolMessage("No previous summary found.", tool_call_id=runtime.tool_call_id)]})

    summary_text = summary_path.read_text()
    messages = runtime.state.get('messages', [])
    last_ai_message = messages[-1] if messages else None

    new_messages = [
        RemoveMessage(id=REMOVE_ALL_MESSAGES),
        HumanMessage(f"Previous conversation summary:\n{summary_text}"),
    ]
    if last_ai_message:
        new_messages.append(last_ai_message)
    new_messages.append(ToolMessage("Successfully loaded previous summary.", tool_call_id=runtime.tool_call_id))

    return Command(update={'messages': new_messages})

This pattern keeps the active context small while preserving the meaning of earlier turns.

Long-Term Memory

Short-term memory lives in a checkpointer and lasts a session. Long-term memory lives in a store and persists across sessions and threads, ideal for user preferences and facts. With embeddings, the store also supports semantic search.

Checkpointers hold session memory; stores persist facts across sessions

Type	Storage	Use Case	Persistence
Short-term	Checkpointer	Conversation history	Session
Long-term	Store	User preferences, facts	Cross-session

Configure a PostgresStore with a Gemini embedding function:

PYTHON

from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langgraph.store.postgres import PostgresStore
import psycopg

embeddings = GoogleGenerativeAIEmbeddings(model='gemini-embedding-001')

def embed(texts: list[str]):
    return embeddings.embed_documents(texts, output_dimensionality=768)

pg_conn = psycopg.connect(os.getenv('POSTGRESQL_URL'), autocommit=True)
store = PostgresStore(pg_conn, index={'embed': embed, 'dims': 768})
store.setup()

Define memory tools that read and write the store through runtime.store, organized into hierarchical namespaces:

PYTHON

from langchain.tools import tool, ToolRuntime
from dataclasses import dataclass

@dataclass
class UserContext:
    user_id: str

@tool
def save_user_memory(category: str, information: dict, runtime: ToolRuntime):
    """Save user preference or information to long-term memory.

    Examples:
        category='food', information={'diet': 'vegetarian', 'likes': ['pasta']}
        category='work', information={'role': 'Data Scientist', 'interests': ['AI', 'ML']}
    """
    store = runtime.store
    user_id = runtime.context.user_id
    namespace = (user_id, "preferences")
    store.put(namespace=namespace, key=category, value=information)
    return f"Saved {category} preferences for {user_id}"

@tool
def get_user_memory(category: str, runtime: ToolRuntime):
    """Retrieve user preference or information from long-term memory."""
    store = runtime.store
    user_id = runtime.context.user_id
    namespace = (user_id, 'preferences')
    item = store.get(namespace=namespace, key=category)
    if item:
        return f"{category}: {item.value}"
    return f"No '{category}' information found"

Connect both the checkpointer (short-term) and the store (long-term) into the agent:

PYTHON

from langgraph.checkpoint.postgres import PostgresSaver

checkpointer = PostgresSaver(psycopg.connect(os.getenv("POSTGRESQL_URL"), autocommit=True))
store = PostgresStore(psycopg.connect(os.getenv('POSTGRESQL_URL'), autocommit=True),
                      index={'embed': embed, 'dims': 768})

agent = create_agent(
    model=model,
    tools=[base_tools.web_search, save_user_memory, get_user_memory],
    checkpointer=checkpointer,
    store=store,
    context_schema=UserContext,
    system_prompt="You are a helpful assistant with long-term memory."
)

In one session the agent saves facts. In a brand-new session (a different thread) it can still recall them, because the store is independent of the thread. We can also query the store directly with semantic search, which matches by meaning, not keywords:

PYTHON

namespace = ('kgptalkie', 'preferences')
memories = store.search(namespace, query="What does the user like to eat?", limit=2)
for m in memories:
    print(f"{m.key}: {m.value}")

OUTPUT

food: {'diet': 'vegetarian', 'likes': ['pasta']}
work: {'role': 'Data Scientist', 'interests': ['AI', 'ML']}

Streaming and Structured Output

Streaming keeps interfaces responsive. LangGraph agents support three stream modes:

Mode	Use Case	Returns
messages	Real-time token display	Message chunks as generated
updates	Debugging agent flow	Node name + output after each node
values	Track full state	Complete state snapshot after each step

PYTHON

from langgraph.checkpoint.sqlite import SqliteSaver
import sqlite3

conn = sqlite3.connect("db/5_streaming_agent.db", check_same_thread=False)
agent = create_agent(model=model,
                     tools=[base_tools.web_search, base_tools.get_weather],
                     checkpointer=SqliteSaver(conn))

config = {'configurable': {'thread_id': '05_session_1'}}

# updates mode, see each node fire
for chunk in agent.stream(
        {'messages': [HumanMessage('what is the weather in Mumbai?')]},
        config=config, stream_mode='updates'):
    print(chunk)

The series also includes a helper, scripts/agent_utils.stream_agent_response, that prints tool calls, tool responses, and the final text cleanly:

PYTHON

from scripts import agent_utils
agent_utils.stream_agent_response(agent, 'what is the weather in mumbai', '5_session_3')

OUTPUT

  Tool Called: get_weather
   Args: {'location': 'Mumbai'}

  Tool Response: {"location": {"name": "Mumbai", ...
  Tool Result (length: 858 chars)

The weather in Mumbai is 30.1°C and overcast. Wind 7.9 kph WNW, humidity 55%, UV index 7.1.

Structured Output with Pydantic

Pass a Pydantic model as response_format to get type-safe, validated output:

PYTHON

from pydantic import BaseModel, Field
from typing import Optional

class FinancialAnalysis(BaseModel):
    company: str = Field(description="Company name")
    stock_symbol: str = Field(description="Stock ticker")
    current_price: Optional[str] = Field(description="Current price", default=None)
    analysis: str = Field(description="Brief analysis")
    recommendation: str = Field(description="Buy/Hold/Sell")

agent = create_agent(model=model, tools=[base_tools.web_search], response_format=FinancialAnalysis)

response = agent.invoke({'messages': [HumanMessage('tell me latest news about Apple stock')]})
response['structured_response'].model_dump()

OUTPUT

{'company': 'Apple Inc.', 'stock_symbol': 'AAPL', 'current_price': 'Around $212 - $260', 'analysis': 'Apple competes with Google on AI; mixed near-term trading...', 'recommendation': 'Hold'}

Production Middleware

Middleware adds production features without changing our agent logic. We attach middleware via the middleware=[...] argument.

Trim and Delete Messages

@before_model runs before each model call, perfect for trimming the context window. Here we keep only the first and last message:

PYTHON

from langchain.messages import RemoveMessage
from langgraph.graph.message import REMOVE_ALL_MESSAGES
from langchain.agents.middleware import before_model
from langgraph.runtime import Runtime
from langchain.agents import AgentState

@before_model
def trim_messages(state: AgentState, runtime: Runtime):
    """Keep only the first and last message to fit the context window."""
    messages = state['messages']
    if len(messages) <= 3:
        return None
    return {'messages': [RemoveMessage(id=REMOVE_ALL_MESSAGES), messages[0], messages[-1]]}

agent = create_agent(model=model,
                     tools=[base_tools.web_search, base_tools.get_weather],
                     checkpointer=InMemorySaver(),
                     middleware=[trim_messages])

@after_model runs after the model responds, useful for deleting old messages with RemoveMessage(id=m.id).

Summarization Middleware

Instead of dropping old messages, compress them. SummarizationMiddleware triggers a summary once the conversation crosses a length threshold while keeping the most recent messages intact:

PYTHON

from langchain.agents.middleware import SummarizationMiddleware

agent = create_agent(
    model=model,
    tools=[base_tools.get_weather, base_tools.web_search],
    checkpointer=InMemorySaver(),
    middleware=[SummarizationMiddleware(
        model=ChatGoogleGenerativeAI(model='gemini-3-pro-preview'),
        trigger=[('messages', 15)],
        keep=("messages", 5)
    )]
)

Todo List Middleware

For complex multi-step tasks, TodoListMiddleware() gives the agent a planning and tracking tool. The agent breaks work into todos and marks each one completed, in-progress, or pending as it goes, visible in agent.get_state(config).

Dynamic Model Selection

Route simple turns to a cheap model and complex turns to a stronger one with @wrap_model_call:

PYTHON

from langchain.agents.middleware import wrap_model_call, ModelRequest

basic_model = ChatGoogleGenerativeAI(model='gemini-2.5-flash')
advanced_model = ChatGoogleGenerativeAI(model='gemini-3-pro-preview')

@wrap_model_call
def dynamic_model_selection(request: ModelRequest, handler):
    """Choose model based on conversation complexity."""
    count = len(request.state['messages'])
    model = advanced_model if count > 5 else basic_model
    return handler(request.override(model=model))

Call Limits and Model Fallback

Cap runaway costs with ModelCallLimitMiddleware and ToolCallLimitMiddleware, and improve reliability with ModelFallbackMiddleware:

PYTHON

from langchain.agents.middleware import (
    ModelCallLimitMiddleware, ToolCallLimitMiddleware,
    TodoListMiddleware, ModelFallbackMiddleware
)

agent = create_agent(
    model=basic_model,
    tools=[base_tools.web_search, base_tools.get_weather],
    checkpointer=InMemorySaver(),
    middleware=[
        dynamic_model_selection,
        TodoListMiddleware(),
        ModelCallLimitMiddleware(run_limit=5, exit_behavior='end'),
        ToolCallLimitMiddleware(run_limit=5, exit_behavior='continue'),
        ModelFallbackMiddleware(first_model=ChatGoogleGenerativeAI(model='gemini-3-flash-preview')),
    ]
)

Dynamic System Prompt

Adapt the system prompt at runtime based on user context with @dynamic_prompt:

PYTHON

from langchain.agents.middleware import dynamic_prompt, ModelRequest
from dataclasses import dataclass

@dataclass
class UserContext:
    user_role: str

@dynamic_prompt
def user_role_prompt(request: ModelRequest):
    """Generate a system prompt based on user role."""
    user_role = request.runtime.context.user_role
    base = "You are a helpful assistant."
    if user_role == 'expert':
        return f"{base} Provide detailed technical responses."
    elif user_role == "beginner":
        return f"{base} Explain concepts simply and avoid jargon."
    return base

agent = create_agent(model=model,
                     tools=[base_tools.web_search, base_tools.get_weather],
                     checkpointer=InMemorySaver(),
                     middleware=[user_role_prompt],
                     context_schema=UserContext)

response = agent.invoke({"messages": [HumanMessage("Explain machine learning")]},
                        context=UserContext(user_role='beginner'))
print(response['messages'][-1].text)

OUTPUT

Machine learning is like teaching a computer to learn from examples. Instead of
giving it rules for every task, you show it lots of data and it finds patterns.
To recognize cats, you show it thousands of pictures rather than describing ears
and whiskers, the more it sees, the better it gets.

Guardrails and Human-in-the-Loop

Production agents need protection against leaking PII, processing secrets, and running dangerous actions.

Middleware redacts PII and pauses sensitive tools for human approval

PII Protection Strategies

PIIMiddleware detects and handles personally identifiable information with several strategies:

Strategy	Original	After Protection	Description
redact	`https://kgptalkie.com`	`[REDACTED_URL]`	Removes completely
mask	`5105-1051-0510-5100`	`**--**-5100`	Shows last few characters
hash	`udemy@kgptalkie.com`	`<email_hash:8ea1aedb>`	Deterministic hash
block	`sk-...32 chars`	Execution blocked	Throws error, stops processing

PYTHON

from langchain.agents.middleware import PIIMiddleware

agent = create_agent(
    model=model,
    system_prompt="You are a helpful customer service assistant.",
    middleware=[
        PIIMiddleware("email", strategy="hash", apply_to_input=True),
        PIIMiddleware("credit_card", strategy="mask", apply_to_input=True),
        PIIMiddleware("url", strategy="redact", apply_to_input=True),
        PIIMiddleware("api_key", detector=r"sk-[a-zA-Z0-9]{32}", strategy='mask', apply_to_input=True),
        PIIMiddleware("phone", detector=r"\d{3}-\d{3}-\d{4}", strategy="redact", apply_to_input=True),
    ]
)

response = agent.invoke({'messages': [HumanMessage("""
        My email is udemy@kgptalkie.com
        My phone is 555-123-4567
        My card is 5105-1051-0510-5100
        My website is https://kgptalkie.com
    """)]})
print(response['messages'][0].content)

OUTPUT

My email is <email_hash:8ea1aedb>
My phone is [REDACTED_PHONE]
My card is ****-****-****-5100
My website is [REDACTED_URL]

We can also define custom PII patterns with our own regex detectors, for example employee IDs (EMP-\d{6}) or order IDs (ORD-[A-Z0-9]{6}).

Human-in-the-Loop (HITL)

HumanInTheLoopMiddleware pauses the agent before sensitive tool calls and waits for a human decision.

Decision	Effect	Use Case
approve	Execute as-is	Safe operations
edit	Modify, then execute	Adjust parameters
reject	Block with feedback	Dangerous operations

PYTHON

from langchain.agents.middleware import HumanInTheLoopMiddleware
from langchain.tools import tool
from langgraph.types import Command

@tool
def write_file(path: str, content: str):
    """Write content to file."""
    with open(path, 'w') as f:
        f.write(content)
    return f"Successfully wrote to {path}"

@tool
def execute_sql(query: str):
    """Execute SQL query. Use this tool for any database related question."""
    return f"Would execute: {query}"

agent = create_agent(
    model=model,
    tools=[write_file, execute_sql],
    checkpointer=InMemorySaver(),
    middleware=[HumanInTheLoopMiddleware(
        interrupt_on={
            "write_file": True,                                   # approve, edit, or reject
            "execute_sql": {"allowed_decisions": ["approve", "reject"]},  # no editing
        },
        description_prefix="Tool execution pending approval"
    )]
)

When a guarded tool is about to run, the result contains an __interrupt__. Resume with a decision:

PYTHON

config = {"configurable": {"thread_id": "hitl_approve_1"}}
result = agent.invoke({"messages": [HumanMessage("Write 'Hello World' to data/test.txt")]}, config=config)

if "__interrupt__" in result:
    print("Interrupt:", result['__interrupt__'][0].value['action_requests'][0])
    result = agent.invoke(Command(resume={"decisions": [{"type": "approve"}]}), config=config)

OUTPUT

Interrupt: {'name': 'write_file', 'args': {'path': 'data/test.txt', 'content': 'Hello World'}, ...}

To edit before running, resume with {"type": "edit", "edited_action": {...}}, for example redirecting the write to data/earth_essay.txt. To reject a dangerous action like "delete all records", resume with {"type": "reject", "message": "Too dangerous. Use a WHERE clause."} and the agent revises its plan.

Caution

Always guard destructive tools (file writes, SQL execution, payments) with HITL. The reject path lets a human stop an irreversible action and hand corrective feedback back to the agent.

Prompt Engineering Patterns

The system prompt is the single most important lever on agent behavior. These nine patterns cover the techniques used throughout the series.

1. Basic vs detailed prompts. A vague prompt ("You are a helpful assistant") often refuses or hedges. A detailed prompt with role, guidelines, and formatting rules produces concrete, cited answers, the same query yields a far stronger result.

2. Role-based prompts. Define responsibilities and communication style (e.g., a patient support agent that always ends with "Is there anything else I can help with?").

3. Constraint-based prompts. Spell out what the agent cannot do. A medical-information assistant lists allowed actions, forbidden actions, and a mandatory disclaimer:

PYTHON

constrained_prompt = """You are a medical information assistant.

What you CAN do:
- Provide general health information and explain terminology

What you CANNOT do:
- Diagnose conditions, prescribe medication, or replace professional advice

ALWAYS include disclaimer: "This is not medical advice. Consult a healthcare professional."
"""

4. Few-shot prompting. Show examples of the exact output format you want, and the agent mirrors them:

PYTHON

few_shot_prompt = """You are a product description writer for an e-commerce site.

Format like these examples:

Example 1:
Product: Wireless Mouse
Description: Glide through your workday with our ergonomic wireless mouse.
- 18-month battery life | 6 programmable buttons | Up to 30ft range
Perfect for: Professionals, gamers, everyday users

Use this format for all product descriptions.
"""

5. Context-aware prompts. Inject dynamic context (current date, user tier) into the prompt with an f-string so the agent tailors responses and reasons about "today."

6. Tool-usage guidance. Tell the agent explicitly when to use a tool and when not to, always search for current prices and news, never search for math or pre-2025 historical facts.

7. Output-format control. Provide an exact template (headings, bullets, fields) and instruct the model to never deviate, producing consistent, parseable output:

PYTHON

format_control_prompt = """You are a stock analysis agent.

ALWAYS format your analysis exactly like this:

## [COMPANY NAME] Analysis
**Current Price:** $XXX.XX
**Change:** ±X.XX%
### Key Metrics
• Market Cap: $XXX billion
### Recommendation
BUY | HOLD | SELL

NEVER deviate from this format.
"""

8. Chain-of-thought prompting. Ask the agent to reason step by step, Understand → Break down → Gather data → Analyze → Conclude, for transparent, auditable answers.

9. Reusable templates. Keep a library of parameterized prompt templates (customer support, data analyst, content creator) so teams stay consistent.

Tip

Best practices: be specific, show examples, set boundaries, structure information, provide context, and test iteratively. Avoid being vague, over-complicating, or contradicting yourself within a single prompt.

With these fundamentals, tools, memory, streaming, middleware, guardrails, and prompts, we have everything we need to build real agents. This is the fundamentals reference for the whole series. Next, we put them to work by connecting an agent to external services through the Model Context Protocol (MCP) in Build a Hotel Search AI Agent with MCP.

LangChain Agent Fundamentals

What Is an AI Agent?

Model and System Prompt Configuration

Role-Based Agents

Giving Agents Tools

Sequential vs Parallel Tool Calls

Tool Error Handling

Accessing Agent State from a Tool

Short-Term Memory

SQLite Checkpointer

PostgreSQL Checkpointer

Context Offloading: Read and Modify State from Tools

Long-Term Memory

Streaming and Structured Output

Structured Output with Pydantic

Production Middleware

Trim and Delete Messages

Summarization Middleware

Todo List Middleware

Dynamic Model Selection

Call Limits and Model Fallback

Dynamic System Prompt

Guardrails and Human-in-the-Loop

PII Protection Strategies

Human-in-the-Loop (HITL)

Prompt Engineering Patterns

Found this useful? Keep building with me.

Latest recommendations you might like

Real-World Agent Project: MySQL & Streaming

Deploy AI Agents with FastAPI

Build a Daily Briefing AI Agent

Build a Google Sheets Analysis Agent with MCP

Find this tutorial useful?

Discussion & Comments