LangChain Agent Fundamentals

Master LangChain agents end to end: tools, short- and long-term memory, streaming, middleware, guardrails, human-in-the-loop, and prompt engineering.

Jun 19, 2026 · 30 min read

Topics You Will Master

Building agents with create_agent, tools, and role-based prompts
Persisting short-term and long-term memory with semantic search
Streaming agent output and adding production middleware
Applying PII guardrails, human-in-the-loop, and prompt engineering

An AI agent combines a language model with tools to create a system that reasons, decides, and works toward a solution iteratively. This lesson is the complete fundamentals reference for the series — every building block you need before assembling real projects.

We move from a one-line agent to tools, short- and long-term memory, streaming, production middleware, guardrails, human-in-the-loop approval, and prompt engineering. All examples use create_agent from LangChain with Google's Gemini models.

Note

This lesson assumes your environment is set up. If you have not configured your Gemini and LangSmith keys yet, start with Getting Started with Gemini 3 & LangChain.

What Is an AI Agent?

An agent has three core components:

  1. Model / LLM — the reasoning engine.
  2. System prompt — instructions that guide behavior.
  3. Message history — the conversation context.

Create your first agent with a model and a system prompt:

PYTHON
import os
from dotenv import load_dotenv
load_dotenv()

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.agents import create_agent
from langchain.messages import HumanMessage

model = ChatGoogleGenerativeAI(model='gemini-2.5-flash')

system_prompt = "You are a helpful assistant that provides concise and accurate responses."

agent = create_agent(model=model, system_prompt=system_prompt)

Invoke the agent with a messages key. The agent returns the full message list; the last message holds the answer:

PYTHON
query = "Tell me 3 facts about the earth?"
response = agent.invoke({'messages': [HumanMessage(query)]})  # messages is a list, as in later examples
print(response['messages'][-1].text)
OUTPUT
Here are 3 facts about Earth:
1. It is the only planet known to harbor life.
2. Approximately 71% of its surface is covered by water.
3. It is an oblate spheroid, bulging at the equator and flattened at the poles.

Model and System Prompt Configuration

The system prompt defines the agent's role. A detailed prompt produces sharper, more consistent answers. Here we configure a Gemini 3 model with low thinking depth and a financial-analyst persona:

PYTHON
system_prompt = """You are a financial analyst specializing in tech stocks.

Guidelines:
- Provide data-driven analysis
- Keep responses concise (2-3 paragraphs max)
- Present numbers with proper formatting ($XXX.XX)
- Avoid speculation without data
"""

model = ChatGoogleGenerativeAI(model='gemini-3-flash-preview',
                               thinking_level='low',
                               include_thoughts=True)

agent = create_agent(model=model, system_prompt=system_prompt)

response = agent.invoke({'messages': "What was Apple's earning in 2020?"})
print(response['messages'][-1].text)
OUTPUT
For the fiscal year ending September 26, 2020, Apple reported total revenue of
$274.52 billion, a 6% increase over $260.17 billion in 2019. Net income was
$57.41 billion, with diluted EPS of $3.28 and a gross margin near 38.2%. Growth
was driven by Services ($53.77B) and Wearables (+25% to $30.62B).

Role-Based Agents

The same model behaves very differently depending on its role. Compare a support agent with a technical expert on the identical question:

PYTHON
support_prompt = """You are a friendly customer support agent.
- Use simple language (avoid jargon)
- Ask clarifying questions when needed
- Maintain a warm, empathetic tone
"""

support_agent = create_agent(model=model, system_prompt=support_prompt)
response = support_agent.invoke({'messages': [HumanMessage('I can not login into my account')]})
print(response['messages'][-1].text)
OUTPUT
Hi there! I'm sorry you're having trouble getting into your account. To help, could
you tell me what you see on screen (for example "incorrect password")? Are you on our
website or the mobile app? Have you tried the "Forgot Password" link yet?
PYTHON
tech_prompt = """You are a technical expert.
- Provide detailed technical responses
- Use precise terminology
- Include code examples when relevant
"""

tech_agent = create_agent(model=model, system_prompt=tech_prompt)
response = tech_agent.invoke({'messages': [HumanMessage('I can not login into my account')]})
print(response['messages'][-1].text)
OUTPUT
First identify the error state: 401 Unauthorized (bad credentials/expired token),
403 Forbidden (account locked), 429 Too Many Requests (rate limiting), or 500/503
(auth service down). For web logins, clear stale cookies/LocalStorage or use an
incognito window, and verify NTP clock sync for TOTP/JWT validation...

Giving Agents Tools

Tools let agents take actions. This series ships two reusable tools in scripts/base_tools.py: web_search (live web search) and get_weather (current weather). The agent picks a tool based on its docstring, so clear descriptions matter.

PYTHON
import warnings
warnings.filterwarnings('ignore')
import sys
sys.path.append('../')

from scripts import base_tools

model = ChatGoogleGenerativeAI(model='gemini-2.5-flash')

agent = create_agent(model=model,
                     tools=[base_tools.web_search, base_tools.get_weather],
                     system_prompt='You are a helpful AI assistant.')

response = agent.invoke({'messages': [HumanMessage('what is the current weather in Mumbai?')]})
print(response['messages'][-1].text)
OUTPUT
The current weather in Mumbai is around 30°C with overcast/misty skies, light
north-westerly winds, and moderate humidity.

Tip

Invoke a tool directly with a dict argument to test it — base_tools.web_search.invoke({'query': 'kgp talkie'}). Do not call the tool object like a function (base_tools.web_search({'query': ...})); that is the wrong calling convention.

Sequential vs Parallel Tool Calls

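The model decides whether tools run one after another or together, and the phrasing of the request influences that choice. "...then..." usually produces a sequence, with one tool call per model turn; "...also..." usually produces parallel calls, with several tool calls attached to a single model message. This is model behavior rather than a guarantee, so inspect tool_calls to confirm what actually happened.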

PYTHON
# Sequential — "then"
agent.invoke({'messages': [HumanMessage('Tell me news about the Apple stock then tell me weather in Mumbai')]})

# Parallel — "also"
response = agent.invoke({'messages': [HumanMessage('Tell me news about the Apple stock also tell me weather in Mumbai')]})
response['messages'][1].tool_calls
OUTPUT
[{'name': 'web_search', 'args': {'query': 'Apple stock news'}, 'id': '...', 'type': 'tool_call'}, {'name': 'get_weather', 'args': {'location': 'Mumbai'}, 'id': '...', 'type': 'tool_call'}]
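
To check which tools the model requested in a turn, it can help to flatten the tool_calls list into name/argument pairs. The sketch below works on plain dicts shaped like the output above; summarize_tool_calls is a hypothetical helper written for this illustration, not part of LangChain, and the ids are placeholders.

```python
def summarize_tool_calls(tool_calls):
    """Return (tool name, args) pairs from a list of tool-call dicts."""
    return [(call["name"], call.get("args", {})) for call in tool_calls]


# Sample data mirroring the shape of the output above (ids are placeholders).
sample_calls = [
    {"name": "web_search", "args": {"query": "Apple stock news"}, "id": "a1", "type": "tool_call"},
    {"name": "get_weather", "args": {"location": "Mumbai"}, "id": "b2", "type": "tool_call"},
]

summary = summarize_tool_calls(sample_calls)
for name, args in summary:
    print(f"{name}: {args}")
```

Two entries on the same model message suggest the tools were requested together; a single entry per message across several turns suggests a sequence.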

Tool Error Handling

Wrap tool calls with @wrap_tool_call middleware to catch exceptions and return a graceful message instead of crashing the agent:

PYTHON
from langchain.tools import tool
from langchain.agents.middleware import wrap_tool_call
from langchain.messages import ToolMessage

@tool
def divide(a: float, b: float):
    """Divide the two numbers"""
    return a / b

@wrap_tool_call
def handle_tool_errors(request, handler):
    try:
        return handler(request)
    except Exception as e:
        return ToolMessage(
            content=f"Error: {str(e)}. Try different Input.",
            tool_call_id=request.tool_call['id']
        )

agent = create_agent(model=model,
                     tools=[base_tools.web_search, base_tools.get_weather, divide],
                     system_prompt='You are a helpful AI assistant.',
                     middleware=[handle_tool_errors])

agent.invoke({'messages': [HumanMessage('what is the current weather in Mumbai and what is 1/0?')]})

The 1/0 division raises an exception that the middleware converts into a recoverable ToolMessage, so the agent still answers the weather part.
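
The idea behind the middleware can be seen without any agent at all: wrap a callable, catch the exception, and hand the failure back as data. The following is a minimal pure-Python sketch of that pattern; with_error_message is a hypothetical decorator for illustration and is not how wrap_tool_call is implemented.

```python
def with_error_message(handler):
    """Wrap a callable so exceptions become a readable error string."""
    def wrapped(*args, **kwargs):
        try:
            return handler(*args, **kwargs)
        except Exception as exc:
            # Return the failure as data so the caller can recover and retry.
            return f"Error: {exc}. Try different input."
    return wrapped


@with_error_message
def divide(a: float, b: float) -> float:
    return a / b


ok_result = divide(6, 3)
failed_result = divide(1, 0)
print(ok_result)
print(failed_result)
```

The successful call passes through untouched, while the failing call yields a message the model could read and act on.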

Accessing Agent State from a Tool

A tool can read the running agent state and an immutable user context using ToolRuntime:

PYTHON
from langchain.tools import ToolRuntime
from dataclasses import dataclass

@tool
def get_message_count(runtime: ToolRuntime):
    """Get the total number of messages exchanged in the conversation."""
    messages = runtime.state['messages']
    context = runtime.context
    return f"User '{context.user_id}' with Session '{context.session_id}' has '{len(messages)}' messages."

@dataclass
class UserContext:
    user_id: str
    session_id: str

agent = create_agent(model=model,
                     tools=[base_tools.get_weather, get_message_count],
                     system_prompt='You are a helpful AI assistant.',
                     context_schema=UserContext)

user_context = UserContext(user_id='kgptalkie', session_id='session_1')
agent.invoke({'messages': [HumanMessage('weather in Mumbai then how many messages are in this conversation')]},
             context=user_context)

Short-Term Memory

Without a checkpointer, an agent forgets everything between calls:

PYTHON
agent = create_agent(model=model, system_prompt=system_prompt)

agent.invoke({'messages': [HumanMessage("My name is Laxmi Kant")]})
response = agent.invoke({'messages': [HumanMessage("What's my name?")]})
print(response['messages'][-1].content)
OUTPUT
I don't know your name. You haven't told it to me yet!

A checkpointer persists conversation history per thread. Use SQLite for development and PostgreSQL for production.

| Type | Use Case | Setup |
|---|---|---|
| SQLite | Development, testing | Simple file-based |
| PostgreSQL | Production, multi-user | Database connection |

SQLite Checkpointer

PYTHON
from langgraph.checkpoint.sqlite import SqliteSaver
import sqlite3

os.makedirs('db', exist_ok=True)
conn = sqlite3.connect("db/31_checkpoints.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)
checkpointer.setup()

config = {"configurable": {"thread_id": "user_123"}}

agent = create_agent(model=model, system_prompt=system_prompt, checkpointer=checkpointer)

agent.invoke({'messages': [HumanMessage("My name is Laxmi Kant")]}, config=config)
response = agent.invoke({'messages': [HumanMessage("What's my name?")]}, config=config)
print(response['messages'][-1].content)
OUTPUT
Your name is Laxmi Kant.

The thread_id isolates sessions. A different thread_id starts a fresh conversation, and you can inspect any thread's saved state with agent.get_state(config=config).
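
Conceptually, thread isolation amounts to keying stored history by thread_id. The sketch below shows that idea with the standard-library sqlite3 module and an in-memory database. The history table and both helper functions are assumptions made for illustration; they do not reflect the schema SqliteSaver actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema for this sketch only.
conn.execute("CREATE TABLE history (thread_id TEXT, role TEXT, content TEXT)")


def append_message(thread_id, role, content):
    """Store one message under the given thread."""
    conn.execute("INSERT INTO history VALUES (?, ?, ?)", (thread_id, role, content))
    conn.commit()


def load_history(thread_id):
    """Return (role, content) rows for one thread, oldest first."""
    rows = conn.execute(
        "SELECT role, content FROM history WHERE thread_id = ? ORDER BY rowid",
        (thread_id,),
    )
    return rows.fetchall()


append_message("user_123", "human", "My name is Laxmi Kant")
append_message("user_123", "human", "What's my name?")

history_same_thread = load_history("user_123")
history_new_thread = load_history("user_456")  # a different thread starts empty
print(len(history_same_thread), len(history_new_thread))
```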

Note

On Linux/macOS: the same code runs unchanged. Only the database file path differs by convention — use forward slashes like db/31_checkpoints.db.

PostgreSQL Checkpointer

For production, swap SqliteSaver for PostgresSaver. Set a POSTGRESQL_URL in your .env:

PYTHON
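from langgraph.checkpoint.postgres import PostgresSaver
import psycopg
from psycopg.rows import dict_row

# The LangGraph docs call for autocommit=True and row_factory=dict_row
# on connections created by hand.
pg_conn = psycopg.connect(os.getenv("POSTGRESQL_URL"), autocommit=True, row_factory=dict_row)
checkpointer = PostgresSaver(pg_conn)
checkpointer.setup()  # creates the checkpoint tables on first run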

config = {"configurable": {"thread_id": "user_123"}}
agent = create_agent(model=model, system_prompt=system_prompt, checkpointer=checkpointer)

Context Offloading: Read and Modify State from Tools

For very long conversations you can offload context to disk. A tool reads the running state and writes a summary, scoped per user and thread:

PYTHON
from langchain.tools import tool, ToolRuntime
from pathlib import Path

@tool
def save_conversation_summary(summary: str, runtime: ToolRuntime):
    """Save conversation summary to disk for context offloading."""
    user_id = runtime.context.user_id
    # thread_id lives in the run config, not in the user context
    thread_id = runtime.config['configurable']['thread_id']

    summary_dir = Path(f"data/{user_id}/{thread_id}")
    summary_dir.mkdir(parents=True, exist_ok=True)
    summary_path = summary_dir / "summary.md"
    summary_path.write_text(summary)
    return f"Summary saved to {summary_path}"

A second tool loads a saved summary back into state. It returns a Command that clears existing messages and injects the summary as fresh context:

PYTHON
from langchain.messages import HumanMessage, RemoveMessage, ToolMessage
from langgraph.graph.message import REMOVE_ALL_MESSAGES
from langgraph.types import Command

@tool
def load_conversation_summary(runtime: ToolRuntime):
    """Load previous conversation summary from disk."""
    user_id = runtime.context.user_id
    thread_id = runtime.config['configurable']['thread_id']
    summary_path = Path(f"data/{user_id}/{thread_id}/summary.md")

    if not summary_path.exists():
        return Command(update={'messages': [
            ToolMessage("No previous summary found.", tool_call_id=runtime.tool_call_id)]})

    summary_text = summary_path.read_text()
    messages = runtime.state.get('messages', [])
    last_ai_message = messages[-1] if messages else None

    new_messages = [
        RemoveMessage(id=REMOVE_ALL_MESSAGES),
        HumanMessage(f"Previous conversation summary:\n{summary_text}"),
    ]
    if last_ai_message:
        new_messages.append(last_ai_message)
    new_messages.append(ToolMessage("Successfully loaded previous summary.", tool_call_id=runtime.tool_call_id))

    return Command(update={'messages': new_messages})

This pattern keeps the active context small while preserving the meaning of earlier turns.
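
The file handling behind both tools can be tested on its own before wiring it into an agent. Below is a minimal sketch of the save/load round trip using a temporary directory; the helper names and the sample summary text are assumptions for illustration, while the user/thread/summary.md layout follows the tools above.

```python
import tempfile
from pathlib import Path


def summary_path_for(base_dir, user_id, thread_id):
    """Build the per-user, per-thread summary path used in this sketch."""
    return Path(base_dir) / user_id / thread_id / "summary.md"


def save_summary(base_dir, user_id, thread_id, summary):
    path = summary_path_for(base_dir, user_id, thread_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(summary, encoding="utf-8")
    return path


def load_summary(base_dir, user_id, thread_id):
    """Return the saved summary, or None when no file exists."""
    path = summary_path_for(base_dir, user_id, thread_id)
    return path.read_text(encoding="utf-8") if path.exists() else None


with tempfile.TemporaryDirectory() as tmp:
    save_summary(tmp, "kgptalkie", "session_1", "User asked about Mumbai weather.")
    loaded = load_summary(tmp, "kgptalkie", "session_1")
    missing = load_summary(tmp, "kgptalkie", "session_2")

print(loaded)
print(missing)
```

Scoping the path by both user and thread keeps one conversation's summary from being loaded into another.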

Long-Term Memory

Short-term memory lives in a checkpointer and lasts a session. Long-term memory lives in a store and persists across sessions and threads — ideal for user preferences and facts. With embeddings, the store also supports semantic search.

| Type | Storage | Use Case | Persistence |
|---|---|---|---|
| Short-term | Checkpointer | Conversation history | Session |
| Long-term | Store | User preferences, facts | Cross-session |

Configure a PostgresStore with a Gemini embedding function:

PYTHON
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langgraph.store.postgres import PostgresStore
import psycopg

embeddings = GoogleGenerativeAIEmbeddings(model='gemini-embedding-001')

def embed(texts: list[str]):
    return embeddings.embed_documents(texts, output_dimensionality=768)

pg_conn = psycopg.connect(os.getenv('POSTGRESQL_URL'), autocommit=True)
store = PostgresStore(pg_conn, index={'embed': embed, 'dims': 768})
store.setup()

Define memory tools that read and write the store through runtime.store, organized into hierarchical namespaces:

PYTHON
from langchain.tools import tool, ToolRuntime
from dataclasses import dataclass

@dataclass
class UserContext:
    user_id: str

@tool
def save_user_memory(category: str, information: dict, runtime: ToolRuntime):
    """Save user preference or information to long-term memory.

    Examples:
        category='food', information={'diet': 'vegetarian', 'likes': ['pasta']}
        category='work', information={'role': 'Data Scientist', 'interests': ['AI', 'ML']}
    """
    store = runtime.store
    user_id = runtime.context.user_id
    namespace = (user_id, "preferences")
    store.put(namespace=namespace, key=category, value=information)
    return f"Saved {category} preferences for {user_id}"

@tool
def get_user_memory(category: str, runtime: ToolRuntime):
    """Retrieve user preference or information from long-term memory."""
    store = runtime.store
    user_id = runtime.context.user_id
    namespace = (user_id, 'preferences')
    item = store.get(namespace=namespace, key=category)
    if item:
        return f"{category}: {item.value}"
    return f"No '{category}' information found"

Wire both the checkpointer (short-term) and the store (long-term) into the agent:

PYTHON
from langgraph.checkpoint.postgres import PostgresSaver

checkpointer = PostgresSaver(psycopg.connect(os.getenv("POSTGRESQL_URL"), autocommit=True))
store = PostgresStore(psycopg.connect(os.getenv('POSTGRESQL_URL'), autocommit=True),
                      index={'embed': embed, 'dims': 768})

agent = create_agent(
    model=model,
    tools=[base_tools.web_search, save_user_memory, get_user_memory],
    checkpointer=checkpointer,
    store=store,
    context_schema=UserContext,
    system_prompt="You are a helpful assistant with long-term memory."
)

In one session the agent saves facts; in a brand-new session (different thread) it can still recall them, because the store is independent of the thread. You can also query the store directly with semantic search — matching by meaning, not keywords:

PYTHON
namespace = ('kgptalkie', 'preferences')
memories = store.search(namespace, query="What does the user like to eat?", limit=2)
for m in memories:
    print(f"{m.key}: {m.value}")
OUTPUT
food: {'diet': 'vegetarian', 'likes': ['pasta']}
work: {'role': 'Data Scientist', 'interests': ['AI', 'ML']}
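
Semantic search ranks stored items by how close their embedding vectors are to the query's vector. The toy sketch below uses hand-written 3-dimensional vectors and cosine similarity to show the ranking step. The vectors are invented for illustration, and cosine similarity is assumed here as a common choice; the scoring the store applies internally is not shown in this lesson.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings"; real ones come from an embedding model.
memories = {
    "food": [0.9, 0.1, 0.0],
    "work": [0.1, 0.9, 0.2],
}
query_vector = [0.8, 0.2, 0.1]  # stands in for "What does the user like to eat?"

# Highest similarity first.
ranked = sorted(
    memories,
    key=lambda key: cosine_similarity(memories[key], query_vector),
    reverse=True,
)
print(ranked)
```

The query shares no keywords with either entry, yet the food memory ranks first because its vector points in nearly the same direction.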

Streaming and Structured Output

Streaming keeps interfaces responsive. LangGraph agents support three stream modes:

| Mode | Use Case | Returns |
|---|---|---|
| messages | Real-time token display | Message chunks as generated |
| updates | Debugging agent flow | Node name + output after each node |
| values | Track full state | Complete state snapshot after each step |

PYTHON
from langgraph.checkpoint.sqlite import SqliteSaver
import sqlite3

conn = sqlite3.connect("db/5_streaming_agent.db", check_same_thread=False)
agent = create_agent(model=model,
                     tools=[base_tools.web_search, base_tools.get_weather],
                     checkpointer=SqliteSaver(conn))

config = {'configurable': {'thread_id': '05_session_1'}}

# updates mode — see each node fire
for chunk in agent.stream(
        {'messages': [HumanMessage('what is the weather in Mumbai?')]},
        config=config, stream_mode='updates'):
    print(chunk)

The series also includes a helper, scripts/agent_utils.stream_agent_response, that prints tool calls, tool responses, and the final text cleanly:

PYTHON
from scripts import agent_utils
agent_utils.stream_agent_response(agent, 'what is the weather in mumbai', '5_session_3')
OUTPUT
  Tool Called: get_weather
   Args: {'location': 'Mumbai'}

  Tool Response: {"location": {"name": "Mumbai", ...
  Tool Result (length: 858 chars)

The weather in Mumbai is 30.1°C and overcast. Wind 7.9 kph WNW, humidity 55%, UV index 7.1.

Structured Output with Pydantic

Pass a Pydantic model as response_format to get type-safe, validated output:

PYTHON
from pydantic import BaseModel, Field
from typing import Optional

class FinancialAnalysis(BaseModel):
    company: str = Field(description="Company name")
    stock_symbol: str = Field(description="Stock ticker")
    current_price: Optional[str] = Field(description="Current price", default=None)
    analysis: str = Field(description="Brief analysis")
    recommendation: str = Field(description="Buy/Hold/Sell")

agent = create_agent(model=model, tools=[base_tools.web_search], response_format=FinancialAnalysis)

response = agent.invoke({'messages': [HumanMessage('tell me latest news about Apple stock')]})
response['structured_response'].model_dump()
OUTPUT
{'company': 'Apple Inc.', 'stock_symbol': 'AAPL', 'current_price': 'Around $212 - $260', 'analysis': 'Apple competes with Google on AI; mixed near-term trading...', 'recommendation': 'Hold'}

Production Middleware

Middleware adds production capabilities without changing your agent logic. You attach middleware via the middleware=[...] argument.

Trim and Delete Messages

@before_model runs before each model call — perfect for trimming the context window. Here we keep only the first and last message:

PYTHON
from langchain.messages import RemoveMessage
from langgraph.graph.message import REMOVE_ALL_MESSAGES
from langgraph.checkpoint.memory import InMemorySaver  # used as the checkpointer below
from langchain.agents.middleware import before_model
from langgraph.runtime import Runtime
from langchain.agents import AgentState

@before_model
def trim_messages(state: AgentState, runtime: Runtime):
    """Keep only the first and last message to fit the context window."""
    messages = state['messages']
    if len(messages) <= 3:
        return None
    return {'messages': [RemoveMessage(id=REMOVE_ALL_MESSAGES), messages[0], messages[-1]]}

agent = create_agent(model=model,
                     tools=[base_tools.web_search, base_tools.get_weather],
                     checkpointer=InMemorySaver(),
                     middleware=[trim_messages])

@after_model runs after the model responds — useful for deleting old messages with RemoveMessage(id=m.id).
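
The trimming rule inside trim_messages is easy to verify in isolation. This sketch applies the same rule to plain lists of strings, returning the kept items directly instead of a state update; trim_to_first_and_last is a hypothetical stand-in, not a LangChain function.

```python
def trim_to_first_and_last(messages, max_messages=3):
    """Return the list unchanged when short, else only the first and last items."""
    if len(messages) <= max_messages:
        return list(messages)
    return [messages[0], messages[-1]]


short_history = ["system", "question", "answer"]
long_history = ["m1", "m2", "m3", "m4", "m5"]

trimmed_short = trim_to_first_and_last(short_history)
trimmed_long = trim_to_first_and_last(long_history)
print(trimmed_short)
print(trimmed_long)
```

A history of three or fewer items passes through untouched; anything longer collapses to its two endpoints, which is aggressive and worth tuning for real conversations.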

Summarization Middleware

Instead of dropping old messages, compress them. SummarizationMiddleware triggers a summary once the conversation crosses a length threshold while keeping the most recent messages intact:

PYTHON
from langchain.agents.middleware import SummarizationMiddleware

agent = create_agent(
    model=model,
    tools=[base_tools.get_weather, base_tools.web_search],
    checkpointer=InMemorySaver(),
    middleware=[SummarizationMiddleware(
        model=ChatGoogleGenerativeAI(model='gemini-3-pro-preview'),
        trigger=[('messages', 15)],
        keep=("messages", 5)
    )]
)

Todo List Middleware

For complex multi-step tasks, TodoListMiddleware() gives the agent a planning and tracking tool. The agent breaks work into todos and marks each one completed, in-progress, or pending as it goes — visible in agent.get_state(config).

Dynamic Model Selection

Route simple turns to a cheap model and complex turns to a stronger one with @wrap_model_call:

PYTHON
from langchain.agents.middleware import wrap_model_call, ModelRequest

basic_model = ChatGoogleGenerativeAI(model='gemini-2.5-flash')
advanced_model = ChatGoogleGenerativeAI(model='gemini-3-pro-preview')

@wrap_model_call
def dynamic_model_selection(request: ModelRequest, handler):
    """Choose model based on conversation complexity."""
    count = len(request.state['messages'])
    model = advanced_model if count > 5 else basic_model
    return handler(request.override(model=model))
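
The routing decision itself is a one-line threshold, and the boundary is worth checking: with count > 5, exactly five messages still go to the basic model. A minimal sketch of the rule follows; pick_model_name is a hypothetical helper that returns labels rather than model objects.

```python
def pick_model_name(message_count, threshold=5):
    """Mirror the routing rule: more than `threshold` messages means 'advanced'."""
    return "advanced" if message_count > threshold else "basic"


choices = {count: pick_model_name(count) for count in (1, 5, 6)}
print(choices)
```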

Call Limits and Model Fallback

Cap runaway costs with ModelCallLimitMiddleware and ToolCallLimitMiddleware, and improve reliability with ModelFallbackMiddleware:

PYTHON
from langchain.agents.middleware import (
    ModelCallLimitMiddleware, ToolCallLimitMiddleware,
    TodoListMiddleware, ModelFallbackMiddleware
)

agent = create_agent(
    model=basic_model,
    tools=[base_tools.web_search, base_tools.get_weather],
    checkpointer=InMemorySaver(),
    middleware=[
        dynamic_model_selection,
        TodoListMiddleware(),
        ModelCallLimitMiddleware(run_limit=5, exit_behavior='end'),
        ToolCallLimitMiddleware(run_limit=5, exit_behavior='continue'),
        ModelFallbackMiddleware(first_model=ChatGoogleGenerativeAI(model='gemini-3-flash-preview')),
    ]
)

Dynamic System Prompt

Adapt the system prompt at runtime based on user context with @dynamic_prompt:

PYTHON
from langchain.agents.middleware import dynamic_prompt, ModelRequest
from dataclasses import dataclass

@dataclass
class UserContext:
    user_role: str

@dynamic_prompt
def user_role_prompt(request: ModelRequest):
    """Generate a system prompt based on user role."""
    user_role = request.runtime.context.user_role
    base = "You are a helpful assistant."
    if user_role == 'expert':
        return f"{base} Provide detailed technical responses."
    elif user_role == "beginner":
        return f"{base} Explain concepts simply and avoid jargon."
    return base

agent = create_agent(model=model,
                     tools=[base_tools.web_search, base_tools.get_weather],
                     checkpointer=InMemorySaver(),
                     middleware=[user_role_prompt],
                     context_schema=UserContext)

response = agent.invoke({"messages": [HumanMessage("Explain machine learning")]},
                        context=UserContext(user_role='beginner'))
print(response['messages'][-1].text)
OUTPUT
Machine learning is like teaching a computer to learn from examples. Instead of
giving it rules for every task, you show it lots of data and it finds patterns.
To recognize cats, you show it thousands of pictures rather than describing ears
and whiskers — the more it sees, the better it gets.

Guardrails and Human-in-the-Loop

Production agents need protection against leaking PII, processing secrets, and running dangerous actions.

PII Protection Strategies

PIIMiddleware detects and handles personally identifiable information with several strategies:

| Strategy | Original | After Protection | Description |
|---|---|---|---|
| redact | https://kgptalkie.com | [REDACTED_URL] | Removes completely |
| mask | 5105-1051-0510-5100 | ****-****-****-5100 | Shows last few characters |
| hash | udemy@kgptalkie.com | <email_hash:8ea1aedb> | Deterministic hash |
| block | sk-...32 chars | Execution blocked | Throws error, stops processing |

PYTHON
from langchain.agents.middleware import PIIMiddleware

agent = create_agent(
    model=model,
    system_prompt="You are a helpful customer service assistant.",
    middleware=[
        PIIMiddleware("email", strategy="hash", apply_to_input=True),
        PIIMiddleware("credit_card", strategy="mask", apply_to_input=True),
        PIIMiddleware("url", strategy="redact", apply_to_input=True),
        PIIMiddleware("api_key", detector=r"sk-[a-zA-Z0-9]{32}", strategy='mask', apply_to_input=True),
        PIIMiddleware("phone", detector=r"\d{3}-\d{3}-\d{4}", strategy="redact", apply_to_input=True),
    ]
)

response = agent.invoke({'messages': [HumanMessage("""
        My email is udemy@kgptalkie.com
        My phone is 555-123-4567
        My card is 5105-1051-0510-5100
        My website is https://kgptalkie.com
    """)]})
print(response['messages'][0].content)
OUTPUT
My email is <email_hash:8ea1aedb>
My phone is [REDACTED_PHONE]
My card is ****-****-****-5100
My website is [REDACTED_URL]

You can also define custom PII patterns with your own regex detectors — for example employee IDs (EMP-\d{6}) or order IDs (ORD-[A-Z0-9]{6}).
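
Before handing a custom pattern to the middleware, it is worth testing the regex by itself. The sketch below uses only the standard-library re module to try the two example patterns plus a card mask. The placeholder labels and helper functions are assumptions for illustration and do not reproduce PIIMiddleware's internals.

```python
import re

# Patterns taken from the examples above; the labels are illustrative.
CUSTOM_PATTERNS = {
    "EMPLOYEE_ID": re.compile(r"EMP-\d{6}"),
    "ORDER_ID": re.compile(r"ORD-[A-Z0-9]{6}"),
}


def redact_custom(text):
    """Replace every match with a [REDACTED_<LABEL>] placeholder."""
    for label, pattern in CUSTOM_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text


def mask_card(text):
    """Mask all but the last four digits of a dash-separated 16-digit card."""
    return re.sub(r"\b\d{4}-\d{4}-\d{4}-(\d{4})\b", r"****-****-****-\1", text)


redacted = redact_custom("Employee EMP-123456 placed order ORD-A1B2C3.")
masked = mask_card("My card is 5105-1051-0510-5100")
print(redacted)
print(masked)
```

Testing patterns this way catches over-matching and under-matching early, before a bad regex silently lets sensitive values through.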

Human-in-the-Loop (HITL)

HumanInTheLoopMiddleware pauses the agent before sensitive tool calls and waits for a human decision.

| Decision | Effect | Use Case |
|---|---|---|
| approve | Execute as-is | Safe operations |
| edit | Modify, then execute | Adjust parameters |
| reject | Block with feedback | Dangerous operations |

PYTHON
from langchain.agents.middleware import HumanInTheLoopMiddleware
from langchain.tools import tool
from langgraph.types import Command

@tool
def write_file(path: str, content: str):
    """Write content to file."""
    with open(path, 'w') as f:
        f.write(content)
    return f"Successfully wrote to {path}"

@tool
def execute_sql(query: str):
    """Execute SQL query. Use this tool for any database related question."""
    return f"Would execute: {query}"

agent = create_agent(
    model=model,
    tools=[write_file, execute_sql],
    checkpointer=InMemorySaver(),
    middleware=[HumanInTheLoopMiddleware(
        interrupt_on={
            "write_file": True,                                   # approve, edit, or reject
            "execute_sql": {"allowed_decisions": ["approve", "reject"]},  # no editing
        },
        description_prefix="Tool execution pending approval"
    )]
)

When a guarded tool is about to run, the result contains an __interrupt__. Resume with a decision:

PYTHON
config = {"configurable": {"thread_id": "hitl_approve_1"}}
result = agent.invoke({"messages": [HumanMessage("Write 'Hello World' to data/test.txt")]}, config=config)

if "__interrupt__" in result:
    print("Interrupt:", result['__interrupt__'][0].value['action_requests'][0])
    result = agent.invoke(Command(resume={"decisions": [{"type": "approve"}]}), config=config)
OUTPUT
Interrupt: {'name': 'write_file', 'args': {'path': 'data/test.txt', 'content': 'Hello World'}, ...}

To edit before running, resume with {"type": "edit", "edited_action": {...}} — for example redirecting the write to data/earth_essay.txt. To reject a dangerous action like "delete all records", resume with {"type": "reject", "message": "Too dangerous. Use a WHERE clause."} and the agent revises its plan.
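
In an approval UI, it helps to validate a reviewer's choice against the decisions each tool allows before resuming the agent. The sketch below builds approve and reject decision dicts in the shape shown above and refuses disallowed ones. ALLOWED_DECISIONS and build_decision are hypothetical names for this illustration; the mapping mirrors the interrupt_on configuration from this section.

```python
# Mirrors interrupt_on above: write_file allows all three, execute_sql forbids edit.
ALLOWED_DECISIONS = {
    "write_file": ["approve", "edit", "reject"],
    "execute_sql": ["approve", "reject"],
}


def build_decision(tool_name, decision_type, message=None):
    """Return a decision dict, or raise if the tool does not allow it."""
    allowed = ALLOWED_DECISIONS.get(tool_name, [])
    if decision_type not in allowed:
        raise ValueError(f"'{decision_type}' is not allowed for {tool_name}")
    decision = {"type": decision_type}
    if decision_type == "reject" and message:
        decision["message"] = message  # feedback handed back to the agent
    return decision


approve = build_decision("write_file", "approve")
reject = build_decision("execute_sql", "reject", "Too dangerous. Use a WHERE clause.")
resume_payload = {"decisions": [reject]}

try:
    build_decision("execute_sql", "edit")
    edit_allowed = True
except ValueError:
    edit_allowed = False

print(approve, resume_payload, edit_allowed)
```

The resulting resume_payload is the dict you would pass as Command(resume=...) when continuing the interrupted run.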

Caution

Always guard destructive tools (file writes, SQL execution, payments) with HITL. The reject path lets a human stop an irreversible action and hand corrective feedback back to the agent.

Prompt Engineering Patterns

The system prompt is the single most important lever on agent behavior. These nine patterns cover the techniques used throughout the series.

1. Basic vs detailed prompts. A vague prompt ("You are a helpful assistant") often refuses or hedges. A detailed prompt with role, guidelines, and formatting rules produces concrete, cited answers — the same query yields a far stronger result.

2. Role-based prompts. Define responsibilities and communication style (e.g., a patient support agent that always ends with "Is there anything else I can help with?").

3. Constraint-based prompts. Spell out what the agent cannot do. A medical-information assistant lists allowed actions, forbidden actions, and a mandatory disclaimer:

PYTHON
constrained_prompt = """You are a medical information assistant.

What you CAN do:
- Provide general health information and explain terminology

What you CANNOT do:
- Diagnose conditions, prescribe medication, or replace professional advice

ALWAYS include disclaimer: "This is not medical advice. Consult a healthcare professional."
"""

4. Few-shot prompting. Show examples of the exact output format you want, and the agent mirrors them:

PYTHON
few_shot_prompt = """You are a product description writer for an e-commerce site.

Format like these examples:

Example 1:
Product: Wireless Mouse
Description: Glide through your workday with our ergonomic wireless mouse.
- 18-month battery life | 6 programmable buttons | Up to 30ft range
Perfect for: Professionals, gamers, everyday users

Use this format for all product descriptions.
"""

5. Context-aware prompts. Inject dynamic context (current date, user tier) into the prompt with an f-string so the agent tailors responses and reasons about "today."
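
A minimal sketch of a context-aware prompt follows, assuming a date and a user tier are the only dynamic values. The function name, the tier value, and the prompt wording are illustrative choices, not taken from the series code.

```python
from datetime import date


def build_context_prompt(today, user_tier):
    """Inject the current date and user tier into a system prompt."""
    return (
        "You are a helpful assistant.\n"
        f"Today's date: {today.isoformat()}\n"
        f"User tier: {user_tier}\n"
        "Tailor the depth of your answer to the user's tier."
    )


# A fixed date keeps the example reproducible; use date.today() in practice.
context_prompt = build_context_prompt(date(2026, 6, 19), "premium")
print(context_prompt)
```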

6. Tool-usage guidance. Tell the agent explicitly when to use a tool and when not to — always search for current prices and news, never search for math or pre-2025 historical facts.

7. Output-format control. Provide an exact template (headings, bullets, fields) and instruct the model to never deviate, producing consistent, parseable output:

PYTHON
format_control_prompt = """You are a stock analysis agent.

ALWAYS format your analysis exactly like this:

## [COMPANY NAME] Analysis
**Current Price:** $XXX.XX
**Change:** ±X.XX%
### Key Metrics
• Market Cap: $XXX billion
### Recommendation
BUY | HOLD | SELL

NEVER deviate from this format.
"""

8. Chain-of-thought prompting. Ask the agent to reason step by step — Understand → Break down → Gather data → Analyze → Conclude — for transparent, auditable answers.

9. Reusable templates. Keep a library of parameterized prompt templates (customer support, data analyst, content creator) so teams stay consistent.
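
One lightweight way to keep such a library is a dict of string.Template objects from the standard library. The template names, wording, and the company name "Acme" below are all hypothetical and chosen only to illustrate the pattern.

```python
from string import Template

PROMPT_TEMPLATES = {
    "customer_support": Template(
        "You are a $tone customer support agent for $company. Avoid jargon."
    ),
    "data_analyst": Template(
        "You are a data analyst for $company. Answer in a $tone style with numbers."
    ),
}


def render_prompt(name, **params):
    """Fill a named template; raises KeyError if a parameter is missing."""
    return PROMPT_TEMPLATES[name].substitute(**params)


support_prompt = render_prompt("customer_support", tone="friendly", company="Acme")
print(support_prompt)
```

Using substitute rather than safe_substitute means a forgotten parameter fails loudly instead of leaving a stray $placeholder in a production prompt.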

Tip

Best practices: be specific, show examples, set boundaries, structure information, provide context, and test iteratively. Avoid being vague, over-complicating, or contradicting yourself within a single prompt.

With these fundamentals — tools, memory, streaming, middleware, guardrails, and prompts — you have everything needed to build real agents. The next lesson puts them to work by connecting an agent to external services through the Model Context Protocol (MCP) in Build a Hotel Search AI Agent with MCP.
