An AI agent combines a language model with tools to create a system that reasons, decides, and works toward a solution iteratively. This lesson is the complete fundamentals reference for the series — every building block you need before assembling real projects.
We move from a one-line agent to tools, short- and long-term memory, streaming, production middleware, guardrails, human-in-the-loop approval, and prompt engineering. All examples use create_agent from LangChain with Google's Gemini models.
Note
This lesson assumes your environment is set up. If you have not configured your Gemini and LangSmith keys yet, start with Getting Started with Gemini 3 & LangChain.
What Is an AI Agent?
An agent has three core components:
- Model / LLM — the reasoning engine.
- System prompt — instructions that guide behavior.
- Message history — the conversation context.
Create your first agent with a model and a system prompt:
import os
from dotenv import load_dotenv
load_dotenv()
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.agents import create_agent
from langchain.messages import HumanMessage
model = ChatGoogleGenerativeAI(model='gemini-2.5-flash')
system_prompt = "You are a helpful assistant that provides concise and accurate responses."
agent = create_agent(model=model, system_prompt=system_prompt)
Invoke the agent with a messages key. The agent returns the full message list; the last message holds the answer:
query = "Tell me 3 facts about the earth?"
response = agent.invoke({'messages': HumanMessage(query)})
print(response['messages'][-1].text)
Here are 3 facts about Earth:
1. It is the only planet known to harbor life.
2. Approximately 71% of its surface is covered by water.
3. It is an oblate spheroid, bulging at the equator and flattened at the poles.
Model and System Prompt Configuration
The system prompt defines the agent's role. A detailed prompt produces sharper, more consistent answers. Here we configure a Gemini 3 model with low thinking depth and a financial-analyst persona:
system_prompt = """You are a financial analyst specializing in tech stocks.
Guidelines:
- Provide data-driven analysis
- Keep responses concise (2-3 paragraphs max)
- Present numbers with proper formatting ($XXX.XX)
- Avoid speculation without data
"""
model = ChatGoogleGenerativeAI(model='gemini-3-flash-preview',
thinking_level='low',
include_thoughts=True)
agent = create_agent(model=model, system_prompt=system_prompt)
response = agent.invoke({'messages': "What was Apple's earning in 2020?"})
print(response['messages'][-1].text)
For the fiscal year ending September 26, 2020, Apple reported total revenue of
$274.52 billion, a 6% increase over $260.17 billion in 2019. Net income was
$57.41 billion, with diluted EPS of $3.28 and a gross margin near 38.2%. Growth
was driven by Services ($53.77B) and Wearables (+25% to $30.62B).
Role-Based Agents
The same model behaves very differently depending on its role. Compare a support agent with a technical expert on the identical question:
support_prompt = """You are a friendly customer support agent.
- Use simple language (avoid jargon)
- Ask clarifying questions when needed
- Maintain a warm, empathetic tone
"""
support_agent = create_agent(model=model, system_prompt=support_prompt)
response = support_agent.invoke({'messages': [HumanMessage('I can not login into my account')]})
print(response['messages'][-1].text)
Hi there! I'm sorry you're having trouble getting into your account. To help, could
you tell me what you see on screen (for example "incorrect password")? Are you on our
website or the mobile app? Have you tried the "Forgot Password" link yet?
tech_prompt = """You are a technical expert.
- Provide detailed technical responses
- Use precise terminology
- Include code examples when relevant
"""
tech_agent = create_agent(model=model, system_prompt=tech_prompt)
response = tech_agent.invoke({'messages': [HumanMessage('I can not login into my account')]})
print(response['messages'][-1].text)
First identify the error state: 401 Unauthorized (bad credentials/expired token),
403 Forbidden (account locked), 429 Too Many Requests (rate limiting), or 500/503
(auth service down). For web logins, clear stale cookies/LocalStorage or use an
incognito window, and verify NTP clock sync for TOTP/JWT validation...
Giving Agents Tools
Tools let agents take actions. This series ships two reusable tools in scripts/base_tools.py — web_search (live web search) and get_weather (current weather). The agent picks a tool based on its docstring, so clear descriptions matter.
import warnings
warnings.filterwarnings('ignore')
import sys
sys.path.append('../')
from scripts import base_tools
model = ChatGoogleGenerativeAI(model='gemini-2.5-flash')
agent = create_agent(model=model,
tools=[base_tools.web_search, base_tools.get_weather],
system_prompt='You are a helpful AI assistant.')
response = agent.invoke({'messages': [HumanMessage('what is the current weather in Mumbai?')]})
print(response['messages'][-1].text)
The current weather in Mumbai is around 30°C with overcast/misty skies, light
north-westerly winds, and moderate humidity.
Tip
Invoke a tool directly with a dict argument to test it — base_tools.web_search.invoke({'query': 'kgp talkie'}). Do not call the tool object like a function (base_tools.web_search({'query': ...})); that is the wrong calling convention.
Sequential vs Parallel Tool Calls
The model decides whether tools run one after another or together based on how the request is phrased. "...then..." implies a sequence; "...also..." implies parallel calls.
# Sequential — "then"
agent.invoke({'messages': [HumanMessage('Tell me news about the Apple stock then tell me weather in Mumbai')]})
# Parallel — "also"
response = agent.invoke({'messages': [HumanMessage('Tell me news about the Apple stock also tell me weather in Mumbai')]})
response['messages'][1].tool_calls
[{'name': 'web_search', 'args': {'query': 'Apple stock news'}, 'id': '...', 'type': 'tool_call'}, {'name': 'get_weather', 'args': {'location': 'Mumbai'}, 'id': '...', 'type': 'tool_call'}]
Tool Error Handling
Wrap tool calls with @wrap_tool_call middleware to catch exceptions and return a graceful message instead of crashing the agent:
from langchain.tools import tool
from langchain.agents.middleware import wrap_tool_call
from langchain.messages import ToolMessage
@tool
def divide(a: float, b: float):
"""Divide the two numbers"""
return a / b
@wrap_tool_call
def handle_tool_errors(request, handler):
try:
return handler(request)
except Exception as e:
return ToolMessage(
content=f"Error: {str(e)}. Try different Input.",
tool_call_id=request.tool_call['id']
)
agent = create_agent(model=model,
tools=[base_tools.web_search, base_tools.get_weather, divide],
system_prompt='You are a helpful AI assistant.',
middleware=[handle_tool_errors])
agent.invoke({'messages': [HumanMessage('what is the current weather in Mumbai and what is 1/0?')]})
The 1/0 division raises an exception that the middleware converts into a recoverable ToolMessage, so the agent still answers the weather part.
Accessing Agent State from a Tool
A tool can read the running agent state and an immutable user context using ToolRuntime:
from langchain.tools import ToolRuntime
from dataclasses import dataclass
@tool
def get_message_count(runtime: ToolRuntime):
"""Get the total number of messages exchanged in the conversation."""
messages = runtime.state['messages']
context = runtime.context
return f"User '{context.user_id}' with Session '{context.session_id}' has '{len(messages)}' messages."
@dataclass
class UserContext:
user_id: str
session_id: str
agent = create_agent(model=model,
tools=[base_tools.get_weather, get_message_count],
system_prompt='You are a helpful AI assistant.',
context_schema=UserContext)
user_context = UserContext(user_id='kgptalkie', session_id='session_1')
agent.invoke({'messages': [HumanMessage('weather in Mumbai then how many messages are in this conversation')]},
context=user_context)
Short-Term Memory
Without a checkpointer, an agent forgets everything between calls:
agent = create_agent(model=model, system_prompt=system_prompt)
agent.invoke({'messages': [HumanMessage("My name is Laxmi Kant")]})
response = agent.invoke({'messages': [HumanMessage("What's my name?")]})
print(response['messages'][-1].content)
I don't know your name. You haven't told it to me yet!
A checkpointer persists conversation history per thread. Use SQLite for development and PostgreSQL for production.
| Type | Use Case | Setup |
|---|---|---|
| SQLite | Development, testing | Simple file-based |
| PostgreSQL | Production, multi-user | Database connection |
SQLite Checkpointer
from langgraph.checkpoint.sqlite import SqliteSaver
import sqlite3
os.makedirs('db', exist_ok=True)
conn = sqlite3.connect("db/31_checkpoints.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)
checkpointer.setup()
config = {"configurable": {"thread_id": "user_123"}}
agent = create_agent(model=model, system_prompt=system_prompt, checkpointer=checkpointer)
agent.invoke({'messages': [HumanMessage("My name is Laxmi Kant")]}, config=config)
response = agent.invoke({'messages': [HumanMessage("What's my name?")]}, config=config)
print(response['messages'][-1].content)
Your name is Laxmi Kant.
The thread_id isolates sessions. A different thread_id starts a fresh conversation, and you can inspect any thread's saved state with agent.get_state(config=config).
Note
On Linux/macOS: the same code runs unchanged. Only the database file path differs by convention — use forward slashes like db/31_checkpoints.db.
PostgreSQL Checkpointer
For production, swap SqliteSaver for PostgresSaver. Set a POSTGRESQL_URL in your .env:
from langgraph.checkpoint.postgres import PostgresSaver
import psycopg
pg_conn = psycopg.connect(os.getenv("POSTGRESQL_URL"), autocommit=True)
checkpointer = PostgresSaver(pg_conn)
checkpointer.setup()
config = {"configurable": {"thread_id": "user_123"}}
agent = create_agent(model=model, system_prompt=system_prompt, checkpointer=checkpointer)
Context Offloading: Read and Modify State from Tools
For very long conversations you can offload context to disk. A tool reads the running state and writes a summary, scoped per user and thread:
from langchain.tools import tool, ToolRuntime
from pathlib import Path
@tool
def save_conversation_summary(summary: str, runtime: ToolRuntime):
"""Save conversation summary to disk for context offloading."""
user_id = runtime.context.user_id
thread_id = runtime.context.thread_id
summary_dir = Path(f"data/{user_id}/{thread_id}")
summary_dir.mkdir(parents=True, exist_ok=True)
summary_path = summary_dir / "summary.md"
summary_path.write_text(summary)
return f"Summary saved to {summary_path}"
A second tool loads a saved summary back into state. It returns a Command that clears existing messages and injects the summary as fresh context:
from langchain.messages import RemoveMessage, ToolMessage
from langgraph.graph.message import REMOVE_ALL_MESSAGES
from langgraph.types import Command
@tool
def load_conversation_summary(runtime: ToolRuntime):
"""Load previous conversation summary from disk."""
user_id = runtime.context.user_id
thread_id = runtime.config['configurable']['thread_id']
summary_path = Path(f"data/{user_id}/{thread_id}/summary.md")
if not summary_path.exists():
return Command(update={'messages': [
ToolMessage("No previous summary found.", tool_call_id=runtime.tool_call_id)]})
summary_text = summary_path.read_text()
messages = runtime.state.get('messages', [])
last_ai_message = messages[-1] if messages else None
new_messages = [
RemoveMessage(id=REMOVE_ALL_MESSAGES),
HumanMessage(f"Previous conversation summary:\n{summary_text}"),
]
if last_ai_message:
new_messages.append(last_ai_message)
new_messages.append(ToolMessage("Successfully loaded previous summary.", tool_call_id=runtime.tool_call_id))
return Command(update={'messages': new_messages})
This pattern keeps the active context small while preserving the meaning of earlier turns.
Long-Term Memory
Short-term memory lives in a checkpointer and lasts a session. Long-term memory lives in a store and persists across sessions and threads — ideal for user preferences and facts. With embeddings, the store also supports semantic search.
| Type | Storage | Use Case | Persistence |
|---|---|---|---|
| Short-term | Checkpointer | Conversation history | Session |
| Long-term | Store | User preferences, facts | Cross-session |
Configure a PostgresStore with a Gemini embedding function:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langgraph.store.postgres import PostgresStore
import psycopg
embeddings = GoogleGenerativeAIEmbeddings(model='gemini-embedding-001')
def embed(texts: list[str]):
return embeddings.embed_documents(texts, output_dimensionality=768)
pg_conn = psycopg.connect(os.getenv('POSTGRESQL_URL'), autocommit=True)
store = PostgresStore(pg_conn, index={'embed': embed, 'dims': 768})
store.setup()
Define memory tools that read and write the store through runtime.store, organized into hierarchical namespaces:
from langchain.tools import tool, ToolRuntime
from dataclasses import dataclass
@dataclass
class UserContext:
user_id: str
@tool
def save_user_memory(category: str, information: dict, runtime: ToolRuntime):
"""Save user preference or information to long-term memory.
Examples:
category='food', information={'diet': 'vegetarian', 'likes': ['pasta']}
category='work', information={'role': 'Data Scientist', 'interests': ['AI', 'ML']}
"""
store = runtime.store
user_id = runtime.context.user_id
namespace = (user_id, "preferences")
store.put(namespace=namespace, key=category, value=information)
return f"Saved {category} preferences for {user_id}"
@tool
def get_user_memory(category: str, runtime: ToolRuntime):
"""Retrieve user preference or information from long-term memory."""
store = runtime.store
user_id = runtime.context.user_id
namespace = (user_id, 'preferences')
item = store.get(namespace=namespace, key=category)
if item:
return f"{category}: {item.value}"
return f"No '{category}' information found"
Wire both the checkpointer (short-term) and the store (long-term) into the agent:
from langgraph.checkpoint.postgres import PostgresSaver
checkpointer = PostgresSaver(psycopg.connect(os.getenv("POSTGRESQL_URL"), autocommit=True))
store = PostgresStore(psycopg.connect(os.getenv('POSTGRESQL_URL'), autocommit=True),
index={'embed': embed, 'dims': 768})
agent = create_agent(
model=model,
tools=[base_tools.web_search, save_user_memory, get_user_memory],
checkpointer=checkpointer,
store=store,
context_schema=UserContext,
system_prompt="You are a helpful assistant with long-term memory."
)
In one session the agent saves facts; in a brand-new session (different thread) it can still recall them, because the store is independent of the thread. You can also query the store directly with semantic search — matching by meaning, not keywords:
namespace = ('kgptalkie', 'preferences')
memories = store.search(namespace, query="What does the user like to eat?", limit=2)
for m in memories:
print(f"{m.key}: {m.value}")
food: {'diet': 'vegetarian', 'likes': ['pasta']}
work: {'role': 'Data Scientist', 'interests': ['AI', 'ML']}
Streaming and Structured Output
Streaming keeps interfaces responsive. LangGraph agents support three stream modes:
| Mode | Use Case | Returns |
|---|---|---|
| messages | Real-time token display | Message chunks as generated |
| updates | Debugging agent flow | Node name + output after each node |
| values | Track full state | Complete state snapshot after each step |
from langgraph.checkpoint.sqlite import SqliteSaver
import sqlite3
conn = sqlite3.connect("db/5_streaming_agent.db", check_same_thread=False)
agent = create_agent(model=model,
tools=[base_tools.web_search, base_tools.get_weather],
checkpointer=SqliteSaver(conn))
config = {'configurable': {'thread_id': '05_session_1'}}
# updates mode — see each node fire
for chunk in agent.stream(
{'messages': [HumanMessage('what is the weather in Mumbai?')]},
config=config, stream_mode='updates'):
print(chunk)
The series also includes a helper, scripts/agent_utils.stream_agent_response, that prints tool calls, tool responses, and the final text cleanly:
from scripts import agent_utils
agent_utils.stream_agent_response(agent, 'what is the weather in mumbai', '5_session_3')
Tool Called: get_weather
Args: {'location': 'Mumbai'}
Tool Response: {"location": {"name": "Mumbai", ...
Tool Result (length: 858 chars)
The weather in Mumbai is 30.1°C and overcast. Wind 7.9 kph WNW, humidity 55%, UV index 7.1.
Structured Output with Pydantic
Pass a Pydantic model as response_format to get type-safe, validated output:
from pydantic import BaseModel, Field
from typing import Optional
class FinancialAnalysis(BaseModel):
company: str = Field(description="Company name")
stock_symbol: str = Field(description="Stock ticker")
current_price: Optional[str] = Field(description="Current price", default=None)
analysis: str = Field(description="Brief analysis")
recommendation: str = Field(description="Buy/Hold/Sell")
agent = create_agent(model=model, tools=[base_tools.web_search], response_format=FinancialAnalysis)
response = agent.invoke({'messages': [HumanMessage('tell me latest news about Apple stock')]})
response['structured_response'].model_dump()
{'company': 'Apple Inc.', 'stock_symbol': 'AAPL', 'current_price': 'Around $212 - $260', 'analysis': 'Apple competes with Google on AI; mixed near-term trading...', 'recommendation': 'Hold'}
Production Middleware
Middleware adds production capabilities without changing your agent logic. You attach middleware via the middleware=[...] argument.
Trim and Delete Messages
@before_model runs before each model call — perfect for trimming the context window. Here we keep only the first and last message:
from langchain.messages import RemoveMessage
from langgraph.graph.message import REMOVE_ALL_MESSAGES
from langchain.agents.middleware import before_model
from langgraph.runtime import Runtime
from langchain.agents import AgentState
@before_model
def trim_messages(state: AgentState, runtime: Runtime):
"""Keep only the first and last message to fit the context window."""
messages = state['messages']
if len(messages) <= 3:
return None
return {'messages': [RemoveMessage(id=REMOVE_ALL_MESSAGES), messages[0], messages[-1]]}
agent = create_agent(model=model,
tools=[base_tools.web_search, base_tools.get_weather],
checkpointer=InMemorySaver(),
middleware=[trim_messages])
@after_model runs after the model responds — useful for deleting old messages with RemoveMessage(id=m.id).
Summarization Middleware
Instead of dropping old messages, compress them. SummarizationMiddleware triggers a summary once the conversation crosses a length threshold while keeping the most recent messages intact:
from langchain.agents.middleware import SummarizationMiddleware
agent = create_agent(
model=model,
tools=[base_tools.get_weather, base_tools.web_search],
checkpointer=InMemorySaver(),
middleware=[SummarizationMiddleware(
model=ChatGoogleGenerativeAI(model='gemini-3-pro-preview'),
trigger=[('messages', 15)],
keep=("messages", 5)
)]
)
Todo List Middleware
For complex multi-step tasks, TodoListMiddleware() gives the agent a planning and tracking tool. The agent breaks work into todos and marks each one completed, in-progress, or pending as it goes — visible in agent.get_state(config).
Dynamic Model Selection
Route simple turns to a cheap model and complex turns to a stronger one with @wrap_model_call:
from langchain.agents.middleware import wrap_model_call, ModelRequest
basic_model = ChatGoogleGenerativeAI(model='gemini-2.5-flash')
advanced_model = ChatGoogleGenerativeAI(model='gemini-3-pro-preview')
@wrap_model_call
def dynamic_model_selection(request: ModelRequest, handler):
"""Choose model based on conversation complexity."""
count = len(request.state['messages'])
model = advanced_model if count > 5 else basic_model
return handler(request.override(model=model))
Call Limits and Model Fallback
Cap runaway costs with ModelCallLimitMiddleware and ToolCallLimitMiddleware, and improve reliability with ModelFallbackMiddleware:
from langchain.agents.middleware import (
ModelCallLimitMiddleware, ToolCallLimitMiddleware,
TodoListMiddleware, ModelFallbackMiddleware
)
agent = create_agent(
model=basic_model,
tools=[base_tools.web_search, base_tools.get_weather],
checkpointer=InMemorySaver(),
middleware=[
dynamic_model_selection,
TodoListMiddleware(),
ModelCallLimitMiddleware(run_limit=5, exit_behavior='end'),
ToolCallLimitMiddleware(run_limit=5, exit_behavior='continue'),
ModelFallbackMiddleware(first_model=ChatGoogleGenerativeAI(model='gemini-3-flash-preview')),
]
)
Dynamic System Prompt
Adapt the system prompt at runtime based on user context with @dynamic_prompt:
from langchain.agents.middleware import dynamic_prompt, ModelRequest
from dataclasses import dataclass
@dataclass
class UserContext:
user_role: str
@dynamic_prompt
def user_role_prompt(request: ModelRequest):
"""Generate a system prompt based on user role."""
user_role = request.runtime.context.user_role
base = "You are a helpful assistant."
if user_role == 'expert':
return f"{base} Provide detailed technical responses."
elif user_role == "beginner":
return f"{base} Explain concepts simply and avoid jargon."
return base
agent = create_agent(model=model,
tools=[base_tools.web_search, base_tools.get_weather],
checkpointer=InMemorySaver(),
middleware=[user_role_prompt],
context_schema=UserContext)
response = agent.invoke({"messages": [HumanMessage("Explain machine learning")]},
context=UserContext(user_role='beginner'))
print(response['messages'][-1].text)
Machine learning is like teaching a computer to learn from examples. Instead of
giving it rules for every task, you show it lots of data and it finds patterns.
To recognize cats, you show it thousands of pictures rather than describing ears
and whiskers — the more it sees, the better it gets.
Guardrails and Human-in-the-Loop
Production agents need protection against leaking PII, processing secrets, and running dangerous actions.
PII Protection Strategies
PIIMiddleware detects and handles personally identifiable information with several strategies:
| Strategy | Original | After Protection | Description |
|---|---|---|---|
| redact | https://kgptalkie.com |
[REDACTED_URL] |
Removes completely |
| mask | 5105-1051-0510-5100 |
****-****-****-5100 |
Shows last few characters |
| hash | udemy@kgptalkie.com |
<email_hash:8ea1aedb> |
Deterministic hash |
| block | sk-...32 chars |
Execution blocked | Throws error, stops processing |
from langchain.agents.middleware import PIIMiddleware
agent = create_agent(
model=model,
system_prompt="You are a helpful customer service assistant.",
middleware=[
PIIMiddleware("email", strategy="hash", apply_to_input=True),
PIIMiddleware("credit_card", strategy="mask", apply_to_input=True),
PIIMiddleware("url", strategy="redact", apply_to_input=True),
PIIMiddleware("api_key", detector=r"sk-[a-zA-Z0-9]{32}", strategy='mask', apply_to_input=True),
PIIMiddleware("phone", detector=r"\d{3}-\d{3}-\d{4}", strategy="redact", apply_to_input=True),
]
)
response = agent.invoke({'messages': [HumanMessage("""
My email is udemy@kgptalkie.com
My phone is 555-123-4567
My card is 5105-1051-0510-5100
My website is https://kgptalkie.com
""")]})
print(response['messages'][0].content)
My email is <email_hash:8ea1aedb>
My phone is [REDACTED_PHONE]
My card is ****-****-****-5100
My website is [REDACTED_URL]
You can also define custom PII patterns with your own regex detectors — for example employee IDs (EMP-\d{6}) or order IDs (ORD-[A-Z0-9]{6}).
Human-in-the-Loop (HITL)
HumanInTheLoopMiddleware pauses the agent before sensitive tool calls and waits for a human decision.
| Decision | Effect | Use Case |
|---|---|---|
| approve | Execute as-is | Safe operations |
| edit | Modify, then execute | Adjust parameters |
| reject | Block with feedback | Dangerous operations |
from langchain.agents.middleware import HumanInTheLoopMiddleware
from langchain.tools import tool
from langgraph.types import Command
@tool
def write_file(path: str, content: str):
"""Write content to file."""
with open(path, 'w') as f:
f.write(content)
return f"Successfully wrote to {path}"
@tool
def execute_sql(query: str):
"""Execute SQL query. Use this tool for any database related question."""
return f"Would execute: {query}"
agent = create_agent(
model=model,
tools=[write_file, execute_sql],
checkpointer=InMemorySaver(),
middleware=[HumanInTheLoopMiddleware(
interrupt_on={
"write_file": True, # approve, edit, or reject
"execute_sql": {"allowed_decisions": ["approve", "reject"]}, # no editing
},
description_prefix="Tool execution pending approval"
)]
)
When a guarded tool is about to run, the result contains an __interrupt__. Resume with a decision:
config = {"configurable": {"thread_id": "hitl_approve_1"}}
result = agent.invoke({"messages": [HumanMessage("Write 'Hello World' to data/test.txt")]}, config=config)
if "__interrupt__" in result:
print("Interrupt:", result['__interrupt__'][0].value['action_requests'][0])
result = agent.invoke(Command(resume={"decisions": [{"type": "approve"}]}), config=config)
Interrupt: {'name': 'write_file', 'args': {'path': 'data/test.txt', 'content': 'Hello World'}, ...}
To edit before running, resume with {"type": "edit", "edited_action": {...}} — for example redirecting the write to data/earth_essay.txt. To reject a dangerous action like "delete all records", resume with {"type": "reject", "message": "Too dangerous. Use a WHERE clause."} and the agent revises its plan.
Caution
Always guard destructive tools (file writes, SQL execution, payments) with HITL. The reject path lets a human stop an irreversible action and hand corrective feedback back to the agent.
Prompt Engineering Patterns
The system prompt is the single most important lever on agent behavior. These nine patterns cover the techniques used throughout the series.
1. Basic vs detailed prompts. A vague prompt ("You are a helpful assistant") often refuses or hedges. A detailed prompt with role, guidelines, and formatting rules produces concrete, cited answers — the same query yields a far stronger result.
2. Role-based prompts. Define responsibilities and communication style (e.g., a patient support agent that always ends with "Is there anything else I can help with?").
3. Constraint-based prompts. Spell out what the agent cannot do. A medical-information assistant lists allowed actions, forbidden actions, and a mandatory disclaimer:
constrained_prompt = """You are a medical information assistant.
What you CAN do:
- Provide general health information and explain terminology
What you CANNOT do:
- Diagnose conditions, prescribe medication, or replace professional advice
ALWAYS include disclaimer: "This is not medical advice. Consult a healthcare professional."
"""
4. Few-shot prompting. Show examples of the exact output format you want, and the agent mirrors them:
few_shot_prompt = """You are a product description writer for an e-commerce site.
Format like these examples:
Example 1:
Product: Wireless Mouse
Description: Glide through your workday with our ergonomic wireless mouse.
- 18-month battery life | 6 programmable buttons | Up to 30ft range
Perfect for: Professionals, gamers, everyday users
Use this format for all product descriptions.
"""
5. Context-aware prompts. Inject dynamic context (current date, user tier) into the prompt with an f-string so the agent tailors responses and reasons about "today."
6. Tool-usage guidance. Tell the agent explicitly when to use a tool and when not to — always search for current prices and news, never search for math or pre-2025 historical facts.
7. Output-format control. Provide an exact template (headings, bullets, fields) and instruct the model to never deviate, producing consistent, parseable output:
format_control_prompt = """You are a stock analysis agent.
ALWAYS format your analysis exactly like this:
## [COMPANY NAME] Analysis
**Current Price:** $XXX.XX
**Change:** ±X.XX%
### Key Metrics
• Market Cap: $XXX billion
### Recommendation
BUY | HOLD | SELL
NEVER deviate from this format.
"""
8. Chain-of-thought prompting. Ask the agent to reason step by step — Understand → Break down → Gather data → Analyze → Conclude — for transparent, auditable answers.
9. Reusable templates. Keep a library of parameterized prompt templates (customer support, data analyst, content creator) so teams stay consistent.
Tip
Best practices: be specific, show examples, set boundaries, structure information, provide context, and test iteratively. Avoid being vague, over-complicating, or contradicting yourself within a single prompt.
With these fundamentals — tools, memory, streaming, middleware, guardrails, and prompts — you have everything needed to build real agents. The next lesson puts them to work by connecting an agent to external services through the Model Context Protocol (MCP) in Build a Hotel Search AI Agent with MCP.