Gemini 3 is Google's most capable model family, and LangChain is the framework that turns it into a programmable agent. This first lesson sets up everything you need — API keys, tracing, and the full feature surface of Gemini — so the projects that follow have a solid foundation.
We cover API key setup, LangSmith tracing, and then every core Gemini capability through LangChain: basic messaging, streaming, multimodal input (images, PDFs), tool calling, reasoning control, built-in tools, context caching, and image generation. We close with the message-state lifecycle that every agent follows.
Note
This is the foundation post for the AI Agent Projects series. Later lessons such as LangChain Agent Fundamentals assume your keys and environment are working as shown here.
Creating a Gemini API Key
Gemini API access is free to start through Google AI Studio.
- Go to Google AI Studio and sign in with your Google account.
- Click Get API Key in the left sidebar, then Create API Key.
- Select an existing Google Cloud project or create a new one, then click Create API key.
- Copy the key immediately and store it securely.
Add the key to a .env file in your project root:
GOOGLE_API_KEY=your_gemini_api_key_here
Tip
Gemini API pricing and free-tier limits are listed at the official pricing page: https://ai.google.dev/gemini-api/docs/pricing
Creating LangSmith API Keys
LangSmith records every model call, tool invocation, and token count so you can debug agents visually.
- Go to LangSmith and sign up with GitHub, Google, or email.
- Open your profile menu → Settings → API Keys.
- Click Create API Key, give it a descriptive name, and copy it immediately — it is shown only once.
- Note your project name under Projects (the default is usually default).
Add the LangSmith configuration to the same .env file:
LANGSMITH_API_KEY="your_api_key_here"
LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_PROJECT="multi-agent-deep-rag"
Verifying the Setup
Load the environment variables and confirm both keys are visible to Python:
import os
from dotenv import load_dotenv
load_dotenv()
google_api_key = os.getenv("GOOGLE_API_KEY")
langchain_api_key = os.getenv("LANGSMITH_API_KEY")
if google_api_key:
    print("Gemini API Key loaded successfully")
else:
    print("Gemini API Key not found")
if langchain_api_key:
    print("Langsmith API Key loaded successfully")
else:
    print("Langsmith API Key not found")
Gemini API Key loaded successfully
Langsmith API Key loaded successfully
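The same check can be folded into one helper that reports every missing variable at once. This is a minimal sketch, assuming the variable names from the .env file above; the function name missing_env_vars is hypothetical and not part of any library.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Variable names taken from the .env file in this lesson.
required = ["GOOGLE_API_KEY", "LANGSMITH_API_KEY"]
missing = missing_env_vars(required)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required keys are loaded")
```

Passing a plain dictionary as env makes the helper easy to check without touching the real environment.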
Test the raw Gemini client with a quick generation call:
from google import genai
client = genai.Client()
response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Explain how AI works in a few words",
)
print(response.text)
It processes massive amounts of data to recognize patterns and make predictions.
In short: Data + Math = Predictions.
Confirm that LangSmith can connect and see your project:
from langsmith import Client
client = Client()
try:
    projects = list(client.list_projects(limit=1))
    print("Langsmith connection successful!")
    print(f"Connected to project: {os.getenv('LANGCHAIN_PROJECT')}")
except Exception as e:
    print(f"Langsmith connection failed: {e}")
Langsmith connection successful!
Connected to project: multi-agent-deep-rag
Gemini 3 Model Family Overview
Gemini 3 Pro is built on state-of-the-art reasoning, with a dynamic thinking process and very large context windows.
Key features:
- Advanced reasoning with configurable thinking levels
- 1M token context — up to 1 million input tokens, 64k output tokens
- Multimodal — images, PDFs, audio, and video with granular resolution control
- Knowledge cutoff of January 2025
- Image generation at up to 4K resolution
| Model | Context (In/Out) | Best For |
|---|---|---|
| gemini-3-pro-preview | 1M / 64k | Complex reasoning, coding, analysis |
| gemini-3-pro-image-preview | 65k / 32k | Image generation & editing |
| gemini-2.5-flash | 1M / 8k | Fast, cost-effective tasks |
New controls introduced in Gemini 3:
- Thinking level: control reasoning depth (low or high).
- Media resolution: granular control per media type (low, medium, high, ultra_high).
- Temperature: keep at the default 1.0; changing it can degrade performance.
- Thought signatures: reasoning context is preserved automatically.
Basic Usage
Import the chat model and message types. We define two model identifiers so we can compare a Gemini 3 model against a Gemini 2.5 model:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.messages import HumanMessage, SystemMessage, AIMessage, ToolMessage
gemini3 = 'gemini-3-flash-preview' # or gemini-3-pro-preview
gemini2 = 'gemini-2.5-flash'
system_msg = SystemMessage("You are a helpful AI Assistant")
query = HumanMessage("Explain the theory of relativity in simple terms")
messages = [system_msg, query]
Invoke the Gemini 3 model with the message list:
model = ChatGoogleGenerativeAI(model=gemini3)
response = model.invoke(messages)
Gemini returns structured content blocks. Inspect them with response.content or response.content_blocks:
response.content
[{'type': 'text', 'text': 'To understand the Theory of Relativity, it helps to break it down into two parts: Special Relativity (speed and time) and General Relativity (gravity)...'}]
Switch to the Gemini 2.5 model with the same messages, then inspect token usage and response metadata:
model = ChatGoogleGenerativeAI(model=gemini2)
response = model.invoke(messages)
response.usage_metadata
{'input_tokens': 18, 'output_tokens': 2304, 'total_tokens': 2322, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 1114}}
response.response_metadata
{'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'model_provider': 'google_genai'}
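The reasoning count in usage_metadata can be separated from the visible answer with a little arithmetic. This is a sketch, assuming output_tokens already includes the reasoning tokens reported under output_token_details (worth confirming against your installed langchain-google-genai version); the helper name split_output_tokens is hypothetical.

```python
def split_output_tokens(usage):
    """Split output tokens into (reasoning, visible) counts."""
    details = usage.get("output_token_details") or {}
    reasoning = details.get("reasoning", 0)
    # Assumption: output_tokens is the total, with reasoning tokens included.
    visible = usage["output_tokens"] - reasoning
    return reasoning, visible


# The usage_metadata dictionary printed above.
usage = {
    "input_tokens": 18,
    "output_tokens": 2304,
    "total_tokens": 2322,
    "input_token_details": {"cache_read": 0},
    "output_token_details": {"reasoning": 1114},
}
reasoning, visible = split_output_tokens(usage)
print(f"reasoning={reasoning}, visible={visible}")  # reasoning=1114, visible=1190
```

Under that assumption, nearly half of the output tokens in this call went to reasoning rather than to the answer text.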
Streaming
For long responses, stream tokens as they are generated instead of waiting for the full reply:
model = ChatGoogleGenerativeAI(model=gemini2)
query = "Explain the theory of relativity in simple terms."
for chunk in model.stream(query):
    print(chunk.text, end="", flush=True)
Imagine you're trying to understand how the universe works at really high speeds or near
really massive objects. That's what Einstein's Theory of Relativity helps us do.
### 1. Special Relativity (1905): The Rules for Constant Speed
- The laws of physics are the same for everyone in constant motion.
- The speed of light is constant for every observer.
...
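When the full text is also needed after streaming (for logging or storage), the pieces can be collected while they are printed. This sketch uses a stand-in generator, fake_stream, in place of model.stream(query) so it runs without an API key; both helper names are hypothetical.

```python
def fake_stream():
    """Stand-in for model.stream(query): yields text pieces one at a time."""
    yield from ["Time ", "and space ", "are linked."]


def stream_and_collect(text_chunks):
    """Print each piece as it arrives and return the joined reply."""
    pieces = []
    for piece in text_chunks:
        print(piece, end="", flush=True)
        pieces.append(piece)
    print()  # finish the line once the stream ends
    return "".join(pieces)


full_text = stream_and_collect(fake_stream())
```

With a real model, the same helper could be fed a generator expression such as (chunk.text for chunk in model.stream(query)).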
Multimodal Capabilities
Gemini processes images, PDFs, audio, and video alongside text. Pass a HumanMessage whose content is a list of typed blocks.
Image from a URL
model = ChatGoogleGenerativeAI(model=gemini3)
human_msg = HumanMessage(
    [
        {'type': 'text', 'text': 'Describe the image provided'},
        {'type': 'image',
         'url': 'https://www.shutterstock.com/image-vector/vector-cute-baby-panda-cartoon-600nw-2427356853.jpg'}
    ]
)
response = model.invoke([system_msg, human_msg])
print(response.text)
A high-contrast, cartoon-style illustration of a cute giant panda sitting upright.
It is rendered in a clean vector style with bold black outlines, large round white head,
black eye patches, pink-centered ears, surrounded by tall green blades of grass on a
solid white background.
Image from a Local File
Base64-encode a local image (or PDF/audio) before sending it. Read the file as bytes and encode:
import base64
mime_type = "image/png"
# Use a context manager so the file handle is closed after reading.
with open("data/images/panda.png", 'rb') as f:
    image_bytes = f.read()
bytes_base64 = base64.b64encode(image_bytes).decode('utf-8')
Send the encoded image with a base64 block and its mime_type:
model = ChatGoogleGenerativeAI(model=gemini2)
human_msg = HumanMessage(
    [
        {'type': 'text', 'text': 'Describe the image provided'},
        {'type': 'image',
         'base64': bytes_base64,
         "mime_type": mime_type}
    ]
)
response = model.invoke([system_msg, human_msg])
response.pretty_print()
================================== Ai Message ==================================
This image is a vibrant cartoon illustration of a baby panda sitting amongst green
foliage on a white background — thick clean outlines, iconic black-and-white fur,
large expressive eyes, a friendly smile, and soft pink paw pads. Overall impression:
cheerful, friendly, and innocent.
Note
Local file paths in this lesson use relative paths like data/images/panda.png. Keep your data files inside your project folder so the same code runs on any machine.
PDF Document Analysis
PDFs use the file block type with mime_type set to application/pdf. For PDFs, a medium media resolution is recommended.
pdf_bytes = open(r'data\apple 10-q q1 2024.pdf', 'rb').read()
pdf_base64 = base64.b64encode(pdf_bytes).decode('utf-8')
mime_type = "application/pdf"
human_msg = HumanMessage(
    [
        {'type': 'text', 'text': 'summarize the key financial highlights from this quarterly report.'},
        {'type': 'file',
         'base64': pdf_base64,
         'mime_type': mime_type}
    ]
)
model = ChatGoogleGenerativeAI(model=gemini2)
response = model.invoke([system_msg, human_msg])
print(response.text)
Based on Apple's Form 10-Q for the period ended March 30, 2024:
- Total net sales decreased ~4% to $90.753 billion.
- Product net sales fell ~9.5% to $66.886 billion (iPhone -10%, iPad -17%, Mac +4%).
- Services net sales grew ~14% to $23.867 billion.
- Net income was $23.636 billion; basic EPS flat at $1.53.
- Gross margin improved to 46.6% from 44.3%.
- Apple repurchased $44.0B of stock and paid $7.5B dividends over six months.
- The Board authorized an additional $110B for buybacks and raised the dividend to $0.25.
Important
Windows path strings often contain backslashes. Use a raw string (r'data\apple 10-q q1 2024.pdf') so \a and similar sequences are not interpreted as escape characters.
On Linux and macOS, paths use forward slashes, so a raw string is unnecessary: 'data/apple 10-q q1 2024.pdf' works directly. Forward slashes also work in Python on Windows.
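The encoding steps for images and PDFs follow the same pattern, so they can be wrapped in one helper. This is a sketch, assuming the block shapes shown above ('image' for images, 'file' for PDFs); the name build_media_block is hypothetical, and pathlib is used so the same path code works on every operating system. The demo writes a placeholder file to a temporary folder so the example runs without the lesson's data files.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def build_media_block(path):
    """Build a base64 content block for a local image or PDF."""
    path = Path(path)
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None:
        raise ValueError(f"Cannot guess MIME type for {path.name}")
    if mime_type.startswith("image/"):
        block_type = "image"
    elif mime_type == "application/pdf":
        block_type = "file"
    else:
        raise ValueError(f"Unsupported MIME type in this sketch: {mime_type}")
    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    return {"type": block_type, "base64": encoded, "mime_type": mime_type}


# Demo with a placeholder file; the bytes are not a real PDF.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.pdf"
    sample.write_bytes(b"%PDF-1.4 placeholder")
    block = build_media_block(sample)
print(block["type"], block["mime_type"])
```

In the lesson's code, the returned dictionary would take the place of the hand-written image or file block inside HumanMessage.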
Tool Calling (Function Calling)
Bind custom tools to the model so it can fetch live data. This series ships two reusable tools — web_search and get_weather — in a shared scripts/base_tools.py module.
import sys
sys.path.append('../')
from scripts import base_tools
Each tool can be invoked directly to confirm it works:
response = base_tools.web_search.invoke({'query': "what is the latest stock news"})
print(response[0].content[:200])
UAE stocks sell off as markets reopen ... investors track ongoing market events ...
Bind both tools to the model. When you ask a question that needs them, Gemini returns tool_calls instead of a final answer:
model = ChatGoogleGenerativeAI(model=gemini2)
model_with_tools = model.bind_tools([base_tools.web_search, base_tools.get_weather])
response = model_with_tools.invoke(
    "What is the weather in mumbai? and What is the US stock news today?"
)
response.tool_calls
[{'name': 'get_weather', 'args': {'location': 'Mumbai'}, 'id': 'call_1', 'type': 'tool_call'}, {'name': 'web_search', 'args': {'query': 'US stock news today'}, 'id': 'call_2', 'type': 'tool_call'}]
The model correctly issues two parallel tool calls — one per sub-question. Executing those tools and feeding the results back is exactly what an agent does for you automatically, as you will see in the next lesson.
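To make the "executing those tools" step concrete, here is a sketch of dispatching a tool_calls list by hand. The two functions are hypothetical stand-ins for the tools in base_tools (whose real implementations are not shown in this lesson), and run_tool_calls is an illustrative name, not a LangChain API.

```python
def get_weather(location):
    """Hypothetical stand-in for base_tools.get_weather."""
    return f"(stub) weather report for {location}"


def web_search(query):
    """Hypothetical stand-in for base_tools.web_search."""
    return f"(stub) search results for {query!r}"


TOOLS = {"get_weather": get_weather, "web_search": web_search}


def run_tool_calls(tool_calls):
    """Execute each requested tool and pair the result with its call id."""
    results = []
    for call in tool_calls:
        tool = TOOLS[call["name"]]
        output = tool(**call["args"])
        results.append({"tool_call_id": call["id"], "content": output})
    return results


# The tool_calls list printed above.
tool_calls = [
    {"name": "get_weather", "args": {"location": "Mumbai"}, "id": "call_1", "type": "tool_call"},
    {"name": "web_search", "args": {"query": "US stock news today"}, "id": "call_2", "type": "tool_call"},
]
results = run_tool_calls(tool_calls)
for result in results:
    print(result)
```

In LangChain, each result would be wrapped in a ToolMessage carrying the matching tool_call_id and appended to the message list before the model is invoked again.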
Thinking Support (Reasoning)
Gemini exposes its reasoning process. Control it with thinking_budget (a token count, used with the Gemini 2.5 model below) or thinking_level (low/high, the Gemini 3 control listed earlier), and reveal it with include_thoughts=True.
model = ChatGoogleGenerativeAI(
    model=gemini2,
    thinking_budget=100,
    include_thoughts=True,
)
response = model.invoke(query)
response.content_blocks
[{'type': 'reasoning', 'reasoning': '**Relativity: Breaking It Down for Clarity** ... I need relatable analogies, avoiding jargon ...'}, {'type': 'text', 'text': "Imagine you're trying to understand how time, space, and gravity work ..."}]
Setting thinking_budget=0 disables reasoning entirely, which lowers output tokens and latency:
model = ChatGoogleGenerativeAI(
    model=gemini2,
    thinking_budget=0,
    include_thoughts=True,
)
response = model.invoke(query)
response.usage_metadata
{'input_tokens': 11, 'output_tokens': 594, 'total_tokens': 605, 'input_token_details': {'cache_read': 0}}
Tip
Full reasoning controls are documented at: https://ai.google.dev/gemini-api/docs/thinking
Built-in Tools
Gemini ships native tools — Google Search and Code Execution — that require no setup. Bind them by name:
model = ChatGoogleGenerativeAI(model=gemini2)
model_with_tools = model.bind_tools([{'google_search': {}}, {'code_execution': {}}])
query = "When is the next total solar eclipse in the US and what is 3 + 2?"
response = model_with_tools.invoke(query)
print(response.text)
The answer to 3 + 2 is 5. The next total solar eclipse in the US will be on March 30,
2033 (Alaska). The next visible in the contiguous US is August 23, 2044 (Montana and
the Dakotas), followed by a widespread eclipse on August 12, 2045.
The model used code execution to compute 3 + 2 and Google Search to look up eclipse dates — both in a single call. Inspect response.content_blocks to see the server_tool_call and server_tool_result entries.
Context Caching
Caching large documents avoids re-sending and re-billing the same tokens on every query. A cache needs at least 2,048 tokens.
import time
from google import genai
from google.genai.types import CreateCachedContentConfig, Content, Part
client = genai.Client()
Upload the documents and wait until each finishes processing:
file_paths = [
    "data/apple 10-q q1 2024.pdf",
    "data/apple 10-q q2 2024.pdf"
]
uploaded_files = []
for path in file_paths:
    file = client.files.upload(file=path)
    while file.state.name == "PROCESSING":
        time.sleep(2)
        file = client.files.get(name=file.name)
    uploaded_files.append(file)
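The polling loop above waits indefinitely if a file never leaves the PROCESSING state. One way to bound the wait is a small helper with a timeout. This is a sketch: wait_until_ready is a hypothetical name, and the demo replaces the client.files.get calls with a scripted sequence of state names so it runs offline (the final state name used here is illustrative).

```python
import time


def wait_until_ready(get_state, timeout_s=120.0, poll_s=2.0, sleep=time.sleep):
    """Poll get_state() until it stops returning 'PROCESSING' or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state != "PROCESSING":
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still PROCESSING after {timeout_s} seconds")
        sleep(poll_s)


# Stand-in for repeated status checks: two pending polls, then a finished state.
states = iter(["PROCESSING", "PROCESSING", "ACTIVE"])
final_state = wait_until_ready(lambda: next(states), sleep=lambda s: None)
print(final_state)  # ACTIVE
```

With the real client, get_state could be a lambda that calls client.files.get(name=file.name) and returns its state.name.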
Wrap the uploaded files as content parts and create a cache with system instructions and a 30-minute TTL:
parts = [Part.from_uri(file_uri=f.uri, mime_type=f.mime_type) for f in uploaded_files]
contents = [Content(role='user', parts=parts)]
cache = client.caches.create(
    model=gemini2,
    config=CreateCachedContentConfig(
        display_name='Apple Q1 Q2 2024 reports',
        system_instruction="You are a financial analyst. Use these Apple quarterly reports to answer questions.",
        contents=contents,
        ttl='1800s'
    )
)
Point a model at the cache by passing cached_content. The first query reads the cached tokens — notice cache_read in the usage metadata:
model = ChatGoogleGenerativeAI(
    model=gemini2,
    cached_content=cache.name
)
query = "Compare the revenue growth between Q1 and Q2 2024."
response = model.invoke(query)
response.usage_metadata
{'input_tokens': 14482, 'output_tokens': 1822, 'total_tokens': 16304, 'input_token_details': {'cache_read': 14465}, 'output_token_details': {'reasoning': 1504}}
Every subsequent query reuses the same cache, so the 14,465 document tokens are read from cache rather than re-processed — cutting cost and latency.
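The share of input served from the cache can be read straight from usage_metadata. This is a sketch, assuming input_tokens counts cached and fresh tokens together, which matches the numbers printed above; the helper name cache_summary is hypothetical.

```python
def cache_summary(usage):
    """Return (cached, fresh, cached_share) for the input tokens of one call."""
    details = usage.get("input_token_details") or {}
    cached = details.get("cache_read", 0)
    total = usage["input_tokens"]
    # Assumption: input_tokens is the total, with cached tokens included.
    fresh = total - cached
    share = cached / total if total else 0.0
    return cached, fresh, share


# The usage_metadata dictionary printed above.
cached_usage = {
    "input_tokens": 14482,
    "output_tokens": 1822,
    "total_tokens": 16304,
    "input_token_details": {"cache_read": 14465},
    "output_token_details": {"reasoning": 1504},
}
cached, fresh, share = cache_summary(cached_usage)
print(f"cached={cached}, fresh={fresh}, share={share:.1%}")  # cached=14465, fresh=17, share=99.9%
```

Only the 17 tokens of the question itself were sent fresh in this call.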
Note
Caching details and pricing are documented at: https://ai.google.dev/gemini-api/docs/caching
Image Generation
Generate images up to 4K with gemini-3-pro-image-preview. Supported aspect ratios are 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, and 21:9; resolutions are 1K, 2K, and 4K.
from langchain_google_genai import Modality
from IPython.display import Image, display
image_model = ChatGoogleGenerativeAI(model="gemini-3-pro-image-preview")
image_content = f"Create a professional infographic with this data:\n\n{response.text}"
image_response = image_model.invoke(
    image_content, response_modalities=[Modality.TEXT, Modality.IMAGE]
)
Decode the returned base64 image to display or save it:
display(Image(base64.b64decode(image_response.content_blocks[0]['base64'])))
with open("data/images/apple_info.png", 'wb') as f:
    f.write(base64.b64decode(image_response.content_blocks[0]['base64']))
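Because both text and image modalities are requested, the image may not always be the first content block. A more defensive sketch searches for the first image block instead of indexing position 0. The helper name first_image_bytes and the sample blocks are hypothetical; the block keys follow the shapes used earlier in this lesson.

```python
import base64


def first_image_bytes(content_blocks):
    """Return the decoded bytes of the first image block, or None if there is none."""
    for block in content_blocks:
        if block.get("type") == "image" and "base64" in block:
            return base64.b64decode(block["base64"])
    return None


# Hypothetical response content: a text block followed by an image block.
sample_blocks = [
    {"type": "text", "text": "Here is your infographic."},
    {"type": "image", "base64": base64.b64encode(b"fake-png-bytes").decode("utf-8")},
]
image_bytes = first_image_bytes(sample_blocks)
print(len(image_bytes))  # 14
```

With a real response, pass image_response.content_blocks and check for None before displaying or saving the result.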
How an Agent Moves Through States
The features above are the building blocks. An agent chains them together, moving through a sequence of message states instead of answering in one step.
Consider the query "tell me about the apple news":
- Initial input: a HumanMessage carries the user query; no tools used yet.
- Tool decision: the model returns an AIMessage with a tool_call such as web_search({"query": "Apple Inc. news"}).
- Tool execution: a ToolMessage returns raw articles and metadata from the web.
- Processing: the model reads the tool output and extracts the key points.
- Final answer: a final AIMessage converts the raw data into a clean, human-readable summary.
The full flow is:
HumanMessage
↓
AIMessage (tool call)
↓
ToolMessage (results)
↓
AIMessage (final answer)
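The flow above can be sketched as a plain loop. This is not how create_agent is implemented; it is a toy version that uses dictionaries instead of LangChain message classes, with a scripted stand-in for the model and a stub search tool. All names here are hypothetical.

```python
def scripted_model(messages):
    """Hypothetical stand-in for a chat model: calls a tool once, then answers."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"role": "ai", "content": f"Summary: {last['content']}", "tool_calls": []}
    call = {"name": "web_search", "args": {"query": "Apple Inc. news"}, "id": "call_1"}
    return {"role": "ai", "content": "", "tool_calls": [call]}


def web_search(query):
    """Hypothetical stand-in for a real search tool."""
    return f"(stub) articles about {query}"


def run_agent(user_query, model, tools, max_steps=5):
    """Loop: ask the model, run any requested tools, repeat until a final answer."""
    messages = [{"role": "human", "content": user_query}]
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(reply)
        if not reply["tool_calls"]:
            return messages  # no tool requested, so this is the final answer
        for call in reply["tool_calls"]:
            output = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": output})
    raise RuntimeError("agent did not finish within max_steps")


history = run_agent("tell me about the apple news", scripted_model, {"web_search": web_search})
print(" -> ".join(m["role"] for m in history))  # human -> ai -> tool -> ai
```

The max_steps cap is a design choice for the sketch: it stops a model that keeps requesting tools from looping forever.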
This loop — decide, act, observe, respond — is the heart of every agent in this series. In the next lesson, LangChain Agent Fundamentals, we wire these states into a real agent with create_agent, memory, streaming, and guardrails.