Getting Started with Gemini 3 & LangChain

Gemini 3 is Google's latest model family, and LangChain wraps it in a consistent agent API. Before we can build agents, we need working API keys and a way to trace calls. In this post, we set both up, then walk through each Gemini feature the rest of the series relies on.

Those features are plain messaging, streaming, multimodal input for images and PDFs, tool calling, reasoning control, the built-in Google tools, context caching, and image generation. The final section shows how they combine into the message loop an agent runs.

Note

This is the first post in the AI Agent Projects series. Later lessons such as LangChain Agent Fundamentals assume your keys and environment work as shown here.

Creating a Gemini API Key

Gemini API access is free to start through Google AI Studio.

Go to Google AI Studio and sign in with your Google account.
Click Get API Key in the left sidebar, then Create API Key.
Select an existing Google Cloud project or create a new one, then click Create API key.
Copy the key immediately and store it securely.

Add the key to a .env file in your project root:

BASH

GOOGLE_API_KEY=your_gemini_api_key_here

Tip

Gemini API pricing and free-tier limits are listed at the official pricing page: https://ai.google.dev/gemini-api/docs/pricing

Creating LangSmith API Keys

LangSmith records every model call, tool invocation, and token count, so we can debug agents visually.

Go to LangSmith and sign up with GitHub, Google, or email.
Open your profile menu → Settings → API Keys.
Click Create API Key, give it a descriptive name, and copy it immediately. It is shown only once.
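Decide which project your traces should go to. Runs land in the project named default unless you set one, and LangSmith creates a named project the first time a traced run is sent to it.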

Add the LangSmith configuration to the same .env file:

BASH

LANGSMITH_API_KEY="your_api_key_here"
LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_PROJECT="multi-agent-deep-rag"

Verifying the Setup

Figure: Configure Gemini and LangSmith keys, then trace every LangChain call

Load the environment variables and confirm both keys are visible to Python:

PYTHON

import os
from dotenv import load_dotenv

load_dotenv()

google_api_key = os.getenv("GOOGLE_API_KEY")
langsmith_api_key = os.getenv("LANGSMITH_API_KEY")

if google_api_key:
    print("Gemini API Key loaded successfully")
else:
    print("Gemini API Key not found")

if langsmith_api_key:
    print("LangSmith API Key loaded successfully")
else:
    print("LangSmith API Key not found")

OUTPUT

Gemini API Key loaded successfully
LangSmith API Key loaded successfully
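Printing a message works in a notebook, but a script should usually stop early when a key is missing. Here is a minimal sketch, assuming you want a hard failure. `require_env` is a hypothetical helper, not part of python-dotenv or LangSmith:

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, or raise listing every missing or blank one."""
    values = {name: os.environ.get(name, "") for name in names}
    missing = [name for name, value in values.items() if not value.strip()]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values


# Usage, after load_dotenv():
# keys = require_env("GOOGLE_API_KEY", "LANGSMITH_API_KEY")
```

Collecting every missing name before raising means one run tells you everything that needs fixing in .env.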

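Test the raw Gemini client with a quick generation call. genai.Client() reads GOOGLE_API_KEY from the environment (loaded above by load_dotenv()), so the key does not need to be passed explicitly: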

PYTHON

from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Explain how AI works in a few words",
)

print(response.text)

OUTPUT

It processes massive amounts of data to recognize patterns and make predictions.

In short: Data + Math = Predictions.

Confirm that LangSmith can connect and see your project:

PYTHON

from langsmith import Client

# Separate name so it does not shadow the Gemini client created above
ls_client = Client()

try:
    # Listing a single project is a cheap way to confirm the key is accepted
    projects = list(ls_client.list_projects(limit=1))
    print("LangSmith connection successful!")
    print(f"Connected to project: {os.getenv('LANGCHAIN_PROJECT')}")
except Exception as e:
    print(f"LangSmith connection failed: {e}")

OUTPUT

LangSmith connection successful!
Connected to project: multi-agent-deep-rag

Gemini 3 Model Family Overview

Gemini 3 Pro is the top model in the family. It supports a configurable thinking process and very large context windows.

Key features:

Configurable thinking levels for reasoning depth
Up to 1 million input tokens and 64k output tokens
Multimodal input: images, PDFs, audio, and video, with resolution control
Knowledge cutoff of January 2025
Image generation at up to 4K resolution through the companion gemini-3-pro-image-preview model

Model	Context (In/Out)	Best For
`gemini-3-pro-preview`	1M / 64k	Complex reasoning, coding, analysis
`gemini-3-pro-image-preview`	65k / 32k	Image generation & editing
`gemini-2.5-flash`	1M / 64k	Fast, cost-effective tasks

Gemini 3 adds a few new controls:

Thinking level sets reasoning depth (low or high).
Media resolution sets detail per media type (low, medium, high, ultra_high).
Temperature should stay at the default 1.0; changing it can degrade output.
Thought signatures carry the model's reasoning context across turns; LangChain sends them back to the API automatically.

Basic Usage

Import the chat model and message types. We define two model identifiers so we can compare a Gemini 3 model against a Gemini 2.5 model:

PYTHON

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.messages import HumanMessage, SystemMessage, AIMessage, ToolMessage

gemini3 = 'gemini-3-flash-preview'  # or gemini-3-pro-preview
gemini2 = 'gemini-2.5-flash'

system_msg = SystemMessage("You are a helpful AI Assistant")
query = HumanMessage("Explain the theory of relativity in simple terms")

messages = [system_msg, query]

Invoke the Gemini 3 model with the message list:

PYTHON

model = ChatGoogleGenerativeAI(model=gemini3)
response = model.invoke(messages)

Gemini returns structured content blocks. Inspect them with response.content or response.content_blocks:

PYTHON

response.content

OUTPUT

[{'type': 'text', 'text': 'To understand the Theory of Relativity, it helps to break it down into two parts: Special Relativity (speed and time) and General Relativity (gravity)...'}]

Switch to the Gemini 2.5 model with the same messages, then inspect token usage and response metadata:

PYTHON

model = ChatGoogleGenerativeAI(model=gemini2)
response = model.invoke(messages)

response.usage_metadata

OUTPUT

{'input_tokens': 18, 'output_tokens': 2304, 'total_tokens': 2322, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 1114}}

PYTHON

response.response_metadata

OUTPUT

{'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'model_provider': 'google_genai'}

Streaming

For long responses, stream tokens as they are generated instead of waiting for the full reply:

PYTHON

model = ChatGoogleGenerativeAI(model=gemini2)

query = "Explain the theory of relativity in simple terms."

for chunk in model.stream(query):
    print(chunk.text, end="", flush=True)

OUTPUT

Imagine you're trying to understand how the universe works at really high speeds or near
really massive objects. That's what Einstein's Theory of Relativity helps us do.

### 1. Special Relativity (1905): The Rules for Constant Speed
- The laws of physics are the same for everyone in constant motion.
- The speed of light is constant for every observer.
...
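Printing chunks shows progress, but you often also want the complete reply once streaming ends. A minimal sketch: `stream_and_collect` is a hypothetical helper that accepts any iterable of objects exposing a `.text` attribute, such as the chunks from `model.stream(...)`, echoes each piece, and returns the joined text. The stand-in chunks below only exist so the sketch runs offline.

```python
import sys
from types import SimpleNamespace
from typing import Iterable, TextIO


def stream_and_collect(chunks: Iterable, out: TextIO = sys.stdout) -> str:
    """Echo each chunk's text as it arrives and return the full reply."""
    parts = []
    for chunk in chunks:
        piece = chunk.text or ""  # defensive: treat a missing/None text as empty
        out.write(piece)
        out.flush()
        parts.append(piece)
    out.write("\n")
    return "".join(parts)


# Offline usage with stand-in chunks; with a real model, pass model.stream(query) instead
fake_stream = [SimpleNamespace(text="Time "), SimpleNamespace(text="is "), SimpleNamespace(text="relative.")]
full_text = stream_and_collect(fake_stream)
```

If you also need the metadata of the final message, LangChain message chunks support `+`, so adding the chunks together yields one combined chunk instead of a plain string.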

Multimodal Capabilities

Gemini processes images, PDFs, audio, and video alongside text. Pass a HumanMessage whose content is a list of typed blocks.

Figure: Gemini accepts text, images, and PDFs in one multimodal message

Image from a URL

PYTHON

model = ChatGoogleGenerativeAI(model=gemini3)

human_msg = HumanMessage(
    [
        {'type': 'text', 'text': 'Describe the image provided'},
        {'type': 'image',
         'url': 'https://www.shutterstock.com/image-vector/vector-cute-baby-panda-cartoon-600nw-2427356853.jpg'}
    ]
)

response = model.invoke([system_msg, human_msg])
print(response.text)

OUTPUT

A high-contrast, cartoon-style illustration of a cute giant panda sitting upright.
It is rendered in a clean vector style with bold black outlines, large round white head,
black eye patches, pink-centered ears, surrounded by tall green blades of grass on a
solid white background.

Image from a Local File

Base64-encode a local image (or PDF/audio) before sending it. Read the file as bytes and encode:

PYTHON

import base64

mime_type = "image/png"

# Read the image as raw bytes, then base64-encode them into a UTF-8 string
with open("data/images/panda.png", "rb") as f:
    image_bytes = f.read()
bytes_base64 = base64.b64encode(image_bytes).decode("utf-8")

Send the encoded image with a base64 block and its mime_type:

PYTHON

model = ChatGoogleGenerativeAI(model=gemini2)

human_msg = HumanMessage(
    [
        {'type': 'text', 'text': 'Describe the image provided'},
        {'type': 'image',
         'base64': bytes_base64,
         "mime_type": mime_type}
    ]
)

response = model.invoke([system_msg, human_msg])
response.pretty_print()

OUTPUT

================================== Ai Message ==================================

This image is a vibrant cartoon illustration of a baby panda sitting amongst green
foliage on a white background. It has thick clean outlines, iconic black-and-white fur,
large expressive eyes, a friendly smile, and soft pink paw pads. Overall impression:
cheerful, friendly, and innocent.

Note

Local file paths in this lesson use relative paths like data/images/panda.png. Keep your data files inside your project folder so the same code runs on any machine.

PDF Document Analysis

PDFs use the file block type with mime_type set to application/pdf. For Gemini 3 models, Google recommends medium media resolution for PDFs; the example below uses Gemini 2.5 Flash with default settings.

PYTHON

with open(r'data\apple 10-q q1 2024.pdf', 'rb') as f:
    pdf_bytes = f.read()
pdf_base64 = base64.b64encode(pdf_bytes).decode('utf-8')

mime_type = "application/pdf"

human_msg = HumanMessage(
    [
        {'type': 'text', 'text': 'summarize the key financial highlights from this quarterly report.'},
        {'type': 'file',
         'base64': pdf_base64,
         'mime_type': mime_type}
    ]
)

model = ChatGoogleGenerativeAI(model=gemini2)
response = model.invoke([system_msg, human_msg])
print(response.text)

OUTPUT

Based on Apple's Form 10-Q for the period ended March 30, 2024:

- Total net sales decreased ~4% to $90.753 billion.
- Product net sales fell ~9.5% to $66.886 billion (iPhone -10%, iPad -17%, Mac +4%).
- Services net sales grew ~14% to $23.867 billion.
- Net income was $23.636 billion; basic EPS flat at $1.53.
- Gross margin improved to 46.6% from 44.3%.
- Apple repurchased $44.0B of stock and paid $7.5B dividends over six months.
- The Board authorized an additional $110B for buybacks and raised the dividend to $0.25.

Important

Windows path strings often contain backslashes. Use a raw string (r'data\apple 10-q q1 2024.pdf') so \a and similar sequences are not interpreted as escape characters.

On Linux/macOS, paths use forward slashes, so no raw string is needed and 'data/apple 10-q q1 2024.pdf' works directly. Python on Windows accepts forward slashes too, which makes that form the portable choice.
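The image and PDF snippets repeat the same read-and-encode steps. Here is a minimal sketch of a hypothetical `file_to_block` helper that folds them together. It assumes the block shapes used above (`image` blocks for images, `file` blocks for PDFs) and guesses the MIME type from the file extension with the standard `mimetypes` module. `pathlib.Path` keeps the paths portable across Windows, Linux, and macOS.

```python
import base64
import mimetypes
from pathlib import Path


def file_to_block(path) -> dict:
    """Build a base64 content block for an image or PDF on disk."""
    path = Path(path)
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None:
        raise ValueError(f"Cannot infer a MIME type for {path}")
    # Images go in 'image' blocks; everything else (e.g. PDFs) goes in 'file' blocks
    block_type = "image" if mime_type.startswith("image/") else "file"
    data = base64.b64encode(path.read_bytes()).decode("utf-8")
    return {"type": block_type, "base64": data, "mime_type": mime_type}


# Usage with this lesson's files:
# HumanMessage([
#     {"type": "text", "text": "Summarize the key financial highlights."},
#     file_to_block(Path("data") / "apple 10-q q1 2024.pdf"),
# ])
```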

Tool Calling (Function Calling)

Bind custom tools to the model so it can fetch live data. This series ships two reusable tools, web_search and get_weather, in a shared scripts/base_tools.py module.

PYTHON

import sys
sys.path.append('../')

from scripts import base_tools

Each tool can be invoked directly to confirm it works:

PYTHON

response = base_tools.web_search.invoke({'query': "what is the latest stock news"})
print(response[0].content[:200])

OUTPUT

UAE stocks sell off as markets reopen ... investors track ongoing market events ...

Bind both tools to the model. When we ask a question that needs them, Gemini returns tool_calls instead of a final answer:

PYTHON

model = ChatGoogleGenerativeAI(model=gemini2)
model_with_tools = model.bind_tools([base_tools.web_search, base_tools.get_weather])

response = model_with_tools.invoke(
    "What is the weather in mumbai? and What is the US stock news today?"
)
response.tool_calls

OUTPUT

[{'name': 'get_weather', 'args': {'location': 'Mumbai'}, 'id': 'call_1', 'type': 'tool_call'}, {'name': 'web_search', 'args': {'query': 'US stock news today'}, 'id': 'call_2', 'type': 'tool_call'}]

Here, we can see the model issue two parallel tool calls, one per sub-question. Running those tools and feeding the results back is what an agent does for us automatically, as the next lesson shows.
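To see what running those tools and feeding the results back involves, here is a minimal sketch that dispatches tool calls shaped like the output above. The `get_weather` and `web_search` functions here are hypothetical stand-ins that return canned strings; they are not the series' `base_tools`. Each result dict mirrors the fields of a LangChain `ToolMessage` (`content`, `tool_call_id`). In real code you would construct `ToolMessage(content=..., tool_call_id=...)` objects instead.

```python
from typing import Callable


# Hypothetical stand-ins for the real tools; they return canned strings offline
def get_weather(location: str) -> str:
    return f"(stub) weather for {location}"


def web_search(query: str) -> str:
    return f"(stub) results for {query}"


TOOLS: dict[str, Callable[..., str]] = {"get_weather": get_weather, "web_search": web_search}


def run_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Execute each requested tool and pair its output with the originating call id."""
    results = []
    for call in tool_calls:
        tool = TOOLS.get(call["name"])
        if tool is None:
            content = f"Error: unknown tool {call['name']!r}"
        else:
            content = tool(**call["args"])
        results.append({"type": "tool", "content": content, "tool_call_id": call["id"]})
    return results


calls = [
    {"name": "get_weather", "args": {"location": "Mumbai"}, "id": "call_1", "type": "tool_call"},
    {"name": "web_search", "args": {"query": "US stock news today"}, "id": "call_2", "type": "tool_call"},
]
tool_results = run_tool_calls(calls)
```

Returning an error string for an unknown tool, rather than raising, lets the model see the failure on its next turn. The `tool_call_id` is what lets the model match each result to the call that requested it.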

Thinking Support (Reasoning)

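Gemini can return summaries of its reasoning alongside the answer. Control reasoning depth with thinking_budget (a token count, used by Gemini 2.5 models) or thinking_level (low/high, the recommended control for Gemini 3), and include the summaries in the response with include_thoughts=True.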

PYTHON

model = ChatGoogleGenerativeAI(model=gemini2,
                               thinking_budget=100,
                               include_thoughts=True)

response = model.invoke(query)
response.content_blocks

OUTPUT

[{'type': 'reasoning', 'reasoning': '**Relativity: Breaking It Down for Clarity** ... I need relatable analogies, avoiding jargon ...'}, {'type': 'text', 'text': "Imagine you're trying to understand how time, space, and gravity work ..."}]
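With include_thoughts=True, the reasoning summary and the answer arrive as separate content blocks. Here is a minimal sketch of a hypothetical `split_reasoning` helper that separates them. It assumes the block shape shown in the output above: `reasoning` blocks carry a `reasoning` key and `text` blocks carry a `text` key.

```python
def split_reasoning(blocks: list[dict]) -> tuple[str, str]:
    """Return (reasoning summary, visible answer) from a list of content blocks."""
    reasoning = "\n".join(b.get("reasoning", "") for b in blocks if b.get("type") == "reasoning")
    answer = "".join(b.get("text", "") for b in blocks if b.get("type") == "text")
    return reasoning, answer


# Usage: thoughts, answer = split_reasoning(response.content_blocks)
example = [
    {"type": "reasoning", "reasoning": "**Relativity: Breaking It Down for Clarity**"},
    {"type": "text", "text": "Imagine you're trying to understand time and space."},
]
thoughts, answer = split_reasoning(example)
```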

On Gemini 2.5 Flash, setting thinking_budget=0 disables reasoning entirely, which lowers output tokens and latency:

PYTHON

model = ChatGoogleGenerativeAI(model=gemini2,
                               thinking_budget=0,
                               include_thoughts=True)

response = model.invoke(query)
response.usage_metadata

OUTPUT

{'input_tokens': 11, 'output_tokens': 594, 'total_tokens': 605, 'input_token_details': {'cache_read': 0}}

Tip

Full reasoning controls are documented at: https://ai.google.dev/gemini-api/docs/thinking

Built-in Tools

Gemini also provides built-in tools that run on Google's side, so there is nothing to implement locally. This example binds two of them, Google Search and Code Execution, by name:

PYTHON

model = ChatGoogleGenerativeAI(model=gemini2)
model_with_tools = model.bind_tools([{'google_search': {}}, {'code_execution': {}}])

query = "When is the next total solar eclipse in the US and what is 3 + 2?"
response = model_with_tools.invoke(query)
print(response.text)

OUTPUT

The answer to 3 + 2 is 5. The next total solar eclipse in the US will be on March 30,
2033 (Alaska). The next visible in the contiguous US is August 23, 2044 (Montana and
the Dakotas), followed by a widespread eclipse on August 12, 2045.

Here, we can see the model use code execution to compute 3 + 2 and Google Search to look up the eclipse dates, both in a single call. Inspect response.content_blocks to see the server_tool_call and server_tool_result entries.
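For that inspection, here is a minimal sketch of a hypothetical `server_tool_names` helper that lists which built-in tools ran. It assumes `server_tool_call` blocks carry a `name` key, as in LangChain's standard content blocks. The sample blocks are illustrative, not captured output, and the exact names Gemini reports may differ.

```python
def server_tool_names(blocks: list[dict]) -> list[str]:
    """List the names of server-side (built-in) tools the model invoked, in order."""
    return [b.get("name", "<unnamed>") for b in blocks if b.get("type") == "server_tool_call"]


# Illustrative blocks only; inspect response.content_blocks for the real shape
sample = [
    {"type": "server_tool_call", "name": "code_execution", "args": {}},
    {"type": "server_tool_result", "output": "5"},
    {"type": "server_tool_call", "name": "google_search", "args": {}},
    {"type": "text", "text": "The answer to 3 + 2 is 5 ..."},
]
print(server_tool_names(sample))
```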

Context Caching

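Caching large documents avoids re-sending and re-processing the same tokens on every query, and cached tokens are billed at a reduced rate. A cache must contain at least a model-specific minimum number of tokens; check the caching documentation for the current limit.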

PYTHON

import time
from google import genai
from google.genai.types import CreateCachedContentConfig, Content, Part

client = genai.Client()

Upload the documents and wait until each finishes processing:

PYTHON

file_paths = [
    "data/apple 10-q q1 2024.pdf",
    "data/apple 10-q q2 2024.pdf"
]

uploaded_files = []
for path in file_paths:
    file = client.files.upload(file=path)
    while file.state.name == "PROCESSING":
        time.sleep(2)
        file = client.files.get(name=file.name)
    uploaded_files.append(file)

Wrap the uploaded files as content parts and create a cache with system instructions and a 30-minute TTL:

PYTHON

parts = [Part.from_uri(file_uri=f.uri, mime_type=f.mime_type) for f in uploaded_files]
contents = [Content(role='user', parts=parts)]

cache = client.caches.create(
    model=gemini2,
    config=CreateCachedContentConfig(
        display_name='Apple Q1 Q2 2024 reports',
        system_instruction="You are a financial analyst. Use these Apple quarterly reports to answer questions.",
        contents=contents,
        ttl='1800s'
    )
)

Point a model at the cache by passing cached_content. The first query reads the cached tokens; notice cache_read in the usage metadata:

PYTHON

model = ChatGoogleGenerativeAI(
    model=gemini2,
    cached_content=cache.name
)

query = "Compare the revenue growth between Q1 and Q2 2024."
response = model.invoke(query)
response.usage_metadata

OUTPUT

{'input_tokens': 14482, 'output_tokens': 1822, 'total_tokens': 16304, 'input_token_details': {'cache_read': 14465}, 'output_token_details': {'reasoning': 1504}}

Every subsequent query reuses the same cache, so the 14,465 document tokens are read from cache rather than re-processed, which cuts cost and latency.
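The usage metadata makes this effect easy to quantify. Here is a minimal sketch of a hypothetical `summarize_usage` helper. It assumes the dictionary layout shown above, where `cache_read` is part of `input_tokens` and `reasoning` is part of `output_tokens`. The totals above bear this out: 14,482 + 1,822 = 16,304, with no separate term for reasoning.

```python
def summarize_usage(usage: dict) -> dict:
    """Break a LangChain usage_metadata dict into cached, reasoning, and visible token counts."""
    inp = usage.get("input_tokens", 0)
    out = usage.get("output_tokens", 0)
    cached = usage.get("input_token_details", {}).get("cache_read", 0)
    reasoning = usage.get("output_token_details", {}).get("reasoning", 0)
    return {
        "cache_hit_ratio": cached / inp if inp else 0.0,
        "uncached_input_tokens": inp - cached,
        "reasoning_tokens": reasoning,
        "visible_output_tokens": out - reasoning,
    }


usage = {'input_tokens': 14482, 'output_tokens': 1822, 'total_tokens': 16304,
         'input_token_details': {'cache_read': 14465},
         'output_token_details': {'reasoning': 1504}}
summary = summarize_usage(usage)
```

For the run above, only 17 input tokens were processed fresh. Of the 1,822 output tokens, 318 are the visible answer and the rest are reasoning.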

Note

Caching details and pricing are documented at: https://ai.google.dev/gemini-api/docs/caching
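A cache keeps accruing storage charges until its TTL expires, so it can be worth deleting it as soon as you are done. Here is a minimal sketch using contextlib. It assumes the google-genai calls used above plus `client.caches.delete(name=...)`. `temporary_cache` is a hypothetical helper, and the fake client at the bottom only exists so the sketch runs offline.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_cache(client, model: str, config):
    """Create a context cache, yield it, and always delete it on exit."""
    cache = client.caches.create(model=model, config=config)
    try:
        yield cache
    finally:
        client.caches.delete(name=cache.name)


# Real usage (sketch):
# with temporary_cache(client, gemini2, CreateCachedContentConfig(...)) as cache:
#     model = ChatGoogleGenerativeAI(model=gemini2, cached_content=cache.name)
#     print(model.invoke("Compare the revenue growth between Q1 and Q2 2024.").text)


class _FakeCaches:
    """Offline stand-in for client.caches that records calls instead of hitting the API."""

    def __init__(self):
        self.deleted = []

    def create(self, model, config):
        return SimpleNamespace(name=f"cachedContents/{model}-demo")

    def delete(self, name):
        self.deleted.append(name)


fake_client = SimpleNamespace(caches=_FakeCaches())
with temporary_cache(fake_client, "gemini-2.5-flash", config=None) as demo_cache:
    used_name = demo_cache.name
```

The `finally` clause deletes the cache even if a query inside the `with` block raises.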

Image Generation

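Generate images at up to 4K resolution with gemini-3-pro-image-preview. Supported aspect ratios are 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, and 21:9; resolutions are 1K, 2K, and 4K. The prompt below reuses response.text from the context-caching query, so the infographic is built from the Q1 vs. Q2 revenue comparison.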

PYTHON

from langchain_google_genai import Modality
from IPython.display import Image, display

image_model = ChatGoogleGenerativeAI(model="gemini-3-pro-image-preview")

image_content = f"Create a professional infographic with this data:\n\n{response.text}"

image_response = image_model.invoke(
    image_content, response_modalities=[Modality.TEXT, Modality.IMAGE]
)

Decode the returned base64 image to display or save it:

PYTHON

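# Select the image block; with TEXT + IMAGE modalities the reply can also include text blocks
image_block = next(b for b in image_response.content_blocks if b['type'] == 'image')
generated_image = base64.b64decode(image_block['base64'])

display(Image(generated_image))

with open("data/images/apple_info.png", 'wb') as f:
    f.write(generated_image)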

How an Agent Moves Through States

Figure: An agent moves from query to a tool call to a grounded answer

The features above are the building blocks. An agent chains them together and moves through a sequence of message states instead of answering in one step. Take the query "tell me about the apple news":

Initial input: a HumanMessage carries the user query, with no tools used yet.
Tool decision: the model returns an AIMessage with a tool_call such as web_search({"query": "Apple Inc. news"}).
Tool execution: a ToolMessage returns raw articles and metadata from the web.
Processing: the model reads the tool output and pulls out the key points.
Final answer: a final AIMessage turns the raw data into a readable summary.

The full flow is:

TEXT

HumanMessage
   ↓
AIMessage (tool call)
   ↓
ToolMessage (results)
   ↓
AIMessage (final answer)
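The loop can be sketched in plain Python before LangChain takes it over. This is a minimal sketch under stated assumptions. `fake_model` is a hypothetical stand-in that requests `web_search` once and then answers. Messages are plain dicts tagged with the same roles as the flow above, and `fake_web_search` returns canned text. It is not how create_agent is implemented, only the shape of the loop it runs.

```python
def fake_web_search(query: str) -> str:
    # Hypothetical tool: canned text instead of live results
    return f"(stub) articles about {query}"


def fake_model(messages: list[dict]) -> dict:
    """Hypothetical model: request a search first, then answer from the tool output."""
    tool_outputs = [m["content"] for m in messages if m["role"] == "tool"]
    if not tool_outputs:
        return {"role": "ai", "content": "", "tool_calls": [
            {"name": "web_search", "args": {"query": "Apple Inc. news"}, "id": "call_1"}]}
    return {"role": "ai", "content": f"Summary based on: {tool_outputs[-1]}", "tool_calls": []}


AGENT_TOOLS = {"web_search": fake_web_search}


def run_agent(user_query: str, max_steps: int = 5) -> list[dict]:
    """Decide, act, observe, respond: loop until the model stops requesting tools."""
    messages = [{"role": "human", "content": user_query}]
    for _ in range(max_steps):
        reply = fake_model(messages)          # decide
        messages.append(reply)
        if not reply["tool_calls"]:           # respond: no more tools needed
            return messages
        for call in reply["tool_calls"]:      # act, then observe the result
            output = AGENT_TOOLS[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": output, "tool_call_id": call["id"]})
    raise RuntimeError("Agent did not finish within max_steps")


history = run_agent("tell me about the apple news")
print(" -> ".join(m["role"] for m in history))
```

The `max_steps` guard stops a model that keeps requesting tools from looping forever.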

This loop of decide, act, observe, and respond runs under every agent in this series. Next, LangChain Agent Fundamentals turns these states into a real agent with create_agent, memory, streaming, and guardrails.
