Getting Started with Gemini 3 & LangChain

Set up Gemini 3, LangChain, and LangSmith, then explore streaming, multimodal input, tool calling, reasoning, and context caching for AI agents.

Jun 19, 2026 · 28 min read

Topics You Will Master

Generating Gemini and LangSmith API keys and tracing every call
Calling Gemini 3 and 2.5 with messages, streaming, and multimodal input
Binding custom tools and using built-in Google Search and code execution
Controlling reasoning depth, caching documents, and generating images

Gemini 3 is Google's most capable model family, and LangChain is the framework that turns it into a programmable agent. This first lesson sets up everything you need — API keys, tracing, and the full feature surface of Gemini — so the projects that follow have a solid foundation.

We cover API key setup, LangSmith tracing, and then every core Gemini capability through LangChain: basic messaging, streaming, multimodal input (images, PDFs), tool calling, reasoning control, built-in tools, context caching, and image generation. We close with the message-state lifecycle that every agent follows.

Note

This is the foundation post for the AI Agent Projects series. Later lessons such as LangChain Agent Fundamentals assume your keys and environment are working as shown here.

Creating a Gemini API Key

Gemini API access is free to start through Google AI Studio.

  1. Go to Google AI Studio and sign in with your Google account.
  2. Click Get API Key in the left sidebar, then Create API Key.
  3. Select an existing Google Cloud project or create a new one, then click Create API key.
  4. Copy the key immediately and store it securely.

Add the key to a .env file in your project root:

BASH
GOOGLE_API_KEY=your_gemini_api_key_here

Tip

Gemini API pricing and free-tier limits are listed at the official pricing page: https://ai.google.dev/gemini-api/docs/pricing

Creating LangSmith API Keys

LangSmith records every model call, tool invocation, and token count so you can debug agents visually.

  1. Go to LangSmith and sign up with GitHub, Google, or email.
  2. Open your profile menu → Settings → API Keys.
  3. Click Create API Key, give it a descriptive name, and copy it immediately — it is shown only once.
  4. Note your project name under Projects (the default is usually default).

Add the LangSmith configuration to the same .env file:

BASH
LANGSMITH_API_KEY="your_api_key_here"
LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_PROJECT="multi-agent-deep-rag"

Verifying the Setup

Load the environment variables and confirm both keys are visible to Python:

PYTHON
import os
from dotenv import load_dotenv

load_dotenv()

google_api_key = os.getenv("GOOGLE_API_KEY")
langchain_api_key = os.getenv("LANGSMITH_API_KEY")

if google_api_key:
    print("Gemini API Key loaded successfully")
else:
    print("Gemini API Key not found")

if langchain_api_key:
    print("LangSmith API Key loaded successfully")
else:
    print("LangSmith API Key not found")
OUTPUT
Gemini API Key loaded successfully
LangSmith API Key loaded successfully
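
The same check can be wrapped in a helper that reports exactly which variables are missing. This is a minimal sketch using only the standard library. The name missing_env_vars is hypothetical, and it assumes load_dotenv() has already populated the environment.

```python
import os

REQUIRED_VARS = ("GOOGLE_API_KEY", "LANGSMITH_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Example with an explicit mapping instead of the real environment
print(missing_env_vars(env={"GOOGLE_API_KEY": "abc"}))  # ['LANGSMITH_API_KEY']
```

Calling missing_env_vars() with no arguments checks the real environment, so a notebook can stop early if the returned list is not empty.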

Test the raw Gemini client with a quick generation call:

PYTHON
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Explain how AI works in a few words",
)

print(response.text)
OUTPUT
It processes massive amounts of data to recognize patterns and make predictions.

In short: Data + Math = Predictions.

Confirm that LangSmith can connect and see your project:

PYTHON
from langsmith import Client

client = Client()

try:
    projects = list(client.list_projects(limit=1))
    print("LangSmith connection successful!")
    print(f"Connected to project: {os.getenv('LANGCHAIN_PROJECT')}")
except Exception as e:
    print(f"LangSmith connection failed: {e}")
OUTPUT
LangSmith connection successful!
Connected to project: multi-agent-deep-rag

Gemini 3 Model Family Overview

Gemini 3 Pro is built on state-of-the-art reasoning, with a dynamic thinking process and very large context windows.

Key features:

  • Advanced reasoning with configurable thinking levels
  • 1M token context — up to 1 million input tokens, 64k output tokens
  • Multimodal — images, PDFs, audio, and video with granular resolution control
  • Knowledge cutoff of January 2025
  • Image generation at up to 4K resolution

Model                        Context (In/Out)   Best For
gemini-3-pro-preview         1M / 64k           Complex reasoning, coding, analysis
gemini-3-pro-image-preview   65k / 32k          Image generation & editing
gemini-2.5-flash             1M / 64k           Fast, cost-effective tasks

New controls introduced in Gemini 3:

  1. Thinking level — control reasoning depth (low or high).
  2. Media resolution — granular control per media type (low, medium, high, ultra_high).
  3. Temperature — keep at the default 1.0; changing it can degrade performance.
  4. Thought signatures — reasoning context is preserved automatically.

Basic Usage

Import the chat model and message types. We define two model identifiers so we can compare a Gemini 3 model against a Gemini 2.5 model:

PYTHON
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.messages import HumanMessage, SystemMessage, AIMessage, ToolMessage

gemini3 = 'gemini-3-flash-preview'  # or gemini-3-pro-preview
gemini2 = 'gemini-2.5-flash'

system_msg = SystemMessage("You are a helpful AI Assistant")
query = HumanMessage("Explain the theory of relativity in simple terms")

messages = [system_msg, query]

Invoke the Gemini 3 model with the message list:

PYTHON
model = ChatGoogleGenerativeAI(model=gemini3)
response = model.invoke(messages)

Gemini returns structured content blocks. Inspect them with response.content or response.content_blocks:

PYTHON
response.content
OUTPUT
[{'type': 'text', 'text': 'To understand the Theory of Relativity, it helps to break it down into two parts: Special Relativity (speed and time) and General Relativity (gravity)...'}]

Switch to the Gemini 2.5 model with the same messages, then inspect token usage and response metadata:

PYTHON
model = ChatGoogleGenerativeAI(model=gemini2)
response = model.invoke(messages)

response.usage_metadata
OUTPUT
{'input_tokens': 18, 'output_tokens': 2304, 'total_tokens': 2322, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 1114}}
PYTHON
response.response_metadata
OUTPUT
{'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'model_provider': 'google_genai'}
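
The usage figures are easier to read once reasoning tokens are separated from the visible answer. The sketch below works on the plain dict printed above. The helper name summarize_usage is hypothetical, and it assumes output_tokens already includes the reasoning count, which is how the numbers in this lesson line up.

```python
def summarize_usage(usage):
    """Split output tokens into reasoning and visible-answer tokens."""
    reasoning = usage.get("output_token_details", {}).get("reasoning", 0)
    return {
        "input": usage["input_tokens"],
        "reasoning": reasoning,
        "answer": usage["output_tokens"] - reasoning,
        "total": usage["total_tokens"],
    }


# The usage_metadata dict printed above
usage = {
    "input_tokens": 18,
    "output_tokens": 2304,
    "total_tokens": 2322,
    "input_token_details": {"cache_read": 0},
    "output_token_details": {"reasoning": 1114},
}
print(summarize_usage(usage))
# {'input': 18, 'reasoning': 1114, 'answer': 1190, 'total': 2322}
```

Under that assumption, almost half of the 2,304 output tokens in this call went to reasoning rather than to the reply itself.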

Streaming

For long responses, stream tokens as they are generated instead of waiting for the full reply:

PYTHON
model = ChatGoogleGenerativeAI(model=gemini2)

query = "Explain the theory of relativity in simple terms."

for chunk in model.stream(query):
    print(chunk.text, end="", flush=True)
OUTPUT
Imagine you're trying to understand how the universe works at really high speeds or near
really massive objects. That's what Einstein's Theory of Relativity helps us do.

### 1. Special Relativity (1905): The Rules for Constant Speed
- The laws of physics are the same for everyone in constant motion.
- The speed of light is constant for every observer.
...
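
Streaming prints text as it arrives, but you often want the complete reply afterwards too. The sketch below shows one way to keep both. Here fake_stream is a stand-in for model.stream(query) so the example runs without an API key, and stream_and_collect is a hypothetical helper name.

```python
from types import SimpleNamespace


def fake_stream():
    """Stand-in for model.stream(query): yields objects with a .text attribute."""
    for piece in ["Space and time ", "are linked, ", "and gravity bends both."]:
        yield SimpleNamespace(text=piece)


def stream_and_collect(chunks):
    """Print each chunk as it arrives and return the full reply as one string."""
    parts = []
    for chunk in chunks:
        print(chunk.text, end="", flush=True)
        parts.append(chunk.text)
    print()
    return "".join(parts)


full_text = stream_and_collect(fake_stream())
```

With a real model, passing model.stream(query) in place of fake_stream() should work the same way, since the loop only relies on the .text attribute used in the example above.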

Multimodal Capabilities

Gemini processes images, PDFs, audio, and video alongside text. Pass a HumanMessage whose content is a list of typed blocks.

Image from a URL

PYTHON
model = ChatGoogleGenerativeAI(model=gemini3)

human_msg = HumanMessage(
    [
        {'type': 'text', 'text': 'Describe the image provided'},
        {'type': 'image',
         'url': 'https://www.shutterstock.com/image-vector/vector-cute-baby-panda-cartoon-600nw-2427356853.jpg'}
    ]
)

response = model.invoke([system_msg, human_msg])
print(response.text)
OUTPUT
A high-contrast, cartoon-style illustration of a cute giant panda sitting upright.
It is rendered in a clean vector style with bold black outlines, large round white head,
black eye patches, pink-centered ears, surrounded by tall green blades of grass on a
solid white background.

Image from a Local File

Base64-encode a local image (or PDF/audio) before sending it. Read the file as bytes and encode:

PYTHON
import base64

mime_type = "image/png"

with open("data/images/panda.png", 'rb') as f:
    image_bytes = f.read()
bytes_base64 = base64.b64encode(image_bytes).decode('utf-8')

Send the encoded image with a base64 block and its mime_type:

PYTHON
model = ChatGoogleGenerativeAI(model=gemini2)

human_msg = HumanMessage(
    [
        {'type': 'text', 'text': 'Describe the image provided'},
        {'type': 'image',
         'base64': bytes_base64,
         "mime_type": mime_type}
    ]
)

response = model.invoke([system_msg, human_msg])
response.pretty_print()
OUTPUT
================================== Ai Message ==================================

This image is a vibrant cartoon illustration of a baby panda sitting amongst green
foliage on a white background — thick clean outlines, iconic black-and-white fur,
large expressive eyes, a friendly smile, and soft pink paw pads. Overall impression:
cheerful, friendly, and innocent.

Note

Local file paths in this lesson use relative paths like data/images/panda.png. Keep your data files inside your project folder so the same code runs on any machine.
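
Because the read, encode, and wrap steps are the same for every local file, they can be folded into one helper. This is a sketch, not part of LangChain: media_block is a hypothetical name, and it guesses the MIME type from the file extension with the standard mimetypes module instead of hard-coding it.

```python
import base64
import mimetypes
from pathlib import Path


def media_block(path, block_type="image"):
    """Build a base64 content block for a local file, guessing its MIME type."""
    path = Path(path)
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None:
        raise ValueError(f"Cannot guess MIME type for {path.name}")
    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    return {"type": block_type, "base64": encoded, "mime_type": mime_type}
```

The returned dict has the same shape as the image block built by hand above, so media_block("data/images/panda.png") could be placed directly in the HumanMessage content list.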

PDF Document Analysis

PDFs use the file block type with mime_type set to application/pdf. For PDFs, a medium media resolution is recommended.

PYTHON
with open(r'data\apple 10-q q1 2024.pdf', 'rb') as f:
    pdf_bytes = f.read()
pdf_base64 = base64.b64encode(pdf_bytes).decode('utf-8')

mime_type = "application/pdf"

human_msg = HumanMessage(
    [
        {'type': 'text', 'text': 'summarize the key financial highlights from this quarterly report.'},
        {'type': 'file',
         'base64': pdf_base64,
         'mime_type': mime_type}
    ]
)

model = ChatGoogleGenerativeAI(model=gemini2)
response = model.invoke([system_msg, human_msg])
print(response.text)
OUTPUT
Based on Apple's Form 10-Q for the period ended March 30, 2024:

- Total net sales decreased ~4% to $90.753 billion.
- Product net sales fell ~9.5% to $66.886 billion (iPhone -10%, iPad -17%, Mac +4%).
- Services net sales grew ~14% to $23.867 billion.
- Net income was $23.636 billion; basic EPS flat at $1.53.
- Gross margin improved to 46.6% from 44.3%.
- Apple repurchased $44.0B of stock and paid $7.5B dividends over six months.
- The Board authorized an additional $110B for buybacks and raised the dividend to $0.25.

Important

Windows path strings often contain backslashes. Use a raw string (r'data\apple 10-q q1 2024.pdf') so \a and similar sequences are not interpreted as escape characters.

On Linux/macOS: paths use forward slashes, so a raw string is unnecessary — 'data/apple 10-q q1 2024.pdf' works directly.
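
The escape problem and a portable alternative can both be seen with the standard pathlib module. This is a small illustration: the Pure path classes are used only so the output is identical on every operating system.

```python
from pathlib import PurePosixPath, PureWindowsPath

# In a normal string literal, "\a" is a single bell character, not a backslash plus "a"
print(len("\a"), len(r"\a"))  # 1 2

# Building the path from parts avoids backslashes in the source code entirely
parts = ("data", "apple 10-q q1 2024.pdf")
print(PureWindowsPath(*parts))  # data\apple 10-q q1 2024.pdf
print(PurePosixPath(*parts))    # data/apple 10-q q1 2024.pdf
```

In real code, pathlib.Path("data") / "apple 10-q q1 2024.pdf" picks the right separator for the machine it runs on, so the same line works on Windows, Linux, and macOS.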

Tool Calling (Function Calling)

Bind custom tools to the model so it can fetch live data. This series ships two reusable tools — web_search and get_weather — in a shared scripts/base_tools.py module.

PYTHON
import sys
sys.path.append('../')

from scripts import base_tools

Each tool can be invoked directly to confirm it works:

PYTHON
response = base_tools.web_search.invoke({'query': "what is the latest stock news"})
print(response[0].content[:200])
OUTPUT
UAE stocks sell off as markets reopen ... investors track ongoing market events ...

Bind both tools to the model. When you ask a question that needs them, Gemini returns tool_calls instead of a final answer:

PYTHON
model = ChatGoogleGenerativeAI(model=gemini2)
model_with_tools = model.bind_tools([base_tools.web_search, base_tools.get_weather])

response = model_with_tools.invoke(
    "What is the weather in mumbai? and What is the US stock news today?"
)
response.tool_calls
OUTPUT
[{'name': 'get_weather', 'args': {'location': 'Mumbai'}, 'id': 'call_1', 'type': 'tool_call'}, {'name': 'web_search', 'args': {'query': 'US stock news today'}, 'id': 'call_2', 'type': 'tool_call'}]

The model correctly issues two parallel tool calls — one per sub-question. Executing those tools and feeding the results back is exactly what an agent does for you automatically, as you will see in the next lesson.
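
To make the "execute and feed back" step concrete, here is a minimal sketch of a dispatcher that runs each requested tool and pairs the result with the call id. The two functions are hypothetical stand-ins for base_tools.get_weather and base_tools.web_search, and results are plain dicts rather than LangChain message objects.

```python
def get_weather(location):
    """Hypothetical stand-in for base_tools.get_weather."""
    return f"(weather report for {location})"


def web_search(query):
    """Hypothetical stand-in for base_tools.web_search."""
    return f"(search results for {query!r})"


TOOLS = {"get_weather": get_weather, "web_search": web_search}


def run_tool_calls(tool_calls, tools=TOOLS):
    """Execute each requested tool and pair its result with the originating call id."""
    results = []
    for call in tool_calls:
        output = tools[call["name"]](**call["args"])
        results.append({"tool_call_id": call["id"], "name": call["name"], "content": output})
    return results


# The tool_calls list printed above
tool_calls = [
    {"name": "get_weather", "args": {"location": "Mumbai"}, "id": "call_1", "type": "tool_call"},
    {"name": "web_search", "args": {"query": "US stock news today"}, "id": "call_2", "type": "tool_call"},
]
for result in run_tool_calls(tool_calls):
    print(result)
```

In LangChain, each result would travel back to the model as a ToolMessage carrying the same tool_call_id, which is how the model matches an answer to the request it made.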

Thinking Support (Reasoning)

Gemini exposes its reasoning process. Control its depth with thinking_budget (a token count, used here with the Gemini 2.5 model) or thinking_level (low/high, the control introduced with Gemini 3), and reveal the reasoning with include_thoughts=True.

PYTHON
model = ChatGoogleGenerativeAI(model=gemini2,
                               thinking_budget=100,
                               include_thoughts=True)

response = model.invoke(query)
response.content_blocks
OUTPUT
[{'type': 'reasoning', 'reasoning': '**Relativity: Breaking It Down for Clarity** ... I need relatable analogies, avoiding jargon ...'}, {'type': 'text', 'text': "Imagine you're trying to understand how time, space, and gravity work ..."}]
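
Since reasoning and answer arrive as separate typed blocks, they can be pulled apart with a simple filter. The sketch below uses a shortened stand-in for the blocks shown above. split_blocks is a hypothetical helper, and it relies only on the type, reasoning, and text keys visible in that output.

```python
def split_blocks(content_blocks):
    """Separate reasoning text from answer text in a list of content blocks."""
    reasoning = [b["reasoning"] for b in content_blocks if b.get("type") == "reasoning"]
    answer = [b["text"] for b in content_blocks if b.get("type") == "text"]
    return "\n".join(reasoning), "".join(answer)


# Shortened stand-in for the content_blocks shown above
blocks = [
    {"type": "reasoning", "reasoning": "I need relatable analogies, avoiding jargon."},
    {"type": "text", "text": "Imagine you're trying to understand how time, space, and gravity work."},
]
thoughts, answer = split_blocks(blocks)
print("THOUGHTS:", thoughts)
print("ANSWER:", answer)
```

This keeps the reasoning available for logging or debugging while only the answer text is shown to the user.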

Setting thinking_budget=0 disables reasoning entirely, which lowers output tokens and latency:

PYTHON
model = ChatGoogleGenerativeAI(model=gemini2,
                               thinking_budget=0,
                               include_thoughts=True)

response = model.invoke(query)
response.usage_metadata
OUTPUT
{'input_tokens': 11, 'output_tokens': 594, 'total_tokens': 605, 'input_token_details': {'cache_read': 0}}

Tip

Full reasoning controls are documented at: https://ai.google.dev/gemini-api/docs/thinking

Built-in Tools

Gemini ships native tools — Google Search and Code Execution — that require no setup. Bind them by name:

PYTHON
model = ChatGoogleGenerativeAI(model=gemini2)
model_with_tools = model.bind_tools([{'google_search': {}}, {'code_execution': {}}])

query = "When is the next total solar eclipse in the US and what is 3 + 2?"
response = model_with_tools.invoke(query)
print(response.text)
OUTPUT
The answer to 3 + 2 is 5. The next total solar eclipse in the US will be on March 30,
2033 (Alaska). The next visible in the contiguous US is August 23, 2044 (Montana and
the Dakotas), followed by a widespread eclipse on August 12, 2045.

The model used code execution to compute 3 + 2 and Google Search to look up eclipse dates — both in a single call. Inspect response.content_blocks to see the server_tool_call and server_tool_result entries.

Context Caching

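Caching large documents avoids re-sending the same tokens on every query, and cached reads are billed at a reduced rate. A cache needs a minimum amount of content, stated here as 2,048 tokens; the exact threshold depends on the model.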

PYTHON
import time
from google import genai
from google.genai.types import CreateCachedContentConfig, Content, Part

client = genai.Client()

Upload the documents and wait until each finishes processing:

PYTHON
file_paths = [
    "data/apple 10-q q1 2024.pdf",
    "data/apple 10-q q2 2024.pdf"
]

uploaded_files = []
for path in file_paths:
    file = client.files.upload(file=path)
    while file.state.name == "PROCESSING":
        time.sleep(2)
        file = client.files.get(name=file.name)
    uploaded_files.append(file)
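
The polling loop above waits forever if a file never leaves the PROCESSING state. A bounded version might look like the sketch below. It is written against a generic get_state callable rather than the real client so it runs offline. The name wait_until_processed is hypothetical, and the FAILED state name is an assumption about what the Files API reports on error.

```python
import time


def wait_until_processed(get_state, timeout=120.0, interval=2.0, sleep=time.sleep):
    """Poll get_state() until it stops returning "PROCESSING" or the timeout expires."""
    deadline = time.monotonic() + timeout
    state = get_state()
    while state == "PROCESSING":
        if time.monotonic() >= deadline:
            raise TimeoutError("file is still processing")
        sleep(interval)
        state = get_state()
    if state == "FAILED":  # assumed failure state name
        raise RuntimeError("file processing failed")
    return state


# Simulated states standing in for repeated client.files.get(...) calls
states = iter(["PROCESSING", "PROCESSING", "ACTIVE"])
print(wait_until_processed(lambda: next(states), sleep=lambda seconds: None))  # ACTIVE
```

With the real client, get_state could be a lambda that calls client.files.get(name=file.name).state.name, mirroring the loop above.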

Wrap the uploaded files as content parts and create a cache with system instructions and a 30-minute TTL:

PYTHON
parts = [Part.from_uri(file_uri=f.uri, mime_type=f.mime_type) for f in uploaded_files]
contents = [Content(role='user', parts=parts)]

cache = client.caches.create(
    model=gemini2,
    config=CreateCachedContentConfig(
        display_name='Apple Q1 Q2 2024 reports',
        system_instruction="You are a financial analyst. Use these Apple quarterly reports to answer questions.",
        contents=contents,
        ttl='1800s'
    )
)
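
The ttl value is a number of seconds followed by "s", so '1800s' is 30 minutes. If you prefer to think in minutes or hours, a tiny helper can build the string. This is a sketch, and ttl_string is a hypothetical name.

```python
from datetime import timedelta


def ttl_string(duration):
    """Format a timedelta as a whole-seconds string such as '1800s'."""
    seconds = int(duration.total_seconds())
    if seconds <= 0:
        raise ValueError("TTL must be positive")
    return f"{seconds}s"


print(ttl_string(timedelta(minutes=30)))  # 1800s
```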

Point a model at the cache by passing cached_content. The first query reads the cached tokens — notice cache_read in the usage metadata:

PYTHON
model = ChatGoogleGenerativeAI(
    model=gemini2,
    cached_content=cache.name
)

query = "Compare the revenue growth between Q1 and Q2 2024."
response = model.invoke(query)
response.usage_metadata
OUTPUT
{'input_tokens': 14482, 'output_tokens': 1822, 'total_tokens': 16304, 'input_token_details': {'cache_read': 14465}, 'output_token_details': {'reasoning': 1504}}

Every subsequent query reuses the same cache, so the 14,465 document tokens are read from cache rather than re-processed — cutting cost and latency.
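
The saving is easy to quantify from the usage metadata above. The arithmetic below assumes input_tokens includes the cached tokens, which matches the figures shown (14,482 total input, 14,465 of them read from cache).

```python
usage = {
    "input_tokens": 14482,
    "input_token_details": {"cache_read": 14465},
}

cached = usage["input_token_details"]["cache_read"]
fresh = usage["input_tokens"] - cached
cached_share = cached / usage["input_tokens"]

print(f"cached: {cached}, fresh: {fresh}, cached share: {cached_share:.1%}")
# cached: 14465, fresh: 17, cached share: 99.9%
```

Only 17 input tokens, roughly the question itself, were processed fresh for this query.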

Note

Caching details and pricing are documented at: https://ai.google.dev/gemini-api/docs/caching

Image Generation

Generate images up to 4K with gemini-3-pro-image-preview. Supported aspect ratios are 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, and 21:9; resolutions are 1K, 2K, and 4K.

PYTHON
from langchain_google_genai import Modality
from IPython.display import Image, display

image_model = ChatGoogleGenerativeAI(model="gemini-3-pro-image-preview")

image_content = f"Create a professional infographic with this data:\n\n{response.text}"

image_response = image_model.invoke(
    image_content, response_modalities=[Modality.TEXT, Modality.IMAGE]
)

Decode the returned base64 image to display or save it:

PYTHON
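# Pick the first image block; a text block may precede it in the response
image_block = next(b for b in image_response.content_blocks if b['type'] == 'image')
generated_image = base64.b64decode(image_block['base64'])

display(Image(generated_image))

with open("data/images/apple_info.png", 'wb') as f:
    f.write(generated_image)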
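
The supported aspect ratios and resolutions listed at the start of this section can be checked before a request is sent. The sketch below only validates the values. check_image_options is a hypothetical helper, and how these options are passed to the model is not covered here.

```python
ASPECT_RATIOS = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
RESOLUTIONS = {"1K", "2K", "4K"}


def check_image_options(aspect_ratio, resolution):
    """Raise ValueError if either option is outside the supported values listed above."""
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"Unsupported aspect ratio: {aspect_ratio}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"Unsupported resolution: {resolution}")
    return aspect_ratio, resolution


print(check_image_options("16:9", "4K"))  # ('16:9', '4K')
```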

How an Agent Moves Through States

Everything above is a building block. An agent chains these blocks together, moving through a sequence of message states instead of answering in one step.

Consider the query "tell me about the apple news":

  1. Initial input — a HumanMessage carries the user query; no tools used yet.
  2. Tool decision — the model returns an AIMessage with a tool_call such as web_search({"query": "Apple Inc. news"}).
  3. Tool execution — a ToolMessage returns raw articles and metadata from the web.
  4. Processing — the model reads the tool output and extracts the key points.
  5. Final answer — a final AIMessage converts the raw data into a clean, human-readable summary.

The full flow is:

TEXT
HumanMessage
   ↓
AIMessage (tool call)
   ↓
ToolMessage (results)
   ↓
AIMessage (final answer)
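
The same flow can be traced in a few lines of plain Python. This is a toy sketch, not how create_agent is implemented: scripted_model stands in for a tool-bound chat model, web_search returns canned text, and messages are plain dicts instead of LangChain message objects.

```python
def scripted_model(messages):
    """Stand-in for a tool-bound chat model: ask for a search first, then answer."""
    last = messages[-1]
    if last["role"] == "human":
        call = {"name": "web_search", "args": {"query": "Apple Inc. news"}, "id": "call_1"}
        return {"role": "ai", "content": "", "tool_calls": [call]}
    return {"role": "ai", "content": f"Summary of: {last['content']}", "tool_calls": []}


def web_search(query):
    """Hypothetical tool returning canned text instead of real articles."""
    return f"raw articles about {query}"


def run_agent(user_query, model=scripted_model, tools=None, max_steps=5):
    """Loop decide -> act -> observe until the model answers without tool calls."""
    tools = tools or {"web_search": web_search}
    messages = [{"role": "human", "content": user_query}]
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(reply)
        if not reply["tool_calls"]:
            return messages
        for call in reply["tool_calls"]:
            output = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": output})
    raise RuntimeError("agent did not finish within max_steps")


for message in run_agent("tell me about the apple news"):
    print(message["role"], "->", message["content"] or message["tool_calls"])
```

The printed roles follow the diagram: human, ai (tool call), tool, ai (final answer). The max_steps guard is a design choice that stops a model that keeps requesting tools from looping indefinitely.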

This loop — decide, act, observe, respond — is the heart of every agent in this series. In the next lesson, LangChain Agent Fundamentals, we wire these states into a real agent with create_agent, memory, streaming, and guardrails.
