Gemini 3 and LangChain Bootcamp

Master Google Gemini 3 with LangChain: configure API keys, streaming, multimodal analysis, context caching, tool calling, and image generation.

Jun 19, 2026 · 17 min read

Topics You Will Master

Setting up API keys and tracing configurations for Google Gemini and LangSmith
Invoking model chat completions, streaming tokens, and extracting reasoning thought signatures
Running multimodal document analysis on images and complex PDF files
Implementing native built-in search/code tools, custom tool binding, and cost-saving context caches

Google's Gemini 3 model family introduces state-of-the-art reasoning, million-token context windows, multimodal intelligence, and built-in execution tools. Integrated with LangChain, developers can leverage these capabilities to construct structured, private, and highly adaptive agent systems.

This bootcamp walks through API key initialization, environment configuration, basic and streamed prompt execution, multimodal file analysis, tool calling, reasoning configuration, and context caching.

API Key and Environment Setup

To deploy Gemini 3 and track its calls, you must configure API access for both Google AI Studio and LangSmith.

Part 1: Creating a Gemini API Key

  1. Go to Google AI Studio and sign in with your Google account.
  2. Click "Get API Key" in the left sidebar, then click the "Create API Key" button.
  3. Select an existing Google Cloud project or create a new one.
  4. Generate the key, copy it immediately, and save it in a .env file in your project root:
PYTHON
GOOGLE_API_KEY=your_gemini_api_key_here

Part 2: Setting up LangSmith Tracing

  1. Go to LangSmith and register an account.
  2. Click your profile icon (top-right), navigate to "Settings", and click "API Keys" in the left sidebar.
  3. Click "Create API Key" and give it a descriptive name (e.g., "Multi-Agent Deep RAG").
  4. Save the generated key and trace parameters into your .env file:
PYTHON
LANGSMITH_API_KEY="your_api_key_here"
LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_PROJECT="multi-agent-deep-rag"

Testing the Configuration

Install the required packages:

BASH
pip install python-dotenv langchain langchain-google-genai google-genai langsmith

The command is identical on Windows, Linux, and macOS.

Verify that both keys load in your Python session:

PYTHON
import os
from dotenv import load_dotenv

load_dotenv()

google_api_key = os.getenv("GOOGLE_API_KEY")
langchain_api_key = os.getenv("LANGSMITH_API_KEY")

if google_api_key:
    print("Gemini API Key loaded successfully")
else:
    print("Gemini API Key not found")

if langchain_api_key:
    print("Langsmith API Key loaded successfully")
else:
    print("Langsmith API Key not found")
OUTPUT
Gemini API Key loaded successfully
Langsmith API Key loaded successfully
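
The per-key checks above can also be folded into one helper that reports every missing variable at once. This is a minimal sketch using only the standard library; the helper name `missing_keys` and the `REQUIRED_KEYS` tuple are illustrative choices, not part of any library.

```python
import os

# Variable names taken from the .env file above; adjust if your setup differs.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "LANGSMITH_API_KEY")


def missing_keys(required=REQUIRED_KEYS, env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Demonstration with an explicit mapping so the result does not depend
# on the machine running it. In a real session, call missing_keys()
# after load_dotenv() to check the actual environment.
demo_env = {"GOOGLE_API_KEY": "abc123"}
print(missing_keys(env=demo_env))
```

Failing fast on a non-empty result keeps a missing key from surfacing later as a less obvious authentication error.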

Test the connection to the model:

PYTHON
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Explain how AI works in a few words",
)

print(response.text)
OUTPUT
It processes massive amounts of data to **recognize patterns** and **make predictions**.

In short: **Data + Math = Predictions.**

Confirm that the LangSmith connection works:

PYTHON
from langsmith import Client

client = Client()

try:
    projects = list(client.list_projects(limit=1))
    print("Langsmith connection successful!")
    print(f"Connected to project: {os.getenv('LANGCHAIN_PROJECT')}")
except Exception as e:
    print(f"Langsmith connection failed: {e}")
OUTPUT
Langsmith connection successful!
Connected to project: multi-agent-deep-rag

Google Gemini 3 Model Lineup

This bootcamp uses two Gemini 3 variants alongside gemini-2.5-flash, each tuned for a different latency, cost, and reasoning profile:

| Model | Context Window (In/Out) | Recommended Use Case |
|---|---|---|
| gemini-3-pro-preview | 1M / 64k | Complex reasoning, code generation, structural analysis |
| gemini-3-pro-image-preview | 65k / 32k | Image generation, image-to-image editing, graphic rendering |
| gemini-2.5-flash | 1M / 8k | Fast, cost-efficient processing, standard text operations |
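
One way to use these limits is a quick pre-flight check before sending a request. The sketch below copies the figures from the lineup above, reading "1M" as 1,000,000 and "k" as 1,000, which is an approximation. The `MODEL_LIMITS` dictionary and `fits_context` helper are hypothetical names introduced for illustration.

```python
# Limits copied from the lineup above ("1M" ~ 1,000,000, "k" ~ 1,000).
# These are rounded figures, so treat the check as a rough guard.
MODEL_LIMITS = {
    "gemini-3-pro-preview": {"input": 1_000_000, "output": 64_000},
    "gemini-3-pro-image-preview": {"input": 65_000, "output": 32_000},
    "gemini-2.5-flash": {"input": 1_000_000, "output": 8_000},
}


def fits_context(model, input_tokens, max_output_tokens):
    """Return True if both token counts are within the model's listed limits."""
    limits = MODEL_LIMITS[model]
    return (
        input_tokens <= limits["input"]
        and max_output_tokens <= limits["output"]
    )


# A 200k-token prompt fits the Pro model but not the image model.
print(fits_context("gemini-3-pro-preview", 200_000, 8_000))
print(fits_context("gemini-3-pro-image-preview", 200_000, 8_000))
```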

Key Improvements

  • Thinking Level: Reasoning depth configuration (low or high).
  • Media Resolution: Configurable processing resolution per media asset (low, medium, high, ultra_high).
  • Thought Signatures: Automatic internal recording and preservation of the reasoning chain.

Basic Invocation and Token Streaming

Basic Message Structure

Invoke the model with a system message and a user message:

PYTHON
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.messages import HumanMessage, SystemMessage

gemini3 = 'gemini-3-pro-preview'
gemini2 = 'gemini-2.5-flash'

system_msg = SystemMessage("You are a helpful AI Assistant")
query = HumanMessage("Explain the theory of relativity in simple terms")

messages = [system_msg, query]

model = ChatGoogleGenerativeAI(model=gemini3)
response = model.invoke(messages)

print(response.content)
OUTPUT
The Theory of Relativity, developed by Albert Einstein, is split into two parts:

1. **Special Relativity**: Time and space are relative and depend on how fast you are moving. The speed of light is the universal constant. As speed increases, time slows down (time dilation) and space contracts.
2. **General Relativity**: Gravity is not a pulling force, but rather the bending and warping of spacetime fabric caused by mass and energy. Heavy objects create a "dent" in spacetime that smaller objects roll along.

Real-Time Token Streaming

Stream tokens chunk by chunk to improve responsiveness in interactive UIs:

PYTHON
model = ChatGoogleGenerativeAI(model=gemini2)
query = "Explain the theory of relativity in simple terms."

for chunk in model.stream(query):
    print(chunk.text, end="", flush=True)
OUTPUT
Okay, let's break down the theory of relativity...
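
Streaming code often needs the complete answer as well, for logging or storage. The sketch below prints each piece as it arrives and also returns the joined text. It runs offline: `fake_stream` is a stand-in generator, not a LangChain API, and `stream_and_collect` is an illustrative helper name.

```python
def fake_stream(text, chunk_size=8):
    """Stand-in for model.stream(): yields the text in small pieces."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


def stream_and_collect(chunks):
    """Print each chunk as it arrives and return the full text."""
    parts = []
    for piece in chunks:
        print(piece, end="", flush=True)
        parts.append(piece)
    print()  # finish the line once the stream ends
    return "".join(parts)


full_text = stream_and_collect(fake_stream("Time slows down as speed increases."))
```

With a real model, the same helper could be fed `(chunk.text for chunk in model.stream(query))`, following the loop shown above.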

Multimodal Document Analysis

Gemini 3 processes images, audio, video, and PDF files natively.

Image Analysis (URL/Local File)

You can pass image URLs directly to the model inside a content dictionary:

PYTHON
model = ChatGoogleGenerativeAI(model=gemini3)

human_msg = HumanMessage(
    [
        {'type': 'text', 'text': 'Describe the image provided'},
        {'type': 'image',
         'url': 'https://www.shutterstock.com/image-vector/vector-cute-baby-panda-cartoon-600nw-2427356853.jpg'}
    ]
)

response = model.invoke([system_msg, human_msg])
print(response.text)
OUTPUT
The image features an adorable cartoon baby panda sitting upright and smiling, framed by green grass on a white background.

For local images, base64 encode the asset and define its MIME type:

PYTHON
import base64

mime_type = "image/png"

with open("data/images/panda.png", "rb") as f:
    image_bytes = f.read()
bytes_base64 = base64.b64encode(image_bytes).decode('utf-8')

human_msg = HumanMessage(
    [
        {'type': 'text', 'text': 'Describe the image provided'},
        {'type': 'image',
         'base64': bytes_base64,
         "mime_type": mime_type}
    ]
)

response = model.invoke([system_msg, human_msg])
print(response.text)
OUTPUT
This is a cartoon illustration of a baby panda cub with round pink ears sitting in green grass.
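
The local-image steps above can be wrapped in a small helper that infers the MIME type from the file extension instead of hardcoding it. This is a sketch under assumptions: `image_block` is a hypothetical helper name, and the returned dictionary simply mirrors the block shape used in the example above.

```python
import base64
import mimetypes
from pathlib import Path


def image_block(path):
    """Build an image content block in the same shape as the example above."""
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type for {path}")
    data = Path(path).read_bytes()
    return {
        "type": "image",
        "base64": base64.b64encode(data).decode("utf-8"),
        "mime_type": mime_type,
    }
```

The result can be placed in the `HumanMessage` list next to the text block, for example `image_block("data/images/panda.png")`.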

PDF Document Analysis

Analyzing multi-page documents (such as SEC filings) follows the same base64 encoding strategy:

PYTHON
# Forward slashes work on Windows, Linux, and macOS
with open("data/rag-data/pdfs/apple/apple 10-q q1 2024.pdf", "rb") as f:
    pdf_bytes = f.read()
pdf_base64 = base64.b64encode(pdf_bytes).decode('utf-8')

mime_type = "application/pdf"

human_msg = HumanMessage(
    [
        {'type': 'text', 'text': 'Summarize the key financial highlights from this quarterly report.'},
        {'type': 'file',
         'base64': pdf_base64,
         'mime_type': mime_type}
    ]
)

model = ChatGoogleGenerativeAI(model=gemini2)
response = model.invoke([system_msg, human_msg])
print(response.text)
OUTPUT
Key Financial Highlights:
- Total Net Sales: Decreased by 4% to $90.75 billion.
- Services Sales: Increased by 14% to $23.87 billion.
- Diluted EPS: Increased slightly to $1.53.

Note

For large PDF inputs, raising media_resolution can make the document text easier for the model to read during extraction.


Tool Calling and Native Functions

Gemini models support custom tool bindings as well as native Google-provided execution tools.

Binding Custom Tools

Declare a schema or a Python function as a tool and bind it to the model:

PYTHON
# base_tools is a local project module that defines web_search and get_weather
from scripts import base_tools

model = ChatGoogleGenerativeAI(model=gemini2)
model_with_tools = model.bind_tools([base_tools.web_search, base_tools.get_weather])

response = model_with_tools.invoke("What is the weather in mumbai?")
print(response.tool_calls)
OUTPUT
[{'name': 'get_weather', 'args': {'location': 'Mumbai'}, 'id': '...'}]
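
The model only returns a description of the call; running the tool is left to your code. The sketch below dispatches entries shaped like the `tool_calls` output above to plain Python functions. Everything in it is illustrative: `get_weather` is a stub, `TOOL_REGISTRY` and `run_tool_calls` are hypothetical names, and the `id` value is a placeholder.

```python
def get_weather(location: str) -> str:
    """Hypothetical stub; a real tool would call a weather service."""
    return f"Weather lookup requested for {location}"


# Maps the tool name the model returns to the function that implements it.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls, registry=TOOL_REGISTRY):
    """Execute each tool call and collect its output or an error."""
    results = []
    for call in tool_calls:
        func = registry.get(call["name"])
        if func is None:
            results.append({"id": call.get("id"), "error": f"unknown tool: {call['name']}"})
            continue
        results.append({"id": call.get("id"), "output": func(**call["args"])})
    return results


# Same shape as response.tool_calls above; the id is a placeholder.
sample_calls = [{"name": "get_weather", "args": {"location": "Mumbai"}, "id": "call-1"}]
print(run_tool_calls(sample_calls))
```

Looking tools up in a registry, rather than calling whatever name the model supplies, keeps unknown or unexpected tool names from being executed.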

Google Search and Python Code Interpreter

Enable native Google Search grounding or sandboxed Python code execution by binding the built-in tools:

PYTHON
model = ChatGoogleGenerativeAI(model=gemini2)
model_with_tools = model.bind_tools([{'google_search': {}}, {'code_execution': {}}])

query = "When is the next total solar eclipse in the US and what is 3 + 2?"
response = model_with_tools.invoke(query)
print(response.text)
OUTPUT
The answer to 3 + 2 is 5. The next total solar eclipse in the US will be on March 30, 2033, visible only in Alaska.

Warning

Output structures may vary per execution. Validate server-returned schemas before parsing them in production pipelines.


Context Caching

When performing consecutive queries against large files, you can cache target documents in Gemini's server memory to minimize token costs and network latency. Caching requires a minimum threshold of 2,048 tokens.

PYTHON
import time
from google import genai
from google.genai.types import CreateCachedContentConfig, Content, Part

client = genai.Client()

file_paths = [
    "data/rag-data/pdfs/apple/apple 10-q q1 2024.pdf",
    "data/rag-data/pdfs/apple/apple 10-q q2 2024.pdf"
]

uploaded_files = []
for path in file_paths:
    file = client.files.upload(file=path)
    while file.state.name == "PROCESSING":
        time.sleep(2)
        file = client.files.get(name=file.name)
    uploaded_files.append(file)
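
The upload loop above keeps polling for as long as a file reports PROCESSING, so a stuck upload would block forever. A bounded variant might look like the sketch below. It uses only the standard library; `wait_until_ready`, its parameters, and the "ACTIVE" sample state are illustrative assumptions rather than SDK features.

```python
import time


def wait_until_ready(get_state, timeout_s=120.0, poll_interval_s=2.0, sleep=time.sleep):
    """Poll get_state() until it stops returning "PROCESSING" or time runs out."""
    deadline = time.monotonic() + timeout_s
    state = get_state()
    while state == "PROCESSING":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still PROCESSING after {timeout_s} seconds")
        sleep(poll_interval_s)
        state = get_state()
    return state


# Offline demonstration: a fake state source that finishes on the third poll.
states = iter(["PROCESSING", "PROCESSING", "ACTIVE"])
print(wait_until_ready(lambda: next(states), sleep=lambda seconds: None))
```

In the upload loop, `get_state` could be `lambda: client.files.get(name=file.name).state.name`, reusing the calls already shown above.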

Create the server-side cache container:

PYTHON
parts = [Part.from_uri(file_uri=f.uri, mime_type=f.mime_type) for f in uploaded_files]
contents = [Content(role='user', parts=parts)]

cache = client.caches.create(
    model=gemini2,
    config=CreateCachedContentConfig(
        display_name='Apple Q1 Q2 2024 reports',
        system_instruction="You are a financial analyst. Use these Apple quarterly reports to answer questions.",
        contents=contents,
        ttl='1800s'  # Cache lives for 30 minutes
    )
)

Invoke queries referencing the active cache:

PYTHON
model = ChatGoogleGenerativeAI(
    model=gemini2,
    cached_content=cache.name
)

query = "Compare the revenue growth between Q1 and Q2 2024."
response = model.invoke(query)

print(response.usage_metadata)
OUTPUT
{'input_tokens': 14482, 'output_tokens': 2128, 'total_tokens': 16610, 'input_token_details': {'cache_read': 14465}}

Notice 'cache_read': 14465 in usage_metadata. It shows that 14,465 of the 14,482 input tokens were served from the server-side cache, so they are billed at the reduced cached-token rate instead of the full input rate.
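
To see how much of a request was served from the cache, divide `cache_read` by `input_tokens`. The sketch below does that for the usage metadata printed above; `cache_hit_ratio` is a hypothetical helper name.

```python
def cache_hit_ratio(usage):
    """Fraction of input tokens that were read from the context cache."""
    input_tokens = usage.get("input_tokens", 0)
    cached = usage.get("input_token_details", {}).get("cache_read", 0)
    return cached / input_tokens if input_tokens else 0.0


# Values copied from the usage_metadata output above.
usage = {
    "input_tokens": 14482,
    "output_tokens": 2128,
    "total_tokens": 16610,
    "input_token_details": {"cache_read": 14465},
}

ratio = cache_hit_ratio(usage)
print(f"{ratio:.2%} of input tokens came from the cache")  # prints 99.88% ...
```

Only 17 input tokens, roughly the query itself, fall outside the cache in this run.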


Image Generation with Grounding

Generate graphics up to 4K resolution using gemini-3-pro-image-preview. Aspect ratios supported include "16:9", "1:1", "2:3", and "21:9".

PYTHON
from langchain_google_genai import Modality

image_model = ChatGoogleGenerativeAI(model="gemini-3-pro-image-preview")
image_content = "Create a professional infographic of Apple Q2 earnings."

image_response = image_model.invoke(
    image_content, response_modalities=[Modality.TEXT, Modality.IMAGE]
)

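# Extract and save the generated image. The response can also contain text
# blocks, so pick the first image block instead of assuming index 0.
image_block = next(b for b in image_response.content_blocks if b["type"] == "image")
with open("data/images/apple_info.png", "wb") as f:
    f.write(base64.b64decode(image_block["base64"]))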
