Google's Gemini 3 model family introduces state-of-the-art reasoning, million-token context windows, multimodal intelligence, and built-in execution tools. Integrated with LangChain, developers can leverage these capabilities to construct structured, private, and highly adaptive agent systems.
This bootcamp walks through API key initialization, environment configuration, basic and streamed prompt execution, multimodal file analysis, tool calling, reasoning configuration, and context caching.
API Key and Environment Setup
To deploy Gemini 3 and track its calls, you must configure API access for both Google AI Studio and LangSmith.
Part 1: Creating a Gemini API Key
- Go to Google AI Studio and sign in with your Google account.
- Click "Get API Key" in the left sidebar, then click the "Create API Key" button.
- Select an existing Google Cloud project or create a new one.
- Generate the key, copy it immediately, and save it in a .env file in your project root:
GOOGLE_API_KEY=your_gemini_api_key_here
Part 2: Setting up LangSmith Tracing
- Go to LangSmith and register an account.
- Click your profile icon (top-right), navigate to "Settings", and click "API Keys" in the left sidebar.
- Click "Create API Key" and give it a descriptive name (e.g., "Multi-Agent Deep RAG").
- Save the generated key and trace parameters into your .env file:
LANGSMITH_API_KEY="your_api_key_here"
LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_PROJECT="multi-agent-deep-rag"
Testing the Configuration
Install the required packages:
pip install python-dotenv langchain langchain-google-genai google-genai langsmith
On Linux/macOS: The command is identical.
Verify that the keys are loaded in your Python session:
import os
from dotenv import load_dotenv
# Load variables from the .env file into the process environment
load_dotenv()
google_api_key = os.getenv("GOOGLE_API_KEY")
langchain_api_key = os.getenv("LANGSMITH_API_KEY")
if google_api_key:
    print("Gemini API Key loaded successfully")
else:
    print("Gemini API Key not found")
if langchain_api_key:
    print("Langsmith API Key loaded successfully")
else:
    print("Langsmith API Key not found")
Gemini API Key loaded successfully
Langsmith API Key loaded successfully
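The same check can be written more compactly. The following is a minimal sketch: the helper name missing_env_vars and the REQUIRED_KEYS tuple are illustrative assumptions, not part of python-dotenv or LangSmith. It reports which variables are unset without printing any secret values.

```python
import os

# Variable names taken from the .env file above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "LANGSMITH_API_KEY")


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(REQUIRED_KEYS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set")
```

Passing a plain dictionary as env makes the helper easy to exercise without touching the real environment.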
Test connection to the model:
from google import genai
client = genai.Client()
response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Explain how AI works in a few words",
)
print(response.text)
It processes massive amounts of data to **recognize patterns** and **make predictions**.
In short: **Data + Math = Predictions.**
Confirm that the LangSmith connection works:
from langsmith import Client
client = Client()
try:
    # Listing a single project is enough to confirm that authentication works
    projects = list(client.list_projects(limit=1))
    print("Langsmith connection successful!")
    print(f"Connected to project: {os.getenv('LANGCHAIN_PROJECT')}")
except Exception as e:
    print(f"Langsmith connection failed: {e}")
Langsmith connection successful!
Connected to project: multi-agent-deep-rag
Google Gemini 3 Model Lineup
Gemini 3 offers specialized variants tuned for specific latency, cost, and reasoning profiles:
| Model | Context Window (In/Out) | Recommended Use Case |
|---|---|---|
| gemini-3-pro-preview | 1M / 64k | Complex reasoning, code generation, structural analysis |
| gemini-3-pro-image-preview | 65k / 32k | Image generation, image-to-image editing, graphic rendering |
| gemini-2.5-flash | 1M / 8k | Fast, cost-efficient processing, standard text operations |
Key Improvements
- Thinking Level: Reasoning depth configuration (low or high).
- Media Resolution: Configurable processing resolution per media asset (low, medium, high, ultra_high).
- Thought Signatures: Automatic internal recording and preservation of the reasoning chain.
Basic Invocation and Token Streaming
Basic Message Structure
Invoke the model using system and user message formats:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.messages import HumanMessage, SystemMessage
gemini3 = 'gemini-3-pro-preview'
gemini2 = 'gemini-2.5-flash'
system_msg = SystemMessage("You are a helpful AI Assistant")
query = HumanMessage("Explain the theory of relativity in simple terms")
messages = [system_msg, query]
model = ChatGoogleGenerativeAI(model=gemini3)
response = model.invoke(messages)
print(response.content)
The Theory of Relativity, developed by Albert Einstein, is split into two parts:
1. **Special Relativity**: Time and space are relative and depend on how fast you are moving. The speed of light is the universal constant. As speed increases, time slows down (time dilation) and space contracts.
2. **General Relativity**: Gravity is not a pulling force, but rather the bending and warping of spacetime fabric caused by mass and energy. Heavy objects create a "dent" in spacetime that smaller objects roll along.
Real-Time Token Streaming
Stream tokens chunk-by-chunk to improve responsiveness in visual UIs:
model = ChatGoogleGenerativeAI(model=gemini2)
query = "Explain the theory of relativity in simple terms."
for chunk in model.stream(query):
    # Print each chunk as soon as it arrives, without a trailing newline
    print(chunk.text, end="", flush=True)
Okay, let's break down the theory of relativity...
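Streaming code often needs the complete response as well as the live output, for example to log it or store it in a chat history. A minimal sketch of that pattern follows. The names collect_stream and fake_stream are hypothetical, and fake_stream only stands in for the text pieces a real model.stream(query) loop would produce.

```python
def collect_stream(chunks, echo=True):
    """Print each text piece as it arrives and return the full response."""
    pieces = []
    for piece in chunks:
        if echo:
            print(piece, end="", flush=True)
        pieces.append(piece)
    if echo:
        print()  # finish the line once the stream ends
    return "".join(pieces)


def fake_stream():
    """Hypothetical stand-in for (chunk.text for chunk in model.stream(query))."""
    yield from ["Okay, ", "let's break down ", "the theory of relativity..."]


full_text = collect_stream(fake_stream())
```

With the real model, the generator expression in the fake_stream docstring could be passed to collect_stream instead.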
Multimodal Document Analysis
Gemini 3 processes images, audio, video, and PDF files natively.
Image Analysis (URL/Local File)
You can pass image URLs directly to the model inside a content dictionary:
model = ChatGoogleGenerativeAI(model=gemini3)
human_msg = HumanMessage(
    [
        {'type': 'text', 'text': 'Describe the image provided'},
        {'type': 'image',
         'url': 'https://www.shutterstock.com/image-vector/vector-cute-baby-panda-cartoon-600nw-2427356853.jpg'}
    ]
)
response = model.invoke([system_msg, human_msg])
print(response.text)
The image features an adorable cartoon baby panda sitting upright and smiling, framed by green grass on a white background.
For local images, base64-encode the file and declare its MIME type:
import base64
mime_type = "image/png"
# Read the image inside a context manager so the file handle is closed promptly
with open("data/images/panda.png", 'rb') as f:
    image_bytes = f.read()
bytes_base64 = base64.b64encode(image_bytes).decode('utf-8')
human_msg = HumanMessage(
    [
        {'type': 'text', 'text': 'Describe the image provided'},
        {'type': 'image',
         'base64': bytes_base64,
         'mime_type': mime_type}
    ]
)
response = model.invoke([system_msg, human_msg])
print(response.text)
This is a cartoon illustration of a baby panda cub with round pink ears sitting in green grass.
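The encode-and-label steps repeat for every local file, so they can be wrapped in a small helper. This is a sketch under stated assumptions: build_media_block is a hypothetical name, the MIME type is guessed from the file extension with the standard mimetypes module, and the returned dictionary simply mirrors the block shape used above. The demo writes placeholder bytes to a temporary file rather than a real image.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def build_media_block(path, block_type="image"):
    """Return a content block dict with base64 data and a guessed MIME type."""
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None:
        raise ValueError(f"Cannot determine MIME type for {path}")
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {"type": block_type, "base64": data, "mime_type": mime_type}


# Demo with placeholder bytes; a real call would point at an actual image file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.png"
    sample.write_bytes(b"not a real image, just demo bytes")
    block = build_media_block(sample)

print(block["type"], block["mime_type"])
```

Guessing from the extension is convenient but not a validation of the file contents, so a mislabeled file would still be sent with the wrong type.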
PDF Document Analysis
Analyzing multi-page documents (such as financial SEC filings) follows the same base64 encoding strategy:
# Forward slashes keep the path portable across Windows, Linux, and macOS
with open("data/rag-data/pdfs/apple/apple 10-q q1 2024.pdf", 'rb') as f:
    pdf_bytes = f.read()
pdf_base64 = base64.b64encode(pdf_bytes).decode('utf-8')
mime_type = "application/pdf"
human_msg = HumanMessage(
    [
        {'type': 'text', 'text': 'summarize the key financial highlights from this quarterly report.'},
        {'type': 'file',
         'base64': pdf_base64,
         'mime_type': mime_type}
    ]
)
model = ChatGoogleGenerativeAI(model=gemini2)
response = model.invoke([system_msg, human_msg])
print(response.text)
Key Financial Highlights:
- Total Net Sales: Decreased by 4% to $90.75 billion.
- Services Sales: Increased by 14% to $23.87 billion.
- Diluted EPS: Increased slightly to $1.53.
Note
For large PDF inputs, setting media_resolution parameters can improve document text clarity during extraction.
Tool Calling and Native Functions
Gemini models support custom tool bindings as well as native Google-provided execution tools.
Binding Custom Tools
Declare a schema or Python tool and bind it directly to the model:
# base_tools is a local project module that defines the web_search and get_weather tools
from scripts import base_tools
model = ChatGoogleGenerativeAI(model=gemini2)
model_with_tools = model.bind_tools([base_tools.web_search, base_tools.get_weather])
response = model_with_tools.invoke("What is the weather in mumbai?")
print(response.tool_calls)
[{'name': 'get_weather', 'args': {'location': 'Mumbai'}, 'id': '...'}]
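The model only proposes the call; your code still has to run it. The sketch below shows one way to dispatch entries shaped like the tool_calls output above. The get_weather stub, TOOL_REGISTRY, and run_tool_calls are hypothetical names for illustration, and the stub does not reflect how the tools in base_tools are actually implemented.

```python
def get_weather(location):
    """Hypothetical stub; the real tool lives in the base_tools module."""
    return f"Weather lookup for {location} (stub)"


# Map tool names, as they appear in response.tool_calls, to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls, registry):
    """Execute each tool call and return (call id, result) pairs."""
    results = []
    for call in tool_calls:
        func = registry.get(call["name"])
        if func is None:
            raise KeyError(f"Unknown tool: {call['name']}")
        results.append((call["id"], func(**call["args"])))
    return results


sample_calls = [{"name": "get_weather", "args": {"location": "Mumbai"}, "id": "call-1"}]
tool_results = run_tool_calls(sample_calls, TOOL_REGISTRY)
print(tool_results)
```

Keeping the call id next to each result makes it possible to send the results back to the model matched to the requests that produced them.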
Google Search and Python Code Interpreter
Enable native Google Search grounding or sandboxed Python code execution by binding the built-in tools to the model:
model = ChatGoogleGenerativeAI(model=gemini2)
model_with_tools = model.bind_tools([{'google_search': {}}, {'code_execution': {}}])
query = "When is the next total solar eclipse in the US and what is 3 + 2?"
response = model_with_tools.invoke(query)
print(response.text)
The answer to 3 + 2 is 5. The next total solar eclipse in the US will be on March 30, 2033, visible only in Alaska.
Warning
Output structures may vary per execution. Ensure you validate server-returned schemas before parsing them in production pipelines.
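One lightweight way to apply that advice is to check the shape of each tool call before acting on it. This is a minimal sketch: validate_tool_call is a hypothetical helper, and the expected_args mapping of argument names to Python types is an assumption about what your own tools require.

```python
def validate_tool_call(call, expected_args):
    """Return a list of problems found in a tool call dict (empty if valid)."""
    if not isinstance(call, dict):
        return ["tool call is not a dict"]
    problems = []
    name = call.get("name")
    if not isinstance(name, str) or not name:
        problems.append("missing or non-string 'name'")
    args = call.get("args")
    if not isinstance(args, dict):
        problems.append("'args' is not a dict")
        return problems
    for arg_name, arg_type in expected_args.items():
        if arg_name not in args:
            problems.append(f"missing argument '{arg_name}'")
        elif not isinstance(args[arg_name], arg_type):
            problems.append(f"argument '{arg_name}' is not {arg_type.__name__}")
    return problems


call = {"name": "get_weather", "args": {"location": "Mumbai"}, "id": "..."}
print(validate_tool_call(call, {"location": str}))
```

Returning a list of problems instead of raising on the first one makes it easier to log everything that was wrong with a response.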
Context Caching
When performing consecutive queries against large files, you can cache target documents in Gemini's server memory to minimize token costs and network latency. Caching requires a minimum threshold of 2,048 tokens.
import time
from google import genai
from google.genai.types import CreateCachedContentConfig, Content, Part
client = genai.Client()
file_paths = [
    "data/rag-data/pdfs/apple/apple 10-q q1 2024.pdf",
    "data/rag-data/pdfs/apple/apple 10-q q2 2024.pdf"
]
uploaded_files = []
for path in file_paths:
    file = client.files.upload(file=path)
    # Poll until the server has finished processing the upload
    while file.state.name == "PROCESSING":
        time.sleep(2)
        file = client.files.get(name=file.name)
    uploaded_files.append(file)
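The polling loop above waits indefinitely if a file never leaves the PROCESSING state. A bounded variant can be sketched as follows. The helper wait_until_ready and its parameters are hypothetical, and the iterator of states only simulates repeated state lookups so that the example runs without network access.

```python
import time


def wait_until_ready(fetch_state, timeout=120.0, interval=2.0, sleep=time.sleep):
    """Poll fetch_state() until it stops returning "PROCESSING" or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if state != "PROCESSING":
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still PROCESSING after {timeout} seconds")
        sleep(interval)


# Hypothetical stand-in for repeated client.files.get(name=...).state.name lookups.
states = iter(["PROCESSING", "PROCESSING", "ACTIVE"])
final_state = wait_until_ready(lambda: next(states), interval=0.0)
print(final_state)
```

With the real client, fetch_state could wrap the client.files.get call from the loop above. The caller should still check the returned state, since a file can finish processing in a failed state.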
Create the server-side cache container:
parts = [Part.from_uri(file_uri=f.uri, mime_type=f.mime_type) for f in uploaded_files]
contents = [Content(role='user', parts=parts)]
cache = client.caches.create(
    model=gemini2,
    config=CreateCachedContentConfig(
        display_name='Apple Q1 Q2 2024 reports',
        system_instruction="You are a financial analyst. Use these Apple quarterly reports to answer questions.",
        contents=contents,
        ttl='1800s'  # Cache lives for 30 minutes
    )
)
Invoke queries referencing the active cache:
model = ChatGoogleGenerativeAI(
    model=gemini2,
    cached_content=cache.name
)
query = "Compare the revenue growth between Q1 and Q2 2024."
response = model.invoke(query)
print(response.usage_metadata)
{'input_tokens': 14482, 'output_tokens': 2128, 'total_tokens': 16610, 'input_token_details': {'cache_read': 14465}}
Note 'cache_read': 14465 in usage_metadata: 14,465 of the 14,482 input tokens were served from the server-side cache, so the documents did not have to be resent with the request.
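To monitor this across many requests, the share of cached input tokens can be computed from the same dictionary. A minimal sketch, assuming the usage_metadata layout printed above; cache_hit_ratio is a hypothetical helper name.

```python
def cache_hit_ratio(usage):
    """Fraction of input tokens that were read from the cache."""
    input_tokens = usage.get("input_tokens", 0)
    cached = usage.get("input_token_details", {}).get("cache_read", 0)
    return cached / input_tokens if input_tokens else 0.0


# Values copied from the usage_metadata output above.
usage = {
    "input_tokens": 14482,
    "output_tokens": 2128,
    "total_tokens": 16610,
    "input_token_details": {"cache_read": 14465},
}
ratio = cache_hit_ratio(usage)
print(f"{ratio:.1%} of input tokens came from the cache")  # prints 99.9% ...
```

A ratio near zero on a request that should hit the cache is a hint that the cache has expired or that the model name does not match the one the cache was created for.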
Image Generation with Grounding
Generate graphics up to 4K resolution using gemini-3-pro-image-preview. Aspect ratios supported include "16:9", "1:1", "2:3", and "21:9".
from langchain_google_genai import Modality
image_model = ChatGoogleGenerativeAI(model="gemini-3-pro-image-preview")
image_content = "Create a professional infographic of Apple Q2 earnings."
image_response = image_model.invoke(
    image_content, response_modalities=[Modality.TEXT, Modality.IMAGE]
)
# Extract and save the generated image
with open("data/images/apple_info.png", 'wb') as f:
    f.write(base64.b64decode(image_response.content_blocks[0]['base64']))
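The extraction step above assumes the image is the first content block. Because both text and image modalities are requested, a more defensive approach is to search for the first image block. The sketch below uses a hypothetical helper, first_image_bytes, and hand-built sample blocks that only imitate the dictionary shape used in this walkthrough.

```python
import base64


def first_image_bytes(content_blocks):
    """Return decoded bytes of the first image block, or None if there is none."""
    for block in content_blocks:
        if isinstance(block, dict) and block.get("type") == "image" and "base64" in block:
            return base64.b64decode(block["base64"])
    return None


# Hypothetical blocks mimicking a response with text followed by an image.
sample_blocks = [
    {"type": "text", "text": "Here is your infographic."},
    {"type": "image", "base64": base64.b64encode(b"fake-png-bytes").decode("utf-8")},
]
image_bytes = first_image_bytes(sample_blocks)
print(len(image_bytes))
```

With a real response, image_response.content_blocks could be passed in, and the file should only be written when the result is not None.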