import { APILink } from "@site/src/components/APILink";
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import TabsWrapper from "@site/src/components/TabsWrapper";
import StepHeader from "@site/src/components/StepHeader";
import ServerSetup from "@site/src/content/setup_server_slim.mdx";
import ImageBox from "@site/src/components/ImageBox";
import TilesGrid from "@site/src/components/TilesGrid";
import TileCard from "@site/src/components/TileCard";
import { Users, BookOpen, Scale } from "lucide-react";
MLflow Tracing provides automatic tracing for Google Gemini. Calling the <APILink fn="mlflow.gemini.autolog" /> function enables auto tracing: MLflow then captures nested traces and logs them to the active MLflow Experiment whenever the Gemini Python SDK is invoked. In TypeScript, use the `tracedGemini` function to wrap the Gemini client instead.
MLflow Tracing automatically captures the following information about Gemini calls:

- Request parameters such as `temperature` and `max_tokens`, if specified

```python
import mlflow
import google.generativeai as genai
import os
# Enable auto-tracing for Gemini
mlflow.gemini.autolog()
# Set a tracking URI and an experiment
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("Gemini")
# Configure your API key
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
# Use Gemini as usual - traces will be automatically captured
model = genai.GenerativeModel("gemini-2.5-flash")
response = model.generate_content("What is the capital of France?")
print(response.text)
```
```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";
import { tracedGemini } from "@mlflow/gemini";
// Wrap the Gemini client with the tracedGemini function
const genAI = tracedGemini(new GoogleGenerativeAI(process.env.GOOGLE_API_KEY));
const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
// Invoke the client as usual
const result = await model.generateContent("What is the capital of France?");
console.log(result.response.text());
```
Browse to the MLflow UI at http://localhost:5000 (or your MLflow server URL) and you should see the traces for the Gemini API calls.
→ See Next Steps to learn about more MLflow features such as user feedback tracking, prompt management, and evaluation.
:::note
The MLflow tracing integration currently supports both the new Google GenAI SDK and the legacy Google AI Python SDK. However, support for the legacy package may be dropped without notice, so migrating your use cases to the new Google GenAI SDK is highly recommended.
:::
MLflow supports automatic tracing for the following Gemini APIs.

Python:

| Text Generation | Chat | Function Calling | Streaming | Async | Image | Video |
|---|---|---|---|---|---|---|
| ✅ | ✅ | ✅ | - | ✅ (*1) | - | - |

(*1) Async support was added in MLflow 3.2.0.

TypeScript / JavaScript:

| Content Generation | Chat | Function Calling | Streaming | Async |
|---|---|---|---|---|
| ✅ | - | ✅ (*2) | - | ✅ |

(*2) Only `models.generateContent()` is supported. Function calls in responses are captured and can be rendered in the MLflow UI. The TypeScript SDK is natively async.
To request support for additional APIs, please open a feature request on GitHub.
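Function calling, marked as supported in the table above, can be exercised with a sketch like the one below. It assumes the new Google GenAI SDK, where a plain Python callable can be passed in `tools`; the `get_weather` tool and its canned reply are hypothetical, and how the tool call is rendered in the trace may depend on your MLflow version.

```python
import os


def get_weather(city: str) -> str:
    """Return a canned weather report for `city` (hypothetical tool, no real lookup)."""
    return f"It is sunny in {city}."


def main() -> None:
    # Imported lazily so the tool above can be exercised without the SDKs installed.
    import mlflow
    from google import genai
    from google.genai import types

    mlflow.gemini.autolog()
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

    # Passing a Python callable as a tool lets the SDK invoke it when the model requests it.
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents="What is the weather in Paris?",
        config=types.GenerateContentConfig(tools=[get_weather]),
    )
    print(response.text)


# Only call the API when a key is configured.
if __name__ == "__main__" and os.environ.get("GEMINI_API_KEY"):
    main()
```

After running it with `GEMINI_API_KEY` set, the resulting trace should be visible in the MLflow UI alongside the other Gemini traces.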
```python
import os

import mlflow
from google import genai

# Turn on auto tracing for Gemini
mlflow.gemini.autolog()

# Optional: Set a tracking URI and an experiment
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("Gemini")

# Configure the SDK with your API key.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Use the generate_content method to generate responses to your prompts.
response = client.models.generate_content(
    model="gemini-2.5-flash", contents="The opposite of hot is"
)
```
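```typescript
import { GoogleGenAI } from "@google/genai";
import { tracedGemini } from "@mlflow/gemini";

// Wrap the Gemini client with the tracedGemini function
const client = tracedGemini(new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY }));
const response = await client.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "What is the capital of France?"
});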
```
MLflow supports tracing multi-turn conversations with Gemini:
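```python
import os

import mlflow
from google import genai

mlflow.gemini.autolog()

# Create the client, then open a chat session that keeps the conversation history.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
chat = client.chats.create(model="gemini-2.5-flash")
response = chat.send_message("In one sentence, explain how a computer works to a young child.")
print(response.text)
response = chat.send_message("Okay, how about a more detailed explanation to a high schooler?")
print(response.text)
```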
MLflow Tracing has supported the asynchronous API of the Gemini SDK since MLflow 3.2.0. Usage is the same as for the synchronous API.
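<Tabs>
<TabItem value="python" label="Python" default>

```python
import asyncio
import os

import mlflow
from google import genai

mlflow.gemini.autolog()

# Configure the SDK with your API key.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])


async def main():
    # Async API is invoked through the `aio` namespace.
    response = await client.aio.models.generate_content(
        model="gemini-2.5-flash", contents="The opposite of hot is"
    )
    print(response.text)


asyncio.run(main())
```

</TabItem>
</Tabs>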
MLflow Tracing for the Gemini SDK supports the embeddings API (Python only):

```python
result = client.models.embed_content(model="text-embedding-004", contents="Hello world")
```
MLflow automatically tracks token usage and cost for Gemini API calls. The token usage for each LLM call is logged on the corresponding trace and span, and the aggregated cost and time trends are displayed in the built-in dashboard. See the Token Usage and Cost Tracking documentation for details on accessing this information programmatically.
Token usage and cost tracking is supported for both Python and TypeScript/JavaScript implementations.
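As a rough illustration of programmatic access in Python, the sketch below reads the usage recorded on the most recent trace and sums several usage records. It assumes MLflow 3.x, where `trace.info.token_usage` is expected to be a dictionary with `input_tokens`, `output_tokens`, and `total_tokens` keys; the helper names `total_token_usage` and `usage_from_last_trace` are hypothetical.

```python
from typing import Iterable, Mapping


def total_token_usage(usages: Iterable[Mapping[str, int]]) -> dict:
    """Sum per-call usage dictionaries into one aggregate; missing keys count as zero."""
    totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    for usage in usages:
        for key in totals:
            totals[key] += int(usage.get(key, 0) or 0)
    return totals


def usage_from_last_trace() -> dict:
    """Return the token usage of the most recent trace (requires mlflow; 3.x assumed)."""
    import mlflow  # imported lazily so the pure helper above has no dependencies

    trace_id = mlflow.get_last_active_trace_id()
    trace = mlflow.get_trace(trace_id)
    # Assumed shape: {"input_tokens": ..., "output_tokens": ..., "total_tokens": ...}
    return trace.info.token_usage or {}
```

Calling `usage_from_last_trace()` right after a traced Gemini call, and passing several such results to `total_token_usage`, gives a simple running total outside the dashboard.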
Auto tracing for Gemini can be disabled globally by calling `mlflow.gemini.autolog(disable=True)` or `mlflow.autolog(disable=True)`.
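One way to make this switchable per environment is to drive the `disable` flag from configuration. The sketch below is only an illustration: the `GEMINI_TRACING` environment variable and both helper names are hypothetical and not something MLflow reads itself.

```python
import os
from typing import Mapping, Optional


def tracing_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read the hypothetical GEMINI_TRACING switch; tracing stays on unless it is 0/false/off/no."""
    env = os.environ if env is None else env
    value = env.get("GEMINI_TRACING", "1").strip().lower()
    return value not in {"0", "false", "off", "no"}


def configure_gemini_tracing(env: Optional[Mapping[str, str]] = None) -> bool:
    """Enable or disable Gemini auto tracing to match the switch (requires mlflow)."""
    import mlflow  # imported lazily so the switch logic can be used on its own

    enabled = tracing_enabled(env)
    mlflow.gemini.autolog(disable=not enabled)
    return enabled
```

Calling `configure_gemini_tracing()` once at startup would then turn tracing off in any environment that sets `GEMINI_TRACING=0`.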