docs/gateway/call-llms-with-image-and-file-inputs.mdx
This page shows how to:

- Configure object storage for files used during multimodal inference
- Deploy the TensorZero Gateway with Docker Compose
- Call LLMs with image and file inputs using the OpenAI SDKs or the OpenAI-compatible HTTP API
<Tip>

You can also find the runnable code for this example on GitHub.

</Tip>

<Steps>

<Step title="Configure file storage">

TensorZero can store files used during multimodal inference for observability and other downstream workflows.
You can configure the object storage service in the `object_storage` section of the configuration file.
For production, we recommend using an S3-compatible object storage service.
For local development, you can also use the filesystem.
If you don't need to store files, you can disable object storage entirely.
TensorZero supports any S3-compatible object storage service, including AWS S3, GCP Cloud Storage, Cloudflare R2, and many more.
```toml
[object_storage]
type = "s3_compatible"
endpoint = "http://minio:9000" # optional: defaults to AWS S3
# region = "us-east-1" # optional: depends on your S3-compatible storage provider
bucket_name = "tensorzero" # optional: depends on your S3-compatible storage provider

# IMPORTANT: for production environments, remove the following setting and use a secure method of
# authentication in combination with a production-grade object storage service.
allow_http = true
```
The TensorZero Gateway will attempt to retrieve credentials from the following resources in order of priority:

1. `S3_ACCESS_KEY_ID` and `S3_SECRET_ACCESS_KEY` environment variables
2. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables

For local development, you can store files in a directory on the filesystem.
```toml
[object_storage]
type = "filesystem"
path = "/path/to/storage"
```
If you don't need to store files, you can disable object storage entirely.
```toml
[object_storage]
type = "disabled"
```
See Configuration Reference for more details.
</Step> <Step title="Deploy TensorZero">We'll use Docker Compose to deploy the TensorZero Gateway, Postgres, and MinIO (an open-source S3-compatible object storage service).
<Accordion title="docker-compose.yml">

```yaml
# This is a simplified example for learning purposes. Do not use this in production.
# For production-ready deployments, see: https://www.tensorzero.com/docs/deployment/tensorzero-gateway

services:
  gateway:
    image: tensorzero/gateway
    volumes:
      # Mount our tensorzero.toml file into the container
      - ./config:/app/config:ro
    command: --config-file /app/config/tensorzero.toml
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY:?Environment variable OPENAI_API_KEY must be set.}
      S3_ACCESS_KEY_ID: miniouser
      S3_SECRET_ACCESS_KEY: miniopassword
      TENSORZERO_POSTGRES_URL: postgres://postgres:postgres@postgres:5432/tensorzero
    ports:
      - "3000:3000"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    healthcheck:
      test:
        [
          "CMD",
          "wget",
          "--no-verbose",
          "--tries=1",
          "--spider",
          "http://localhost:3000/health",
        ]
      start_period: 1s
      start_interval: 1s
      timeout: 1s
    depends_on:
      postgres:
        condition: service_healthy
      gateway-run-postgres-migrations:
        condition: service_completed_successfully
      minio:
        condition: service_healthy

  # For a production deployment, you can use AWS S3, GCP Cloud Storage, Cloudflare R2, etc.
  minio:
    image: bitnamilegacy/minio:2025.7.23
    ports:
      - "9000:9000" # API port
      - "9001:9001" # Console port
    environment:
      MINIO_ROOT_USER: miniouser
      MINIO_ROOT_PASSWORD: miniopassword
      MINIO_DEFAULT_BUCKETS: tensorzero
    healthcheck:
      test: "mc ls local/tensorzero || exit 1"
      start_period: 30s
      start_interval: 1s
      timeout: 1s

  postgres:
    image: tensorzero/postgres:17
    command: ["postgres", "-c", "cron.database_name=tensorzero"]
    environment:
      POSTGRES_DB: tensorzero
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    ports:
      - "5432:5432"
    volumes:
      - postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d tensorzero"]
      start_period: 30s
      start_interval: 1s
      timeout: 1s

  # Apply Postgres migrations before the gateway starts
  gateway-run-postgres-migrations:
    image: tensorzero/gateway
    environment:
      TENSORZERO_POSTGRES_URL: postgres://postgres:postgres@postgres:5432/tensorzero
    depends_on:
      postgres:
        condition: service_healthy
    command: ["--run-postgres-migrations"]

volumes:
  postgres-data:
```

</Accordion>
See Deploy the TensorZero Gateway and Deploy Postgres for production deployment instructions.
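With the configuration and Compose files in place, you can start the stack. A typical invocation, assuming `docker-compose.yml` and `config/tensorzero.toml` sit in your working directory as laid out above:

```bash
docker compose up
```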
</Step> <Step title="Call LLMs with file inputs">The TensorZero Gateway accepts both embedded files (encoded as base64 strings) and remote files (specified by a URL).
<Tabs> <Tab title="Python (OpenAI SDK)">You can use the OpenAI Python SDK to send images to the TensorZero Gateway.
from openai import OpenAI
client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")
response = client.chat.completions.create(
model="tensorzero::model_name::openai::gpt-4o-mini",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "Do the images share any common features?",
},
# Remote image of Ferris the crab
{
"type": "image_url",
"image_url": {
"url": "https://raw.githubusercontent.com/tensorzero/tensorzero/eac2a230d4a4db1ea09e9c876e45bdb23a300364/tensorzero-core/tests/e2e/providers/ferris.png",
},
},
# One-pixel orange image encoded as a base64 string
{
"type": "image_url",
"image_url": {
"url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAA1JREFUGFdj+O/P8B8ABe0CTsv8mHgAAAAASUVORK5CYII=",
},
},
],
}
],
)
print(response)
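If your image lives on disk rather than at a URL, you can base64-encode it yourself and pass it as a data URL. This is a minimal sketch using only the Python standard library; the filename `image.png` is a placeholder:

```python
import base64

# Read a local file and encode its bytes as a base64 string
with open("image.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

# Use this dictionary as a content block in the `messages` above
local_image_block = {
    "type": "image_url",
    "image_url": {"url": f"data:image/png;base64,{image_base64}"},
}
```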
</Tab>

<Tab title="Node (OpenAI SDK)">

You can use the OpenAI Node SDK to send images to the TensorZero Gateway.

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3000/openai/v1",
  apiKey: "not-used",
});

const response = await client.chat.completions.create({
  model: "tensorzero::model_name::openai::gpt-4o-mini",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Do the images share any common features?",
        },
        // Remote image of Ferris the crab
        {
          type: "image_url",
          image_url: {
            url: "https://raw.githubusercontent.com/tensorzero/tensorzero/eac2a230d4a4db1ea09e9c876e45bdb23a300364/tensorzero-core/tests/e2e/providers/ferris.png",
          },
        },
        // One-pixel orange image encoded as a base64 string
        {
          type: "image_url",
          image_url: {
            url: "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAA1JREFUGFdj+O/P8B8ABe0CTsv8mHgAAAAASUVORK5CYII=",
          },
        },
      ],
    },
  ],
});

console.dir(response, { depth: null });
```
</Tab>

<Tab title="HTTP">

You can also make requests directly using the OpenAI-compatible HTTP API.

```bash
curl -X POST http://localhost:3000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tensorzero::model_name::openai::gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Do the images share any common features?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://raw.githubusercontent.com/tensorzero/tensorzero/eac2a230d4a4db1ea09e9c876e45bdb23a300364/tensorzero-core/tests/e2e/providers/ferris.png"
            }
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAA1JREFUGFdj+O/P8B8ABe0CTsv8mHgAAAAASUVORK5CYII="
            }
          }
        ]
      }
    ]
  }'
```

</Tab>

</Tabs>
See Integrations for a list of providers that support multimodal inference.
</Step>

</Steps>

When working with image files, you can optionally specify a `detail` parameter to control the fidelity of image processing.
This parameter accepts three values: `low`, `high`, or `auto`.
The `detail` parameter only applies to image files and is ignored for other file types like PDFs or audio files.
Using `low` detail reduces token consumption and processing time at the cost of image quality, while `high` detail provides better image quality but consumes more tokens.
The `auto` setting allows the model provider to automatically choose the appropriate detail level based on the image characteristics.
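For example, with the OpenAI Python SDK, `detail` can be set on the `image_url` object following the OpenAI image input shape. This is a minimal sketch reusing the `client` and image URL from the Python example above:

```python
response = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://raw.githubusercontent.com/tensorzero/tensorzero/eac2a230d4a4db1ea09e9c876e45bdb23a300364/tensorzero-core/tests/e2e/providers/ferris.png",
                        "detail": "low",  # one of "low", "high", or "auto"
                    },
                },
            ],
        }
    ],
)
```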
By default, the TensorZero Gateway forwards remote file URLs directly to the model provider and fetches them separately for observability in parallel with inference. This means that in rare cases, the file the model provider fetches may differ from the one TensorZero stores (e.g. if the file at the URL changes between the two fetches).
To ensure that TensorZero and the model provider see identical inputs, you can set `gateway.fetch_and_encode_input_files_before_inference = true` in your configuration.
When enabled, the gateway will fetch remote input files and send them as base64-encoded payloads in the prompt.
This is recommended if you require strict observability and reproducibility.
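In the configuration file, that looks like:

```toml
[gateway]
fetch_and_encode_input_files_before_inference = true
```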
See Configuration Reference for more details.