docs/quickstart.mdx
This Quickstart guide shows how we'd upgrade an OpenAI wrapper to a minimal TensorZero deployment with built-in observability and fine-tuning capabilities in just 5 minutes. From there, you can take advantage of dozens of features to build best-in-class LLM applications.
This Quickstart offers a tour of TensorZero's features. If you're only interested in inference with the gateway, see the shorter How to call any LLM guide.
<Tip>
You can also find the runnable code for this example on GitHub.
</Tip>

Imagine we're building an LLM application that writes haikus.
Today, our integration with OpenAI might look like this:
<Tabs>
<Tab title="Python">

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": "Write a haiku about TensorZero.",
        }
    ],
)

print(response)
```
Sample response:

```python
ChatCompletion(
    id='chatcmpl-A5wr5WennQNF6nzF8gDo3SPIVABse',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content='Silent minds awaken, \nPatterns dance in code and wire, \nDreams of thought unfold.',
                role='assistant',
                function_call=None,
                tool_calls=None,
                refusal=None
            )
        )
    ],
    created=1725981243,
    model='gpt-4o-mini',
    object='chat.completion',
    system_fingerprint='fp_483d39d857',
    usage=CompletionUsage(
        completion_tokens=19,
        prompt_tokens=22,
        total_tokens=41
    )
)
```

</Tab>
<Tab title="Node">

```typescript
import OpenAI from "openai";

const client = new OpenAI();

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    {
      role: "user",
      content: "Write a haiku about TensorZero.",
    },
  ],
});

console.log(JSON.stringify(response, null, 2));
```

</Tab>
<Tab title="HTTP">

```bash
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Write a haiku about TensorZero."
      }
    ]
  }'
```

</Tab>
</Tabs>
TensorZero offers dozens of features covering inference, observability, optimization, evaluations, and experimentation.
But the absolute minimal setup requires just a simple configuration file: `tensorzero.toml`.
```toml
# A function defines the task we're tackling (e.g. generating a haiku)...
[functions.generate_haiku]
type = "chat"

# ... and a variant is one of many implementations we can use to tackle it (a choice of prompt, model, etc.).
# Since we only have one variant for this function, the gateway will always use it.
[functions.generate_haiku.variants.gpt_4o_mini]
type = "chat_completion"
model = "openai::gpt-4o-mini"
```
This minimal configuration file tells the TensorZero Gateway everything it needs to replicate our original OpenAI call.
<Tip>
Using the shorthand `openai::gpt-4o-mini` notation is convenient for getting started.
To learn about all configuration options, including schemas, templates, and advanced variant types, see Configure functions and variants.
For production deployments with multiple providers, routing, and fallbacks, see Configure models and providers.
</Tip>
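For example, when you outgrow the shorthand, you can define the model and its provider explicitly in `tensorzero.toml`. Here's a rough sketch of what that might look like; treat the exact field names as illustrative and check the configuration reference before relying on them:

```toml
# Illustrative sketch: define the model and provider explicitly
# instead of using the openai::gpt-4o-mini shorthand.
[models.gpt_4o_mini]
routing = ["openai"]  # providers to try, in order

[models.gpt_4o_mini.providers.openai]
type = "openai"
model_name = "gpt-4o-mini"
```

The variant would then reference the model by name (`model = "gpt_4o_mini"`) instead of the shorthand string.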
We're almost ready to start making API calls. Let's launch TensorZero.
Before launching, we need to set the `OPENAI_API_KEY` environment variable, place our `tensorzero.toml` in the `./config` directory, and create a `docker-compose.yml` file.

This Docker Compose configuration sets up a development Postgres database, the TensorZero Gateway, and the TensorZero UI. You can download it with:

```bash
curl -LO "https://raw.githubusercontent.com/tensorzero/tensorzero/refs/heads/main/examples/docs/guides/quickstart/docker-compose.yml"
```
```yaml
# This is a simplified example for learning purposes. Do not use this in production.
# For production-ready deployments, see: https://www.tensorzero.com/docs/gateway/deployment

services:
  gateway:
    image: tensorzero/gateway
    volumes:
      # Mount our tensorzero.toml file into the container
      - ./config:/app/config:ro
    command: --config-file /app/config/tensorzero.toml
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY:?Environment variable OPENAI_API_KEY must be set.}
      TENSORZERO_POSTGRES_URL: postgres://postgres:postgres@postgres:5432/tensorzero
    ports:
      - "3000:3000"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    healthcheck:
      test:
        [
          "CMD",
          "wget",
          "--no-verbose",
          "--tries=1",
          "--spider",
          "http://localhost:3000/health",
        ]
      start_period: 10s
      start_interval: 1s
      timeout: 1s
    depends_on:
      postgres:
        condition: service_healthy
      gateway-run-postgres-migrations:
        condition: service_completed_successfully

  ui:
    image: tensorzero/ui
    environment:
      TENSORZERO_GATEWAY_URL: http://gateway:3000
    ports:
      - "4000:4000"
    depends_on:
      gateway:
        condition: service_healthy

  postgres:
    image: tensorzero/postgres:17
    command: ["postgres", "-c", "cron.database_name=tensorzero"]
    environment:
      POSTGRES_DB: tensorzero
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    ports:
      - "5432:5432"
    volumes:
      - postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d tensorzero"]
      start_period: 30s
      start_interval: 1s
      timeout: 1s

  # Apply Postgres migrations before the gateway starts
  gateway-run-postgres-migrations:
    image: tensorzero/gateway
    environment:
      TENSORZERO_POSTGRES_URL: postgres://postgres:postgres@postgres:5432/tensorzero
    depends_on:
      postgres:
        condition: service_healthy
    command: ["--run-postgres-migrations"]

volumes:
  postgres-data:
```
Our setup should look like this:

- `config/`
  - `tensorzero.toml`
- `after.*` (see below)
- `before.*`
- `docker-compose.yml`
Let's launch everything!

```bash
docker compose up
```
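Once the containers are up, you can sanity-check the gateway using the same health endpoint that the Docker Compose healthcheck polls:

```bash
# The gateway should respond once it's healthy
curl http://localhost:3000/health
```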
The gateway will replicate our original OpenAI call and store the data in our database, with less than 1ms latency overhead thanks to Rust 🦀.
The TensorZero Gateway is compatible with the OpenAI SDK, so you can make inference calls using the OpenAI client for Python, Node, and other languages.
<Tabs>
<Tab title="Python">

Point your OpenAI client to the TensorZero Gateway:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::generate_haiku",
    messages=[
        {
            "role": "user",
            "content": "Write a haiku about TensorZero.",
        }
    ],
)

print(response)
```
Sample response:

```python
ChatCompletion(
    id='0194061e-2211-7a90-9087-1c255d060b59',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content='Circuit dreams awake, \nSilent minds in metal form— \nWisdom coded deep.',
                refusal=None,
                role='assistant',
                audio=None,
                function_call=None,
                tool_calls=[]
            )
        )
    ],
    created=1735269425,
    model='gpt_4o_mini',
    object='chat.completion',
    service_tier=None,
    system_fingerprint='',
    usage=CompletionUsage(
        completion_tokens=18,
        prompt_tokens=15,
        total_tokens=33,
        completion_tokens_details=None,
        prompt_tokens_details=None
    ),
    episode_id='0194061e-1fab-7411-9931-576b067cf0c5'
)
```

</Tab>
<Tab title="Node">

Point your OpenAI client to the TensorZero Gateway:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3000/openai/v1",
  apiKey: "not-used",
});

const response = await client.chat.completions.create({
  model: "tensorzero::function_name::generate_haiku",
  messages: [
    {
      role: "user",
      content: "Write a haiku about TensorZero.",
    },
  ],
});

console.log(JSON.stringify(response, null, 2));
```
Sample response:

```json
{
  "id": "01958633-3f56-7d33-8776-d209f2e4963a",
  "episode_id": "01958633-3f56-7d33-8776-d2156dd1c44b",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "content": "Wires pulse with knowledge, \nDreams crafted in circuits hum, \nMind of code awakes. ",
        "tool_calls": [],
        "role": "assistant"
      }
    }
  ],
  "created": 1741713261,
  "model": "gpt_4o_mini",
  "system_fingerprint": "",
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 23,
    "total_tokens": 38
  }
}
```

</Tab>
<Tab title="HTTP">

Make an HTTP request to the TensorZero Gateway:

```bash
curl http://localhost:3000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tensorzero::function_name::generate_haiku",
    "messages": [
      {
        "role": "user",
        "content": "Write a haiku about TensorZero."
      }
    ]
  }'
```
Sample response:

```json
{
  "id": "019cd8d7-1a7d-7d50-ac23-54da47daba7b",
  "episode_id": "019cd8d7-1a7d-7d50-ac23-54e783e7833b",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "In silent circuits, \nTensorZero weaves threads, \nAI's quiet bloom."
      }
    }
  ],
  "created": 1773164502,
  "model": "gpt_4o_mini",
  "system_fingerprint": "",
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 17,
    "total_tokens": 32
  }
}
```

</Tab>
</Tabs>
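Notice the `episode_id` field in the responses above: TensorZero groups related inferences into episodes (e.g. multi-step workflows). As a sketch, here's how you might continue an episode from Python, assuming the OpenAI-compatible endpoint accepts a `tensorzero::episode_id` field in the request body (check the API reference for the exact parameter name):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

# First inference: the gateway starts a fresh episode
first = client.chat.completions.create(
    model="tensorzero::function_name::generate_haiku",
    messages=[{"role": "user", "content": "Write a haiku about TensorZero."}],
)

# Follow-up inference in the same episode.
# Assumption: `tensorzero::episode_id` is the extra body field the
# OpenAI-compatible endpoint expects; verify against the TensorZero docs.
followup = client.chat.completions.create(
    model="tensorzero::function_name::generate_haiku",
    messages=[{"role": "user", "content": "Write another haiku about Rust."}],
    extra_body={"tensorzero::episode_id": first.episode_id},
)
```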
The TensorZero UI streamlines LLM engineering workflows like observability and optimization (e.g. fine-tuning).
The Docker Compose file we used above also launched the TensorZero UI.
You can visit the UI at http://localhost:4000.
The TensorZero UI provides a dashboard for observability data. We can inspect data about individual inferences, entire functions, and more.
<Tip>
This guide is pretty minimal, so the observability data is simple. Once we start using more advanced features like feedback and variants, the observability UI will enable us to track metrics, experiments (A/B tests), and more.
</Tip>

The TensorZero UI also provides a workflow for fine-tuning models like GPT-4o and Llama 3.
With a few clicks, you can launch a fine-tuning job.
Once the job is complete, the TensorZero UI will provide a configuration snippet you can add to your `tensorzero.toml`.
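For illustration, such a snippet might look roughly like the following, where the variant name and fine-tuned model ID are hypothetical placeholders:

```toml
# Hypothetical example: replace the model ID below with the one the UI gives you
[functions.generate_haiku.variants.gpt_4o_mini_fine_tuned]
type = "chat_completion"
model = "openai::ft:gpt-4o-mini-2024-07-18:your-org::abc123"
```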
<Tip>
We can also send metrics & feedback to the TensorZero Gateway. This data is used to curate better datasets for fine-tuning and other optimization workflows. Since we haven't sent any feedback yet, the TensorZero UI will skip the curation step before fine-tuning.
</Tip>

The Quickstart guide gives a tiny taste of what TensorZero is capable of.
We strongly encourage you to check out the guides on metrics & feedback and prompt templates & schemas. Though optional, they unlock many of the downstream features TensorZero offers in experimentation and optimization.
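For a taste, here's a sketch of what sending feedback could look like against the gateway's feedback endpoint. Both the metric name (`haiku_rating`) and the exact payload shape are illustrative assumptions; see the metrics & feedback guide for the real setup:

```python
import requests

# Assumes tensorzero.toml defines a hypothetical boolean metric, e.g.:
#   [metrics.haiku_rating]
#   type = "boolean"
#   optimize = "max"
#   level = "inference"
response = requests.post(
    "http://localhost:3000/feedback",
    json={
        "metric_name": "haiku_rating",
        # inference id returned by an earlier generate_haiku call
        "inference_id": "0194061e-2211-7a90-9087-1c255d060b59",
        "value": True,
    },
)
response.raise_for_status()
print(response.json())
```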
From here, you can explore features like built-in support for inference-time optimizations, retries & fallbacks, experimentation (A/B testing) with prompts and models, and a lot more.