API Reference: Inference (OpenAI-Compatible)

POST /openai/v1/chat/completions

The /openai/v1/chat/completions endpoint allows TensorZero users to make TensorZero inferences with the OpenAI client. The gateway translates the OpenAI request parameters into the arguments expected by the inference endpoint and calls the same underlying implementation. This endpoint supports most of the features of the inference endpoint, but there are some limitations. Most notably, this endpoint doesn't accept credentials through the OpenAI client's API key, so dynamic credentials must be specified with a different method (the tensorzero::credentials field described below).

<Tip>

See the API Reference for POST /inference for more details on inference with the native TensorZero API.

</Tip>

Request

The OpenAI-compatible inference endpoints translate the OpenAI request parameters into the arguments expected by the inference endpoint.

TensorZero-specific parameters are prefixed with tensorzero:: (e.g. tensorzero::episode_id). These fields should be provided as extra body parameters in the request body.

<Warning>

The gateway will use the credentials specified in the tensorzero.toml file. In most cases, these credentials will be environment variables available to the TensorZero gateway — not your OpenAI client.

API keys sent from the OpenAI client will be ignored.

</Warning>

tensorzero::cache_options

  • Type: object
  • Required: no

Controls caching behavior for inference requests. This object accepts two fields:

  • enabled (string): The cache mode. Can be one of:
    • "write_only" (default): Only write to cache but don't serve cached responses
    • "read_only": Only read from cache but don't write new entries
    • "on": Both read from and write to cache
    • "off": Disable caching completely
  • max_age_s (integer or null): Maximum age in seconds for cache entries to be considered valid when reading from cache. Does not set a TTL for cache expiration. Default is null (no age limit).

When using the OpenAI client libraries, pass this parameter via extra_body.

See the Inference Caching guide for more details.
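
For example, here is a minimal sketch that enables cache reads and writes with a one-hour age limit (the gateway URL and model name are illustrative):

python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/openai/v1",
    api_key="your_api_key",
)

response = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-5-mini",
    messages=[{"role": "user", "content": "Tell me a fun fact."}],
    extra_body={
        "tensorzero::cache_options": {
            "enabled": "on",  # read from and write to the cache
            "max_age_s": 3600,  # only serve cache entries younger than one hour
        },
    },
)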

tensorzero::credentials

  • Type: object (a map from dynamic credential names to API keys)
  • Required: no (default: no credentials)

Each model provider in your TensorZero configuration can be configured to accept credentials at inference time by using the dynamic location (e.g. dynamic::my_dynamic_api_key_name); see the configuration reference for more details. The gateway expects these credentials in the tensorzero::credentials field of the request body, as specified below. The gateway will return a 400 error if a model provider has been configured with dynamic credentials and they are not provided.

<Accordion title="Example">
toml
[models.my_model_name.providers.my_provider_name]
# ...
# Note: the name of the credential field (e.g. `api_key_location`) depends on the provider type
api_key_location = "dynamic::my_dynamic_api_key_name"
# ...
json
{
  // ...
  "tensorzero::credentials": {
    // ...
    "my_dynamic_api_key_name": "sk-..."
    // ...
  }
  // ...
}
</Accordion>
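
For example, a minimal sketch of supplying the dynamic credential from the configuration above at inference time via extra_body (the model and credential names are taken from that example):

python
response = client.chat.completions.create(
    model="tensorzero::model_name::my_model_name",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "tensorzero::credentials": {
            # the key must match the `dynamic::` location in the provider config
            "my_dynamic_api_key_name": "sk-...",
        },
    },
)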

tensorzero::deny_unknown_fields

  • Type: boolean
  • Required: no (default: false)

If true, the gateway will return an error if the request contains any unknown or unrecognized fields. By default, unknown fields are ignored with a warning logged.

This field does not affect the tensorzero::extra_body field, only unknown fields at the root of the request body.

This field should be provided as an extra body parameter in the request body.

python
response = oai.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-5-mini",
    messages=[
        {
            "role": "user",
            "content": "Tell me a fun fact.",
        }
    ],
    extra_body={
        "tensorzero::deny_unknown_fields": True,
        "ultrathink": True,  # made-up field → `deny_unknown_fields` rejects this request
    },
)

tensorzero::dryrun

  • Type: boolean
  • Required: no

If true, the inference request will be executed but won't be stored to the database. The gateway will still call the downstream model providers.

This field is primarily for debugging and testing, and you should generally not use it in production.

This field should be provided as an extra body parameter in the request body.
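
For example (a minimal sketch; the model name is illustrative):

python
response = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-5-mini",
    messages=[{"role": "user", "content": "Tell me a fun fact."}],
    extra_body={
        "tensorzero::dryrun": True,  # run the inference but skip database storage
    },
)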

tensorzero::episode_id

  • Type: UUID
  • Required: no

The ID of an existing episode to associate the inference with. If null, the gateway will generate a new episode ID and return it in the response. See Episodes for more information.

This field should be provided as an extra body parameter in the request body.

tensorzero::extra_body

  • Type: array of objects (see below)
  • Required: no

The tensorzero::extra_body field allows you to modify the request body that TensorZero sends to a model provider. This advanced feature is an "escape hatch" that lets you use provider-specific functionality that TensorZero hasn't implemented yet.

<Warning>

The OpenAI SDKs generally also support such functionality.

If you use the OpenAI SDK's extra_body field, it will override the request from the client to the gateway. If you use tensorzero::extra_body, it will override the request from the gateway to the model provider.

</Warning>

Each object in the array must have two or three fields:

  • pointer: A JSON Pointer string specifying where to modify the request body
    • Use - as the final path element to append to an array (e.g., /messages/- appends to messages)
  • One of the following:
    • value: The value to insert at that location; it can be of any type including nested types
    • delete = true: Deletes the field at the specified location, if present.
  • Optional: If one of the following is specified, the modification will only be applied to the specified variant, model, or model provider. If none of these is specified, the modification applies to all model inferences.
    • variant_name
    • model_name
    • model_name and provider_name
<Tip>

You can also set extra_body in the configuration file. The values provided at inference-time take priority over the values in the configuration file.

</Tip> <Accordion title="Example">

If TensorZero would normally send this request body to the provider...

json
{
  "project": "tensorzero",
  "safety_checks": {
    "no_internet": false,
    "no_agi": true
  }
}

...then the following extra_body in the inference request...

json
{
  // ...
  "tensorzero::extra_body": [
    {
      "variant_name": "my_variant", // or "model_name": "my_model", "provider_name": "my_provider"
      "pointer": "/agi",
      "value": true
    },
    {
      // No `variant_name` or `model_name`/`provider_name` specified, so it applies to all variants and providers
      "pointer": "/safety_checks/no_agi",
      "value": {
        "bypass": "on"
      }
    }
  ]
}

...overrides the request body to:

json
{
  "agi": true,
  "project": "tensorzero",
  "safety_checks": {
    "no_internet": false,
    "no_agi": {
      "bypass": "on"
    }
  }
}
</Accordion>

tensorzero::extra_headers

  • Type: array of objects (see below)
  • Required: no

The tensorzero::extra_headers field allows you to modify the request headers that TensorZero sends to a model provider. This advanced feature is an "escape hatch" that lets you use provider-specific functionality that TensorZero hasn't implemented yet.

<Warning>

The OpenAI SDKs generally also support such functionality.

If you use the OpenAI SDK's extra_headers field, it will override the request from the client to the gateway. If you use tensorzero::extra_headers, it will override the request from the gateway to the model provider.

</Warning>

Each object in the array must have two or three fields:

  • name: The name of the header to modify
  • value: The value to set the header to
  • Optional: If one of the following is specified, the modification will only be applied to the specified variant, model, or model provider. If none of these is specified, the modification applies to all model inferences.
    • variant_name
    • model_name
    • model_name and provider_name
<Tip>

You can also set extra_headers in the configuration file. The values provided at inference-time take priority over the values in the configuration file.

</Tip> <Accordion title="Example">

If TensorZero would normally send the following request headers to the provider...

text
Safety-Checks: on

...then the following extra_headers...

json
{
  "extra_headers": [
    {
      "variant_name": "my_variant", // or "model_name": "my_model", "provider_name": "my_provider"
      "name": "Safety-Checks",
      "value": "off"
    },
    {
      // No `variant_name` or `model_name`/`provider_name` specified, so it applies to all variants and providers
      "name": "Intelligence-Level",
      "value": "AGI"
    }
  ]
}

...overrides the request headers so that Safety-Checks is set to off only for my_variant, while Intelligence-Level: AGI is applied globally to all variants and providers:

text
Safety-Checks: off
Intelligence-Level: AGI
</Accordion>

tensorzero::include_raw_response

  • Type: boolean
  • Required: no

If true, the raw responses from all model inferences will be included in the response in the tensorzero_raw_response field as an array.

See tensorzero_raw_response in the response section for more details.
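
For example, a minimal sketch (note that the OpenAI Python SDK surfaces non-standard top-level response fields through pydantic's model_extra):

python
response = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-5-mini",
    messages=[{"role": "user", "content": "Tell me a fun fact."}],
    extra_body={"tensorzero::include_raw_response": True},
)

# Each entry has `model_inference_id`, `provider_type`, and `data` fields
raw_responses = (response.model_extra or {}).get("tensorzero_raw_response")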

tensorzero::include_raw_usage

  • Type: boolean
  • Required: no

If true, the response's usage object will include a tensorzero_raw_usage field containing an array of raw provider-specific usage data from each model inference.

This is useful for accessing provider-specific usage fields that TensorZero normalizes away, such as OpenAI's reasoning_tokens.

For streaming requests, this requires stream_options.include_usage to be true (or omitted, in which case it will be automatically enabled).

See tensorzero_raw_usage in the response section for more details.
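
For example, a minimal sketch mirroring the one above (again relying on the SDK's model_extra for non-standard fields):

python
response = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-5-mini",
    messages=[{"role": "user", "content": "Tell me a fun fact."}],
    extra_body={"tensorzero::include_raw_usage": True},
)

# Raw provider usage objects, one entry per model inference
raw_usage = (response.usage.model_extra or {}).get("tensorzero_raw_usage")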

tensorzero::namespace

  • Type: string
  • Required: no

Selects a namespace-specific experimentation config for this request. If the function has a matching namespace config, it will be used instead of the base experimentation config. If no matching config exists, the base config is used as a fallback.

The namespace is also validated against namespaced models: if the selected variant uses a model scoped to a different namespace, the request will fail.

The value must be a non-empty string. It is stored as the tensorzero::namespace tag on the inference record. Behaves the same as the namespace parameter in the native API.

Provide this as an extra body parameter:

python
response = client.chat.completions.create(
    model="tensorzero::function_name::draft_email",
    messages=[...],
    extra_body={
        "tensorzero::namespace": "mobile",
    },
)

See Scope experiments with namespaces for a full guide.

tensorzero::params

  • Type: object
  • Required: no

Allows you to override inference parameters dynamically at request time.

This field accepts an object with a chat_completion field containing any of the following parameters:

  • frequency_penalty (float): Penalizes tokens based on their frequency
  • json_mode (object): Controls JSON output formatting
  • max_tokens (integer): Maximum number of tokens to generate
  • presence_penalty (float): Penalizes tokens based on their presence
  • reasoning_effort (string): Effort level for reasoning models
  • seed (integer): Random seed for deterministic outputs
  • service_tier (string): Service tier for the request
  • stop_sequences (list of strings): Sequences that stop generation
  • temperature (float): Controls randomness in the output
  • thinking_budget_tokens (integer): Token budget for thinking/reasoning
  • top_p (float): Nucleus sampling parameter
  • verbosity (string): Output verbosity level
<Note>

When using the OpenAI-compatible endpoint, values specified in tensorzero::params take precedence over parameters provided directly in the request body (e.g., top-level temperature, max_tokens) or inferred from other fields (e.g., json_mode inferred from response_format).

</Note> <Accordion title="Example">
python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/openai/v1",
    api_key="your_api_key",
)

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
    extra_body={
        "tensorzero::params": {
            "chat_completion": {
                "temperature": 0.7,
                "max_tokens": 500,
                "reasoning_effort": "high"
            }
        }
    }
)
</Accordion>

tensorzero::provider_tools

  • Type: array of objects
  • Required: no (default: [])

A list of provider-specific built-in tools that can be used by the model during inference. These are tools that run server-side on the provider's infrastructure, such as OpenAI's web search tool.

Each object in the array has the following fields:

  • scope (object, optional): Limits which model/provider combination can use this tool. If omitted, the tool is available to all compatible providers.
    • model_name (string): The model name as defined in your configuration
    • provider_name (string, optional): The provider name for that model. If omitted, the tool is available to all providers for the specified model.
  • tool (object, required): The provider-specific tool configuration as defined by the provider's API

When using OpenAI client libraries, pass this parameter via extra_body.

This field allows for dynamic provider tool configuration at runtime. You should prefer to define provider tools in the configuration file if possible (see Configuration Reference). Only use this field if dynamic provider tool configuration is necessary for your use case.

<Accordion title="Example: OpenAI Web Search (Unscoped)">
python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/openai/v1",
    api_key="your_api_key",
)

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",
    messages=[
        {"role": "user", "content": "What were the latest developments in AI this week?"}
    ],
    extra_body={
        "tensorzero::provider_tools": [
            {
                "tool": {
                    "type": "web_search"
                }
            }
        ]
    }
)

This makes the web search tool available to all compatible providers configured for the function.

</Accordion> <Accordion title="Example: OpenAI Web Search (Scoped)">
python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/openai/v1",
    api_key="your_api_key",
)

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",
    messages=[
        {"role": "user", "content": "What were the latest developments in AI this week?"}
    ],
    extra_body={
        "tensorzero::provider_tools": [
            {
                "scope": {
                    "model_name": "gpt-5-mini",
                    "provider_name": "openai"
                },
                "tool": {
                    "type": "web_search"
                }
            }
        ]
    }
)

This makes the web search tool available only to the OpenAI provider for the gpt-5-mini model.

</Accordion>

tensorzero::tags

  • Type: flat JSON object with string keys and values
  • Required: no

User-provided tags to associate with the inference.

For example, {"user_id": "123"} or {"author": "Alice"}.
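
A minimal sketch attaching a user ID tag via extra_body (the function name is illustrative):

python
response = client.chat.completions.create(
    model="tensorzero::function_name::draft_email",
    messages=[{"role": "user", "content": "Draft a follow-up email."}],
    extra_body={
        "tensorzero::tags": {"user_id": "123"},
    },
)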

frequency_penalty

  • Type: float
  • Required: no (default: null)

Penalizes new tokens based on their frequency in the text so far if positive, encourages them if negative. Overrides the frequency_penalty setting for any chat completion variants being used.

max_completion_tokens

  • Type: integer
  • Required: no (default: null)

Limits the number of tokens that can be generated by the model in a chat completion variant. If both this and max_tokens are set, the smaller value is used.

max_tokens

  • Type: integer
  • Required: no (default: null)

Limits the number of tokens that can be generated by the model in a chat completion variant. If both this and max_completion_tokens are set, the smaller value is used.

messages

  • Type: list
  • Required: yes

A list of messages to provide to the model.

Each message is an object with the following fields:

  • role (required): The role of the message sender in an OpenAI message (assistant, system, tool, or user).
  • content (required for user and system messages and optional for assistant and tool messages): The content of the message. The content must be either a string or an array of content blocks (see below).
  • tool_calls (optional for assistant messages, otherwise disallowed): A list of tool calls. Each tool call is an object with the following fields:
    • id: A unique identifier for the tool call
    • type: The type of tool being called (currently only "function" is supported)
    • function: An object containing:
      • name: The name of the function to call
      • arguments: A JSON string containing the function arguments
  • tensorzero_extra_content (optional for assistant messages, otherwise disallowed): An array of extra content blocks to include in the message. This is used to round-trip non-text content blocks (such as thoughts from reasoning models) that were returned in a previous response. See tensorzero_extra_content in the response section for more details. Each block has the following fields:
    • type: The type of the extra content block ("thought" or "unknown")
    • insert_index (optional): The position in the message's content array where this block should be inserted. If omitted, the block is prepended to the beginning of the content.
    • Additional fields depending on the type (e.g. text, signature for "thought" blocks; data for "unknown" blocks).
  • tool_call_id (required for tool messages, otherwise disallowed): The ID of the tool call to associate with the message. This should be one that was originally returned by the gateway in a tool call id field.

A content block is an object that can have type text, image_url, input_audio, or a TensorZero-specific type.

If the content block has type text, it must have exactly one of the following additional fields:

  • text: The text for the content block.
  • tensorzero::arguments: A JSON object containing the function arguments for TensorZero functions with templates and schemas (see Create a prompt template for details).

If a content block has type image_url, it must have the following additional fields:

  • "image_url": A JSON object with the following fields:
    • url: The URL for a remote image (e.g. "https://example.com/image.png") or base64-encoded data for an embedded image (e.g. "data:image/png;base64,...").
    • detail (optional): Controls the fidelity of image processing. Only applies to image files; ignored for other file types. Can be low, high, or auto. Affects token consumption and image quality.

If a content block has type input_audio, it must have the following additional field:

  • input_audio: An object containing:
    • data: Base64-encoded audio data (without a data: prefix or MIME type header).
    • format: The audio format as a string (e.g., "mp3", "wav"). Note: The MIME type is detected from the actual audio bytes, and a warning is logged if the detected type differs from this field.

The TensorZero-specific content block types are:

  • tensorzero::raw_text: Bypasses templates and schemas, sending text directly to the model. Useful for testing prompts or dynamic injection without configuration changes. Must have a value field containing the text. Only works for user and assistant messages; for system messages, it is treated as plain text and will not bypass templates.
  • tensorzero::template: Explicitly specify a template to use. Must have name and arguments fields.
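
For example, a minimal sketch that bypasses templates with a tensorzero::raw_text content block (the function name is illustrative):

python
response = client.chat.completions.create(
    model="tensorzero::function_name::draft_email",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "tensorzero::raw_text",
                    # sent to the model verbatim, skipping templates and schemas
                    "value": "Write a haiku about TensorZero.",
                }
            ],
        }
    ],
)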

model

  • Type: string
  • Required: yes

The name of the TensorZero function or model being called, with the appropriate prefix.

<table> <tbody> <tr> <td width="50%"> <b>To call...</b> </td> <td width="50%"> <b>Use this format...</b> </td> </tr> <tr> <td width="50%"> A function defined as `[functions.my_function]` in your `tensorzero.toml` configuration file </td> <td width="50%">`tensorzero::function_name::my_function`</td> </tr> <tr> <td width="50%"> A model defined as `[models.my_model]` in your `tensorzero.toml` configuration file </td> <td width="50%">`tensorzero::model_name::my_model`</td> </tr> <tr> <td width="50%"> A model offered by a model provider, without defining it in your `tensorzero.toml` configuration file (if supported, see below) </td> <td width="50%"> `tensorzero::model_name::{provider_type}::{model_name}` </td> </tr> </tbody> </table> <Tip>

The following model providers support short-hand model names: anthropic, deepseek, fireworks, gcp_vertex_anthropic, gcp_vertex_gemini, google_ai_studio_gemini, groq, hyperbolic, mistral, openai, openrouter, together, and xai.

</Tip>

For example, if you have the following configuration:

toml
[models.gpt-4o]
routing = ["openai", "azure"]

[models.gpt-4o.providers.openai]
# ...

[models.gpt-4o.providers.azure]
# ...

[functions.extract-data]
# ...

Then:

  • tensorzero::function_name::extract-data calls the extract-data function defined above.
  • tensorzero::model_name::gpt-4o calls the gpt-4o model in your configuration, which supports fallback from openai to azure. See Retries & Fallbacks for details.
  • tensorzero::model_name::openai::gpt-4o calls the OpenAI API directly for the gpt-4o model, ignoring the gpt-4o model defined above.
<Warning>

Be careful about the different prefixes: tensorzero::model_name::gpt-4o will use the [models.gpt-4o] model defined in the tensorzero.toml file, whereas tensorzero::model_name::openai::gpt-4o will call the OpenAI API directly for the gpt-4o model.

</Warning>
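
For example, a minimal sketch contrasting the two prefixes (assuming the configuration above):

python
# Uses [models.gpt-4o] from tensorzero.toml, including its openai → azure fallback
configured = client.chat.completions.create(
    model="tensorzero::model_name::gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Calls the OpenAI API directly via the short-hand name, ignoring [models.gpt-4o]
shorthand = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)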

parallel_tool_calls

  • Type: boolean
  • Required: no (default: null)

Overrides the parallel_tool_calls setting for the function being called.

presence_penalty

  • Type: float
  • Required: no (default: null)

Penalizes new tokens based on whether they appear in the text so far if positive, encourages them if negative. Overrides the presence_penalty setting for any chat completion variants being used.

response_format

  • Type: either a string or an object
  • Required: no (default: null)

The supported values are "text", "json_object", and an object of the form {"type": "json_schema", "schema": ...}, where the schema field contains a valid JSON Schema. This field is ignored except for the json_schema variant, where the schema field can be used to dynamically set the output schema for a JSON function.

seed

  • Type: integer
  • Required: no (default: null)

Overrides the seed setting for any chat completion variants being used.

stop_sequences

  • Type: list of strings
  • Required: no (default: null)

Overrides the stop_sequences setting for any chat completion variants being used.

stream

  • Type: boolean
  • Required: no (default: false)

If true, the gateway will stream the response to the client in an OpenAI-compatible format.

stream_options

  • Type: object with field "include_usage"
  • Required: no (default: null)

If "include_usage" is true, the gateway will include usage information in the response.

<Accordion title="Example">

If the following stream_options is provided...

json
{
  // ...
  "stream_options": {
    "include_usage": true
  }
  // ...
}

...then the gateway will include usage information in the response.

json
{
  // ...
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "total_tokens": 579,
    "tensorzero_cost": 0.0003
  }
  // ...
}
</Accordion>

temperature

  • Type: float
  • Required: no (default: null)

Overrides the temperature setting for any chat completion variants being used.

tools

  • Type: list of tool objects (see below)
  • Required: no (default: null)

Allows the user to dynamically specify tools at inference time in addition to those that are specified in the configuration.

Function Tools

Function tools are the typical tools used with LLMs. Each function tool object has the following structure:

  • type: Must be "function"
  • function: An object containing:
    • name: The name of the function (string, required)
    • description: A description of what the function does (string, optional)
    • parameters: A JSON Schema object describing the function's parameters (required)
    • strict: Whether to enforce strict schema validation (boolean, defaults to false)

OpenAI Custom Tools

<Warning>

OpenAI custom tools are only supported by OpenAI models (both Chat Completions and Responses APIs). Using custom tools with other providers will result in an error.

</Warning>

OpenAI custom tools support alternative output formats beyond JSON Schema, such as freeform text or grammar-constrained output.

Each custom tool object has the following structure:

  • type: Must be "custom"
  • custom: An object containing:
    • name: The name of the tool (string, required)
    • description: A description of what the tool does (string, optional)
    • format: The output format for the tool (object, optional):
      • {"type": "text"}: Freeform text output
      • {"type": "grammar", "grammar": {"syntax": "lark", "definition": "..."}}: Output constrained by a Lark grammar
      • {"type": "grammar", "grammar": {"syntax": "regex", "definition": "..."}}: Output constrained by a regular expression
<Accordion title="Example: OpenAI Custom Tool">
python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/openai/v1",
    api_key="your_api_key",
)

response = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-5-mini",
    messages=[
        {"role": "user", "content": "Generate Python code to print 'Hello, World!'"}
    ],
    tools=[
        {
            "type": "custom",
            "custom": {
                "name": "code_generator",
                "description": "Generates Python code snippets",
                "format": {"type": "text"}
            }
        }
    ],
)
</Accordion>

tool_choice

  • Type: string or object
  • Required: no (default: "none" if no tools are present, "auto" if tools are present)

Controls which (if any) tool is called by the model by overriding the value in configuration. Supported values:

  • "none": The model will not call any tool and instead generates a message
  • "auto": The model can pick between generating a message or calling one or more tools
  • "required": The model must call one or more tools
  • {"type": "function", "function": {"name": "my_function"}}: Forces the model to call the specified tool
  • {"type": "allowed_tools", "allowed_tools": {"tools": [...], "mode": "auto"|"required"}}: Restricts which tools can be called

top_p

  • Type: float
  • Required: no (default: null)

Overrides the top_p setting for any chat completion variants being used.

tensorzero::variant_name

  • Type: string
  • Required: no

If set, pins the inference request to a particular variant (not recommended).

You should generally not set this field, and instead let the TensorZero gateway assign a variant. This field is primarily used for testing or debugging purposes.

This field should be provided as an extra body parameter in the request body.
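
For example, a minimal sketch pinning a request to a variant for debugging (the function and variant names are illustrative):

python
response = client.chat.completions.create(
    model="tensorzero::function_name::draft_email",
    messages=[{"role": "user", "content": "Draft a follow-up email."}],
    extra_body={
        "tensorzero::variant_name": "my_variant",  # for testing/debugging only
    },
)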

Request Headers

You can attach custom OTLP trace metadata to individual inference requests using HTTP headers. This allows you to extend TensorZero's OpenTelemetry integration with per-request metadata useful for other observability solutions. When using the OpenAI client SDKs, pass these as extra_headers.

| Header prefix | Description |
| --- | --- |
| `tensorzero-otlp-traces-extra-header-` | Custom headers to include in OTLP trace exports. Merged with static headers from `export.otlp.traces.extra_headers` (dynamic values take precedence). |
| `tensorzero-otlp-traces-extra-attribute-` | Custom span attributes to attach to OTLP trace exports. |
| `tensorzero-otlp-traces-extra-resource-` | Custom resource attributes to attach to OTLP trace exports. |

See Export OpenTelemetry traces for more details and examples.
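
For example, a minimal sketch attaching a custom span attribute (the attribute name is illustrative):

python
response = client.chat.completions.create(
    model="tensorzero::function_name::draft_email",
    messages=[{"role": "user", "content": "Draft a follow-up email."}],
    extra_headers={
        "tensorzero-otlp-traces-extra-attribute-user-id": "123",
    },
)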

Response

<Tabs> <Tab title="Regular">

In regular (non-streaming) mode, the response is a JSON object with the following fields:

choices

  • Type: list of choice objects, where each choice contains:
    • index: A zero-based index indicating the choice's position in the list (integer)
    • finish_reason: Always "stop".
    • message: An object containing:
      • content: The message content (string, optional)
      • tool_calls: List of tool calls made by the model (optional). The format is the same as in the request.
      • tensorzero_extra_content: An array of extra content blocks not representable as standard OpenAI content or tool_calls (optional). See tensorzero_extra_content below for details.
      • role: The role of the message sender (always "assistant").

created

  • Type: integer

The Unix timestamp (in seconds) of when the inference was created.

episode_id

  • Type: UUID

The ID of the episode that the inference was created for.

id

  • Type: UUID

The inference ID.

model

  • Type: string

The name of the variant that was actually used for the inference.

object

  • Type: string

The type of the inference object (always "chat.completion").

system_fingerprint

  • Type: string

Always ""

usage

  • Type: object

Contains token usage information for the request and response, with the following fields:

  • prompt_tokens: Number of tokens in the prompt (integer)
  • completion_tokens: Number of tokens in the completion (integer)
  • total_tokens: Total number of tokens used (integer)
  • prompt_tokens_details: Object containing detailed prompt token breakdown (optional). Only present when the provider reports cache metrics.
    • cached_tokens: Number of input tokens served from the provider's prompt cache (integer).
  • tensorzero_provider_cache_write_input_tokens: Number of input tokens written to the provider's prompt cache (integer, optional). Only present when the provider reports cache metrics.
  • tensorzero_cost: The cost in dollars for the inference (number or null). Set to null when cost is not configured for the model provider or the provider does not report the relevant information.

See Track usage and cost for more information.

tensorzero_extra_content

  • Type: array (optional)

An array of extra content blocks that are not representable as standard OpenAI content or tool_calls. This field is present when the model returns non-text content blocks, such as reasoning thoughts from models like Anthropic Claude with extended thinking.

Each block has the following fields:

  • type (string): The type of the extra content block. Currently supported types are:
    • "thought": A reasoning/thinking block from a model that supports extended thinking.
    • "unknown": A provider-specific content block that doesn't fit standard categories.
  • insert_index (integer): The position this block occupied in the full content array returned by the model. This indicates where the block appeared relative to text and other content blocks.

For "thought" blocks, the following additional fields are available:

  • text (string, optional): The text content of the thought.
  • signature (string, optional): An opaque signature used by some providers (e.g. Anthropic) for multi-turn reasoning conversations. You should pass this back when round-tripping.
  • summary (array, optional): Summary blocks for the thought.
  • provider_type (string, optional): The provider type that generated this thought (e.g. "anthropic"). When round-tripping, TensorZero uses this to send the thought only to compatible providers.
  • extra_data (object, optional): Provider-specific opaque data for multi-turn reasoning support.

For "unknown" blocks, the following additional fields are available:

  • data (any): The underlying content block as returned by the model provider.
  • model_name (string, optional): The model name associated with this content block.
  • provider_name (string, optional): The provider name associated with this content block.
<Tip>

You can round-trip tensorzero_extra_content by including it in an assistant message in a follow-up request. This preserves reasoning context across multi-turn conversations.

</Tip>
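
A hedged sketch of this round-trip (field access via the SDK's model_extra; the function name and prompts are illustrative):

python
first = client.chat.completions.create(
    model="tensorzero::function_name::draft_email",
    messages=[{"role": "user", "content": "Draft a follow-up email."}],
)
message = first.choices[0].message
# extra content blocks (e.g. reasoning thoughts), if the model returned any
extra = (message.model_extra or {}).get("tensorzero_extra_content") or []

second = client.chat.completions.create(
    model="tensorzero::function_name::draft_email",
    messages=[
        {"role": "user", "content": "Draft a follow-up email."},
        {
            "role": "assistant",
            "content": message.content,
            # round-trip thoughts and other non-text blocks from the first turn
            "tensorzero_extra_content": extra,
        },
        {"role": "user", "content": "Make it more formal."},
    ],
)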

tensorzero_raw_response

  • Type: array (optional, only when tensorzero::include_raw_response is true)

An array of raw provider-specific response data from all model inferences. Each entry contains:

  • model_inference_id: UUID of the model inference.
  • provider_type: The provider type (e.g., "openai", "anthropic").
  • data: The raw response string from the provider.

For complex variants like experimental_best_of_n_sampling, this includes raw responses from all candidate inferences as well as the evaluator/fuser inference.

tensorzero_raw_usage

  • Type: array (optional, only when tensorzero::include_raw_usage is true)

An array of raw provider-specific usage data. Each entry contains:

  • model_inference_id: UUID of the model inference.
  • provider_type: The provider type (e.g., "openai", "anthropic").
  • api_type: The API type ("chat_completions", "responses", or "embeddings").
  • data (optional): The raw usage object from the provider. The field is optional because some providers don't return usage.
</Tab> <Tab title="Streaming">

In streaming mode, the response is an <a href="https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#event_stream_format" target="_blank">SSE</a> stream of JSON messages, followed by a final [DONE] message.

Each JSON message has the following fields:

choices

  • Type: list

A list of choices from the model, where each choice contains:

  • index: The index of the choice (integer)
  • finish_reason: always ""
  • delta: An object containing any of:
    • content: The next piece of generated text (string)
    • tool_calls: A list of tool calls, each containing the next piece of the tool call being generated
    • tensorzero_extra_content: An array of extra content block chunks (e.g. thought deltas). Each chunk includes an insert_index field indicating the block's position in the full content array, along with the incremental content for that block.

created

  • Type: integer

The Unix timestamp (in seconds) of when the inference was created.

episode_id

  • Type: UUID

The ID of the episode that the inference was created for.

id

  • Type: UUID

The inference ID.

model

  • Type: string

The name of the variant that was actually used for the inference.

object

  • Type: string

The type of the inference object (always "chat.completion").

system_fingerprint

  • Type: string

Always ""

usage

  • Type: object
  • Required: no

Contains token usage information for the request and response, with the following fields:

  • prompt_tokens: Number of tokens in the prompt (integer)
  • completion_tokens: Number of tokens in the completion (integer)
  • total_tokens: Total number of tokens used (integer)
  • prompt_tokens_details: Object containing detailed prompt token breakdown (optional). Only present when the provider reports cache metrics.
    • cached_tokens: Number of input tokens served from the provider's prompt cache (integer).
  • tensorzero_provider_cache_write_input_tokens: Number of input tokens written to the provider's prompt cache (integer, optional). Only present when the provider reports cache metrics.
  • tensorzero_cost: The cost in dollars for the inference (number or null). Set to null when cost is not configured for the model provider or the provider does not report the relevant information.

See Track usage and cost for more information.

tensorzero_raw_response

  • Type: array (optional, only when tensorzero::include_raw_response is true)

An array of raw provider-specific response data from previous model inferences (e.g., best-of-n candidates). Each entry contains:

  • model_inference_id: UUID of the model inference.
  • provider_type: The provider type (e.g., "openai", "anthropic").
  • data: The raw response string from the provider.

This field is typically emitted in an early chunk of the stream and contains raw responses from model inferences that occurred before the current streaming inference (e.g., candidate inferences in experimental_best_of_n_sampling).

tensorzero_raw_chunk

  • Type: string (optional, only when tensorzero::include_raw_response is true)

The raw chunk from the model provider as a JSON string for the current streaming inference.

tensorzero_raw_usage

  • Type: array (optional, only when tensorzero::include_raw_usage is true)

An array of raw provider-specific usage data. Each entry contains:

  • model_inference_id: UUID of the model inference.
  • provider_type: The provider type (e.g., "openai", "anthropic").
  • api_type: The API type ("chat_completions", "responses", or "embeddings").
  • data (optional): The raw usage object from the provider. The field is optional because some providers don't return usage.

See Track usage and cost for more information.

</Tab> </Tabs>

Examples

<Accordion title="Chat Function with Structured System Prompt">
Configuration
toml
# ...
[functions.draft_email]
type = "chat"
system_schema = "functions/draft_email/system_schema.json"
# ...
json
// functions/draft_email/system_schema.json
{
  "type": "object",
  "properties": {
    "assistant_name": { "type": "string" }
  }
}
Request
<Tabs> <Tab title="Python">
python
from openai import AsyncOpenAI

async with AsyncOpenAI(
    base_url="http://localhost:3000/openai/v1"
) as client:
    result = await client.chat.completions.create(
        # there already was an episode_id from an earlier inference
        extra_body={"tensorzero::episode_id": str(episode_id)},
        messages=[
            {
                "role": "system",
                "content": [{"assistant_name": "Alfred Pennyworth"}]
                # NOTE: the JSON is in an array here so that a structured system message can be sent
            },
            {
                "role": "user",
                "content": "I need to write an email to Gabriel explaining..."
            }
        ],
        model="tensorzero::function_name::draft_email",
        temperature=0.4,
        # Optional: stream=True
    )
</Tab> <Tab title="HTTP">
bash
curl -X POST http://localhost:3000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "episode_id: your_episode_id_here" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": [{"assistant_name": "Alfred Pennyworth"}]
      },
      {
        "role": "user",
        "content": "I need to write an email to Gabriel explaining..."
      }
    ],
    "model": "tensorzero::function_name::draft_email",
    "temperature": 0.4
    // Optional: "stream": true
  }'
</Tab> </Tabs>
Response
<Tabs> <Tab title="Regular">
json
{
  "id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "model": "email_draft_variant",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "content": "Hi Gabriel,\n\nI noticed...",
        "role": "assistant"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200,
    "tensorzero_cost": 0.0003
  }
}
</Tab> <Tab title="Streaming">

In streaming mode, the response is an <a href="https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#event_stream_format" target="_blank">SSE</a> stream of JSON messages, followed by a final [DONE] message.

Each JSON message has the following fields:

json
{
  "id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "model": "email_draft_variant",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "delta": {
        "content": "Hi Gabriel,\n\nI noticed..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200,
    "tensorzero_cost": 0.0003
  }
}
</Tab> </Tabs> </Accordion>

<Accordion title="Chat Function with Dynamic Tool Use">
Configuration
toml
# ...

[functions.weather_bot]
type = "chat"
# Note: no `tools = ["get_temperature"]` field in configuration

# ...

Request
<Tabs> <Tab title="Python">
python
from openai import AsyncOpenAI

async with AsyncOpenAI(
    base_url="http://localhost:3000/openai/v1"
) as client:
    result = await client.chat.completions.create(
        model="tensorzero::function_name::weather_bot",
        messages=[
            {
                "role": "user",
                "content": "What is the weather like in Tokyo?"
            }
        ],
        tools=[
            {
              "type": "function",
              "function": {
                  "name": "get_temperature",
                  "description": "Get the current temperature in a given location",
                  "parameters": {
                    "$schema": "http://json-schema.org/draft-07/schema#",
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The location to get the temperature for (e.g. \"New York\")"
                        },
                        "units": {
                            "type": "string",
                            "description": "The units to get the temperature in (must be \"fahrenheit\" or \"celsius\")",
                            "enum": ["fahrenheit", "celsius"]
                        }
                    },
                    "required": ["location"],
                    "additionalProperties": false
                }
              }
            }
        ],
        # optional: stream=True,
    )
</Tab> <Tab title="HTTP">
bash
curl -X POST http://localhost:3000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tensorzero::function_name::weather_bot",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "What is the weather like in Tokyo?"
        }
      ]
    },
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_temperature",
          "description": "Get the current temperature in a given location",
          "parameters": {
            "$schema": "http://json-schema.org/draft-07/schema#",
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The location to get the temperature for (e.g. \"New York\")"
              },
              "units": {
                "type": "string",
                "description": "The units to get the temperature in (must be \"fahrenheit\" or \"celsius\")",
                "enum": ["fahrenheit", "celsius"]
              }
            },
            "required": ["location"],
            "additionalProperties": false
          }
        }
      }
    ]
    // optional: "stream": true
  }'
</Tab> </Tabs>
Response
<Tabs> <Tab title="Regular">
json
{
  "id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "model": "weather_bot_variant",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "content": null,
        "tool_calls": [
          {
            "id": "123456789",
            "type": "function",
            "function": {
              "name": "get_temperature",
              "arguments": "{\"location\": \"Tokyo\", \"units\": \"celsius\"}"
            }
          }
        ],
        "role": "assistant"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200,
    "tensorzero_cost": 0.0003
  }
}
</Tab> <Tab title="Streaming">

In streaming mode, the response is an <a href="https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#event_stream_format" target="_blank">SSE</a> stream of JSON messages, followed by a final [DONE] message.

Each JSON message has the following fields:

json
{
  "id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "model": "weather_bot_variant",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "content": null,
        "tool_calls": [
          {
            "id": "123456789",
            "type": "function",
            "function": {
              "name": "get_temperature",
              "arguments": "{\"location\":" // a tool arguments delta
            }
          }
        ]
      }
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200,
    "tensorzero_cost": 0.0003
  }
}
</Tab> </Tabs> </Accordion>

<Accordion title="JSON Function with Dynamic Output Schema">
Configuration
toml
# ...
[functions.extract_email]
type = "json"
output_schema = "output_schema.json"
# ...
json
// output_schema.json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "email": {
      "type": "string"
    }
  },
  "required": ["email"]
}
Request
<Tabs> <Tab title="Python">
python
from openai import AsyncOpenAI

dynamic_output_schema = {
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "email": { "type": "string" },
    "domain": { "type": "string" }
  },
  "required": ["email", "domain"]
}

async with AsyncOpenAI(
    base_url="http://localhost:3000/openai/v1"
) as client:
    result = await client.chat.completions.create(
        model="tensorzero::function_name::extract_email",
        messages=[
            {
                "role": "system",
                "content": "You are an AI assistant..."
            },
            {
                "role": "user",
                "content": "...blah blah blah [email protected] blah blah blah..."
            }
        ],
        # Override the output schema using the `response_format` field
        response_format={"type": "json_schema", "schema": dynamic_output_schema},
        # optional: stream=True,
    )
</Tab> <Tab title="HTTP">
bash
curl -X POST http://localhost:3000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tensorzero::function_name::extract_email",
    "input": {
      "system": "You are an AI assistant...",
      "messages": [
        {
          "role": "user",
          "content": "...blah blah blah [email protected] blah blah blah..."
        }
      ]
    },
    "response_format": {
      "type": "json_schema",
      "schema": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
          "email": { "type": "string" },
          "domain": { "type": "string" }
        },
        "required": ["email", "domain"]
      }
    }
    // optional: "stream": true
  }'
</Tab> </Tabs>
Response
<Tabs> <Tab title="Regular">
json
{
  "id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "model": "extract_email_variant",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "content": "{\"email\": \"[email protected]\", \"domain\": \"tensorzero.com\"}"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200,
    "tensorzero_cost": 0.0003
  }
}
</Tab> <Tab title="Streaming">

In streaming mode, the response is an <a href="https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#event_stream_format" target="_blank">SSE</a> stream of JSON messages, followed by a final [DONE] message.

Each JSON message has the following fields:

json
{
  "id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "model": "extract_email_variant",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "content": "{\"email\":" // a JSON content delta
      }
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200,
    "tensorzero_cost": 0.0003
  }
}
</Tab> </Tabs> </Accordion>