docs/optimization/index.mdx
TensorZero provides pre-built optimization workflows covering models, prompts, and inference strategies. You can also create your own workflows to cover your specific needs.
The TensorZero Gateway collects structured inference data along with the downstream metrics and feedback associated with it. This dataset provides an ideal foundation for building and optimizing LLM applications. As it grows, you can use these recipes to generate more powerful variants for your functions. For example, you can curate the data to fine-tune a custom LLM or run an automated prompt engineering workflow.
Supervised Fine-Tuning (SFT) trains a language model on curated examples of good behavior, resulting in a custom model that performs better on your specific use case.
TensorZero has fine-tuning integrations with OpenAI, GCP Vertex AI, Fireworks AI, and Together AI. See the Supervised Fine-Tuning Guide to learn more.
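To make the curation step concrete, here is a minimal sketch of filtering inferences by metric feedback and serializing them as chat-format training examples. The `inferences` records and the `0.8` threshold are hypothetical, and real recipes query TensorZero's database instead of an in-memory list:

```python
import json

# Hypothetical curated records: each pairs an inference with its metric feedback.
inferences = [
    {
        "messages": [
            {"role": "user", "content": "Summarize the incident report."},
            {"role": "assistant", "content": "A config typo caused a brief outage; it is now fixed."},
        ],
        "metric_score": 0.92,
    },
    {
        "messages": [
            {"role": "user", "content": "Summarize the incident report."},
            {"role": "assistant", "content": "Something broke."},
        ],
        "metric_score": 0.31,
    },
]

# Keep only high-scoring examples (the threshold is an assumption; tune it per use case).
curated = [r for r in inferences if r["metric_score"] >= 0.8]

# Write OpenAI-style chat-format JSONL, a common input format for SFT jobs.
with open("sft_dataset.jsonl", "w") as f:
    for record in curated:
        f.write(json.dumps({"messages": record["messages"]}) + "\n")
```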
Additionally, we provide recipes for self-hosted fine-tuning with axolotl, torchtune, and unsloth.
Guide coming soon...
A direct preference optimization (DPO) recipe, also known as preference fine-tuning, fine-tunes an LLM on a dataset of preference pairs. You can use demonstration feedback collected with TensorZero to curate such a dataset and fine-tune an LLM on it.
We provide a recipe for DPO (Preference Fine-tuning) with OpenAI.
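For intuition, here is a hedged sketch of turning demonstration feedback into preference pairs: the model's original output becomes the rejected response, and the human-provided demonstration becomes the chosen one. The record layout below is illustrative, not TensorZero's storage schema:

```python
import json

# Illustrative records: an inference's original output plus a demonstration
# (a corrected output submitted as feedback) for the same input.
records = [
    {
        "prompt": "Draft a one-line release note for the bug fix.",
        "model_output": "Fixed stuff.",
        "demonstration": "Fixed a crash when loading empty configuration files.",
    },
]

# DPO trains on (prompt, chosen, rejected) triples: the demonstration is
# preferred over the model's original output.
with open("preference_pairs.jsonl", "w") as f:
    for r in records:
        pair = {
            "prompt": r["prompt"],
            "chosen": r["demonstration"],
            "rejected": r["model_output"],
        }
        f.write(json.dumps(pair) + "\n")
```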
GEPA is an automated prompt optimization method that evolves prompts through iterative evaluation, analysis, and mutation. It uses LLMs to analyze inference results and propose prompt improvements, then filters variants using Pareto frontier selection to balance multiple objectives.
See the GEPA Guide to learn more.
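The Pareto selection step is easy to sketch: a candidate prompt survives only if no other candidate beats it on every objective. A minimal, library-free example (the variant names and scores are made up):

```python
def pareto_frontier(candidates: dict[str, tuple[float, ...]]) -> list[str]:
    """Keep candidates that no other candidate dominates on all objectives."""
    frontier = []
    for name, scores in candidates.items():
        dominated = any(
            all(o >= s for o, s in zip(other, scores))
            and any(o > s for o, s in zip(other, scores))
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical prompt variants scored on (accuracy, brevity).
scores = {
    "prompt_v1": (0.80, 0.40),
    "prompt_v2": (0.75, 0.90),  # survives: best brevity
    "prompt_v3": (0.70, 0.30),  # dominated by prompt_v1 on both objectives
}
print(pareto_frontier(scores))  # ['prompt_v1', 'prompt_v2']
```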
Best-of-N Sampling generates multiple candidate responses from a single model and selects the best one based on a scoring function or verifier. Mixture-of-N Sampling extends this by generating candidates across multiple models or variants and combining them into a single response. Both techniques improve output quality at inference time without requiring additional training or fine-tuning.
See Inference-Time Optimizations to learn more.
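A minimal sketch of the best-of-N idea, with a stubbed generator and a toy verifier standing in for a real model and judge (both are placeholders, not TensorZero APIs; in TensorZero these strategies are configured as variant types rather than hand-rolled):

```python
import random

def generate(prompt: str) -> str:
    """Stub for a model call; a real implementation would hit an LLM."""
    return random.choice([
        "Paris.",
        "The capital of France is Paris.",
        "I think it might be Lyon?",
    ])

def score(prompt: str, candidate: str) -> float:
    """Stub verifier; real setups use an LLM judge or a task-specific checker."""
    return float("Paris" in candidate) - 0.01 * len(candidate)

def best_of_n(prompt: str, n: int = 5) -> str:
    # Sample N candidates, then keep the one the verifier scores highest.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

print(best_of_n("What is the capital of France?"))
```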
Dynamic In-Context Learning (DICL) is an inference-time optimization that improves LLM performance by incorporating relevant historical examples into your prompt. Instead of hardcoding static examples into your prompts, DICL selects the most relevant examples at inference time.
See the Dynamic In-Context Learning (DICL) Guide to learn more.
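The core of DICL is embedding-based retrieval: embed past inputs, then prepend the k nearest historical examples to the prompt at inference time. A small sketch with toy 2-D embeddings (a real setup would use an embedding model and TensorZero's stored inferences):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Toy historical examples with precomputed 2-D "embeddings" of their inputs.
history = [
    {"embedding": [0.9, 0.1], "input": "Refund request", "output": "Route to billing."},
    {"embedding": [0.1, 0.9], "input": "API is down", "output": "Route to on-call."},
    {"embedding": [0.8, 0.3], "input": "Charged twice", "output": "Route to billing."},
]

def build_prompt(query_embedding: list[float], query: str, k: int = 2) -> str:
    # Select the k most similar historical examples and prepend them.
    nearest = sorted(
        history,
        key=lambda ex: cosine(query_embedding, ex["embedding"]),
        reverse=True,
    )[:k]
    examples = "\n".join(f"Input: {ex['input']}\nOutput: {ex['output']}" for ex in nearest)
    return f"{examples}\nInput: {query}\nOutput:"

print(build_prompt([0.85, 0.2], "Billed incorrectly"))
```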
You can also create your own recipes. Put simply, a recipe takes inference and feedback data collected by TensorZero and generates a new set of variants for your functions. You should be able to use virtually any LLM engineering workflow with TensorZero, ranging from automated prompt engineering to advanced RLHF workflows.
For example, see our recipes for self-hosted supervised fine-tuning (SFT) with axolotl, torchtune, and unsloth.
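As a starting point for a custom recipe, here is a hedged sketch of joining inference and feedback data in ClickHouse. The table and column names (`ChatInference`, `FloatMetricFeedback`, `target_id`) follow TensorZero's conventions, and the function and metric names are hypothetical; verify everything against your deployment's schema:

```python
import clickhouse_connect

# Connection details are assumptions; point this at your TensorZero ClickHouse.
client = clickhouse_connect.get_client(
    host="localhost", username="default", database="tensorzero"
)

# Join inferences with their metric feedback to curate data for a custom recipe.
query = """
    SELECT i.input, i.output, f.value AS metric_value
    FROM ChatInference AS i
    JOIN FloatMetricFeedback AS f ON f.target_id = i.id
    WHERE i.function_name = 'my_function'  -- hypothetical function name
      AND f.metric_name = 'task_success'   -- hypothetical metric name
    ORDER BY f.value DESC
"""
rows = client.query(query).result_rows
print(f"Curated {len(rows)} examples for a custom optimization recipe.")
```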
We are working on a series of complete runnable examples illustrating TensorZero's data & learning flywheel.