docs/optimization/index.mdx
TensorZero provides pre-built optimization workflows covering models, prompts, and inference strategies. You can also create your own workflows to cover your specific needs.
The TensorZero Gateway collects structured inference data along with the downstream metrics and feedback associated with it. This dataset provides an ideal foundation for building and optimizing LLM applications. As it grows, you can use these recipes to generate more powerful variants for your functions. For example, you can curate the data to fine-tune a custom LLM or run an automated prompt engineering workflow.
Supervised Fine-Tuning (SFT) trains a language model on curated examples of good behavior, resulting in a custom model that performs better on your specific use case.
TensorZero has fine-tuning integrations with OpenAI, GCP Vertex AI, Fireworks AI, and Together AI. See the Supervised Fine-Tuning Guide to learn more.
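To make the curation step concrete, here is a minimal sketch of filtering inferences by metric feedback and serializing them as chat-format training examples. The `inferences` records and the `0.8` threshold are hypothetical, and real recipes query TensorZero's database instead of an in-memory list:

```python
import json

# Hypothetical curated records: each pairs an inference with its metric feedback.
inferences = [
    {
        "messages": [
            {"role": "user", "content": "Summarize the incident report."},
            {"role": "assistant", "content": "A config typo caused a brief outage; it is now fixed."},
        ],
        "metric_score": 0.92,
    },
    {
        "messages": [
            {"role": "user", "content": "Summarize the incident report."},
            {"role": "assistant", "content": "Something broke."},
        ],
        "metric_score": 0.31,
    },
]

# Keep only high-scoring examples (the threshold is an assumption; tune it per use case).
curated = [r for r in inferences if r["metric_score"] >= 0.8]

# Write OpenAI-style chat-format JSONL, a common input format for SFT jobs.
with open("sft_dataset.jsonl", "w") as f:
    for record in curated:
        f.write(json.dumps({"messages": record["messages"]}) + "\n")
```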
Additionally, we provide recipes for self-hosted fine-tuning with axolotl, torchtune, and unsloth.
Guide coming soon...
A direct preference optimization (DPO) recipe, also known as preference fine-tuning, fine-tunes an LLM on a dataset of preference pairs. You can use demonstration feedback collected with TensorZero to curate such a dataset and fine-tune an LLM on it.
We provide a recipe for DPO (Preference Fine-tuning) with OpenAI.
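For intuition, here is a hedged sketch of turning demonstration feedback into preference pairs: the model's original output becomes the rejected response, and the human-provided demonstration becomes the chosen one. The record layout below is illustrative, not TensorZero's storage schema:

```python
import json

# Illustrative records: an inference's original output plus a demonstration
# (a corrected output submitted as feedback) for the same input.
records = [
    {
        "prompt": "Draft a one-line release note for the bug fix.",
        "model_output": "Fixed stuff.",
        "demonstration": "Fixed a crash when loading empty configuration files.",
    },
]

# DPO trains on (prompt, chosen, rejected) triples: the demonstration is
# preferred over the model's original output.
with open("preference_pairs.jsonl", "w") as f:
    for r in records:
        pair = {
            "prompt": r["prompt"],
            "chosen": r["demonstration"],
            "rejected": r["model_output"],
        }
        f.write(json.dumps(pair) + "\n")
```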
GEPA is an automated prompt optimization method that evolves prompts through iterative evaluation, analysis, and mutation. It uses LLMs to analyze inference results and propose prompt improvements, then filters variants using Pareto frontier selection to balance multiple objectives.
See the GEPA Guide to learn more.
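The Pareto selection step is easy to sketch: a candidate prompt survives only if no other candidate beats it on every objective. A minimal, library-free example (the variant names and scores are made up):

```python
def pareto_frontier(candidates: dict[str, tuple[float, ...]]) -> list[str]:
    """Keep candidates that no other candidate dominates on all objectives."""
    frontier = []
    for name, scores in candidates.items():
        dominated = any(
            all(o >= s for o, s in zip(other, scores))
            and any(o > s for o, s in zip(other, scores))
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical prompt variants scored on (accuracy, brevity).
scores = {
    "prompt_v1": (0.80, 0.40),
    "prompt_v2": (0.75, 0.90),  # survives: best brevity
    "prompt_v3": (0.70, 0.30),  # dominated by prompt_v1 on both objectives
}
print(pareto_frontier(scores))  # ['prompt_v1', 'prompt_v2']
```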
Best-of-N Sampling generates multiple candidate responses from a single model and selects the best one based on a scoring function or verifier. Mixture-of-N Sampling extends this by generating candidates across multiple models or variants and combining them into a single response. Both techniques improve output quality at inference time without requiring additional training or fine-tuning.
See Inference-Time Optimizations to learn more.
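A minimal sketch of the best-of-N idea, with a stubbed generator and a toy verifier standing in for a real model and judge (both are placeholders, not TensorZero APIs; in TensorZero these strategies are configured as variant types rather than hand-rolled):

```python
import random

def generate(prompt: str) -> str:
    """Stub for a model call; a real implementation would hit an LLM."""
    return random.choice([
        "Paris.",
        "The capital of France is Paris.",
        "I think it might be Lyon?",
    ])

def score(prompt: str, candidate: str) -> float:
    """Stub verifier; real setups use an LLM judge or a task-specific checker."""
    return float("Paris" in candidate) - 0.01 * len(candidate)

def best_of_n(prompt: str, n: int = 5) -> str:
    # Sample N candidates, then keep the one the verifier scores highest.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

print(best_of_n("What is the capital of France?"))
```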
Dynamic In-Context Learning (DICL) is an inference-time optimization that improves LLM performance by incorporating relevant historical examples into your prompt. Instead of hardcoding static examples into your prompts, DICL selects the most relevant examples at inference time.
See the Dynamic In-Context Learning (DICL) Guide to learn more.
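The core of DICL is embedding-based retrieval: embed past inputs, then prepend the k nearest historical examples to the prompt at inference time. A small sketch with toy 2-D embeddings (a real setup would use an embedding model and TensorZero's stored inferences):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Toy historical examples with precomputed 2-D "embeddings" of their inputs.
history = [
    {"embedding": [0.9, 0.1], "input": "Refund request", "output": "Route to billing."},
    {"embedding": [0.1, 0.9], "input": "API is down", "output": "Route to on-call."},
    {"embedding": [0.8, 0.3], "input": "Charged twice", "output": "Route to billing."},
]

def build_prompt(query_embedding: list[float], query: str, k: int = 2) -> str:
    # Select the k most similar historical examples and prepend them.
    nearest = sorted(
        history,
        key=lambda ex: cosine(query_embedding, ex["embedding"]),
        reverse=True,
    )[:k]
    examples = "\n".join(f"Input: {ex['input']}\nOutput: {ex['output']}" for ex in nearest)
    return f"{examples}\nInput: {query}\nOutput:"

print(build_prompt([0.85, 0.2], "Billed incorrectly"))
```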
You can also create your own recipes. Put simply, a recipe takes inference and feedback data collected by TensorZero and generates a new set of variants for your functions. You should be able to use virtually any LLM engineering workflow with TensorZero, ranging from automated prompt engineering to advanced RLHF workflows.
For example, see our recipes for self-hosted supervised fine-tuning (SFT) with axolotl, torchtune, and unsloth.
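As a starting point for a custom recipe, here is a hedged sketch of joining inference and feedback data in ClickHouse. The table and column names (`ChatInference`, `FloatMetricFeedback`, `target_id`) follow TensorZero's conventions, and the function and metric names are hypothetical; verify everything against your deployment's schema:

```python
import clickhouse_connect

# Connection details are assumptions; point this at your TensorZero ClickHouse.
client = clickhouse_connect.get_client(
    host="localhost", username="default", database="tensorzero"
)

# Join inferences with their metric feedback to curate data for a custom recipe.
query = """
    SELECT i.input, i.output, f.value AS metric_value
    FROM ChatInference AS i
    JOIN FloatMetricFeedback AS f ON f.target_id = i.id
    WHERE i.function_name = 'my_function'  -- hypothetical function name
      AND f.metric_name = 'task_success'   -- hypothetical metric name
    ORDER BY f.value DESC
"""
rows = client.query(query).result_rows
print(f"Curated {len(rows)} examples for a custom optimization recipe.")
```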
We are working on a series of complete runnable examples illustrating TensorZero's data & learning flywheel.