docs/optimization/supervised-fine-tuning-sft.mdx
Supervised Fine-Tuning (SFT) trains a language model on curated examples of good behavior, resulting in a custom model that performs better on your specific use case. TensorZero simplifies the SFT workflow by helping you curate training data from your historical inferences and feedback, then launching fine-tuning jobs on your preferred provider.
Here's how it works:
Supervised fine-tuning is particularly useful when you have substantial high-quality data and want to improve model behavior beyond what prompting alone can achieve.
| Criterion | Impact | Details |
|---|---|---|
| Complexity | Low | Requires data curation; few parameters |
| Data Efficiency | Moderate | Requires hundreds to thousands of high-quality examples |
| Optimization Ceiling | High | Can significantly improve model behavior beyond prompting |
| Optimization Cost | Moderate | More expensive than DICL, but relatively cost effective |
| Inference Cost | Low | Fine-tuned models typically cost the same as the base model |
| Inference Latency | Low | No runtime overhead |
SFT tends to work best when:
You can find a complete runnable example of this guide on GitHub.
</Tip> <Steps> <Step title="Configure your LLM application">Define a function with a baseline variant for your application.
```toml
[functions.extract_entities]
type = "json"
output_schema = "functions/extract_entities/output_schema.json"

[functions.extract_entities.variants.baseline]
type = "chat_completion"
model = "openai::gpt-4.1-mini"
templates.system.path = "functions/extract_entities/initial_prompt/system_template.minijinja"
json_mode = "strict"
```
The referenced system template (`system_template.minijinja`) instructs the model:

```
You are an assistant that is performing a named entity recognition task.
Your job is to extract entities from a given text.

The entities you are extracting are:

- people
- organizations
- locations
- miscellaneous other entities

Please return the entities in the following JSON format:

{
  "person": ["person1", "person2", ...],
  "organization": ["organization1", "organization2", ...],
  "location": ["location1", "location2", ...],
  "miscellaneous": ["miscellaneous1", "miscellaneous2", ...]
}
```
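The `output_schema.json` referenced in the function configuration constrains the model's JSON output to this shape. An illustrative JSON Schema (not taken verbatim from the example repository) might look like:

```json
{
  "type": "object",
  "properties": {
    "person": { "type": "array", "items": { "type": "string" } },
    "organization": { "type": "array", "items": { "type": "string" } },
    "location": { "type": "array", "items": { "type": "string" } },
    "miscellaneous": { "type": "array", "items": { "type": "string" } }
  },
  "required": ["person", "organization", "location", "miscellaneous"],
  "additionalProperties": false
}
```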
After deploying the TensorZero Gateway with Postgres, build a dataset of good examples for the `extract_entities` function you configured.

You can create datapoints from historical inferences or from external or synthetic datasets.
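One common curation strategy is to keep only inferences that earned positive feedback. The sketch below filters hypothetical in-memory records; the record layout and the `exact_match` field are illustrative, not the TensorZero API:

```python
# Hypothetical records pairing inferences with a boolean feedback metric.
inferences = [
    {"id": "a", "output": "...", "exact_match": True},
    {"id": "b", "output": "...", "exact_match": False},
    {"id": "c", "output": "...", "exact_match": True},
]

# Keep only datapoints that earned positive feedback for the training set.
curated = [inf for inf in inferences if inf["exact_match"]]
print(len(curated))  # 2
```

The same idea scales to real data: query your historical inferences, join them with their feedback, and insert only the filtered subset into the dataset.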
SFT performance depends heavily on data quality: there is a trade-off between dataset size and the quality of individual datapoints.
</Tip> </Step> <Step title="Configure SFT optimization">Configure SFT by specifying the base model to fine-tune and any hyperparameters.
<Tabs> <Tab title="OpenAI">

```python
from tensorzero import OpenAISFTConfig

optimization_config = OpenAISFTConfig(
    model="gpt-4.1-2025-04-14",
)
```
OpenAI uses credentials from the `OPENAI_API_KEY` environment variable by default.
For GCP Vertex AI Gemini:

```python
from tensorzero import GCPVertexGeminiSFTConfig

optimization_config = GCPVertexGeminiSFTConfig(
    model="gemini-2.5-flash",
)
```
GCP Vertex AI requires project and storage configuration in `tensorzero.toml`:

```toml
[provider_types.gcp_vertex_gemini.sft]
project_id = "your-gcp-project-id"
region = "us-central1"
bucket_name = "your-training-data-bucket"
```
For Fireworks:

```python
from tensorzero import FireworksSFTConfig

optimization_config = FireworksSFTConfig(
    model="accounts/fireworks/models/glm-4p7",
    epochs=3,  # optional
    lora_rank=16,  # optional
)
```
Fireworks requires your account ID in `tensorzero.toml`:

```toml
[provider_types.fireworks.sft]
account_id = "your-fireworks-account-id"
```
For Together:

```python
from tensorzero import TogetherSFTConfig

optimization_config = TogetherSFTConfig(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    n_epochs=3,  # optional
)
```
Together uses credentials from the `TOGETHER_API_KEY` environment variable by default.

Optional Weights & Biases integration can be configured in `tensorzero.toml`:

```toml
[provider_types.together.sft]
wandb_api_key = "your-wandb-api-key" # optional
wandb_project_name = "my-project" # optional
```
Launch the SFT job using the TensorZero Gateway:

```python
job_handle = t0.experimental_launch_optimization_workflow(
    function_name="extract_entities",
    template_variant_name="baseline",
    dataset_name="extract_entities_dataset",
    optimizer_config=optimization_config,
    val_fraction=0.2,
)
print("Job launched!")
```
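With `val_fraction=0.2`, roughly 20% of the dataset is held out for validation rather than used for gradient updates. As a quick sanity check on a hypothetical 500-example dataset (exact rounding behavior is provider-specific):

```python
dataset_size = 500  # hypothetical dataset size
val_fraction = 0.2

val_size = int(dataset_size * val_fraction)  # examples held out for validation
train_size = dataset_size - val_size         # examples used for training
print(train_size, val_size)  # 400 100
```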
SFT jobs run asynchronously on the provider's infrastructure. Poll for completion:

```python
import time

from tensorzero import OptimizationJobStatus

job_info = t0.experimental_poll_optimization(job_handle=job_handle)

# For long-running jobs, poll periodically:
while job_info.status == OptimizationJobStatus.Pending:
    print(f"Job status: {job_info.status}")
    time.sleep(60)  # wait 1 minute between polls
    job_info = t0.experimental_poll_optimization(job_handle=job_handle)

if job_info.status == OptimizationJobStatus.Completed:
    print("Fine-tuning complete!")
else:
    print(f"Job failed: {job_info.message}")
```
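If you want to bound how long the script waits, a small generic helper can wrap the polling loop. This is a sketch; the stubbed `poll` callable below stands in for `experimental_poll_optimization`:

```python
import time

def wait_for(poll, is_done, timeout_s=3600, interval_s=60.0):
    """Poll until is_done(result) is true or the timeout elapses; return the last result."""
    deadline = time.monotonic() + timeout_s
    result = poll()
    while not is_done(result) and time.monotonic() < deadline:
        time.sleep(interval_s)
        result = poll()
    return result

# Stub standing in for the real poll call; "completes" on the third poll.
states = iter(["pending", "pending", "completed"])
final = wait_for(lambda: next(states), lambda s: s == "completed", interval_s=0.0)
print(final)  # completed
```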
Fine-tuning typically takes 10-30 minutes for small datasets, but can take hours for large datasets. You can close your script and poll later using the job handle.
</Tip> </Step> <Step title="Update your configuration with the fine-tuned model">After optimization completes, extract the fine-tuned model name and update your configuration:

```python
fine_tuned_model = job_info.output["routing"][0]
print(f"Fine-tuned model: {fine_tuned_model}")
```
Add the fine-tuned model and a new variant to your `tensorzero.toml`:

```toml
[models.extract_entities_fine_tuned]
routing = ["openai"]

[models.extract_entities_fine_tuned.providers.openai]
type = "openai"
model_name = "ft:gpt-4.1-2025-04-14:org::xxxxx" # from above

[functions.extract_entities.variants.fine_tuned]
type = "chat_completion"
model = "extract_entities_fine_tuned"
templates.system.path = "functions/extract_entities/initial_prompt/system_template.minijinja"
json_mode = "strict"
```
For most model providers, you can also use the shorthand syntax in your variant configuration:

```toml
model = "openai::ft:gpt-4.1-2025-04-14:org::xxxxx"
```

This avoids needing to define a separate `[models.*]` section.
That's it! Your fine-tuned model is now ready to use.
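To smoke-test the new variant, you can pin it by name when calling the gateway. The request shape below is a sketch of the arguments you would pass to an inference call; the message content is illustrative:

```python
# Illustrative inference request pinning the fine-tuned variant;
# in practice, pass these fields as arguments to the gateway client.
request = {
    "function_name": "extract_entities",
    "variant_name": "fine_tuned",  # pin the new variant for a smoke test
    "input": {
        "messages": [
            {"role": "user", "content": "Acme Corp hired Jane Doe in Paris."}
        ]
    },
}
print(sorted(request))  # ['function_name', 'input', 'variant_name']
```

In production you would typically omit `variant_name` and let the gateway sample between variants instead.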
</Step> </Steps> <Tip>You can run experiments comparing your baseline and fine-tuned variants using adaptive A/B testing.
</Tip>

### OpenAISFTConfig

Configure OpenAI supervised fine-tuning by creating an `OpenAISFTConfig` object with the following parameters:

### GCPVertexGeminiSFTConfig

Configure GCP Vertex AI Gemini supervised fine-tuning by creating a `GCPVertexGeminiSFTConfig` object with the following parameters:

### FireworksSFTConfig

Configure Fireworks supervised fine-tuning by creating a `FireworksSFTConfig` object with the following parameters:

### TogetherSFTConfig

Configure Together supervised fine-tuning by creating a `TogetherSFTConfig` object with the following parameters: