# Supervised Fine-Tuning (SFT)


Supervised Fine-Tuning (SFT) trains a language model on curated examples of good behavior, resulting in a custom model that performs better on your specific use case. TensorZero simplifies the SFT workflow by helping you curate training data from your historical inferences and feedback, then launching fine-tuning jobs on your preferred provider.

Here's how it works:

  1. You collect examples of good LLM behavior (demonstrations or inferences with good metrics).
  2. TensorZero renders these examples using your prompt templates into a training dataset.
  3. TensorZero uploads the dataset and launches a fine-tuning job on your provider (OpenAI, GCP Vertex AI, Fireworks, or Together).
  4. The provider trains a custom model and returns a model identifier.
  5. You update your configuration to use the fine-tuned model.

## When should you use supervised fine-tuning (SFT)?

Supervised fine-tuning is particularly useful when you have substantial high-quality data and want to improve model behavior beyond what prompting alone can achieve.

| Criterion | Impact | Details |
| --- | --- | --- |
| Complexity | Low | Requires data curation; few parameters |
| Data Efficiency | Moderate | Requires hundreds to thousands of high-quality examples |
| Optimization Ceiling | High | Can significantly improve model behavior beyond prompting |
| Optimization Cost | Moderate | More expensive than DICL, but relatively cost-effective |
| Inference Cost | Low | Fine-tuned models typically cost the same as the base model |
| Inference Latency | Low | No runtime overhead |

<Tip>

SFT tends to work best when:

- You have hundreds to thousands of high-quality examples.
- Inference cost and latency are important. Unlike DICL, SFT shifts the cost to a one-time optimization workflow.
  - If inference cost matters: SFT is often more economical than DICL at scale.
- You want to improve model behavior beyond what prompting can achieve.
  - If prompts are sufficient: consider GEPA for automated prompt engineering.
</Tip>

## Fine-tune your LLM with Supervised Fine-Tuning

<Tip>

You can find a complete runnable example of this guide on GitHub.

</Tip> <Steps> <Step title="Configure your LLM application">

Define a function with a baseline variant for your application.

```toml
[functions.extract_entities]
type = "json"
output_schema = "functions/extract_entities/output_schema.json"

[functions.extract_entities.variants.baseline]
type = "chat_completion"
model = "openai::gpt-4.1-mini"
templates.system.path = "functions/extract_entities/initial_prompt/system_template.minijinja"
json_mode = "strict"
```
<Accordion title="Example: Data Extraction (Named Entity Recognition) — Configuration">
```text
You are an assistant that is performing a named entity recognition task.
Your job is to extract entities from a given text.

The entities you are extracting are:

- people
- organizations
- locations
- miscellaneous other entities

Please return the entities in the following JSON format:

{
"person": ["person1", "person2", ...],
"organization": ["organization1", "organization2", ...],
"location": ["location1", "location2", ...],
"miscellaneous": ["miscellaneous1", "miscellaneous2", ...]
}
```
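
The `output_schema.json` referenced in the configuration might look like the following (a sketch matching the template's output format; the exact schema is up to your application):

```json
{
  "type": "object",
  "properties": {
    "person": { "type": "array", "items": { "type": "string" } },
    "organization": { "type": "array", "items": { "type": "string" } },
    "location": { "type": "array", "items": { "type": "string" } },
    "miscellaneous": { "type": "array", "items": { "type": "string" } }
  },
  "required": ["person", "organization", "location", "miscellaneous"]
}
```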
</Accordion> </Step> <Step title="Collect your optimization data">

After deploying the TensorZero Gateway with Postgres, build a dataset of good examples for the `extract_entities` function you configured. You can create datapoints from historical inferences or external/synthetic datasets.
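
For example, you can insert curated datapoints programmatically. The sketch below assumes the client's `bulk_insert_datapoints` method and `JsonDatapointInsert` type from the datasets API, plus an initialized `t0` client (see the launch step below); the example text and labels are illustrative:

```python
from tensorzero import JsonDatapointInsert

# Insert a curated example into the dataset.
# Assumes `t0` is an initialized TensorZero client and that the client
# exposes `bulk_insert_datapoints`; check the datasets docs for the exact API.
t0.bulk_insert_datapoints(
    dataset_name="extract_entities_dataset",
    datapoints=[
        JsonDatapointInsert(
            function_name="extract_entities",
            input={"messages": [{"role": "user", "content": "Acme Corp. hired Jane Doe in Berlin."}]},
            output={
                "person": ["Jane Doe"],
                "organization": ["Acme Corp."],
                "location": ["Berlin"],
                "miscellaneous": [],
            },
        ),
    ],
)
```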

<Tip>

SFT performance depends heavily on data quality. There is a trade-off between dataset size and datapoint quality: a smaller, carefully curated dataset often outperforms a larger, noisier one.

</Tip> </Step> <Step title="Configure SFT optimization">

Configure SFT by specifying the base model to fine-tune and any hyperparameters.

<Tabs> <Tab title="OpenAI">
```python
from tensorzero import OpenAISFTConfig

optimization_config = OpenAISFTConfig(
    model="gpt-4.1-2025-04-14",
)
```

OpenAI uses credentials from the OPENAI_API_KEY environment variable by default.

</Tab> <Tab title="GCP Vertex AI Gemini">
```python
from tensorzero import GCPVertexGeminiSFTConfig

optimization_config = GCPVertexGeminiSFTConfig(
    model="gemini-2.5-flash",
)
```

GCP Vertex AI requires project and storage configuration in tensorzero.toml:

```toml
[provider_types.gcp_vertex_gemini.sft]
project_id = "your-gcp-project-id"
region = "us-central1"
bucket_name = "your-training-data-bucket"
```
</Tab> <Tab title="Fireworks">
```python
from tensorzero import FireworksSFTConfig

optimization_config = FireworksSFTConfig(
    model="accounts/fireworks/models/glm-4p7",
    epochs=3,  # optional
    lora_rank=16,  # optional
)
```

Fireworks requires your account ID in tensorzero.toml:

```toml
[provider_types.fireworks.sft]
account_id = "your-fireworks-account-id"
```
</Tab> <Tab title="Together">
```python
from tensorzero import TogetherSFTConfig

optimization_config = TogetherSFTConfig(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    n_epochs=3,  # optional
)
```

Together uses credentials from the TOGETHER_API_KEY environment variable by default. Optional Weights & Biases integration can be configured in tensorzero.toml:

```toml
[provider_types.together.sft]
wandb_api_key = "your-wandb-api-key"  # optional
wandb_project_name = "my-project"     # optional
```
</Tab> </Tabs> </Step> <Step title="Launch the SFT job">

Launch the SFT job using the TensorZero Gateway. The snippets in this guide assume an initialized `t0` client; a minimal setup might look like this (the gateway URL is an assumption for a local deployment):
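
```python
from tensorzero import TensorZeroGateway

# Connect to a running TensorZero Gateway (the URL is illustrative).
t0 = TensorZeroGateway.build_http(gateway_url="http://localhost:3000")
```

Then launch the SFT job: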

```python
job_handle = t0.experimental_launch_optimization_workflow(
    function_name="extract_entities",
    template_variant_name="baseline",
    dataset_name="extract_entities_dataset",
    optimizer_config=optimization_config,
    val_fraction=0.2,
)

print("Job launched!")
```
</Step> <Step title="Poll for completion">

SFT jobs run asynchronously on the provider's infrastructure. Poll for completion:

```python
import time

from tensorzero import OptimizationJobStatus

job_info = t0.experimental_poll_optimization(job_handle=job_handle)

# For long-running jobs, poll periodically:
while job_info.status == OptimizationJobStatus.Pending:
    print(f"Job status: {job_info.status}")
    time.sleep(60)  # wait 1 minute between polls
    job_info = t0.experimental_poll_optimization(job_handle=job_handle)

if job_info.status == OptimizationJobStatus.Completed:
    print("Fine-tuning complete!")
else:
    print(f"Job failed: {job_info.message}")
```
<Tip>

Fine-tuning typically takes 10-30 minutes for small datasets, but can take hours for large datasets. You can close your script and poll later using the job handle.

</Tip> </Step> <Step title="Update your configuration with the fine-tuned model">

After optimization completes, extract the fine-tuned model name and update your configuration:

```python
fine_tuned_model = job_info.output["routing"][0]
print(f"Fine-tuned model: {fine_tuned_model}")
```

Add the fine-tuned model and a new variant to your tensorzero.toml:

```toml
[models.extract_entities_fine_tuned]
routing = ["openai"]

[models.extract_entities_fine_tuned.providers.openai]
type = "openai"
model_name = "ft:gpt-4.1-2025-04-14:org::xxxxx"  # from above

[functions.extract_entities.variants.fine_tuned]
type = "chat_completion"
model = "extract_entities_fine_tuned"
templates.system.path = "functions/extract_entities/initial_prompt/system_template.minijinja"
json_mode = "strict"
```
<Tip>

For most model providers, you can also use the shorthand syntax in your variant configuration:

```toml
model = "openai::ft:gpt-4.1-2025-04-14:org::xxxxx"
```

This avoids needing to define a separate [models.*] section.

</Tip>

That's it! Your fine-tuned model is now ready to use.
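
To try it out, you can pin the new variant with the `variant_name` parameter (a sketch; the input text is illustrative, and omitting `variant_name` lets TensorZero sample among your variants):

```python
response = t0.inference(
    function_name="extract_entities",
    variant_name="fine_tuned",  # optional: pin the fine-tuned variant while testing
    input={"messages": [{"role": "user", "content": "Acme Corp. hired Jane Doe in Berlin."}]},
)
print(response.output)
```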

</Step> </Steps> <Tip>

You can run experiments comparing your baseline and fine-tuned variants using adaptive A/B testing.

</Tip>

## Provider Configuration Reference

### OpenAISFTConfig

Configure OpenAI supervised fine-tuning by creating an OpenAISFTConfig object with the following parameters:

<ParamField body="model" type="str" required>
  The base model to fine-tune. See [OpenAI's supported models](https://platform.openai.com/docs/guides/fine-tuning) for available options.
</ParamField>

<ParamField body="batch_size" type="int">
  Batch size for training. If not specified, OpenAI chooses automatically.
</ParamField>

<ParamField body="learning_rate_multiplier" type="float">
  Learning rate multiplier. Values between 0.5 and 2.0 are typical.
</ParamField>

<ParamField body="n_epochs" type="int">
  Number of training epochs. If not specified, OpenAI chooses automatically based on dataset size.
</ParamField>

<ParamField body="seed" type="int">
  Random seed for reproducibility.
</ParamField>

<ParamField body="suffix" type="str">
  Suffix to add to the fine-tuned model name for identification.
</ParamField>
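
For example, a more fully specified config might look like this (hyperparameter values are illustrative, not recommendations):

```python
from tensorzero import OpenAISFTConfig

# Only `model` is required; the other values are illustrative.
optimization_config = OpenAISFTConfig(
    model="gpt-4.1-2025-04-14",
    n_epochs=3,
    batch_size=8,
    learning_rate_multiplier=1.0,
    seed=42,
    suffix="extract-entities",
)
```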

### GCPVertexGeminiSFTConfig

Configure GCP Vertex AI Gemini supervised fine-tuning by creating a GCPVertexGeminiSFTConfig object with the following parameters:

<ParamField body="model" type="str" required>
  The base model to fine-tune. See [Vertex AI's supported models](https://cloud.google.com/vertex-ai/generative-ai/docs/models/tune-gemini-overview) for available options.
</ParamField>

<ParamField body="adapter_size" type="int">
  Adapter size for parameter-efficient tuning.
</ParamField>

<ParamField body="export_last_checkpoint_only" type="bool">
  Whether to export only the final checkpoint instead of all checkpoints.
</ParamField>

<ParamField body="learning_rate_multiplier" type="float">
  Learning rate multiplier for training.
</ParamField>

<ParamField body="n_epochs" type="int">
  Number of training epochs.
</ParamField>

<ParamField body="seed" type="int">
  Random seed for reproducibility.
</ParamField>

<ParamField body="tuned_model_display_name" type="str">
  Display name for the tuned model in the Vertex AI console.
</ParamField>
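
A sketch with the optional hyperparameters set (values are illustrative):

```python
from tensorzero import GCPVertexGeminiSFTConfig

# Only `model` is required; the other values are illustrative.
optimization_config = GCPVertexGeminiSFTConfig(
    model="gemini-2.5-flash",
    n_epochs=2,
    adapter_size=4,
    learning_rate_multiplier=1.0,
    export_last_checkpoint_only=True,
    tuned_model_display_name="extract-entities-sft",
)
```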

### FireworksSFTConfig

Configure Fireworks supervised fine-tuning by creating a FireworksSFTConfig object with the following parameters:

<ParamField body="model" type="str" required>
  The base model to fine-tune. See [Fireworks' supported models](https://docs.fireworks.ai/fine-tuning/fine-tuning-models) for available options.
</ParamField>

<ParamField body="batch_size" type="int">
  Batch size in tokens for training.
</ParamField>

<ParamField body="deploy_after_training" type="bool" default={false}>
  Whether to automatically deploy the model after training completes.
</ParamField>

<ParamField body="display_name" type="str">
  Display name for the fine-tuning job.
</ParamField>

<ParamField body="early_stop" type="bool">
  Whether to enable early stopping based on validation loss.
</ParamField>

<ParamField body="epochs" type="int">
  Number of training epochs.
</ParamField>

<ParamField body="eval_auto_carveout" type="bool">
  Whether to automatically carve out a portion of training data for evaluation.
</ParamField>

<ParamField body="is_turbo" type="bool">
  Whether to enable turbo mode for faster training.
</ParamField>

<ParamField body="learning_rate" type="float">
  Learning rate for training.
</ParamField>

<ParamField body="lora_rank" type="int">
  LoRA rank for parameter-efficient fine-tuning.
</ParamField>

<ParamField body="max_context_length" type="int">
  Maximum context length for training examples.
</ParamField>

<ParamField body="mtp_enabled" type="bool">
  Whether to enable Multi-Token Prediction.
</ParamField>

<ParamField body="mtp_freeze_base_model" type="bool">
  Whether to freeze the base model when using MTP.
</ParamField>

<ParamField body="mtp_num_draft_tokens" type="int">
  Number of draft tokens for Multi-Token Prediction.
</ParamField>

<ParamField body="nodes" type="int">
  Number of nodes for distributed training.
</ParamField>

<ParamField body="output_model" type="str">
  Custom model ID for the fine-tuned model. Defaults to the job ID.
</ParamField>

<ParamField body="warm_start_from" type="str">
  PEFT addon model to start from. Mutually exclusive with `model`.
</ParamField>
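
A sketch with a few of the optional parameters set (values are illustrative):

```python
from tensorzero import FireworksSFTConfig

# Only `model` is required; the other values are illustrative.
optimization_config = FireworksSFTConfig(
    model="accounts/fireworks/models/glm-4p7",
    epochs=3,
    lora_rank=16,
    learning_rate=1e-4,
    early_stop=True,
    deploy_after_training=True,
)
```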

### TogetherSFTConfig

Configure Together supervised fine-tuning by creating a TogetherSFTConfig object with the following parameters:

<ParamField body="model" type="str" required>
  The base model to fine-tune. See [Together's supported models](https://docs.together.ai/reference/post-fine-tunes) for available options.
</ParamField>

<ParamField body="batch_size" type="int | str" default="max">
  Batch size for training. Can be an integer or `"max"` for automatic optimization.
</ParamField>

<ParamField body="from_checkpoint" type="str">
  Job ID of a previous fine-tuning job to continue from.
</ParamField>

<ParamField body="from_hf_model" type="str">
  Hugging Face model to start from instead of a Together model.
</ParamField>

<ParamField body="hf_model_revision" type="str">
  Hugging Face model revision/commit to use.
</ParamField>

<ParamField body="hf_output_repo_name" type="str">
  Hugging Face repository name for uploading the fine-tuned model.
</ParamField>

<ParamField body="learning_rate" type="float" default={0.00001}>
  Learning rate for training.
</ParamField>

<ParamField body="lr_scheduler" type="dict">
  Learning rate scheduler configuration. Supports `"linear"` and `"cosine"` types.
</ParamField>

<ParamField body="max_grad_norm" type="float" default={1.0}>
  Maximum gradient norm for gradient clipping. Set to 0 to disable.
</ParamField>

<ParamField body="n_checkpoints" type="int" default={1}>
  Number of intermediate checkpoints to save during training.
</ParamField>

<ParamField body="n_epochs" type="int" default={1}>
  Number of training epochs.
</ParamField>

<ParamField body="n_evals" type="int">
  Number of evaluations to run on the validation set during training.
</ParamField>

<ParamField body="suffix" type="str">
  Suffix for the fine-tuned model name.
</ParamField>

<ParamField body="training_method" type="dict">
  Training method configuration. Supports SFT with options like `train_on_inputs`.
</ParamField>

<ParamField body="training_type" type="dict">
  Training type configuration. Supports `"full"` and `"lora"` with parameters like `lora_r`, `lora_alpha`, `lora_dropout`.
</ParamField>

<ParamField body="wandb_name" type="str">
  Weights & Biases run name for experiment tracking.
</ParamField>

<ParamField body="warmup_ratio" type="float" default={0.0}>
  Warmup ratio as a percentage of total training steps.
</ParamField>

<ParamField body="weight_decay" type="float" default={0.0}>
  Weight decay regularization parameter.
</ParamField>
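
A sketch with optional parameters set. The values are illustrative, and the exact dict shapes for `training_type` and `lr_scheduler` are assumptions based on the descriptions above; check Together's API reference for the precise format:

```python
from tensorzero import TogetherSFTConfig

optimization_config = TogetherSFTConfig(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    n_epochs=3,
    learning_rate=1e-5,
    # Assumed dict shapes based on the parameter descriptions above:
    training_type={"type": "lora", "lora_r": 16, "lora_alpha": 32, "lora_dropout": 0.0},
    lr_scheduler={"type": "cosine"},
    suffix="extract-entities",
)
```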