Dynamic In-Context Learning (DICL) is an inference-time optimization that improves LLM performance by incorporating relevant historical examples into your prompt. Instead of hardcoding static examples into your prompts, DICL selects the most relevant examples at inference time.
Here's how it works: at inference time, DICL embeds the incoming input, retrieves the most similar curated examples from your database, and includes them in the prompt as context before calling the model.
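To make the flow concrete, here is a minimal sketch of the retrieval logic in Python. It is not TensorZero's implementation: the `embed` function, the `(input, output, embedding)` example tuples, and the message format are assumptions for illustration.

```python
import numpy as np

def build_dicl_messages(input_text, examples, embed, k=10):
    """Sketch of the DICL flow: embed the input, retrieve the k nearest
    curated examples by cosine distance, and inject them as context."""
    # `examples` is a list of (input, output, embedding) tuples curated offline;
    # `embed` maps text to a unit-normalized vector.
    query = embed(input_text)
    # For unit vectors, cosine distance = 1 - dot product.
    distances = [1.0 - float(np.dot(query, emb)) for _, _, emb in examples]
    nearest = sorted(range(len(examples)), key=distances.__getitem__)[:k]
    messages = []
    for i in nearest:
        example_input, example_output, _ = examples[i]
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    # The real input comes last, after the retrieved demonstrations.
    messages.append({"role": "user", "content": input_text})
    return messages
```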
DICL is particularly useful if you have limited high-quality data.
| Criterion | Impact | Details |
|---|---|---|
| Complexity | Low | Requires data curation; few parameters |
| Data Efficiency | High | Achieves good results with limited data |
| Optimization Ceiling | Moderate | Plateaus quickly with more data; prompt-only, but dynamic |
| Optimization Cost | Low | Generates embeddings for curated examples |
| Inference Cost | High | Input tokens scale in proportion to k |
| Inference Latency | Moderate | Requires embedding and retrieval before LLM call |
DICL tends to work best when:

- you have limited but high-quality data, and
- inputs are relatively short, since the prompt grows in proportion to k (see below), degrading performance for long inputs.
<Tip>
You can find a complete runnable example of this guide on GitHub.
</Tip>

<Steps>
<Step title="Configure your LLM application">

Define a function with a baseline variant for your application.
```toml
[functions.extract_entities]
type = "json"
output_schema = "functions/extract_entities/output_schema.json"

[functions.extract_entities.variants.baseline]
type = "chat_completion"
model = "openai::gpt-5-mini"
templates.system.path = "functions/extract_entities/initial_prompt/system_template.minijinja"
json_mode = "strict"
```
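As a quick sanity check, you can call the baseline variant through the gateway. This is a minimal sketch assuming the gateway runs at `http://localhost:3000`; the sample text is placeholder input.

```python
from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as t0:
    response = t0.inference(
        function_name="extract_entities",
        input={
            "messages": [
                {"role": "user", "content": "TensorZero opened an office in New York."}
            ]
        },
    )
    print(response.output)
```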
If your prompt has a lot of boilerplate, configure prompt templates. DICL operates on template variables, so separating out the boilerplate improves retrieval (and therefore inference quality) and mitigates the marginal cost and latency of the extra context. Move the boilerplate into system_instructions in your variant configuration instead.
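For example, the DICL variant you'll define in a later step could point at a file holding the shared boilerplate (the path below is hypothetical):

```toml
[functions.extract_entities.variants.dicl]
# ... other settings (see the later steps) ...
system_instructions = "functions/extract_entities/system_instructions.txt"
```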
For reference, the baseline system template looks like this:

```minijinja
You are an assistant that is performing a named entity recognition task.
Your job is to extract entities from a given text.

The entities you are extracting are:

- people
- organizations
- locations
- miscellaneous other entities

Please return the entities in the following JSON format:

{
    "person": ["person1", "person2", ...],
    "organization": ["organization1", "organization2", ...],
    "location": ["location1", "location2", ...],
    "miscellaneous": ["miscellaneous1", "miscellaneous2", ...]
}
```
</Step>
<Step title="Curate a dataset">

After deploying the TensorZero Gateway with Postgres, build a dataset of good examples for the extract_entities function you configured.
You can create datapoints from historical inferences or from external or synthetic datasets.

<Tip>
DICL's performance degrades as the curated examples become noisier with examples of bad behavior. There is a trade-off between dataset size and datapoint quality.
</Tip>
</Step>
<Step title="Configure DICL">

Configure DICL by specifying the name of your function, variant, and embedding model.
```python
from tensorzero import DICLOptimizationConfig

optimization_config = DICLOptimizationConfig(
    function_name="extract_entities",
    variant_name="dicl",
    embedding_model="openai::text-embedding-3-small",
    k=10,  # how many examples are retrieved and injected as context
    model="openai::gpt-5-mini",  # LLM that generates outputs using the retrieved examples
)
```
You can also define a custom embedding model in your configuration.
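For instance, a custom embedding model definition might look like the following sketch, assuming an OpenAI provider (the name `my_embedding_model` is hypothetical); you would then set `embedding_model = "my_embedding_model"`:

```toml
[embedding_models.my_embedding_model]
routing = ["openai"]

[embedding_models.my_embedding_model.providers.openai]
type = "openai"
model_name = "text-embedding-3-small"
```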
<Tip>
You should experiment with different choices of k.
Typical values are 3-10, with smaller values when inputs tend to be larger.

If you see inferences with irrelevant examples, consider setting a max_distance in your variant configuration later. With this setting, the retrieval step can return fewer than k examples if they don't meet a cosine distance threshold. Make sure to tune the value according to your embedding model.
</Tip>
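For example, the variant configuration from the final step could add a threshold like this (the value 0.5 is a hypothetical starting point, not a recommendation):

```toml
[functions.extract_entities.variants.dicl]
# ... other settings ...
max_distance = 0.5  # hypothetical; tune for your embedding model
```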
You can now launch your DICL optimization job using the TensorZero Gateway:
```python
# `t0` is a TensorZero client (see the gateway setup in the first step)
job_handle = t0.experimental_launch_optimization_workflow(
    function_name="extract_entities",
    template_variant_name="baseline",
    dataset_name="extract_entities_dataset",
    optimizer_config=optimization_config,
)

job_info = t0.experimental_poll_optimization(job_handle=job_handle)
```
DICL will embed all your training samples and store them in Postgres.
</Step>
<Step title="Update your configuration">

After optimization completes, add the DICL variant to your configuration:
```toml
[functions.extract_entities.variants.dicl]
type = "experimental_dynamic_in_context_learning"
embedding_model = "openai::text-embedding-3-small"
k = 10
model = "openai::gpt-5-mini"
json_mode = "strict"
```
The embedding_model in the configuration must match the embedding model you used during optimization.
That's it!
At inference time, the DICL variant will retrieve the k most similar examples from your training data and include them as context for in-context learning.
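To try the new variant directly, you can pin it at inference time. A minimal sketch, reusing the `t0` client from earlier (the sample text is placeholder input):

```python
response = t0.inference(
    function_name="extract_entities",
    variant_name="dicl",  # pin the DICL variant for testing
    input={
        "messages": [
            {"role": "user", "content": "Acme Corp. is hiring engineers in Paris."}
        ]
    },
)
print(response.output)
```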
<Tip>
You can run experiments comparing your baseline and DICL variants using adaptive A/B testing.
</Tip>
</Step>
</Steps>

## DICLOptimizationConfig

Configure DICL optimization by creating a `DICLOptimizationConfig` object with the following parameters: