This directory contains the code for the TensorZero Evaluations Guide.
We provide a configuration file (`./config/tensorzero.toml`) that specifies:

- A `write_haiku` function that generates a haiku, with `gpt_4o` and `gpt_4o_mini` variants.
- Evaluators for the `write_haiku` function, including exact match and assorted LLM judges.

To set up the example:

1. Install the Python dependencies with `uv`: `uv sync`
2. Generate an OpenAI API key (`OPENAI_API_KEY`).
3. Create a `.env` file with the `OPENAI_API_KEY` environment variable (see `.env.example` for an example).
4. Run `docker compose up` to launch the TensorZero Gateway, the TensorZero UI, and a development ClickHouse database.
5. Run the `main.py` script to generate 100 haikus.

Let's generate a dataset composed of our 100 haikus.
1. Open the dataset builder in the TensorZero UI (`http://localhost:4000/datasets/builder`).
2. Create a new dataset named `haiku_dataset`. Select your `write_haiku` function, "None" as the metric, and "Inference" as the dataset output.

Let's evaluate our `gpt_4o` variant using the TensorZero Evaluations CLI tool:
```bash
docker compose run --rm evaluations \
    --function-name write_haiku \
    --evaluator-names valid_haiku,metaphor_count,exact_match,compare_haikus \
    --dataset-name haiku_dataset \
    --variant-name gpt_4o \
    --concurrency 5
```
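If you want to rerun the CLI evaluation for another variant (for example, `gpt_4o_mini`) without retyping the command, a small Python wrapper can help. This is a minimal sketch and is not part of this example directory: `build_eval_command` and `run_eval` are hypothetical names, it reuses only the flags from the command above, and it assumes you run it from this directory with the Docker Compose services from the setup steps available.

```python
import shlex
import subprocess

# Evaluator names copied from the CLI command above.
EVALUATOR_NAMES = ["valid_haiku", "metaphor_count", "exact_match", "compare_haikus"]


def build_eval_command(
    variant_name: str,
    dataset_name: str = "haiku_dataset",
    concurrency: int = 5,
) -> list[str]:
    """Return the argv for one evaluation run of the write_haiku function.

    Hypothetical helper: it only mirrors the flags shown in the README command.
    """
    return [
        "docker", "compose", "run", "--rm", "evaluations",
        "--function-name", "write_haiku",
        "--evaluator-names", ",".join(EVALUATOR_NAMES),
        "--dataset-name", dataset_name,
        "--variant-name", variant_name,
        "--concurrency", str(concurrency),
    ]


def run_eval(variant_name: str) -> int:
    """Launch the evaluation through Docker Compose and return its exit code."""
    cmd = build_eval_command(variant_name)
    # Print a shell-quoted version of the command for easy copy/paste.
    print("Running:", shlex.join(cmd))
    # check=False returns the exit code instead of raising on failure.
    return subprocess.run(cmd, check=False).returncode
```

For example, `run_eval("gpt_4o")` runs the same evaluation as the command above. Building the argument list separately lets you inspect it before Docker starts, and passing a list (rather than a string) to `subprocess.run` avoids shell quoting issues.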
Let's evaluate our `gpt_4o_mini` variant using the TensorZero Evaluations UI and compare the results.

1. Open the Evaluations page in the TensorZero UI (`http://localhost:4000/evaluations`) and select "New Run".
2. Launch an evaluation with the `gpt_4o_mini` variant.