docs/evaluations/workflow-evaluations/api-reference.mdx
Workflow Evaluations focus on evaluating complex workflows that might include multiple TensorZero inference calls, arbitrary application logic, and more.
You can initialize and run workflow evaluations using the TensorZero Gateway, either through the TensorZero client or the gateway's HTTP API. Unlike inference evaluations, workflow evaluations are not defined in the TensorZero configuration file.
See the Workflow Evaluations Tutorial for a step-by-step guide.
## `POST /workflow_evaluation_run`

Initializes a workflow evaluation run. In the TensorZero client, use the `workflow_evaluation_run` method.

### Request

- `variants`: an object (dictionary) mapping function names to variant names
- `project_name` (string, optional): the name of the project to associate the run with
- `display_name` (string, optional): the display (human-readable) name of the run
- `tags` (dictionary, optional): a dictionary of key-value pairs to tag the run's inferences with

### Response

- `run_id` (UUID): the ID of the run

## `POST /workflow_evaluation_run/{run_id}/episode`

Initializes an episode within a workflow evaluation run. In the TensorZero client, use the `workflow_evaluation_run_episode` method.

### Request

- `run_id` (UUID): the ID of the run generated by the `workflow_evaluation_run` method
- `task_name` (string, optional): the name of the task to associate the episode with
- `tags` (dictionary, optional): a dictionary of key-value pairs to tag the episode's inferences with

### Response

- `episode_id` (UUID): the ID of the episode

After initializing a run and an episode, you can make inference and feedback API calls like you normally would.
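As a sketch of the two calls above via the HTTP API, the snippet below builds the request payloads. The gateway URL, function/variant names, project name, and tags are hypothetical placeholders, and the actual POST is shown only in comments so the example stays self-contained:

```python
import json
import uuid

# Hypothetical gateway URL: adjust for your deployment.
GATEWAY_URL = "http://localhost:3000"

# Body for POST /workflow_evaluation_run: pin the variants under evaluation
# and optionally label the run. All names here are illustrative.
run_payload = {
    "variants": {"extract_entities": "gpt_4o_mini"},
    "project_name": "entity_extraction",
    "display_name": "prompt-v2 vs baseline",
    "tags": {"git_sha": "abc123"},
}

# A real application would POST this payload, e.g. with requests:
#   run_id = requests.post(f"{GATEWAY_URL}/workflow_evaluation_run", json=run_payload).json()["run_id"]
run_id = str(uuid.uuid4())  # stand-in for the run_id the gateway returns

# Body for POST /workflow_evaluation_run/{run_id}/episode; for the HTTP API,
# run_id travels in the URL path rather than the body.
episode_payload = {
    "task_name": "document_123",
    "tags": {"dataset_split": "test"},
}
episode_url = f"{GATEWAY_URL}/workflow_evaluation_run/{run_id}/episode"

print(json.dumps(run_payload, indent=2))
print(episode_url)
```

The response to the episode request would carry the `episode_id` to use on subsequent inference and feedback calls.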
When you provide the special `episode_id` parameter generated by the `workflow_evaluation_run_episode` method, the TensorZero Gateway associates the inference and feedback with the evaluation run, handles variant pinning, and more.
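To illustrate, the sketch below shapes an inference request and a feedback request that both carry the `episode_id`. The function name, input shape, metric name, and the placeholder `episode_id` value are all assumptions for illustration:

```python
import json

# Placeholder for the episode_id returned by the episode endpoint.
episode_id = "00000000-0000-0000-0000-000000000000"

# A regular inference request that joins the evaluation episode; the
# function name and input are hypothetical.
inference_request = {
    "function_name": "extract_entities",
    "episode_id": episode_id,
    "input": {"messages": [{"role": "user", "content": "Acme Corp hired Jane Doe."}]},
}

# Feedback tagged with the same episode_id is likewise associated with the
# run; the metric name is hypothetical.
feedback_request = {
    "metric_name": "task_success",
    "episode_id": episode_id,
    "value": True,
}

print(json.dumps(inference_request, indent=2))
print(json.dumps(feedback_request, indent=2))
```

Because the run already pins function names to variants, the inference request does not need to name a variant itself.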