Export by SDK, REST, and UI

When working with Opik, it is important to be able to export traces, spans, and threads so that you can use them to fine-tune your models or run deeper analysis.

You can export the data you have logged to the Opik platform in three ways:

  1. Using the Opik SDK: You can use the Python SDK methods (Opik.search_traces, Opik.search_spans, and Opik.search_threads) or the TypeScript SDK method (client.searchTraces()) to export traces, spans, and threads.
  2. Using the Opik REST API: You can use the /traces and /spans endpoints to export traces and spans.
  3. Using the UI: Once you have selected the traces or spans you want to export, you can click on the Export CSV button in the Actions dropdown.
<Tip> The recommended way to export data is to use the SDK methods in the Opik Python SDK ([`Opik.search_traces`](https://www.comet.com/docs/opik/python-sdk-reference/Opik.html#opik.Opik.search_traces), [`Opik.search_spans`](https://www.comet.com/docs/opik/python-sdk-reference/Opik.html#opik.Opik.search_spans), and [`Opik.search_threads`](https://www.comet.com/docs/opik/python-sdk-reference/Opik.html#opik.Opik.search_threads)) or TypeScript SDK (`client.searchTraces()`). </Tip>

Using the Opik SDK

Exporting traces

The Python SDK Opik.search_traces method and the TypeScript SDK client.searchTraces() method let you either export all the traces in a project or search for specific traces and export only those.

Exporting all traces

To export all traces, you will need to specify a max_results / maxResults value that is higher than the total number of traces in your project:

<Tabs>
<Tab value="Python" title="Python">
```python
import opik

client = opik.Opik()

traces = client.search_traces(project_name="Default project", max_results=1000000)
```
</Tab>
<Tab value="TypeScript" title="TypeScript">
```typescript
import { Opik } from "opik";

const client = new Opik();

const traces = await client.searchTraces({
  projectName: "Default project",
  maxResults: 1000000
});
```
</Tab>
</Tabs>

Search for specific traces

You can use the filter_string (Python) / filterString (TypeScript) parameter to search for specific traces:

<Tabs>
<Tab value="Python" title="Python">
```python
import opik

client = opik.Opik()

traces = client.search_traces(
    project_name="Default project",
    filter_string='input contains "Opik"'
)

# Convert each trace to a dict if required
traces = [trace.dict() for trace in traces]
```
</Tab>
<Tab value="TypeScript" title="TypeScript">
```typescript
import { Opik } from "opik";

const client = new Opik();

const traces = await client.searchTraces({
  projectName: "Default project",
  filterString: 'input contains "Opik"'
});
```
</Tab>
</Tabs>
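
Once traces have been converted to dictionaries, a common next step is to write them to disk for fine-tuning or offline analysis. The following is a minimal sketch using only the standard library; `write_jsonl` is a hypothetical helper, not part of the Opik SDK, and it assumes each record is a plain `dict`.

```python
import json
from pathlib import Path
from typing import Any, Iterable


def write_jsonl(records: Iterable[dict[str, Any]], path: str) -> int:
    """Write one JSON object per line and return the number of records written."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as handle:
        for record in records:
            # default=str stringifies values json cannot encode natively (e.g. datetime)
            handle.write(json.dumps(record, default=str, ensure_ascii=False) + "\n")
            count += 1
    return count
```

With the Python example above, this could be called as `write_jsonl(traces, "traces.jsonl")` after the conversion step.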

Filtering with Opik Query Language (OQL)

All search methods (search_traces, search_spans, and search_threads) accept a filter_string (Python) / filterString (TypeScript) parameter that uses Opik Query Language (OQL):

"<COLUMN> <OPERATOR> <VALUE> [AND <COLUMN> <OPERATOR> <VALUE>]*"

Rules:

  • String values must be wrapped in double quotes
  • Multiple conditions can be combined with AND (OR is not supported)
  • DateTime fields require ISO 8601 format (e.g., "2024-01-01T00:00:00Z")
  • Use dot notation for nested fields: metadata.model, feedback_scores.accuracy
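
The rules above can be applied programmatically when filters are assembled from user input or configuration. The sketch below is illustrative only: `build_filter` and `format_value` are hypothetical helpers, not SDK functions. It treats naive datetimes as UTC and does not escape double quotes inside string values.

```python
from datetime import datetime, timezone
from typing import Union

Value = Union[str, int, float, datetime]


def format_value(value: Value) -> str:
    """Render a Python value as an OQL literal."""
    if isinstance(value, datetime):
        # Assumption of this sketch: naive datetimes are interpreted as UTC
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)
        stamp = value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        return f'"{stamp}"'
    if isinstance(value, str):
        # String values must be wrapped in double quotes
        return f'"{value}"'
    return str(value)


def build_filter(*conditions: tuple) -> str:
    """Join (column, operator[, value]) tuples with AND; OR is not supported."""
    parts = []
    for condition in conditions:
        if len(condition) == 2:
            # Operators without a value, such as is_empty / is_not_empty
            column, operator = condition
            parts.append(f"{column} {operator}")
        else:
            column, operator, value = condition
            parts.append(f"{column} {operator} {format_value(value)}")
    return " AND ".join(parts)
```

For example, `build_filter(("usage.total_tokens", ">", 1000), ("metadata.model", "=", "gpt-4o"))` produces a string that can be passed as `filter_string`.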

Each entity type supports a different set of filter columns. The tables below list the available columns for each.

Trace columns

| Column | Type | Operators |
| --- | --- | --- |
| `id` | String | `=`, `!=`, `contains`, `not_contains`, `starts_with`, `ends_with`, `>`, `<` |
| `name` | String | `=`, `!=`, `contains`, `not_contains`, `starts_with`, `ends_with`, `>`, `<` |
| `input`, `output` | String | `=`, `!=`, `contains`, `not_contains`, `starts_with`, `ends_with`, `>`, `<` |
| `thread_id` | String | `=`, `!=`, `contains`, `not_contains`, `starts_with`, `ends_with`, `>`, `<` |
| `guardrails` | String | `=`, `!=`, `contains`, `not_contains`, `starts_with`, `ends_with`, `>`, `<` |
| `experiment_id` | String | `=`, `!=`, `contains`, `not_contains`, `starts_with`, `ends_with`, `>`, `<` |
| `start_time`, `end_time` | DateTime | `=`, `!=`, `>`, `>=`, `<`, `<=` |
| `created_at`, `last_updated_at` | DateTime | `=`, `!=`, `>`, `>=`, `<`, `<=` |
| `metadata` | Dictionary | `=`, `!=`, `contains`, `not_contains`, `starts_with`, `ends_with`, `>`, `>=`, `<`, `<=` |
| `input_json`, `output_json` | Dictionary | `=`, `!=`, `contains`, `not_contains`, `starts_with`, `ends_with`, `>`, `>=`, `<`, `<=` |
| `feedback_scores` | Numeric | `=`, `!=`, `>`, `>=`, `<`, `<=`, `is_empty`, `is_not_empty` |
| `span_feedback_scores` | Numeric | `=`, `!=`, `>`, `>=`, `<`, `<=`, `is_empty`, `is_not_empty` |
| `tags` | List | `=`, `!=`, `contains`, `not_contains`, `is_empty`, `is_not_empty` |
| `annotation_queue_ids` | List | `=`, `!=`, `contains`, `not_contains`, `is_empty`, `is_not_empty` |
| `usage.total_tokens`, `usage.prompt_tokens`, `usage.completion_tokens` | Numeric | `=`, `!=`, `>`, `>=`, `<`, `<=` |
| `duration`, `total_estimated_cost`, `llm_span_count` | Numeric | `=`, `!=`, `>`, `>=`, `<`, `<=` |
| `error_info` | Container | `is_empty`, `is_not_empty` |

Span columns

| Column | Type | Operators |
| --- | --- | --- |
| `id` | String | `=`, `!=`, `contains`, `not_contains`, `starts_with`, `ends_with`, `>`, `<` |
| `name` | String | `=`, `!=`, `contains`, `not_contains`, `starts_with`, `ends_with`, `>`, `<` |
| `input`, `output` | String | `=`, `!=`, `contains`, `not_contains`, `starts_with`, `ends_with`, `>`, `<` |
| `model` | String | `=`, `!=`, `contains`, `not_contains`, `starts_with`, `ends_with`, `>`, `<` |
| `provider` | String | `=`, `!=`, `contains`, `not_contains`, `starts_with`, `ends_with`, `>`, `<` |
| `trace_id` | String | `=`, `!=`, `contains`, `not_contains`, `starts_with`, `ends_with`, `>`, `<` |
| `type` | Enum | `=`, `!=` |
| `start_time`, `end_time` | DateTime | `=`, `!=`, `>`, `>=`, `<`, `<=` |
| `metadata` | Dictionary | `=`, `!=`, `contains`, `not_contains`, `starts_with`, `ends_with`, `>`, `>=`, `<`, `<=` |
| `input_json`, `output_json` | Dictionary | `=`, `!=`, `contains`, `not_contains`, `starts_with`, `ends_with`, `>`, `>=`, `<`, `<=` |
| `feedback_scores` | Numeric | `=`, `!=`, `>`, `>=`, `<`, `<=`, `is_empty`, `is_not_empty` |
| `tags` | List | `=`, `!=`, `contains`, `not_contains`, `is_empty`, `is_not_empty` |
| `usage.total_tokens`, `usage.prompt_tokens`, `usage.completion_tokens` | Numeric | `=`, `!=`, `>`, `>=`, `<`, `<=` |
| `duration`, `total_estimated_cost` | Numeric | `=`, `!=`, `>`, `>=`, `<`, `<=` |
| `error_info` | Container | `is_empty`, `is_not_empty` |

Thread columns

| Column | Type | Operators |
| --- | --- | --- |
| `id` | String | `=`, `!=`, `contains`, `not_contains`, `starts_with`, `ends_with`, `>`, `<` |
| `first_message`, `last_message` | String | `=`, `!=`, `contains`, `not_contains`, `starts_with`, `ends_with`, `>`, `<` |
| `status` | Enum | `=`, `!=` |
| `start_time`, `end_time` | DateTime | `=`, `!=`, `>`, `>=`, `<`, `<=` |
| `created_at`, `last_updated_at` | DateTime | `=`, `!=`, `>`, `>=`, `<`, `<=` |
| `feedback_scores` | Numeric | `=`, `!=`, `>`, `>=`, `<`, `<=`, `is_empty`, `is_not_empty` |
| `tags` | List | `=`, `!=`, `contains`, `not_contains`, `is_empty`, `is_not_empty` |
| `annotation_queue_ids` | List | `=`, `!=`, `contains`, `not_contains`, `is_empty`, `is_not_empty` |
| `duration`, `number_of_messages` | Numeric | `=`, `!=`, `>`, `>=`, `<`, `<=` |

<Tabs>
<Tab value="Python" title="Python">
```python
import opik

client = opik.Opik(project_name="Default project")

# Trace filters
traces = client.search_traces(filter_string='input contains "Opik"')
traces = client.search_traces(filter_string='start_time >= "2024-01-01T00:00:00Z"')
traces = client.search_traces(filter_string='usage.total_tokens > 1000')
traces = client.search_traces(filter_string='metadata.model = "gpt-4o"')
traces = client.search_traces(filter_string='feedback_scores.user_rating is_not_empty')
traces = client.search_traces(filter_string='tags contains "production"')

# Thread filters
threads = client.search_threads(filter_string='number_of_messages >= 5')
threads = client.search_threads(filter_string='first_message contains "hello"')
threads = client.search_threads(filter_string='status = "active"')
```
</Tab>
<Tab value="TypeScript" title="TypeScript">
```typescript
import { Opik } from "opik";

const client = new Opik({ projectName: "Default project" });

// Trace filters
const t1 = await client.searchTraces({ filterString: 'input contains "Opik"' });
const t2 = await client.searchTraces({ filterString: 'start_time >= "2024-01-01T00:00:00Z"' });
const t3 = await client.searchTraces({ filterString: 'usage.total_tokens > 1000' });
const t4 = await client.searchTraces({ filterString: 'metadata.model = "gpt-4o"' });
const t5 = await client.searchTraces({ filterString: 'feedback_scores.user_rating is_not_empty' });
const t6 = await client.searchTraces({ filterString: 'tags contains "production"' });
```
</Tab>
</Tabs>

<Tip>
If your `feedback_scores` key contains spaces, you will need to wrap it in double quotes:

'feedback_scores."My Score" > 0'

If the feedback_scores key contains both spaces and double quotes, you will need to escape the double quotes as "":

'feedback_scores."Score ""with"" Quotes" > 0'

Alternatively, you can wrap the key in single quotes and surround the whole filter string in triple quotes:

'''feedback_scores.'Accuracy "Happy Index"' < 0.8'''

</Tip>
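
When score names come from user input, the double-quote escaping described in the tip can be automated. This is a minimal sketch; `quote_feedback_key` and `feedback_filter` are hypothetical helpers, and quoting a key that contains double quotes but no spaces is an assumption of this sketch rather than documented behaviour.

```python
def quote_feedback_key(key: str) -> str:
    """Quote a feedback score name for use after `feedback_scores.`."""
    if " " not in key and '"' not in key:
        return key  # plain keys such as user_rating need no quoting
    # Wrap in double quotes and escape embedded double quotes as ""
    return '"' + key.replace('"', '""') + '"'


def feedback_filter(key: str, operator: str, value: float) -> str:
    """Build a single feedback score condition, e.g. feedback_scores."My Score" > 0."""
    return f"feedback_scores.{quote_feedback_key(key)} {operator} {value}"
```

For instance, `feedback_filter("My Score", ">", 0)` yields the same string as the first example in the tip.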

Exporting spans

You can export spans using the Opik.search_spans method. This method allows you to search for spans based on trace_id or based on a filter string.

Exporting spans based on trace_id

To export all the spans associated with a specific trace, you can use the trace_id parameter:

```python
import opik

client = opik.Opik()

spans = client.search_spans(
    project_name="Default project",
    trace_id="067092dc-e639-73ff-8000-e1c40172450f"
)
```
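
If you need the spans for several traces, one option is to repeat the call above per trace ID. The sketch below assumes only the `search_spans(project_name=..., trace_id=...)` call shown in this section; `spans_by_trace` is a hypothetical helper and the client is passed in so the function can be exercised without a live Opik connection.

```python
from typing import Any, Iterable


def spans_by_trace(client: Any, project_name: str, trace_ids: Iterable[str]) -> dict[str, list]:
    """Call client.search_spans once per trace ID and key the results by trace ID."""
    result: dict[str, list] = {}
    for trace_id in trace_ids:
        result[trace_id] = list(
            client.search_spans(project_name=project_name, trace_id=trace_id)
        )
    return result
```

This issues one request per trace, so for a large number of traces a single `filter_string` query may be a better fit.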

Search for specific spans

You can use the filter_string parameter to search for specific spans:

```python
import opik

client = opik.Opik()

spans = client.search_spans(
    project_name="Default project",
    filter_string='input contains "Opik"'
)
```

Exporting threads

You can export threads using the Opik.search_threads method. This method allows you to search for conversational threads in a project.

Exporting all threads

To export all threads, you will need to specify a max_results value that is higher than the total number of threads in your project:

```python
import opik

client = opik.Opik()

threads = client.search_threads(project_name="Default project", max_results=1000000)
```

Search for specific threads

You can use the filter_string parameter to search for specific threads:

```python
import opik

client = opik.Opik()

# Search for a specific thread by ID
threads = client.search_threads(
    project_name="Default project",
    filter_string='id = "thread_123"'
)

# Search for threads with many messages
threads = client.search_threads(
    project_name="Default project",
    filter_string='number_of_messages >= 5'
)

# Search for threads with a specific feedback score
threads = client.search_threads(
    project_name="Default project",
    filter_string='feedback_scores.user_satisfaction > 0.8'
)

# Search for threads by tag
threads = client.search_threads(
    project_name="Default project",
    filter_string='tags contains "important"'
)
```

Using the Opik REST API

To export traces and spans using the Opik REST API, you can use the /traces and /spans endpoints. Both endpoints are paginated, so you will need to make multiple requests to retrieve all the traces or spans you want.
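
The pagination loop can be sketched independently of any HTTP client. In the example below, `iter_pages` is a hypothetical helper and `fetch_page` is a callable you supply that performs the actual request and returns the list of items for one page. Page numbering starting at 1 and a short final page signalling the end are assumptions of this sketch, so check the REST API reference for the exact parameter names and response shape.

```python
from typing import Any, Callable, Iterator


def iter_pages(fetch_page: Callable[[int, int], list], page_size: int = 100) -> Iterator[Any]:
    """Yield items page by page until a page comes back shorter than page_size."""
    page = 1  # assumption: page numbering starts at 1
    while True:
        items = fetch_page(page, page_size)
        yield from items
        if len(items) < page_size:
            # A short (or empty) page means there is nothing left to fetch
            break
        page += 1
```

Keeping the request logic in `fetch_page` means the same loop can serve both the /traces and /spans endpoints.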

To search for specific traces or spans, you can use the filter parameter. Although this is a string parameter, it does not follow the same format as the filter_string parameter in the Opik SDK. Instead, it is a JSON-encoded list of objects with the following format:

```json
[
  {
    "field": "name",
    "type": "string",
    "operator": "=",
    "value": "Opik"
  }
]
```
<Warning> The `filter` parameter was designed for use with the Opik UI and therefore has limited flexibility. If you need more flexibility, please raise an issue on [GitHub](https://github.com/comet-ml/opik/issues) so we can help. </Warning>
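
Because the filter is a JSON list passed as a string, it has to be serialized and URL-encoded before it is placed in a query string. The sketch below shows one way to do that with the standard library; `encode_filter_query` is a hypothetical helper, and the default parameter name `filter` follows the wording of this page, so verify the exact query parameter name against the REST API reference.

```python
import json
from urllib.parse import parse_qs, urlencode


def encode_filter_query(filters: list, param_name: str = "filter") -> str:
    """Serialize the filter list to JSON and URL-encode it as a query string."""
    return urlencode({param_name: json.dumps(filters)})


query = encode_filter_query(
    [{"field": "name", "type": "string", "operator": "=", "value": "Opik"}]
)
# Decoding the query string recovers the original list of filter objects
decoded = json.loads(parse_qs(query)["filter"][0])
```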

Using the UI

To export traces or spans as a CSV file from the UI, select the traces or spans you wish to export and click on Export CSV in the Actions dropdown.

<Tip> The UI only allows you to export up to 100 traces or spans at a time because the export is tied to the page size of the traces table. If you need to export more traces or spans, we recommend using the Opik SDK. </Tip>
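
If you need a CSV file with more rows than the UI allows, records retrieved through the SDK can be written out with the standard library. This is a minimal sketch under the assumption that each record has already been converted to a plain `dict`; `write_csv` is a hypothetical helper and the column names are up to you.

```python
import csv
import json
from typing import Any, Iterable, Sequence


def write_csv(records: Iterable[dict[str, Any]], path: str, columns: Sequence[str]) -> None:
    """Write the selected columns of each record to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        for record in records:
            row = {}
            for column in columns:
                value = record.get(column)
                # Nested values (dicts, lists) are stored as JSON text so each fits in one cell
                if isinstance(value, (dict, list)):
                    value = json.dumps(value, default=str)
                row[column] = value
            writer.writerow(row)
```

Missing keys are written as empty cells, which keeps the file rectangular even when records have different fields.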