apps/opik-documentation/documentation/fern/docs/changelog/2026-05-12.mdx
Here are the most relevant improvements we've made since the last release:
You can now tag traces, spans, and threads with an `environment` field: production, staging, dev, or any label you define. This makes it easy to separate signal from noise: filter your project's trace view to only production issues, or compare behavior between environments without spinning up separate projects.
What's new:
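- Filter by environment (for example, production vs staging) in a single project
- Pass `environment` to `@track`, `opik.trace()`, or `opik.span()`; it's preserved through `.end()` and `.update()` calls
- Set `environment` on trace and span creation

```python
import opik

# Every trace produced by this function is tagged with the "production" environment
@opik.track(environment="production")
def my_agent(input: str) -> str:
    ...
```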
Environments Documentation
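To picture how a single `environment` field lets one project hold production, staging, and dev data side by side, here is a minimal, self-contained sketch that filters and compares in-memory trace records by that field. The `TraceRecord` class, the helper functions, and the sample data are hypothetical stand-ins, not Opik SDK types; in Opik itself this filtering happens in the trace view.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TraceRecord:
    # Hypothetical stand-in for a trace; not an Opik SDK type.
    name: str
    environment: str
    has_error: bool


def filter_by_environment(traces, environment):
    """Keep only traces tagged with the given environment label."""
    return [t for t in traces if t.environment == environment]


def error_rate_by_environment(traces):
    """Compare behavior across environments: fraction of errored traces per label."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for t in traces:
        totals[t.environment] += 1
        if t.has_error:
            errors[t.environment] += 1
    return {env: errors[env] / totals[env] for env in totals}


traces = [
    TraceRecord("my_agent", "production", False),
    TraceRecord("my_agent", "production", True),
    TraceRecord("my_agent", "staging", True),
    TraceRecord("my_agent", "dev", False),
]

production_only = filter_by_environment(traces, "production")
print(len(production_only))               # 2
print(error_rate_by_environment(traces))  # {'production': 0.5, 'staging': 1.0, 'dev': 0.0}
```

Keeping the label on each record, rather than in separate projects, is what makes a side-by-side comparison like `error_rate_by_environment` a single query.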
Test suite assertions can now look inside a trace, not just at the top-level input and output, to reason about tool calls, intermediate LLM steps, and sub-agent behavior. The evaluator LLM gets access to two on-demand tools, `get_trace_spans` (lists all sub-spans for the trace) and `read` (fetches a specific span by ID), so it can drill into exactly what happened at each step.
Why it matters: Previously, an assertion could only see what went into and came out of the agent. Now it can check whether the right tool was called, which model was used in an intermediate step, or whether a specific span had an error, enabling far more meaningful correctness checks for complex agents.
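The two tools can be pictured as plain functions over a trace's spans: one returns a lightweight listing, the other fetches full details only when asked. The sketch below is a hypothetical, in-memory illustration of that split and of how an assertion such as "the search tool was called and no span errored" could be checked from the data. The `Span` fields, sample IDs and model names, and the `check_tool_called` helper are assumptions for illustration, not Opik's actual implementation or tool schemas.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    # Hypothetical span shape; a real span schema carries more fields.
    id: str
    trace_id: str
    name: str
    type: str  # e.g. "tool" or "llm"
    model: Optional[str] = None
    error: Optional[str] = None
    output: dict = field(default_factory=dict)


SPANS = {
    "s1": Span("s1", "t1", "plan", "llm", model="model-a"),
    "s2": Span("s2", "t1", "web_search", "tool", output={"hits": 3}),
    "s3": Span("s3", "t1", "answer", "llm", model="model-b"),
}


def get_trace_spans(trace_id: str) -> list:
    """Lightweight listing: IDs, names and types only, like a table of contents."""
    return [
        {"id": s.id, "name": s.name, "type": s.type}
        for s in SPANS.values()
        if s.trace_id == trace_id
    ]


def read(span_id: str) -> Span:
    """Fetch the full details of one span on demand."""
    return SPANS[span_id]


def check_tool_called(trace_id: str, tool_name: str) -> bool:
    """Example assertion: the tool ran and no span in the trace errored."""
    listing = get_trace_spans(trace_id)
    called = any(s["type"] == "tool" and s["name"] == tool_name for s in listing)
    no_errors = all(read(s["id"]).error is None for s in listing)
    return called and no_errors


print(check_tool_called("t1", "web_search"))  # True
print(check_tool_called("t1", "calculator"))  # False
print(read("s3").model)                       # model-b
```

Splitting "list" from "read" keeps the evaluator's context small: it sees every step's name first and pulls full span details only for the steps its assertion cares about.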
The traces and spans tables no longer download attachment bytes (images, PDFs) when loading a list; attachments are lazy-loaded only when you open an individual trace. In our benchmarks with image and PDF attachments, this reduced the per-page payload from 85 MB to 0.13 MB and load time from 3.4 s to 0.1 s.
Why it matters: If your traces include file attachments, the table previously fetched all that binary data silently on every page load. The experience is now fast regardless of attachment size or count.
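The lazy-loading pattern can be sketched as two separate access paths: the list query returns only attachment metadata, and the bytes are fetched when a single trace is opened. Below is a minimal, hypothetical illustration; `AttachmentStore`, `list_traces`, and `open_trace` are invented names, not Opik APIs. The last lines compute the improvement factors implied by the benchmark figures in this release note (85 MB to 0.13 MB, 3.4 s to 0.1 s).

```python
from dataclasses import dataclass


@dataclass
class AttachmentMeta:
    # Metadata is cheap to send in a list view; the bytes are not.
    file_name: str
    mime_type: str
    size_bytes: int


class AttachmentStore:
    """Hypothetical store; `fetch_count` records how often bytes were actually downloaded."""

    def __init__(self):
        self._by_trace = {}  # trace_id -> {file_name: (meta, data)}
        self.fetch_count = 0

    def put(self, trace_id, meta, data):
        self._by_trace.setdefault(trace_id, {})[meta.file_name] = (meta, data)

    def metadata(self, trace_id):
        return [meta for meta, _data in self._by_trace.get(trace_id, {}).values()]

    def fetch(self, trace_id, file_name):
        self.fetch_count += 1
        return self._by_trace[trace_id][file_name][1]


def list_traces(store, trace_ids):
    """List view: attachment metadata only, no bytes downloaded."""
    return [{"id": tid, "attachments": store.metadata(tid)} for tid in trace_ids]


def open_trace(store, trace_id):
    """Detail view: attachment bytes are loaded only now."""
    return {m.file_name: store.fetch(trace_id, m.file_name) for m in store.metadata(trace_id)}


store = AttachmentStore()
data = b"%PDF-1.7 placeholder bytes"
for i in range(50):
    store.put(f"trace-{i}", AttachmentMeta("scan.pdf", "application/pdf", len(data)), data)

page = list_traces(store, [f"trace-{i}" for i in range(50)])
print(store.fetch_count)  # 0: rendering the list fetched no attachment bytes

detail = open_trace(store, "trace-7")
print(store.fetch_count)  # 1: only the opened trace's attachment was fetched

# Improvement factors implied by the benchmark figures.
print(round(85 / 0.13))  # 654 (payload reduction factor)
print(round(3.4 / 0.1))  # 34 (load-time speedup)
```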
The Playground's `reasoning_effort` control now tracks OpenAI's actual per-model capability matrix. Models like `gpt-5.1` that support a `"none"` option show it; models that don't support reasoning effort have the control hidden automatically. Previously, the UI could get out of sync with what the backend supported.
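A per-model capability matrix like the one described can be sketched as a lookup table that decides which options the control shows, or whether it is shown at all. The option list for `gpt-5.1` is an assumption beyond the fact that it includes `"none"`; the other model names and lists are hypothetical. None of this is OpenAI's published matrix or Opik's actual data.

```python
from typing import Optional

# Hypothetical capability matrix: model -> supported reasoning_effort values.
# None means the model does not accept reasoning_effort at all.
REASONING_EFFORT_OPTIONS = {
    "gpt-5.1": ["none", "low", "medium", "high"],          # assumed list; includes "none"
    "example-reasoning-model": ["low", "medium", "high"],  # hypothetical
    "example-chat-model": None,                            # hypothetical, no reasoning support
}


def reasoning_effort_control(model: str) -> Optional[list]:
    """Return the options to render, or None to hide the control.

    Unknown models also return None, a conservative assumption of this sketch.
    """
    return REASONING_EFFORT_OPTIONS.get(model)


def validate_effort(model: str, effort: str) -> str:
    """Reject values the selected model does not support, keeping UI and backend in sync."""
    options = reasoning_effort_control(model)
    if options is None:
        raise ValueError(f"{model} does not support reasoning_effort")
    if effort not in options:
        raise ValueError(f"{effort!r} is not supported by {model}; choose from {options}")
    return effort


print(reasoning_effort_control("gpt-5.1"))             # ['none', 'low', 'medium', 'high']
print(reasoning_effort_control("example-chat-model"))  # None (control hidden)
print(validate_effort("gpt-5.1", "none"))              # none
```

Driving both the rendered options and the validation from one table is what keeps the UI from offering values the backend would reject.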
Several reliability fixes and small improvements to the Python (and TypeScript) SDKs:
- `search_traces()` and `search_spans()` now automatically wait and retry on 429 responses instead of raising an error, so large bulk searches complete reliably under API rate limits

And much more! See the full commit log on GitHub.
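Waiting and retrying on HTTP 429, as `search_traces()` and `search_spans()` now do, is commonly implemented with exponential backoff. The sketch below shows that general pattern under stated assumptions: `RateLimitError`, `with_rate_limit_retry`, the delay values, and the `flaky_search` stand-in are hypothetical and do not reflect the SDK's actual retry policy or exception types. `sleep` is injectable so the example runs instantly.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error carrying an HTTP 429 and an optional Retry-After value."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after


def with_rate_limit_retry(call, max_retries=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call `call()`, retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # retry budget spent: surface the error
            if err.retry_after is not None:
                delay = err.retry_after  # honor the server's hint when present
            else:
                delay = min(max_delay, base_delay * 2 ** attempt)
                delay += random.uniform(0, delay / 2)  # jitter avoids synchronized retries
            sleep(delay)


# Usage: a stand-in search that is rate limited twice, then succeeds.
responses = iter([RateLimitError(), RateLimitError(retry_after=1.0), ["trace-1", "trace-2"]])


def flaky_search():
    result = next(responses)
    if isinstance(result, Exception):
        raise result
    return result


waits = []
results = with_rate_limit_retry(flaky_search, sleep=waits.append)
print(results)     # ['trace-1', 'trace-2']
print(len(waits))  # 2
```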
Releases: 2.0.25, 2.0.26, 2.0.27, 2.0.28, 2.0.29, 2.0.30, 2.0.31