apps/opik-documentation/documentation/fern/docs/changelog/2025-07-18.mdx
We now support thread-level LLMs-as-a-Judge metrics!
We've implemented Online evaluation for threads, enabling the evaluation of entire conversations between humans and agents.
This allows for scalable measurement of metrics such as user frustration, goal achievement, conversational turn quality, clarification request rates, alignment with user intent, and much more.
We've also implemented Python metrics support for threads, giving you full code control over metric definitions.
<Frame> </Frame>To improve visibility into trends and to help detect spikes in these metrics when the agent is running in production, weβve added Thread Feedback Scores and Thread Duration widgets to the Metrics dashboard. These additions make it easier to monitor changes over time in live environments.
<Frame> </Frame>Once youβve identified problematic sessions or traces, weβve made it easier to inspect and analyze them with the following improvements:
The Spans tab provides a clearer, more comprehensive view of agent activity to help you analyze tool and sub-agent usage across threads, uncover trends, and spot latency outliers more easily.
Whatβs New:
These improvements make it faster and easier to inspect agent behavior and performance at a glance.
<Frame> </Frame>Slow model response times can lead to frustrating user experiences and create hidden bottlenecks in production systems. However, identifying latency issues early (during experimentation) is often difficult without clear visibility into model performance.
To help address this, weβve added Duration as a key metric for monitoring model latency in the Experiments engine. You can now include Duration as a selectable column in both the Experiments and Experiment Details views. This makes it easier to identify slow-responding models or configurations early, so you can proactively address potential performance risks before they impact users.
<Frame> </Frame>When usage grows and data volumes increase, effective data management becomes crucial. We've added several capabilities to make team workflows easier:
We've added support for:
We've enhanced several integrations:
And much more! π See full commit log on GitHub
Releases: 1.8.0, 1.8.1, 1.8.2, 1.8.3, 1.8.4, 1.8.5, 1.8.6