Here are the most relevant improvements we've made since the last release:
We now offer native Slack and PagerDuty alert integrations, eliminating the need for any middleware configuration. Set up alerts directly in Opik to receive notifications when important events happen in your workspace.
With native integrations, you can:
📖 Read the full docs: Alerts Guide
LLM as a Judge metrics can now evaluate traces that contain images when using vision-capable models. This is useful for:
To reference image data from traces in your evaluation prompts:
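For example, an LLM-as-a-Judge rule prompt can interpolate trace fields with `{{variable}}` placeholders. The field names below (`input.image_url` and `output`) are illustrative; map them to wherever your traces actually store the image:

```text
You are judging whether the model's answer accurately describes the image.

Image: {{input.image_url}}
Answer: {{output}}

Respond with a score from 0.0 to 1.0, where 1.0 means the answer fully
and accurately describes the image.
```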
📖 Read more: Evaluating traces with images
We've launched the Prompt Generator and Prompt Improver: two AI-powered tools that help you create and refine prompts faster, directly inside the Playground.
Designed for non-technical users, these features automatically apply best practices from OpenAI, Anthropic, and Google, helping you craft clear, effective, and production-grade prompts without leaving the Playground.
Prompt engineering is still one of the biggest bottlenecks in LLM development. With these tools, teams can:
📖 Read the full docs: Prompt Generator & Improver
We've integrated prompts into spans and traces, creating a seamless connection between your Prompt Library, Traces, and the Playground.
You can now associate prompts directly with traces and spans using the opik_context module, so every execution is automatically tied to the exact prompt version used.
Knowing which prompt produced a given trace is essential whether you're building a simple single-prompt app or an advanced multi-prompt, multi-agent system.
With this integration, you can:
Once added, your prompts appear in the trace details view, with links back to the Prompt Library and the Playground, so you can iterate in one click.
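Here's a minimal sketch of what that association looks like with the Python SDK. It assumes a prompt named "summarizer" already exists in your Prompt Library, and the `prompts=` argument on `update_current_trace` is illustrative; check the linked guide for the exact signature:

```python
import opik
from opik import opik_context, track

client = opik.Opik()

# Fetch the latest version of a prompt from the Prompt Library
# (assumes a prompt named "summarizer" already exists there)
prompt = client.get_prompt(name="summarizer")

@track
def summarize(text: str) -> str:
    # Associate the exact prompt version with the current trace.
    # The `prompts=` argument is illustrative; see the linked guide
    # for the exact parameter name.
    opik_context.update_current_trace(prompts=[prompt])

    rendered = prompt.format(text=text)
    # ... call your LLM with `rendered` and return its response ...
    return rendered
```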
📖 Read more: Adding prompts to traces and spans
We've introduced a series of improvements directly in the Playground to make experimentation easier and more powerful:
Key enhancements:
We've added on-demand online evaluation in Opik, letting users run metrics on already logged traces and threads, perfect for evaluating historical data or backfilling new scores.
Select traces/threads, choose any online score rule (e.g., Moderation, Equals, Contains), and run evaluations directly from the UI; no code needed.
Results appear inline as feedback scores and are fully logged for traceability.
This enables:
📖 Read more: Manual Evaluation
We've added two new comprehensive guides on evaluating agents:
This guide helps you verify that your agent makes the right tool calls before returning the final answer. It's fundamentally about evaluating and scoring what happens within a trace.
📖 Read the full guide: Evaluating Agent Trajectories
Evaluating chatbots is tough because you need to assess an entire conversation, not just a single LLM response. This guide walks you through using the new opik.simulation.SimulatedUser to create simulated threads for your agent.
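As a rough sketch, a simulation pairs a persona-driven simulated user with your agent function and runs them for several turns. The constructor and run arguments shown here are assumptions; the guide documents the exact API:

```python
from opik.simulation import SimulatedUser

# A stand-in agent: receives the conversation so far and returns the
# next assistant reply. Swap in your real agent here.
def my_agent(messages: list) -> str:
    return "Thanks for reaching out! How can I help?"

simulated_user = SimulatedUser(
    # Persona guiding how the simulated user behaves (illustrative
    # argument name; see the guide for the exact signature)
    persona="A new customer who wants to set up their first project",
)

# Run a multi-turn conversation and log it as a thread
# (`max_turns` and the run method name are assumptions)
conversation = simulated_user.run(my_agent, max_turns=5)
```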
📖 Read the full guide: Evaluating Multi-Turn Agents
These new docs significantly strengthen our agent evaluation feature set and include diagrams to visualize how each evaluation strategy works.
We've added new command-line functions for importing and exporting Opik data: you can now export all traces, spans, datasets, prompts, and evaluation rules from a project to local JSON or CSV files, and import data from local JSON files back into an existing project.
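For example (command and flag names below are illustrative; the linked docs give the exact syntax):

```bash
# Export traces, spans, datasets, prompts, and evaluation rules
# from a project to local JSON files
opik export --project "my-project" --format json --output ./opik-backup

# Import data from local JSON files into an existing project
opik import --project "my-project" --input ./opik-backup
```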
📖 Read the full docs: Import/Export Commands
And much more! See the full commit log on GitHub
Releases: 1.8.83, 1.8.84, 1.8.85, 1.8.86, 1.8.87, 1.8.88, 1.8.89, 1.8.90, 1.8.91, 1.8.92, 1.8.93, 1.8.94, 1.8.95, 1.8.96, 1.8.97