Alerts allow you to configure automated webhook notifications for important events in your Opik workspace. When specific events occur — such as trace errors, new feedback scores, or prompt changes — Opik sends HTTP POST requests to your configured endpoint with detailed event data.
Opik provides three destination types for alerts: **General** (any custom webhook endpoint), **Slack**, and **PagerDuty**.

To create an alert:

1. **Navigate to Alerts** in the Opik UI.
2. **Configure basic settings**, such as the alert name.
3. **Configure webhook settings.** The endpoint URL must start with `http://` or `https://`. For Slack, use your incoming webhook URL (`https://hooks.slack.com/services/...`); for PagerDuty, use the Events API endpoint (`https://events.pagerduty.com/v2/enqueue`).
4. **Advanced webhook settings (optional).** Add custom headers if your endpoint requires them, for example `X-Custom-Auth: Bearer your-token-here`.
5. **Add triggers.** Select the event types to monitor and, for feedback score alerts, a threshold and condition (`>`, `<`).
6. **Test your configuration** to verify that your endpoint receives the payload.
7. **Create the alert.**
Opik supports three main approaches for integrating alerts with external systems:
Opik provides native Slack integration that automatically formats alert messages for Slack's Block Kit format.
In Slack: create an Incoming Webhook for the channel you want to notify and copy the generated URL (it looks like `https://hooks.slack.com/services/T00000000/B00000000/XXXX`).

In Opik: select the Slack destination type and paste the webhook URL.

Opik will automatically format all alert payloads into Slack-compatible messages with rich Block Kit formatting.
Opik provides native PagerDuty integration that automatically formats alert events for PagerDuty's Events API v2.
In PagerDuty: create an Events API v2 integration on the service you want to alert and copy the integration key.

In Opik: select the PagerDuty destination type and use the endpoint URL `https://events.pagerduty.com/v2/enqueue`.

Opik will automatically format all alert payloads into PagerDuty-compatible events.
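For reference, a PagerDuty Events API v2 trigger event has roughly the following shape. The field names below come from PagerDuty's public Events API; the exact payload Opik constructs may differ:

```json
{
  "routing_key": "your-integration-key",
  "event_action": "trigger",
  "dedup_key": "alert-uuid",
  "payload": {
    "summary": "Alert 'Production Errors Alert': 2 trace:errors events aggregated",
    "source": "opik",
    "severity": "error"
  }
}
```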
For more complex integrations or custom formatting requirements, you can use a middleware service to transform Opik's payload before sending it to your destination. This approach works with any destination type (General, Slack, or PagerDuty).
```python
import requests
from flask import Flask, request

app = Flask(__name__)

# Your Slack incoming webhook URL
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."

def transform_to_slack(opik_payload):
    event_type = opik_payload.get('eventType')
    alert_name = opik_payload['payload']['alertName']
    event_count = opik_payload['payload']['eventCount']

    # Custom formatting logic
    return {
        "blocks": [
            {
                "type": "header",
                "text": {
                    "type": "plain_text",
                    "text": f"🚨 {alert_name}"
                }
            },
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": f"*{event_count}* new `{event_type}` events"
                }
            },
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": "View in Opik: https://www.comet.com/opik"
                }
            },
            {
                "type": "section",
                "fields": [
                    {
                        "type": "mrkdwn",
                        "text": "*Environment:*\nProduction"
                    },
                    {
                        "type": "mrkdwn",
                        "text": "*Priority:*\nHigh"
                    }
                ]
            }
        ]
    }

@app.route('/opik-to-slack', methods=['POST'])
def opik_to_slack():
    opik_data = request.json
    slack_payload = transform_to_slack(opik_data)

    # Forward to Slack
    requests.post(
        SLACK_WEBHOOK_URL,
        json=slack_payload
    )

    return {'status': 'success'}, 200
```
No-code automation tools like n8n, Make.com, and IFTTT provide an easy way to connect Opik alerts to other services—without writing or deploying code. These platforms can receive webhooks from Opik, apply filters or conditions, and trigger actions such as sending Slack messages, logging data in Google Sheets, or creating incidents in PagerDuty.
To use them, create a webhook trigger in the automation platform, then paste the trigger URL into an Opik alert configured with the General destination type.
These tools also provide built-in monitoring, retries, and visual flow editors, making them suitable for both technical and non-technical users who want to automate Opik alert handling securely and efficiently. This approach works well when you need to route alerts to multiple destinations or apply complex business logic.
Build a custom monitoring dashboard that receives alerts using the General destination type:
```python
from collections import Counter
from datetime import datetime, timezone

from fastapi import FastAPI, Request

app = FastAPI()

# In-memory storage (use a database in production)
alert_history = []

def group_by_type(history):
    """Count stored alerts per event type."""
    return dict(Counter(entry['event_type'] for entry in history))

@app.post("/webhook")
async def receive_webhook(request: Request):
    data = await request.json()

    # Store alert
    alert_history.append({
        'timestamp': datetime.now(timezone.utc),
        'event_type': data.get('eventType'),
        'alert_name': data['payload']['alertName'],
        'event_count': data['payload']['eventCount'],
        'data': data
    })

    # Keep only the last 1000 alerts
    if len(alert_history) > 1000:
        alert_history.pop(0)

    return {"status": "success"}

@app.get("/dashboard")
async def get_dashboard():
    # Return aggregated statistics
    return {
        'total_alerts': len(alert_history),
        'by_type': group_by_type(alert_history),
        'recent_alerts': alert_history[-10:]
    }
```
Opik supports ten types of alert events:
- **Trace errors threshold exceeded** (`trace:errors`)
- **Trace feedback score threshold exceeded** (`trace:feedback_score`): configure the threshold, condition (`>`, `<`), and time window
- **Thread feedback score threshold exceeded** (`trace_thread:feedback_score`): configure the threshold, condition (`>`, `<`), and time window
- **Guardrails triggered** (`trace:guardrails_triggered`)
- **Cost threshold exceeded** (`trace:cost`)
- **Latency threshold exceeded** (`trace:latency`)
- **New prompt added** (`prompt:created`)
- **New prompt version created** (`prompt:committed`)
- **Prompt deleted** (`prompt:deleted`)
- **Experiment finished** (`experiment:finished`)

If you need additional event types for your use case, please create an issue on GitHub and let us know what you'd like to monitor.
All webhook events follow a consistent payload structure:
```json
{
  "id": "webhook-event-id",
  "eventType": "trace:errors",
  "alertId": "alert-uuid",
  "alertName": "Production Errors Alert",
  "workspaceId": "workspace-uuid",
  "createdAt": "2025-01-15T10:30:00Z",
  "payload": {
    "alertId": "alert-uuid",
    "alertName": "Production Errors Alert",
    "eventType": "trace:errors",
    "eventIds": ["event-id-1", "event-id-2"],
    "userNames": ["[email protected]"],
    "eventCount": 2,
    "aggregationType": "consolidated",
    "message": "Alert 'Production Errors Alert': 2 trace:errors events aggregated",
    "metadata": [
      {
        "id": "trace-uuid",
        "name": "handle_query",
        "project_id": "project-uuid",
        "project_name": "Demo Project",
        "start_time": "2025-01-15T10:29:45Z",
        "end_time": "2025-01-15T10:29:50Z",
        "input": {
          "query": "User question"
        },
        "output": {
          "response": "LLM response"
        },
        "error_info": {
          "exception_type": "ValidationException",
          "message": "Validation failed",
          "traceback": "Full traceback..."
        },
        "metadata": {
          "customer_id": "customer_123"
        },
        "tags": ["production"]
      }
    ]
  }
}
```
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique webhook event identifier |
| `eventType` | string | Type of event (e.g., `trace:errors`) |
| `alertId` | string (UUID) | Alert configuration identifier |
| `alertName` | string | Name of the alert |
| `workspaceId` | string | Workspace identifier |
| `createdAt` | string (ISO 8601) | Timestamp when the webhook was created |
| `payload.eventIds` | array | List of aggregated event IDs |
| `payload.userNames` | array | Users associated with the events |
| `payload.eventCount` | number | Number of aggregated events |
| `payload.aggregationType` | string | Always `"consolidated"` |
| `payload.metadata` | array | Event-specific data (varies by event type) |
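As a sketch of consuming this structure, a handler might build a one-line summary from the consolidated fields documented above (the function name and summary format here are illustrative, not part of Opik):

```python
def summarize_webhook(event: dict) -> str:
    """Build a one-line summary from a consolidated Opik webhook payload."""
    payload = event["payload"]
    return (
        f"[{event['eventType']}] {payload['alertName']}: "
        f"{payload['eventCount']} event(s), ids={', '.join(payload['eventIds'])}"
    )

sample = {
    "id": "webhook-event-id",
    "eventType": "trace:errors",
    "payload": {
        "alertName": "Production Errors Alert",
        "eventIds": ["event-id-1", "event-id-2"],
        "eventCount": 2,
    },
}
print(summarize_webhook(sample))
# → [trace:errors] Production Errors Alert: 2 event(s), ids=event-id-1, event-id-2
```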
**Trace errors threshold exceeded (`trace:errors`)**:

```json
{
  "metadata": {
    "event_type": "TRACE_ERRORS",
    "metric_name": "trace:errors",
    "metric_value": "15",
    "threshold": "10",
    "window_seconds": "900",
    "project_ids": "0198ec68-6e06-7253-a20b-d35c9252b9ba,0198ec68-6e06-7253-a20b-d35c9252b9bb",
    "project_names": "Demo Project,Default Project"
  }
}
```
**Trace feedback score threshold exceeded (`trace:feedback_score`)**:

```json
{
  "metadata": {
    "event_type": "TRACE_FEEDBACK_SCORE",
    "metric_name": "trace:feedback_score",
    "metric_value": "0.7500",
    "threshold": "0.8000",
    "window_seconds": "3600",
    "project_ids": "0198ec68-6e06-7253-a20b-d35c9252b9ba,0198ec68-6e06-7253-a20b-d35c9252b9bb",
    "project_names": "Demo Project,Default Project"
  }
}
```
**Thread feedback score threshold exceeded (`trace_thread:feedback_score`)**:

```json
{
  "metadata": {
    "event_type": "TRACE_THREAD_FEEDBACK_SCORE",
    "metric_name": "trace_thread:feedback_score",
    "metric_value": "0.7500",
    "threshold": "0.8000",
    "window_seconds": "3600",
    "project_ids": "0198ec68-6e06-7253-a20b-d35c9252b9ba,0198ec68-6e06-7253-a20b-d35c9252b9bb",
    "project_names": "Demo Project,Default Project"
  }
}
```
**New prompt added (`prompt:created`)**:

```json
{
  "metadata": {
    "id": "prompt-uuid",
    "name": "Prompt Name",
    "description": "Prompt description",
    "tags": ["system", "assistant"],
    "created_at": "2025-01-15T10:00:00Z",
    "created_by": "[email protected]",
    "last_updated_at": "2025-01-15T10:00:00Z",
    "last_updated_by": "[email protected]"
  }
}
```
**New prompt version created (`prompt:committed`)**:

```json
{
  "metadata": {
    "id": "version-uuid",
    "prompt_id": "prompt-uuid",
    "commit": "abc12345",
    "template": "You are a helpful assistant. {{question}}",
    "type": "mustache",
    "metadata": {
      "version": "1.0",
      "model": "gpt-4"
    },
    "created_at": "2025-01-15T10:00:00Z",
    "created_by": "[email protected]"
  }
}
```
**Prompt deleted (`prompt:deleted`)**:

```json
{
  "metadata": [
    {
      "id": "prompt-uuid",
      "name": "Prompt Name",
      "description": "Prompt description",
      "tags": ["deprecated"],
      "created_at": "2025-01-10T10:00:00Z",
      "created_by": "[email protected]",
      "last_updated_at": "2025-01-15T10:00:00Z",
      "last_updated_by": "[email protected]",
      "latest_version": {
        "id": "version-uuid",
        "commit": "abc12345",
        "template": "Template content",
        "type": "mustache",
        "created_at": "2025-01-15T10:00:00Z",
        "created_by": "[email protected]"
      }
    }
  ]
}
```
**Guardrails triggered (`trace:guardrails_triggered`)**:

```json
{
  "metadata": [
    {
      "id": "guardrail-check-uuid",
      "entity_id": "trace-uuid",
      "project_id": "project-uuid",
      "project_name": "Project Name",
      "name": "PII",
      "result": "failed",
      "details": {
        "detected_entities": ["EMAIL", "PHONE_NUMBER"],
        "message": "PII detected in response: email and phone number"
      }
    }
  ]
}
```
**Experiment finished (`experiment:finished`)**:

```json
{
  "metadata": [
    {
      "id": "experiment-uuid",
      "name": "Experiment Name",
      "dataset_id": "dataset-uuid",
      "created_at": "2025-01-15T10:00:00Z",
      "created_by": "[email protected]",
      "last_updated_at": "2025-01-15T10:05:00Z",
      "last_updated_by": "[email protected]",
      "feedback_scores": [
        {
          "name": "accuracy",
          "value": 0.92
        },
        {
          "name": "latency",
          "value": 1.5
        }
      ]
    }
  ]
}
```
**Cost threshold exceeded (`trace:cost`)**:

```json
{
  "metadata": {
    "event_type": "TRACE_COST",
    "metric_name": "trace:cost",
    "metric_value": "150.75",
    "threshold": "100.00",
    "window_seconds": "3600",
    "project_ids": "0198ec68-6e06-7253-a20b-d35c9252b9ba,0198ec68-6e06-7253-a20b-d35c9252b9bb",
    "project_names": "Demo Project,Default Project"
  }
}
```
**Latency threshold exceeded (`trace:latency`)**:

```json
{
  "metadata": {
    "event_type": "TRACE_LATENCY",
    "metric_name": "trace:latency",
    "metric_value": "5250.5000",
    "threshold": "5",
    "window_seconds": "1800",
    "project_ids": "0198ec68-6e06-7253-a20b-d35c9252b9ba,0198ec68-6e06-7253-a20b-d35c9252b9bb",
    "project_names": "Demo Project,Default Project"
  }
}
```
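Note that in the metric-style metadata above, the numeric fields arrive as strings. A small helper (the function name is illustrative, not part of Opik) can coerce them before doing any comparison:

```python
def parse_metric_metadata(metadata: dict) -> dict:
    """Convert the string-typed numeric fields of metric alert metadata
    (metric_value, threshold, window_seconds) into numbers, and split
    the comma-separated project name list."""
    return {
        "event_type": metadata["event_type"],
        "metric_name": metadata["metric_name"],
        "metric_value": float(metadata["metric_value"]),
        "threshold": float(metadata["threshold"]),
        "window_seconds": int(metadata["window_seconds"]),
        "project_names": metadata["project_names"].split(","),
    }

parsed = parse_metric_metadata({
    "event_type": "TRACE_COST",
    "metric_name": "trace:cost",
    "metric_value": "150.75",
    "threshold": "100.00",
    "window_seconds": "3600",
    "project_names": "Demo Project,Default Project",
})
print(parsed["metric_value"] > parsed["threshold"])  # → True
```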
Add a secret token to your webhook configuration to verify that incoming requests are from Opik:
1. Generate a random secret token (e.g., with `openssl rand -hex 32`)
2. Add it to the alert's `Authorization` header: `Authorization: Bearer your-secret-token`
3. Verify the token in your handler:

```python
import hmac

from flask import Flask, request, abort

app = Flask(__name__)

SECRET_TOKEN = "your-secret-token-here"

@app.route('/webhook', methods=['POST'])
def handle_webhook():
    # Verify the secret token
    auth_header = request.headers.get('Authorization', '')
    if not auth_header.startswith('Bearer '):
        abort(401, 'Missing or invalid Authorization header')

    token = auth_header.split(' ', 1)[1]
    if not hmac.compare_digest(token, SECRET_TOKEN):
        abort(401, 'Invalid secret token')

    # Process the webhook
    data = request.json
    event_type = data.get('eventType')

    # Handle different event types (implement these handlers for your use case)
    if event_type == 'trace:errors':
        handle_trace_errors(data)
    elif event_type == 'trace:feedback_score':
        handle_feedback_score(data)
    elif event_type == 'experiment:finished':
        handle_experiment_finished(data)

    return {'status': 'success'}, 200
```
You can add custom headers for additional authentication or routing:
```python
# In your webhook handler
api_key = request.headers.get('X-API-Key')
environment = request.headers.get('X-Environment')

if api_key != EXPECTED_API_KEY:
    abort(401, 'Invalid API key')

# Route to different handlers based on environment
if environment == 'production':
    handle_production_webhook(data)
else:
    handle_staging_webhook(data)
```
Check endpoint accessibility:

```bash
curl -X POST -H "Content-Type: application/json" -d '{"test": "data"}' https://your-endpoint.com/webhook
```

Check webhook configuration: confirm the endpoint URL starts with `http://` or `https://`.

Check alert status: confirm the alert is enabled and its triggers match the events you expect.
Opik expects webhooks to respond within the configured timeout (typically 30 seconds). If your endpoint takes longer:
Optimize your handler: return a response immediately and do any heavy work in the background.

Example async processing:
```python
from threading import Thread

from flask import Flask, request

app = Flask(__name__)

def process_webhook_async(data):
    # Long-running processing (these helpers are application-specific)
    send_to_slack(data)
    update_dashboard(data)
    log_to_database(data)

@app.route('/webhook', methods=['POST'])
def handle_webhook():
    data = request.json

    # Start background processing
    thread = Thread(target=process_webhook_async, args=(data,))
    thread.start()

    # Return immediately
    return {'status': 'accepted'}, 200
```
If you receive duplicate webhooks:
Check retry configuration: repeated deliveries usually come from retries after slow or failed responses.

Deduplicate using the webhook's `id` field. Example idempotent handler:
```python
# Use persistent storage (e.g., Redis) in production
processed_webhook_ids = set()

@app.route('/webhook', methods=['POST'])
def handle_webhook():
    data = request.json
    webhook_id = data.get('id')

    # Skip if already processed
    if webhook_id in processed_webhook_ids:
        return {'status': 'already_processed'}, 200

    # Process webhook
    process_alert(data)

    # Mark as processed
    processed_webhook_ids.add(webhook_id)

    return {'status': 'success'}, 200
```
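The in-memory set above grows without bound. A bounded variant (a sketch; in production use Redis or a database with TTLs) keeps only the most recent IDs:

```python
from collections import OrderedDict

class RecentIds:
    """Remember the last `capacity` webhook IDs for deduplication."""

    def __init__(self, capacity: int = 10_000):
        self.capacity = capacity
        self._ids = OrderedDict()

    def seen(self, webhook_id: str) -> bool:
        """Return True if the ID was already recorded; otherwise record it."""
        if webhook_id in self._ids:
            return True
        self._ids[webhook_id] = None
        if len(self._ids) > self.capacity:
            self._ids.popitem(last=False)  # evict the oldest entry
        return False

recent = RecentIds(capacity=2)
print(recent.seen("a"))  # → False (first time)
print(recent.seen("a"))  # → True  (duplicate)
recent.seen("b")
recent.seen("c")         # evicts "a"
print(recent.seen("a"))  # → False (forgotten after eviction)
```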
- Check event type matching: verify the alert's triggers include the event type you expect.
- Check workspace context: confirm the alert is configured in the same workspace as the events.
- Check alert evaluation: for threshold-based alerts, confirm the metric actually crossed the threshold within the configured time window.
If you see SSL certificate errors in logs:
For development/testing: you can temporarily relax certificate verification on your endpoint, but never do this in production.

For production: ensure your endpoint serves a valid certificate from a trusted CA and that the full certificate chain is configured.
Understanding Opik's alert architecture can help with troubleshooting and optimization.
The Opik Alerts system monitors your workspace for specific events and sends consolidated webhook notifications to your configured endpoints. Here's the flow:
To prevent overwhelming your webhook endpoint, Opik aggregates multiple events of the same type within a short time window (typically 30-60 seconds) and sends them as a single consolidated webhook. This is particularly useful for high-frequency events like feedback scores.
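The aggregation described above can be illustrated with a toy model (a simplification, not Opik's actual implementation): events of the same type that arrive within one window collapse into a single webhook whose `eventCount` reflects all of them.

```python
from collections import defaultdict

def consolidate(events, window_seconds=60):
    """Group (timestamp, event_type, event_id) tuples into one webhook
    per event type per time window -- a toy model of Opik's bucketing."""
    buckets = defaultdict(list)
    for ts, event_type, event_id in events:
        bucket_key = (event_type, ts // window_seconds)
        buckets[bucket_key].append(event_id)
    return [
        {"eventType": etype, "eventCount": len(ids), "eventIds": ids}
        for (etype, _), ids in buckets.items()
    ]

events = [
    (0, "trace:errors", "e1"),
    (10, "trace:errors", "e2"),          # same window as e1: merged
    (15, "trace:feedback_score", "f1"),  # different type: separate webhook
    (70, "trace:errors", "e3"),          # next window: separate webhook
]
webhooks = consolidate(events)
print(len(webhooks))  # → 3
```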
```text
1. Event occurs (e.g., trace error logged)
   ↓
2. Service publishes AlertEvent to EventBus
   ↓
3. AlertEventListener receives event
   ↓
4. AlertEventEvaluationService evaluates against configured alerts
   ↓
5. Matching events added to AlertBucketService (Redis)
   ↓
6. AlertJob (runs every 5 seconds) processes ready buckets
   ↓
7. WebhookPublisher publishes to Redis stream
   ↓
8. WebhookSubscriber consumes from stream
   ↓
9. WebhookHttpClient sends HTTP POST request
   ↓
10. Retries on failure with exponential backoff
```
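Exponential backoff in step 10 typically means doubling the delay between attempts up to a cap. A generic sketch (the base delay, cap, and retry count are illustrative, not Opik's actual settings):

```python
def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0):
    """Return the wait (in seconds) before each retry: base * 2**attempt, capped."""
    return [min(base * (2 ** attempt), cap) for attempt in range(retries)]

print(backoff_delays(5))  # → [1.0, 2.0, 4.0, 8.0, 16.0]
```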
Opik uses Redis-based buckets to aggregate events:
Bucket keys follow the pattern `alert_bucket:{alertId}:{eventType}`. This prevents overwhelming your webhook endpoint with individual events and reduces costs for high-frequency events.
Failed webhooks are automatically retried with exponential backoff.
- **Create focused alerts**: scope each alert to one event type with a clear threshold.
- **Optimize for your workflow**: route alerts to the destination your team actually monitors.
- **Test thoroughly**: use the built-in test step before relying on an alert.
- **Handle failures gracefully**: respond quickly and process webhooks asynchronously.
- **Implement security**: verify the secret token on every incoming request.
- **Monitor performance**: track delivery latency and failure rates on your endpoint.
- **For high-volume workspaces**: rely on aggregation and keep handlers lightweight.
- **For multiple projects**: scope alerts to specific projects so notifications stay actionable.