Opik's architecture consists of multiple services, each handling a specific role:
Opik's main backend is written in Java 21 LTS using Dropwizard and is structured as a RESTful web service that exposes the public API endpoints for Opik's core functionality. Full API documentation is available in the REST API reference.
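As a quick illustration, any HTTP client can talk to these endpoints directly. The sketch below probes a health endpoint on a local deployment; the `/is-alive/ping` path and port are assumptions for illustration, so check the API reference for the exact routes:

```python
# Hypothetical sketch: probing the Java backend of a local deployment.
# The /is-alive/ping path and port 8080 are assumptions, not guaranteed routes.
import httpx

response = httpx.get("http://localhost:8080/is-alive/ping")
response.raise_for_status()
print("backend is up:", response.status_code)
```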
Key responsibilities:
For observability, Opik uses OpenTelemetry due to its vendor-neutral approach and wide support across languages and frameworks. It provides a single, consistent way to collect telemetry data from all services and applications.
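For reference, the snippet below shows the standard vendor-neutral OpenTelemetry pattern in Python. It is a generic illustration of the OpenTelemetry SDK, not code taken from Opik's services, and the service and span names are placeholders:

```python
# Generic OpenTelemetry usage: configure a tracer provider once, then emit
# spans anywhere in the application (pip install opentelemetry-sdk).
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider(resource=Resource.create({"service.name": "example-service"}))
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("handle-request"):
    pass  # instrumented work happens here
```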
You can find the full backend codebase on GitHub under the apps/opik-backend folder.
Opik includes a Python backend service (Flask + Gunicorn) that handles workloads requiring Python execution, such as running user-defined evaluators.
The Java backend calls the Python backend for evaluator execution, and both services share Redis for job coordination.
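To make the coordination pattern concrete, here is a minimal sketch of a Redis-backed work queue using redis-py. The queue name and job fields are hypothetical and do not reflect Opik's actual wire format:

```python
# Illustrative Redis work queue shared by a producer and a consumer service.
import json

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Producer side (e.g., the service enqueuing an evaluator job):
r.lpush("evaluator:jobs", json.dumps({"trace_id": "abc123", "rule": "moderation"}))

# Consumer side (e.g., the worker blocking until a job arrives):
_, payload = r.brpop("evaluator:jobs")
job = json.loads(payload)
print(f"running evaluator {job['rule']} for trace {job['trace_id']}")
```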
You can find the full Python backend codebase on GitHub under the apps/opik-python-backend folder.
Opik's frontend is a TypeScript + React single-page application built with Vite and served by Nginx. The Nginx server handles two roles:

1. Serving the static single-page application assets, falling back to index.html for client-side routes.
2. Forwarding /api/* requests to the Java backend by stripping the /api prefix and proxying to port 8080, including WebSocket upgrade support for streaming endpoints (see the sketch below).

The frontend uses TanStack Router for file-based routing, TanStack React Query for server state management, and Zustand for client-side state.
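To make the proxying role concrete, the sketch below reimplements the prefix-stripping behavior in plain Python. The real deployment uses an Nginx configuration, so treat this purely as an illustration (WebSocket upgrades are omitted for brevity):

```python
# Toy stand-in for the Nginx proxy: /api/* is forwarded to the backend with
# the /api prefix stripped; everything else falls back to the SPA entry point.
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKEND = "http://localhost:8080"  # Java backend, per the description above


class ApiProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/api/"):
            # Strip the /api prefix before forwarding, mirroring the Nginx rewrite.
            upstream = BACKEND + self.path[len("/api"):]
            with urllib.request.urlopen(upstream) as resp:
                body = resp.read()
                self.send_response(resp.status)
                self.send_header("Content-Type", resp.headers.get("Content-Type", "application/octet-stream"))
                self.end_headers()
                self.wfile.write(body)
        else:
            # Everything else would be served as static SPA assets (index.html fallback).
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"<!-- index.html -->")


if __name__ == "__main__":
    HTTPServer(("", 5173), ApiProxy).serve_forever()
```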
You can find the full frontend codebase on GitHub under the apps/opik-frontend folder.
Opik provides SDKs for Python and TypeScript. Both SDKs implement asynchronous batching to optimize network efficiency: they accumulate individual trace and span operations and send them as bulk requests to the backend's batch endpoints (POST /v1/private/traces/batch, POST /v1/private/spans/batch, etc.).
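For example, a single flush from either SDK amounts to one bulk POST like the sketch below. The trace fields shown are simplified placeholders rather than the full schema, and the local URL assumes the default Nginx /api proxy described above:

```python
# Simplified sketch of an SDK batch flush; field values are placeholders
# and the real trace schema has more fields than shown here.
import httpx

batch = {
    "traces": [
        {
            "name": "chat-completion",
            "start_time": "2024-01-01T00:00:00Z",
            "input": {"prompt": "Hello"},
        }
    ]
}
httpx.post("http://localhost:5173/api/v1/private/traces/batch", json=batch)
```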
| | Python SDK | TypeScript SDK |
| --- | --- | --- |
| HTTP client | httpx | fetch API |
| Batching | Message queue + batch manager with memory-capped batches (50MB) | Debounce-based batch queue (default 300ms / 100 items) |
| Retries | Exponential backoff (0.5s–10s) | Default 2 retries |
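The debounce-based queue in the TypeScript SDK, for instance, boils down to the pattern sketched below (shown in Python for consistency; class and parameter names are illustrative): flush as soon as 100 items accumulate, or 300 ms after the first pending item, whichever comes first.

```python
# Illustrative debounce-style batch queue: size-triggered and time-triggered flushes.
import threading
import time


class BatchQueue:
    def __init__(self, flush, max_items=100, delay_s=0.3):
        self.flush, self.max_items, self.delay_s = flush, max_items, delay_s
        self.items, self.timer, self.lock = [], None, threading.Lock()

    def add(self, item):
        with self.lock:
            self.items.append(item)
            if len(self.items) >= self.max_items:
                self._flush_locked()          # size threshold reached
            elif self.timer is None:
                # First pending item: arm the debounce timer.
                self.timer = threading.Timer(self.delay_s, self._flush)
                self.timer.start()

    def _flush(self):
        with self.lock:
            self._flush_locked()

    def _flush_locked(self):
        if self.timer:
            self.timer.cancel()
            self.timer = None
        if self.items:
            self.flush(self.items)
            self.items = []


queue = BatchQueue(flush=lambda batch: print(f"flushing {len(batch)} spans"))
for i in range(250):
    queue.add({"span": i})   # two size-triggered flushes, one timer-triggered
time.sleep(0.5)              # let the final timer fire
```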
You can find the SDK codebases on GitHub under sdks/python for the Python SDK and sdks/typescript for the TypeScript SDK.
ClickHouse is a column-oriented OLAP database optimized for fast analytics on large datasets. Opik uses ClickHouse for data that requires near real-time ingestion and analytical queries, such as traces, spans, and feedback scores.
The backend connects to ClickHouse via HTTP (port 8123) using a reactive R2DBC driver for non-blocking queries. Async inserts are enabled for high-throughput ingestion with configurable batching and deduplication.
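Async inserts are a server-side ClickHouse setting, so they are visible on the wire. The sketch below shows the equivalent raw HTTP request against ClickHouse's standard interface on port 8123; the table and columns are hypothetical:

```python
# Raw ClickHouse HTTP insert with async inserts enabled: the server buffers
# rows and flushes them in batches instead of writing each request immediately.
import httpx

httpx.post(
    "http://localhost:8123/",
    params={
        "query": "INSERT INTO spans (id, name) FORMAT JSONEachRow",
        "async_insert": "1",           # buffer rows server-side
        "wait_for_async_insert": "0",  # return before the buffer is flushed
    },
    content=b'{"id": "abc", "name": "llm-call"}\n',
)
```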
In Kubernetes deployments, ClickHouse is managed by the Altinity ClickHouse Operator, which handles cluster provisioning, scaling, and monitoring. ZooKeeper provides distributed coordination for replica synchronization.
<em><small>Liquibase automates schema management</small></em>

Opik uses MySQL for ACID-compliant transactional storage of lower-volume but critical data, such as projects, prompts, datasets, and experiment metadata.
The backend connects via JDBC with connection pooling and supports AWS RDS with IAM authentication for cloud deployments.
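The IAM flow replaces a static password with a short-lived signed token. The backend does this over JDBC in Java; the sketch below shows the same token generation with boto3 purely for illustration, with placeholder host, region, and username:

```python
# Generating a short-lived RDS IAM authentication token with boto3.
# Host, region, and username are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")
token = rds.generate_db_auth_token(
    DBHostname="opik-mysql.abc123.us-east-1.rds.amazonaws.com",
    Port=3306,
    DBUsername="opik",
)
# The token is then used as the MySQL password over a TLS connection.
print(token[:32], "...")
```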
<em><small>Liquibase automates schema management</small></em>

Redis serves multiple roles in Opik's architecture, including caching, rate limiting, and the job coordination between the Java and Python backends described above.
Opik uses MinIO as an S3-compatible object store for binary data that doesn't belong in the relational or analytical databases, such as attachments uploaded alongside traces and spans.
In production deployments, MinIO can be replaced with any S3-compatible storage service (e.g., AWS S3, Google Cloud Storage).
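Because the store is S3-compatible, any S3 client works unchanged. The sketch below points boto3 at a local MinIO instance; the endpoint, credentials, and bucket name are common local defaults and placeholders, not fixed values:

```python
# Using a standard S3 client against MinIO by overriding the endpoint URL.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",  # MinIO instead of AWS S3
    aws_access_key_id="minioadmin",        # default local MinIO credentials
    aws_secret_access_key="minioadmin",
)
s3.put_object(Bucket="attachments", Key="trace-123/input.png", Body=b"...")
```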
Opik is built on top of open-source infrastructure (MySQL, Redis, ClickHouse, Kubernetes), making it straightforward to integrate with popular observability stacks such as Grafana and Prometheus.