examples/otel-dashboards/grafana/README.md
This directory contains a pre-configured Grafana dashboard for monitoring Daytona Sandbox resources including CPU, Memory, and Disk utilization using Prometheus metrics.
The dashboard provides comprehensive monitoring across multiple pages:
daytona-otel-token
OTEL_EXPORTER_OTLP_ENDPOINT value (e.g., https://otlp-gateway-prod-eu-central-0.grafana.net/otlp)
OTEL_EXPORTER_OTLP_HEADERS value (e.g., Authorization=Basic MTUxNzAz...)
Create a sandbox in Daytona and let it run for a few minutes
In Grafana Cloud, go to Observability → Application to see your sandboxes
Or go to Explore, select your Prometheus data source, and run:
{__name__=~"daytona_sandbox.*"}
You should see metrics appearing for each sandbox
dashboard.json
grafanacloud-<stack>-prom
The dashboard uses template variables for flexible filtering:

| Variable | Description | Default |
|---|---|---|
| $datasource | Prometheus data source selector | Auto-detected |
| $service | Filter by service_name label (multi-select) | All services |
| $interval | Time aggregation interval (for custom panels) | 1m |

The $interval variable is available for custom panels you may add:
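A minimal sketch of a custom panel query that uses both variables, assuming the panel targets the CPU utilization metric. Grafana substitutes `$service` and `$interval` before the query is sent to Prometheus:

```promql
# Average CPU utilization per sandbox, smoothed over the selected $interval window.
avg by (service_name) (
  avg_over_time(daytona_sandbox_cpu_utilization_percent{service_name=~"$service"}[$interval])
)
```

A larger `$interval` gives smoother lines at the cost of hiding short spikes.
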
| Widget | Type | Description |
|---|---|---|
| Sandbox Count | Stat | Total number of active sandboxes reporting metrics |
| Critical Services | Stat | Count of services exceeding resource thresholds (with color coding) |
| Services Resource Overview | Table | Detailed metrics per service (CPU%, Memory%, Disk%, limits) |
| CPU Utilization by Service | Time Series | CPU usage percentage over time per service |
| Memory Utilization by Service | Time Series | Memory usage percentage over time per service |
| Disk Utilization by Service | Time Series | Disk usage percentage over time per service |
| Top CPU Consumers | Bar Gauge | Services with highest average CPU usage |
| Top Memory Consumers | Bar Gauge | Services with highest average memory usage |
| Top Disk Consumers | Bar Gauge | Services with highest average disk usage |
| Resource Pressure Score | Time Series | Combined weighted score of all resource utilization |

| Widget | Type | Description |
|---|---|---|
| CPU Utilization Timeseries | Time Series | Detailed CPU usage over time per service |
| Current CPU by Service | Stat | Current CPU % with threshold coloring |
| CPU Limit by Service | Table | CPU cores limit, average, and peak usage |
| CPU Usage Heatmap | Heatmap | Distribution of CPU usage values over time |

| Widget | Type | Description |
|---|---|---|
| Memory Utilization Timeseries | Time Series | Memory usage percentage over time |
| Current Memory by Service | Stat | Current memory % with threshold coloring |
| Memory Usage in GB | Time Series (Area) | Absolute memory usage in gigabytes |
| Memory Limits and Usage | Table | Memory used, limit, average, and peak % |

| Widget | Type | Description |
|---|---|---|
| Disk Utilization Timeseries | Time Series | Disk usage percentage over time |
| Current Disk by Service | Stat | Current disk % with threshold coloring |
| Disk Usage in GB | Time Series (Area) | Absolute disk usage in gigabytes |
| Disk Space Breakdown | Table | Used, available, total space, and utilization % |
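
The actual panel queries are defined in dashboard.json. As an illustration only, a panel like Top CPU Consumers could be expressed roughly as follows, assuming a cut-off of five services (an arbitrary choice) and Grafana's built-in `$__range` variable for the dashboard time range:

```promql
# Five services with the highest average CPU utilization over the dashboard time range.
topk(5,
  avg by (service_name) (
    avg_over_time(daytona_sandbox_cpu_utilization_percent{service_name=~"$service"}[$__range])
  )
)
```

Swapping in the memory or filesystem utilization metric gives the equivalent ranking for the other two resources.
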
The dashboard includes pre-configured color thresholds for visual alerting:
| Resource | Warning (Yellow) | Critical (Red) |
|---|---|---|
| CPU | 70% | 85% |
| Memory | 80% | 90% |
| Disk | 75% | 85% |
These thresholds are configured in stat panels and provide immediate visual feedback when resources are constrained.
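
The critical thresholds from the table can also be written as a query, for example to count the sandboxes that are currently over any limit. This is a sketch and not necessarily how the Critical Services panel is built. It assumes that one value per `service_name` is enough, so each metric is first reduced with `max by (service_name)`:

```promql
# Count sandboxes whose CPU, memory, or disk utilization exceeds its critical threshold.
# "or" keeps each service_name once, even if it breaches several thresholds.
count(
    max by (service_name) (daytona_sandbox_cpu_utilization_percent{service_name=~"$service"}) > 85
  or
    max by (service_name) (daytona_sandbox_memory_utilization_percent{service_name=~"$service"}) > 90
  or
    max by (service_name) (daytona_sandbox_filesystem_utilization_percent{service_name=~"$service"}) > 85
)
# Return 0 instead of "No data" when nothing is over a threshold.
or vector(0)
```
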
All metrics follow the OTEL to Prometheus naming convention (dots become underscores, units are appended as suffixes):
| OTEL Metric | Prometheus Metric | Description | Unit |
|---|---|---|---|
| daytona.sandbox.cpu.utilization | daytona_sandbox_cpu_utilization_percent | CPU usage percentage | % (0-100) |
| daytona.sandbox.cpu.limit | daytona_sandbox_cpu_limit_cores | CPU cores limit | cores |
| daytona.sandbox.memory.utilization | daytona_sandbox_memory_utilization_percent | Memory usage percentage | % (0-100) |
| daytona.sandbox.memory.usage | daytona_sandbox_memory_usage_bytes | Memory used | bytes |
| daytona.sandbox.memory.limit | daytona_sandbox_memory_limit_bytes | Memory limit | bytes |
| daytona.sandbox.filesystem.utilization | daytona_sandbox_filesystem_utilization_percent | Disk usage percentage | % (0-100) |
| daytona.sandbox.filesystem.usage | daytona_sandbox_filesystem_usage_bytes | Disk space used | bytes |
| daytona.sandbox.filesystem.available | daytona_sandbox_filesystem_available_bytes | Available disk space | bytes |
| daytona.sandbox.filesystem.total | daytona_sandbox_filesystem_total_bytes | Total disk space | bytes |

All metrics include the service_name label identifying the sandbox.
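
The byte-based metrics can be combined or rescaled in PromQL. The two separate queries below are a sketch: the first converts memory usage to decimal gigabytes, and the second recomputes memory utilization from usage and limit as a cross-check. The second assumes there is exactly one series per `service_name` for each metric. If extra labels produce several series per sandbox, the match fails and the series need to be aggregated first.

```promql
# Query 1: memory used per sandbox, converted from bytes to gigabytes (10^9 bytes).
daytona_sandbox_memory_usage_bytes{service_name=~"$service"} / 1e9

# Query 2: memory utilization in percent, recomputed from usage and limit.
100 * daytona_sandbox_memory_usage_bytes{service_name=~"$service"}
  / on (service_name) daytona_sandbox_memory_limit_bytes{service_name=~"$service"}
```
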
Verify metrics are being received: Run this PromQL query in Grafana Explore:
daytona_sandbox_cpu_utilization_percent
Check data source connection: Go to Connections → Data Sources → your Prometheus source → Test
Verify time range: Ensure the dashboard time picker includes when metrics were sent
Check service filter: Try selecting "All" for the $service variable
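
To see which sandboxes are actually reporting, and therefore which values the $service variable can offer, a query along these lines can be run in Explore. It is a sketch that assumes the CPU utilization metric is representative of the others:

```promql
# One result per sandbox that currently reports CPU metrics; the value is the number of matching series.
count by (service_name) (daytona_sandbox_cpu_utilization_percent)
```

An empty result means no sandbox metrics have reached the data source in the selected time range.
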
If you have many sandboxes, consider:
{__name__=~"daytona.*"}
service_name (not service.name)
jq . dashboard.json
Click Add → Visualization
Select your Prometheus data source
Write PromQL queries using the metrics listed above
Example query for a custom panel:
avg(daytona_sandbox_cpu_utilization_percent{service_name=~"$service"}) by (service_name)
Replace dashboard.json with your customized version