Grafana Dashboard for Daytona Sandbox Monitoring

examples/otel-dashboards/grafana/README.md

This directory contains a pre-configured Grafana dashboard for monitoring Daytona sandbox resource usage (CPU, memory, and disk utilization) from the OpenTelemetry metrics Daytona exports, queried through Prometheus.

Dashboard Overview

The dashboard is organized into four pages:

  • Resource Overview: High-level view of all sandboxes with aggregate metrics
  • CPU Details: Detailed CPU utilization, limits, and heatmaps
  • Memory Details: Memory usage patterns and limits
  • Disk Details: Filesystem usage and space breakdown

Prerequisites

  • Grafana Cloud account (free tier available)
  • Daytona account with access to Experimental settings

Setup

Step 1: Create a Grafana Cloud Account

  1. Go to grafana.com and click Create free account
  2. Sign up with email, Google, or GitHub
  3. Create a new stack (choose a region close to you)

Step 2: Set Up OpenTelemetry Connection

  1. In Grafana Cloud Portal, go to Connections → Add new connection
  2. Search for OpenTelemetry (OTLP) and select it
  3. Follow the setup wizard:
    • Choose instrumentation method: Select OpenTelemetry SDK, then your language
    • Choose your infrastructure: Select Linux
  4. Create a Grafana Cloud Access token:
    • Click Create a Grafana Cloud Access token for your application
    • Name it something like daytona-otel-token
    • Select All scopes
    • Click Create and save the token
  5. Get your configuration values from the instrumentation instructions:
    • Note the OTEL_EXPORTER_OTLP_ENDPOINT value (e.g., https://otlp-gateway-prod-eu-central-0.grafana.net/otlp)
    • Note the OTEL_EXPORTER_OTLP_HEADERS value (e.g., Authorization=Basic MTUxNzAz...)
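The Authorization value uses the standard HTTP Basic scheme. Judging from the example (MTUxNzAz is the base64 encoding of 151703), it appears to be base64 of "<instance-id>:<token>". The wizard generates it for you; the sketch below only shows how such a value could be assembled by hand, assuming the username is your stack's numeric instance ID. build_otlp_headers is a hypothetical helper, not part of any Grafana or Daytona tool.

```python
import base64


def build_otlp_headers(instance_id: str, token: str) -> str:
    """Return an OTEL_EXPORTER_OTLP_HEADERS-style value for Grafana Cloud.

    Assumption: the OTLP gateway accepts HTTP Basic auth with the stack's
    instance ID as the username and the access token as the password.
    """
    credentials = f"{instance_id}:{token}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return f"Authorization=Basic {encoded}"


if __name__ == "__main__":
    # Placeholder values; substitute your own instance ID and token.
    # Prints a value starting with "Authorization=Basic MTUxNzAz".
    print(build_otlp_headers("151703", "glc_example_token"))
```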

Step 3: Configure Daytona

  1. Go to the Daytona Dashboard
  2. Navigate to Settings → Experimental
  3. Enter the values from Step 2:
    • OTLP Endpoint: The endpoint URL from Grafana (e.g., https://otlp-gateway-prod-eu-central-0.grafana.net/otlp)
    • OTLP Headers: The Authorization header from Grafana (e.g., Authorization=Basic MTUxNzAz...)
  4. Click Save
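Before saving, it can help to sanity-check the two values. The sketch below splits the header string into comma-separated key=value pairs, which is the shape OTEL_EXPORTER_OTLP_HEADERS uses, and checks the endpoint against the example above. The function names and the /otlp path check are assumptions drawn from the example endpoint, not Daytona's own validation.

```python
from urllib.parse import urlparse


def parse_otlp_headers(raw: str) -> dict[str, str]:
    """Split 'k1=v1,k2=v2' into a dict.

    Only the first '=' in each pair separates key from value, because
    base64 credentials can end in '=' padding characters.
    """
    headers: dict[str, str] = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in header pair: {pair!r}")
        headers[key.strip()] = value.strip()
    return headers


def check_endpoint(url: str) -> None:
    """Hypothetical check mirroring the shape of the example Grafana endpoint."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError("expected an https:// endpoint")
    if not parsed.path.rstrip("/").endswith("/otlp"):
        raise ValueError("expected the path to end in /otlp")


if __name__ == "__main__":
    check_endpoint("https://otlp-gateway-prod-eu-central-0.grafana.net/otlp")
    print(parse_otlp_headers("Authorization=Basic MTUxNzAzOnRvaw=="))
```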

Step 4: Verify Metrics Are Flowing

  1. Create a sandbox in Daytona and let it run for a few minutes

  2. In Grafana Cloud, go to Observability → Application to see your sandboxes

  3. Or go to Explore, select your Prometheus data source, and run:

    promql
    {__name__=~"daytona_sandbox.*"}
    
  4. You should see metrics appearing for each sandbox
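The same check can be scripted against the Prometheus HTTP API (GET /api/v1/query). The sketch below only builds the request URL and extracts metric names from a JSON response; it does not send anything. The base URL and the credentials Grafana Cloud requires for it are assumptions you would take from your stack's Prometheus data source details.

```python
from urllib.parse import urlencode


def build_query_url(prom_base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{prom_base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def metric_names(response: dict) -> list[str]:
    """Return the distinct metric names in an instant-vector query response."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    result = response["data"]["result"]
    return sorted({series["metric"].get("__name__", "") for series in result})


if __name__ == "__main__":
    # Hypothetical base URL; use the one shown for your Prometheus data source.
    print(build_query_url("https://prometheus.example.grafana.net/api/prom",
                          '{__name__=~"daytona_sandbox.*"}'))
```

An empty list from metric_names means the query succeeded but no Daytona series have arrived yet.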

Step 5: Import the Dashboard

  1. In Grafana Cloud, click Dashboards in the left menu
  2. Click New → Import
  3. Click Upload dashboard JSON file and select dashboard.json
  4. Select your Prometheus data source from the dropdown (e.g., grafanacloud-<stack>-prom)
  5. Click Import
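If you prefer to script the import, Grafana's HTTP API accepts a dashboard model via POST /api/dashboards/db with a bearer token. The sketch below only prepares that request. It assumes a service account token with dashboard write access, and unlike the Import screen it does not prompt for or substitute a data source.

```python
import json
import urllib.request


def build_import_request(grafana_url: str, api_token: str, dashboard: dict,
                         overwrite: bool = False) -> urllib.request.Request:
    """Prepare (but do not send) a POST /api/dashboards/db request."""
    model = dict(dashboard)  # shallow copy so the caller's dict is untouched
    model["id"] = None  # let Grafana assign a new internal ID
    payload = {"dashboard": model, "overwrite": overwrite}
    return urllib.request.Request(
        url=f"{grafana_url.rstrip('/')}/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # In practice: dashboard = json.load(open("dashboard.json"))
    req = build_import_request("https://example.grafana.net", "TOKEN",
                               {"title": "Daytona Sandboxes", "panels": []})
    # urllib.request.urlopen(req) would send it.
    print(req.get_method(), req.full_url)
```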

Dashboard Variables

The dashboard uses template variables for flexible filtering:

| Variable | Description | Default |
|---|---|---|
| $datasource | Prometheus data source selector | Auto-detected |
| $service | Filter by service_name label (multi-select) | All services |
| $interval | Time aggregation interval (for custom panels) | 1m |
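When $service appears in a regex matcher (service_name=~"$service"), Grafana substitutes the selected values as a regex alternation. The sketch below approximates that substitution to show what the final PromQL looks like; Grafana's exact escaping rules differ in detail, and both helper names are hypothetical.

```python
import re


def service_regex(selected: list[str]) -> str:
    """Approximate how Grafana renders a multi-value variable in a regex matcher."""
    escaped = [re.escape(value) for value in selected]
    if len(escaped) == 1:
        return escaped[0]
    return "(" + "|".join(escaped) + ")"


def service_selector(metric: str, selected: list[str]) -> str:
    """Build the PromQL selector a panel query would expand to."""
    regex = service_regex(selected)
    # The regex sits inside a PromQL double-quoted string, so backslashes and
    # quotes must be escaped again for the string literal.
    literal = regex.replace("\\", "\\\\").replace('"', '\\"')
    return f'{metric}{{service_name=~"{literal}"}}'


if __name__ == "__main__":
    print(service_selector("daytona_sandbox_cpu_utilization_percent",
                           ["sandbox-a", "sandbox-b"]))
```

The second round of escaping is the easy part to miss: a single backslash that is valid in the regex is consumed by the PromQL string parser unless it is doubled.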

Interval Options

The $interval variable is available for custom panels you may add:

  • 1m: Fine-grained, best for real-time monitoring
  • 5m: Balanced detail and performance
  • 10m: Good for hourly analysis
  • 30m: Overview of trends
  • 1h: Long-term trend analysis
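Choosing among these options is mostly a trade-off between the dashboard's time range and the number of points drawn per series. The sketch below is a hypothetical rule of thumb, not something Grafana applies automatically: pick the finest option that keeps each series under an assumed cap of 500 points.

```python
# Interval choices offered by the $interval variable, in seconds.
INTERVALS = [("1m", 60), ("5m", 300), ("10m", 600), ("30m", 1800), ("1h", 3600)]


def pick_interval(range_seconds: int, max_points: int = 500) -> str:
    """Return the finest interval that keeps each series under max_points.

    Hypothetical heuristic; the 500-point default is an assumption.
    """
    for label, step in INTERVALS:
        if range_seconds // step <= max_points:
            return label
    return INTERVALS[-1][0]  # fall back to the coarsest option


if __name__ == "__main__":
    for hours in (1, 24, 24 * 7, 24 * 30):
        print(f"{hours}h range -> {pick_interval(hours * 3600)}")
```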

Widget Descriptions

Resource Overview Page

| Widget | Type | Description |
|---|---|---|
| Sandbox Count | Stat | Total number of active sandboxes reporting metrics |
| Critical Services | Stat | Count of services exceeding resource thresholds (with color coding) |
| Services Resource Overview | Table | Detailed metrics per service (CPU%, Memory%, Disk%, limits) |
| CPU Utilization by Service | Time Series | CPU usage percentage over time per service |
| Memory Utilization by Service | Time Series | Memory usage percentage over time per service |
| Disk Utilization by Service | Time Series | Disk usage percentage over time per service |
| Top CPU Consumers | Bar Gauge | Services with highest average CPU usage |
| Top Memory Consumers | Bar Gauge | Services with highest average memory usage |
| Top Disk Consumers | Bar Gauge | Services with highest average disk usage |
| Resource Pressure Score | Time Series | Combined weighted score of all resource utilization |
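The Resource Pressure Score combines the three utilization percentages into one number. The weights the panel actually uses are defined in dashboard.json and are not stated here; the sketch below uses hypothetical weights (CPU 0.4, memory 0.4, disk 0.2) only to illustrate a weighted average on the same 0-100 scale.

```python
# Hypothetical weights; check dashboard.json for the ones the panel actually uses.
WEIGHTS = {"cpu": 0.4, "memory": 0.4, "disk": 0.2}


def pressure_score(utilization: dict[str, float],
                   weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-resource utilization percentages (0-100)."""
    total_weight = sum(weights.values())
    return sum(weights[r] * utilization[r] for r in weights) / total_weight


if __name__ == "__main__":
    print(pressure_score({"cpu": 90.0, "memory": 60.0, "disk": 30.0}))
```

Dividing by the total weight keeps the score on the 0-100 scale even if the weights do not sum to 1.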

CPU Details Page

| Widget | Type | Description |
|---|---|---|
| CPU Utilization Timeseries | Time Series | Detailed CPU usage over time per service |
| Current CPU by Service | Stat | Current CPU % with threshold coloring |
| CPU Limit by Service | Table | CPU cores limit, average, and peak usage |
| CPU Usage Heatmap | Heatmap | Distribution of CPU usage values over time |

Memory Details Page

| Widget | Type | Description |
|---|---|---|
| Memory Utilization Timeseries | Time Series | Memory usage percentage over time |
| Current Memory by Service | Stat | Current memory % with threshold coloring |
| Memory Usage in GB | Time Series (Area) | Absolute memory usage in gigabytes |
| Memory Limits and Usage | Table | Memory used, limit, average, and peak % |

Disk Details Page

| Widget | Type | Description |
|---|---|---|
| Disk Utilization Timeseries | Time Series | Disk usage percentage over time |
| Current Disk by Service | Stat | Current disk % with threshold coloring |
| Disk Usage in GB | Time Series (Area) | Absolute disk usage in gigabytes |
| Disk Space Breakdown | Table | Used, available, total space, and utilization % |

Alert Thresholds

The dashboard includes pre-configured color thresholds for visual alerting:

| Resource | Warning (Yellow) | Critical (Red) |
|---|---|---|
| CPU | 70% | 85% |
| Memory | 80% | 90% |
| Disk | 75% | 85% |

These thresholds are configured in stat panels and provide immediate visual feedback when resources are constrained.
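The sketch below maps a utilization value to the color a stat panel would show, using the thresholds above and Grafana's absolute-threshold behavior (a value at or above a step's threshold takes that step's color). count_critical illustrates one plausible reading of the Critical Services stat, namely any resource in the red band; the panel's actual query lives in dashboard.json, so treat that definition as an assumption.

```python
# Warning/critical thresholds from the table above (percent).
THRESHOLDS = {"cpu": (70, 85), "memory": (80, 90), "disk": (75, 85)}


def status(resource: str, value: float) -> str:
    """Return the color a stat panel would show for this utilization value."""
    warning, critical = THRESHOLDS[resource]
    if value >= critical:
        return "red"
    if value >= warning:
        return "yellow"
    return "green"


def count_critical(services: dict[str, dict[str, float]]) -> int:
    """Count services with at least one resource in the red band (assumed definition)."""
    return sum(
        any(status(res, val) == "red" for res, val in usage.items())
        for usage in services.values()
    )


if __name__ == "__main__":
    print(status("cpu", 72.5), status("memory", 91.0), status("disk", 40.0))
```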

Metrics Reference

All metrics follow the OTEL to Prometheus naming convention (dots become underscores, units are appended as suffixes):

| OTEL Metric | Prometheus Metric | Description | Unit |
|---|---|---|---|
| daytona.sandbox.cpu.utilization | daytona_sandbox_cpu_utilization_percent | CPU usage percentage | % (0-100) |
| daytona.sandbox.cpu.limit | daytona_sandbox_cpu_limit_cores | CPU cores limit | cores |
| daytona.sandbox.memory.utilization | daytona_sandbox_memory_utilization_percent | Memory usage percentage | % (0-100) |
| daytona.sandbox.memory.usage | daytona_sandbox_memory_usage_bytes | Memory used | bytes |
| daytona.sandbox.memory.limit | daytona_sandbox_memory_limit_bytes | Memory limit | bytes |
| daytona.sandbox.filesystem.utilization | daytona_sandbox_filesystem_utilization_percent | Disk usage percentage | % (0-100) |
| daytona.sandbox.filesystem.usage | daytona_sandbox_filesystem_usage_bytes | Disk space used | bytes |
| daytona.sandbox.filesystem.available | daytona_sandbox_filesystem_available_bytes | Available disk space | bytes |
| daytona.sandbox.filesystem.total | daytona_sandbox_filesystem_total_bytes | Total disk space | bytes |
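The conversion happens on ingest, so you never run it yourself; the sketch below only illustrates the rule stated above. The unit strings (%, By, cores) are assumptions about what Daytona reports (By is the UCUM code OpenTelemetry uses for bytes), and the real translation covers cases not handled here, such as the _total suffix for counters.

```python
# Hypothetical unit-to-suffix map covering the units in the table above.
UNIT_SUFFIXES = {"%": "percent", "By": "bytes", "cores": "cores"}


def to_prometheus_name(otel_name: str, unit: str) -> str:
    """Simplified OTel-to-Prometheus name translation: dots to underscores, unit suffix appended."""
    base = otel_name.replace(".", "_")
    suffix = UNIT_SUFFIXES.get(unit, "")
    # Avoid doubling the suffix if the name already ends with it.
    if suffix and not base.endswith("_" + suffix):
        base = f"{base}_{suffix}"
    return base


if __name__ == "__main__":
    print(to_prometheus_name("daytona.sandbox.memory.usage", "By"))
```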

Labels

All metrics include the service_name label identifying the sandbox.

Troubleshooting

No Data Showing

  1. Verify metrics are being received: Run this PromQL query in Grafana Explore:

    promql
    daytona_sandbox_cpu_utilization_percent
    
  2. Check data source connection: Go to Connections → Data Sources → your Prometheus source → Test

  3. Verify time range: Ensure the dashboard's time range covers the period when metrics were sent

  4. Check service filter: Try selecting "All" for the $service variable

High Cardinality Warnings

If you have many sandboxes, consider:

  • Reducing the time range
  • Using larger aggregation intervals
  • Filtering to specific services
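Why these three levers help can be seen from rough arithmetic. A range query returns one series per sandbox and metric, with about range/step + 1 points each, so load grows linearly with sandbox count and time range and shrinks with larger steps. The sketch below is an estimate under that simple model, not a measurement of what Grafana Cloud bills or limits.

```python
def estimate_samples(sandboxes: int, metrics: int,
                     range_seconds: int, step_seconds: int) -> int:
    """Rough sample count for a range query: one series per sandbox and
    metric, with range/step + 1 points per series."""
    points_per_series = range_seconds // step_seconds + 1
    return sandboxes * metrics * points_per_series


if __name__ == "__main__":
    # 100 sandboxes, one metric, one hour: 1m step vs 5m step.
    print(estimate_samples(100, 1, 3600, 60), estimate_samples(100, 1, 3600, 300))
```

Going from a 1m to a 5m step over one hour cuts each series from 61 points to 13, which is the effect the larger aggregation intervals above rely on.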

Panel Shows "No Data"

  • Verify the metric exists in Grafana Explore using {__name__=~"daytona.*"}
  • Check label names match: service_name (not service.name)
  • Ensure sandboxes are running and generating metrics

Dashboard Import Fails

  1. Ensure JSON is valid: jq . dashboard.json
  2. Check that you have dashboard creation permissions in Grafana Cloud
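If jq is not installed, Python's standard json module gives the same syntax check and reports the line and column of the first error. The required-key check (title, panels) is an assumption about what a minimal dashboard model needs, not Grafana's full validation.

```python
import json


def check_dashboard_json(text: str) -> list[str]:
    """Return a list of problems found in a dashboard JSON string (empty if none)."""
    try:
        model = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    if not isinstance(model, dict):
        return ["top level must be a JSON object"]
    # Assumed minimal keys; Grafana accepts many more.
    return [f"missing top-level key: {key}" for key in ("title", "panels") if key not in model]
```

For example, check_dashboard_json(open("dashboard.json").read()) returns an empty list when the file parses and has both keys.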

Customization

Modifying Panels

  1. Import the dashboard to Grafana
  2. Enter edit mode (click the pencil icon or press e)
  3. Modify panels as needed
  4. Save the dashboard

Adding New Panels

  1. Click Add → Visualization

  2. Select your Prometheus data source

  3. Write PromQL queries using the metrics listed above

  4. Example query for a custom panel:

    promql
    avg(daytona_sandbox_cpu_utilization_percent{service_name=~"$service"}) by (service_name)
    

Adjusting Thresholds

  1. Edit the desired stat panel
  2. Go to Field → Thresholds
  3. Modify warning and critical values
  4. Save the panel
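The same change can be made directly in dashboard.json. In Grafana's dashboard JSON model, thresholds live under fieldConfig.defaults.thresholds as a list of steps whose first entry has a null value (the base color). The sketch below rewrites those steps for every stat panel whose title contains a keyword such as "Current CPU"; matching by title is an assumption about how the panels are named in dashboard.json. It also descends into row panels, which can hold nested panels.

```python
def iter_panels(panels: list):
    """Yield panels, including those nested inside row panels."""
    for panel in panels:
        yield panel
        yield from iter_panels(panel.get("panels", []))


def set_thresholds(panel: dict, warning: float, critical: float) -> None:
    """Set green/yellow/red absolute thresholds on a panel's field config."""
    defaults = panel.setdefault("fieldConfig", {}).setdefault("defaults", {})
    defaults["thresholds"] = {
        "mode": "absolute",
        "steps": [
            {"color": "green", "value": None},  # base step, applies below the first threshold
            {"color": "yellow", "value": warning},
            {"color": "red", "value": critical},
        ],
    }


def update_stat_panels(dashboard: dict, title_keyword: str,
                       warning: float, critical: float) -> int:
    """Apply thresholds to matching stat panels; return how many were changed."""
    changed = 0
    for panel in iter_panels(dashboard.get("panels", [])):
        if panel.get("type") == "stat" and title_keyword in panel.get("title", ""):
            set_thresholds(panel, warning, critical)
            changed += 1
    return changed
```

After loading dashboard.json with json.load, calling update_stat_panels(model, "Current CPU", 70, 85) and writing the model back with json.dump would produce a file you can re-import.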

Exporting Customized Dashboard

  1. Click the share icon in the top navigation bar
  2. Select Export
  3. Enable Export for sharing externally
  4. Click Save to file
  5. Replace dashboard.json with your customized version

Additional Resources