Performance Benchmark Workflow

This document explains the automated CRUD performance benchmark workflow that runs on every pull request.

Overview

The benchmark workflow (crud-bench.yml) automatically runs CRUD benchmarks using crud-bench to measure SurrealDB performance. It posts the results as a PR comment for review.

Workflow Triggers

The benchmark workflow is triggered in the following ways (a sketch of the corresponding trigger block follows this list):

  • Pull Requests: When a PR is opened, synchronized (new commits), or reopened
  • Manual Dispatch: Can be manually triggered via GitHub Actions UI for testing
    • Allows specifying a custom crud-bench revision for testing upgrades
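
A minimal sketch of what these triggers might look like in .github/workflows/crud-bench.yml (illustrative only; the input name crud_bench_revision and the event types are assumptions, so check the actual workflow file):

yaml
# Illustrative trigger block (input name and event types are assumptions)
on:
  pull_request:
    types: [opened, synchronize, reopened]
  workflow_dispatch:
    inputs:
      crud_bench_revision:
        description: "crud-bench git revision to benchmark with"
        required: false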

crud-bench Version Management

The workflow uses a pinned version of crud-bench to ensure consistent benchmarking across all PRs (apples-to-apples comparison). This prevents benchmark variations caused by changes in the benchmarking tool itself.

The specific revision is defined in the CRUD_BENCH_REVISION environment variable in .github/workflows/crud-bench.yml.
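
For reference, a hedged sketch of that pin in the workflow's top-level env block (the value shown is a placeholder, not the real revision):

yaml
# .github/workflows/crud-bench.yml (sketch; placeholder value)
env:
  # Pinned crud-bench revision used by every benchmark run
  CRUD_BENCH_REVISION: "<commit-sha-or-tag>"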

Why Pin the Version?

  • Consistency: All PRs benchmark against the same tool version
  • Reproducibility: Results are comparable across different time periods
  • Controlled Upgrades: Intentionally upgrade when ready, not automatically
  • Avoid False Positives: Tool changes won't be mistaken for performance regressions

How to Upgrade crud-bench

When you want to upgrade to a newer version of crud-bench:

  1. Find the target revision:

    bash
    # Get latest commit from crud-bench main branch
    git ls-remote https://github.com/surrealdb/crud-bench.git HEAD
    
  2. Update the workflow:

    • Edit .github/workflows/crud-bench.yml
    • Update CRUD_BENCH_REVISION in the env section
  3. Test the upgrade:

    • Use workflow dispatch with the new revision as input
    • Verify benchmarks complete successfully
    • Check that output format hasn't changed
  4. Commit and verify:

    • Create a PR with the update
    • Let benchmarks run to ensure everything works
    • Subsequent benchmark runs, and any comparisons made against them, will use the new version going forward

Manual Override

You can test a different crud-bench version without changing the workflow:

  1. Go to Actions → Performance Benchmarks
  2. Click Run workflow
  3. Enter a git revision (commit hash, tag, or branch name)
  4. Click Run workflow

Benchmark Configurations

The workflow tests multiple SurrealDB configurations:

Networked Benchmarks

These benchmarks test SurrealDB as a server using the binary built from your PR:

  1. Memory: SurrealDB server with in-memory storage (surrealdb-memory)
  2. RocksDB: SurrealDB server with RocksDB persistent storage (surrealdb-rocksdb)
  3. SurrealKV: SurrealDB server with SurrealKV storage (surrealdb-surrealkv)

Embedded Benchmarks

These benchmarks test the SurrealDB SDK embedded in the benchmark tool using Cargo patching:

  1. Embedded Memory: SurrealDB SDK with in-memory storage (surrealdb + -e memory)
  2. Embedded RocksDB: SurrealDB SDK with RocksDB storage (surrealdb + -e rocksdb)
  3. Embedded SurrealKV: SurrealDB SDK with SurrealKV storage (surrealdb + -e surrealkv)

Other Database Engines

The workflow also benchmarks:

  • SurrealKV: Direct SurrealKV storage engine (embedded, no network endpoint)
  • SurrealMX: SurrealMX storage engine (embedded, no network endpoint)

All benchmarks use code from your PR: networked benchmarks use the compiled binary, while embedded benchmarks use the SDK via Cargo patching.

Customizing Network Endpoints

Each benchmark entry can set an endpoint field. For networked SurrealDB benchmarks it is the connection URL, for embedded SurrealDB benchmarks it specifies the storage engine, and for local embedded databases it can be omitted:

yaml
# Networked benchmark - requires endpoint
- name: memory
  database: surrealdb-memory
  endpoint: ws://localhost:8000
  description: SurrealDB with in-memory storage

# HTTP variant (for testing REST API performance)
- name: memory-http
  database: surrealdb-memory
  endpoint: http://localhost:8000
  description: SurrealDB with in-memory storage (HTTP)

# Embedded benchmark - requires endpoint to specify storage
- name: embedded-memory
  database: surrealdb
  endpoint: memory
  description: SurrealDB embedded with in-memory storage

# Local database - no endpoint needed
- name: surrealkv
  database: surrealkv
  description: SurrealKV

How endpoints are used:

  • Networked SurrealDB (surrealdb-*): Connection URL (e.g., ws://localhost:8000, http://localhost:8000)
    • Default if omitted: ws://127.0.0.1:8000
  • Embedded SurrealDB (surrealdb): Storage engine specification (e.g., memory, rocksdb:~/data)
    • Default if omitted: ws://127.0.0.1:8000 (not useful for embedded mode, so always specify!)
  • SurrealKV/SurrealMX: Not used (these are local embedded databases)

This flexibility allows you to:

  • Compare WebSocket vs HTTP performance for networked benchmarks
  • Test different storage engines for embedded benchmarks
  • Benchmark remote endpoints if needed

Benchmark Parameters

The workflow uses a matrix strategy to test each configuration with multiple key types:

  • Matrix dimensions:

    • Configurations: 6 (memory, rocksdb, embedded-memory, embedded-rocksdb, surrealkv-local, surrealmx)
    • Key types: 4 (integer, string26, string90, string250)
    • Total jobs: 24 (6 configs × 4 key types)
  • Benchmark parameters per job:

    • Samples: 10,000 operations per CRUD operation
    • Clients: 12 concurrent clients
    • Threads: 48 threads per client
    • Order: Randomized key generation (-r flag)

Each matrix job runs independently with a clean database state, ensuring accurate performance measurements. Multiple key types ensure performance is measured across different indexing scenarios (small integers vs. long strings).
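
As a rough illustration of how the matrix and per-job parameters above could be expressed in the workflow (only -s, -c, -t, -r and the listed matrix dimensions come from this document; the -d/-k flags and step layout are assumptions, so consult the real workflow and crud-bench --help for the exact invocation):

yaml
# Sketch only: matrix layout and benchmark invocation
strategy:
  matrix:
    config: [memory, rocksdb, embedded-memory, embedded-rocksdb, surrealkv-local, surrealmx]
    key_type: [integer, string26, string90, string250]  # 6 configs x 4 key types = 24 jobs
steps:
  - name: Run benchmark
    # -s samples, -c clients, -t threads, -r randomized keys (per the parameters above);
    # the database (-d) and key-type (-k) flags shown here are hypothetical
    run: |
      ./crud-bench -d ${{ matrix.config }} -k ${{ matrix.key_type }} -s 10000 -c 12 -t 48 -r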

Operations Tested

For each configuration, the benchmark measures:

  • Create: Insert operations with unique records
  • Read: Select operations by primary key
  • Update: Modify existing records
  • Delete: Remove records

Metrics Collected

For each operation, the following metrics are captured:

  • Throughput: Operations per second
  • Latency:
    • Average (mean)
    • P50 (median)
    • P95 (95th percentile)
    • P99 (99th percentile)
  • Total time: Complete operation duration
  • Sample count: Number of operations performed

Performance Analysis

The workflow measures the following for each benchmark configuration:

  • Throughput: Operations per second for each CRUD operation
  • Latency percentiles: P50, P95, and P99 latencies
  • Sample count: Number of operations performed

Results are posted as a PR comment for manual review and comparison.

Result Reporting

PR Comments

After benchmarks complete, the workflow posts a comment on the PR with:

  • Summary Table: Throughput and latency percentiles for each config/operation
  • Detailed Metrics: Expandable section with full metrics including average latency and total time
  • Methodology: How the benchmarks were run
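
One way such a comment step can be implemented, sketched here with actions/github-script (the report file name is hypothetical, and the real workflow may instead update an existing comment rather than creating a new one):

yaml
# Sketch: post the generated markdown report as a PR comment
- name: Comment benchmark results
  uses: actions/github-script@v7
  with:
    script: |
      const fs = require('fs');
      const body = fs.readFileSync('benchmark-report.md', 'utf8');  // hypothetical file name
      await github.rest.issues.createComment({
        owner: context.repo.owner,
        repo: context.repo.repo,
        issue_number: context.issue.number,
        body,
      });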

Result Artifacts

Benchmark results are stored as GitHub Actions artifacts:

  • Current results: Individual JSON files for each configuration (30 days retention)
  • Analysis report: Markdown and JSON files with the full analysis (90 days retention)

These can be downloaded from the workflow run for further analysis or comparison.
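
A sketch of how result artifacts with these retention periods can be uploaded (the artifact name, path, and action version are illustrative):

yaml
# Sketch: upload per-configuration results with a 30-day retention period
- name: Upload benchmark results
  uses: actions/upload-artifact@v4
  with:
    name: benchmark-results-${{ matrix.config }}-${{ matrix.key_type }}
    path: results/*.json
    retention-days: 30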

Manually Running Benchmarks

Trigger via GitHub UI

  1. Go to Actions tab
  2. Select Performance Benchmarks workflow
  3. Click Run workflow
  4. Select branch and click Run workflow

Interpreting Results

Understanding the Metrics

  • Throughput: Higher is better - measures how many operations can be performed per second
  • Latency P50 (median): Half of operations complete faster than this time
  • Latency P95: 95% of operations complete faster than this time
  • Latency P99: 99% of operations complete faster than this time

What to Look For

Review the results to:

  1. Verify expected performance: Check that performance aligns with expectations for your changes
  2. Compare configurations: See how different storage engines perform
  3. Identify anomalies: Look for unexpectedly low throughput or high latency
  4. Review across operations: Check if changes affect specific CRUD operations more than others

Variability

Benchmark results can vary due to:

  • CI runner variance: Different runners may have different performance characteristics
  • System load: Background processes can affect timing
  • Network overhead: For networked benchmarks, network conditions matter
  • Warmup effects: First runs may be slower than subsequent runs

For more reliable comparisons, consider:

  • Running benchmarks multiple times
  • Comparing against your local development environment
  • Looking at trends across multiple commits rather than single runs

Troubleshooting

Benchmark Failed to Run

Check the workflow logs for:

  • Build failures: SurrealDB binary failed to compile
  • Server startup issues: SurrealDB server failed to start or become ready
  • Timeout: Benchmark took longer than 45 minutes (rare)

Analysis Script Errors

If the Python analysis script fails:

  1. Verify JSON format matches expected crud-bench output
  2. Check that result files were generated by crud-bench
  3. Review workflow logs for parsing errors

Benchmarks are Slow

The workflow uses parallel matrix execution, so total wall-clock time should be ~5-7 minutes if runners are available.

Build Phase (parallel):

  • SurrealDB binaries with minimal features: ~3-4 minutes
  • crud-bench with patched surrealdb: ~5 minutes
  • Total build time: ~5 minutes (builds run in parallel)

Benchmark Phase (parallel):

  • 24 jobs execute simultaneously
  • Each job downloads pre-built binaries and runs benchmarks: ~30-60 seconds
  • Total benchmark time: ~1 minute (when sufficient runners are available)

If individual jobs are too slow:

  • Reduce sample count: Change -s 10000 in the workflow to -s 5000 or -s 2500
  • Reduce concurrency: Change -c 12 -t 48 to lower values like -c 8 -t 16

If you want to run fewer benchmarks:

  • Skip specific configs: Add them to the exclude: section in the matrix
  • Skip key types: Add specific config/key combinations to exclude:
  • Example:
    yaml
    exclude:
      - config: rocksdb
        key_type: string250
    

Architecture

Workflow Jobs

The workflow uses a three-phase architecture with shared binary builds:

Phase 1: Build Jobs (parallel)

  1. build-surrealdb-binaries (matrix job: 2 parallel builds)

    • Builds SurrealDB binaries with minimal features for each storage type (a sketch of this build job follows the list)
    • Matrix dimensions: 2 configs (memory, rocksdb)
    • Features per build:
      • memory: --no-default-features --features "storage-mem,http"
      • rocksdb: --no-default-features --features "storage-rocksdb,http"
    • Uploads binaries as artifacts: surreal-binary-{config}
    • Timeout: 15 minutes per build
    • Expected time: ~3-4 minutes per build (parallel)
  2. build-crud-bench (single job)

    • Checks out SurrealDB sources and crud-bench repository
    • Aligns version compatibility between crud-bench and PR
    • Uses Cargo patching (.github/tools/.cargo/config.toml) to link crud-bench against local surrealdb
    • Builds crud-bench with patched dependencies
    • Uploads binary as artifact: crud-bench-binary
    • Timeout: 15 minutes
    • Expected time: ~5 minutes
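
A hedged sketch of the build-surrealdb-binaries job from step 1 (the feature flags come from the list above; the runner label, action versions, and binary path are assumptions):

yaml
# Sketch: minimal-feature server builds, one per storage type
build-surrealdb-binaries:
  runs-on: ubuntu-latest   # runner label is an assumption
  timeout-minutes: 15
  strategy:
    matrix:
      include:
        - config: memory
          features: "storage-mem,http"
        - config: rocksdb
          features: "storage-rocksdb,http"
  steps:
    - uses: actions/checkout@v4
    - name: Build surreal binary
      run: cargo build --release --no-default-features --features "${{ matrix.features }}"
    - name: Upload binary
      uses: actions/upload-artifact@v4
      with:
        name: surreal-binary-${{ matrix.config }}
        path: target/release/surreal   # binary path is an assumption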

Phase 2: Benchmark Jobs (parallel, 24 jobs)

  1. benchmark (matrix job: 24 parallel jobs)
    • Depends on: build-surrealdb-binaries, build-crud-bench
    • Matrix dimensions: 6 configs × 4 key types = 24 jobs
    • Each job:
      • Downloads pre-built binaries from build phase
        • SurrealDB binary (if needs_server == true)
        • crud-bench binary (always)
      • Checks out crud-bench repository for directory structure
      • Starts SurrealDB server (for networked benchmarks only)
      • Runs single benchmark with specific config and key type
      • Uploads results as individual artifact
    • Clean state for every job (no data carryover)
    • Timeout: 30 minutes per job
    • Expected time: ~30-60 seconds per job (just benchmark execution + artifact I/O)
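
Putting these pieces together, a simplified skeleton of one benchmark job (it reuses the matrix sketched under Benchmark Parameters; the runner label, step names, and server flags are illustrative, not taken from the real workflow):

yaml
# Sketch: one of the 24 matrix benchmark jobs
benchmark:
  runs-on: ubuntu-latest   # runner label is an assumption
  needs: [build-surrealdb-binaries, build-crud-bench]
  timeout-minutes: 30
  steps:
    - name: Download crud-bench binary
      uses: actions/download-artifact@v4
      with:
        name: crud-bench-binary
    - name: Download server binary (networked configs only)
      if: ${{ matrix.needs_server }}
      uses: actions/download-artifact@v4
      with:
        name: surreal-binary-${{ matrix.config }}
    - name: Start SurrealDB server (networked configs only)
      if: ${{ matrix.needs_server }}
      # flags are illustrative; the workflow starts the server from the downloaded PR-built binary
      run: ./surreal start --bind 0.0.0.0:8000 memory &
    - name: Run benchmark
      run: ./crud-bench -s 10000 -c 12 -t 48 -r   # plus the config/key-type specific flags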

Phase 3: Analysis Job

  1. analyze-and-report
    • Downloads benchmark results from all 24 matrix jobs
    • Merges all JSON result files
    • Parses and analyzes the results
    • Generates markdown and JSON reports
    • Posts/updates PR comment with results
    • Uploads analysis artifacts

How PR Code is Used

The workflow ensures all benchmarks test your PR's code through shared build jobs that create binaries once and distribute them to all benchmark jobs:

Build Phase

  1. SurrealDB Binaries (for networked benchmarks)

    • The build-surrealdb-binaries job builds minimal-feature binaries from your PR
    • Each binary includes only the features needed for its storage type:
      • memory: In-memory storage + HTTP server
      • rocksdb: RocksDB storage + HTTP server
    • Binaries are uploaded as artifacts and downloaded by benchmark jobs
    • This ensures networked benchmarks test your PR's server code
  2. crud-bench Binary (for all benchmarks)

    • The build-crud-bench job uses Cargo patching (.github/tools/.cargo/config.toml)
    • This patches the surrealdb dependency in crud-bench to use your local SDK code
    • When crud-bench builds, it links against your PR's surrealdb SDK
    • The binary is uploaded as an artifact and downloaded by all benchmark jobs
    • This ensures embedded benchmarks test your PR's SDK code

Benchmark Phase

Networked Benchmarks (memory, rocksdb):

  1. Download storage-specific SurrealDB binary from build phase
  2. Start local SurrealDB server using the downloaded binary
  3. Download and run crud-bench binary (which also uses your PR's SDK)
  4. crud-bench connects to local server via ws://localhost:8000

Embedded Benchmarks (embedded-memory, embedded-rocksdb, surrealkv-local, surrealmx):

  1. Download crud-bench binary from build phase (linked against your PR's SDK)
  2. Run crud-bench with embedded storage (no server needed)
  3. Benchmarks execute entirely within the SDK from your PR

Analysis Script

The Python script (.github/scripts/analyze_benchmark.py) handles:

  • Parsing crud-bench JSON output
  • Formatting metrics for display
  • Markdown report generation
  • JSON output for debugging