pkg/cmd/roachprod-centralized/README.md
A centralized REST API service for managing CockroachDB roachprod clusters, tasks, and cloud provider operations.
The roachprod-centralized service provides a unified HTTP API for:
Prerequisites: a working CockroachDB development environment (run ./dev doctor to verify).
# From the CockroachDB repository root
./dev build roachprod-centralized
# Set minimum required configuration
export ROACHPROD_API_AUTHENTICATION_TYPE=disabled
export ROACHPROD_DATABASE_TYPE=memory
./dev run roachprod-centralized api
The API will be available at http://localhost:8080 with metrics at http://localhost:8081/metrics.
The roachprod-centralized command provides the following subcommands:
# Start the API server with default configuration (all-in-one mode)
roachprod-centralized api
# Start with custom configuration file
roachprod-centralized api --config /path/to/config.yaml
# Start with specific port
roachprod-centralized api --api-port 9090
# Start API-only mode without task workers (requires CockroachDB)
roachprod-centralized api --no-workers --database-type cockroachdb
# View all available options
roachprod-centralized api --help
# Start dedicated task workers with metrics endpoint (requires CockroachDB)
roachprod-centralized workers --database-type cockroachdb --database-url "postgresql://..."
# Start workers with custom configuration
roachprod-centralized workers --config /path/to/config.yaml
# Configure number of concurrent workers
roachprod-centralized workers --tasks-workers 5
# View all available options
roachprod-centralized workers --help
The service supports three deployment modes for flexibility and scalability:
| Mode | Command | Use Case | Database | Workers | API | Background Tasks |
|---|---|---|---|---|---|---|
| All-in-one | api | Development, small deployments | memory or cockroachdb | ✓ | ✓ | ✓ |
| API-only | api --no-workers | Horizontally scaled API tier | cockroachdb (required) | ✗ | ✓ | ✗ |
| Workers-only | workers | Horizontally scaled task processing | cockroachdb (required) | ✓ | ✗ (metrics only) | ✓ |
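The database requirements in the table can be checked before launching an instance. The following is a minimal sketch, not the service's own validation: `check_mode` and the mode labels `all-in-one`, `api-only`, and `workers-only` are hypothetical names chosen for illustration, and the only rule encoded is the one the table states (API-only and workers-only modes need `cockroachdb`).

```shell
#!/bin/sh
# Hypothetical pre-flight helper; check_mode is NOT part of roachprod-centralized.
# Usage: check_mode <all-in-one|api-only|workers-only> <memory|cockroachdb>
check_mode() {
  mode=$1
  db=$2

  # Only the two database types named in the README are accepted.
  case "$db" in
    memory|cockroachdb) ;;
    *) echo "unknown database type: $db" >&2; return 2 ;;
  esac

  case "$mode" in
    all-in-one)
      # All-in-one works with either backend.
      return 0 ;;
    api-only|workers-only)
      # Split deployments need CockroachDB, per the table.
      if [ "$db" != "cockroachdb" ]; then
        echo "$mode mode requires database type cockroachdb" >&2
        return 1
      fi
      return 0 ;;
    *)
      echo "unknown mode: $mode" >&2; return 2 ;;
  esac
}

check_mode all-in-one memory && echo "all-in-one + memory: ok"
check_mode api-only memory || echo "api-only + memory: rejected"
```

A check like this could run in a deployment script before the `api` or `workers` command starts, so that a misconfigured backend is caught before the service itself reports the error.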
All-in-one mode (default):
API-only mode (--no-workers):
Workers-only mode (workers command):
Smart Initial Sync: When instances start, they intelligently decide whether to sync cluster data from cloud providers:
Example: Scaled Production Setup
# Terminal 1: API instance 1 (no workers)
roachprod-centralized api --no-workers --api-port 8080 --database-type cockroachdb
# Terminal 2: API instance 2 (no workers)
roachprod-centralized api --no-workers --api-port 8090 --database-type cockroachdb
# Terminal 3: Workers instance 1
roachprod-centralized workers --tasks-workers 5 --api-metrics-port 9081
# Terminal 4: Workers instance 2
roachprod-centralized workers --tasks-workers 5 --api-metrics-port 9082
Key configuration flags (all can be set via environment variables):
--api-port                     HTTP API port (default: 8080)
--api-base-path                Base URL path for API endpoints
--api-metrics-enabled          Enable metrics collection (default: true)
--api-metrics-port             Metrics HTTP port (default: 8081)
--api-authentication-disabled  Disable API authentication (default: false)
--database-type                Database type: memory|cockroachdb (default: memory)
--database-url                 Database connection URL
--log-level                    Logging level: debug|info|warn|error (default: info)
--tasks-workers                Number of background task workers (default: 1)
--no-workers                   Run API without task workers (api command only, requires CockroachDB)
All configuration can be set via environment variables with the ROACHPROD_ prefix:
# Core API settings
export ROACHPROD_API_PORT=8080
export ROACHPROD_API_METRICS_ENABLED=true
export ROACHPROD_LOG_LEVEL=info
# Authentication disabled (for development)
export ROACHPROD_API_AUTHENTICATION_METHOD=disabled
# Authentication via GCP Identity-Aware Proxy
export ROACHPROD_API_AUTHENTICATION_METHOD=jwt
export ROACHPROD_API_AUTHENTICATION_JWT_HEADER="X-Goog-IAP-JWT-Assertion"
export ROACHPROD_API_AUTHENTICATION_JWT_AUDIENCE="your-audience"
# Database configuration
export ROACHPROD_DATABASE_TYPE=cockroachdb
export ROACHPROD_DATABASE_URL="postgresql://user:password@localhost:26257/roachprod?sslmode=require"
export ROACHPROD_DATABASE_MAX_CONNS=10
# Task processing
export ROACHPROD_TASKS_WORKERS=3
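The flag and variable names shown above suggest a naming pattern: drop the leading `--`, uppercase the name, replace `-` with `_`, and add the `ROACHPROD_` prefix. The sketch below applies that pattern. It is inferred only from pairs that appear in this document (for example `--api-port` and `ROACHPROD_API_PORT`); `flag_to_env` is a hypothetical helper, and the authentication settings are named inconsistently in this document, so confirm any derived name against `--help`.

```shell
#!/bin/sh
# Hypothetical helper; flag_to_env is NOT part of the roachprod-centralized CLI.
# It applies the naming pattern inferred from the examples in this README.
flag_to_env() {
  name=${1#--}   # strip the leading "--"
  # Uppercase letters and turn dashes into underscores, then add the prefix.
  printf 'ROACHPROD_%s\n' "$(printf '%s' "$name" | tr 'a-z-' 'A-Z_')"
}

flag_to_env --api-port        # prints ROACHPROD_API_PORT
flag_to_env --database-type   # prints ROACHPROD_DATABASE_TYPE
flag_to_env --tasks-workers   # prints ROACHPROD_TASKS_WORKERS
```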
Create a YAML configuration file for more complex setups:
Log:
  Level: info
API:
  Port: 8080
  BasePath: "/api"
  Metrics:
    Enabled: true
    Port: 8081
  Authentication:
    Disabled: false
    JWT:
      Header: "X-Goog-IAP-JWT-Assertion"
      Audience: "your-audience"
Database:
  Type: cockroachdb
  URL: "postgresql://user:password@localhost:26257/roachprod?sslmode=require"
  MaxConns: 10
  MaxIdleTime: 300
Tasks:
  Workers: 3
See docs/CLOUD_PROVIDER_CONFIG.md for detailed cloud provider setup.
Quick Examples:
The service follows a clean architecture pattern:
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Controllers   │────│    Services     │────│  Repositories   │
│  (HTTP Layer)   │    │ (Business Logic)│    │  (Data Layer)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         │             ┌─────────────────┐             │
         └─────────────│   Background    │─────────────┘
                       │   Task System   │
                       └─────────────────┘
Key Components:
- controllers/
- services/
- repositories/
- models/
- utils/

Authorization Boundary (Important):
For detailed architecture information, see docs/ARCHITECTURE.md.
For small production deployments or development:
# Enable authentication
export ROACHPROD_API_AUTHENTICATION_TYPE=bearer
export ROACHPROD_API_AUTHENTICATION_BEARER_OKTA_ISSUER="https://your-org.okta.com"
export ROACHPROD_API_AUTHENTICATION_BEARER_OKTA_AUDIENCE="your-audience"
export ROACHPROD_API_AUTHENTICATION_BEARER_OKTA_CLIENT_ID="your-client-id"
export ROACHPROD_API_AUTHENTICATION_BEARER_OKTA_CLIENT_SECRET="your-client-secret"
# Use CockroachDB backend
export ROACHPROD_DATABASE_TYPE=cockroachdb
export ROACHPROD_DATABASE_URL="postgresql://user:password@prod-cluster:26257/roachprod?sslmode=require"
# Configure workers
export ROACHPROD_TASKS_WORKERS=5
# Start the service
roachprod-centralized api
For high-availability and load distribution, run separate API and worker instances:
1. API Instances (scale horizontally for HTTP load):
# API Instance 1
export ROACHPROD_DATABASE_TYPE=cockroachdb
export ROACHPROD_DATABASE_URL="postgresql://user:password@prod-cluster:26257/roachprod?sslmode=require"
export ROACHPROD_API_PORT=8080
roachprod-centralized api --no-workers
# API Instance 2 (different server/container)
export ROACHPROD_DATABASE_TYPE=cockroachdb
export ROACHPROD_DATABASE_URL="postgresql://user:password@prod-cluster:26257/roachprod?sslmode=require"
export ROACHPROD_API_PORT=8080
roachprod-centralized api --no-workers
2. Worker Instances (scale horizontally for task processing):
# Worker Instance 1
export ROACHPROD_DATABASE_TYPE=cockroachdb
export ROACHPROD_DATABASE_URL="postgresql://user:password@prod-cluster:26257/roachprod?sslmode=require"
export ROACHPROD_TASKS_WORKERS=5
export ROACHPROD_API_METRICS_PORT=9081
roachprod-centralized workers
# Worker Instance 2 (different server/container)
export ROACHPROD_DATABASE_TYPE=cockroachdb
export ROACHPROD_DATABASE_URL="postgresql://user:password@prod-cluster:26257/roachprod?sslmode=require"
export ROACHPROD_TASKS_WORKERS=5
export ROACHPROD_API_METRICS_PORT=9082
roachprod-centralized workers
3. Load Balancer Configuration (for API instances):
- Health check endpoint: GET /health

Enable Bearer Authentication:
export ROACHPROD_API_AUTHENTICATION_TYPE=bearer
export ROACHPROD_API_AUTHENTICATION_BEARER_OKTA_ISSUER="https://your-org.okta.com"
export ROACHPROD_API_AUTHENTICATION_BEARER_OKTA_AUDIENCE="your-audience"
export ROACHPROD_API_AUTHENTICATION_BEARER_OKTA_CLIENT_ID="your-client-id"
export ROACHPROD_API_AUTHENTICATION_BEARER_OKTA_CLIENT_SECRET="your-client-secret"
Use CockroachDB Backend (required for scaled deployments):
export ROACHPROD_DATABASE_TYPE=cockroachdb
export ROACHPROD_DATABASE_URL="postgresql://user:password@prod-cluster:26257/roachprod?sslmode=require"
Bootstrap SCIM Provisioning:
# Generate a bootstrap token for initial SCIM setup (first startup only)
export ROACHPROD_BOOTSTRAP_SCIM_TOKEN="rp\$sa\$1\$$(openssl rand -base64 32 | tr -dc 'a-zA-Z0-9' | head -c 43)"
On first startup, this creates a short-lived (6 hour) service account for configuring Okta SCIM. See docs/services/AUTH.md for details.
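The generation command above filters the base64 output down to alphanumerics, and `tr -dc` also drops any `+`, `/`, and `=` characters, so the suffix can end up shorter than the 43 characters that `head -c 43` allows. The sketch below checks a token against the shape the command implies: the literal prefix `rp$sa$1$` followed by exactly 43 alphanumeric characters. Whether the service enforces exactly this shape is an assumption (see docs/services/AUTH.md), and `is_bootstrap_token_shape` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper; the exact format the service accepts is an ASSUMPTION
# derived from the generation command in this README.
is_bootstrap_token_shape() {
  # Require the literal prefix rp$sa$1$ (single quotes keep "$" literal).
  case "$1" in
    'rp$sa$1$'*) ;;
    *) return 1 ;;
  esac

  suffix=${1#'rp$sa$1$'}

  # Require exactly 43 characters after the prefix.
  [ "${#suffix}" -eq 43 ] || return 1

  # Reject anything that is not a letter or a digit.
  case "$suffix" in
    *[!a-zA-Z0-9]*) return 1 ;;
  esac
  return 0
}

if is_bootstrap_token_shape "${ROACHPROD_BOOTSTRAP_SCIM_TOKEN:-}"; then
  echo "token shape looks as expected"
else
  echo "token is unset or does not match the expected shape"
fi
```

If the check fails because the suffix is too short, regenerating the token is usually enough.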
Configure Cloud Providers: Set up cloud provider credentials as detailed in docs/CLOUD_PROVIDER_CONFIG.md
Resource Limits:
export ROACHPROD_DATABASE_MAX_CONNS=20
export ROACHPROD_TASKS_WORKERS=5 # Per worker instance
Monitoring:
- :8081/metrics (API instances)
- /health and /health/detailed

See docker/README.md for containerized deployment options.
The service provides comprehensive health checks:
- GET /health - Basic API availability
- GET /health/detailed - Component-level status
- GET :8081/metrics - Prometheus metrics

For local development setup and contribution guidelines, see docs/DEVELOPMENT.md.
# Run all tests
./dev test pkg/cmd/roachprod-centralized/...
# Run specific package tests
./dev test pkg/cmd/roachprod-centralized/services/clusters
# Run with race detection
./dev test pkg/cmd/roachprod-centralized/... --race
After modifying protocol buffers or other generated code:
./dev generate
1. Authentication Errors
Error: authentication failed
Solution: For development, disable authentication:
export ROACHPROD_API_AUTHENTICATION_METHOD=disabled
2. Database Connection Issues
Error: failed to connect to database
Solution: Check database configuration and connectivity:
# For development, use in-memory storage
export ROACHPROD_DATABASE_TYPE=memory
# Or verify CockroachDB connection
psql "postgresql://user:password@localhost:26257/roachprod?sslmode=require"
3. Cloud Provider Configuration
Error: failed to initialize cloud provider
Solution: Verify cloud provider credentials are properly configured. See docs/CLOUD_PROVIDER_CONFIG.md.
4. Port Already in Use
Error: bind: address already in use
Solution: Change the API port:
export ROACHPROD_API_PORT=9090
5. --no-workers with Memory Database
Error: --no-workers cannot be used with memory database backend
Solution: The --no-workers flag requires CockroachDB for distributed coordination:
export ROACHPROD_DATABASE_TYPE=cockroachdb
export ROACHPROD_DATABASE_URL="postgresql://..."
roachprod-centralized api --no-workers
6. Workers Command with Memory Database
Error: workers command requires database.type=cockroachdb
Solution: The workers command requires CockroachDB for distributed task coordination:
export ROACHPROD_DATABASE_TYPE=cockroachdb
export ROACHPROD_DATABASE_URL="postgresql://..."
roachprod-centralized workers
7. Tasks Not Being Processed
Tasks remain in pending state indefinitely
Solution: Verify workers are running:
- Confirm ROACHPROD_TASKS_WORKERS is > 0
- Confirm a workers instance is running
- Enable debug logging: export ROACHPROD_LOG_LEVEL=debug

8. Background Work Not Running in API-only Mode
Clusters not syncing, health checks not working
Expected behavior: When using --no-workers, background work is intentionally disabled:
This is correct - background work that schedules tasks shouldn't run without workers. To enable background work, run dedicated workers instances.
Enable debug logging for detailed troubleshooting:
export ROACHPROD_LOG_LEVEL=debug
Checking Service Logs:
# Look for these log messages to verify correct mode:
# API-only mode:
# "health service: skipping instance registration (workers disabled)"
# "clusters service: skipping background work (workers disabled)"
# "Task workers disabled (Workers=0), skipping task processing routine"
# Workers mode:
# "Starting in metrics-only mode (workers)"
# "Starting tasks processing routine"
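The log messages listed above can be matched mechanically. The sketch below reads log text on standard input and reports which mode the messages indicate. `detect_mode` is a hypothetical helper, it relies only on the message strings quoted above, and how you obtain the logs (file, container runtime, journal) is left open.

```shell
#!/bin/sh
# Hypothetical helper; detect_mode is NOT part of roachprod-centralized.
# It reads log text on stdin and matches the messages quoted in this README.
detect_mode() {
  logs=$(cat)
  if printf '%s\n' "$logs" | grep -qF 'Starting in metrics-only mode (workers)'; then
    echo workers
  elif printf '%s\n' "$logs" | grep -qF 'skipping background work (workers disabled)'; then
    echo api-only
  else
    # No mode-specific message found, for example all-in-one or truncated logs.
    echo unknown
  fi
}

# Example with a sample line; in practice, pipe the real service logs in.
printf '%s\n' 'clusters service: skipping background work (workers disabled)' | detect_mode
```

An `unknown` result does not mean the service is unhealthy; it only means none of the mode-specific messages appeared in the text that was provided.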