docs/architecture.md
This document summarizes the application's architecture and core flows.
The main components and their interactions are described below.
Lago API is a Rails application running on AWS. The architecture consists of several key components:
The main worker listens on the following queues (in priority order):
| Queue | Purpose |
|---|---|
| high_priority | Urgent tasks requiring immediate processing |
| default | Standard job processing |
| mailers | Email delivery jobs |
| clock | Scheduled/recurring tasks from Clockwork |
| providers | Third-party provider integrations |
| webhook | Webhook delivery jobs |
| invoices | Invoice generation and processing |
| wallets | (deprecated - jobs migrated to other queues) |
| integrations | Integration-related tasks |
| low_priority | Non-urgent background tasks |
| long_running | Jobs expected to take extended time |
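In Sidekiq, listing queues by name without weights yields exactly this strict polling order. A sketch of what the corresponding configuration could look like (illustrative only, not necessarily Lago's actual file):

```yaml
# Illustrative sidekiq.yml fragment: queues listed without weights are
# polled in strict order, so high_priority is always checked first.
:queues:
  - high_priority
  - default
  - mailers
  - clock
  - providers
  - webhook
  - invoices
  - wallets
  - integrations
  - low_priority
  - long_running
```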
Worker concurrency is configurable (via the SIDEKIQ_CONCURRENCY env var in production).

Lago supports dedicated workers for specific job types to improve performance and monitoring. When enabled via environment variables, jobs are routed to dedicated queues with their own worker processes, offloading work from the default worker.
| Environment Variable | Queue Name | Default Concurrency (Production) | Purpose |
|---|---|---|---|
| SIDEKIQ_ANALYTICS | analytics | 10 | Analytics processing |
| SIDEKIQ_BILLING | billing | 5 | Billing operations |
| SIDEKIQ_CLOCK | clock_worker | 5 | Scheduled tasks |
| SIDEKIQ_EVENTS | events | 10 | Event processing |
| SIDEKIQ_PAYMENTS | payments | 10 | Payment operations |
| SIDEKIQ_PDF | pdfs | 10 | PDF generation |
| SIDEKIQ_WEBHOOK | webhook_worker | 10 | Webhook delivery |
| SIDEKIQ_AI_AGENT | ai_agent | 10 | AI Agent |
Jobs dynamically select their queue based on environment variables. Example from webhook jobs:

```ruby
queue_as do
  if ActiveModel::Type::Boolean.new.cast(ENV["SIDEKIQ_WEBHOOK"])
    :webhook_worker # Dedicated queue with dedicated worker process
  else
    :webhook # Default worker queue
  end
end
```
Behavior:
- SIDEKIQ_WEBHOOK=true: jobs are enqueued to the webhook_worker queue
- SIDEKIQ_WEBHOOK=false or unset: jobs are enqueued to the webhook queue on the default worker

This pattern is applied across all dedicated worker types, allowing flexible scaling and performance optimization of specific job categories based on workload requirements.
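The routing behavior above boils down to a small pure function. This is a sketch, not Lago's code; it approximates ActiveModel::Type::Boolean with a simple truthy-string check:

```ruby
# Sketch of the dynamic queue selection for webhook jobs.
# Approximates ActiveModel::Type::Boolean.cast with a truthy-string check.
def webhook_queue(env_value)
  truthy = %w[true t yes y on 1].include?(env_value.to_s.strip.downcase)
  truthy ? :webhook_worker : :webhook
end

webhook_queue("true") # → :webhook_worker
webhook_queue(nil)    # → :webhook
```

The same shape applies to every dedicated worker: one env var, one dedicated queue symbol, one default queue symbol.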
Job Enqueuing
Job Processing
Job States
Default Retry Configuration:
- max_retries set to 0 in config/initializers/sidekiq.rb
- sidekiq_options retry: 0 in app/jobs/application_job.rb

Error Handling Patterns:
- Transient Errors (network issues, temporary service unavailability)
- Permanent Errors (invalid data, business logic failures)
- Timeout Handling (configured in config/initializers/sidekiq.rb)

Fallback Mechanisms:
- Scheduled Retry Jobs
- Dead Queue Processing
- Monitoring & Alerting
Error Recovery Flow:

```
Job Fails → Retry #1 (with exponential backoff)
→ Still Fails → Move to Dead Queue
→ Manual Investigation
→ Optional: Manual Retry from Dead Queue
→ OR: Scheduled Retry Job picks up related operation
```
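For the retry step, a generic exponential-backoff schedule looks like the following sketch (Sidekiq's default formula is of this shape; this is illustrative, not Lago's exact configuration):

```ruby
# Illustrative exponential backoff: the delay grows with the fourth power
# of the retry count, plus a base delay and optional jitter.
def retry_delay_seconds(retry_count, jitter: 0)
  (retry_count**4) + 15 + jitter * (retry_count + 1)
end

retry_delay_seconds(0) # → 15 seconds before the first retry
retry_delay_seconds(3) # → 96 seconds
```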
Workers process jobs from queues in strict priority order:
- high_priority - Critical operations processed first
- default - Standard operations

Best Practices:
- Use high_priority sparingly, for truly urgent operations
- Route long jobs to the long_running queue to prevent blocking

Lago's production deployment includes multiple worker types, each handling specific workloads:
| Worker | Queue(s) | Purpose | Required | Scaling Considerations |
|---|---|---|---|---|
| Default Worker (worker) | high_priority, default, mailers, clock, providers, webhook, invoices, wallets, integrations, low_priority, long_running | Handles all job types when dedicated workers are disabled | ✅ Yes | Scale based on overall job volume; start with 3-5 replicas |
| Analytics Worker | analytics | Processes analytics calculations and reporting | Optional | Enable with SIDEKIQ_ANALYTICS=true; scale based on analytics job volume |
| Billing Worker | billing | Handles billing operations and invoice generation | Recommended | Enable with SIDEKIQ_BILLING=true; critical for billing-heavy workloads |
| Clock Worker | clock_worker | Processes scheduled jobs from Clockwork | Optional | Enable with SIDEKIQ_CLOCK=true; single instance usually sufficient |
| Events Worker | events | Processes incoming usage events | Highly Recommended | Enable with SIDEKIQ_EVENTS=true; scale based on event ingestion rate |
| Payments Worker | payments | Handles payment processing operations | Recommended | Enable with SIDEKIQ_PAYMENTS=true; scale based on payment volume |
| PDF Worker | pdfs | Generates PDF invoices and documents | Highly Recommended | Enable with SIDEKIQ_PDF=true; PDF generation is CPU-intensive |
| Webhook Worker | webhook_worker | Delivers webhooks to customer endpoints | Highly Recommended | Enable with SIDEKIQ_WEBHOOK=true; isolate webhook delays from core processing |
These workers are not related to Sidekiq and do not pull jobs from Redis; they are part of the event processing pipeline and use Kafka as their event store.
| Worker | Purpose | Required | Notes |
|---|---|---|---|
| Events Consumer Worker | Consumes events from external queue (e.g., Kafka, SQS) | Conditional | Required if using event streaming architecture |
| Events Processor Worker | Processes and aggregates usage events | Conditional | Part of event processing pipeline; handles complex event transformations |
| Service | Purpose | Required | Notes |
|---|---|---|---|
| API (api) | Main Rails API server | ✅ Yes | Handles HTTP requests; scale based on request volume |
| App (app) | Frontend application | ✅ Yes | Serves the user interface |
| PDF (pdf) | PDF generation service | Recommended | Repackaged Gotenberg server; generates PDFs, triggered by the pdf worker process through an API call |
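The pdf worker talks to the PDF service over plain HTTP. A hedged sketch of building that call (Gotenberg exposes an HTML-conversion route; the service URL and the exact request shape Lago uses are assumptions here, for illustration only):

```ruby
require "net/http"
require "uri"

# Hypothetical sketch: build the multipart request the pdf worker could send
# to the Gotenberg-based PDF service. Path and field names are assumptions.
def build_pdf_request(base_url, html)
  uri = URI.join(base_url, "/forms/chromium/convert/html")
  request = Net::HTTP::Post.new(uri)
  # Gotenberg expects the entry HTML file to be named index.html
  request.set_form([["files", html, { filename: "index.html" }]],
                   "multipart/form-data")
  request
end
```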
| Service | Purpose | Required |
|---|---|---|
| Clock Process (clock) | Clockwork scheduler for recurring jobs | ✅ Yes |
Based on production deployment data from high-volume clusters, here are recommended resource configurations:
| Workload | CPU Request | CPU Limit | Memory Request | Memory Limit | Recommended Replicas | Notes |
|---|---|---|---|---|---|---|
| API | 4 cores | - | 4Gi | 4Gi | 10-30+ | Scale based on request volume; high traffic requires more replicas |
| App | 100m | - | 128Mi | 128Mi | 2-3 | Only serves static assets through nginx, no need to allocate a lot of resources |
| Clock Process | 100m | - | 812Mi | 812Mi | 1 | This only enqueues jobs and is not impacted by the volume of requests |
| Default Worker | 1100m | - | 2Gi | 2Gi | 3-5 | Reduce replicas when using dedicated workers |
| Analytics Worker | 1 core | - | 1100Mi | 1100Mi | 3-5 | CPU-intensive analytics calculations |
| Billing Worker | 1100m | - | 1100Mi | 1100Mi | 3-5 | Critical for billing operations; scale during billing cycles |
| Events Worker | 500m | - | 1Gi | 1Gi | 2-5 | Scale based on event ingestion rate |
| Events Consumer Worker | 1100m | - | 1Gi | 1Gi | 1 | Single replica often sufficient with consumer groups |
| Events Processor Worker | 2 cores | - | 2Gi | 2Gi | 1 | CPU and memory intensive event processing |
| PDF Worker | 1100m | - | 1Gi | 1Gi | 1 | Only reads from the Sidekiq queue and triggers PDF generation through the PDF service (see next row) |
| PDF Service | 2 cores | - | 1Gi | 1Gi | 2-4 | Generates PDFs through Gotenberg, triggered by the worker through an HTTP call |
| Webhook Worker | 1100m | - | 1Gi | 1Gi | 3-10 | Scale based on webhook volume; network I/O bound |
| Clock Worker | 3 cores | - | 8Gi | 8Gi | 1 | High-memory variant for special processing needs |
When to Scale Up (Increase Resources) (see Monitoring for metrics):
- Queue latency is rising (sidekiq_queue_latency_seconds)
- Enqueued jobs are accumulating (sidekiq_queue_enqueued_jobs)

When to Scale Out (Add Replicas):
Resource Optimization Tips:
- CPU Limits: Generally avoid CPU limits to prevent throttling; use requests for scheduling
- Memory Limits: Set memory limits to prevent OOM but allow headroom (20-50% above requests)
- Dedicated Workers: Enable dedicated workers for high-volume job types to isolate resource usage
- Autoscaling: Configure Horizontal Pod Autoscaler (HPA) based on queue depth (sidekiq_queue_enqueued_jobs - see Monitoring)
- Concurrency Tuning: Adjust SIDEKIQ_CONCURRENCY based on available resources and job characteristics
For smaller deployments, minimum required services:
| Service | Replicas | Resources |
|---|---|---|
| API | 2 | 1 core, 2Gi RAM |
| Default Worker | 2 | 500m CPU, 1Gi RAM |
| Clock Worker | 1 | 100m CPU, 512Mi RAM |
| App | 1 | 100m CPU, 128Mi RAM |
Recommended additions as you scale:
Lago uses Clockwork to schedule recurring jobs. The clock process runs independently and enqueues jobs into Sidekiq at specified intervals.
Start command: `bundle exec clockwork ./clock.rb`
| Job | Interval | Description | Configuration |
|---|---|---|---|
| Activate Subscriptions | Every 5 minutes | Activates pending subscriptions | - |
| Refresh Draft Invoices | Every 5 minutes | Updates draft invoice data | - |
| Process Subscription Activity | Configurable (default: 1 minute) | Processes subscription activities | LAGO_SUBSCRIPTION_ACTIVITY_PROCESSING_INTERVAL_SECONDS |
| Refresh Lifetime Usages | Configurable (default: 5 minutes) | Refreshes lifetime usage data | LAGO_LIFETIME_USAGE_REFRESH_INTERVAL_SECONDS, disable with LAGO_DISABLE_LIFETIME_USAGE_REFRESH=true |
| Refresh Wallets Ongoing Balance | Every 5 minutes | Updates wallet balances | Requires cache configuration (LAGO_MEMCACHE_SERVERS or LAGO_REDIS_CACHE_URL), disable with LAGO_DISABLE_WALLET_REFRESH=true |
| Refresh Flagged Subscriptions | Every 1 minute | Refreshes flagged subscriptions | Requires LAGO_REDIS_STORE_URL |
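The configurable intervals above follow an env-var-with-default pattern, which can be sketched with a hypothetical helper (not Lago's actual code):

```ruby
# Hypothetical helper: read an interval from the environment, falling back
# to the documented default when the variable is unset.
def interval_seconds(env, key, default)
  Integer(env.fetch(key, default))
end

interval_seconds({}, "LAGO_SUBSCRIPTION_ACTIVITY_PROCESSING_INTERVAL_SECONDS", 60) # → 60
interval_seconds({ "LAGO_LIFETIME_USAGE_REFRESH_INTERVAL_SECONDS" => "120" },
                 "LAGO_LIFETIME_USAGE_REFRESH_INTERVAL_SECONDS", 300)              # → 120
```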
| Job | Schedule | Description | Configuration |
|---|---|---|---|
| Terminate Ended Subscriptions | At :05 | Ends subscriptions that have reached their end date | - |
| Post-Validate Events | At :05 | Validates events | Disable with LAGO_DISABLE_EVENTS_VALIDATION=true |
| Bill Customers | At :10 | Processes subscription billing | - |
| API Keys Track Usage | At :15 | Tracks API key usage metrics | - |
| Compute Daily Usage | At :15 | Calculates daily usage statistics | - |
| Finalize Invoices | At :20 | Finalizes pending invoices | - |
| Mark Invoices as Payment Overdue | At :25 | Updates overdue invoice status | - |
| Terminate Coupons | At :30 | Expires coupons that have reached their end date | - |
| Retry Generating Subscription Invoices | At :30 | Retries failed invoice generation | - |
| Bill Ended Trial Subscriptions | At :35 | Bills subscriptions when trials end | - |
| Terminate Wallets | At :45 | Expires wallets | - |
| Process Dunning Campaigns | At :45 | Executes dunning campaign actions | - |
| Termination Alert | At :50 | Sends alerts for upcoming subscription terminations | - |
| Terminate Expired Wallet Transaction Rules | At :50 | Cleans up expired wallet rules | - |
| Top Up Wallet Interval Credits | At :55 | Adds recurring wallet credits | - |
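An "At :MM" schedule fires once per hour at a fixed minute offset. The next-run computation can be sketched as follows (illustrative only, not Clockwork's implementation):

```ruby
# Illustrative: compute the next run time for a job scheduled "at :MM past
# the hour" (e.g. Bill Customers at :10).
def next_hourly_run(minute, now)
  candidate = Time.new(now.year, now.month, now.day, now.hour, minute, 0)
  candidate > now ? candidate : candidate + 3600 # already passed: next hour
end

next_hourly_run(10, Time.new(2024, 1, 1, 9, 5, 0))  # → 2024-01-01 09:10:00
next_hourly_run(10, Time.new(2024, 1, 1, 9, 30, 0)) # → 2024-01-01 10:10:00
```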
| Job | Interval | Description |
|---|---|---|
| Retry Failed Invoices | Every 15 minutes | Attempts to regenerate failed invoices |
| Retry Inbound Webhooks | Every 15 minutes | Retries failed inbound webhook processing |
| Job | Schedule | Description |
|---|---|---|
| Clean Webhooks | At 01:00 | Removes old webhook records |
| Clean Inbound Webhooks | At 01:10 | Removes old inbound webhook records |
Lago uses three separate Redis instances for different purposes:
Configuration:
- REDIS_URL - Connection URI (host, port, database)
- REDIS_PASSWORD - Password (separate for security)

Purpose: Stores Sidekiq job queues and job data
Usage: All Sidekiq workers connect to this Redis instance to fetch and process jobs
Configuration:
- LAGO_REDIS_CACHE_URL - Connection URI
- LAGO_REDIS_CACHE_PASSWORD - Password (separate for security)

Purpose: Rails application cache store
Usage:
- Application-level caching (Rails.cache)

Configuration:
- LAGO_REDIS_STORE_URL - Connection URI
- LAGO_REDIS_STORE_PASSWORD - Password (separate for security)
- LAGO_REDIS_STORE_SSL - Use SSL to access Redis; only applied if LAGO_REDIS_STORE_URL does not contain the rediss:// prefix
- LAGO_REDIS_STORE_DISABLE_SSL_VERIFY - Turn off SSL certificate verification

Purpose: Dedicated storage for event-related data and event processing workflows
Usage:
- Jobs such as ConsumeSubscriptionRefreshedQueueJob

Lago follows a secure configuration pattern for Redis connections:
- Each instance has a *_URL environment variable containing the connection details (host, port, database, protocol)
- Passwords are kept in separate *_PASSWORD environment variables:
  - REDIS_PASSWORD
  - LAGO_REDIS_CACHE_PASSWORD
  - LAGO_REDIS_STORE_PASSWORD

Benefits:
Architecture Note: The separation of Redis instances allows for independent scaling and isolation of concerns—queue management, caching, and event processing can be optimized and monitored separately.
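The URL-plus-separate-password pattern can be sketched with a small helper (hypothetical; Lago's actual client setup may differ):

```ruby
require "uri"

# Hypothetical helper: connection details come from the *_URL variable,
# the secret from its own *_PASSWORD variable.
def redis_options(url, password)
  uri = URI.parse(url)
  {
    host: uri.host,
    port: uri.port,
    db: uri.path.to_s.delete_prefix("/").to_i, # redis://host:6379/1 → db 1
    ssl: uri.scheme == "rediss",               # rediss:// implies TLS
    password: password,
  }
end

redis_options("redis://queue.internal:6379/1", "s3cret")
# → { host: "queue.internal", port: 6379, db: 1, ssl: false, password: "s3cret" }
```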
Lago implements multiple encryption mechanisms to protect sensitive data:
Configuration:
- ENCRYPTION_KEY_DERIVATION_SALT - Salt for key derivation
- ENCRYPTION_PRIMARY_KEY - Primary encryption key

Purpose: Encrypts sensitive data at the database row level using Rails' built-in Active Record encryption
Usage:
Mechanism: Uses Active Record's non-deterministic encryption to secure data at rest (deterministic encryption is not used in Lago)
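Non-deterministic encryption means the same plaintext yields a different ciphertext on every write, because a fresh random IV is used each time. A simplified stdlib sketch of the idea (conceptual only; Rails adds key derivation and authenticated metadata on top):

```ruby
require "openssl"

# Conceptual sketch of non-deterministic encryption with AES-256-GCM:
# a fresh random IV per call makes identical plaintexts encrypt differently.
def encrypt(key, plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key
  iv = cipher.random_iv # new random IV every call → non-deterministic output
  ciphertext = cipher.update(plaintext) + cipher.final
  { iv: iv, ciphertext: ciphertext, tag: cipher.auth_tag }
end

def decrypt(key, data)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = key
  cipher.iv = data[:iv]
  cipher.auth_tag = data[:tag] # GCM authenticates the ciphertext
  cipher.update(data[:ciphertext]) + cipher.final
end
```

Because the IV differs per write, ciphertexts cannot be compared for equality in SQL, which is why deterministic encryption exists as an alternative; Lago does not use it.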
Configuration:
- hmac_key stored in the database

Purpose: Signs webhook payloads using symmetric cryptography
Usage:
Mechanism:
- The signature is computed as Base64.strict_encode64(OpenSSL::HMAC.digest("sha-256", hmac_key, payload))
- The signature is sent in the X-Lago-Signature header

Configuration:
- SECRET_KEY_BASE - Master signing key used by Rails

Purpose: Signs data for internal application security and secure client communications
Usage:
Mechanism:
Configuration:
- RSA_PRIVATE_KEY - RSA private key (asymmetric cryptography)

Purpose: Signs webhook payloads using JWT with asymmetric cryptography
Usage:
Mechanism:
- The JWT payload is { data: webhook_payload, iss: LAGO_API_URL }
- The signed token is sent in the X-Lago-Signature header

Webhooks support two configurable signing algorithms (set per webhook_endpoint):
- HMAC (signature_algo: :hmac) - signs with the endpoint's hmac_key
- JWT (signature_algo: :jwt) - signs with RSA_PRIVATE_KEY

When a user consumes resources from the customer, a usage event is sent to Lago:
> [!NOTE]
> A detailed architecture diagram will be added to this section in a future update.
At least once a month, a bill is issued to the users. The flow is as follows:
> [!NOTE]
> A detailed architecture diagram will be added to this section in a future update.
Customer: An individual or entity that operates within the application, typically representing an organization or team that manages billing, subscriptions, or other business operations. Customers interact with the system to configure, monitor, and manage their own users and related resources.
User: An external party or account that is billed or managed by a customer. Users are the end recipients of services, subscriptions, or usage tracked by the application, and are associated with billing events, invoices, and usage records generated by the customer's organization.