Refreshing pre-aggregations

Pre-aggregation refresh is the process of building pre-aggregations and updating them with new data. The refresh worker is responsible for it.

Configuration

You can use the following environment variables to configure the refresh worker behavior:

Pre-aggregation data source

By default, each data source builds and stores its pre-aggregations using its own connection. You can instead point a data source's pre-aggregations at a dedicated connection by adding a PRE_AGGREGATIONS segment to its environment variables. When set, that source's pre-aggregations are built on and read from the dedicated connection rather than the source's own.

Use the CUBEJS_PRE_AGGREGATIONS_DB_* variables for the default data source, and the CUBEJS_DS_<NAME>_PRE_AGGREGATIONS_DB_* variables for a named data source:

```dotenv
# Default data source
CUBEJS_DB_TYPE=postgres
CUBEJS_DB_HOST=localhost

# Dedicated pre-aggregation data source for the default data source
CUBEJS_PRE_AGGREGATIONS_DB_TYPE=postgres
CUBEJS_PRE_AGGREGATIONS_DB_HOST=preagg-host

# A named data source and its dedicated pre-aggregation data source
CUBEJS_DATASOURCES=default,analytics
CUBEJS_DS_ANALYTICS_DB_TYPE=postgres
CUBEJS_DS_ANALYTICS_DB_HOST=remotehost
CUBEJS_DS_ANALYTICS_PRE_AGGREGATIONS_DB_TYPE=postgres
CUBEJS_DS_ANALYTICS_PRE_AGGREGATIONS_DB_HOST=analytics-preagg-host
```

The PRE_AGGREGATIONS variant supports the same connection variables as the regular CUBEJS_DB_* / CUBEJS_DS_<NAME>_DB_* data source variables (for example _DB_TYPE, _DB_HOST, _DB_PORT, _DB_USER, _DB_PASS, and _DB_SSL).
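To make the naming rule concrete, here is a minimal Python sketch that builds the variable names described above and reports which dedicated pre-aggregation variables are set for each data source. It is only a configuration-checking aid, not part of Cube: the helper names `pre_aggregation_var` and `dedicated_settings` are hypothetical, and the suffix list covers only the examples mentioned in this section.

```python
from typing import Dict, Mapping

# Connection suffixes named in this section; the real set may be larger.
SUFFIXES = ["TYPE", "HOST", "PORT", "USER", "PASS", "SSL"]


def pre_aggregation_var(suffix: str, data_source: str = "default") -> str:
    """Build the env var name for a dedicated pre-aggregation connection setting."""
    suffix = suffix.upper()
    if data_source == "default":
        return f"CUBEJS_PRE_AGGREGATIONS_DB_{suffix}"
    return f"CUBEJS_DS_{data_source.upper()}_PRE_AGGREGATIONS_DB_{suffix}"


def dedicated_settings(env: Mapping[str, str]) -> Dict[str, Dict[str, str]]:
    """Map each data source to the dedicated pre-aggregation variables set for it."""
    raw_sources = env.get("CUBEJS_DATASOURCES", "default")
    sources = [s.strip() for s in raw_sources.split(",") if s.strip()]
    found: Dict[str, Dict[str, str]] = {}
    for source in sources:
        names = [pre_aggregation_var(suffix, source) for suffix in SUFFIXES]
        found[source] = {name: env[name] for name in names if name in env}
    return found


if __name__ == "__main__":
    import os

    # Print the dedicated pre-aggregation variables found in the current environment.
    for source, settings in dedicated_settings(os.environ).items():
        print(source, sorted(settings))
```

Running it in the environment where Cube starts is a quick way to spot a misspelled variable name, because a data source with no dedicated variables shows an empty list.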

Troubleshooting

`Refresh scheduler interval error`

Sometimes, you might come across the following error:

```json
{
  "message": "Refresh Scheduler Interval Error",
  "error": "Previous interval #2 was not finished with 60000 interval"
}
```

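This error means the previous refresh cycle had not finished by the time the next one was due, which indicates that your refresh worker is overloaded. The usual cause is a large number of tenants, a large number of pre-aggregations to refresh, or both.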
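If you collect logs as JSON, you can detect this error programmatically and count how often it occurs. The following Python sketch assumes each log record is a JSON object shaped like the sample above. The regular expression is derived only from that sample message, so other Cube versions may word it differently, and `parse_interval_error` is a hypothetical helper name.

```python
import json
import re
from typing import Optional, Tuple

# Based solely on the sample error text shown above.
_INTERVAL_PATTERN = re.compile(
    r"Previous interval #(\d+) was not finished with (\d+) interval"
)


def parse_interval_error(raw: str) -> Optional[Tuple[int, int]]:
    """Return (interval_number, interval_value), or None if raw is not this error."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    if record.get("message") != "Refresh Scheduler Interval Error":
        return None
    match = _INTERVAL_PATTERN.search(str(record.get("error", "")))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

A steadily growing count of these records suggests sustained overload rather than a one-off slow refresh.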

If you're using multitenancy, deploy several Cube clusters, each serving a smaller set of tenants. Each cluster then has its own refresh worker, which refreshes pre-aggregations only for its subset of tenants.
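How you split tenants across clusters depends on your deployment. As one possible approach, the Python sketch below assigns each tenant to a cluster with a stable hash, so the same tenant always lands on the same cluster. The function names, the tenant identifiers, and the hashing scheme are all assumptions made for illustration. Cube does not prescribe this scheme, and routing requests to the right cluster is outside the scope of the sketch.

```python
import hashlib
from typing import Dict, Iterable, List


def cluster_for_tenant(tenant_id: str, cluster_count: int) -> int:
    """Return a stable cluster index in [0, cluster_count) for a tenant."""
    if cluster_count < 1:
        raise ValueError("cluster_count must be at least 1")
    # SHA-256 gives the same result across processes, unlike Python's built-in hash().
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % cluster_count


def partition_tenants(
    tenant_ids: Iterable[str], cluster_count: int
) -> Dict[int, List[str]]:
    """Group tenants by the cluster index they are assigned to."""
    groups: Dict[int, List[str]] = {index: [] for index in range(cluster_count)}
    for tenant_id in tenant_ids:
        groups[cluster_for_tenant(tenant_id, cluster_count)].append(tenant_id)
    return groups
```

Note that changing `cluster_count` reassigns most tenants under this simple modulo scheme, so an explicit tenant-to-cluster mapping may be preferable if you expect to add clusters often.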

If you're using Cube Cloud, you can use a multi-cluster deployment, which does this for you automatically.