This guide provides instructions for migrating from a simple scalable deployment (SSD) to a highly available (HA) monolithic deployment of Loki. Before starting the migration, make sure you have read the planning section.
{{< admonition type="warning" >}}
Simple Scalable Deployment (SSD) mode is being deprecated and will be removed in the Loki 4.0 release. You should plan to migrate from SSD to either a microservices or an HA monolithic deployment, because you will not be able to run Loki 4.0 in SSD mode.
{{< /admonition >}}
{{< admonition type="note" >}}
This guide assumes a Docker Compose setup with NGINX as the gateway. However, you can apply the same migration process to other deployment methods, such as Helm or Tanka.
{{< /admonition >}}
Migrating from a simple scalable deployment to an HA monolithic deployment with zero downtime is possible, but it requires careful planning. Take the following characteristics of an HA monolithic deployment into account:
| Characteristic | HA monolithic |
|---|---|
| Durability | ✅ |
| High availability | ✅ |
| Separation of execution paths | ❌ |
| Operational complexity | 🟧 medium |
| Scalability | 🟧 medium |
Before beginning the actual migration, check your existing deployment and apply the following configuration changes:

1. Set `ingester.wal.flush_on_shutdown: true` to make sure that chunks are written and uploaded to object storage when ingesters shut down.
2. Set `memberlist.cluster_label: <your-unique-cluster-name>` and `memberlist.cluster_label_verification_disabled: false`.
3. Set `ingester.lifecycler.unregister_on_shutdown: true` so that ingesters leave the ring immediately after their final shutdown.
4. Restart your existing deployment so that the configuration changes take effect.
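The changes above could look like the following fragment of the existing Loki configuration file. This is a minimal sketch that shows only the affected keys. Merge them into your existing configuration and replace the placeholder cluster label with your own value.

```yaml
# Fragment of the existing SSD configuration (sketch; all other keys omitted).
ingester:
  lifecycler:
    # Leave the ring immediately after the final shutdown.
    unregister_on_shutdown: true
  wal:
    # Flush in-memory chunks to object storage when an ingester shuts down.
    flush_on_shutdown: true

memberlist:
  # Placeholder: replace with a name that is unique to this Loki cluster.
  cluster_label: <your-unique-cluster-name>
  # Keep cluster label verification enabled.
  cluster_label_verification_disabled: false
```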
In this stage, we deploy the new components alongside the existing SSD components.
Because the HA monolithic deployment has only a single type of Loki node, all instances can be configured identically and started with `-target=all`.
To achieve high availability, you need at least three Loki instances and a replication factor (`common.replication_factor`) of 3. This keeps the deployment highly available during restarts: while one instance is unavailable, the remaining two still form a quorum.
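The following is a minimal sketch of the configuration that all HA monolithic instances could share. It makes several assumptions:

* memberlist is the ring key-value store.
* The instances use the hypothetical Docker Compose service names `loki-1` to `loki-3`.
* The default memberlist port 7946 and the default gRPC port 9095 are in use.

Storage, schema, and all other settings are omitted.

```yaml
# Shared config.yaml for every HA monolithic instance (sketch; other keys omitted).
common:
  # Three replicas per stream; one instance can be unavailable
  # while the other two still form a quorum.
  replication_factor: 3
  ring:
    kvstore:
      store: memberlist
  # gRPC address of the single main compactor (loki-1 in this example).
  compactor_grpc_address: loki-1:9095

memberlist:
  # Hypothetical service names; 7946 is the default memberlist port.
  join_members:
    - loki-1:7946
    - loki-2:7946
    - loki-3:7946
```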
However, only a single main compactor (`-compactor.horizontal-scaling-mode=main`) may run at any given time, so you must configure exactly one Loki instance as main. To do this, override the default value `worker` with `main` in the configuration of that instance, either through an environment variable or through the CLI argument. The `common.compactor_grpc_address` setting must point to this dedicated compactor instance, which in our case is `loki-1:9095`.
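In the Docker Compose setup this guide assumes, the three instances could be declared as follows. This is a sketch with these assumptions:

* The `LOKI_VERSION` variable, the mounted file `./loki-config.yaml`, and the service names are hypothetical.
* Only `loki-1` passes the main compactor mode as a CLI argument. The other instances keep the default mode.

```yaml
# docker-compose.yaml fragment (sketch): three Loki instances with identical config.
# LOKI_VERSION, ./loki-config.yaml, and the service names are assumptions.
x-loki: &loki
  image: grafana/loki:${LOKI_VERSION}
  volumes:
    - ./loki-config.yaml:/etc/loki/config.yaml:ro

services:
  loki-1:
    <<: *loki
    # The only instance that runs the main compactor.
    command:
      - -config.file=/etc/loki/config.yaml
      - -target=all
      - -compactor.horizontal-scaling-mode=main
  loki-2:
    <<: *loki
    # Keeps the default compactor mode.
    command:
      - -config.file=/etc/loki/config.yaml
      - -target=all
  loki-3:
    <<: *loki
    command:
      - -config.file=/etc/loki/config.yaml
      - -target=all
```

The `x-loki` extension field with a YAML anchor keeps the image and volume settings in one place, so the services differ only in their `command`.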
To achieve a zero-downtime transition, configure the storage of the HA monolithic deployment identically to that of the existing deployment. We strongly recommend using the Thanos object store client (`storage_config.use_thanos_objstore: true` and the `storage_config.object_store` configuration block), because the legacy object store clients are deprecated.
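The following sketch shows the shape of a storage configuration that uses the Thanos object store client with an S3-compatible backend. It makes these assumptions:

* The bucket name, endpoint, region, and schema period are placeholders.
* Credentials and TLS settings are omitted.

Copy the actual values, including the `schema_config` periods, from your existing deployment so that the new instances read and write the same data. Verify the exact field names for your backend in the Loki configuration reference.

```yaml
# Storage fragment for the HA monolithic instances (sketch; placeholder values).
storage_config:
  use_thanos_objstore: true
  object_store:
    s3:
      # Placeholders: reuse the bucket and endpoint of the existing deployment.
      bucket_name: loki-chunks
      endpoint: minio:9000
      region: us-east-1
      # Credentials and TLS settings omitted; copy them from the existing config.

schema_config:
  configs:
    # Placeholder period: copy the period configs of the existing deployment unchanged.
    - from: 2024-04-01
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h
```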
Once all Loki instances are running, open the ring page (`/ring`) of any instance to verify that all instances are registered and in the `ACTIVE` state.
The final stage of the migration is to route traffic to the new instances in the reverse proxy or load balancer and to shut down the old SSD components.

When using NGINX as the reverse proxy, you can use the `upstream` and `proxy_pass` directives to route the traffic:
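```nginx
http {
  ...

  # All HA monolithic Loki instances (each started with -target=all).
  upstream loki {
    server loki-1:3100;
    ...
    server loki-n:3100;
  }

  # Only the main compactor instance (loki-1).
  upstream compactor {
    server loki-1:3100;
  }

  server {
    listen 3100;
    ...

    # Default route: push and query traffic is balanced across all instances.
    location / {
      proxy_pass http://loki;
      ...
    }

    # Delete API requests must reach the main compactor.
    location ~ ^/loki/api/v1/delete {
      proxy_pass http://compactor;
      ...
    }
  }
}
```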
The dedicated `location` for the compactor routes delete requests to the main compactor instance. It is only needed when deletion is enabled and you want to access the delete API from outside the deployment.
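With Docker Compose, the NGINX configuration above is typically mounted into the gateway container. The following fragment is a sketch in which the service name `gateway`, the image tag, and the file path `./nginx.conf` are assumptions.

```yaml
# docker-compose.yaml fragment (sketch): NGINX gateway in front of the new instances.
# Service name, image tag, and file path are assumptions.
services:
  gateway:
    image: nginx:latest
    # Mount the updated NGINX configuration read-only.
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    # Matches "listen 3100;" in the server block.
    ports:
      - "3100:3100"
    depends_on:
      - loki-1
      - loki-2
      - loki-3
```

A complete `nginx.conf` also needs a top-level `events` block, which the excerpt above does not show. Without it, NGINX refuses to start.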
Once the reverse proxy configuration has been updated and NGINX has been restarted, the new instances receive all write (push) and read (query) traffic.
However, you also need to shut down (or at least flush) the write components of the old simple scalable deployment. This writes the data still held in their memory to object storage, where it can be queried.
Finally, all old write, read, and backend SSD targets can be terminated and cleaned up.