Documentation/Storage-Configuration/Block-Storage-RBD/rbd-async-disaster-recovery-failover-failback.md
Rook supports volume replication, which allows users to perform disaster recovery and planned migration of clusters.
This document describes the failover and failback procedures for the disaster recovery and planned migration use cases.
!!! note
    This document assumes that RBD mirroring is set up between the peer clusters. For information on RBD mirroring and how to set it up using Rook, refer to the rbd-mirroring guide.

!!! info
    Use cases: datacenter maintenance, technology refresh, disaster avoidance, etc.

Relocation is the process of switching production to a backup facility (normally your recovery site) or vice versa. For relocation, access to the image on the primary site must be stopped. The image is then made primary on the secondary cluster so that access can resume there.

!!! note
    :memo: A periodic or one-time backup of the application should be available for restore on the secondary site (cluster-2).

Follow the steps below for planned migration of a workload from the primary cluster to the secondary cluster:

* Update the VolumeReplication CR to set `replicationState` to `secondary` at the Primary Site. When the operator sees this change, it passes the information down to the driver via a gRPC request to mark the dataSource as `secondary`.
* If you are manually recreating the PVC and PV on the secondary cluster, remove the `claimRef` section in the PV objects. (See this for details)
* Create VolumeReplication CRs on the secondary site. `replicationState` should be `primary` for all the PVCs on the secondary site.
* Check the VolumeReplication CR status to verify that the image is marked `primary` on the secondary site.
* Once the image is marked as `primary`, the PVC is ready to be used. Now, we can scale up the applications to use the PVC.

!!! warning
    :memo: In the async disaster recovery use case, we don't get the complete data. We only get crash-consistent data based on the snapshot interval time.

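The `replicationState` changes and the `claimRef` removal in the steps above can be sketched in Python. This is a minimal illustration, not part of Rook: the helper names `volume_replication` and `strip_claim_ref`, the PVC name `rbd-pvc`, and the class name `rbd-volumereplicationclass` are hypothetical, and the `apiVersion` shown is an assumption that should be checked against the VolumeReplication CRD installed in your cluster. The script only builds manifests as dictionaries and does not talk to a cluster.

```python
import copy
import json

# Only the two states this document uses are accepted here.
VALID_STATES = {"primary", "secondary"}


def volume_replication(pvc_name, state,
                       replication_class="rbd-volumereplicationclass",
                       namespace="default"):
    """Build a VolumeReplication manifest for one PVC.

    The apiVersion and the default names are assumptions for illustration.
    """
    if state not in VALID_STATES:
        raise ValueError(f"unsupported replicationState: {state!r}")
    return {
        "apiVersion": "replication.storage.openshift.io/v1alpha1",  # assumption
        "kind": "VolumeReplication",
        "metadata": {
            "name": f"{pvc_name}-volumereplication",  # hypothetical naming scheme
            "namespace": namespace,
        },
        "spec": {
            "volumeReplicationClass": replication_class,
            "replicationState": state,
            "dataSource": {
                "apiGroup": "",
                "kind": "PersistentVolumeClaim",
                "name": pvc_name,
            },
        },
    }


def strip_claim_ref(pv_manifest):
    """Return a copy of a PV manifest without spec.claimRef.

    The input is left unchanged so the original backup stays intact.
    """
    cleaned = copy.deepcopy(pv_manifest)
    cleaned.get("spec", {}).pop("claimRef", None)
    return cleaned


if __name__ == "__main__":
    # Demote on the primary site, then promote on the secondary site.
    demote = volume_replication("rbd-pvc", "secondary")
    promote = volume_replication("rbd-pvc", "primary")
    print(json.dumps(demote, indent=2))
    print(json.dumps(promote, indent=2))
```

The two manifests differ only in `replicationState`, which reflects that relocation is the same CR applied with opposite states on the two sites.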
!!! info
    Use cases: natural disasters, power failures, system failures and crashes, etc.

!!! note
    To effectively resume operations after a failover/relocation, backups of Kubernetes artifacts such as the deployment, PVC, and PV need to be created beforehand by the admin so that the application can be restored on the peer cluster. For more information, see backup and restore.


* In case of disaster recovery, create the VolumeReplication CR at the Secondary Site. Since the connection to the Primary Site is lost, the operator automatically sends a gRPC request down to the driver to forcefully mark the dataSource as `primary` on the Secondary Site.
* Check the VolumeReplication CR status to verify that the image is marked `primary` on the secondary site.
* Once the image is marked as `primary`, the PVC is ready to be used. Now, we can scale up the applications to use the PVC.

Once the failed cluster is recovered on the primary site and you want to fail back from the secondary site, follow the steps below:

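* Update the VolumeReplication CR `replicationState` from `primary` to `secondary` on the primary site.
* Update the VolumeReplication CR `replicationState` from `primary` to `secondary` on the secondary site.
* Change the `replicationState` from `secondary` to `primary` on the primary site.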
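
The ordering of the failback steps above can be checked with a small model. The sketch below is only an illustration under stated assumptions: the site labels `primary-site` and `secondary-site` and the function `apply_failback` are hypothetical, and the model tracks nothing but the `replicationState` recorded in each site's VolumeReplication CR. It starts from the state left by a failover, where both CRs say `primary`.

```python
from typing import Dict, List, Tuple

# Each step: (site, state the CR is expected to hold, state to set).
# The order mirrors the failback steps listed in this document.
FAILBACK_STEPS: List[Tuple[str, str, str]] = [
    ("primary-site", "primary", "secondary"),
    ("secondary-site", "primary", "secondary"),
    ("primary-site", "secondary", "primary"),
]


def apply_failback(states: Dict[str, str]) -> List[Dict[str, str]]:
    """Apply the failback steps in order.

    Returns the recorded states after each step. Raises RuntimeError if a
    site is not in the state the step expects, which signals that a step
    was skipped or run out of order.
    """
    current = dict(states)
    history = []
    for site, expected, new_state in FAILBACK_STEPS:
        found = current.get(site)
        if found != expected:
            raise RuntimeError(f"{site}: expected {expected!r}, found {found!r}")
        current[site] = new_state
        history.append(dict(current))
    return history


if __name__ == "__main__":
    after_failover = {"primary-site": "primary", "secondary-site": "primary"}
    for number, snapshot in enumerate(apply_failback(after_failover), start=1):
        print(f"after step {number}: {snapshot}")
```

In this model both sites are `secondary` after the second step, and only then is the primary site promoted, so the two CRs never move to `primary` at the same time during failback.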