Failback an application
Once your unhealthy Kubernetes cluster is back up and running, the Portworx nodes in that cluster do not immediately rejoin the cluster. They stay in the Out of Quorum state until you explicitly activate this cluster domain.
After the domain is marked as Active, you can fail back the applications if you want.
The examples on this page make the following assumptions. Update the values to match your environment:
- Source Cluster is the Kubernetes cluster which is down and where your applications were originally running. The cluster domain for this source cluster is us-east-1a.
- Destination Cluster is the Kubernetes cluster where the applications will be failed over. The cluster domain for this destination cluster is us-east-1b.
Reactivate your source cluster domain
Follow these steps from your destination cluster to initiate a failback:
Run the following command to activate the source cluster:
storkctl activate clusterdomain us-east-1a
Cluster Domain activate operation started successfully for us-east-1a
Verify that the source cluster domain is activated:
storkctl get clusterdomainsstatus
NAME            LOCAL-DOMAIN   ACTIVE                                     INACTIVE   CREATED
px-dr-cluster   us-east-1a     us-east-1a (InSync), us-east-1b (InSync)              29 Nov 22 22:09 UTC
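If you script this check, the following is a minimal sketch that assumes the output format shown above. The function name domain_in_sync is hypothetical and is not part of storkctl; it only searches the text it receives on standard input.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given cluster domain appears as
# "<domain> (InSync)" in `storkctl get clusterdomainsstatus` output on stdin.
# Assumption: the status is printed in exactly this form, as in the sample output.
domain_in_sync() {
    grep -Fq "$1 (InSync)"
}
```

Usage, run from the destination cluster: storkctl get clusterdomainsstatus | domain_in_sync us-east-1a && echo "us-east-1a is in sync". The match is a plain substring search, so pass the full domain name.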
Reverse sync your clusters
If the destination cluster has been running applications for some time, the state of your application might differ from the state on your source cluster, because new resources have been created or data in the stateful application has changed on the destination cluster. To ensure that the source cluster has the most recent application state before you fail back, you must reverse sync your clusters using the reverse migration schedule that you created previously.
Resume your reverse migration schedule on your destination cluster:
storkctl resume migrationschedule reversemigrationschedule -n <migrationnamespace>
Verify that at least one migration cycle has completed successfully:
storkctl get migration -n <migrationnamespace>
NAME                                                  CLUSTERPAIR                 STAGE   STATUS       VOLUMES   RESOURCES   CREATED               ELAPSED                               TOTAL BYTES TRANSFERRED
reversemigrationschedule-interval-2023-02-01-201747   <your-remote-clusterpair>   Final   Successful   0/0       4/4         01 Feb 23 20:17 UTC   Volumes () Resources (21.71709746s)   0
Suspend the reverse migration schedule:
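The check for a completed migration cycle can be sketched as a small filter over this output. This is an illustration under stated assumptions: has_successful_migration is a hypothetical name, and the code assumes that STAGE and STATUS are the third and fourth whitespace-separated columns, as in the sample above, and that the cluster pair name contains no spaces.

```shell
#!/bin/sh
# Hypothetical helper: reads `storkctl get migration` output on stdin and
# succeeds if any migration whose name starts with the given schedule name
# has STAGE "Final" and STATUS "Successful".
# Assumption: columns are NAME, CLUSTERPAIR, STAGE, STATUS in that order.
has_successful_migration() {
    awk -v prefix="$1" '
        index($1, prefix) == 1 && $3 == "Final" && $4 == "Successful" { found = 1 }
        END { exit !found }
    '
}
```

Usage: storkctl get migration -n <migrationnamespace> | has_successful_migration reversemigrationschedule. You could call this in a loop with a sleep between attempts until it succeeds, and only then suspend the schedule.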
storkctl suspend migrationschedule reversemigrationschedule -n <migrationnamespace>
Stop the application on the destination cluster
Stop the applications from running by changing the replica count of your deployments and statefulsets to 0:
kubectl scale --replicas 0 statefulset/<your-app-name> -n <migrationnamespace>
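If the namespace contains several workloads, scaling them one by one is tedious. The sketch below scales every deployment and statefulset in a namespace to zero using the --all flag of kubectl scale. The function name scale_down_namespace and the KUBECTL override variable are assumptions made for this example; the override exists only so the command can be previewed without touching a cluster.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example, KUBECTL=echo) to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: scale all deployments and statefulsets in the given
# namespace to 0 replicas. kubectl may report an error for a kind that has
# no objects in the namespace; the loop continues with the next kind.
scale_down_namespace() {
    ns="$1"
    for kind in deployment statefulset; do
        $KUBECTL scale --replicas=0 "$kind" --all -n "$ns"
    done
}
```

Usage: scale_down_namespace <migrationnamespace>. Running KUBECTL=echo scale_down_namespace <migrationnamespace> first prints the two kubectl invocations instead of executing them.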
Start the application on the source cluster
After you have stopped the applications on the destination cluster, start the applications on the source cluster by activating the migration, which scales the migrated applications back up:
storkctl activate migration -n <migrationnamespace>
Verify that your application pods (for example, Zookeeper) are up and running:
kubectl get pods -n <migrationnamespace>
NAME   READY   STATUS    RESTARTS   AGE
zk-0   1/1     Running   0          4m
zk-1   1/1     Running   0          5m
zk-2   1/1     Running   0          7m
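Instead of re-running kubectl get pods by hand, you can block until the pods report Ready with kubectl wait. The following is a minimal sketch; wait_pods_ready and the KUBECTL override variable are hypothetical names introduced for this example, and the timeout value is yours to choose.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example, KUBECTL=echo) to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: wait until every pod in the namespace has the Ready
# condition, or fail when the timeout (for example, 120s) expires.
# Note: pods that never become Ready (such as completed Job pods) cause a timeout.
wait_pods_ready() {
    ns="$1"
    timeout="$2"
    $KUBECTL wait --for=condition=Ready pod --all -n "$ns" --timeout="$timeout"
}
```

Usage: wait_pods_ready <migrationnamespace> 120s. A non-zero exit status means at least one pod was not Ready within the timeout.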
Resume the migration schedule:
storkctl resume migrationschedule migrationschedule -n <migrationnamespace>
MigrationSchedule migrationschedule resumed successfully
Verify that the migration schedule is active:
storkctl get migrationschedule -n <migrationnamespace>
NAME                POLICYNAME               CLUSTERPAIR               SUSPEND   LAST-SUCCESS-TIME     LAST-SUCCESS-DURATION
migrationschedule   <your-schedule-policy>   <your-clusterpair-name>   false     01 Dec 23 22:25 UTC   10s
The false value for the SUSPEND field shows that the migration schedule for your policy is active on the source cluster. Your application has successfully failed back to your source cluster.
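To read the SUSPEND value in a script, the sketch below extracts it from the storkctl get migrationschedule output. It assumes the column order shown above, with SUSPEND as the fourth whitespace-separated field and no spaces in the schedule, policy, or cluster pair names. The name schedule_suspend_state is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the SUSPEND value for the named migration schedule
# from `storkctl get migrationschedule` output on stdin.
# Assumption: columns are NAME, POLICYNAME, CLUSTERPAIR, SUSPEND in that order.
schedule_suspend_state() {
    awk -v name="$1" '$1 == name { print $4 }'
}
```

Usage: storkctl get migrationschedule -n <migrationnamespace> | schedule_suspend_state migrationschedule. An empty result means that no schedule with that name was listed.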