# Failover and Failback in Disaster Recovery

[RBD mirroring](https://docs.ceph.com/en/latest/rbd/rbd-mirroring/)
 provides asynchronous replication of RBD images between multiple Ceph clusters.
 This capability is available in two modes:

* Journal-based: Every write to the RBD image is first recorded
  to the associated journal before modifying the actual image.
  The remote cluster reads from this journal and
  replays the updates to its local image.
* Snapshot-based: This mode uses periodically scheduled or
  manually created RBD image mirror-snapshots to replicate
  crash-consistent RBD images between clusters.

This documentation assumes that RBD mirroring is already set up between
 two clusters.
For more information on how to set up RBD mirroring, refer to the
 [Ceph documentation](https://docs.ceph.com/en/latest/rbd/rbd-mirroring/).

## Deploy the Volume Replication CRDs

Volume Replication Operator is a Kubernetes operator that provides common
 and reusable APIs for storage disaster recovery.
 It is based on the [csi-addons/spec](https://github.com/csi-addons/spec)
 specification and can be used by any storage provider.

Volume Replication Operator follows the controller pattern and provides
 extended APIs for storage disaster recovery.
 The extended APIs are provided via Custom Resource Definitions (CRDs).

>:bulb: For more information, please refer to the
> [volume-replication-operator](https://github.com/csi-addons/volume-replication-operator).

* Deploy the `VolumeReplicationClass` CRD

```bash
kubectl create -f https://raw.githubusercontent.com/csi-addons/volume-replication-operator/release-v0.1/config/crd/bases/replication.storage.openshift.io_volumereplicationclasses.yaml

customresourcedefinition.apiextensions.k8s.io/volumereplicationclasses.replication.storage.openshift.io created
```

* Deploy the `VolumeReplication` CRD

```bash
kubectl create -f https://raw.githubusercontent.com/csi-addons/volume-replication-operator/release-v0.1/config/crd/bases/replication.storage.openshift.io_volumereplications.yaml

customresourcedefinition.apiextensions.k8s.io/volumereplications.replication.storage.openshift.io created
```

The VolumeReplicationClass and VolumeReplication CRDs are now created.

>:bulb: **Note:** Use the latest available release of the Volume Replication Operator.
> See [releases](https://github.com/csi-addons/volume-replication-operator/branches)
> for more information.
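Before moving on, it may help to confirm that both CRDs are registered with the
API server. A minimal check, using the CRD names from the manifests above, could
look like this:

```bash
# Verify that both replication CRDs are now known to the cluster
kubectl get crd volumereplicationclasses.replication.storage.openshift.io \
  volumereplications.replication.storage.openshift.io
```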
### Add RBAC rules for Volume Replication Operator

Add the rules below to the `rbd-external-provisioner-runner`
 ClusterRole in [csi-provisioner-rbac.yaml](https://github.com/ceph/ceph-csi/blob/release-v3.3/deploy/rbd/kubernetes/csi-provisioner-rbac.yaml):

```yaml
  - apiGroups: ["replication.storage.openshift.io"]
    resources: ["volumereplications", "volumereplicationclasses"]
    verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
  - apiGroups: ["replication.storage.openshift.io"]
    resources: ["volumereplications/finalizers"]
    verbs: ["update"]
  - apiGroups: ["replication.storage.openshift.io"]
    resources: ["volumereplications/status"]
    verbs: ["get", "patch", "update"]
  - apiGroups: ["replication.storage.openshift.io"]
    resources: ["volumereplicationclasses/status"]
    verbs: ["get"]
```

### Deploy the Volume Replication Sidecar

To deploy the `volume-replication` sidecar container in the `csi-rbdplugin-provisioner`
 pod, add the following yaml to the
 [csi-rbdplugin-provisioner deployment](https://github.com/ceph/ceph-csi/blob/release-v3.3/deploy/rbd/kubernetes/csi-rbdplugin-provisioner.yaml).

```yaml
        - name: volume-replication
          image: quay.io/csiaddons/volumereplication-operator:v0.1.0
          args:
            - "--metrics-bind-address=0"
            - "--leader-election-namespace=$(NAMESPACE)"
            - "--driver-name=rbd.csi.ceph.com"
            - "--csi-address=$(ADDRESS)"
            - "--rpc-timeout=150s"
            - "--health-probe-bind-address=:9998"
            - "--leader-elect=true"
          env:
            - name: ADDRESS
              value: unix:///csi/csi-provisioner.sock
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          imagePullPolicy: "IfNotPresent"
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
```

## VolumeReplicationClass and VolumeReplication

### VolumeReplicationClass

*VolumeReplicationClass* is a cluster scoped resource that contains
 driver related configuration parameters. It holds the storage admin
 information required by the volume replication operator.

### VolumeReplication

*VolumeReplication* is a namespaced resource that contains references
 to the storage object to be replicated and to the VolumeReplicationClass
 corresponding to the driver providing replication.

>:bulb: For more information, please refer to the
> [volume-replication-operator](https://github.com/csi-addons/volume-replication-operator).

Let's say we have a *PVC* (rbd-pvc) in Bound state, created using a
 *StorageClass* with the `Retain` reclaimPolicy.

```bash
kubectl get pvc --context=cluster-1

NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
rbd-pvc   Bound    pvc-65dc0aac-5e15-4474-90f4-7a3532c621ec   1Gi        RWO            csi-rbd-sc     44s
```

* Create a VolumeReplicationClass on cluster-1. The mirroring secret and
  scheduling parameters shown here are illustrative; adjust them to match
  your deployment.

  ```yaml
  cat <<EOF | kubectl --context=cluster-1 apply -f -
  apiVersion: replication.storage.openshift.io/v1alpha1
  kind: VolumeReplicationClass
  metadata:
    name: rbd-volumereplicationclass
  spec:
    provisioner: rbd.csi.ceph.com
    parameters:
      # mirror-snapshot schedule for the RBD image
      schedulingInterval: "12m"
      # secret holding the Ceph credentials used by the driver
      replication.storage.openshift.io/replication-secret-name: csi-rbd-secret
      replication.storage.openshift.io/replication-secret-namespace: default
  EOF

  volumereplicationclass.replication.storage.openshift.io/rbd-volumereplicationclass created
  ```

>:bulb: **Note:** The `schedulingInterval` can be specified in formats of
> minutes, hours or days using the suffixes `m`, `h` and `d` respectively.
> The optional schedulingStartTime can be specified using the ISO 8601
> time format.

* Once the VolumeReplicationClass is created, create a VolumeReplication CR for
  the PVC which we intend to replicate to the secondary cluster.

  ```yaml
  cat <<EOF | kubectl --context=cluster-1 apply -f -
  apiVersion: replication.storage.openshift.io/v1alpha1
  kind: VolumeReplication
  metadata:
    name: pvc-volumereplication
  spec:
    volumeReplicationClass: rbd-volumereplicationclass
    replicationState: primary
    dataSource:
      apiGroup: ""
      kind: PersistentVolumeClaim
      # name of the PVC on which mirroring is enabled
      name: rbd-pvc
  EOF

  volumereplication.replication.storage.openshift.io/pvc-volumereplication created
  ```

>:memo: *VolumeReplication* is a namespace scoped object. Thus,
> it should be created in the same namespace as the PVC.

`replicationState` is the state of the volume being referenced.
 Possible values are `primary`, `secondary`, and `resync`.

* `primary` denotes that the volume is primary.
* `secondary` denotes that the volume is secondary.
* `resync` denotes that the volume needs to be resynced.
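As a rough sketch of how `replicationState` is driven during failover and
failback (covered in detail later), the field can simply be patched on the
VolumeReplication CR. The `default` namespace below is only an assumption about
where the `rbd-pvc` example lives:

```bash
# Demote the mirrored volume by switching the CR to "secondary"
# (assumes the VolumeReplication CR is in the "default" namespace)
kubectl --context=cluster-1 -n default patch volumereplication pvc-volumereplication \
  --type merge -p '{"spec":{"replicationState":"secondary"}}'
```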
To check the VolumeReplication CR status:

```yaml
kubectl get volumereplication pvc-volumereplication --context=cluster-1 -oyaml

...
spec:
  dataSource:
    apiGroup: ""
    kind: PersistentVolumeClaim
    name: rbd-pvc
  replicationState: primary
  volumeReplicationClass: rbd-volumereplicationclass
status:
  conditions:
  - lastTransitionTime: "2021-05-04T07:39:00Z"
    message: ""
    observedGeneration: 1
    reason: Promoted
    status: "True"
    type: Completed
  - lastTransitionTime: "2021-05-04T07:39:00Z"
    message: ""
    observedGeneration: 1
    reason: Healthy
    status: "False"
    type: Degraded
  - lastTransitionTime: "2021-05-04T07:39:00Z"
    message: ""
    observedGeneration: 1
    reason: NotResyncing
    status: "False"
    type: Resyncing
  lastCompletionTime: "2021-05-04T07:39:00Z"
  lastStartTime: "2021-05-04T07:38:59Z"
  message: volume is marked primary
  observedGeneration: 1
  state: Primary
```

* Take a backup of the PVC and PV objects on the primary cluster (cluster-1)

  * Take a backup of the PVC `rbd-pvc`

    ```bash
    kubectl get pvc rbd-pvc -oyaml >pvc-backup.yaml
    ```

  * Take a backup of the PV corresponding to the PVC

    ```bash
    kubectl get pv/pvc-65dc0aac-5e15-4474-90f4-7a3532c621ec -oyaml >pv_backup.yaml
    ```

>:bulb: We can also take backups using external tools like **Velero**.
> Refer to the [velero documentation](https://velero.io/docs/main/) for more information.

* Restore on the secondary cluster (cluster-2)

  * Create the storageclass on the secondary cluster

    ```bash
    kubectl create -f examples/rbd/storageclass.yaml --context=cluster-2

    storageclass.storage.k8s.io/csi-rbd-sc created
    ```

  * Create the VolumeReplicationClass on the secondary cluster,
    mirroring the one created on cluster-1 (the parameters shown
    are illustrative)

    ```bash
    cat <<EOF | kubectl --context=cluster-2 apply -f -
    apiVersion: replication.storage.openshift.io/v1alpha1
    kind: VolumeReplicationClass
    metadata:
      name: rbd-volumereplicationclass
    spec:
      provisioner: rbd.csi.ceph.com
      parameters:
        schedulingInterval: "12m"
        replication.storage.openshift.io/replication-secret-name: csi-rbd-secret
        replication.storage.openshift.io/replication-secret-namespace: default
    EOF

    volumereplicationclass.replication.storage.openshift.io/rbd-volumereplicationclass created
    ```

  * Restore the backed-up PV and PVC objects on the secondary cluster,
    following the failover steps below.

## Planned Migration

> Use cases: Datacenter maintenance, Technology refresh, Disaster avoidance, etc.

### Failover

The failover operation is the process of switching production to a
 backup facility (normally your recovery site). In the case of failover,
 access to the image on the primary site should be stopped.
The image should now be made *primary* on the secondary cluster so that
 access can be resumed there.

:memo: A periodic or one-time backup of the application should be available
 for restore on the secondary site (cluster-2).

Follow the steps below for planned migration of a workload from the primary
 cluster to the secondary cluster:

* Scale down all the application pods which are using the
  mirrored PVC on the primary cluster.
* Take a backup of the PVC and PV objects from the primary cluster.
  This can be done using backup tools like
  [velero](https://velero.io/docs/main/).
* Update `replicationState` to `secondary` in the VolumeReplication CR on the primary site.
  When the operator sees this change, it passes the information down to the
  driver via a gRPC request to mark the dataSource as `secondary`.
* If you are manually recreating the PVC and PV on the secondary cluster,
  remove the `claimRef` section in the PV objects (see the sketch after this list).
* Recreate the storageclass, PVC, and PV objects on the secondary site.
* As you are creating a static binding between the PVC and the PV, a new PV won't
  be created here; the PVC will bind to the existing PV.
* Create the VolumeReplicationClass on the secondary site.
* Create VolumeReplication CRs for all the PVCs for which mirroring
  is enabled.
  * `replicationState` should be `primary` for all the PVCs on
    the secondary site.
* Check whether the image is marked `primary` on the secondary site
  by verifying it in the VolumeReplication CR status.
* Once the image is marked `primary`, the PVC is ready
  to be used. Now, we can scale up the applications to use the PVC.
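The PV and PVC recreation described in the steps above might look roughly like
the following sketch. It reuses the `pv_backup.yaml` and `pvc-backup.yaml` files
taken earlier, writes a hypothetical `pv_restore.yaml` intermediate file, and
assumes `yq` (v4) is available to strip the `claimRef`:

```bash
# Drop the stale claimRef so the restored PV can bind to the restored PVC
yq eval 'del(.spec.claimRef)' pv_backup.yaml > pv_restore.yaml

# Recreate the PV and PVC on the secondary cluster; the PVC binds
# statically to the existing PV instead of provisioning a new one
kubectl create -f pv_restore.yaml --context=cluster-2
kubectl create -f pvc-backup.yaml --context=cluster-2
```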
>:memo: **WARNING**: In the asynchronous disaster recovery use case, we don't
> get the complete data. We only get crash-consistent data,
> based on the snapshot interval time.

### Failback

To perform a failback operation to the primary cluster in case of planned migration,
 repeat the failover steps above with the roles of the two clusters reversed.

>:memo: **Remember**: We can skip the backup-restore operations
> in case of failback if the required yamls are already present on
> the primary cluster. Any new PVCs will still need to be restored on the
> primary site.

## Disaster Recovery

> Use cases: Natural disasters, Power failures, System failures and crashes, etc.

### Failover (abrupt shutdown)

In case of disaster recovery, create the VolumeReplication CR at the secondary site.
 Since the connection to the primary site is lost, the operator automatically
 sends a gRPC request down to the driver to forcefully mark the dataSource as `primary`.

* If you are manually creating the PVC and PV on the secondary cluster, remove
  the `claimRef` section in the PV objects.
* Create the storageclass, PVC, and PV objects on the secondary site.
* As you are creating a static binding between the PVC and the PV, a new PV won't be
  created here; the PVC will bind to the existing PV.
* Create the VolumeReplicationClass and VolumeReplication CR on the secondary site.
* Check whether the image is marked `primary` on the secondary site by verifying
  it in the VolumeReplication CR status.
* Once the image is marked `primary`, the PVC is ready to be used. Now,
  we can scale up the applications to use the PVC.

### Failback (post-disaster recovery)

Once the failed cluster is recovered on the primary site and you want to fail back
 from the secondary site, follow the steps below:

* Update the VolumeReplication CR `replicationState`
  from `primary` to `secondary` on the primary site.
* Scale down the applications on the secondary site.
* Update the VolumeReplication CR `replicationState` from `primary` to
  `secondary` on the secondary site.
* On the primary site, verify that the VolumeReplication status is marked as
  volume ready to use (see the sketch below).
* Once the volume is marked ready to use, change the `replicationState`
  from `secondary` to `primary` on the primary site.
* Scale up the applications again on the primary site.
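One way to check for the "volume ready to use" condition referenced above is to
read the CR status fields shown earlier in this document. The following is only
a sketch; adjust the context and namespace to your setup:

```bash
# Print the current state and the last status message of the CR
kubectl --context=cluster-1 get volumereplication pvc-volumereplication \
  -o jsonpath='{.status.state}{"\n"}{.status.message}{"\n"}'
```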