Disaster recovery planning and business continuity planning help an organization recover quickly from a disaster. Disaster recovery for on-premises environments requires significant effort and planning because it depends on third-party services such as transportation and network connectivity, and on staff to set up networking, systems, and so on. Disaster recovery in the cloud requires less effort, but we still need careful planning to build in the redundancy that makes a quick recovery possible.
The resilience of the AWS cloud environment is a shared responsibility: AWS is responsible for the resilience of the underlying infrastructure, and we are responsible for how we architect our workloads on top of it. AWS infrastructure is spread across multiple AWS regions. Each region is a fully isolated geographical area, and within each region multiple isolated availability zones are available to handle failure. All AWS regions and availability zones are interconnected with high-bandwidth networking. When we use AWS as our cloud provider, we have several options for managing the high availability of our systems.
Within region high availability
Each region represents a separate geographic area, and availability zones are highly available data centres within each AWS region. Each availability zone has its own isolated power, cooling, networking, and so on. AWS provides a built-in option for dealing with an availability zone outage: we configure our environment with multi-AZ redundancy so that if an entire availability zone goes down, AWS can fail over workloads to another availability zone. A within-region high availability architecture keeps data in the permitted region, which supports compliance requirements, while still providing high availability.
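To make the multi-AZ sizing concrete, here is a minimal sketch of the arithmetic, assuming we want the workload to keep its full required capacity after losing one entire availability zone. The function name and the instance counts are hypothetical and only illustrate the idea.

```python
import math


def instances_per_az(required_instances: int, az_count: int) -> int:
    """Instances to run in each AZ so that the remaining AZs still provide
    `required_instances` after one whole AZ is lost (assumed sizing rule)."""
    if az_count < 2:
        raise ValueError("multi-AZ redundancy needs at least two availability zones")
    # After one AZ fails, only (az_count - 1) zones carry the load.
    return math.ceil(required_instances / (az_count - 1))


# Hypothetical workload: 6 instances needed at peak, spread over 3 AZs.
per_az = instances_per_az(required_instances=6, az_count=3)
print(f"Run {per_az} instances in each AZ, {per_az * 3} in total")
```

With only two availability zones, the same rule means each zone must be able to carry the whole load on its own.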
Cross region high availability
A multi-region disaster recovery strategy addresses the rare scenario of an entire AWS region being down due to a natural disaster or a technical issue. Highly critical applications should plan for cross-region replication. When we plan this approach, we need to consider the availability that AWS commits to for each service we use; most AWS services are designed for high availability. Cross-region high availability can be achieved in different ways depending on our budget and compliance needs, so we need to choose the appropriate strategy from the following:
- Backup and restore
- Pilot light
- Warm Standby
- Multi-site active/active
Backup and restore
This approach addresses the data loss issue. It has a high recovery point objective (RPO, the amount of data loss we can tolerate, measured in time) and a high recovery time objective (RTO, the time it takes to restore service). The target RPO determines how frequently we must schedule data backups. Because no environment is ready in the recovery region, building one from the backed-up data takes time, so the RTO is also very high.
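The relationship between the backup schedule, RPO, and RTO can be sketched with simple arithmetic. This is an illustration only: the function names and all durations below are hypothetical assumptions, not measured values.

```python
from datetime import timedelta


def worst_case_rpo(backup_interval: timedelta) -> timedelta:
    """A failure just before the next backup loses everything written
    since the previous one, so the worst case equals the interval."""
    return backup_interval


def estimated_rto(detect: timedelta, provision: timedelta, restore: timedelta) -> timedelta:
    """With backup and restore, nothing is running in the recovery region,
    so the environment must be built before the data can be restored."""
    return detect + provision + restore


# Hypothetical figures: daily backups, 30 min to detect and decide,
# 4 h to build the environment, 2 h to restore the data.
rpo = worst_case_rpo(timedelta(hours=24))
rto = estimated_rto(timedelta(minutes=30), timedelta(hours=4), timedelta(hours=2))
print("Worst-case RPO:", rpo)
print("Estimated RTO:", rto)
```

Shortening the backup interval lowers the RPO directly, while the RTO only improves if part of the environment is prepared in advance, which is what the next strategies do.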
Pilot light
This approach replicates the data to another region and also sets up the core infrastructure there. Servers are switched off and are started only when needed for testing or recovery. This reduces the RTO compared with backup and restore, and the RPO depends on how often the data is replicated or backed up. The approach is cost effective in terms of recovery, but database corruption or a malware attack still requires a backup, because replication copies the damaged data to the other region as well.
Warm Standby
This method is similar to pilot light, but a scaled-down version of the environment is already running. Disaster recovery testing can be carried out at any time, which gives the team more confidence that it can recover quickly. The RTO improves slightly compared to pilot light, and the RPO is based on the replication schedule.
Multi-site active/active
In this approach, the sites in both regions are active and running. Requests are distributed across the regions by default. If one region goes down, the other region automatically picks up its requests. This approach is the most costly. RPO and RTO are reduced to near zero, but a backup is still required to recover from data corruption or a malware attack.
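The routing behaviour described above can be sketched as a small simulation. This is not how any particular AWS service implements routing; the region names are examples, and `route_request` is a hypothetical helper that spreads requests round-robin over whichever regions are currently healthy.

```python
def route_request(request_id: int, regions: list[str], healthy: set[str]) -> str:
    """Pick a region for a request, skipping regions that are down."""
    candidates = [region for region in regions if region in healthy]
    if not candidates:
        raise RuntimeError("no healthy region available")
    # Round-robin over the healthy regions only.
    return candidates[request_id % len(candidates)]


REGIONS = ["eu-west-1", "eu-central-1"]  # example region pair (assumption)

# Normal operation: both regions share the traffic.
print([route_request(i, REGIONS, {"eu-west-1", "eu-central-1"}) for i in range(4)])

# eu-west-1 is down: the surviving region picks up every request.
print([route_request(i, REGIONS, {"eu-central-1"}) for i in range(4)])
```

Because the second region is already serving traffic, failover is only a routing change, which is why the RTO is close to zero.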
These strategies increase the likelihood of staying available in a disaster scenario. Each strategy addresses a subset of disasters, not all of them, and the achievable RPO and RTO change depending on the disaster.
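One way to reason about the choice is to pick the least costly strategy that still meets our RPO and RTO targets. The sketch below assumes the four strategies are ordered from cheapest to most costly, and the typical RTO and RPO figures are illustrative placeholders, not AWS guarantees; real values depend on the workload and on the type of disaster.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    typical_rto: timedelta  # illustrative assumption
    typical_rpo: timedelta  # illustrative assumption


# Assumed order: cheapest first, most costly last.
STRATEGIES = [
    Strategy("Backup and restore", timedelta(hours=24), timedelta(hours=24)),
    Strategy("Pilot light", timedelta(hours=1), timedelta(minutes=15)),
    Strategy("Warm standby", timedelta(minutes=30), timedelta(minutes=15)),
    Strategy("Multi-site active/active", timedelta(minutes=1), timedelta(minutes=1)),
]


def cheapest_strategy(target_rto: timedelta, target_rpo: timedelta) -> Optional[Strategy]:
    """Return the first (cheapest) strategy that meets both targets, if any."""
    for strategy in STRATEGIES:
        if strategy.typical_rto <= target_rto and strategy.typical_rpo <= target_rpo:
            return strategy
    return None


choice = cheapest_strategy(target_rto=timedelta(hours=2), target_rpo=timedelta(hours=1))
print(choice.name if choice else "No listed strategy meets these targets")
```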
Cross region and cross account high availability
For security or compliance reasons, many organizations require complete separation of environments and access between their primary and secondary regions. This helps mitigate malicious threats that come from people within the organization, as well as malware attacks on the primary account. Routinely copying our backups or primary database to a secondary account helps us recover the workloads of the primary account.
AWS Backup can be used to back up data across accounts. AWS Backup is a fully managed service for centrally and automatically managing backups. Using this service, we can configure backup policies and monitor backup activity for our AWS resources in one place.
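As a rough sketch of what such a policy could look like, the snippet below builds a backup plan document with a copy action that targets a vault in a second account. The plan name, vault names, region, account ID, and retention periods are placeholders chosen for illustration. The structure follows the backup plan format accepted by the boto3 `backup` client's `create_backup_plan` call, which is deliberately not invoked here, and the permissions that the destination vault needs to accept the copy are not shown.

```python
import json


def cross_account_backup_plan(dest_account_id: str, dest_region: str, dest_vault: str) -> dict:
    """Build a daily backup plan whose recovery points are copied to a
    vault owned by another account (all names are placeholders)."""
    dest_vault_arn = (
        f"arn:aws:backup:{dest_region}:{dest_account_id}:backup-vault:{dest_vault}"
    )
    return {
        "BackupPlanName": "daily-cross-account-plan",
        "Rules": [
            {
                "RuleName": "daily-backup-with-copy",
                "TargetBackupVaultName": "primary-vault",
                # AWS cron syntax: every day at 05:00 UTC.
                "ScheduleExpression": "cron(0 5 ? * * *)",
                "Lifecycle": {"DeleteAfterDays": 35},
                "CopyActions": [
                    {
                        "DestinationBackupVaultArn": dest_vault_arn,
                        "Lifecycle": {"DeleteAfterDays": 35},
                    }
                ],
            }
        ],
    }


# 111122223333 is a placeholder account ID, not a real account.
plan = cross_account_backup_plan("111122223333", "eu-central-1", "dr-vault")
print(json.dumps(plan, indent=2))

# With credentials configured, the plan could then be submitted with:
#   boto3.client("backup").create_backup_plan(BackupPlan=plan)
```

Keeping the copy in a separate account means that a compromise of the primary account does not automatically give access to the copied recovery points.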