Jahith's Tech Sharing: December 2022

Disaster recovery planning and business continuity planning are essential for any organization that wants to recover quickly from a disaster. Disaster recovery for on-premises environments takes a lot of effort and planning because it depends on third-party services such as transportation and network connectivity, and on staff to set up networking, systems, and so on. Disaster recovery in the cloud requires less of that effort, but we still need careful planning to build the redundancy that makes quick recovery possible.

The resilience of the AWS cloud environment is a shared responsibility. AWS infrastructure is spread across multiple AWS regions. Each region is a fully isolated geographical area, and within each region there are multiple isolated availability zones to handle failures. All AWS regions and availability zones are interconnected with high-bandwidth networking. When we host on AWS, we have various options for managing the high availability of the system.

Within region high availability

Each region is a separate geographic area, and availability zones are isolated data centre locations within each AWS region. Each availability zone has its own power, cooling, networking, and so on. AWS provides built-in options for dealing with an availability zone outage: we configure our environment with multi-AZ redundancy so that, if an entire availability zone goes down, workloads can fail over to another availability zone. A within-region high availability architecture also helps with compliance, because data stays in the permitted region while the system remains highly available.

Cross region high availability

A multi-region disaster recovery strategy addresses the rare scenario of an entire AWS region being down due to a natural disaster or technical issue. Highly critical applications need to plan for cross-region replication. When we plan this approach, we need to consider the availability AWS offers for each service we use. Most AWS services are designed for high availability. Cross-region high availability can be achieved in different ways depending on our budget and compliance needs, so we need to choose the proper strategy:

Backup and restore
Pilot light
Warm Standby
Multi-site active/active

Backup and restore

This approach addresses the data loss problem, but it has a high RPO and RTO. The RPO is determined by how frequently we schedule data backups. Because no environment is standing by, building one from backed-up data takes time, so the RTO is also very high.
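To make the link between the backup schedule and these objectives concrete, here is a minimal sketch. It assumes the worst-case data loss equals the interval between backups (ignoring the time a backup takes to complete) and that recovery time is the sum of provisioning, restore, and validation. The function names and the durations are hypothetical and only for illustration.

```python
from datetime import timedelta


def worst_case_rpo(backup_interval: timedelta) -> timedelta:
    """Data written just after a backup is unprotected until the next one,
    so the worst-case loss is roughly one full backup interval."""
    return backup_interval


def estimated_rto(provision: timedelta, restore: timedelta, validate: timedelta) -> timedelta:
    """With backup and restore nothing is standing by, so the steps run
    one after another and their durations add up."""
    return provision + restore + validate


# Illustrative numbers only, not measurements.
daily_rpo = worst_case_rpo(timedelta(hours=24))
hourly_rpo = worst_case_rpo(timedelta(hours=1))
rto = estimated_rto(
    provision=timedelta(hours=2),
    restore=timedelta(hours=3),
    validate=timedelta(minutes=30),
)
print("RPO with daily backups:", daily_rpo)
print("RPO with hourly backups:", hourly_rpo)
print("Estimated RTO:", rto)
```

Backing up more often lowers the RPO, but the RTO stays high until some of the environment is prepared in advance, which is what the next strategies do.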

Pilot light

This approach replicates the data to another region and also sets up the core infrastructure there. Servers are kept switched off and are started only when needed for testing and recovery. This approach reduces the RTO and RPO, depending on the backup schedule. It is cost-effective in terms of recovery, but database corruption or a malware attack still requires a backup.

Warm Standby

This method is similar to pilot light, but a scaled-down version of the environment is always running. Disaster recovery testing can be carried out at any time, which increases confidence that we can recover quickly. The RTO improves slightly compared to pilot light, and the RPO depends on the replication schedule.

Multi-site active/active

In this approach, the sites in different regions are all active and running, and requests are distributed across regions by default. If one region goes down, another region automatically picks up its requests. This approach is the most costly. RPO and RTO are reduced to near zero, but a backup is still required in case of data corruption or a malware attack.
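The routing behaviour described here can be sketched as a small simulation. This is not how AWS implements it; it is a simplified model that assumes a health flag per region and spreads requests round-robin over the healthy ones. The function name and the two region names are just examples.

```python
from typing import Dict


def route_request(request_id: int, region_health: Dict[str, bool]) -> str:
    """Pick a region for a request, using only regions marked healthy."""
    healthy = sorted(name for name, is_up in region_health.items() if is_up)
    if not healthy:
        raise RuntimeError("no healthy region available")
    # Round-robin over the healthy regions.
    return healthy[request_id % len(healthy)]


# Both regions up: traffic is shared between them.
both_up = {"us-east-1": True, "eu-west-1": True}
normal = [route_request(i, both_up) for i in range(4)]

# One region down: the surviving region picks up every request.
one_down = {"us-east-1": False, "eu-west-1": True}
failover = [route_request(i, one_down) for i in range(4)]

print(normal)
print(failover)
```

Because every region already serves traffic, failover is only a change in routing, which is why the RTO is near zero.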

These strategies increase the chances of staying available in a disaster scenario. Each strategy addresses a subset of disasters, but not all of them, and the achievable RPO and RTO depend on the disaster.

Cross region and cross account high availability

For security or compliance reasons, many organizations require complete separation of environments and access between their primary and secondary regions. This helps mitigate insider threats and malware attacks on our primary account. Having our backups or primary database routinely copied to the secondary account makes it possible to recover the primary account.

AWS Backup can be used to back up data across accounts. It is a fully managed service for centrally and automatically managing backups. Using this service, we can configure backup policies and monitor backup activity for our AWS resources in one place.
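As a rough sketch of what such a backup policy could look like, the code below builds a backup plan with one daily rule that also copies each recovery point to a vault in another account. The plan name, vault name, schedule, retention period, and the placeholder account ID in the ARN are all assumptions for illustration. It also assumes the destination vault already exists and allows copies from the source account. The plan is passed to the boto3 `create_backup_plan` call, which needs the third-party boto3 package and AWS credentials, so that call is kept in a separate function.

```python
def build_cross_account_plan(plan_name, source_vault, destination_vault_arn, retention_days=35):
    """Build the BackupPlan document for one daily rule with a copy action."""
    return {
        "BackupPlanName": plan_name,
        "Rules": [
            {
                "RuleName": "daily-with-cross-account-copy",  # hypothetical name
                "TargetBackupVaultName": source_vault,
                "ScheduleExpression": "cron(0 5 * * ? *)",  # every day at 05:00 UTC
                "Lifecycle": {"DeleteAfterDays": retention_days},
                "CopyActions": [
                    {
                        # Vault in the secondary account (and possibly another region).
                        "DestinationBackupVaultArn": destination_vault_arn,
                        "Lifecycle": {"DeleteAfterDays": retention_days},
                    }
                ],
            }
        ],
    }


def create_plan(plan):
    """Send the plan to AWS Backup. Requires boto3 and valid credentials."""
    import boto3  # imported here so the builder above runs without boto3

    client = boto3.client("backup")
    return client.create_backup_plan(BackupPlan=plan)


# Placeholder ARN: replace the region, account ID, and vault name with real values.
dr_vault_arn = "arn:aws:backup:eu-west-1:111122223333:backup-vault:dr-vault"
plan = build_cross_account_plan("dr-plan", "primary-vault", dr_vault_arn)
print(plan["Rules"][0]["CopyActions"])
```

Resources still have to be assigned to the plan separately, for example by tag, before any backups run.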

People often confuse high availability with disaster recovery because the two share some best practices, such as monitoring for issues, deploying to multiple locations, and automatic failover.

Availability is different from disaster recovery in terms of objective and focus. Availability is more concerned with keeping systems and applications up within a given time frame. Application availability is calculated by dividing uptime by the total sum of uptime and downtime. Disaster recovery is concerned with the recovery of applications, environments, and people from large-scale disasters. The goals of disaster recovery are expressed with its two most important parameters, Recovery Point Objective (RPO) and Recovery Time Objective (RTO).
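The availability formula above can be written out as a short sketch. The function names and the sample figures (a 30-day month with 2 hours of downtime, and a 99.9% target) are only illustrative.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Availability = uptime / (uptime + downtime)."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total time must be positive")
    return uptime_hours / total


def allowed_downtime_hours(target: float, period_hours: float) -> float:
    """Downtime budget that still meets an availability target over a period."""
    return (1 - target) * period_hours


# A 30-day month has 720 hours; assume 2 of them were downtime.
month_hours = 30 * 24
measured = availability(uptime_hours=month_hours - 2, downtime_hours=2)
budget = allowed_downtime_hours(target=0.999, period_hours=month_hours)

print(f"Measured availability: {measured:.4%}")
print(f"Downtime budget for 99.9%: {budget:.2f} hours")
```

Turning a target into a downtime budget like this makes it easier to judge whether a given failover time is acceptable.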

Availability	Disaster Recovery
High availability is about eliminating single points of failure.	Disaster recovery is a process: a set of policies and procedures that is triggered when high availability is lost.
High availability helps us make sure our system is operational in identified failure scenarios.	Disaster recovery and business continuity planning address man-made disasters such as cyber attacks, terrorist attacks, and human error, and natural disasters such as floods, hurricanes, and earthquakes.
Application availability is calculated by dividing uptime by the total sum of uptime and downtime.	Depending on the disaster, two main objectives are defined. Recovery Point Objective (RPO): the maximum acceptable amount of data loss due to the disaster. Recovery Time Objective (RTO): the maximum amount of time allowed to recover an application.
A high availability system is built with proper redundancy. In the cloud, multi-AZ and multi-region hosting help to ensure high availability.	A proper disaster recovery strategy helps us recover quickly from events such as a DDoS attack or a pandemic-like situation such as COVID.

Any product company needs to plan for high availability and disaster recovery for business continuity. High availability planning protects us from high-probability events that occur on a regular basis. Instance and data storage failures due to software or hardware issues, a web server not responding due to an unexpected issue, a load-induced outage, and so on are all common occurrences; hosting an instance in more than one availability zone or region helps to handle these issues.

The disaster recovery process helps us recover from major outages caused by man-made or natural disasters. Creating a backup of the data storage in a different data centre and keeping a month's worth of data aids recovery in the event of data loss or data corruption, and also protects against data loss from a cyber-attack. Replicating the data storage and environment in a different location helps us recover quickly from a complete environment failure, but it does not help with data loss or corruption, because the damage is replicated as well.
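Keeping "a month's worth" of backups implies a retention rule. A minimal sketch of one is shown below, assuming a 30-day retention window and one backup per date. The function name and the sample dates are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, List


def backups_to_delete(backup_dates: Iterable[date], today: date, retention_days: int = 30) -> List[date]:
    """Return the backups that are older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in backup_dates if d < cutoff)


today = date(2022, 12, 28)
backups = [date(2022, 11, 1), date(2022, 11, 28), date(2022, 12, 27)]

expired = backups_to_delete(backups, today)
print("Expired backups:", expired)
```

A longer window allows recovery from corruption that was noticed late, at the cost of more storage.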

Overall, high availability and disaster recovery are aimed at the same problem: keeping systems up and running in an operational state. The main difference is that HA is intended to handle problems while a system is running, while DR is intended to handle problems after a system fails.

Wednesday, December 28, 2022

AWS Disaster Recovery

Tuesday, December 27, 2022

High Availability and Disaster Recover.