High Availability and Disaster Recovery

Understanding recovery time and recovery point objectives are crucial to your high availability and disaster recovery (HADR) plan as they're the foundation for any availability solution. ## Terms **Recovery Time Objective (RTO):** - **Definition:** Maximum time to bring resources online after an outage. - **Impact:** Exceeding RTO can lead to financial penalties and operational disruptions. - **Component-Level RTO:** Defined for individual components and overall solution. **Recovery Point Objective (RPO):** - **Definition:** Maximum acceptable data loss measured in time, as defined by business requirements - **Example:** If RPO is 15 minutes, data can be restored to within 15 minutes of the outage. **Defining RTO and RPO:** - **Business Requirements:** Driven by business needs and technological capabilities. - **Cost of Downtime:** Helps in defining realistic RTO and RPO. - **Component Dependencies:** Overall RTO is determined by the slowest component to recover. **High Availability (HA) vs. Disaster Recovery (DR):** - **HA:** Localized events with minimal downtime, often measured in minutes. - **DR:** Larger scale events requiring more time, often measured in hours or longer. - **Separate RTOs and RPOs:** Defined for both HA and DR scenarios. ## HADR for IaaS Deployments **SQL Server HADR Features**: - **[[Failover Cluster Instances]] (FCI)**: Provides instance-level protection with a unique name and IP address, requiring shared storage and Active Directory Domain Services. - **[[Always On Availability Group]] (AG)**: Offers database-level protection with primary and secondary replicas, supporting synchronous or asynchronous data movement. - **[[Log Shipping]]**: A simple method for disaster recovery based on backup, copy, and restore, providing database-level protection. ## HADR for PaaS Deployments - **[[Azure SQL Database]]**: Features active geo-replication and autofailover groups. - **Azure Database for MySQL**: Guarantees 99.99% availability with built-in failover mechanisms. - **Azure Database for PostgreSQL**: Offers a scale-out hyperscale solution called Citus, with standby replicas for high availability. ## Examples ### IaaS High Availability Solutions 1. **[[Always On Availability Group]] (AG) in a Single Region**: ![module-77-high-availability-final-02.png (576×362)](https://learn.microsoft.com/en-us/training/wwl-data-ai/describe-high-availability-disaster-recovery-strategies/media/module-77-high-availability-final-02.png) - **Description**: Protects data by maintaining multiple copies on different virtual machines (VMs). - **Benefits**: Meets recovery time objective (RTO) and recovery point objective (RPO) with minimal-to-no data loss, provides easy access to primary and secondary replicas, and enhances availability during patching scenarios. 2. **Always On [[Failover Cluster Instances]] (FCI) in a Single Region**: ![module-77-high-availability-final-03.png (348×378)](https://learn.microsoft.com/en-us/training/wwl-data-ai/describe-high-availability-disaster-recovery-strategies/media/module-77-high-availability-final-03.png) - **Description**: Traditional method for SQL Server high availability, designed for physical deployments but still applicable in virtualized environments. - **Benefits**: Popular solution, improved shared storage with Azure Shared Disk, meets most RTO and RPO for high availability, and enhances availability during patching scenarios. ### Disaster Recovery Solutions 1. **Multi-Region or Hybrid Always On Availability Group**: ![module-77-high-availability-final-04.png (723×281)](https://learn.microsoft.com/en-us/training/wwl-data-ai/describe-high-availability-disaster-recovery-strategies/media/module-77-high-availability-final-04.png) - **Description**: Configures AG across multiple Azure regions or as a hybrid architecture, with all nodes participating in the same Windows Server Failover Cluster (WSFC). - **Benefits**: Proven solution, works with both Standard and Enterprise editions of SQL Server, provides redundancy with additional data copies, and offers both high availability and disaster recovery. 2. **Distributed Availability Group**: ![module-77-high-availability-final-05.png (723×281)](https://learn.microsoft.com/en-us/training/wwl-data-ai/describe-high-availability-disaster-recovery-strategies/media/module-77-high-availability-final-05.png) - **Description**: Enterprise Edition feature introduced in SQL Server 2016, consisting of multiple AGs with a global primary and a forwarder. - **Benefits**: Separates WSFC as a single point of failure, easier quorum management, and supports failover between locations. 3. **Log Shipping**: ![module-77-high-availability-final-06.png (431×278)](https://learn.microsoft.com/en-us/training/wwl-data-ai/describe-high-availability-disaster-recovery-strategies/media/module-77-high-availability-final-06.png) - **Description**: One of the oldest HADR methods, based on transaction log backups. - **Benefits**: Tried-and-true feature, easy to deploy and administer, tolerant of less robust networks, meets most RTO and RPO goals for disaster recovery, and protects FCIs. 4. **Azure Site Recovery**: ![module-77-high-availability-final-07.png (1054×712)](https://learn.microsoft.com/en-us/training/wwl-data-ai/describe-high-availability-disaster-recovery-strategies/media/module-77-high-availability-final-07.png) - **Description**: Non-SQL Server-based disaster recovery solution provided as part of the Azure platform. - **Benefits**: Works with more than just SQL Server, may meet RTO and possibly RPO, and integrates with the Azure platform. ### PaaS Deployment Options 1. **Azure SQL Database**: - **Features**: Active geo-replication and autofailover groups. - **Benefits**: High availability and disaster recovery capabilities built into the service. 2. **Azure Database for MySQL**: - **Features**: Guarantees 99.99% availability with built-in failover mechanisms. - **Benefits**: Reliable high availability solution for MySQL databases. 3. **Azure Database for PostgreSQL**: - **Features**: Scale-out hyperscale solution called Citus, with standby replicas for high availability. - **Benefits**: Scalable and highly available PostgreSQL deployment. ### Best Practices - **Coding Retry Logic**: Emphasizes the importance of implementing retry logic in applications to handle transient failures and dropped connections. ## Resources - [Describe recovery time objective and recovery point objective - Training | Microsoft Learn](https://learn.microsoft.com/en-us/training/modules/describe-high-availability-disaster-recovery-strategies/2-describe-recovery-time-objective-recovery-point-objective) - [Explore high availability and disaster recovery options - Training | Microsoft Learn](https://learn.microsoft.com/en-us/training/modules/describe-high-availability-disaster-recovery-strategies/3-explore-high-availability-disaster-recovery-options) - [Describe high availability and disaster recovery options for PaaS deployments - Training | Microsoft Learn](https://learn.microsoft.com/en-us/training/modules/describe-high-availability-disaster-recovery-strategies/5-describe-high-availability-disaster-recovery-options)