Understanding recovery time objectives and recovery point objectives is crucial to your high availability and disaster recovery (HADR) plan because they are the foundation for any availability solution.

## Terms

**Recovery Time Objective (RTO):**

- **Definition:** Maximum time allowed to bring resources back online after an outage.
- **Impact:** Exceeding the RTO can lead to financial penalties and operational disruptions.
- **Component-Level RTO:** Defined for individual components as well as for the overall solution.

**Recovery Point Objective (RPO):**

- **Definition:** Maximum acceptable data loss, measured in time, as defined by business requirements.
- **Example:** If the RPO is 15 minutes, you must be able to restore data to a point no more than 15 minutes before the outage.

**Defining RTO and RPO:**

- **Business Requirements:** Driven by business needs and technological capabilities.
- **Cost of Downtime:** Helps in defining realistic RTO and RPO values.
- **Component Dependencies:** Overall RTO is determined by the slowest component to recover.

**High Availability (HA) vs. Disaster Recovery (DR):**

- **HA:** Covers localized events with minimal downtime, often measured in minutes.
- **DR:** Covers larger-scale events that require more time, often measured in hours or longer.
- **Separate RTOs and RPOs:** Defined for both HA and DR scenarios.

## HADR for IaaS Deployments

**SQL Server HADR Features**:

- **[[Failover Cluster Instances]] (FCI)**: Provides instance-level protection with a unique name and IP address, requiring shared storage and Active Directory Domain Services.
- **[[Always On Availability Group]] (AG)**: Offers database-level protection with primary and secondary replicas, supporting synchronous or asynchronous data movement.
- **[[Log Shipping]]**: A simple method for disaster recovery based on backup, copy, and restore, providing database-level protection.

## HADR for PaaS Deployments

- **[[Azure SQL Database]]**: Features active geo-replication and auto-failover groups.
- **Azure Database for MySQL**: Guarantees 99.99% availability with built-in failover mechanisms.
- **Azure Database for PostgreSQL**: Offers a scale-out hyperscale solution called Citus, with standby replicas for high availability.

## Examples

### IaaS High Availability Solutions

1. **[[Always On Availability Group]] (AG) in a Single Region**:

   ![Always On Availability Group in a single region](https://learn.microsoft.com/en-us/training/wwl-data-ai/describe-high-availability-disaster-recovery-strategies/media/module-77-high-availability-final-02.png)

   - **Description**: Protects data by maintaining multiple copies on different virtual machines (VMs).
   - **Benefits**: Meets recovery time objective (RTO) and recovery point objective (RPO) targets with minimal-to-no data loss, provides easy access to primary and secondary replicas, and improves availability during patching.

2. **Always On [[Failover Cluster Instances]] (FCI) in a Single Region**:

   ![Always On Failover Cluster Instance in a single region](https://learn.microsoft.com/en-us/training/wwl-data-ai/describe-high-availability-disaster-recovery-strategies/media/module-77-high-availability-final-03.png)

   - **Description**: Traditional method for SQL Server high availability, designed for physical deployments but still applicable in virtualized environments.
   - **Benefits**: Popular solution, improved shared storage with Azure Shared Disk, meets most RTO and RPO targets for high availability, and improves availability during patching.

### Disaster Recovery Solutions

1. **Multi-Region or Hybrid Always On Availability Group**:

   ![Multi-region or hybrid Always On Availability Group](https://learn.microsoft.com/en-us/training/wwl-data-ai/describe-high-availability-disaster-recovery-strategies/media/module-77-high-availability-final-04.png)

   - **Description**: Configures an AG across multiple Azure regions or as a hybrid architecture, with all nodes participating in the same Windows Server Failover Cluster (WSFC).
   - **Benefits**: Proven solution, works with both Standard and Enterprise editions of SQL Server, provides redundancy with additional data copies, and offers both high availability and disaster recovery.

2. **Distributed Availability Group**:

   ![Distributed Availability Group](https://learn.microsoft.com/en-us/training/wwl-data-ai/describe-high-availability-disaster-recovery-strategies/media/module-77-high-availability-final-05.png)

   - **Description**: Enterprise Edition feature introduced in SQL Server 2016, consisting of multiple AGs with a global primary and a forwarder.
   - **Benefits**: Separates out the WSFC so that it is no longer a single point of failure, simplifies quorum management, and supports failover between locations.

3. **Log Shipping**:

   ![Log Shipping](https://learn.microsoft.com/en-us/training/wwl-data-ai/describe-high-availability-disaster-recovery-strategies/media/module-77-high-availability-final-06.png)

   - **Description**: One of the oldest HADR methods, based on transaction log backups.
   - **Benefits**: Tried-and-true feature, easy to deploy and administer, tolerant of less robust networks, meets most RTO and RPO goals for disaster recovery, and can protect FCIs.

4. **Azure Site Recovery**:

   ![Azure Site Recovery](https://learn.microsoft.com/en-us/training/wwl-data-ai/describe-high-availability-disaster-recovery-strategies/media/module-77-high-availability-final-07.png)

   - **Description**: Disaster recovery solution that is not based on SQL Server and is provided as part of the Azure platform.
   - **Benefits**: Works with more than just SQL Server, may meet RTO and possibly RPO, and integrates with the Azure platform.

### PaaS Deployment Options

1. **Azure SQL Database**:
   - **Features**: Active geo-replication and auto-failover groups.
   - **Benefits**: High availability and disaster recovery capabilities built into the service.

2. **Azure Database for MySQL**:
   - **Features**: Guarantees 99.99% availability with built-in failover mechanisms.
   - **Benefits**: Reliable high availability solution for MySQL databases.

3. **Azure Database for PostgreSQL**:
   - **Features**: Scale-out hyperscale solution called Citus, with standby replicas for high availability.
   - **Benefits**: Scalable and highly available PostgreSQL deployment.

### Best Practices

- **Coding Retry Logic**: Implement retry logic in applications so that they handle transient failures and dropped connections, such as those that occur during a failover.

## Resources

- [Describe recovery time objective and recovery point objective - Training | Microsoft Learn](https://learn.microsoft.com/en-us/training/modules/describe-high-availability-disaster-recovery-strategies/2-describe-recovery-time-objective-recovery-point-objective)
- [Explore high availability and disaster recovery options - Training | Microsoft Learn](https://learn.microsoft.com/en-us/training/modules/describe-high-availability-disaster-recovery-strategies/3-explore-high-availability-disaster-recovery-options)
- [Describe high availability and disaster recovery options for PaaS deployments - Training | Microsoft Learn](https://learn.microsoft.com/en-us/training/modules/describe-high-availability-disaster-recovery-strategies/5-describe-high-availability-disaster-recovery-options)
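
## Retry Logic Sketch

The "Coding Retry Logic" best practice can be illustrated with a minimal sketch. It is not taken from the Microsoft Learn modules: `TransientError` and `retry` are hypothetical names, and the backoff parameters are assumptions chosen for illustration. A real application would map its database driver's transient error codes onto the retryable exception.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(operation, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is exhausted.

    All parameter names and defaults are illustrative assumptions.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff capped at max_delay, scaled by random jitter
            # so that many clients do not reconnect at the same instant.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_query():
        # Simulates a connection that drops twice (for example during a
        # failover) and then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise TransientError("connection dropped")
        return "ok"

    print(retry(flaky_query, base_delay=0.01), "after", calls["count"], "attempts")
```

Only errors explicitly classified as transient are retried here; anything else propagates immediately, so genuine bugs or permission errors are not masked by the retry loop.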