In Azure SQL PaaS (Database and Managed Instance), failover is a disaster recovery and business continuity mechanism managed through Failover Groups.
There are two primary types of failover: Failover (Planned) and Forced Failover (Unplanned).
1. Failover (Planned Failover)
This is a “graceful” role reversal where the primary and secondary databases switch roles without data loss.
Data Integrity: Performs full data synchronization between the primary and secondary databases before the role switch, ensuring zero data loss.
Prerequisite: The primary database must be accessible and online to initiate this operation.
Common Use Cases:
- Performing Disaster Recovery (DR) drills in production.
- Relocating workloads to a different region for maintenance or proximity.
- Failback: Returning the workload to the original primary region after an outage is resolved.
Behavior: Applications may experience a brief disconnection (typically up to 60 seconds) while DNS entries update and connections are rerouted to the new primary.
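Because of that brief disconnection, client code usually needs some form of retry logic. The snippet below is a minimal sketch of exponential backoff in Python, not an official Azure pattern. The function name `call_with_retry`, its defaults, and the choice of `ConnectionError` as the transient error are assumptions for illustration; a real driver raises its own exception types.

```python
import time


def call_with_retry(operation, attempts=8, base_delay=1.0, max_delay=16.0,
                    sleep=time.sleep):
    """Call operation(); on ConnectionError, back off exponentially and retry.

    With the defaults, the waits are 1, 2, 4, 8, 16, 16, 16 seconds
    (63 seconds in total), which roughly covers a one-minute failover window.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of retries: surface the last error to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Wrap any database call in it, for example `call_with_retry(lambda: run_query(conn))`, where `run_query` is a hypothetical stand-in for your own data access function.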
2. Forced Failover (Unplanned Failover)
This is an emergency recovery method used when the primary database is unavailable or the region has suffered a catastrophic failure.
Data Integrity: Immediately promotes the secondary to the primary role without waiting for asynchronous data propagation. Any transactions not yet replicated to the secondary can be lost.
Requirement: Used when the primary is inaccessible and the business cannot wait for recovery.
Post-Failover State: Once the outage is mitigated, the old primary automatically reconnects as the new secondary. However, it might enter an “inconsistent” or “split-brain” state if both replicas think they are primary, and that state requires manual resolution.
Microsoft Managed Policy: If configured, Microsoft can trigger a forced failover automatically after a defined Grace Period (minimum 1 hour) if a regional outage occurs.
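To make the difference between the two failover types concrete, here is a small Python sketch that builds the Azure CLI command for each. It assumes the `az sql failover-group set-primary` command for Azure SQL Database, run against the secondary server that should become primary; Managed Instance uses a separate command group. The helper name `build_set_primary_command` and the resource names in the usage note are hypothetical.

```python
def build_set_primary_command(group, resource_group, secondary_server,
                              forced=False):
    """Return the az CLI argument list that promotes secondary_server.

    forced=False -> planned failover (full sync first, no data loss)
    forced=True  -> forced failover (adds --allow-data-loss)
    """
    cmd = [
        "az", "sql", "failover-group", "set-primary",
        "--name", group,
        "--resource-group", resource_group,
        "--server", secondary_server,
    ]
    if forced:
        # Only this flag separates a forced failover from a planned one.
        cmd.append("--allow-data-loss")
    return cmd
```

You could pass the result to `subprocess.run(cmd, check=True)`, for example with `build_set_primary_command("my-fog", "my-rg", "my-secondary-server")`. Keeping `forced` as an explicit argument makes the data-loss decision visible at the call site.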
Summary Comparison Table
| Feature | Failover (Planned) | Forced Failover (Unplanned) |
|---|---|---|
| Data Loss | None (Zero data loss) | Potential data loss |
| Primary Status | Must be online/accessible | Usually offline/inaccessible |
| Main Objective | DR drills, maintenance, failback | Emergency disaster recovery |
| Data Sync | Full synchronization before switch | No synchronization before switch |
| Initiation | Always manual (Customer-initiated) | Manual OR Automatic (Microsoft Managed) |
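
The table can be read as a simple decision rule. The sketch below encodes it in Python purely as an illustration. The `Action` enum and `choose_action` function are hypothetical names, and a real runbook would also weigh RPO/RTO targets and outage duration.

```python
from enum import Enum


class Action(Enum):
    PLANNED = "planned failover"
    FORCED = "forced failover"
    WAIT = "wait for the primary to recover"


def choose_action(primary_reachable, data_loss_acceptable):
    """Pick a failover type from the two inputs the comparison table uses."""
    if primary_reachable:
        # Primary is online, so a full sync is possible: no data loss.
        return Action.PLANNED
    if data_loss_acceptable:
        # Primary is down and the business cannot wait for recovery.
        return Action.FORCED
    # Primary is down, but losing unreplicated transactions is not acceptable.
    return Action.WAIT
```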
Key Management Features
- Listeners: Failover groups provide two DNS endpoints—one for read-write and one for read-only traffic—that automatically update their routing after any failover.
- Failover Policies:
  - Customer-managed: You decide when to trigger a failover.
  - Microsoft-managed: Azure automatically triggers forced failover after a set GracePeriodWithDataLossHours.
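
As a rough sketch of how the two policies might be set from a script, the Python helper below builds an `az sql failover-group update` command. It assumes the CLI's `--failover-policy` values `Automatic` (Microsoft-managed) and `Manual` (customer-managed) and a `--grace-period` given in hours; check `az sql failover-group update --help` for your CLI version. The function name and its arguments are hypothetical.

```python
def build_policy_command(group, resource_group, server, microsoft_managed,
                         grace_period_hours=1):
    """Return the az CLI argument list that sets the failover policy."""
    cmd = [
        "az", "sql", "failover-group", "update",
        "--name", group,
        "--resource-group", resource_group,
        "--server", server,
    ]
    if microsoft_managed:
        # The grace period has a minimum of 1 hour.
        if grace_period_hours < 1:
            raise ValueError("grace period must be at least 1 hour")
        cmd += ["--failover-policy", "Automatic",
                "--grace-period", str(grace_period_hours)]
    else:
        cmd += ["--failover-policy", "Manual"]
    return cmd
```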
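Testing: You can manually trigger a failover via the Azure Portal, PowerShell (Switch-AzSqlDatabaseFailoverGroup), or Azure CLI (az sql failover-group set-primary) to test application resiliency. Note that Invoke-AzSqlDatabaseFailover is a different operation: it fails a database over between its local high-availability replicas and does not switch failover group roles.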
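
A failover test is only meaningful if the application connects through the failover group listeners rather than a specific server name. The Python sketch below derives both listener hostnames from the failover group name. It assumes Azure SQL Database in the public cloud, where the pattern is `<fog-name>.database.windows.net` and `<fog-name>.secondary.database.windows.net`; Managed Instance listeners also include a DNS zone ID, which this sketch does not handle. The function name `listener_hosts` is hypothetical.

```python
def listener_hosts(failover_group_name):
    """Return the read-write and read-only listener hostnames.

    Assumes Azure SQL Database in the public cloud (database.windows.net).
    """
    base = "database.windows.net"
    return {
        # Always routes to whichever server is currently primary.
        "read_write": f"{failover_group_name}.{base}",
        # Routes to the current secondary; clients should also set
        # ApplicationIntent=ReadOnly in the connection string.
        "read_only": f"{failover_group_name}.secondary.{base}",
    }
```

For a failover group named `my-fog`, `listener_hosts("my-fog")["read_write"]` yields `my-fog.database.windows.net`, which keeps working across failovers without a connection string change.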
Happy learning!