Failover and Forced Failover - Azure SQL PaaS

In Azure SQL PaaS (Database and Managed Instance), failover is a disaster recovery and business continuity mechanism managed through Failover Groups.

There are two primary types of failover: Failover (Planned) and Forced Failover (Unplanned).

1. Failover (Planned Failover)

This is a “graceful” role reversal where the primary and secondary databases switch roles without data loss.

Data Integrity: Performs full data synchronization between the primary and secondary databases before the role switch, ensuring zero data loss.

Prerequisite: The primary database must be accessible and online to initiate this operation.

Common Use Cases:

Performing Disaster Recovery (DR) drills in production.
Relocating workloads to a different region for maintenance or proximity.
Failback: Returning the workload to the original primary region after an outage is resolved.

Behavior: Applications may experience a brief disconnection (typically up to 60 seconds) while DNS entries update and connections are rerouted to the new primary.
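
Because of that brief disconnection, client code usually needs some retry logic. The following is a minimal sketch, not a definitive implementation: run_with_retry is a hypothetical helper, and ConnectionError stands in for whatever transient error your database driver actually raises.

```python
import time


def run_with_retry(operation, attempts=7, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on ConnectionError, wait and retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            # Delays grow 1s, 2s, 4s, ... so the client rides out the rerouting window.
            sleep(base_delay * 2 ** attempt)
```

With the assumed defaults, the waits add up to 63 seconds (1 + 2 + 4 + 8 + 16 + 32), which roughly covers the disconnection window mentioned above. Tune attempts and base_delay to your own workload.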

2. Forced Failover (Unplanned Failover)

This is an emergency recovery method used when the primary database is unavailable or the region has suffered a catastrophic failure.

Data Integrity: Immediately promotes the secondary to the primary role without waiting for asynchronous data propagation. Transactions committed on the old primary but not yet replicated can be lost.
Requirement: Used when the primary is inaccessible and the business cannot wait for recovery.
Post-Failover State: Once the outage is mitigated, the old primary automatically reconnects as the new secondary. However, if both replicas briefly act as primary, the pair can enter an “inconsistent” or “split-brain” state that requires manual resolution.
Microsoft-managed Policy: If configured, Microsoft can trigger a forced failover automatically after a defined grace period (minimum 1 hour) if a regional outage occurs.
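
To make the difference between the two failover types concrete, here is a minimal sketch that builds the Azure CLI command for each. It assumes Azure SQL Database (Managed Instance uses a separate command group, not shown here); set_primary_command and the resource names are hypothetical. The only difference between a planned and a forced failover is the --allow-data-loss flag.

```python
def set_primary_command(group, resource_group, secondary_server, forced=False):
    """Build the Azure CLI argument list that promotes secondary_server to primary."""
    cmd = [
        "az", "sql", "failover-group", "set-primary",
        "--name", group,
        "--resource-group", resource_group,
        "--server", secondary_server,  # the server that should become the new primary
    ]
    if forced:
        # Forced failover: do not wait for synchronization; data loss is possible.
        cmd.append("--allow-data-loss")
    return cmd


# Placeholder names for illustration only.
print(" ".join(set_primary_command("my-fog", "my-rg", "my-secondary-server")))
print(" ".join(set_primary_command("my-fog", "my-rg", "my-secondary-server", forced=True)))
```

The returned list can be passed to subprocess.run as-is. Keeping forced=False as the default means the data-loss path always has to be requested explicitly.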

Summary Comparison Table

Feature	Failover (Planned)	Forced Failover (Unplanned)
Data Loss	None (Zero data loss)	Potential data loss
Primary Status	Must be online/accessible	Usually offline/inaccessible
Main Objective	DR drills, maintenance, failback	Emergency disaster recovery
Data Sync	Full synchronization before switch	No synchronization before switch
Initiation	Always manual (Customer-initiated)	Manual OR Automatic (Microsoft Managed)

Key Management Features

Listeners: Failover groups provide two DNS endpoints—one for read-write and one for read-only traffic—that automatically update their routing after any failover.
Failover Policies:
- Customer-managed: You decide when to trigger a failover.
- Microsoft-managed: Azure automatically triggers a forced failover after the grace period set in GracePeriodWithDataLossHours.
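
The two policies above could be switched with the Azure CLI along these lines. This is a sketch under assumptions: set_policy_command is a hypothetical helper, it targets Azure SQL Database only, and it relies on the CLI naming the customer-managed policy Manual and the Microsoft-managed policy Automatic. The 1-hour minimum comes from the grace-period rule stated earlier.

```python
def set_policy_command(group, resource_group, server, grace_period_hours=None):
    """Build the Azure CLI argument list that sets a failover group's policy.

    grace_period_hours=None selects the customer-managed (Manual) policy;
    a number selects the Microsoft-managed (Automatic) policy with that grace period.
    """
    cmd = [
        "az", "sql", "failover-group", "update",
        "--name", group,
        "--resource-group", resource_group,
        "--server", server,
    ]
    if grace_period_hours is None:
        return cmd + ["--failover-policy", "Manual"]
    if grace_period_hours < 1:
        # The grace period has a minimum of 1 hour.
        raise ValueError("grace period must be at least 1 hour")
    return cmd + ["--failover-policy", "Automatic",
                  "--grace-period", str(grace_period_hours)]
```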

Testing: You can manually trigger a failover via the Azure Portal, PowerShell (Switch-AzSqlDatabaseFailoverGroup), or the Azure CLI (az sql failover-group set-primary) to test application resiliency.
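
The two listener endpoints described above follow a predictable naming pattern, so applications never need to hard-code a server name. A minimal sketch, assuming Azure SQL Database in the public Azure cloud (Managed Instance listeners also include a DNS zone ID, and sovereign clouds use other domain suffixes); listener_hosts and the group name are hypothetical:

```python
def listener_hosts(failover_group_name):
    """Return the read-write and read-only listener host names for a failover group."""
    return {
        "read_write": f"{failover_group_name}.database.windows.net",
        "read_only": f"{failover_group_name}.secondary.database.windows.net",
    }


hosts = listener_hosts("my-fog")  # placeholder failover group name
print(hosts["read_write"])  # my-fog.database.windows.net
print(hosts["read_only"])   # my-fog.secondary.database.windows.net
```

Connections to the read-only listener typically also set ApplicationIntent=ReadOnly in the connection string.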

Happy learning!