# High Availability, Replication & DR: Azure DocumentDB
Azure DocumentDB's resiliency model has three layers. Pick the right combination for the workload — production-critical workloads should use all three.
| Layer | What it protects against | SLA contribution | Automatic? |
|---|---|---|---|
| In-region HA (standby shard per primary, synchronous replication) | Node / zone failures within a region | 99.99% | ✅ Failover is automatic; connection string is unchanged |
| Cross-region replica (active-passive, asynchronous) | Regional outage; provides read scale-out | +0.005% → 99.995% combined with in-region HA | ❌ Promotion is customer-triggered (shared-responsibility DR); HA must be re-enabled on the promoted cluster |
| Automatic backups (retained 35 days for active clusters, 7 days for deleted clusters) | Accidental deletion or corruption | n/a | ✅ Continuous, no performance impact |
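To make the SLA column concrete, the sketch below converts each availability percentage into a monthly downtime budget. It assumes a 30-day month purely for illustration (the authoritative measurement period and exclusions are whatever the Azure SLA defines), and `downtime_budget_minutes` is a hypothetical helper name.

```python
# Convert an availability SLA percentage into its implied downtime budget.
# Assumption: a 30-day month is used for illustration only; the real SLA's
# measurement period and exclusions are defined by the Azure SLA terms.

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def downtime_budget_minutes(sla_percent: float,
                            period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Maximum minutes of unavailability the SLA tolerates in the period."""
    return period_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    for label, sla in [("In-region HA", 99.99),
                       ("HA + cross-region replica", 99.995)]:
        print(f"{label:<26} {sla}% -> {downtime_budget_minutes(sla):.2f} min/month")
```

Under the 30-day assumption, 99.99% leaves about 4.3 minutes of downtime per month and 99.995% about 2.2 minutes, so adding the cross-region replica halves the budget.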
## Replication model at a glance
- Primary ↔ standby shard (in-region): synchronous. Every write commits to both before the client receives an acknowledgment, so failover is lossless and reads served by the standby after it is promoted are strongly consistent. With HA enabled, each shard has 6 replicas in total: 3 LRS replicas under the primary shard plus 3 LRS replicas under the standby shard. In AZ-enabled regions, the primary and standby sit in different availability zones.
- Primary cluster ↔ cross-region replica: asynchronous — design for eventual consistency on the replica. Some writes acknowledged on the primary may not yet be on the replica, so regional promotion has a non-zero RPO (recent writes can be lost). Replication lag scales with the primary's write intensity and the load on both clusters.
- Without HA: each shard uses locally-redundant storage (LRS) with 3 synchronous Azure Storage replicas. Single-replica failures are auto-healed by Azure Storage (CRC checks + network checksums protect against silent corruption), but a zone or region failure can cause downtime and possible data loss. HA is also a prerequisite for availability-zone placement.
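The bullets above contrast synchronous in-region replication with asynchronous cross-region replication. The toy in-memory model below shows why promoting the standby loses nothing while promoting a lagging replica loses acknowledged writes. The `Node`, `Cluster`, and `lost_on_promotion` names are hypothetical; this is not how DocumentDB implements replication, and it ignores the underlying LRS storage replicas.

```python
# Toy model (not DocumentDB internals): contrasts synchronous in-region
# replication with asynchronous cross-region replication to show why
# standby promotion is lossless but replica promotion has a non-zero RPO.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    log: list = field(default_factory=list)  # applied writes, in order


@dataclass
class Cluster:
    primary: Node
    standby: Node   # in-region HA standby: written synchronously
    replica: Node   # cross-region replica: written asynchronously
    pending: list = field(default_factory=list)  # queued for the replica

    def write(self, doc: str) -> str:
        # Synchronous path: the ack is returned only after both the primary
        # and the standby have applied the write.
        self.primary.log.append(doc)
        self.standby.log.append(doc)
        # Asynchronous path: queued for the replica; the ack does not wait.
        self.pending.append(doc)
        return "ack"

    def ship(self, n: int) -> None:
        """Apply up to n queued writes to the replica (models replication lag)."""
        batch, self.pending = self.pending[:n], self.pending[n:]
        self.replica.log.extend(batch)


def lost_on_promotion(cluster: Cluster, promoted: Node) -> list:
    """Acknowledged writes that the promoted node never received."""
    # The promoted node's log is always a prefix of the primary's log here.
    return cluster.primary.log[len(promoted.log):]


if __name__ == "__main__":
    c = Cluster(Node("primary"), Node("standby"), Node("replica"))
    for i in range(5):
        c.write(f"order-{i}")
    c.ship(3)  # the replica lags by two writes
    print("standby promotion loses:", lost_on_promotion(c, c.standby))
    print("replica promotion loses:", lost_on_promotion(c, c.replica))
```

Running it prints an empty list for the standby and `['order-3', 'order-4']` for the replica: the two acknowledged writes the replica had not yet applied at promotion time are the RPO.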
Applications connect to a cluster through a single connection string and endpoint regardless of shard count. The multi-shard topology is fully abstracted — a 16-shard cluster looks like one MongoDB endpoint to the driver.
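Because applications always talk to one endpoint and the connection string survives an HA failover, client-side handling of a failover can reduce to retrying against the same client. This is a minimal sketch, assuming the driver may surface a transient error while the standby is promoted; `with_retries` and its parameters are hypothetical helpers, not part of any DocumentDB or MongoDB driver API.

```python
# Minimal retry-with-backoff sketch for riding out an in-region HA failover.
# Assumption: the driver may raise a transient error while the standby is
# promoted. The endpoint does not change, so the same client is retried.
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(op: Callable[[], T],
                 is_transient: Callable[[Exception], bool],
                 attempts: int = 5,
                 base_delay: float = 0.2,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(), retrying transient failures with jittered exponential backoff.

    Wrap only idempotent operations: a write that reached the server before
    the error was raised could otherwise be applied twice.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return op()
        except Exception as exc:
            # Give up on non-transient errors or once attempts are exhausted.
            if not is_transient(exc) or attempt == attempts - 1:
                raise
            # With the default base_delay: 0.2s, 0.4s, 0.8s, ... plus up to 50% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay * (1 + 0.5 * random.random()))
    raise AssertionError("unreachable: the loop always returns or raises")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_read():
        # Hypothetical stand-in for a driver call such as find_one();
        # fails twice, as if during a failover, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("primary unavailable")
        return {"_id": 1}

    doc = with_retries(flaky_read,
                       is_transient=lambda e: isinstance(e, ConnectionError),
                       sleep=lambda seconds: None)  # skip real sleeping in the demo
    print(doc, "after", calls["n"], "attempts")
```

With PyMongo, `is_transient` might test for `pymongo.errors.AutoReconnect`; check whether the driver's built-in retryable reads and writes already cover the case before layering this on top.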