Model of Patroni Passive Failure
RTO Timeline
Failure Model
| Phase | Best | Worst | Average | Description |
|---|---|---|---|---|
| Lease Expiration | ttl - loop | ttl | ttl - loop/2 | Best: crash just before refresh; Worst: crash right after refresh |
| Replica Detection | 0 | loop | loop / 2 | Best: exactly at check point; Worst: just missed check point |
| Lock Contest & Promote | 0 | 2 | 1 | Best: direct lock and promote; Worst: API timeout + promote |
| Health Check | (rise-1) × fastinter | (rise-1) × fastinter + inter | (rise-1) × fastinter + inter/2 | Best: state change just before check; Worst: state change right after check |
Key Difference Between Passive and Active Failover:
| Scenario | Patroni Status | Lease Handling | Primary Wait Time |
|---|---|---|---|
| Active Failover (PG crash) | Alive, healthy | Actively tries to restart PG, releases lease on timeout | primary_start_timeout |
| Passive Failover (Node crash) | Dies with node | Cannot actively release, must wait for TTL expiration | ttl |
In passive failover scenarios, Patroni dies along with the node and cannot actively release the Leader Key. A new election can start only after the lease in DCS expires on its own, that is, once the TTL runs out.
Timeline Analysis
Phase 1: Lease Expiration
The Patroni primary refreshes the Leader Key every loop_wait cycle, resetting TTL to the configured value.
Timeline:
t-loop       t              t+ttl-loop   t+ttl
|            |              |            |
Last Refresh Failure        Best Case    Worst Case
|←── loop ──→|              |            |
|←──────────── ttl ─────────────────────→|
- Best case: Failure occurs just before lease refresh (elapsed loop since last refresh), remaining TTL = ttl - loop
- Worst case: Failure occurs right after lease refresh, must wait full ttl
- Average case: ttl - loop/2
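The three cases above are the same rule evaluated at different crash times. A minimal sketch, assuming the crash is described by the time elapsed since the last successful refresh (the function name `lease_wait` and the parameter `since_refresh` are illustrative, not part of Patroni):

```python
def lease_wait(ttl: float, loop: float, since_refresh: float) -> float:
    """Seconds from the crash until the Leader Key expires in DCS.

    since_refresh: time elapsed between the last successful lease refresh
    and the crash; it must lie within one loop cycle, i.e. in [0, loop].
    """
    if not 0 <= since_refresh <= loop:
        raise ValueError("since_refresh must be within one loop cycle")
    # The refresh reset the lease to ttl; the part already elapsed is not waited again.
    return ttl - since_refresh


# Example with ttl=30 and loop=5
best = lease_wait(30, 5, since_refresh=5)       # crash just before the next refresh
worst = lease_wait(30, 5, since_refresh=0)      # crash right after a refresh
average = lease_wait(30, 5, since_refresh=2.5)  # crash midway through the cycle
print(best, worst, average)  # 25 30 27.5
```

The average case is simply the midpoint of the cycle, which holds if the crash time is uniformly distributed within a loop interval.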
Phase 2: Replica Detection
Replicas wake up on loop_wait cycles and check the Leader Key status in DCS.
Timeline:
Lease Expired Replica Wakes
|             |
|←── 0~loop ─→|
- Best case: Replica happens to wake when lease expires, wait 0
- Worst case: Replica just entered sleep when lease expires, wait loop
- Average case: loop/2
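The detection delay depends only on where the expiry falls inside the replica's wake-up cycle. A rough sketch, assuming the replica wakes at a fixed phase `first_wake` and every `loop` seconds afterwards (both the function and its parameters are hypothetical names chosen for illustration):

```python
import math


def detection_wait(expire_at: float, first_wake: float, loop: float) -> float:
    """Seconds between lease expiry and the replica's next wake-up.

    The replica is assumed to wake at first_wake, first_wake + loop,
    first_wake + 2 * loop, and so on.
    """
    # Number of whole cycles needed to reach the first wake-up at or after expiry.
    cycles = max(math.ceil((expire_at - first_wake) / loop), 0)
    next_wake = first_wake + cycles * loop
    return next_wake - expire_at


# Replica wakes at 0, 5, 10, ... (loop=5)
print(detection_wait(30.0, 0.0, 5.0))   # 0.0  -> expiry coincides with a wake-up
print(detection_wait(30.25, 0.0, 5.0))  # 4.75 -> replica had just gone back to sleep
print(detection_wait(32.5, 0.0, 5.0))   # 2.5  -> expiry midway through the cycle
```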
Phase 3: Lock Contest & Promote
When replicas detect Leader Key expiration, they start the election process. The replica that acquires the Leader Key executes pg_ctl promote to become the new primary.
- Query each replica’s replication position in parallel via the REST API; this typically takes about 10ms, with a hardcoded 2s timeout.
- Compare WAL positions to determine the best candidate; replicas then attempt to create the Leader Key (an atomic CAS operation).
- Execute pg_ctl promote to become primary (very fast, typically negligible).
Election Flow:
ReplicaA ──→ Query replication position ──→ Compare ──→ Contest lock ──→ Success
ReplicaB ──→ Query replication position ──→ Compare ──→ Contest lock ──→ Fail
- Best case: Single replica or immediate lock acquisition and promotion, constant overhead 0.1s
- Worst case: DCS API call timeout: 2s
- Average case: 1s constant overhead
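The compare-then-contest flow can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyDCS`, `Replica`, and `contest` are hypothetical names, the DCS is an in-memory dictionary rather than a real etcd or Consul client, and WAL positions are plain integers. It is not how Patroni is implemented; it only shows why an atomic create-if-absent lets exactly one contender win.

```python
import threading
from dataclasses import dataclass


class ToyDCS:
    """In-memory stand-in for a DCS; not a real etcd/Consul client."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def create_if_absent(self, key: str, value: str) -> bool:
        """Atomic create: succeeds only if the key does not exist yet."""
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key: str):
        with self._mutex:
            return self._data.get(key)


@dataclass
class Replica:
    name: str
    wal_lsn: int  # replayed WAL position, simplified to an integer


def contest(replica, peers, dcs, results):
    # Compare positions: stay out of the race if any peer has replayed more WAL.
    if any(p.wal_lsn > replica.wal_lsn for p in peers):
        results[replica.name] = False
        return
    # Contest the lock: only one create_if_absent call can succeed.
    results[replica.name] = dcs.create_if_absent("leader", replica.name)


replicas = [Replica("A", 200), Replica("B", 200), Replica("C", 150)]
dcs, results = ToyDCS(), {}
threads = [
    threading.Thread(
        target=contest,
        args=(r, [p for p in replicas if p is not r], dcs, results),
    )
    for r in replicas
]
for t in threads:
    t.start()
for t in threads:
    t.join()

winners = [name for name, won in results.items() if won]
print("new leader:", dcs.get("leader"), "winners:", winners)
```

Replicas A and B are equally advanced, so either may win depending on thread scheduling; C lags behind and never contests the key.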
Phase 4: Health Check
HAProxy marks the new primary as up only after rise consecutive successful health checks.
Detection Timeline:
New Primary   First Check    Second Check   Third Check (UP)
|             |              |              |
|←─ 0~inter ─→|←─── fast ───→|←─── fast ───→|
- Best case: New primary promoted just before check, (rise-1) × fastinter
- Worst case: New primary promoted right after check, (rise-1) × fastinter + inter
- Average case: (rise-1) × fastinter + inter/2
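As a small worked example of the health check phase, the sketch below follows the model in this section: the first successful check arrives somewhere between 0 and inter after promotion, and the remaining rise-1 checks run at fastinter. The function name `time_to_up` and its parameter `until_next_check` are illustrative assumptions, not HAProxy settings.

```python
def time_to_up(until_next_check: float, fastinter: float, rise: int) -> float:
    """Seconds from promotion until HAProxy marks the server UP.

    until_next_check: time from promotion to the first successful check,
    between 0 and inter. The remaining rise-1 checks run at fastinter.
    """
    return until_next_check + (rise - 1) * fastinter


inter, fastinter, rise = 2.0, 1.0, 3
print(time_to_up(0.0, fastinter, rise))        # 2.0 -> best case
print(time_to_up(inter, fastinter, rise))      # 4.0 -> worst case
print(time_to_up(inter / 2, fastinter, rise))  # 3.0 -> average case
```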
RTO Formula
Sum all phase times to get total RTO:
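Best Case

$$
RTO_{min} = (ttl - loop) + 0 + 0 + (rise - 1) \times fastinter
$$

Average Case

$$
RTO_{avg} = \left(ttl - \frac{loop}{2}\right) + \frac{loop}{2} + 1 + (rise - 1) \times fastinter + \frac{inter}{2} = ttl + 1 + \frac{inter}{2} + (rise - 1) \times fastinter
$$

Worst Case

$$
RTO_{max} = ttl + loop + 2 + (rise - 1) \times fastinter + inter
$$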
Model Calculation
Substitute the parameters of the four RTO modes into the formulas above:
pg_rto_plan: # [ttl, loop, retry, start, margin, inter, fastinter, downinter, rise, fall]
fast: [ 20 ,5 ,5 ,15 ,5 ,'1s' ,'0.5s' ,'1s' ,3 ,3 ] # rto < 30s
norm: [ 30 ,5 ,10 ,25 ,5 ,'2s' ,'1s' ,'2s' ,3 ,3 ] # rto < 45s
safe: [ 60 ,10 ,20 ,45 ,10 ,'3s' ,'1.5s' ,'3s' ,3 ,3 ] # rto < 90s
wide: [ 120 ,20 ,30 ,95 ,15 ,'4s' ,'2s' ,'4s' ,3 ,3 ] # rto < 150s
Four Mode Calculation Results (unit: seconds, format: min / avg / max; fractional values are shown as whole seconds)
| Phase | fast | norm | safe | wide |
|---|---|---|---|---|
| Lease Expiration | 15 / 17 / 20 | 25 / 27 / 30 | 50 / 55 / 60 | 100 / 110 / 120 |
| Replica Detection | 0 / 3 / 5 | 0 / 3 / 5 | 0 / 5 / 10 | 0 / 10 / 20 |
| Lock Contest & Promote | 0 / 1 / 2 | 0 / 1 / 2 | 0 / 1 / 2 | 0 / 1 / 2 |
| Health Check | 1 / 2 / 2 | 2 / 3 / 4 | 3 / 5 / 6 | 4 / 6 / 8 |
| Total | 16 / 23 / 29 | 27 / 34 / 41 | 53 / 66 / 78 | 104 / 127 / 150 |
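The totals can be reproduced by summing the four phases for each mode. The following is a minimal sketch, assuming only the ttl, loop, inter, fastinter, and rise columns of pg_rto_plan matter for this model; the `PLANS` dictionary and the `rto` function are illustrative names, not part of any tool.

```python
PLANS = {  # name: (ttl, loop, inter, fastinter, rise), copied from pg_rto_plan
    "fast": (20, 5, 1.0, 0.5, 3),
    "norm": (30, 5, 2.0, 1.0, 3),
    "safe": (60, 10, 3.0, 1.5, 3),
    "wide": (120, 20, 4.0, 2.0, 3),
}


def rto(ttl, loop, inter, fastinter, rise):
    """Return (best, average, worst) RTO in seconds by summing the four phases."""
    check = (rise - 1) * fastinter  # fast checks after the first success
    # lease expiration + replica detection + lock contest & promote + health check
    best = (ttl - loop) + 0 + 0 + check
    avg = (ttl - loop / 2) + loop / 2 + 1 + (check + inter / 2)
    worst = ttl + loop + 2 + (check + inter)
    return best, avg, worst


for name, params in PLANS.items():
    best, avg, worst = rto(*params)
    print(f"{name}: {best:g} / {avg:g} / {worst:g}")
# fast: 16 / 22.5 / 29
# norm: 27 / 34 / 41
# safe: 53 / 65.5 / 78
# wide: 104 / 127 / 150
```

The averages for fast and safe come out as 22.5 and 65.5, which the table shows as 23 and 66.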