Model of Patroni Passive Failure
Failover path triggered by node crash causing leader lease expiration and cluster election
Patroni failures fall into 10 scenarios by failure target, which consolidate into five categories by detection path: Active Detection, Passive Detection, Watchdog, Network Partition, and Manual Trigger.
| # | Failure Scenario | Description | Final Path |
|---|---|---|---|
| 1 | PG process crash | Crash, OOM kill | Active Detection |
| 2 | PG connection refused | `max_connections` exhausted | Active Detection |
| 3 | PG zombie | Process alive but unresponsive | Active Detection (timeout) |
| 4 | Patroni process crash | kill -9, OOM | Passive Detection |
| 5 | Patroni zombie | Process alive but stuck | Watchdog |
| 6 | Node down | Power outage, hardware failure | Passive Detection |
| 7 | Node zombie | IO hang, CPU starvation | Watchdog |
| 8 | Primary ↔ DCS network failure | Firewall, switch failure | Network Partition |
| 9 | Storage failure | Disk failure, disk full, mount failure | Active Detection or Watchdog |
| 10 | Manual switchover | Switchover/Failover | Manual Trigger |
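
As a rough illustration of how the ten scenarios collapse into five detection paths, the table can be transcribed into a small lookup and grouped by path. This is only a sketch: `SCENARIOS` and `group_by_path` are hypothetical names introduced here, not part of Patroni, and scenario 9 is listed under both of the paths the table names for it.

```python
from collections import defaultdict

# Scenario number -> (failure scenario, detection paths), transcribed from the table.
# The structure and names are illustrative assumptions, not a Patroni API.
SCENARIOS = {
    1: ("PG process crash", ("Active Detection",)),
    2: ("PG connection refused", ("Active Detection",)),
    3: ("PG zombie", ("Active Detection",)),  # detected via timeout
    4: ("Patroni process crash", ("Passive Detection",)),
    5: ("Patroni zombie", ("Watchdog",)),
    6: ("Node down", ("Passive Detection",)),
    7: ("Node zombie", ("Watchdog",)),
    8: ("Primary <-> DCS network failure", ("Network Partition",)),
    9: ("Storage failure", ("Active Detection", "Watchdog")),
    10: ("Manual switchover", ("Manual Trigger",)),
}


def group_by_path(scenarios):
    """Return {detection path: [scenario numbers]} in ascending scenario order."""
    groups = defaultdict(list)
    for number, (_name, paths) in sorted(scenarios.items()):
        for path in paths:
            groups[path].append(number)
    return dict(groups)


if __name__ == "__main__":
    for path, numbers in group_by_path(SCENARIOS).items():
        print(f"{path}: {numbers}")
```

Grouping this way makes it easy to check that every scenario is covered by at least one path and that exactly five distinct paths appear.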
For RTO calculation, however, all of these failures converge on two paths. This section explores the lower bound, upper bound, and average RTO of each path.
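
One way to reason about such bounds is to treat a path as a sequence of phases, each with a minimum and maximum duration, and sum them. The sketch below assumes each phase is uniformly distributed over its range, so the average is the midpoint of the two bounds. The `Phase` type, the `rto_bounds` helper, and every duration are hypothetical placeholders chosen for illustration; they are not measured values or Patroni defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One step of a failover path with assumed duration bounds in seconds."""
    name: str
    min_s: float
    max_s: float


def rto_bounds(phases):
    """Return (lower, upper, average) RTO for a sequence of phases.

    The average is the midpoint of the bounds, which is the expected total
    only under the assumption that each phase is uniform over its range.
    """
    lower = sum(p.min_s for p in phases)
    upper = sum(p.max_s for p in phases)
    return lower, upper, (lower + upper) / 2


# Placeholder durations only; substitute values derived from your own
# Patroni and HAProxy settings.
PASSIVE_PATH = [
    Phase("wait for leader lease (TTL) to expire", 20, 30),
    Phase("replica election", 1, 2),
    Phase("execute promote", 1, 3),
    Phase("HAProxy detects new primary", 2, 6),
]

if __name__ == "__main__":
    lower, upper, average = rto_bounds(PASSIVE_PATH)
    print(f"lower={lower}s upper={upper}s average={average}s")
```

The same helper applies to the other path by swapping the first phase for the local restart attempt and its timeout.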
flowchart LR
A([Primary Failure]) --> B{Patroni<br/>Detected?}
B -->|PG Crash| C[Attempt Local Restart]
B -->|Node Down| D[Wait TTL Expiration]
C -->|Success| E([Local Recovery])
C -->|Fail/Timeout| F[Release Leader Lock]
D --> F
F --> G[Replica Election]
G --> H[Execute Promote]
H --> I[HAProxy Detects]
I --> J([Service Restored])
style A fill:#dc3545,stroke:#b02a37,color:#fff
style E fill:#198754,stroke:#146c43,color:#fff
style J fill:#198754,stroke:#146c43,color:#fff

Passive failure: a node crash lets the leader lease expire, which triggers a cluster election and failover.
Active failure: the PostgreSQL primary process crashes while Patroni stays alive and attempts a restart; failover is triggered after the restart times out.
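
The flowchart above can also be read as a small decision function. The following sketch simply encodes its nodes and edges; `failover_path` and the `"pg_crash"` / `"node_down"` labels are hypothetical names introduced here, and the function models the diagram rather than Patroni's actual implementation.

```python
# Steps shared by both paths once the leader lock is no longer held,
# taken from the flowchart nodes F through J.
COMMON_TAIL = [
    "Release Leader Lock",
    "Replica Election",
    "Execute Promote",
    "HAProxy Detects",
    "Service Restored",
]


def failover_path(failure, restart_succeeds=False):
    """Return the ordered flowchart steps for a primary failure.

    failure: "pg_crash" (Patroni alive, tries a local restart) or
             "node_down" (nobody renews the lease, so the TTL must expire).
    restart_succeeds: only meaningful for "pg_crash".
    """
    if failure == "pg_crash":
        steps = ["Attempt Local Restart"]
        if restart_succeeds:
            # Local recovery ends the path without any failover.
            return steps + ["Local Recovery"]
    elif failure == "node_down":
        steps = ["Wait TTL Expiration"]
    else:
        raise ValueError(f"unknown failure type: {failure!r}")
    return steps + COMMON_TAIL


if __name__ == "__main__":
    print(" -> ".join(failover_path("node_down")))
    print(" -> ".join(failover_path("pg_crash", restart_succeeds=False)))
```

Both failover branches share the same tail from lock release onward, so the two paths differ only in how long the first step takes.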