RTO Trade-offs

Trade-off analysis for RTO (Recovery Time Objective), finding the optimal balance between recovery speed and false failover risk.

RTO defines the maximum time the system may take to restore write capability after the primary fails.

Critical transaction systems, where availability is paramount, typically require the shortest possible RTO, for example under one minute.

However, shorter RTO comes at a cost: increased false failover risk. Network jitter may be misinterpreted as a failure, leading to unnecessary failovers. For cross-datacenter/cross-region deployments, RTO requirements are typically relaxed (e.g., 1-2 minutes) to reduce false failover risk.


Trade-offs

The upper limit of unavailability during failover is controlled by the pg_rto parameter. Pigsty provides four preset RTO modes (fast, norm, safe, and wide), each optimized for different network conditions and deployment scenarios. The default is norm mode (~45 seconds). You can also specify the RTO upper limit directly in seconds, and the system will automatically map to the closest mode.

When the primary fails, recovery proceeds through several phases: Patroni detects the failure, the DCS lock expires, a new primary is elected, the promotion is executed, and HAProxy detects the new primary. Reducing RTO means shortening the timeout for each phase, which makes the cluster more sensitive to network jitter and therefore increases false failover risk.

You need to choose the appropriate mode based on actual network conditions, balancing recovery speed and false failover risk. The worse the network quality, the more conservative mode you should choose; the better the network quality, the more aggressive mode you can choose.

flowchart LR
    A([Primary Failure]) --> B{Patroni<br/>Detected?}

    B -->|PG Crash| C[Attempt Local Restart]
    B -->|Node Down| D[Wait TTL Expiration]

    C -->|Success| E([Local Recovery])
    C -->|Fail/Timeout| F[Release Leader Lock]

    D --> F
    F --> G[Replica Election]
    G --> H[Execute Promote]
    H --> I[HAProxy Detects]
    I --> J([Service Restored])

    style A fill:#dc3545,stroke:#b02a37,color:#fff
    style E fill:#198754,stroke:#146c43,color:#fff
    style J fill:#198754,stroke:#146c43,color:#fff

Four Modes

Pigsty provides four RTO modes to help users make trade-offs under different network conditions.

| Name | fast | norm | safe | wide |
|------|------|------|------|------|
| Use Case | Same rack | Same datacenter (default) | Same region, cross-DC | Cross-region/continent |
| Network | < 1ms, very stable | 1-5ms, normal | 10-50ms, cross-DC | 100-200ms, public network |
| Target RTO | 30s | 45s | 90s | 150s |
| False Failover Risk | Higher | Medium | Lower | Very Low |
| Configuration | pg_rto: fast | pg_rto: norm | pg_rto: safe | pg_rto: wide |

RTO Timeline

Patroni / PG HA has two critical failure paths. For detailed RTO timing analysis, see: Active Failure Detection and Passive Lease Expiration.


Implementation

The four RTO modes differ in how the following 10 Patroni and HAProxy HA-related parameters are configured.

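| Component | Parameter | fast | norm | safe | wide | Description |
|-----------|-----------|------|------|------|------|-------------|
| patroni | ttl | 20 | 30 | 60 | 120 | Leader lock TTL (seconds) |
| patroni | loop_wait | 5 | 5 | 10 | 20 | HA loop check interval (seconds) |
| patroni | retry_timeout | 5 | 10 | 20 | 30 | DCS operation retry timeout (seconds) |
| patroni | primary_start_timeout | 15 | 25 | 45 | 95 | Primary restart wait time (seconds) |
| patroni | safety_margin | 5 | 5 | 10 | 15 | Watchdog safety margin (seconds) |
| haproxy | inter | 1s | 2s | 3s | 4s | Normal state check interval |
| haproxy | fastinter | 0.5s | 1s | 1.5s | 2s | State transition check interval |
| haproxy | downinter | 1s | 2s | 3s | 4s | DOWN state check interval |
| haproxy | rise | 3 | 3 | 3 | 3 | Consecutive successes to mark UP |
| haproxy | fall | 3 | 3 | 3 | 3 | Consecutive failures to mark DOWN |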

Patroni Parameters

  • ttl: Leader lock TTL. Primary must renew within this time, otherwise lock expires and triggers election. Directly determines passive failure detection delay.
  • loop_wait: Patroni main loop interval. Each loop performs one health check and state sync, affects failure discovery timeliness.
  • retry_timeout: DCS operation retry timeout. During network partition, Patroni retries continuously within this period; after timeout, primary actively demotes to prevent split-brain.
  • primary_start_timeout: Wait time for Patroni to attempt local restart after PG crash. After timeout, releases Leader lock and triggers failover.
  • safety_margin: Watchdog safety margin. Ensures sufficient time to trigger system restart during failures, avoiding split-brain.

HAProxy Parameters

  • inter: Health check interval in normal state, used when service status is stable.
  • fastinter: Check interval during state transition, uses shorter interval to accelerate confirmation when state change detected.
  • downinter: Check interval in DOWN state, uses this interval to probe recovery after service marked DOWN.
  • rise: Consecutive successes required to mark UP. After new primary comes online, must pass rise consecutive checks before receiving traffic.
  • fall: Consecutive failures required to mark DOWN. Service must fail fall consecutive times before being marked DOWN.
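As a rough illustration of how these intervals combine, the sketch below estimates an upper bound on how long HAProxy needs to confirm a state change. It rests on a simplifying assumption that is not taken from Pigsty: the first check after the change arrives within one inter (or downinter) period, and the remaining rise - 1 or fall - 1 confirmations run at fastinter. Check timeouts and connection latency are ignored, and the function names are hypothetical.

```python
# Rough upper-bound estimate of HAProxy state-change detection time.
# Assumption (simplified model): the first check waits up to one regular
# interval, the remaining confirmations run at fastinter. Timeouts ignored.

def parse_seconds(value: str) -> float:
    """Convert an interval string such as '1.5s' into seconds."""
    return float(value.rstrip("s"))

def time_to_up(downinter: str, fastinter: str, rise: int) -> float:
    """Estimated worst-case seconds before a DOWN server is marked UP."""
    return parse_seconds(downinter) + (rise - 1) * parse_seconds(fastinter)

def time_to_down(inter: str, fastinter: str, fall: int) -> float:
    """Estimated worst-case seconds before an UP server is marked DOWN."""
    return parse_seconds(inter) + (fall - 1) * parse_seconds(fastinter)

# (inter, fastinter, downinter, rise, fall) for each mode, from the parameter table
HAPROXY_PRESETS = {
    "fast": ("1s", "0.5s", "1s", 3, 3),
    "norm": ("2s", "1s", "2s", 3, 3),
    "safe": ("3s", "1.5s", "3s", 3, 3),
    "wide": ("4s", "2s", "4s", 3, 3),
}

for mode, (inter, fastinter, downinter, rise, fall) in HAPROXY_PRESETS.items():
    up = time_to_up(downinter, fastinter, rise)
    down = time_to_down(inter, fastinter, fall)
    print(f"{mode}: mark UP within ~{up}s, mark DOWN within ~{down}s")
```

Under this model, the HAProxy share of the recovery time grows from about 2 seconds in fast mode to about 8 seconds in wide mode, which is small compared with the Patroni timeouts.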

Key Constraint

Patroni core constraint: Ensures primary can complete demotion before TTL expires, preventing split-brain.

$$loop\_wait + 2 \times retry\_timeout \leq ttl$$
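To make the constraint concrete, the following minimal sketch checks it against the preset values of loop_wait, retry_timeout, and ttl for the four modes. The helper name satisfies_constraint is hypothetical and is not part of Patroni or Pigsty.

```python
# Check the Patroni constraint loop_wait + 2 * retry_timeout <= ttl
# for the four preset modes. The helper name is illustrative only.

def satisfies_constraint(loop_wait: int, retry_timeout: int, ttl: int) -> bool:
    """Return True if the primary has time to demote before the TTL expires."""
    return loop_wait + 2 * retry_timeout <= ttl

# (loop_wait, retry_timeout, ttl) in seconds, per mode
PRESETS = {
    "fast": (5, 5, 20),
    "norm": (5, 10, 30),
    "safe": (10, 20, 60),
    "wide": (20, 30, 120),
}

for mode, (loop_wait, retry_timeout, ttl) in PRESETS.items():
    needed = loop_wait + 2 * retry_timeout
    ok = satisfies_constraint(loop_wait, retry_timeout, ttl)
    print(f"{mode}: {needed} <= {ttl} -> {ok}")
```

All four presets satisfy the inequality. A custom parameter set such as loop_wait = 10, retry_timeout = 10, ttl = 20 would not, since 10 + 2 × 10 = 30 exceeds 20.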

Recommendations

fast mode is suitable for scenarios with extremely strict RTO requirements, but it requires sufficiently good network quality (latency < 1ms, very low packet loss). It is recommended only for same-rack or same-switch deployments, and should be thoroughly tested before being enabled in production.

norm mode is Pigsty’s default configuration and is sufficient for the vast majority of same-datacenter deployments. An average recovery time of 21 seconds is within an acceptable range, and the mode provides a reasonable tolerance window to avoid false failovers caused by network jitter.

safe mode is suitable for same-city cross-datacenter deployments with higher network latency or occasional jitter. The longer tolerance window effectively prevents false failovers from network jitter, making it the recommended configuration for cross-datacenter disaster recovery.

wide mode is suitable for cross-region or even cross-continent deployments with high network latency and possible public-network-level packet loss. In such scenarios, stability is more important than recovery speed, so an extremely wide tolerance window ensures very low false failover rate.

| Scenario | Recommended Mode | Rationale |
|----------|------------------|-----------|
| Dev/Test environment | fast | Quick feedback, low impact from false failover |
| Same-datacenter production | norm | Default choice, well-balanced |
| Same-city active-active/cross-DC DR | safe | Tolerates network jitter, reduces false failover |
| Geo-DR/cross-country deployment | wide | Adapts to high-latency public network, very low false failover rate |
| Uncertain network quality | safe | Conservative choice, avoids false failover |

Typically you only need to set pg_rto to the mode name, and Pigsty will automatically configure Patroni and HAProxy parameters. For backward compatibility, Pigsty still supports configuring RTO directly in seconds, but the effect is equivalent to specifying norm mode.

The mode configuration actually loads the corresponding parameter set from pg_rto_plan. You can modify or override this configuration to implement custom RTO strategies.

pg_rto_plan:  # [ttl, loop, retry, start, margin, inter, fastinter, downinter, rise, fall]
  fast: [ 20  ,5  ,5  ,15 ,5  ,'1s' ,'0.5s' ,'1s' ,3 ,3 ]  # rto < 30s
  norm: [ 30  ,5  ,10 ,25 ,5  ,'2s' ,'1s'   ,'2s' ,3 ,3 ]  # rto < 45s
  safe: [ 60  ,10 ,20 ,45 ,10 ,'3s' ,'1.5s' ,'3s' ,3 ,3 ]  # rto < 90s
  wide: [ 120 ,20 ,30 ,95 ,15 ,'4s' ,'2s'   ,'4s' ,3 ,3 ]  # rto < 150s
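Because each row of pg_rto_plan is positional, it is easy to misplace a value when writing a custom entry. The sketch below is only an illustration, not how Pigsty itself reads the plan: it unpacks a row into named fields following the order given in the header comment. The RtoPlan type, the load_plan helper, and the expanded field names are hypothetical.

```python
from typing import NamedTuple

class RtoPlan(NamedTuple):
    """Named view of one pg_rto_plan row (field order follows the header comment)."""
    ttl: int
    loop_wait: int
    retry_timeout: int
    primary_start_timeout: int
    safety_margin: int
    inter: str
    fastinter: str
    downinter: str
    rise: int
    fall: int

# Rows copied from the pg_rto_plan definition above
PG_RTO_PLAN = {
    "fast": [20, 5, 5, 15, 5, "1s", "0.5s", "1s", 3, 3],
    "norm": [30, 5, 10, 25, 5, "2s", "1s", "2s", 3, 3],
    "safe": [60, 10, 20, 45, 10, "3s", "1.5s", "3s", 3, 3],
    "wide": [120, 20, 30, 95, 15, "4s", "2s", "4s", 3, 3],
}

def load_plan(mode: str) -> RtoPlan:
    """Return the named parameter set for a mode, rejecting malformed rows."""
    row = PG_RTO_PLAN[mode]
    if len(row) != len(RtoPlan._fields):
        raise ValueError(f"{mode}: expected {len(RtoPlan._fields)} values, got {len(row)}")
    return RtoPlan(*row)

plan = load_plan("safe")
print(plan.ttl, plan.primary_start_timeout, plan.fastinter)
```

Reading a row by name this way makes it easier to review a custom mode before adding it to the plan.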

Last Modified 2026-01-15: fix some legacy commands (5535c22)