Module: ETCD

Pigsty has built-in etcd support. Etcd is a reliable distributed consensus store (DCS) that underpins PostgreSQL HA.

1: Configuration
2: Parameters
3: Playbook
4: Administration
5: Monitoring
6: Metrics
7: FAQ

ETCD is a distributed, reliable key-value store for the most critical data of a distributed system

Configuration | Administration | Playbook | Dashboard | Parameter

Pigsty uses etcd as its DCS (distributed configuration store, or distributed consensus service), which is critical to PostgreSQL high availability and automatic failover.

You have to install the ETCD module before any PGSQL module, since patroni and vip-manager rely on etcd to work, unless you are using an external etcd cluster.

You don’t need the NODE module to install ETCD, but a valid CA is required in your local files/pki/ca directory. Check the ETCD Administration SOP for more details.

1 - Configuration

Configure etcd clusters according to your needs, and access the service

You have to define the etcd cluster in the config inventory before deploying it.

Usually you can choose an etcd cluster with:

One node: no high availability, just the functionality of etcd; suitable for development, testing, and demonstration.
Three nodes: basic high availability, tolerates one node failure; suitable for medium production environments.
Five nodes: better high availability, tolerates two node failures; suitable for large production environments.

An even number of etcd nodes adds no fault tolerance over the next smaller odd number, and clusters of more than five nodes are uncommon.
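The sizing advice above follows from majority-quorum arithmetic: a cluster of n members needs floor(n/2) + 1 of them to agree, so it survives n minus that many failures. The sketch below is illustrative only; the helper names `etcd_quorum` and `etcd_tolerance` are hypothetical and not part of Pigsty or etcd.

```shell
#!/bin/sh
# Smallest majority of an n-member cluster: floor(n/2) + 1
etcd_quorum() { echo $(( $1 / 2 + 1 )); }

# Members that may fail while a majority still survives
etcd_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

# Print the table for cluster sizes 1 through 6
for n in 1 2 3 4 5 6; do
  echo "members=$n quorum=$(etcd_quorum "$n") tolerance=$(etcd_tolerance "$n")"
done
```

Four members tolerate one failure, the same as three, and six tolerate two, the same as five, which is why even sizes add cost without adding resilience.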

One Node

Define the group etcd in the inventory. It will create a singleton etcd instance.

# etcd cluster for ha postgres
etcd: { hosts: { 10.10.10.10: { etcd_seq: 1 } }, vars: { etcd_cluster: etcd } }

This line appears in almost all single-node config templates, where the placeholder IP address 10.10.10.10 will be replaced with the current admin node IP.

The only necessary parameters are etcd_seq and etcd_cluster, which uniquely identify the cluster and each instance.
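In the examples later in this document, instances are named by joining the two identity parameters, so `etcd_cluster: etcd` with `etcd_seq: 1` shows up as `etcd-1`. Assuming that `<etcd_cluster>-<etcd_seq>` convention holds, the following sketch shows how a peer list in the `ETCD_INITIAL_CLUSTER` format could be assembled from inventory entries. The function `etcd_initial_cluster` is a hypothetical helper, not something Pigsty ships.

```shell
#!/bin/sh
# Build "<cluster>-<seq>=https://<ip>:<peer_port>" entries joined by commas.
# usage: etcd_initial_cluster <cluster> <peer_port> <ip>:<seq> ...
etcd_initial_cluster() {
  cluster=$1; port=$2; shift 2
  out=""
  for member in "$@"; do
    ip=${member%%:*}     # text before the first colon
    seq=${member##*:}    # text after the last colon
    # ${out:+,} expands to a comma only when $out is already non-empty
    out="${out}${out:+,}${cluster}-${seq}=https://${ip}:${port}"
  done
  echo "$out"
}

# Two-member example with the default peer port 2380
etcd_initial_cluster etcd 2380 10.10.10.10:1 10.10.10.11:2
```

Because the name is derived from both parameters, two instances with the same `etcd_seq` in one cluster would collide, which is why each instance needs its own sequence number.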

Three Nodes

It’s common to use a three-node etcd cluster, which can tolerate one node failure and is suitable for medium production environments.

The trio and safe config templates use a three-node etcd cluster, as shown below:

etcd: # dcs service for postgres/patroni ha consensus
  hosts:  # 1 node for testing, 3 or 5 for production
    10.10.10.10: { etcd_seq: 1 }  # etcd_seq required
    10.10.10.11: { etcd_seq: 2 }  # assign from 1 ~ n
    10.10.10.12: { etcd_seq: 3 }  # odd number please
  vars: # cluster level parameter override roles/etcd
    etcd_cluster: etcd  # mark etcd cluster name etcd
    etcd_safeguard: false # safeguard against purging
    etcd_clean: true # purge etcd during init process

Five Nodes

A five-node etcd cluster can tolerate two node failures and is suitable for large production environments.

There’s a five-node etcd cluster example in the prod template:

etcd:
  hosts:
    10.10.10.21 : { etcd_seq: 1 }
    10.10.10.22 : { etcd_seq: 2 }
    10.10.10.23 : { etcd_seq: 3 }
    10.10.10.24 : { etcd_seq: 4 }
    10.10.10.25 : { etcd_seq: 5 }
  vars: { etcd_cluster: etcd    }

You can use more nodes in a production environment, but 3 or 5 nodes are recommended. Remember to use an odd number for the cluster size.

Etcd Usage

These are the services that currently use Etcd:

patroni: Use etcd as a consensus backend for PostgreSQL HA
vip-manager: Read leader info from Etcd to bind an optional L2 VIP on the PostgreSQL cluster

After any permanent change to the etcd cluster membership, you’ll have to refresh the etcd endpoint references held by these services.

For example, to update the patroni reference to etcd endpoints:

./pgsql.yml -t pg_conf                            # generate patroni config
ansible all -f 1 -b -a 'systemctl reload patroni' # reload patroni config

For example, to update the vip-manager reference to etcd endpoints (if you are using a PGSQL L2 VIP):

./pgsql.yml -t pg_vip_config                           # generate vip-manager config
ansible all -f 1 -b -a 'systemctl restart vip-manager' # restart vip-manager to use new config

2 - Parameters

Etcd has 10 parameters to customize the etcd cluster as needed.

Etcd is a distributed, reliable key-value store used to store the most critical config/consensus data in the system.

Etcd serves as the DCS for the HA agent Patroni, which makes it essential to the high availability of PostgreSQL in Pigsty.

Pigsty uses the hard-coded group etcd for the etcd cluster, which can be an external etcd cluster or a new etcd cluster created by Pigsty using the etcd.yml playbook.

Parameters

The ETCD module has 10 parameters.

Parameter	Type	Level	Comment
`etcd_seq`	int	I	etcd instance identifier, REQUIRED
`etcd_cluster`	string	C	etcd cluster & group name, etcd by default
`etcd_safeguard`	bool	G/C/A	prevent purging running etcd instance?
`etcd_clean`	bool	G/C/A	purging existing etcd during initialization?
`etcd_data`	path	C	etcd data directory, /data/etcd by default
`etcd_port`	port	C	etcd client port, 2379 by default
`etcd_peer_port`	port	C	etcd peer port, 2380 by default
`etcd_init`	enum	C	etcd initial cluster state, new or existing
`etcd_election_timeout`	int	C	etcd election timeout, 1000ms by default
`etcd_heartbeat_interval`	int	C	etcd heartbeat interval, 100ms by default

Defaults

The default parameters of Etcd are defined in roles/etcd/defaults/main.yml

#-----------------------------------------------------------------
# etcd
#-----------------------------------------------------------------
#etcd_seq: 1                      # etcd instance identifier, explicitly required
etcd_cluster: etcd                # etcd cluster & group name, etcd by default
etcd_safeguard: false             # prevent purging running etcd instance?
etcd_clean: true                  # purging existing etcd during initialization?
etcd_data: /data/etcd             # etcd data directory, /data/etcd by default
etcd_port: 2379                   # etcd client port, 2379 by default
etcd_peer_port: 2380              # etcd peer port, 2380 by default
etcd_init: new                    # etcd initial cluster state, new or existing
etcd_election_timeout: 1000       # etcd election timeout, 1000ms by default
etcd_heartbeat_interval: 100      # etcd heartbeat interval, 100ms by default

`etcd_seq`

etcd instance identifier, REQUIRED

There is no default value; you have to specify it explicitly. Here is a 3-node etcd cluster example:

etcd: # dcs service for postgres/patroni ha consensus
  hosts:  # 1 node for testing, 3 or 5 for production
    10.10.10.10: { etcd_seq: 1 }  # etcd_seq required
    10.10.10.11: { etcd_seq: 2 }  # assign from 1 ~ n
    10.10.10.12: { etcd_seq: 3 }  # odd number please
  vars: # cluster level parameter override roles/etcd
    etcd_cluster: etcd  # mark etcd cluster name etcd
    etcd_safeguard: false # safeguard against purging
    etcd_clean: true # purge etcd during init process

`etcd_cluster`

etcd cluster & group name, etcd by default

Default value: etcd, which is a fixed group name. Changing it can be useful when you want to deploy additional etcd clusters.

`etcd_safeguard`

prevent purging running etcd instance? default value is false

If enabled, running etcd instance will not be purged by etcd.yml playbook.

`etcd_clean`

purging existing etcd during initialization? default value is true

If enabled, running etcd instance will be purged by etcd.yml playbook, which makes the playbook fully idempotent.

But if etcd_safeguard is enabled, it will still abort on any running etcd instance.
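The interaction between the two switches can be summarized as a small decision function. This is a sketch of the rules as described above, not Pigsty’s actual task logic; `etcd_init_action` is a hypothetical name, and the `no-purge` outcome is only a label, since what the playbook does next when etcd_clean is false and an instance is running is not spelled out here.

```shell
#!/bin/sh
# Decide what an init run would do with an instance (sketch, not Pigsty code).
# usage: etcd_init_action <running:true|false> <etcd_clean> <etcd_safeguard>
etcd_init_action() {
  running=$1; clean=$2; safeguard=$3
  if [ "$running" != "true" ]; then
    echo "init"        # no running instance: nothing to protect or purge
  elif [ "$safeguard" = "true" ]; then
    echo "abort"       # safeguard takes precedence over etcd_clean
  elif [ "$clean" = "true" ]; then
    echo "purge"       # default settings: wipe the instance, then re-init
  else
    echo "no-purge"    # ASSUMPTION: label only; follow-up behavior not documented here
  fi
}

etcd_init_action true true false   # defaults with a running instance
```

With the defaults the result is `purge`; turning the safeguard on changes it to `abort` regardless of etcd_clean.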

`etcd_data`

etcd data directory, /data/etcd by default

`etcd_port`

etcd client port, 2379 by default

`etcd_peer_port`

etcd peer port, 2380 by default

`etcd_init`

etcd initial cluster state, new or existing

Default value: new, which will create a standalone new etcd cluster.

The value existing is used when adding a new member to an existing etcd cluster.

`etcd_election_timeout`

etcd election timeout, 1000 (ms) by default

`etcd_heartbeat_interval`

etcd heartbeat interval, 100 (ms) by default
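The two timing parameters are related: an election timeout that is too close to the heartbeat interval makes spurious leader elections more likely. The defaults above give a ratio of 10. The sketch below checks a pair of values against a minimum ratio. The factor of 5 is an assumption based on upstream etcd’s startup validation and should be verified against the etcd version in use; `etcd_timing_ok` is a hypothetical helper.

```shell
#!/bin/sh
# usage: etcd_timing_ok <election_timeout_ms> <heartbeat_interval_ms> [min_ratio]
etcd_timing_ok() {
  election=$1; heartbeat=$2
  ratio=${3:-5}   # ASSUMPTION: minimum election/heartbeat ratio, verify upstream
  if [ "$election" -ge $(( heartbeat * ratio )) ]; then
    echo "ok"
  else
    echo "too-tight"
  fi
}

etcd_timing_ok 1000 100   # Pigsty defaults: ratio of 10
```

If you raise etcd_heartbeat_interval for a high-latency network, raise etcd_election_timeout along with it so the ratio is preserved.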

3 - Playbook

How to manage etcd cluster with ansible playbooks

There’s a built-in playbook, etcd.yml, for installing an etcd cluster. You have to define the cluster in the inventory first.

Playbook

To create a new etcd cluster, run the following playbook:

./etcd.yml    # install etcd cluster on group 'etcd'

Here are the available subtasks:

etcd_assert : generate etcd identity
etcd_install : install etcd rpm packages
etcd_clean : cleanup existing etcd
- etcd_check : check etcd instance is running
- etcd_purge : remove running etcd instance & data
etcd_dir : create etcd data & conf dir
etcd_config : generate etcd config
- etcd_conf : generate etcd main config
- etcd_cert : generate etcd ssl cert
etcd_launch : launch etcd service
etcd_register : register etcd to prometheus

There’s no dedicated uninstall playbook for etcd. If you want to uninstall etcd, you can use etcd_clean subtask:

./etcd.yml -t etcd_clean

Commands

Some shortcuts and common commands:

./etcd.yml                                      # init etcd cluster 
./etcd.yml -t etcd_launch                       # restart etcd cluster
./etcd.yml -t etcd_clean                        # remove etcd cluster
./etcd.yml -t etcd_purge                        # remove etcd cluster with brute force
./etcd.yml -t etcd_conf                         # refreshing /etc/etcd/etcd.conf
./etcd.yml -l 10.10.10.12 -e etcd_init=existing # add new member to existing etcd cluster
./etcd.yml -l 10.10.10.12 -t etcd_purge         # remove member from existing etcd cluster

Safeguard

Pigsty has a safeguard mechanism for the etcd module to prevent accidental purges.

etcd_clean: true by default, which cleans existing etcd instances during init
etcd_safeguard: false by default, which does not prevent purging the etcd cluster

The default settings are useful for development, testing, and emergency rebuilds of an etcd cluster in production.

If you wish to prevent accidental purges, enable the safeguard by setting etcd_clean to false and etcd_safeguard to true. You can always override these settings with -e etcd_clean=true and -e etcd_safeguard=false on the command line.

If you wish to remove existing cluster:

./etcd.yml -l <cls> -e etcd_clean=true -t etcd_clean

The last-resort way to remove an etcd cluster is the etcd_purge subtask, which ignores the safeguard:

./etcd.yml -l <cls> -t etcd_purge

4 - Administration

Admin SOP, create & remove etcd clusters and members

Here are the administration SOPs for etcd. Check ETCD: FAQ for more questions.

Create Cluster

To create an etcd cluster, define the etcd cluster in inventory first:

etcd:
  hosts:
    10.10.10.10: { etcd_seq: 1 }
    10.10.10.11: { etcd_seq: 2 }
    10.10.10.12: { etcd_seq: 3 }
  vars: { etcd_cluster: etcd }

Then run the etcd.yml playbook.

./etcd.yml   # init etcd module on group 'etcd'

Pigsty has a safeguard mechanism to prevent accidental purges. By default, etcd_clean is true and etcd_safeguard is false, which means the playbook will purge the etcd cluster even if there are running etcd instances. In this case, etcd.yml is truly idempotent, which is useful for development, testing, and emergency rebuilds of an etcd cluster in production.

For a provisioned etcd cluster in a production environment, you can enable the safeguard to prevent accidental cleanup.

Remove Cluster

To remove an existing etcd cluster, use the etcd_clean subtask of etcd.yml. Think before you type.

./etcd.yml -t etcd_clean  # remove entire cluster, honor the etcd_safeguard
./etcd.yml -t etcd_purge  # purge by brute force, ignoring the etcd_safeguard

The etcd_clean subtask will honor the etcd_safeguard, while the etcd_purge subtask will ignore that and wipe out the entire etcd cluster.

CLI Environment

Pigsty uses the etcd v3 API by default.

Here’s an example of client environment config.

alias e="etcdctl"
alias em="etcdctl member"
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://10.10.10.10:2379
export ETCDCTL_CACERT=/etc/pki/ca.crt
export ETCDCTL_CERT=/etc/etcd/server.crt
export ETCDCTL_KEY=/etc/etcd/server.key

After setting up these environment variables, you can perform CRUD operations with the following commands:

e put a 10 ; e get a; e del a ; # V3 API
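The example environment points etcdctl at a single member. For a multi-node cluster, `ETCDCTL_ENDPOINTS` accepts a comma-separated list, so the client is not tied to one member. A minimal sketch of building that list, assuming the three-node addresses used elsewhere in this document and a hypothetical helper named `etcd_endpoints`:

```shell
#!/bin/sh
# Join member IPs into a comma-separated list of https client URLs.
# usage: etcd_endpoints <client_port> <ip> ...
etcd_endpoints() {
  port=$1; shift
  out=""
  for ip in "$@"; do
    # ${out:+,} adds the separator only after the first entry
    out="${out}${out:+,}https://${ip}:${port}"
  done
  echo "$out"
}

# Default client port 2379, three-node example cluster
ETCDCTL_ENDPOINTS=$(etcd_endpoints 2379 10.10.10.10 10.10.10.11 10.10.10.12)
export ETCDCTL_ENDPOINTS
echo "$ETCDCTL_ENDPOINTS"
```

This list is one of the endpoint references that has to be refreshed whenever cluster membership changes.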

Reload Config

After a permanent etcd cluster membership change, you’ll have to refresh these 4 etcd endpoint references:

config file of existing etcd members
etcdctl client environment variables
patroni dcs endpoint config
vip-manager dcs endpoint config

To refresh etcd config file /etc/etcd/etcd.conf on existing members:

./etcd.yml -t etcd_conf                           # refresh /etc/etcd/etcd.conf with latest status
ansible etcd -f 1 -b -a 'systemctl restart etcd'  # optional: restart etcd

To refresh etcdctl client environment variables:

./etcd.yml -t etcd_env                            # refresh /etc/profile.d/etcdctl.sh

To update etcd endpoints reference on patroni:

./pgsql.yml -t pg_conf                            # regenerate patroni config
ansible all -f 1 -b -a 'systemctl reload patroni' # reload patroni config

To update etcd endpoints reference on vip-manager (optional, only if you are using an L2 VIP):

./pgsql.yml -t pg_vip_config                           # regenerate vip-manager config
ansible all -f 1 -b -a 'systemctl restart vip-manager' # restart vip-manager to use new config

Add Member

ETCD Reference: Add a member

You can add new members to an existing etcd cluster in 5 steps:

issue the etcdctl member add command to tell the existing cluster that a new member is coming (use learner mode)
update the inventory group etcd with the new instance
init the new member with etcd_init=existing, so it joins the existing cluster rather than creating a new one (VERY IMPORTANT)
promote the new member from learner to follower
update the etcd endpoint references with reload-config

Short Version

etcdctl member add <etcd-?> --learner=true --peer-urls=https://<new_ins_ip>:2380
./etcd.yml -l <new_ins_ip> -e etcd_init=existing
etcdctl member promote <new_ins_server_id>

Detail: Add member to etcd cluster

Here are the details. Let’s start from a single etcd instance.

etcd:
  hosts:
    10.10.10.10: { etcd_seq: 1 } # <--- this is the existing instance
    10.10.10.11: { etcd_seq: 2 } # <--- add this new member definition to inventory
  vars: { etcd_cluster: etcd }

Add a learner instance etcd-2 to the cluster with etcdctl member add:

# tell the existing cluster that a new member etcd-2 is coming
$ etcdctl member add etcd-2 --learner=true --peer-urls=https://10.10.10.11:2380
Member 33631ba6ced84cf8 added to cluster 6646fbcf5debc68f

ETCD_NAME="etcd-2"
ETCD_INITIAL_CLUSTER="etcd-2=https://10.10.10.11:2380,etcd-1=https://10.10.10.10:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.10.10.11:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

Check the member list with etcdctl member list (or em list), we can see an unstarted member:

33631ba6ced84cf8, unstarted, , https://10.10.10.11:2380, , true
429ee12c7fbab5c1, started, etcd-1, https://10.10.10.10:2380, https://10.10.10.10:2379, false
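The comma-separated output above has six fields: member ID, status, name, peer URL, client URL, and the learner flag. A small filter over that format can pick out members that still need attention. This is a sketch that assumes the default (simple) output format shown here; `etcd_unstarted_members` and `etcd_learners` are hypothetical helper names.

```shell
#!/bin/sh
# Both helpers read `etcdctl member list` output on stdin.
# Fields are separated by a comma followed by optional spaces.

# IDs of members whose status (field 2) is not "started"
etcd_unstarted_members() { awk -F', *' 'NF && $2 != "started" { print $1 }'; }

# IDs of members whose learner flag (field 6) is "true"
etcd_learners() { awk -F', *' 'NF && $6 == "true" { print $1 }'; }

# Sample copied from the listing above.
# In practice: etcdctl member list | etcd_learners
sample='33631ba6ced84cf8, unstarted, , https://10.10.10.11:2380, , true
429ee12c7fbab5c1, started, etcd-1, https://10.10.10.10:2380, https://10.10.10.10:2379, false'

printf '%s\n' "$sample" | etcd_unstarted_members
```

An ID that appears in the learner list after the new instance has started is the one to pass to `etcdctl member promote`.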

Init the new etcd instance etcd-2 with etcd.yml playbook, we can see the new member is started:

$ ./etcd.yml -l 10.10.10.11 -e etcd_init=existing    # etcd_init=existing must be set
...
33631ba6ced84cf8, started, etcd-2, https://10.10.10.11:2380, https://10.10.10.11:2379, true
429ee12c7fbab5c1, started, etcd-1, https://10.10.10.10:2380, https://10.10.10.10:2379, false

Promote the new member from learner to follower:

$ etcdctl member promote 33631ba6ced84cf8   # promote the new learner
Member 33631ba6ced84cf8 promoted in cluster 6646fbcf5debc68f

$ em list                # check again, the new member is started
33631ba6ced84cf8, started, etcd-2, https://10.10.10.11:2380, https://10.10.10.11:2379, false
429ee12c7fbab5c1, started, etcd-1, https://10.10.10.10:2380, https://10.10.10.10:2379, false

The new member is now added. Don’t forget to reload config.

Repeat the steps above to add more members. Remember to use at least 3 members for production.

Remove Member

To remove a member from an existing etcd cluster, it usually takes 3 steps:

remove it from the inventory (or comment it out) and reload config
kick it out of the cluster with the etcdctl member remove <server_id> command
temporarily add it back to the inventory and purge that instance, then remove it from the inventory permanently

Detail: Remove member from etcd cluster

Here are the details. Let’s start from a 3-instance etcd cluster:

etcd:
  hosts:
    10.10.10.10: { etcd_seq: 1 }
    10.10.10.11: { etcd_seq: 2 }
    10.10.10.12: { etcd_seq: 3 }   # <---- comment this line, then reload-config
  vars: { etcd_cluster: etcd }

Then, you’ll have to actually kick it out of the cluster with the etcdctl member remove command:

$ etcdctl member list
429ee12c7fbab5c1, started, etcd-1, https://10.10.10.10:2380, https://10.10.10.10:2379, false
33631ba6ced84cf8, started, etcd-2, https://10.10.10.11:2380, https://10.10.10.11:2379, false
93fcf23b220473fb, started, etcd-3, https://10.10.10.12:2380, https://10.10.10.12:2379, false  # <--- remove this

$ etcdctl member remove 93fcf23b220473fb  # kick it from cluster
Member 93fcf23b220473fb removed from cluster 6646fbcf5debc68f
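`etcdctl member remove` takes the hexadecimal member ID rather than the instance name. Looking the ID up by name avoids copying the wrong one by hand. The sketch below assumes the simple output format shown above and uses a hypothetical helper named `etcd_member_id`.

```shell
#!/bin/sh
# Print the member ID (field 1) of the member whose name (field 3) matches.
# usage: etcdctl member list | etcd_member_id <name>
etcd_member_id() {
  awk -F', *' -v name="$1" '$3 == name { print $1 }'
}

# Sample copied from the listing above
sample='429ee12c7fbab5c1, started, etcd-1, https://10.10.10.10:2380, https://10.10.10.10:2379, false
33631ba6ced84cf8, started, etcd-2, https://10.10.10.11:2380, https://10.10.10.11:2379, false
93fcf23b220473fb, started, etcd-3, https://10.10.10.12:2380, https://10.10.10.12:2379, false'

printf '%s\n' "$sample" | etcd_member_id etcd-3
```

Against a live cluster the same lookup could feed the removal directly, for example `etcdctl member remove "$(etcdctl member list | etcd_member_id etcd-3)"`, after checking that the lookup returned exactly one ID.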

Finally, shut down the instance and purge it from the node. To do so, temporarily uncomment the member in the inventory, then purge it with the etcd.yml playbook:

./etcd.yml -t etcd_purge -l 10.10.10.12   # purge it (the member is in inventory again)

After that, remove the member from inventory permanently, all clear!

5 - Monitoring

Etcd monitoring metrics, dashboards, and alerting rules

Dashboard

The ETCD module provides a monitoring dashboard: Etcd Overview.

ETCD Overview Dashboard

ETCD Overview: Overview of the ETCD cluster

This dashboard provides key information about the ETCD status, with the most notable being ETCD Aliveness, which displays the overall service status of the ETCD cluster.

Red bands indicate periods when instances are unavailable, while the blue-gray bands below show when the entire cluster is unavailable.

Alert Rules

Pigsty provides the following five preset alert rules for Etcd, defined in files/prometheus/rules/etcd.yml:

EtcdServerDown: Etcd node down, critical alert
EtcdNoLeader: Etcd cluster has no leader, critical alert
EtcdQuotaFull: Etcd quota usage exceeds 90%, warning
EtcdNetworkPeerRTSlow: Etcd network latency is slow, notice
EtcdWalFsyncSlow: Etcd disk fsync is slow, notice

#==============================================================#
#                         Aliveness                            #
#==============================================================#
# etcd server instance down
- alert: EtcdServerDown
  expr: etcd_up < 1
  for: 1m
  labels: { level: 0, severity: CRIT, category: etcd }
  annotations:
    summary: "CRIT EtcdServerDown {{ $labels.ins }}@{{ $labels.instance }}"
    description: |
      etcd_up[ins={{ $labels.ins }}, instance={{ $labels.instance }}] = {{ $value }} < 1
      http://g.pigsty/d/etcd-overview      

#==============================================================#
#                         Error                                #
#==============================================================#
# Etcd no Leader triggers a P0 alert immediately
# if dcs_failsafe mode is not enabled, this may lead to global outage
- alert: EtcdNoLeader
  expr: min(etcd_server_has_leader) by (cls) < 1
  for: 15s
  labels: { level: 0, severity: CRIT, category: etcd }
  annotations:
    summary: "CRIT EtcdNoLeader: {{ $labels.cls }} {{ $value }}"
    description: |
      etcd_server_has_leader[cls={{ $labels.cls }}] = {{ $value }} < 1
      http://g.pigsty/d/etcd-overview?from=now-5m&to=now&var-cls={{$labels.cls}}      

#==============================================================#
#                        Saturation                            #
#==============================================================#
- alert: EtcdQuotaFull
  expr: etcd:cls:quota_usage > 0.90
  for: 1m
  labels: { level: 1, severity: WARN, category: etcd }
  annotations:
    summary: "WARN EtcdQuotaFull: {{ $labels.cls }}"
    description: |
      etcd:cls:quota_usage[cls={{ $labels.cls }}] = {{ $value | printf "%.3f" }} > 90%      

#==============================================================#
#                         Latency                              #
#==============================================================#
# etcd network peer rt p95 > 200ms for 1m
- alert: EtcdNetworkPeerRTSlow
  expr: etcd:ins:network_peer_rt_p95_5m > 0.200
  for: 1m
  labels: { level: 2, severity: INFO, category: etcd }
  annotations:
    summary: "INFO EtcdNetworkPeerRTSlow: {{ $labels.cls }} {{ $labels.ins }}"
    description: |
      etcd:ins:network_peer_rt_p95_5m[cls={{ $labels.cls }}, ins={{ $labels.ins }}] = {{ $value }} > 200ms
      http://g.pigsty/d/etcd-instance?from=now-10m&to=now&var-cls={{ $labels.cls }}      

# Etcd wal fsync rt p95 > 50ms
- alert: EtcdWalFsyncSlow
  expr: etcd:ins:wal_fsync_rt_p95_5m > 0.050
  for: 1m
  labels: { level: 2, severity: INFO, category: etcd }
  annotations:
    summary: "INFO EtcdWalFsyncSlow: {{ $labels.cls }} {{ $labels.ins }}"
    description: |
      etcd:ins:wal_fsync_rt_p95_5m[cls={{ $labels.cls }}, ins={{ $labels.ins }}] = {{ $value }} > 50ms
      http://g.pigsty/d/etcd-instance?from=now-10m&to=now&var-cls={{ $labels.cls }}
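The `EtcdQuotaFull` rule fires when the cluster’s quota usage ratio stays above 0.90. The recording rule behind `etcd:cls:quota_usage` is not shown here; the sketch below assumes it is roughly the database size divided by the backend quota (for example `etcd_mvcc_db_total_size_in_bytes` over `etcd_server_quota_backend_bytes`) and reproduces the threshold check for a single pair of numbers. `etcd_quota_state` is a hypothetical helper.

```shell
#!/bin/sh
# Compare used bytes against quota bytes with the same 0.90 threshold
# that the EtcdQuotaFull rule uses.
# usage: etcd_quota_state <db_size_bytes> <quota_bytes>
etcd_quota_state() {
  awk -v used="$1" -v quota="$2" 'BEGIN {
    if (quota <= 0) { print "UNKNOWN"; exit }   # avoid division by zero
    if (used / quota > 0.90) print "WARN"; else print "OK"
  }'
}

etcd_quota_state 2000000000 2147483648   # about 93% of a 2 GiB quota
```

A check like this can be handy when deciding whether a compaction and defragmentation pass is due before the alert fires.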

6 - Metrics

Pigsty ETCD module metric list

ETCD module has 177 available metrics

Metric Name	Type	Labels	Description
etcd:ins:backend_commit_rt_p99_5m	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd:ins:disk_fsync_rt_p99_5m	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd:ins:network_peer_rt_p99_1m	Unknown	`cls`, `To`, `ins`, `instance`, `job`, `ip`	N/A
etcd_cluster_version	gauge	`cls`, `cluster_version`, `ins`, `instance`, `job`, `ip`	Which version is running. 1 for ‘cluster_version’ label with current cluster version
etcd_debugging_auth_revision	gauge	`cls`, `ins`, `instance`, `job`, `ip`	The current revision of auth store.
etcd_debugging_disk_backend_commit_rebalance_duration_seconds_bucket	Unknown	`cls`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_debugging_disk_backend_commit_rebalance_duration_seconds_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_debugging_disk_backend_commit_rebalance_duration_seconds_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_debugging_disk_backend_commit_spill_duration_seconds_bucket	Unknown	`cls`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_debugging_disk_backend_commit_spill_duration_seconds_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_debugging_disk_backend_commit_spill_duration_seconds_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_debugging_disk_backend_commit_write_duration_seconds_bucket	Unknown	`cls`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_debugging_disk_backend_commit_write_duration_seconds_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_debugging_disk_backend_commit_write_duration_seconds_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_debugging_lease_granted_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	The total number of granted leases.
etcd_debugging_lease_renewed_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	The number of renewed leases seen by the leader.
etcd_debugging_lease_revoked_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	The total number of revoked leases.
etcd_debugging_lease_ttl_total_bucket	Unknown	`cls`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_debugging_lease_ttl_total_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_debugging_lease_ttl_total_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_debugging_mvcc_compact_revision	gauge	`cls`, `ins`, `instance`, `job`, `ip`	The revision of the last compaction in store.
etcd_debugging_mvcc_current_revision	gauge	`cls`, `ins`, `instance`, `job`, `ip`	The current revision of store.
etcd_debugging_mvcc_db_compaction_keys_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	Total number of db keys compacted.
etcd_debugging_mvcc_db_compaction_last	gauge	`cls`, `ins`, `instance`, `job`, `ip`	The unix time of the last db compaction. Resets to 0 on start.
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket	Unknown	`cls`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket	Unknown	`cls`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_debugging_mvcc_events_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	Total number of events sent by this member.
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_bucket	Unknown	`cls`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_debugging_mvcc_keys_total	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Total number of keys.
etcd_debugging_mvcc_pending_events_total	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Total number of pending events to be sent.
etcd_debugging_mvcc_range_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	Total number of ranges seen by this member.
etcd_debugging_mvcc_slow_watcher_total	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Total number of unsynced slow watchers.
etcd_debugging_mvcc_total_put_size_in_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	The total size of put kv pairs seen by this member.
etcd_debugging_mvcc_watch_stream_total	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Total number of watch streams.
etcd_debugging_mvcc_watcher_total	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Total number of watchers.
etcd_debugging_server_lease_expired_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	The total number of expired leases.
etcd_debugging_snap_save_marshalling_duration_seconds_bucket	Unknown	`cls`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_debugging_snap_save_marshalling_duration_seconds_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_debugging_snap_save_marshalling_duration_seconds_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_debugging_snap_save_total_duration_seconds_bucket	Unknown	`cls`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_debugging_snap_save_total_duration_seconds_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_debugging_snap_save_total_duration_seconds_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_debugging_store_expires_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	Total number of expired keys.
etcd_debugging_store_reads_total	counter	`cls`, `action`, `ins`, `instance`, `job`, `ip`	Total number of reads action by (get/getRecursive), local to this member.
etcd_debugging_store_watch_requests_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	Total number of incoming watch requests (new or reestablished).
etcd_debugging_store_watchers	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Count of currently active watchers.
etcd_debugging_store_writes_total	counter	`cls`, `action`, `ins`, `instance`, `job`, `ip`	Total number of writes (e.g. set/compareAndDelete) seen by this member.
etcd_disk_backend_commit_duration_seconds_bucket	Unknown	`cls`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_disk_backend_commit_duration_seconds_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_disk_backend_commit_duration_seconds_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_disk_backend_defrag_duration_seconds_bucket	Unknown	`cls`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_disk_backend_defrag_duration_seconds_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_disk_backend_defrag_duration_seconds_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_disk_backend_snapshot_duration_seconds_bucket	Unknown	`cls`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_disk_backend_snapshot_duration_seconds_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_disk_backend_snapshot_duration_seconds_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_disk_defrag_inflight	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Whether or not defrag is active on the member. 1 means active, 0 means not.
etcd_disk_wal_fsync_duration_seconds_bucket	Unknown	`cls`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_disk_wal_fsync_duration_seconds_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_disk_wal_fsync_duration_seconds_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_disk_wal_write_bytes_total	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Total number of bytes written in WAL.
etcd_grpc_proxy_cache_hits_total	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Total number of cache hits
etcd_grpc_proxy_cache_keys_total	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Total number of keys/ranges cached
etcd_grpc_proxy_cache_misses_total	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Total number of cache misses
etcd_grpc_proxy_events_coalescing_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	Total number of events coalescing
etcd_grpc_proxy_watchers_coalescing_total	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Total number of current watchers coalescing
etcd_mvcc_db_open_read_transactions	gauge	`cls`, `ins`, `instance`, `job`, `ip`	The number of currently open read transactions
etcd_mvcc_db_total_size_in_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Total size of the underlying database physically allocated in bytes.
etcd_mvcc_db_total_size_in_use_in_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Total size of the underlying database logically in use in bytes.
etcd_mvcc_delete_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	Total number of deletes seen by this member.
etcd_mvcc_hash_duration_seconds_bucket	Unknown	`cls`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_mvcc_hash_duration_seconds_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_mvcc_hash_duration_seconds_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_mvcc_hash_rev_duration_seconds_bucket	Unknown	`cls`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_mvcc_hash_rev_duration_seconds_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_mvcc_hash_rev_duration_seconds_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_mvcc_put_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	Total number of puts seen by this member.
etcd_mvcc_range_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	Total number of ranges seen by this member.
etcd_mvcc_txn_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	Total number of txns seen by this member.
etcd_network_active_peers	gauge	`cls`, `ins`, `Local`, `instance`, `job`, `ip`, `Remote`	The current number of active peer connections.
etcd_network_client_grpc_received_bytes_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	The total number of bytes received from grpc clients.
etcd_network_client_grpc_sent_bytes_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	The total number of bytes sent to grpc clients.
etcd_network_peer_received_bytes_total	counter	`cls`, `ins`, `instance`, `job`, `ip`, `From`	The total number of bytes received from peers.
etcd_network_peer_round_trip_time_seconds_bucket	Unknown	`cls`, `To`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_network_peer_round_trip_time_seconds_count	Unknown	`cls`, `To`, `ins`, `instance`, `job`, `ip`	N/A
etcd_network_peer_round_trip_time_seconds_sum	Unknown	`cls`, `To`, `ins`, `instance`, `job`, `ip`	N/A
etcd_network_peer_sent_bytes_total	counter	`cls`, `To`, `ins`, `instance`, `job`, `ip`	The total number of bytes sent to peers.
etcd_server_apply_duration_seconds_bucket	Unknown	`cls`, `version`, `ins`, `instance`, `job`, `le`, `success`, `ip`, `op`	N/A
etcd_server_apply_duration_seconds_count	Unknown	`cls`, `version`, `ins`, `instance`, `job`, `success`, `ip`, `op`	N/A
etcd_server_apply_duration_seconds_sum	Unknown	`cls`, `version`, `ins`, `instance`, `job`, `success`, `ip`, `op`	N/A
etcd_server_client_requests_total	counter	`client_api_version`, `cls`, `ins`, `instance`, `type`, `job`, `ip`	The total number of client requests per client version.
etcd_server_go_version	gauge	`cls`, `ins`, `instance`, `job`, `server_go_version`, `ip`	Which Go version server is running with. 1 for ‘server_go_version’ label with current version.
etcd_server_has_leader	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Whether or not a leader exists. 1 is existence, 0 is not.
etcd_server_health_failures	counter	`cls`, `ins`, `instance`, `job`, `ip`	The total number of failed health checks
etcd_server_health_success	counter	`cls`, `ins`, `instance`, `job`, `ip`	The total number of successful health checks
etcd_server_heartbeat_send_failures_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	The total number of leader heartbeat send failures (likely overloaded from slow disk).
etcd_server_id	gauge	`cls`, `ins`, `instance`, `job`, `server_id`, `ip`	Server or member ID in hexadecimal format. 1 for ‘server_id’ label with current ID.
etcd_server_is_leader	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Whether or not this member is a leader. 1 if is, 0 otherwise.
etcd_server_is_learner	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Whether or not this member is a learner. 1 if is, 0 otherwise.
etcd_server_leader_changes_seen_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	The number of leader changes seen.
etcd_server_learner_promote_successes	counter	`cls`, `ins`, `instance`, `job`, `ip`	The total number of successful learner promotions while this member is leader.
etcd_server_proposals_applied_total	gauge	`cls`, `ins`, `instance`, `job`, `ip`	The total number of consensus proposals applied.
etcd_server_proposals_committed_total	gauge	`cls`, `ins`, `instance`, `job`, `ip`	The total number of consensus proposals committed.
etcd_server_proposals_failed_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	The total number of failed proposals seen.
etcd_server_proposals_pending	gauge	`cls`, `ins`, `instance`, `job`, `ip`	The current number of pending proposals to commit.
etcd_server_quota_backend_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Current backend storage quota size in bytes.
etcd_server_read_indexes_failed_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	The total number of failed read indexes seen.
etcd_server_slow_apply_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	The total number of slow apply requests (likely overloaded from slow disk).
etcd_server_slow_read_indexes_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	The total number of pending read indexes not in sync with leader’s or timed out read index requests.
etcd_server_snapshot_apply_in_progress_total	gauge	`cls`, `ins`, `instance`, `job`, `ip`	1 if the server is applying the incoming snapshot. 0 if none.
etcd_server_version	gauge	`cls`, `server_version`, `ins`, `instance`, `job`, `ip`	Which version is running. 1 for ‘server_version’ label with current version.
etcd_snap_db_fsync_duration_seconds_bucket	Unknown	`cls`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_snap_db_fsync_duration_seconds_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_snap_db_fsync_duration_seconds_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_snap_db_save_total_duration_seconds_bucket	Unknown	`cls`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_snap_db_save_total_duration_seconds_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_snap_db_save_total_duration_seconds_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_snap_fsync_duration_seconds_bucket	Unknown	`cls`, `ins`, `instance`, `job`, `le`, `ip`	N/A
etcd_snap_fsync_duration_seconds_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_snap_fsync_duration_seconds_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
etcd_up	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
go_gc_duration_seconds	summary	`cls`, `ins`, `instance`, `quantile`, `job`, `ip`	A summary of the pause duration of garbage collection cycles.
go_gc_duration_seconds_count	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
go_gc_duration_seconds_sum	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
go_goroutines	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of goroutines that currently exist.
go_info	gauge	`cls`, `version`, `ins`, `instance`, `job`, `ip`	Information about the Go environment.
go_memstats_alloc_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of bytes allocated and still in use.
go_memstats_alloc_bytes_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	Total number of bytes allocated, even if freed.
go_memstats_buck_hash_sys_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of bytes used by the profiling bucket hash table.
go_memstats_frees_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	Total number of frees.
go_memstats_gc_cpu_fraction	gauge	`cls`, `ins`, `instance`, `job`, `ip`	The fraction of this program’s available CPU time used by the GC since the program started.
go_memstats_gc_sys_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of bytes used for garbage collection system metadata.
go_memstats_heap_alloc_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of heap bytes allocated and still in use.
go_memstats_heap_idle_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of heap bytes waiting to be used.
go_memstats_heap_inuse_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of heap bytes that are in use.
go_memstats_heap_objects	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of allocated objects.
go_memstats_heap_released_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of heap bytes released to OS.
go_memstats_heap_sys_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of heap bytes obtained from system.
go_memstats_last_gc_time_seconds	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of seconds since 1970 of last garbage collection.
go_memstats_lookups_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	Total number of pointer lookups.
go_memstats_mallocs_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	Total number of mallocs.
go_memstats_mcache_inuse_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of bytes in use by mcache structures.
go_memstats_mcache_sys_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of bytes used for mcache structures obtained from system.
go_memstats_mspan_inuse_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of bytes in use by mspan structures.
go_memstats_mspan_sys_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of bytes used for mspan structures obtained from system.
go_memstats_next_gc_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of heap bytes when next garbage collection will take place.
go_memstats_other_sys_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of bytes used for other system allocations.
go_memstats_stack_inuse_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of bytes in use by the stack allocator.
go_memstats_stack_sys_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of bytes obtained from system for stack allocator.
go_memstats_sys_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of bytes obtained from system.
go_threads	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of OS threads created.
grpc_server_handled_total	counter	`cls`, `ins`, `instance`, `grpc_code`, `job`, `grpc_method`, `grpc_type`, `ip`, `grpc_service`	Total number of RPCs completed on the server, regardless of success or failure.
grpc_server_msg_received_total	counter	`cls`, `ins`, `instance`, `job`, `grpc_type`, `grpc_method`, `ip`, `grpc_service`	Total number of RPC stream messages received on the server.
grpc_server_msg_sent_total	counter	`cls`, `ins`, `instance`, `job`, `grpc_type`, `grpc_method`, `ip`, `grpc_service`	Total number of gRPC stream messages sent by the server.
grpc_server_started_total	counter	`cls`, `ins`, `instance`, `job`, `grpc_type`, `grpc_method`, `ip`, `grpc_service`	Total number of RPCs started on the server.
os_fd_limit	gauge	`cls`, `ins`, `instance`, `job`, `ip`	The file descriptor limit.
os_fd_used	gauge	`cls`, `ins`, `instance`, `job`, `ip`	The number of used file descriptors.
process_cpu_seconds_total	counter	`cls`, `ins`, `instance`, `job`, `ip`	Total user and system CPU time spent in seconds.
process_max_fds	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Maximum number of open file descriptors.
process_open_fds	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Number of open file descriptors.
process_resident_memory_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Resident memory size in bytes.
process_start_time_seconds	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Start time of the process since unix epoch in seconds.
process_virtual_memory_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Virtual memory size in bytes.
process_virtual_memory_max_bytes	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Maximum amount of virtual memory available in bytes.
promhttp_metric_handler_requests_in_flight	gauge	`cls`, `ins`, `instance`, `job`, `ip`	Current number of scrapes being served.
promhttp_metric_handler_requests_total	counter	`cls`, `ins`, `instance`, `job`, `ip`, `code`	Total number of scrapes by HTTP status code.
scrape_duration_seconds	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
scrape_samples_post_metric_relabeling	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
scrape_samples_scraped	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
scrape_series_added	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
up	Unknown	`cls`, `ins`, `instance`, `job`, `ip`	N/A
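
As a rough illustration of how a few of the metrics above might be combined, the following sketch derives some health indicators for a single instance. The function name `etcd_health` and all sample values are hypothetical and not part of Pigsty; the metric names are taken from the table.

```python
def etcd_health(samples: dict) -> dict:
    """Derive a few flags and ratios from one instance's metric samples."""
    return {
        # 1 means the member currently sees a leader
        "has_leader": samples["etcd_server_has_leader"] == 1,
        # logical database usage relative to the backend quota
        "db_in_use_ratio": samples["etcd_mvcc_db_total_size_in_use_in_bytes"]
        / samples["etcd_server_quota_backend_bytes"],
        # share of the file descriptor limit that is in use
        "fd_ratio": samples["os_fd_used"] / samples["os_fd_limit"],
    }


# Made-up sample values for one instance, for illustration only.
sample = {
    "etcd_server_has_leader": 1,
    "etcd_mvcc_db_total_size_in_use_in_bytes": 4 * 1024**2,
    "etcd_server_quota_backend_bytes": 16 * 1024**3,
    "os_fd_used": 64,
    "os_fd_limit": 65536,
}
print(etcd_health(sample))
```

In practice these values would come from the monitoring system rather than a hand-written dict, and any alert thresholds on the ratios are left to the operator.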

7 - FAQ

Frequently asked questions about the Pigsty ETCD (DCS) module

What is the role of etcd in Pigsty?

etcd is a distributed, reliable key-value store used to store the most critical config / consensus data in the deployment. Pigsty uses etcd as the DCS (Distributed Configuration Store) service for Patroni, which stores the high availability status information of the PostgreSQL cluster in it.

How many etcd instances should I choose?

If half or more of the etcd instances are down, the etcd cluster and its service become unavailable.

For example, a 3-node etcd cluster can tolerate at most one node failure, and the other two nodes can still work normally; while a 5-node etcd cluster can tolerate 2 node failures.

Beware that learner instances in the etcd cluster do not count as voting members. In a 3-node etcd cluster with one learner instance, the actual member count is 2, so no node failure can be tolerated.

It is advisable to choose an odd number of etcd instances, since an even-sized cluster tolerates no more node failures than the odd-sized cluster one node smaller. It is recommended to use 3 or 5 nodes for the production environment.
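
The majority rule behind these numbers can be sketched as follows. This is a minimal illustration of the quorum arithmetic, not code from Pigsty or etcd; the function name `fault_tolerance` is hypothetical.

```python
def fault_tolerance(total_members: int, learners: int = 0) -> int:
    """Return how many voting members can fail while a quorum survives."""
    voting = total_members - learners  # learners do not vote
    if voting < 1:
        raise ValueError("at least one voting member is required")
    quorum = voting // 2 + 1  # strict majority of the voting members
    return voting - quorum


for size in (1, 2, 3, 4, 5):
    print(size, "members ->", fault_tolerance(size), "tolerated failure(s)")
```

With this rule, 4 members tolerate the same single failure as 3, and `fault_tolerance(3, learners=1)` returns 0, matching the learner caveat above.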

What is the impact of etcd failure?

If the etcd cluster is unavailable, it will affect the control plane of Pigsty, but not the data plane - the existing PostgreSQL clusters will continue to serve, but admin operations through Patroni will not work.

During an etcd failure, PostgreSQL HA is unable to perform automatic failover, and most Patroni operations are blocked, such as edit-config, restart, and switchover. Admin tasks run through Ansible playbooks, such as creating databases, creating users, and reloading HBA rules and services, are usually not affected by the etcd failure. You can also always operate the PostgreSQL cluster directly to achieve most of the Patroni functions.

Beware that the above description only applies to newer versions of Patroni (>=3.0, Pigsty >= 2.0). If you are using an older version of Patroni (<3.0, corresponding to Pigsty version 1.x), an etcd / consul failure has a serious impact: all PostgreSQL clusters are demoted and reject write requests, so the etcd failure is amplified into a global PostgreSQL failure. The DCS Failsafe feature introduced in Patroni 3.0 has significantly improved this situation.

What data is stored in the etcd cluster?

etcd is only used for PostgreSQL HA consensus in Pigsty; no other data is stored in etcd by default.

This consensus data is managed by Patroni, and if it is lost from etcd, Patroni will automatically rebuild it.

Thus, by default, the etcd in Pigsty can be regarded as a “stateless service” that is disposable, which brings great convenience to maintenance work.

If you use etcd for other purposes, such as storing metadata for Kubernetes, or storing other data, you need to back up the etcd data yourself and restore the data after the etcd cluster is restored.

How to recover from etcd failure?

Since etcd is disposable in Pigsty, you can quickly stop the bleeding by “restarting” or “redeploying” etcd in case of failure.

To restart the etcd cluster, you can use the following Ansible command (or systemctl restart etcd):

./etcd.yml -t etcd_launch

To reset the etcd cluster, you can run this playbook, which will nuke the etcd cluster and redeploy it:

./etcd.yml

Beware that if you use etcd to store other data, don’t forget to back up the etcd data before nuking the etcd cluster.

Is there any maintenance work for the etcd cluster?

In short: do not use all the quota of etcd.

etcd has a default database size quota of 2 GiB. If your etcd database size exceeds this limit, etcd will reject write requests. Meanwhile, as etcd’s data model illustrates, each write generates a new version (a.k.a. revision), so if your etcd cluster is written to frequently, even with very few keys, the database size may keep growing until it reaches the quota limit and writes fail.

You can keep the database size under the quota through auto compaction, manual compaction, defragmentation, and quota increase; please refer to the etcd official maintenance guide.
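
To get a feel for how quickly uncompacted revisions could consume the quota, here is a back-of-envelope sketch. It assumes every write adds a fixed number of bytes and ignores storage overhead, compaction, and defragmentation, so it is only a rough estimate; the function name and the sample numbers are hypothetical.

```python
GIB = 1024 ** 3


def hours_until_quota(quota_bytes: int, bytes_per_write: int,
                      writes_per_second: float) -> float:
    """Estimate hours until the quota is reached if no revision is ever compacted."""
    if bytes_per_write <= 0 or writes_per_second <= 0:
        raise ValueError("bytes_per_write and writes_per_second must be positive")
    seconds = quota_bytes / (bytes_per_write * writes_per_second)
    return seconds / 3600


# Hypothetical workload: 1 KiB per write, 10 writes per second, 2 GiB quota.
print(round(hours_until_quota(2 * GIB, 1024, 10), 1))
```

Under these assumed numbers the quota would be reached in a few days, even if the writes keep hitting the same handful of keys.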

Pigsty has auto compaction enabled by default since v2.6, so you usually don’t have to worry about etcd filling up. For versions before v2.6, we strongly recommend enabling etcd’s auto compaction feature in the production environment.

Filling up etcd may lead to PostgreSQL failure!

For Pigsty v2.0 - v2.5 users, we strongly recommend upgrading to a newer version, or following the instructions below to enable etcd auto compaction!

How to enable etcd auto compaction?

If you are using an earlier version of Pigsty (v2.0 - v2.5), we strongly recommend that you enable etcd’s auto compaction feature in the production environment.

Edit the etcd config template in roles/etcd/templates/etcd.conf.j2 with these 3 new lines:

auto-compaction-mode: periodic
auto-compaction-retention: "24h"
quota-backend-bytes: 17179869184

You can set all the PostgreSQL clusters to maintenance mode and then redeploy the etcd cluster with ./etcd.yml to apply these changes.

It will increase the etcd default quota from 2 GiB to 16 GiB, and ensure that only the most recent day’s write history is retained, avoiding the infinite growth of the etcd database size.
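
The quota value is simply 16 GiB expressed in bytes. The sketch below recomputes it and renders the three config lines; the helper name `etcd_compaction_settings` is hypothetical and only illustrates the arithmetic, it is not part of Pigsty.

```python
GIB = 1024 ** 3


def etcd_compaction_settings(quota_gib: int, retention_hours: int) -> str:
    """Render the three config lines for a given quota and retention window."""
    return "\n".join([
        "auto-compaction-mode: periodic",
        f'auto-compaction-retention: "{retention_hours}h"',
        f"quota-backend-bytes: {quota_gib * GIB}",
    ])


print(etcd_compaction_settings(quota_gib=16, retention_hours=24))
```

Called with 16 and 24, it reproduces the three lines shown above, which makes it easy to double-check the byte value when choosing a different quota.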

Where is the PostgreSQL HA data stored in etcd?

Patroni will use the pg_namespace (default is /pg) as the prefix for all metadata keys in etcd, followed by the PostgreSQL cluster name.

For example, the metadata keys of a PG cluster named pg-meta are stored under /pg/pg-meta, which may look like this:

/pg/pg-meta/config
{"ttl":30,"loop_wait":10,"retry_timeout":10,"primary_start_timeout":10,"maximum_lag_on_failover":1048576,"maximum_lag_on_syncnode":-1,"primary_stop_timeout":30,"synchronous_mode":false,"synchronous_mode_strict":false,"failsafe_mode":true,"pg_version":16,"pg_cluster":"pg-meta","pg_shard":"pg-meta","pg_group":0,"postgresql":{"use_slots":true,"use_pg_rewind":true,"remove_data_directory_on_rewind_failure":true,"parameters":{"max_connections":100,"superuser_reserved_connections":10,"max_locks_per_transaction":200,"max_prepared_transactions":0,"track_commit_timestamp":"on","wal_level":"logical","wal_log_hints":"on","max_worker_processes":16,"max_wal_senders":50,"max_replication_slots":50,"password_encryption":"scram-sha-256","ssl":"on","ssl_cert_file":"/pg/cert/server.crt","ssl_key_file":"/pg/cert/server.key","ssl_ca_file":"/pg/cert/ca.crt","shared_buffers":"7969MB","maintenance_work_mem":"1993MB","work_mem":"79MB","max_parallel_workers":8,"max_parallel_maintenance_workers":2,"max_parallel_workers_per_gather":0,"hash_mem_multiplier":8.0,"huge_pages":"try","temp_file_limit":"7GB","vacuum_cost_delay":"20ms","vacuum_cost_limit":2000,"bgwriter_delay":"10ms","bgwriter_lru_maxpages":800,"bgwriter_lru_multiplier":5.0,"min_wal_size":"7GB","max_wal_size":"28GB","max_slot_wal_keep_size":"42GB","wal_buffers":"16MB","wal_writer_delay":"20ms","wal_writer_flush_after":"1MB","commit_delay":20,"commit_siblings":10,"checkpoint_timeout":"15min","checkpoint_completion_target":0.8,"archive_mode":"on","archive_timeout":300,"archive_command":"pgbackrest --stanza=pg-meta archive-push %p","max_standby_archive_delay":"10min","max_standby_streaming_delay":"3min","wal_receiver_status_interval":"1s","hot_standby_feedback":"on","wal_receiver_timeout":"60s","max_logical_replication_workers":8,"max_sync_workers_per_subscription":6,"random_page_cost":1.1,"effective_io_concurrency":1000,"effective_cache_size":"23907MB","default_statistics_target":200,"log_destination":"csvlog","logging_collector":"on","log_directory":"/pg/log/postgres","log_filename":"postgresql-%Y-%m-%d.log","log_checkpoints":"on","log_lock_waits":"on","log_replication_commands":"on","log_statement":"ddl","log_min_duration_statement":100,"track_io_timing":"on","track_functions":"all","track_activity_query_size":8192,"log_autovacuum_min_duration":"1s","autovacuum_max_workers":2,"autovacuum_naptime":"1min","autovacuum_vacuum_cost_delay":-1,"autovacuum_vacuum_cost_limit":-1,"autovacuum_freeze_max_age":1000000000,"deadlock_timeout":"50ms","idle_in_transaction_session_timeout":"10min","shared_preload_libraries":"timescaledb, pg_stat_statements, auto_explain","auto_explain.log_min_duration":"1s","auto_explain.log_analyze":"on","auto_explain.log_verbose":"on","auto_explain.log_timing":"on","auto_explain.log_nested_statements":true,"pg_stat_statements.max":5000,"pg_stat_statements.track":"all","pg_stat_statements.track_utility":"off","pg_stat_statements.track_planning":"off","timescaledb.telemetry_level":"off","timescaledb.max_background_workers":8,"citus.node_conninfo":"sslmode=prefer"}}}
/pg/pg-meta/failsafe
{"pg-meta-2":"http://10.10.10.11:8008/patroni","pg-meta-1":"http://10.10.10.10:8008/patroni"}
/pg/pg-meta/initialize
7418384210787662172
/pg/pg-meta/leader
pg-meta-1
/pg/pg-meta/members/pg-meta-1
{"conn_url":"postgres://10.10.10.10:5432/postgres","api_url":"http://10.10.10.10:8008/patroni","state":"running","role":"primary","version":"4.0.1","tags":{"clonefrom":true,"version":"16","spec":"8C.32G.125G","conf":"tiny.yml"},"xlog_location":184549376,"timeline":1}
/pg/pg-meta/members/pg-meta-2
{"conn_url":"postgres://10.10.10.11:5432/postgres","api_url":"http://10.10.10.11:8008/patroni","state":"running","role":"replica","version":"4.0.1","tags":{"clonefrom":true,"version":"16","spec":"8C.32G.125G","conf":"tiny.yml"},"xlog_location":184549376,"replication_state":"streaming","timeline":1}
/pg/pg-meta/status
{"optime":184549376,"slots":{"pg_meta_2":184549376,"pg_meta_1":184549376},"retain_slots":["pg_meta_1","pg_meta_2"]}
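
A listing like the one above alternates key lines and value lines, so it can be turned into a lookup table with a few lines of code. The sketch below assumes every value fits on a single line and uses a shortened, hand-written sample; the function names are hypothetical and not part of Pigsty or Patroni.

```python
import json


def parse_dcs_dump(text: str) -> dict:
    """Pair up alternating key / value lines into a dict."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 2:
        raise ValueError("expected alternating key and value lines")
    return dict(zip(lines[0::2], lines[1::2]))


def member_roles(dump: dict, namespace: str = "/pg", cluster: str = "pg-meta") -> dict:
    """Map each member name under <namespace>/<cluster>/members/ to its role."""
    prefix = f"{namespace}/{cluster}/members/"
    return {
        key[len(prefix):]: json.loads(value)["role"]
        for key, value in dump.items()
        if key.startswith(prefix)
    }


# Shortened sample in the same key/value layout as the listing above.
SAMPLE = """\
/pg/pg-meta/leader
pg-meta-1
/pg/pg-meta/members/pg-meta-1
{"state":"running","role":"primary","timeline":1}
/pg/pg-meta/members/pg-meta-2
{"state":"running","role":"replica","timeline":1}
"""

dump = parse_dcs_dump(SAMPLE)
print(dump["/pg/pg-meta/leader"], member_roles(dump))
```

The same prefix logic applies to any cluster: replace `pg-meta` with the cluster name and `/pg` with the configured pg_namespace.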

How to use an existing external etcd cluster?

The hard-coded group, etcd, will be used as the DCS servers for PGSQL. You can initialize them with etcd.yml or treat the group as an existing external etcd cluster.

To use an existing external etcd cluster, define it in the inventory as usual, and make sure the certificates of your existing etcd cluster are signed by the same self-signed CA that is used for PGSQL.

How to add a new member to the existing etcd cluster?

Check Add a member to etcd cluster

etcdctl member add <etcd-?> --learner=true --peer-urls=https://<new_ins_ip>:2380 # on admin node
./etcd.yml -l <new_ins_ip> -e etcd_init=existing                                 # init new etcd member
etcdctl member promote <new_ins_server_id>                                       # on admin node

How to remove a member from an existing etcd cluster?

Check Remove member from etcd cluster

etcdctl member remove <etcd_server_id>   # kick member out of the cluster (on admin node)
./etcd.yml -l <ins_ip> -t etcd_purge     # purge etcd instance