Module: NODE

Tune nodes into the desired state and monitor them; manage nodes, VIPs, HAProxy, and exporters.

The NODE module helps you manage and monitor server nodes in Pigsty. It provides:

  • Node initialization and configuration management
  • System tuning and optimization
  • Service exposure via HAProxy
  • Node monitoring via Prometheus exporters
  • Log collection via Promtail
  • Optional L2 VIP support for high availability

1 - Concept

Introduction to node module, admin node, infra node, pgsql node, etc…

Node is an abstraction of hardware resources, which can be bare metal, virtual machines, or even k8s pods.

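There are different types of nodes in Pigsty:

  • Common node: any node managed by Pigsty
  • Admin node: the node with ssh / sudo access to all other nodes, specified by admin_ip
  • Infra node: a node with the INFRA module installed
  • PGSQL node: a node with the PGSQL module installed

The admin node usually overlaps with an infra node. If there is more than one infra node, the first one is often used as the default admin node, and the rest can be used as backup admin nodes.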


Common Node

You can manage nodes with Pigsty, and install modules on them. The node.yml playbook will adjust the node to desired state.

Some services will be added to all nodes by default:

| Component | Port | Description | Status |
|---|---|---|---|
| Node Exporter | 9100 | Node Monitoring Metrics Exporter | Enabled |
| HAProxy Admin | 9101 | HAProxy admin page | Enabled |
| Promtail | 9080 | Log collecting agent | Enabled |
| Docker Daemon | 9323 | Enable Container Service | Disabled |
| Keepalived | - | Manage Node Cluster L2 VIP | Disabled |
| Keepalived Exporter | 9650 | Monitoring Keepalived Status | Disabled |

Docker & Keepalived are optional components, enabled when required.


ADMIN Node

There is one and only one admin node in a pigsty deployment, which is specified by admin_ip. It is set to the local primary IP during configure.

The node will have ssh / sudo access to all other nodes, which is critical; ensure it’s fully secured.


INFRA Node

A pigsty deployment may have one or more infra nodes, usually 2 ~ 3, in a large production environment.

The infra group specifies infra nodes in the inventory, and infra nodes will have the INFRA module installed (DNS, Nginx, Prometheus, Grafana, etc…).

The admin node is also the default and first infra node, and infra nodes can be used as ‘backup’ admin nodes.

| Component | Port | Domain | Description |
|---|---|---|---|
| Nginx | 80 | h.pigsty | Web Service Portal (YUM/APT Repo) |
| AlertManager | 9093 | a.pigsty | Alert Aggregation and delivery |
| Prometheus | 9090 | p.pigsty | Monitoring Time Series Database |
| Grafana | 3000 | g.pigsty | Visualization Platform |
| Loki | 3100 | - | Logging Collection Server |
| PushGateway | 9091 | - | Collect One-Time Job Metrics |
| BlackboxExporter | 9115 | - | Blackbox Probing |
| Dnsmasq | 53 | - | DNS Server |
| Chronyd | 123 | - | NTP Time Server |
| PostgreSQL | 5432 | - | Pigsty CMDB & default database |
| Ansible | - | - | Run playbooks |

PGSQL Node

A node with the PGSQL module installed is called a PGSQL node. Nodes and PostgreSQL instances are deployed 1:1, and the node identity can be borrowed from the corresponding PostgreSQL instance with node_id_from_pg.

| Component | Port | Description | Status |
|---|---|---|---|
| PostgreSQL | 5432 | PostgreSQL Database | Enabled |
| Pgbouncer | 6432 | Pgbouncer Connection Pooling Service | Enabled |
| Patroni | 8008 | Patroni HA Component | Enabled |
| Haproxy Primary | 5433 | Primary connection pool: Read/Write Service | Enabled |
| Haproxy Replica | 5434 | Replica connection pool: Read-only Service | Enabled |
| Haproxy Default | 5436 | Primary Direct Connect Service | Enabled |
| Haproxy Offline | 5438 | Offline Direct Connect: Offline Read Service | Enabled |
| Haproxy service | 543x | Customized PostgreSQL Services | On Demand |
| Haproxy Admin | 9101 | Monitoring metrics and traffic management | Enabled |
| PG Exporter | 9630 | PG Monitoring Metrics Exporter | Enabled |
| PGBouncer Exporter | 9631 | PGBouncer Monitoring Metrics Exporter | Enabled |
| Node Exporter | 9100 | Node Monitoring Metrics Exporter | Enabled |
| Promtail | 9080 | Collect Postgres, Pgbouncer, Patroni logs | Enabled |
| Docker Daemon | 9323 | Docker Container Service (disabled by default) | Disabled |
| vip-manager | - | Bind VIP to the primary | Disabled |
| keepalived | - | Node Cluster L2 VIP manager (disabled by default) | Disabled |
| Keepalived Exporter | 9650 | Keepalived Metrics Exporter (disabled by default) | Disabled |

2 - Configuration

Configure node identity, dns, vip, data dir, and haproxy services

Each node has identity parameters that are configured through the parameters in <cluster>.hosts and <cluster>.vars.

Pigsty uses the IP address as the unique identifier of a database node. This must be the IP address that the database instance listens on and serves externally, but it should not be a public IP address.

This is very important. The IP is the inventory_hostname of the host in the inventory, which is reflected as the key in the <cluster>.hosts object.

You can use ansible_* parameters to overwrite ssh behavior, e.g. connect via domain name / alias, but the primary IPv4 is still the core identity of the node.

nodename and node_cluster are not mandatory; nodename will use the node’s current hostname by default, while node_cluster will use the fixed default value: nodes.

If node_id_from_pg is enabled, the node will borrow the PGSQL identity and use it as the node’s identity, i.e. node_cluster is set to pg_cluster if applicable, and nodename is set to ${pg_cluster}-${pg_seq}. If nodename_overwrite is enabled, the node’s hostname will be overwritten by nodename.

Pigsty labels a node with its identity parameters in the monitoring system, mapping nodename to ins and node_cluster to cls.

| Name | Type | Level | Necessity | Comment |
|---|---|---|---|---|
| inventory_hostname | ip | - | Required | Node IP |
| nodename | string | I | Optional | Node Name |
| node_cluster | string | C | Optional | Node cluster name |

The following cluster config declares a three-node node cluster:

node-test:
  hosts:
    10.10.10.11: { nodename: node-test-1 }
    10.10.10.12: { nodename: node-test-2 }
    10.10.10.13: { nodename: node-test-3 }
  vars:
    node_cluster: node-test

Default values:

#nodename:           # [INSTANCE] # node instance identity, use hostname if missing, optional
node_cluster: nodes   # [CLUSTER] # node cluster identity, use 'nodes' if missing, optional
nodename_overwrite: true          # overwrite node's hostname with nodename?
nodename_exchange: false          # exchange nodename among play hosts?
node_id_from_pg: true             # use postgres identity as node identity if applicable?
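
As a minimal sketch of identity borrowing, assume a hypothetical PGSQL cluster named pg-test whose instances carry pg_seq 1 to 3 (the cluster name, IPs, and sequence numbers are illustrative, and other PGSQL parameters are omitted). With node_id_from_pg left at its default, the nodes would be expected to take the identities shown in the comments:

```yaml
pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1 }   # nodename expected: pg-test-1
    10.10.10.12: { pg_seq: 2 }   # nodename expected: pg-test-2
    10.10.10.13: { pg_seq: 3 }   # nodename expected: pg-test-3
  vars:
    pg_cluster: pg-test          # node_cluster expected: pg-test
    node_id_from_pg: true        # the default, repeated here only for clarity
```

Setting nodename explicitly on a host would take precedence over the borrowed name.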



3 - Parameter

Parameters to customize nodes, monitoring agents, and the HAProxy load balancer

The NODE module has 10 sections; the table below lists 65 parameters.


Parameters

| Name | Section | Type | Level | Comment |
|---|---|---|---|---|
| nodename | NODE_ID | string | I | node instance identity, use hostname if missing, optional |
| node_cluster | NODE_ID | string | C | node cluster identity, use 'nodes' if missing, optional |
| nodename_overwrite | NODE_ID | bool | C | overwrite node's hostname with nodename? |
| nodename_exchange | NODE_ID | bool | C | exchange nodename among play hosts? |
| node_id_from_pg | NODE_ID | bool | C | use postgres identity as node identity if applicable? |
| node_write_etc_hosts | NODE_DNS | bool | G/C/I | modify /etc/hosts on target node? |
| node_default_etc_hosts | NODE_DNS | string[] | G | static dns records in /etc/hosts |
| node_etc_hosts | NODE_DNS | string[] | C | extra static dns records in /etc/hosts |
| node_dns_method | NODE_DNS | enum | C | how to handle dns servers: add,none,overwrite |
| node_dns_servers | NODE_DNS | string[] | C | dynamic nameserver in /etc/resolv.conf |
| node_dns_options | NODE_DNS | string[] | C | dns resolv options in /etc/resolv.conf |
| node_repo_modules | NODE_PACKAGE | string | C | upstream repo to be added on node, local by default |
| node_repo_remove | NODE_PACKAGE | bool | C | remove existing repo on node? |
| node_packages | NODE_PACKAGE | string[] | C | packages to be installed on current nodes |
| node_default_packages | NODE_PACKAGE | string[] | G | default packages to be installed on all nodes |
| node_disable_firewall | NODE_TUNE | bool | C | disable node firewall? true by default |
| node_disable_selinux | NODE_TUNE | bool | C | disable node selinux? true by default |
| node_disable_numa | NODE_TUNE | bool | C | disable node numa, reboot required |
| node_disable_swap | NODE_TUNE | bool | C | disable node swap, use with caution |
| node_static_network | NODE_TUNE | bool | C | preserve dns resolver settings after reboot |
| node_disk_prefetch | NODE_TUNE | bool | C | setup disk prefetch on HDD to increase performance |
| node_kernel_modules | NODE_TUNE | string[] | C | kernel modules to be enabled on this node |
| node_hugepage_count | NODE_TUNE | int | C | number of 2MB hugepage, take precedence over ratio |
| node_hugepage_ratio | NODE_TUNE | float | C | node mem hugepage ratio, 0 disable it by default |
| node_overcommit_ratio | NODE_TUNE | float | C | node mem overcommit ratio, 0 disable it by default |
| node_tune | NODE_TUNE | enum | C | node tuned profile: none,oltp,olap,crit,tiny |
| node_sysctl_params | NODE_TUNE | dict | C | sysctl parameters in k:v format in addition to tuned |
| node_data | NODE_ADMIN | path | C | node main data directory, /data by default |
| node_admin_enabled | NODE_ADMIN | bool | C | create an admin user on target node? |
| node_admin_uid | NODE_ADMIN | int | C | uid and gid for node admin user |
| node_admin_username | NODE_ADMIN | username | C | name of node admin user, dba by default |
| node_admin_ssh_exchange | NODE_ADMIN | bool | C | exchange admin ssh key among node cluster |
| node_admin_pk_current | NODE_ADMIN | bool | C | add current user's ssh pk to admin authorized_keys |
| node_admin_pk_list | NODE_ADMIN | string[] | C | ssh public keys to be added to admin user |
| node_aliases | NODE_ADMIN | dict | C | extra shell aliases to be added, k:v dict |
| node_timezone | NODE_TIME | string | C | setup node timezone, empty string to skip |
| node_ntp_enabled | NODE_TIME | bool | C | enable chronyd time sync service? |
| node_ntp_servers | NODE_TIME | string[] | C | ntp servers in /etc/chrony.conf |
| node_crontab_overwrite | NODE_TIME | bool | C | overwrite or append to /etc/crontab? |
| node_crontab | NODE_TIME | string[] | C | crontab entries in /etc/crontab |
| vip_enabled | NODE_VIP | bool | C | enable vip on this node cluster? |
| vip_address | NODE_VIP | ip | C | node vip address in ipv4 format, required if vip is enabled |
| vip_vrid | NODE_VIP | int | C | required, integer, 1-254, should be unique among same VLAN |
| vip_role | NODE_VIP | enum | I | optional, master/backup, backup by default, use as init role |
| vip_preempt | NODE_VIP | bool | C/I | optional, true/false, false by default, enable vip preemption |
| vip_interface | NODE_VIP | string | C/I | node vip network interface to listen, eth0 by default |
| vip_dns_suffix | NODE_VIP | string | C | node vip dns name suffix, empty string by default |
| vip_exporter_port | NODE_VIP | port | C | keepalived exporter listen port, 9650 by default |
| haproxy_enabled | HAPROXY | bool | C | enable haproxy on this node? |
| haproxy_clean | HAPROXY | bool | G/C/A | cleanup all existing haproxy config? |
| haproxy_reload | HAPROXY | bool | A | reload haproxy after config? |
| haproxy_auth_enabled | HAPROXY | bool | G | enable authentication for haproxy admin page |
| haproxy_admin_username | HAPROXY | username | G | haproxy admin username, admin by default |
| haproxy_admin_password | HAPROXY | password | G | haproxy admin password, pigsty by default |
| haproxy_exporter_port | HAPROXY | port | C | haproxy admin/exporter port, 9101 by default |
| haproxy_client_timeout | HAPROXY | interval | C | client side connection timeout, 24h by default |
| haproxy_server_timeout | HAPROXY | interval | C | server side connection timeout, 24h by default |
| haproxy_services | HAPROXY | service[] | C | list of haproxy service to be exposed on node |
| node_exporter_enabled | NODE_EXPORTER | bool | C | setup node_exporter on this node? |
| node_exporter_port | NODE_EXPORTER | port | C | node exporter listen port, 9100 by default |
| node_exporter_options | NODE_EXPORTER | arg | C | extra server options for node_exporter |
| promtail_enabled | PROMTAIL | bool | C | enable promtail logging collector? |
| promtail_clean | PROMTAIL | bool | G/A | purge existing promtail status file during init? |
| promtail_port | PROMTAIL | port | C | promtail listen port, 9080 by default |
| promtail_positions | PROMTAIL | path | C | promtail position status file path |

NODE

The NODE module tunes target nodes into the desired state and brings them into the Pigsty monitoring system.


NODE_ID

Each node has identity parameters that are configured through the parameters in <cluster>.hosts and <cluster>.vars. Check NODE Identity for details.


nodename

name: nodename, type: string, level: I

node instance identity, use hostname if missing, optional

No default value. Null or an empty string means nodename will be set to the node’s current hostname.

If node_id_from_pg is true (the default) and nodename is not explicitly defined, nodename will try to use ${pg_cluster}-${pg_seq} first. If PGSQL is not defined on this node, it will fall back to the default HOSTNAME.

If nodename_overwrite is true, the node name will also be used as the HOSTNAME.


node_cluster

name: node_cluster, type: string, level: C

node cluster identity, use ’nodes’ if missing, optional

default values: nodes

If node_id_from_pg is true (the default) and node_cluster is not explicitly defined, node_cluster will try to use ${pg_cluster} first. If PGSQL is not defined on this node, it will fall back to the default value nodes.


nodename_overwrite

name: nodename_overwrite, type: bool, level: C

overwrite node’s hostname with nodename?

default value is true: a non-empty nodename will override the hostname of the current node.

When the nodename parameter is undefined or an empty string, but node_id_from_pg is true, the node name will try to use {{ pg_cluster }}-{{ pg_seq }}, borrowing the identity from the ins name of the 1:1 PostgreSQL instance.

No changes are made to the hostname if nodename is undefined or an empty string and node_id_from_pg is false.


nodename_exchange

name: nodename_exchange, type: bool, level: C

exchange nodename among play hosts?

default value is false

When this parameter is enabled, node names are exchanged among the group of nodes executing the node.yml playbook and written to /etc/hosts.


node_id_from_pg

name: node_id_from_pg, type: bool, level: C

use postgres identity as node identity if applicable?

default value is true

Borrow the PostgreSQL cluster & instance identity if applicable.

It’s useful to use the same identity for postgres & node if there’s a 1:1 relationship.


NODE_DNS

Pigsty configures static DNS records and a dynamic DNS resolver for nodes.

If you already have a DNS server, set node_dns_method to none to disable dynamic DNS setup.

node_write_etc_hosts: true        # modify `/etc/hosts` on target node?
node_default_etc_hosts:           # static dns records in `/etc/hosts`
  - "${admin_ip} h.pigsty a.pigsty p.pigsty g.pigsty"
node_etc_hosts: []                # extra static dns records in `/etc/hosts`
node_dns_method: add              # how to handle dns servers: add,none,overwrite
node_dns_servers: ['${admin_ip}'] # dynamic nameserver in `/etc/resolv.conf`
node_dns_options:                 # dns resolv options in `/etc/resolv.conf`
  - options single-request-reopen timeout:1

node_write_etc_hosts

name: node_write_etc_hosts, type: bool, level: G/C/I

modify /etc/hosts on target node?

default value is true. For example, a docker VM cannot modify /etc/hosts by default, so you can set this value to false to disable the modification.


node_default_etc_hosts

name: node_default_etc_hosts, type: string[], level: G

static dns records in /etc/hosts

default value:

["${admin_ip} h.pigsty a.pigsty p.pigsty g.pigsty"]

node_default_etc_hosts is an array. Each element is a DNS record with format <ip> <name>.

It is used for global static DNS records. You can use node_etc_hosts for ad hoc records for each cluster.

Make sure to write a DNS record like 10.10.10.10 h.pigsty a.pigsty p.pigsty g.pigsty to /etc/hosts to ensure that the local yum repo can be accessed using the domain name before the DNS Nameserver starts.


node_etc_hosts

name: node_etc_hosts, type: string[], level: C

extra static dns records in /etc/hosts

default values: []

Same as node_default_etc_hosts, but in addition to it.
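
A short sketch of a cluster-level record list, reusing the node-test cluster from the Configuration section. The extra hostname and its IP are hypothetical; each element follows the same <ip> <name> format as node_default_etc_hosts:

```yaml
node-test:
  hosts:
    10.10.10.11: { nodename: node-test-1 }
  vars:
    node_cluster: node-test
    node_etc_hosts:                          # written in addition to node_default_etc_hosts
      - "10.10.10.20 app.example.internal"   # hypothetical record: <ip> <name>
```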


node_dns_method

name: node_dns_method, type: enum, level: C

how to handle dns servers: add,none,overwrite

default values: add

  • add: Append the records in node_dns_servers to /etc/resolv.conf and keep the existing DNS servers. (default)
  • overwrite: Overwrite /etc/resolv.conf with the record in node_dns_servers
  • none: Skip DNS server configuration, e.g. when a DNS server is already provided in the production env.
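
The three modes can be sketched as follows. The first pair mirrors the documented defaults; the commented lines are illustrative alternatives, and the resolver address in them is hypothetical:

```yaml
# add (default): keep existing resolvers and add the admin node as a nameserver
node_dns_method: add
node_dns_servers: ['${admin_ip}']

# overwrite: replace /etc/resolv.conf with the servers listed here
# node_dns_method: overwrite
# node_dns_servers: ['10.10.10.2']   # hypothetical internal resolver

# none: leave /etc/resolv.conf untouched because DNS is managed elsewhere
# node_dns_method: none
```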

node_dns_servers

name: node_dns_servers, type: string[], level: C

dynamic nameserver in /etc/resolv.conf

default values: ["${admin_ip}"] , the default nameserver on admin node will be added to /etc/resolv.conf as the first nameserver.


node_dns_options

name: node_dns_options, type: string[], level: C

dns resolv options in /etc/resolv.conf, default value:

- options single-request-reopen timeout:1

NODE_PACKAGE

This section is about upstream yum repos & packages to be installed.

node_repo_modules: local          # upstream repo to be added on node, local by default
node_repo_remove: true            # remove existing repo on node?
node_packages: [openssh-server]   # packages to be installed current nodes with latest version
#node_default_packages: []        # default packages to be installed on all nodes (defaults are loaded from node_id/vars)

node_repo_modules

name: node_repo_modules, type: string, level: C/A

upstream repo to be added on node, default value: local

This parameter specifies the upstream repo to be added to the node. It is used to filter the repo_upstream entries: only the entries with the same module value will be added to the node’s software source. This is similar to the repo_modules parameter.


node_repo_remove

name: node_repo_remove, type: bool, level: C/A

remove existing repo on node?

default value is true, and thus Pigsty will move existing repo files in /etc/yum.repos.d to a backup dir, /etc/yum.repos.d/backup, before adding upstream repos. On Debian/Ubuntu, Pigsty will back up & move /etc/apt/sources.list(.d) to /etc/apt/backup.


node_packages

name: node_packages, type: string[], level: C

packages to be installed on current nodes, default values: [openssh-server].

Each element is a comma-separated list of package names, which will be installed on the current node in addition to node_default_packages.

Packages specified in this parameter will be upgraded to the latest version. The default value is [openssh-server], which upgrades sshd by default to avoid SSH CVEs.

This parameter is usually used to install additional software packages that are ad hoc for the current node/cluster.
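
For illustration, a cluster might extend the default like this. The extra package names are arbitrary examples chosen for the sketch, not a recommendation from Pigsty:

```yaml
node_packages:
  - openssh-server     # the default entry, kept so sshd is still upgraded
  - tmux,strace        # hypothetical extras: one comma-separated list per element
```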


node_default_packages

name: node_default_packages, type: string[], level: G

default packages to be installed on all nodes; no default value is defined in the configuration.

This param is an array of strings. Each string is a comma-separated list of package names, which will be installed on all nodes by default.

This param DOES NOT have a default value in the configuration. You can specify it explicitly, or leave it empty to use the OS-specific defaults.

When left empty, Pigsty will use the values of node_packages_default defined in roles/node_id/vars according to your OS.

For EL system, the default values are:

- lz4,unzip,bzip2,pv,jq,git,ncdu,make,patch,bash,lsof,wget,uuid,tuned,nvme-cli,numactl,sysstat,iotop,htop,rsync,tcpdump
- python3,python3-pip,socat,lrzsz,net-tools,ipvsadm,telnet,ca-certificates,openssl,keepalived,etcd,haproxy,chrony
- zlib,yum,audit,bind-utils,readline,vim-minimal,node_exporter,grubby,openssh-server,openssh-clients

For debian / ubuntu nodes, use this default value explicitly:

- lz4,unzip,bzip2,pv,jq,git,ncdu,make,patch,bash,lsof,wget,uuid,tuned,nvme-cli,numactl,sysstat,iotop,htop,rsync,tcpdump
- python3,python3-pip,socat,lrzsz,net-tools,ipvsadm,telnet,ca-certificates,openssl,keepalived,etcd,haproxy,chrony
- zlib1g,acl,dnsutils,libreadline-dev,vim-tiny,node-exporter,openssh-server,openssh-client

NODE_TUNE

Configure tuned templates, features, kernel modules, sysctl params on node.

node_disable_firewall: true       # disable node firewall? true by default
node_disable_selinux: true        # disable node selinux? true by default
node_disable_numa: false          # disable node numa, reboot required
node_disable_swap: false          # disable node swap, use with caution
node_static_network: true         # preserve dns resolver settings after reboot
node_disk_prefetch: false         # setup disk prefetch on HDD to increase performance
node_kernel_modules: [ softdog, br_netfilter, ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh ]
node_hugepage_count: 0            # number of 2MB hugepage, take precedence over ratio
node_hugepage_ratio: 0            # node mem hugepage ratio, 0 disable it by default
node_overcommit_ratio: 0          # node mem overcommit ratio, 0 disable it by default
node_tune: oltp                   # node tuned profile: none,oltp,olap,crit,tiny
node_sysctl_params: { }           # sysctl parameters in k:v format in addition to tuned

node_disable_firewall

name: node_disable_firewall, type: bool, level: C

disable node firewall? true by default

default value is true


node_disable_selinux

name: node_disable_selinux, type: bool, level: C

disable node selinux? true by default

default value is true


node_disable_numa

name: node_disable_numa, type: bool, level: C

disable node numa, reboot required

default value is false

Boolean flag; NUMA is not disabled by default. Note that turning off NUMA requires a reboot of the machine before it can take effect!

If you don’t know how to set the CPU affinity, it is recommended to turn off NUMA.


node_disable_swap

name: node_disable_swap, type: bool, level: C

disable node swap, use with caution

default value is false

Turning off SWAP is generally not recommended, but SWAP should be disabled when your node is used for a Kubernetes deployment.

If there is enough memory and the database is deployed exclusively on the node, disabling SWAP may slightly improve performance.


node_static_network

name: node_static_network, type: bool, level: C

preserve dns resolver settings after reboot, default value is true

Enabling static networking means that machine reboots will not overwrite your DNS Resolv config with NIC changes. It is recommended to enable it in production environment.


node_disk_prefetch

name: node_disk_prefetch, type: bool, level: C

setup disk prefetch on HDD to increase performance

default value is false. Consider enabling this when using HDD.


node_kernel_modules

name: node_kernel_modules, type: string[], level: C

kernel modules to be enabled on this node

default value:

node_kernel_modules: [ softdog, br_netfilter, ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh ]

An array consisting of kernel module names declaring the kernel modules that need to be installed on the node.


node_hugepage_count

name: node_hugepage_count, type: int, level: C

number of 2MB hugepage, take precedence over ratio, 0 by default

Takes precedence over node_hugepage_ratio. If a non-zero value is given, it will be written to /etc/sysctl.d/hugepage.conf.

If node_hugepage_count and node_hugepage_ratio are both 0 (default), hugepages will be disabled entirely.

Negative values will not work, and a number higher than 90% of node memory will be capped at 90% of node memory.

If not zero, the resulting hugepage memory should be slightly larger than the shared buffers determined by pg_shared_buffer_ratio.


node_hugepage_ratio

name: node_hugepage_ratio, type: float, level: C

node mem hugepage ratio, 0 disable it by default, valid range: 0 ~ 0.40

default value: 0, which will set vm.nr_hugepages=0 and not use HugePage at all.

This fraction of node memory will be allocated as HugePages and reserved for PostgreSQL.

It should be equal to or slightly larger than pg_shared_buffer_ratio, if not zero.

For example, if you keep the default 25% of memory for postgres shared buffers, you can set this value to 0.27 ~ 0.30. Wasted hugepages can be reclaimed later with /pg/bin/pg-tune-hugepage.
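
A sketch of the pairing described above, assuming a hypothetical node with 64 GiB of memory. The commented alternative shows roughly the same reservation expressed as a page count: 64 GiB is 65536 MiB, 30% of that is about 19661 MiB, and dividing by 2 MiB per page gives about 9830 pages.

```yaml
pg_shared_buffer_ratio: 0.25   # assumed value, matching the 25% example above
node_hugepage_ratio: 0.30      # slightly larger than the shared buffer ratio, within 0 ~ 0.40

# alternative: an explicit count, which takes precedence over the ratio
# node_hugepage_count: 9830    # ~30% of a hypothetical 64 GiB node in 2MB pages
```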


node_overcommit_ratio

name: node_overcommit_ratio, type: int, level: C

node mem overcommit ratio, 0 disable it by default. This is an integer from 0 to 100+.

default value: 0, which will set vm.overcommit_memory=0. Otherwise, vm.overcommit_memory=2 will be used, and this value will be used as vm.overcommit_ratio.

It is recommended to set vm.overcommit_ratio on dedicated pgsql nodes, e.g. 50 ~ 100.
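
As a small sketch for a dedicated pgsql node, with the value 80 picked arbitrarily from the suggested range:

```yaml
node_overcommit_ratio: 80   # non-zero, so vm.overcommit_memory=2 and vm.overcommit_ratio=80 are expected
```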


node_tune

name: node_tune, type: enum, level: C

node tuned profile: none,oltp,olap,crit,tiny

default values: oltp

  • tiny: Micro Virtual Machine (1 ~ 3 Core, 1 ~ 8 GB Mem)
  • oltp: Regular OLTP templates with optimized latency
  • olap: Regular OLAP templates to optimize throughput
  • crit: Core financial business templates, optimizing the number of dirty pages

Usually, the database tuning template pg_conf should be paired with the node tuning template node_tune.
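
A sketch of such a pairing for a hypothetical analytics cluster. The pg_conf file name below is an assumption about how the matching database template is named and should be checked against the PGSQL module documentation:

```yaml
# cluster-level vars
node_tune: olap        # node tuned profile
pg_conf: olap.yml      # database tuning template; the exact file name is an assumption
```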


node_sysctl_params

name: node_sysctl_params, type: dict, level: C

sysctl parameters in k:v format in addition to tuned

default values: {}

A dictionary of key-value pairs: the key is the kernel sysctl parameter name and the value is the parameter value.

You can also define sysctl parameters in the tuned profile.
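
For instance, a cluster could add a couple of kernel settings on top of the tuned profile. The keys are standard Linux sysctl names, and the values are illustrative only:

```yaml
node_sysctl_params:
  vm.swappiness: 1              # illustrative value, not a recommendation
  net.core.somaxconn: 1024      # illustrative value, not a recommendation
```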


NODE_ADMIN

This section is about admin users and their credentials.

node_data: /data                  # node main data directory, `/data` by default
node_admin_enabled: true          # create a admin user on target node?
node_admin_uid: 88                # uid and gid for node admin user
node_admin_username: dba          # name of node admin user, `dba` by default
node_admin_ssh_exchange: true     # exchange admin ssh key among node cluster
node_admin_pk_current: true       # add current user's ssh pk to admin authorized_keys
node_admin_pk_list: []            # ssh public keys to be added to admin user

node_data

name: node_data, type: path, level: C

node main data directory, /data by default

default values: /data

If specified, this path will be used as the main data disk mountpoint. If the path does not exist, the directory will be created and a warning will be raised.

The data dir is owned by root with mode 0777.


node_admin_enabled

name: node_admin_enabled, type: bool, level: C

create a admin user on target node?

default value is true

Create an admin user on each node with password-free sudo and ssh. An admin user named dba (uid=88) is created by default, which can access other nodes in the env and perform sudo from the admin node via password-free SSH.


node_admin_uid

name: node_admin_uid, type: int, level: C

uid and gid for node admin user

default values: 88


node_admin_username

name: node_admin_username, type: username, level: C

name of node admin user, dba by default

default values: dba


node_admin_ssh_exchange

name: node_admin_ssh_exchange, type: bool, level: C

exchange admin ssh key among node cluster

default value is true

When enabled, Pigsty will exchange SSH public keys between members during playbook execution, allowing the admin user node_admin_username on each node to access the other nodes.


node_admin_pk_current

name: node_admin_pk_current, type: bool, level: C

add current user’s ssh pk to admin authorized_keys

default value is true

When enabled, on the current node, the SSH public key (~/.ssh/id_rsa.pub) of the current user is copied to the authorized_keys of the target node admin user.

When deploying in a production env, be sure to pay attention to this parameter, which installs the default public key of the user currently executing the command to the admin user of all machines.


node_admin_pk_list

name: node_admin_pk_list, type: string[], level: C

ssh public keys to be added to admin user

default values: []

Each element of the array is a string containing the key written to the admin user ~/.ssh/authorized_keys, and the user with the corresponding private key can log in as an admin user.

When deploying in production envs, be sure to note this parameter and add only trusted keys to this list.
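
A minimal sketch of the expected shape. The key material and comment field are placeholders, not a real key:

```yaml
node_admin_pk_current: false    # optional: do not install the current user's key
node_admin_pk_list:
  - 'ssh-ed25519 AAAA...placeholder... alice@example.com'   # placeholder public key, one string per key
```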


node_aliases

name: node_aliases, type: dict, level: C/I

extra aliases to be added to admin user’s shell profile

default values: {}

You can add extra shell aliases to it, pigsty will add these aliases to the /etc/profile.d/node.alias.sh file on the target node:

node_aliases:
  g:   git
  d:   docker

This will generate:

alias g="git"
alias d="docker"

NODE_TIME

node_timezone: ''                 # setup node timezone, empty string to skip
node_ntp_enabled: true            # enable chronyd time sync service?
node_ntp_servers:                 # ntp servers in `/etc/chrony.conf`
  - pool pool.ntp.org iburst
node_crontab_overwrite: true      # overwrite or append to `/etc/crontab`?
node_crontab: [ ]                 # crontab entries in `/etc/crontab`

node_timezone

name: node_timezone, type: string, level: C

setup node timezone, empty string to skip

default value is empty string, which will not change the default timezone (usually UTC)


node_ntp_enabled

name: node_ntp_enabled, type: bool, level: C

enable chronyd time sync service?

default value is true, and thus Pigsty will overwrite the node’s /etc/chrony.conf with node_ntp_servers.

If you already have an NTP server configured, just set this to false to leave it as is.


node_ntp_servers

name: node_ntp_servers, type: string[], level: C

ntp servers in /etc/chrony.conf, default value: ["pool pool.ntp.org iburst"]

It only takes effect if node_ntp_enabled is true.

You can use ${admin_ip} to sync time with ntp server on admin node rather than public ntp server.

node_ntp_servers: [ 'pool ${admin_ip} iburst' ]

node_crontab_overwrite

name: node_crontab_overwrite, type: bool, level: C

overwrite or append to /etc/crontab?

default value is true, and pigsty will render records in node_crontab in overwrite mode rather than appending to it.


node_crontab

name: node_crontab, type: string[], level: C

crontab entries in /etc/crontab

default values: []
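
A sketch of a single entry in append mode. Since entries go to /etc/crontab, each line includes a user field after the schedule; the script path is hypothetical:

```yaml
node_crontab_overwrite: false      # append to /etc/crontab instead of overwriting it
node_crontab:
  - '00 03 * * * root /usr/local/bin/nightly-cleanup.sh'   # hypothetical job: daily at 03:00 as root
```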


NODE_VIP

You can bind an optional L2 VIP among one node cluster, which is disabled by default.

An L2 VIP can only be used within the same L2 LAN, which may impose extra restrictions on your network topology.

If enabled, you have to manually assign vip_address and vip_vrid for each node cluster.

It is the user’s responsibility to ensure that the address / vrid is unique within the same LAN.

vip_enabled: false                # enable vip on this node cluster?
# vip_address:         [IDENTITY] # node vip address in ipv4 format, required if vip is enabled
# vip_vrid:            [IDENTITY] # required, integer, 1-254, should be unique among same VLAN
vip_role: backup                  # optional, `master/backup`, backup by default, use as init role
vip_preempt: false                # optional, `true/false`, false by default, enable vip preemption
vip_interface: eth0               # node vip network interface to listen, `eth0` by default
vip_dns_suffix: ''                # node vip dns name suffix, empty string by default
vip_exporter_port: 9650           # keepalived exporter listen port, 9650 by default
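
Putting these together, a three-node cluster with an L2 VIP might be declared as in the sketch below. The VIP address, vrid, and interface name are hypothetical and must be adapted to your own network:

```yaml
node-test:
  hosts:
    10.10.10.11: { nodename: node-test-1, vip_role: master }  # initial keepalived state
    10.10.10.12: { nodename: node-test-2 }                    # vip_role defaults to backup
    10.10.10.13: { nodename: node-test-3 }
  vars:
    node_cluster: node-test
    vip_enabled: true
    vip_address: 10.10.10.99     # hypothetical address, must be unused in the LAN
    vip_vrid: 128                # hypothetical id, 1-254, unique within the VLAN
    vip_interface: eth1          # assumed NIC name; eth0 is the default
```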

vip_enabled

name: vip_enabled, type: bool, level: C

enable vip on this node cluster? default value is false, which means no L2 VIP is created for this node cluster.

An L2 VIP can only be used within the same L2 LAN, which may impose extra restrictions on your network topology.


vip_address

name: vip_address, type: ip, level: C

node vip address in IPv4 format, required if node vip_enabled.

no default value. This parameter must be explicitly assigned and unique in your LAN.


vip_vrid

name: vip_vrid, type: int, level: C

integer, 1-254, should be unique in same VLAN, required if node vip_enabled.

no default value. This parameter must be explicitly assigned and unique in your LAN.


vip_role

name: vip_role, type: enum, level: I

node vip role, could be master or backup, will be used as initial keepalived state.


vip_preempt

name: vip_preempt, type: bool, level: C/I

optional, true/false, false by default, enable vip preemption

default value is false, which means no preemption happens when a backup has a higher priority than the living master.


vip_interface

name: vip_interface, type: string, level: C/I

node vip network interface to listen, eth0 by default.

It should be the primary intranet interface of your node, i.e. the one that carries the IP address you used in the inventory file.

If a node has a different interface name, you can override it in the instance vars.


vip_dns_suffix

name: vip_dns_suffix, type: string, level: C/I

node vip dns name suffix, empty string by default. It will be used as the DNS name of the node VIP.


vip_exporter_port

name: vip_exporter_port, type: port, level: C/I

keepalived exporter listen port, 9650 by default.


HAPROXY

HAProxy is installed on every node by default, exposing services in a NodePort manner.

It is used by PGSQL Service.

haproxy_enabled: true             # enable haproxy on this node?
haproxy_clean: false              # cleanup all existing haproxy config?
haproxy_reload: true              # reload haproxy after config?
haproxy_auth_enabled: true        # enable authentication for haproxy admin page
haproxy_admin_username: admin     # haproxy admin username, `admin` by default
haproxy_admin_password: pigsty    # haproxy admin password, `pigsty` by default
haproxy_exporter_port: 9101       # haproxy admin/exporter port, 9101 by default
haproxy_client_timeout: 24h       # client side connection timeout, 24h by default
haproxy_server_timeout: 24h       # server side connection timeout, 24h by default
haproxy_services: []              # list of haproxy service to be exposed on node

haproxy_enabled

name: haproxy_enabled, type: bool, level: C

enable haproxy on this node?

default value is true


haproxy_clean

name: haproxy_clean, type: bool, level: G/C/A

cleanup all existing haproxy config?

default value is false


haproxy_reload

name: haproxy_reload, type: bool, level: A

reload haproxy after config?

default value is true, which reloads haproxy after a config change.

If you wish to review the rendered config before applying it, you can turn this off with cli args and check the config first.


haproxy_auth_enabled

name: haproxy_auth_enabled, type: bool, level: G

enable authentication for haproxy admin page

default value is true, which requires http basic auth for the admin page.

Disabling it is not recommended, since your traffic control will be exposed.


haproxy_admin_username

name: haproxy_admin_username, type: username, level: G

haproxy admin username, admin by default


haproxy_admin_password

name: haproxy_admin_password, type: password, level: G

haproxy admin password, pigsty by default

PLEASE CHANGE IT IN YOUR PRODUCTION ENVIRONMENT!
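Since the three authentication parameters are all global-level (level G), one way to override them is in the global vars of the inventory. The fragment below is a sketch under that assumption; the username and password values are placeholders, not recommendations.

```yaml
all:
  vars:
    haproxy_auth_enabled: true                    # keep http basic auth on the admin page
    haproxy_admin_username: lb-admin              # placeholder username
    haproxy_admin_password: "CHANGE-ME-PLEASE"    # placeholder, replace with your own secret
```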


haproxy_exporter_port

name: haproxy_exporter_port, type: port, level: C

haproxy admin/exporter port, 9101 by default


haproxy_client_timeout

name: haproxy_client_timeout, type: interval, level: C

client side connection timeout, 24h by default


haproxy_server_timeout

name: haproxy_server_timeout, type: interval, level: C

server side connection timeout, 24h by default


haproxy_services

name: haproxy_services, type: service[], level: C

list of haproxy service to be exposed on node, default values: []

Each element is a service definition, here is an ad hoc haproxy service example:

haproxy_services:                   # list of haproxy service

  # expose pg-test read only replicas
  - name: pg-test-ro                # [REQUIRED] service name, unique
    port: 5440                      # [REQUIRED] service port, unique
    ip: "*"                         # [OPTIONAL] service listen addr, "*" by default
    protocol: tcp                   # [OPTIONAL] service protocol, 'tcp' by default
    balance: leastconn              # [OPTIONAL] load balance algorithm, roundrobin by default (or leastconn)
    maxconn: 20000                  # [OPTIONAL] max allowed front-end connection, 20000 by default
    default: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
    options:
      - option httpchk
      - option http-keep-alive
      - http-check send meth OPTIONS uri /read-only
      - http-check expect status 200
    servers:
      - { name: pg-test-1 ,ip: 10.10.10.11 , port: 5432 , options: check port 8008 , backup: true }
      - { name: pg-test-2 ,ip: 10.10.10.12 , port: 5432 , options: check port 8008 }
      - { name: pg-test-3 ,ip: 10.10.10.13 , port: 5432 , options: check port 8008 }

It will be rendered to /etc/haproxy/<service.name>.cfg and take effect after reload.
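For comparison, a shorter definition might set only the fields marked [REQUIRED] plus a server list, on the assumption that every omitted [OPTIONAL] field falls back to the default noted in the example above. The service name, port, and backend addresses here are hypothetical.

```yaml
haproxy_services:
  - name: web-demo                  # hypothetical service name, must be unique
    port: 8080                      # hypothetical service port, must be unique
    servers:                        # backends, each with a plain haproxy health check
      - { name: web-1 ,ip: 10.10.10.11 , port: 80 , options: check }
      - { name: web-2 ,ip: 10.10.10.12 , port: 80 , options: check }
```

Following the rule stated above, this service would be rendered to /etc/haproxy/web-demo.cfg.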


NODE_EXPORTER

node_exporter_enabled: true       # setup node_exporter on this node?
node_exporter_port: 9100          # node exporter listen port, 9100 by default
node_exporter_options: '--no-collector.softnet --no-collector.nvme --collector.tcpstat --collector.processes'

node_exporter_enabled

name: node_exporter_enabled, type: bool, level: C

setup node_exporter on this node? default value is true


node_exporter_port

name: node_exporter_port, type: port, level: C

node exporter listen port, 9100 by default


node_exporter_options

name: node_exporter_options, type: arg, level: C

extra server options for node_exporter, default value: --no-collector.softnet --no-collector.nvme --collector.tcpstat --collector.processes

By default, Pigsty enables the tcpstat and processes collectors and disables the nvme and softnet collectors.
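To change the collector set, the whole option string can be overridden at the cluster level. The sketch below keeps the documented defaults and appends `--collector.systemd`, a node_exporter collector that is off by default upstream. Whether it is useful on your nodes is an assumption to verify.

```yaml
# cluster-level override: documented defaults plus the systemd collector
node_exporter_options: '--no-collector.softnet --no-collector.nvme --collector.tcpstat --collector.processes --collector.systemd'
```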


PROMTAIL

Promtail will collect logs from other modules, and send them to LOKI

  • INFRA: Infra logs, collected only on infra nodes.

    • nginx-access: /var/log/nginx/access.log
    • nginx-error: /var/log/nginx/error.log
    • grafana: /var/log/grafana/grafana.log
  • NODES: Host node logs, collected on all nodes.

    • syslog: /var/log/messages
    • dmesg: /var/log/dmesg
    • cron: /var/log/cron
  • PGSQL: PostgreSQL logs, collected when a node is defined with pg_cluster.

    • postgres: /pg/log/postgres/*
    • patroni: /pg/log/patroni.log
    • pgbouncer: /pg/log/pgbouncer/pgbouncer.log
    • pgbackrest: /pg/log/pgbackrest/*.log
  • REDIS: Redis logs, collected when a node is defined with redis_cluster.

    • redis: /var/log/redis/*.log

Log directories are customizable via pg_log_dir, patroni_log_dir, pgbouncer_log_dir, and pgbackrest_log_dir.

promtail_enabled: true            # enable promtail logging collector?
promtail_clean: false             # purge existing promtail status file during init?
promtail_port: 9080               # promtail listen port, 9080 by default
promtail_positions: /var/log/positions.yaml # promtail position status file path

promtail_enabled

name: promtail_enabled, type: bool, level: C

enable promtail logging collector?

default value is true


promtail_clean

name: promtail_clean, type: bool, level: G/A

purge existing promtail status file during init?

default value is false. If you choose to clean, Pigsty will remove the existing status file defined by promtail_positions, which means Promtail will recollect all logs on the current node and send them to Loki again.


promtail_port

name: promtail_port, type: port, level: C

promtail listen port, default value: 9080


promtail_positions

name: promtail_positions, type: path, level: C

promtail position status file path

default values: /var/log/positions.yaml

Promtail records the consumption offsets of all logs, which are periodically written to the file specified by promtail_positions.
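As an illustration, the position file could be moved out of /var/log by overriding promtail_positions at the cluster level. The path below is hypothetical, and the sketch assumes its parent directory exists and is writable by Promtail.

```yaml
promtail_enabled: true
promtail_port: 9080
promtail_positions: /data/promtail/positions.yaml   # hypothetical path, default is /var/log/positions.yaml
```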




4 - Playbook

How to manage node clusters with ansible playbooks

There are two node playbooks: node.yml and node-rm.yml.


node.yml

The playbook node.yml initializes nodes for Pigsty.

Subtasks of this playbook:

# node-id       : generate node identity
# node_name     : setup hostname
# node_hosts    : setup /etc/hosts records
# node_resolv   : setup dns resolver
# node_firewall : setup firewall & selinux
# node_ca       : add & trust ca certificate
# node_repo     : add upstream repo
# node_pkg      : install yum packages
# node_feature  : setup numa, grub, static network
# node_kernel   : enable kernel modules
# node_tune     : setup tuned profile
# node_sysctl   : setup additional sysctl parameters
# node_profile  : write /etc/profile.d/node.sh
# node_ulimit   : setup resource limits
# node_data     : setup main data dir
# node_admin    : setup admin user and ssh key
# node_timezone : setup timezone
# node_ntp      : setup ntp server/clients
# node_crontab  : add/overwrite crontab tasks
# node_vip      : setup optional l2 vrrp vip for node cluster
#   - vip_install
#   - vip_config
#   - vip_launch
#   - vip_reload
# haproxy       : setup haproxy on node to expose services
#   - haproxy_install
#   - haproxy_config
#   - haproxy_launch
#   - haproxy_reload
# monitor       : setup node_exporter & promtail for metrics & logs
#   - haproxy_register
#   - vip_dns
#   - node_exporter
#     - node_exporter_config
#     - node_exporter_launch
#   - vip_exporter
#     - vip_exporter_config
#     - vip_exporter_launch
#   - node_register
#   - promtail
#     - promtail_clean
#     - promtail_config
#     - promtail_install
#     - promtail_launch



node-rm.yml

The playbook node-rm.yml removes nodes from Pigsty.

Subtasks of this playbook:

# register       : remove register from prometheus & nginx
#   - prometheus : remove registered prometheus monitor target
#   - nginx      : remove nginx proxy record for haproxy admin
# vip            : remove node keepalived if enabled
# haproxy        : remove haproxy load balancer
# node_exporter  : remove monitoring exporter
# vip_exporter   : remove keepalived_exporter if enabled
# promtail       : remove loki log agent
# profile        : remove /etc/profile.d/node.sh



5 - Administration

Node admin SOP, add & remove node, setup admin, bind vip and miscellany

Here are some common administration tasks for NODE module.


Add Node

To add a node to Pigsty, you need passwordless ssh/sudo access to that node:

# ./node.yml -l <cls|ip|group>        # the underlying playbook
# bin/node-add <selector|ip...>       # add cluster/node to pigsty
bin/node-add node-test                # init node cluster 'node-test'
bin/node-add 10.10.10.10              # init node '10.10.10.10'
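The selector passed to bin/node-add refers to a group in the inventory. A minimal definition of the 'node-test' cluster used above might look like the fragment below; the IP addresses and nodenames are hypothetical, and the layout follows the Bind VIP example later in this section.

```yaml
node-test:
  hosts:
    10.10.10.11: { nodename: node-test-1 }   # hypothetical member
    10.10.10.12: { nodename: node-test-2 }   # hypothetical member
  vars:
    node_cluster: node-test                  # cluster name, matches the selector
```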

Remove Node

To remove a node from Pigsty, you can use the following:

# ./node-rm.yml -l <cls|ip|group>    # the underlying playbook
# bin/node-rm <selector|ip...>       # remove node from pigsty:
bin/node-rm node-test                # remove node cluster 'node-test'
bin/node-rm 10.10.10.10              # remove node '10.10.10.10'

Create Admin

If the current user does not have nopass ssh/sudo access to the node, you can use another admin user to bootstrap the node:

./node.yml -t node_admin -k -K -e ansible_user=<another admin>   # input ssh/sudo password for another admin

Bind VIP

You can bind an optional L2 VIP on a node cluster with vip_enabled.

proxy:
  hosts:
    10.10.10.29: { nodename: proxy-1 } 
    10.10.10.30: { nodename: proxy-2 } # , vip_role: master }
  vars:
    node_cluster: proxy
    vip_enabled: true
    vip_vrid: 128
    vip_address: 10.10.10.99
    vip_interface: eth1

./node.yml -l proxy -t node_vip     # enable for the first time
./node.yml -l proxy -t vip_refresh  # refresh vip config (e.g. designated master)

Other Tasks

# Play
./node.yml -t node                            # init node itself (haproxy monitor not included)
./node.yml -t haproxy                         # setup haproxy on node to expose services
./node.yml -t monitor                         # setup node_exporter & promtail for metrics & logs
./node.yml -t node_vip                        # enable keepalived for node cluster L2 VIP
./node.yml -t vip_config,vip_reload           # refresh L2 VIP configuration
./node.yml -t haproxy_config,haproxy_reload   # refresh haproxy services definition on node cluster
./node.yml -t register_prometheus             # register node to Prometheus
./node.yml -t register_nginx                  # register haproxy admin page url to Nginx on infra nodes

# Task
./node.yml -t node-id        # generate node identity
./node.yml -t node_name      # setup hostname
./node.yml -t node_hosts     # setup /etc/hosts records
./node.yml -t node_resolv    # setup dns resolver
./node.yml -t node_firewall  # setup firewall & selinux
./node.yml -t node_ca        # add & trust ca certificate
./node.yml -t node_repo      # add upstream repo
./node.yml -t node_pkg       # install yum packages
./node.yml -t node_feature   # setup numa, grub, static network
./node.yml -t node_kernel    # enable kernel modules
./node.yml -t node_tune      # setup tuned profile
./node.yml -t node_sysctl    # setup additional sysctl parameters
./node.yml -t node_profile   # write /etc/profile.d/node.sh
./node.yml -t node_ulimit    # setup resource limits
./node.yml -t node_data      # setup main data dir
./node.yml -t node_admin     # setup admin user and ssh key
./node.yml -t node_timezone  # setup timezone
./node.yml -t node_ntp       # setup ntp server/clients
./node.yml -t node_crontab   # add/overwrite crontab tasks
./node.yml -t node_vip       # setup optional l2 vrrp vip for node cluster



6 - Monitoring

How to monitor nodes in Pigsty and set up alerting rules

Dashboard

There are 6 dashboards for NODE module.

NODE Overview: Overview of all nodes

NODE Cluster: Detailed information about one dedicated node cluster

NODE Instance: Detailed information about a single node instance

NODE Alert: Overview of key metrics of all node clusters/instances

NODE VIP: Detailed information about an L2 VIP on a node cluster

NODE Haproxy: Detailed information about haproxy on a node instance


Alert Rules

Here are default alerting rules for node module:

################################################################
#                         Node Alert                           #
################################################################
- name: node-alert
  rules:

    #==============================================================#
    #                          Aliveness                           #
    #==============================================================#
    # node exporter is dead, which indicates the node is down
    - alert: NodeDown
      expr: node_up < 1
      for: 1m
      labels: { level: 0, severity: CRIT, category: node }
      annotations:
        summary: "CRIT NodeDown {{ $labels.ins }}@{{ $labels.instance }}"
        description: |
          node_up[ins={{ $labels.ins }}, instance={{ $labels.instance }}] = {{ $value }} < 1
          http://g.pigsty/d/node-instance?var-ins={{ $labels.ins }}          

    # haproxy the load balancer
    - alert: HaproxyDown
      expr: haproxy_up < 1
      for: 1m
      labels: { level: 0, severity: CRIT, category: node }
      annotations:
        summary: "CRIT HaproxyDown {{ $labels.ins }}@{{ $labels.instance }}"
        description: |
          haproxy_up[ins={{ $labels.ins }}, instance={{ $labels.instance }}] = {{ $value }} < 1
          http://g.pigsty/d/node-haproxy?var-ins={{ $labels.ins }}          

    # promtail the logging agent
    - alert: PromtailDown
      expr: promtail_up < 1
      for: 1m
      labels: { level: 1, severity: WARN, category: node }
      annotations:
        summary: "WARN PromtailDown {{ $labels.ins }}@{{ $labels.instance }}"
        description: |
          promtail_up[ins={{ $labels.ins }}, instance={{ $labels.instance }}] = {{ $value }} < 1
          http://g.pigsty/d/node-instance?var-ins={{ $labels.ins }}          

    # docker the container engine
    - alert: DockerDown
      expr: docker_up < 1
      for: 1m
      labels: { level: 1, severity: WARN, category: node }
      annotations:
        summary: "WARN DockerDown {{ $labels.ins }}@{{ $labels.instance }}"
        description: |
          docker_up[ins={{ $labels.ins }}, instance={{ $labels.instance }}] = {{ $value }} < 1
          http://g.pigsty/d/node-instance?var-ins={{ $labels.ins }}          

    # keepalived daemon
    - alert: KeepalivedDown
      expr: keepalived_up < 1
      for: 1m
      labels: { level: 1, severity: WARN, category: node }
      annotations:
        summary: "WARN KeepalivedDown {{ $labels.ins }}@{{ $labels.instance }}"
        description: |
          keepalived_up[ins={{ $labels.ins }}, instance={{ $labels.instance }}] = {{ $value }} < 1
          http://g.pigsty/d/node-instance?var-ins={{ $labels.ins }}          



    #==============================================================#
    #                          Node : CPU                          #
    #==============================================================#
    # cpu usage high : 1m avg cpu usage > 70% for 1m
    - alert: NodeCpuHigh
      expr: node:ins:cpu_usage_1m > 0.70
      for: 1m
      labels: { level: 1, severity: WARN, category: node }
      annotations:
        summary: 'WARN NodeCpuHigh {{ $labels.ins }}@{{ $labels.instance }} {{ $value  | printf "%.2f" }}'
        description: |
          node:ins:cpu_usage_1m[ins={{ $labels.ins }}] = {{ $value  | printf "%.2f" }} > 70%

    # OPTIONAL: one core high
    # OPTIONAL: throttled
    # OPTIONAL: frequency
    # OPTIONAL: steal

    #==============================================================#
    #                       Node : Schedule                        #
    #==============================================================#
    # node load high : 1m avg standard load > 100% for 1m
    - alert: NodeLoadHigh
      expr: node:ins:stdload1 > 1
      for: 1m
      labels: { level: 1, severity: WARN, category: node }
      annotations:
        summary: 'WARN NodeLoadHigh {{ $labels.ins }}@{{ $labels.instance }} {{ $value  | printf "%.2f" }}'
        description: |
          node:ins:stdload1[ins={{ $labels.ins }}] = {{ $value  | printf "%.2f" }} > 100%          


    #==============================================================#
    #                        Node : Memory                         #
    #==============================================================#
    # available memory < 10%
    - alert: NodeOutOfMem
      expr: node:ins:mem_avail < 0.10
      for: 1m
      labels: { level: 1, severity: WARN, category: node }
      annotations:
        summary: 'WARN NodeOutOfMem {{ $labels.ins }}@{{ $labels.instance }} {{ $value  | printf "%.2f" }}'
        description: |
          node:ins:mem_avail[ins={{ $labels.ins }}] = {{ $value  | printf "%.2f" }} < 10%          

    # commit ratio > 90%
    #- alert: NodeMemCommitRatioHigh
    #  expr: node:ins:mem_commit_ratio > 0.90
    #  for: 1m
    #  labels: { level: 1, severity: WARN, category: node }
    #  annotations:
    #    summary: 'WARN NodeMemCommitRatioHigh {{ $labels.ins }}@{{ $labels.instance }} {{ $value  | printf "%.2f" }}'
    #    description: |
    #      node:ins:mem_commit_ratio[ins={{ $labels.ins }}] = {{ $value  | printf "%.2f" }} > 90%

    # OPTIONAL: EDAC Errors

    #==============================================================#
    #                        Node : Swap                           #
    #==============================================================#
    # swap usage > 1%
    - alert: NodeMemSwapped
      expr: node:ins:swap_usage > 0.01
      for: 5m
      labels: { level: 1, severity: WARN, category: node }
      annotations:
        summary: 'WARN NodeMemSwapped {{ $labels.ins }}@{{ $labels.instance }} {{ $value  | printf "%.2f" }}'
        description: |
          node:ins:swap_usage[ins={{ $labels.ins }}] = {{ $value  | printf "%.2f" }} > 1%          

    #==============================================================#
    #                     Node : File System                       #
    #==============================================================#

    # filesystem usage > 90%
    - alert: NodeFsSpaceFull
      expr: node:fs:space_usage > 0.90
      for: 1m
      labels: { level: 1, severity: WARN, category: node }
      annotations:
        summary: 'WARN NodeFsSpaceFull {{ $labels.ins }}@{{ $labels.instance }} {{ $value  | printf "%.2f" }}'
        description: |
          node:fs:space_usage[ins={{ $labels.ins }}] = {{ $value  | printf "%.2f" }} > 90%          

    # inode usage > 90%
    - alert: NodeFsFilesFull
      expr: node:fs:inode_usage > 0.90
      for: 1m
      labels: { level: 1, severity: WARN, category: node }
      annotations:
        summary: 'WARN NodeFsFilesFull {{ $labels.ins }}@{{ $labels.instance }} {{ $value  | printf "%.2f" }}'
        description: |
          node:fs:inode_usage[ins={{ $labels.ins }}] = {{ $value  | printf "%.2f" }} > 90%          

    # file descriptor usage > 90%
    - alert: NodeFdFull
      expr: node:ins:fd_usage > 0.90
      for: 1m
      labels: { level: 1, severity: WARN, category: node }
      annotations:
        summary: 'WARN NodeFdFull {{ $labels.ins }}@{{ $labels.instance }} {{ $value  | printf "%.2f" }}'
        description: |
          node:ins:fd_usage[ins={{ $labels.ins }}] = {{ $value  | printf "%.2f" }} > 90%          

    # OPTIONAL: space predict 1d
    # OPTIONAL: filesystem read-only
    # OPTIONAL: fast release on disk space

    #==============================================================#
    #                          Node : Disk                         #
    #==============================================================#
    # read/write latency > 32ms (typical on pci-e ssd: 100µs)
    - alert: NodeDiskSlow
      expr: node:dev:disk_read_rt_1m{device="dfa"} > 0.032 or node:dev:disk_write_rt_1m{device="dfa"} > 0.032
      for: 1m
      labels: { level: 2, severity: INFO, category: node }
      annotations:
        summary: 'INFO NodeDiskSlow {{ $labels.ins }}@{{ $labels.instance }} {{ $value  | printf "%.6f" }}'
        description: |
          node:dev:disk_read_rt_1m or node:dev:disk_write_rt_1m [ins={{ $labels.ins }}] = {{ $value  | printf "%.6f" }} > 32ms

    # OPTIONAL: raid card failure
    # OPTIONAL: read/write traffic high
    # OPTIONAL: read/write latency high

    #==============================================================#
    #                        Node : Network                        #
    #==============================================================#
    # OPTIONAL: unusual network traffic
    # OPTIONAL: interface saturation high

    #==============================================================#
    #                        Node : Protocol                       #
    #==============================================================#

    # rate(node:ins:tcp_error[1m]) > 1
    - alert: NodeTcpErrHigh
      expr: rate(node:ins:tcp_error[1m]) > 1
      for: 1m
      labels: { level: 1, severity: WARN, category: node }
      annotations:
        summary: 'WARN NodeTcpErrHigh {{ $labels.ins }}@{{ $labels.instance }} {{ $value  | printf "%.2f" }}'
        description: |
          rate(node:ins:tcp_error{ins={{ $labels.ins }}}[1m]) = {{ $value  | printf "%.2f" }} > 1          

    # node:ins:tcp_retrans_ratio1m > 1e-2
    - alert: NodeTcpRetransHigh
      expr: node:ins:tcp_retrans_ratio1m > 1e-2
      for: 1m
      labels: { level: 2, severity: INFO, category: node }
      annotations:
        summary: 'INFO NodeTcpRetransHigh {{ $labels.ins }}@{{ $labels.instance }} {{ $value  | printf "%.6f" }}'
        description: |
          node:ins:tcp_retrans_ratio1m[ins={{ $labels.ins }}] = {{ $value  | printf "%.6f" }} > 1%          

    # OPTIONAL: tcp conn high
    # OPTIONAL: udp traffic high
    # OPTIONAL: conn track

    #==============================================================#
    #                          Node : Time                         #
    #==============================================================#

    - alert: NodeTimeDrift
      expr: node_timex_sync_status != 1
      for: 1m
      labels: { level: 1, severity: WARN, category: node }
      annotations:
        summary: 'WARN NodeTimeDrift {{ $labels.ins }}@{{ $labels.instance }}'
        description: |
          node_timex_sync_status[ins={{ $labels.ins }}] = {{ $value | printf "%.6f" }} != 1


    # time drift > 64ms
    # - alert: NodeTimeDrift
    #   expr: node:ins:time_drift > 0.064
    #   for: 1m
    #   labels: { level: 1, severity: WARN, category: node }
    #   annotations:
    #     summary: 'WARN NodeTimeDrift {{ $labels.ins }}@{{ $labels.instance }}'
    #     description: |
    #       abs(node_timex_offset_seconds)[ins={{ $labels.ins }}]) = {{ $value | printf "%.6f" }} > 64ms
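The OPTIONAL placeholders in the rules above are left unimplemented. As a hedged sketch of how one of them (filesystem read-only) might be filled in, the rule below follows the same layout as the default rules. The alert name NodeFsReadOnly is hypothetical and not part of the default rule set; it assumes the node_exporter filesystem collector is enabled so that node_filesystem_readonly is available.

```yaml
    # filesystem mounted read-only (hypothetical rule, not part of the defaults)
    - alert: NodeFsReadOnly
      expr: node_filesystem_readonly > 0
      for: 1m
      labels: { level: 1, severity: WARN, category: node }
      annotations:
        summary: 'WARN NodeFsReadOnly {{ $labels.ins }}@{{ $labels.instance }}'
        description: |
          node_filesystem_readonly[ins={{ $labels.ins }}, mountpoint={{ $labels.mountpoint }}] = {{ $value }} > 0
```

Filesystems that are read-only by design (for example mounted ISO images) would also match, so a label filter on fstype or mountpoint is probably needed before using such a rule.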

7 - Metrics

Pigsty NODE module metric list

NODE module has 747 available metrics.

Metric Name Type Labels Description
ALERTS Unknown alertname, ip, level, severity, ins, job, alertstate, category, instance, cls N/A
ALERTS_FOR_STATE Unknown alertname, ip, level, severity, ins, job, category, instance, cls N/A
deprecated_flags_inuse_total Unknown instance, ins, job, ip, cls N/A
go_gc_duration_seconds summary quantile, instance, ins, job, ip, cls A summary of the pause duration of garbage collection cycles.
go_gc_duration_seconds_count Unknown instance, ins, job, ip, cls N/A
go_gc_duration_seconds_sum Unknown instance, ins, job, ip, cls N/A
go_goroutines gauge instance, ins, job, ip, cls Number of goroutines that currently exist.
go_info gauge version, instance, ins, job, ip, cls Information about the Go environment.
go_memstats_alloc_bytes gauge instance, ins, job, ip, cls Number of bytes allocated and still in use.
go_memstats_alloc_bytes_total counter instance, ins, job, ip, cls Total number of bytes allocated, even if freed.
go_memstats_buck_hash_sys_bytes gauge instance, ins, job, ip, cls Number of bytes used by the profiling bucket hash table.
go_memstats_frees_total counter instance, ins, job, ip, cls Total number of frees.
go_memstats_gc_sys_bytes gauge instance, ins, job, ip, cls Number of bytes used for garbage collection system metadata.
go_memstats_heap_alloc_bytes gauge instance, ins, job, ip, cls Number of heap bytes allocated and still in use.
go_memstats_heap_idle_bytes gauge instance, ins, job, ip, cls Number of heap bytes waiting to be used.
go_memstats_heap_inuse_bytes gauge instance, ins, job, ip, cls Number of heap bytes that are in use.
go_memstats_heap_objects gauge instance, ins, job, ip, cls Number of allocated objects.
go_memstats_heap_released_bytes gauge instance, ins, job, ip, cls Number of heap bytes released to OS.
go_memstats_heap_sys_bytes gauge instance, ins, job, ip, cls Number of heap bytes obtained from system.
go_memstats_last_gc_time_seconds gauge instance, ins, job, ip, cls Number of seconds since 1970 of last garbage collection.
go_memstats_lookups_total counter instance, ins, job, ip, cls Total number of pointer lookups.
go_memstats_mallocs_total counter instance, ins, job, ip, cls Total number of mallocs.
go_memstats_mcache_inuse_bytes gauge instance, ins, job, ip, cls Number of bytes in use by mcache structures.
go_memstats_mcache_sys_bytes gauge instance, ins, job, ip, cls Number of bytes used for mcache structures obtained from system.
go_memstats_mspan_inuse_bytes gauge instance, ins, job, ip, cls Number of bytes in use by mspan structures.
go_memstats_mspan_sys_bytes gauge instance, ins, job, ip, cls Number of bytes used for mspan structures obtained from system.
go_memstats_next_gc_bytes gauge instance, ins, job, ip, cls Number of heap bytes when next garbage collection will take place.
go_memstats_other_sys_bytes gauge instance, ins, job, ip, cls Number of bytes used for other system allocations.
go_memstats_stack_inuse_bytes gauge instance, ins, job, ip, cls Number of bytes in use by the stack allocator.
go_memstats_stack_sys_bytes gauge instance, ins, job, ip, cls Number of bytes obtained from system for stack allocator.
go_memstats_sys_bytes gauge instance, ins, job, ip, cls Number of bytes obtained from system.
go_threads gauge instance, ins, job, ip, cls Number of OS threads created.
haproxy:cls:usage Unknown job, cls N/A
haproxy:ins:uptime Unknown instance, ins, job, ip, cls N/A
haproxy:ins:usage Unknown instance, ins, job, ip, cls N/A
haproxy_backend_active_servers gauge proxy, instance, ins, job, ip, cls Total number of active UP servers with a non-zero weight
haproxy_backend_agg_check_status gauge state, proxy, instance, ins, job, ip, cls Backend’s aggregated gauge of servers’ state check status
haproxy_backend_agg_server_check_status gauge state, proxy, instance, ins, job, ip, cls [DEPRECATED] Backend’s aggregated gauge of servers’ status
haproxy_backend_agg_server_status gauge state, proxy, instance, ins, job, ip, cls Backend’s aggregated gauge of servers’ status
haproxy_backend_backup_servers gauge proxy, instance, ins, job, ip, cls Total number of backup UP servers with a non-zero weight
haproxy_backend_bytes_in_total counter proxy, instance, ins, job, ip, cls Total number of request bytes since process started
haproxy_backend_bytes_out_total counter proxy, instance, ins, job, ip, cls Total number of response bytes since process started
haproxy_backend_check_last_change_seconds gauge proxy, instance, ins, job, ip, cls How long ago the last server state changed, in seconds
haproxy_backend_check_up_down_total counter proxy, instance, ins, job, ip, cls Total number of failed checks causing UP to DOWN server transitions, per server/backend, since the worker process started
haproxy_backend_client_aborts_total counter proxy, instance, ins, job, ip, cls Total number of requests or connections aborted by the client since the worker process started
haproxy_backend_connect_time_average_seconds gauge proxy, instance, ins, job, ip, cls Avg. connect time for last 1024 successful connections.
haproxy_backend_connection_attempts_total counter proxy, instance, ins, job, ip, cls Total number of outgoing connection attempts on this backend/server since the worker process started
haproxy_backend_connection_errors_total counter proxy, instance, ins, job, ip, cls Total number of failed connections to server since the worker process started
haproxy_backend_connection_reuses_total counter proxy, instance, ins, job, ip, cls Total number of reused connection on this backend/server since the worker process started
haproxy_backend_current_queue gauge proxy, instance, ins, job, ip, cls Number of current queued connections
haproxy_backend_current_sessions gauge proxy, instance, ins, job, ip, cls Number of current sessions on the frontend, backend or server
haproxy_backend_downtime_seconds_total counter proxy, instance, ins, job, ip, cls Total time spent in DOWN state, for server or backend
haproxy_backend_failed_header_rewriting_total counter proxy, instance, ins, job, ip, cls Total number of failed HTTP header rewrites since the worker process started
haproxy_backend_http_cache_hits_total counter proxy, instance, ins, job, ip, cls Total number of HTTP requests found in the cache on this frontend/backend since the worker process started
haproxy_backend_http_cache_lookups_total counter proxy, instance, ins, job, ip, cls Total number of HTTP requests looked up in the cache on this frontend/backend since the worker process started
haproxy_backend_http_comp_bytes_bypassed_total counter proxy, instance, ins, job, ip, cls Total number of bytes that bypassed HTTP compression for this object since the worker process started (CPU/memory/bandwidth limitation)
haproxy_backend_http_comp_bytes_in_total counter proxy, instance, ins, job, ip, cls Total number of bytes submitted to the HTTP compressor for this object since the worker process started
haproxy_backend_http_comp_bytes_out_total counter proxy, instance, ins, job, ip, cls Total number of bytes emitted by the HTTP compressor for this object since the worker process started
haproxy_backend_http_comp_responses_total counter proxy, instance, ins, job, ip, cls Total number of HTTP responses that were compressed for this object since the worker process started
haproxy_backend_http_requests_total counter proxy, instance, ins, job, ip, cls Total number of HTTP requests processed by this object since the worker process started
haproxy_backend_http_responses_total counter ip, proxy, ins, code, job, instance, cls Total number of HTTP responses returned by this object since the worker process started, broken down by the code label
haproxy_backend_internal_errors_total counter proxy, instance, ins, job, ip, cls Total number of internal errors since process started
haproxy_backend_last_session_seconds gauge proxy, instance, ins, job, ip, cls How long ago some traffic was seen on this object on this worker process, in seconds
haproxy_backend_limit_sessions gauge proxy, instance, ins, job, ip, cls Frontend/listener/server’s maxconn, backend’s fullconn
haproxy_backend_loadbalanced_total counter proxy, instance, ins, job, ip, cls Total number of requests routed by load balancing since the worker process started (ignores queue pop and stickiness)
haproxy_backend_max_connect_time_seconds gauge proxy, instance, ins, job, ip, cls Maximum observed time spent waiting for a connection to complete
haproxy_backend_max_queue gauge proxy, instance, ins, job, ip, cls Highest value of queued connections encountered since process started
haproxy_backend_max_queue_time_seconds gauge proxy, instance, ins, job, ip, cls Maximum observed time spent in the queue
haproxy_backend_max_response_time_seconds gauge proxy, instance, ins, job, ip, cls Maximum observed time spent waiting for a server response
haproxy_backend_max_session_rate gauge proxy, instance, ins, job, ip, cls Highest value of sessions per second observed since the worker process started
haproxy_backend_max_sessions gauge proxy, instance, ins, job, ip, cls Highest value of current sessions encountered since process started
haproxy_backend_max_total_time_seconds gauge proxy, instance, ins, job, ip, cls Maximum observed total request+response time (request+queue+connect+response+processing)
haproxy_backend_queue_time_average_seconds gauge proxy, instance, ins, job, ip, cls Avg. queue time for last 1024 successful connections.
haproxy_backend_redispatch_warnings_total counter proxy, instance, ins, job, ip, cls Total number of server redispatches due to connection failures since the worker process started
haproxy_backend_requests_denied_total counter proxy, instance, ins, job, ip, cls Total number of denied requests since process started
haproxy_backend_response_errors_total counter proxy, instance, ins, job, ip, cls Total number of invalid responses since the worker process started
haproxy_backend_response_time_average_seconds gauge proxy, instance, ins, job, ip, cls Avg. response time for last 1024 successful connections.
haproxy_backend_responses_denied_total counter proxy, instance, ins, job, ip, cls Total number of denied responses since process started
haproxy_backend_retry_warnings_total counter proxy, instance, ins, job, ip, cls Total number of server connection retries since the worker process started
haproxy_backend_server_aborts_total counter proxy, instance, ins, job, ip, cls Total number of requests or connections aborted by the server since the worker process started
haproxy_backend_sessions_total counter proxy, instance, ins, job, ip, cls Total number of sessions since process started
haproxy_backend_status gauge state, proxy, instance, ins, job, ip, cls Current status of the service, per state label value.
haproxy_backend_total_time_average_seconds gauge proxy, instance, ins, job, ip, cls Avg. total time for last 1024 successful connections.
haproxy_backend_uweight gauge proxy, instance, ins, job, ip, cls Server’s user weight, or sum of active servers’ user weights for a backend
haproxy_backend_weight gauge proxy, instance, ins, job, ip, cls Server’s effective weight, or sum of active servers’ effective weights for a backend
haproxy_frontend_bytes_in_total counter proxy, instance, ins, job, ip, cls Total number of request bytes since process started
haproxy_frontend_bytes_out_total counter proxy, instance, ins, job, ip, cls Total number of response bytes since process started
haproxy_frontend_connections_rate_max gauge proxy, instance, ins, job, ip, cls Highest value of connections per second observed since the worker process started
haproxy_frontend_connections_total counter proxy, instance, ins, job, ip, cls Total number of new connections accepted on this frontend since the worker process started
haproxy_frontend_current_sessions gauge proxy, instance, ins, job, ip, cls Number of current sessions on the frontend, backend or server
haproxy_frontend_denied_connections_total counter proxy, instance, ins, job, ip, cls Total number of incoming connections blocked on a listener/frontend by a tcp-request connection rule since the worker process started
haproxy_frontend_denied_sessions_total counter proxy, instance, ins, job, ip, cls Total number of incoming sessions blocked on a listener/frontend by a tcp-request session rule since the worker process started
haproxy_frontend_failed_header_rewriting_total counter proxy, instance, ins, job, ip, cls Total number of failed HTTP header rewrites since the worker process started
haproxy_frontend_http_cache_hits_total counter proxy, instance, ins, job, ip, cls Total number of HTTP requests found in the cache on this frontend/backend since the worker process started
haproxy_frontend_http_cache_lookups_total counter proxy, instance, ins, job, ip, cls Total number of HTTP requests looked up in the cache on this frontend/backend since the worker process started
haproxy_frontend_http_comp_bytes_bypassed_total counter proxy, instance, ins, job, ip, cls Total number of bytes that bypassed HTTP compression for this object since the worker process started (CPU/memory/bandwidth limitation)
haproxy_frontend_http_comp_bytes_in_total counter proxy, instance, ins, job, ip, cls Total number of bytes submitted to the HTTP compressor for this object since the worker process started
haproxy_frontend_http_comp_bytes_out_total counter proxy, instance, ins, job, ip, cls Total number of bytes emitted by the HTTP compressor for this object since the worker process started
haproxy_frontend_http_comp_responses_total counter proxy, instance, ins, job, ip, cls Total number of HTTP responses that were compressed for this object since the worker process started
haproxy_frontend_http_requests_rate_max gauge proxy, instance, ins, job, ip, cls Highest value of HTTP requests per second observed since the worker process started
haproxy_frontend_http_requests_total counter proxy, instance, ins, job, ip, cls Total number of HTTP requests processed by this object since the worker process started
haproxy_frontend_http_responses_total counter ip, proxy, ins, code, job, instance, cls Total number of HTTP responses returned by this object since the worker process started, per status class given by the code label (e.g. 1xx for 100-199)
haproxy_frontend_intercepted_requests_total counter proxy, instance, ins, job, ip, cls Total number of HTTP requests intercepted on the frontend (redirects/stats/services) since the worker process started
haproxy_frontend_internal_errors_total counter proxy, instance, ins, job, ip, cls Total number of internal errors since process started
haproxy_frontend_limit_session_rate gauge proxy, instance, ins, job, ip, cls Limit on the number of sessions accepted in a second (frontend only, ‘rate-limit sessions’ setting)
haproxy_frontend_limit_sessions gauge proxy, instance, ins, job, ip, cls Frontend/listener/server’s maxconn, backend’s fullconn
haproxy_frontend_max_session_rate gauge proxy, instance, ins, job, ip, cls Highest value of sessions per second observed since the worker process started
haproxy_frontend_max_sessions gauge proxy, instance, ins, job, ip, cls Highest value of current sessions encountered since process started
haproxy_frontend_request_errors_total counter proxy, instance, ins, job, ip, cls Total number of invalid requests since process started
haproxy_frontend_requests_denied_total counter proxy, instance, ins, job, ip, cls Total number of denied requests since process started
haproxy_frontend_responses_denied_total counter proxy, instance, ins, job, ip, cls Total number of denied responses since process started
haproxy_frontend_sessions_total counter proxy, instance, ins, job, ip, cls Total number of sessions since process started
haproxy_frontend_status gauge state, proxy, instance, ins, job, ip, cls Current status of the service, per state label value.
haproxy_process_active_peers gauge instance, ins, job, ip, cls Current number of verified active peers connections on the current worker process
haproxy_process_build_info gauge version, instance, ins, job, ip, cls Build info
haproxy_process_busy_polling_enabled gauge instance, ins, job, ip, cls 1 if busy-polling is currently in use on the worker process, otherwise zero (config.busy-polling)
haproxy_process_bytes_out_rate gauge instance, ins, job, ip, cls Number of bytes emitted by current worker process over the last second
haproxy_process_bytes_out_total counter instance, ins, job, ip, cls Total number of bytes emitted by current worker process since started
haproxy_process_connected_peers gauge instance, ins, job, ip, cls Current number of peers having passed the connection step on the current worker process
haproxy_process_connections_total counter instance, ins, job, ip, cls Total number of connections on this worker process since started
haproxy_process_current_backend_ssl_key_rate gauge instance, ins, job, ip, cls Number of SSL keys created on backends in this worker process over the last second
haproxy_process_current_connection_rate gauge instance, ins, job, ip, cls Number of front connections created on this worker process over the last second
haproxy_process_current_connections gauge instance, ins, job, ip, cls Current number of connections on this worker process
haproxy_process_current_frontend_ssl_key_rate gauge instance, ins, job, ip, cls Number of SSL keys created on frontends in this worker process over the last second
haproxy_process_current_run_queue gauge instance, ins, job, ip, cls Total number of active tasks+tasklets in the current worker process
haproxy_process_current_session_rate gauge instance, ins, job, ip, cls Number of sessions created on this worker process over the last second
haproxy_process_current_ssl_connections gauge instance, ins, job, ip, cls Current number of SSL endpoints on this worker process (front+back)
haproxy_process_current_ssl_rate gauge instance, ins, job, ip, cls Number of SSL connections created on this worker process over the last second
haproxy_process_current_tasks gauge instance, ins, job, ip, cls Total number of tasks in the current worker process (active + sleeping)
haproxy_process_current_zlib_memory gauge instance, ins, job, ip, cls Amount of memory currently used by HTTP compression on the current worker process (in bytes)
haproxy_process_dropped_logs_total counter instance, ins, job, ip, cls Total number of dropped logs for current worker process since started
haproxy_process_failed_resolutions counter instance, ins, job, ip, cls Total number of failed DNS resolutions in current worker process since started
haproxy_process_frontend_ssl_reuse gauge instance, ins, job, ip, cls Percent of frontend SSL connections which did not require a new key
haproxy_process_hard_max_connections gauge instance, ins, job, ip, cls Hard limit on the number of per-process connections (imposed by Memmax_MB or Ulimit-n)
haproxy_process_http_comp_bytes_in_total counter instance, ins, job, ip, cls Number of bytes submitted to the HTTP compressor in this worker process over the last second
haproxy_process_http_comp_bytes_out_total counter instance, ins, job, ip, cls Number of bytes emitted by the HTTP compressor in this worker process over the last second
haproxy_process_idle_time_percent gauge instance, ins, job, ip, cls Percentage of last second spent waiting in the current worker thread
haproxy_process_jobs gauge instance, ins, job, ip, cls Current number of active jobs on the current worker process (frontend connections, master connections, listeners)
haproxy_process_limit_connection_rate gauge instance, ins, job, ip, cls Hard limit for ConnRate (global.maxconnrate)
haproxy_process_limit_http_comp gauge instance, ins, job, ip, cls Limit of CompressBpsOut beyond which HTTP compression is automatically disabled
haproxy_process_limit_session_rate gauge instance, ins, job, ip, cls Hard limit for SessRate (global.maxsessrate)
haproxy_process_limit_ssl_rate gauge instance, ins, job, ip, cls Hard limit for SslRate (global.maxsslrate)
haproxy_process_listeners gauge instance, ins, job, ip, cls Current number of active listeners on the current worker process
haproxy_process_max_backend_ssl_key_rate gauge instance, ins, job, ip, cls Highest SslBackendKeyRate reached on this worker process since started (in SSL keys per second)
haproxy_process_max_connection_rate gauge instance, ins, job, ip, cls Highest ConnRate reached on this worker process since started (in connections per second)
haproxy_process_max_connections gauge instance, ins, job, ip, cls Hard limit on the number of per-process connections (configured or imposed by Ulimit-n)
haproxy_process_max_fds gauge instance, ins, job, ip, cls Hard limit on the number of per-process file descriptors
haproxy_process_max_frontend_ssl_key_rate gauge instance, ins, job, ip, cls Highest SslFrontendKeyRate reached on this worker process since started (in SSL keys per second)
haproxy_process_max_memory_bytes gauge instance, ins, job, ip, cls Worker process’s hard limit on memory usage in bytes (-m on command line)
haproxy_process_max_pipes gauge instance, ins, job, ip, cls Hard limit on the number of pipes for splicing, 0=unlimited
haproxy_process_max_session_rate gauge instance, ins, job, ip, cls Highest SessRate reached on this worker process since started (in sessions per second)
haproxy_process_max_sockets gauge instance, ins, job, ip, cls Hard limit on the number of per-process sockets
haproxy_process_max_ssl_connections gauge instance, ins, job, ip, cls Hard limit on the number of per-process SSL endpoints (front+back), 0=unlimited
haproxy_process_max_ssl_rate gauge instance, ins, job, ip, cls Highest SslRate reached on this worker process since started (in connections per second)
haproxy_process_max_zlib_memory gauge instance, ins, job, ip, cls Limit on the amount of memory used by HTTP compression above which it is automatically disabled (in bytes, see global.maxzlibmem)
haproxy_process_nbproc gauge instance, ins, job, ip, cls Number of started worker processes (historical, always 1)
haproxy_process_nbthread gauge instance, ins, job, ip, cls Number of started threads (global.nbthread)
haproxy_process_pipes_free_total counter instance, ins, job, ip, cls Current number of allocated and available pipes in this worker process
haproxy_process_pipes_used_total counter instance, ins, job, ip, cls Current number of pipes in use in this worker process
haproxy_process_pool_allocated_bytes gauge instance, ins, job, ip, cls Amount of memory allocated in pools (in bytes)
haproxy_process_pool_failures_total counter instance, ins, job, ip, cls Number of failed pool allocations since this worker was started
haproxy_process_pool_used_bytes gauge instance, ins, job, ip, cls Amount of pool memory currently used (in bytes)
haproxy_process_recv_logs_total counter instance, ins, job, ip, cls Total number of log messages received by log-forwarding listeners on this worker process since started
haproxy_process_relative_process_id gauge instance, ins, job, ip, cls Relative worker process number (1)
haproxy_process_requests_total counter instance, ins, job, ip, cls Total number of requests on this worker process since started
haproxy_process_spliced_bytes_out_total counter instance, ins, job, ip, cls Total number of bytes emitted by current worker process through a kernel pipe since started
haproxy_process_ssl_cache_lookups_total counter instance, ins, job, ip, cls Total number of SSL session ID lookups in the SSL session cache on this worker since started
haproxy_process_ssl_cache_misses_total counter instance, ins, job, ip, cls Total number of SSL session ID lookups that didn’t find a session in the SSL session cache on this worker since started
haproxy_process_ssl_connections_total counter instance, ins, job, ip, cls Total number of SSL endpoints on this worker process since started (front+back)
haproxy_process_start_time_seconds gauge instance, ins, job, ip, cls Start time in seconds
haproxy_process_stopping gauge instance, ins, job, ip, cls 1 if the worker process is currently stopping, otherwise zero
haproxy_process_unstoppable_jobs gauge instance, ins, job, ip, cls Current number of unstoppable jobs on the current worker process (master connections)
haproxy_process_uptime_seconds gauge instance, ins, job, ip, cls How long ago this worker process was started (seconds)
haproxy_server_bytes_in_total counter proxy, instance, ins, job, server, ip, cls Total number of request bytes since process started
haproxy_server_bytes_out_total counter proxy, instance, ins, job, server, ip, cls Total number of response bytes since process started
haproxy_server_check_code gauge proxy, instance, ins, job, server, ip, cls Layer 5-7 code of the last health check, if available.
haproxy_server_check_duration_seconds gauge proxy, instance, ins, job, server, ip, cls Total duration of the latest server health check, in seconds.
haproxy_server_check_failures_total counter proxy, instance, ins, job, server, ip, cls Total number of failed individual health checks per server/backend, since the worker process started
haproxy_server_check_last_change_seconds gauge proxy, instance, ins, job, server, ip, cls How long ago the last server state changed, in seconds
haproxy_server_check_status gauge state, proxy, instance, ins, job, server, ip, cls Status of last health check, per state label value.
haproxy_server_check_up_down_total counter proxy, instance, ins, job, server, ip, cls Total number of failed checks causing UP to DOWN server transitions, per server/backend, since the worker process started
haproxy_server_client_aborts_total counter proxy, instance, ins, job, server, ip, cls Total number of requests or connections aborted by the client since the worker process started
haproxy_server_connect_time_average_seconds gauge proxy, instance, ins, job, server, ip, cls Avg. connect time for last 1024 successful connections.
haproxy_server_connection_attempts_total counter proxy, instance, ins, job, server, ip, cls Total number of outgoing connection attempts on this backend/server since the worker process started
haproxy_server_connection_errors_total counter proxy, instance, ins, job, server, ip, cls Total number of failed connections to server since the worker process started
haproxy_server_connection_reuses_total counter proxy, instance, ins, job, server, ip, cls Total number of reused connections on this backend/server since the worker process started
haproxy_server_current_queue gauge proxy, instance, ins, job, server, ip, cls Number of current queued connections
haproxy_server_current_sessions gauge proxy, instance, ins, job, server, ip, cls Number of current sessions on the frontend, backend or server
haproxy_server_current_throttle gauge proxy, instance, ins, job, server, ip, cls Throttling ratio applied to a server’s maxconn and weight during the slowstart period (0 to 100%)
haproxy_server_downtime_seconds_total counter proxy, instance, ins, job, server, ip, cls Total time spent in DOWN state, for server or backend
haproxy_server_failed_header_rewriting_total counter proxy, instance, ins, job, server, ip, cls Total number of failed HTTP header rewrites since the worker process started
haproxy_server_idle_connections_current gauge proxy, instance, ins, job, server, ip, cls Current number of idle connections available for reuse on this server
haproxy_server_idle_connections_limit gauge proxy, instance, ins, job, server, ip, cls Limit on the number of available idle connections on this server (server ‘pool-max-conn’ directive)
haproxy_server_internal_errors_total counter proxy, instance, ins, job, server, ip, cls Total number of internal errors since process started
haproxy_server_last_session_seconds gauge proxy, instance, ins, job, server, ip, cls How long ago some traffic was seen on this object on this worker process, in seconds
haproxy_server_limit_sessions gauge proxy, instance, ins, job, server, ip, cls Frontend/listener/server’s maxconn, backend’s fullconn
haproxy_server_loadbalanced_total counter proxy, instance, ins, job, server, ip, cls Total number of requests routed by load balancing since the worker process started (ignores queue pop and stickiness)
haproxy_server_max_connect_time_seconds gauge proxy, instance, ins, job, server, ip, cls Maximum observed time spent waiting for a connection to complete
haproxy_server_max_queue gauge proxy, instance, ins, job, server, ip, cls Highest value of queued connections encountered since process started
haproxy_server_max_queue_time_seconds gauge proxy, instance, ins, job, server, ip, cls Maximum observed time spent in the queue
haproxy_server_max_response_time_seconds gauge proxy, instance, ins, job, server, ip, cls Maximum observed time spent waiting for a server response
haproxy_server_max_session_rate gauge proxy, instance, ins, job, server, ip, cls Highest value of sessions per second observed since the worker process started
haproxy_server_max_sessions gauge proxy, instance, ins, job, server, ip, cls Highest value of current sessions encountered since process started
haproxy_server_max_total_time_seconds gauge proxy, instance, ins, job, server, ip, cls Maximum observed total request+response time (request+queue+connect+response+processing)
haproxy_server_need_connections_current gauge proxy, instance, ins, job, server, ip, cls Estimated needed number of connections
haproxy_server_queue_limit gauge proxy, instance, ins, job, server, ip, cls Limit on the number of connections in queue, for servers only (maxqueue argument)
haproxy_server_queue_time_average_seconds gauge proxy, instance, ins, job, server, ip, cls Avg. queue time for last 1024 successful connections.
haproxy_server_redispatch_warnings_total counter proxy, instance, ins, job, server, ip, cls Total number of server redispatches due to connection failures since the worker process started
haproxy_server_response_errors_total counter proxy, instance, ins, job, server, ip, cls Total number of invalid responses since the worker process started
haproxy_server_response_time_average_seconds gauge proxy, instance, ins, job, server, ip, cls Avg. response time for last 1024 successful connections.
haproxy_server_responses_denied_total counter proxy, instance, ins, job, server, ip, cls Total number of denied responses since process started
haproxy_server_retry_warnings_total counter proxy, instance, ins, job, server, ip, cls Total number of server connection retries since the worker process started
haproxy_server_safe_idle_connections_current gauge proxy, instance, ins, job, server, ip, cls Current number of safe idle connections
haproxy_server_server_aborts_total counter proxy, instance, ins, job, server, ip, cls Total number of requests or connections aborted by the server since the worker process started
haproxy_server_sessions_total counter proxy, instance, ins, job, server, ip, cls Total number of sessions since process started
haproxy_server_status gauge state, proxy, instance, ins, job, server, ip, cls Current status of the service, per state label value.
haproxy_server_total_time_average_seconds gauge proxy, instance, ins, job, server, ip, cls Avg. total time for last 1024 successful connections.
haproxy_server_unsafe_idle_connections_current gauge proxy, instance, ins, job, server, ip, cls Current number of unsafe idle connections
haproxy_server_used_connections_current gauge proxy, instance, ins, job, server, ip, cls Current number of connections in use
haproxy_server_uweight gauge proxy, instance, ins, job, server, ip, cls Server’s user weight, or sum of active servers’ user weights for a backend
haproxy_server_weight gauge proxy, instance, ins, job, server, ip, cls Server’s effective weight, or sum of active servers’ effective weights for a backend
haproxy_up Unknown instance, ins, job, ip, cls N/A
inflight_requests gauge instance, ins, job, route, ip, cls, method Current number of inflight requests.
jaeger_tracer_baggage_restrictions_updates_total Unknown instance, ins, job, result, ip, cls N/A
jaeger_tracer_baggage_truncations_total Unknown instance, ins, job, ip, cls N/A
jaeger_tracer_baggage_updates_total Unknown instance, ins, job, result, ip, cls N/A
jaeger_tracer_finished_spans_total Unknown instance, ins, job, sampled, ip, cls N/A
jaeger_tracer_reporter_queue_length gauge instance, ins, job, ip, cls Current number of spans in the reporter queue
jaeger_tracer_reporter_spans_total Unknown instance, ins, job, result, ip, cls N/A
jaeger_tracer_sampler_queries_total Unknown instance, ins, job, result, ip, cls N/A
jaeger_tracer_sampler_updates_total Unknown instance, ins, job, result, ip, cls N/A
jaeger_tracer_span_context_decoding_errors_total Unknown instance, ins, job, ip, cls N/A
jaeger_tracer_started_spans_total Unknown instance, ins, job, sampled, ip, cls N/A
jaeger_tracer_throttled_debug_spans_total Unknown instance, ins, job, ip, cls N/A
jaeger_tracer_throttler_updates_total Unknown instance, ins, job, result, ip, cls N/A
jaeger_tracer_traces_total Unknown state, instance, ins, job, sampled, ip, cls N/A
loki_experimental_features_in_use_total Unknown instance, ins, job, ip, cls N/A
loki_internal_log_messages_total Unknown level, instance, ins, job, ip, cls N/A
loki_log_flushes_bucket Unknown instance, ins, job, le, ip, cls N/A
loki_log_flushes_count Unknown instance, ins, job, ip, cls N/A
loki_log_flushes_sum Unknown instance, ins, job, ip, cls N/A
loki_log_messages_total Unknown level, instance, ins, job, ip, cls N/A
loki_logql_querystats_duplicates_total Unknown instance, ins, job, ip, cls N/A
loki_logql_querystats_ingester_sent_lines_total Unknown instance, ins, job, ip, cls N/A
loki_querier_index_cache_corruptions_total Unknown instance, ins, job, ip, cls N/A
loki_querier_index_cache_encode_errors_total Unknown instance, ins, job, ip, cls N/A
loki_querier_index_cache_gets_total Unknown instance, ins, job, ip, cls N/A
loki_querier_index_cache_hits_total Unknown instance, ins, job, ip, cls N/A
loki_querier_index_cache_puts_total Unknown instance, ins, job, ip, cls N/A
net_conntrack_dialer_conn_attempted_total counter ip, ins, job, instance, cls, dialer_name Total number of connections attempted by the dialer of a given name.
net_conntrack_dialer_conn_closed_total counter ip, ins, job, instance, cls, dialer_name Total number of connections closed which originated from the dialer of a given name.
net_conntrack_dialer_conn_established_total counter ip, ins, job, instance, cls, dialer_name Total number of connections successfully established by the dialer of a given name.
net_conntrack_dialer_conn_failed_total counter ip, ins, job, reason, instance, cls, dialer_name Total number of connections that failed to dial by the dialer of a given name.
node:cls:avail_bytes Unknown job, cls N/A
node:cls:cpu_count Unknown job, cls N/A
node:cls:cpu_usage Unknown job, cls N/A
node:cls:cpu_usage_15m Unknown job, cls N/A
node:cls:cpu_usage_1m Unknown job, cls N/A
node:cls:cpu_usage_5m Unknown job, cls N/A
node:cls:disk_io_bytes_rate1m Unknown job, cls N/A
node:cls:disk_iops_1m Unknown job, cls N/A
node:cls:disk_mreads_rate1m Unknown job, cls N/A
node:cls:disk_mreads_ratio1m Unknown job, cls N/A
node:cls:disk_mwrites_rate1m Unknown job, cls N/A
node:cls:disk_mwrites_ratio1m Unknown job, cls N/A
node:cls:disk_read_bytes_rate1m Unknown job, cls N/A
node:cls:disk_reads_rate1m Unknown job, cls N/A
node:cls:disk_write_bytes_rate1m Unknown job, cls N/A
node:cls:disk_writes_rate1m Unknown job, cls N/A
node:cls:free_bytes Unknown job, cls N/A
node:cls:mem_usage Unknown job, cls N/A
node:cls:network_io_bytes_rate1m Unknown job, cls N/A
node:cls:network_rx_bytes_rate1m Unknown job, cls N/A
node:cls:network_rx_pps1m Unknown job, cls N/A
node:cls:network_tx_bytes_rate1m Unknown job, cls N/A
node:cls:network_tx_pps1m Unknown job, cls N/A
node:cls:size_bytes Unknown job, cls N/A
node:cls:space_usage Unknown job, cls N/A
node:cls:space_usage_max Unknown job, cls N/A
node:cls:stdload1 Unknown job, cls N/A
node:cls:stdload15 Unknown job, cls N/A
node:cls:stdload5 Unknown job, cls N/A
node:cls:time_drift_max Unknown job, cls N/A
node:cpu:idle_time_irate1m Unknown ip, ins, job, cpu, instance, cls N/A
node:cpu:sched_timeslices_rate1m Unknown ip, ins, job, cpu, instance, cls N/A
node:cpu:sched_wait_rate1m Unknown ip, ins, job, cpu, instance, cls N/A
node:cpu:time_irate1m Unknown ip, mode, ins, job, cpu, instance, cls N/A
node:cpu:total_time_irate1m Unknown ip, ins, job, cpu, instance, cls N/A
node:cpu:usage Unknown ip, ins, job, cpu, instance, cls N/A
node:cpu:usage_avg15m Unknown ip, ins, job, cpu, instance, cls N/A
node:cpu:usage_avg1m Unknown ip, ins, job, cpu, instance, cls N/A
node:cpu:usage_avg5m Unknown ip, ins, job, cpu, instance, cls N/A
node:dev:disk_avg_queue_size Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_io_batch_1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_io_bytes_rate1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_io_rt_1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_io_time_rate1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_iops_1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_mreads_rate1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_mreads_ratio1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_mwrites_rate1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_mwrites_ratio1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_read_batch_1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_read_bytes_rate1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_read_rt_1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_read_time_rate1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_reads_rate1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_util_1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_write_batch_1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_write_bytes_rate1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_write_rt_1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_write_time_rate1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:disk_writes_rate1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:network_io_bytes_rate1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:network_rx_bytes_rate1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:network_rx_pps1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:network_tx_bytes_rate1m Unknown ip, device, ins, job, instance, cls N/A
node:dev:network_tx_pps1m Unknown ip, device, ins, job, instance, cls N/A
node:env:avail_bytes Unknown job N/A
node:env:cpu_count Unknown job N/A
node:env:cpu_usage Unknown job N/A
node:env:cpu_usage_15m Unknown job N/A
node:env:cpu_usage_1m Unknown job N/A
node:env:cpu_usage_5m Unknown job N/A
node:env:device_space_usage_max Unknown device, mountpoint, job, fstype N/A
node:env:free_bytes Unknown job N/A
node:env:mem_avail Unknown job N/A
node:env:mem_total Unknown job N/A
node:env:mem_usage Unknown job N/A
node:env:size_bytes Unknown job N/A
node:env:space_usage Unknown job N/A
node:env:stdload1 Unknown job N/A
node:env:stdload15 Unknown job N/A
node:env:stdload5 Unknown job N/A
node:fs:avail_bytes Unknown ip, device, mountpoint, ins, cls, job, instance, fstype N/A
node:fs:free_bytes Unknown ip, device, mountpoint, ins, cls, job, instance, fstype N/A
node:fs:inode_free Unknown ip, device, mountpoint, ins, cls, job, instance, fstype N/A
node:fs:inode_total Unknown ip, device, mountpoint, ins, cls, job, instance, fstype N/A
node:fs:inode_usage Unknown ip, device, mountpoint, ins, cls, job, instance, fstype N/A
node:fs:inode_used Unknown ip, device, mountpoint, ins, cls, job, instance, fstype N/A
node:fs:size_bytes Unknown ip, device, mountpoint, ins, cls, job, instance, fstype N/A
node:fs:space_deriv1h Unknown ip, device, mountpoint, ins, cls, job, instance, fstype N/A
node:fs:space_exhaust Unknown ip, device, mountpoint, ins, cls, job, instance, fstype N/A
node:fs:space_predict_1d Unknown ip, device, mountpoint, ins, cls, job, instance, fstype N/A
node:fs:space_usage Unknown ip, device, mountpoint, ins, cls, job, instance, fstype N/A
node:ins Unknown id, ip, ins, job, nodename, instance, cls N/A
node:ins:avail_bytes Unknown instance, ins, job, ip, cls N/A
node:ins:cpu_count Unknown instance, ins, job, ip, cls N/A
node:ins:cpu_usage Unknown instance, ins, job, ip, cls N/A
node:ins:cpu_usage_15m Unknown instance, ins, job, ip, cls N/A
node:ins:cpu_usage_1m Unknown instance, ins, job, ip, cls N/A
node:ins:cpu_usage_5m Unknown instance, ins, job, ip, cls N/A
node:ins:ctx_switch_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:disk_io_bytes_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:disk_iops_1m Unknown instance, ins, job, ip, cls N/A
node:ins:disk_mreads_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:disk_mreads_ratio1m Unknown instance, ins, job, ip, cls N/A
node:ins:disk_mwrites_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:disk_mwrites_ratio1m Unknown instance, ins, job, ip, cls N/A
node:ins:disk_read_bytes_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:disk_reads_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:disk_write_bytes_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:disk_writes_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:fd_alloc_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:fd_usage Unknown instance, ins, job, ip, cls N/A
node:ins:forks_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:free_bytes Unknown instance, ins, job, ip, cls N/A
node:ins:inode_usage Unknown instance, ins, job, ip, cls N/A
node:ins:interrupt_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:mem_avail Unknown instance, ins, job, ip, cls N/A
node:ins:mem_commit_ratio Unknown instance, ins, job, ip, cls N/A
node:ins:mem_kernel Unknown instance, ins, job, ip, cls N/A
node:ins:mem_rss Unknown instance, ins, job, ip, cls N/A
node:ins:mem_usage Unknown instance, ins, job, ip, cls N/A
node:ins:network_io_bytes_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:network_rx_bytes_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:network_rx_pps1m Unknown instance, ins, job, ip, cls N/A
node:ins:network_tx_bytes_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:network_tx_pps1m Unknown instance, ins, job, ip, cls N/A
node:ins:pagefault_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:pagein_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:pageout_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:pgmajfault_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:sched_wait_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:size_bytes Unknown instance, ins, job, ip, cls N/A
node:ins:space_usage_max Unknown instance, ins, job, ip, cls N/A
node:ins:stdload1 Unknown instance, ins, job, ip, cls N/A
node:ins:stdload15 Unknown instance, ins, job, ip, cls N/A
node:ins:stdload5 Unknown instance, ins, job, ip, cls N/A
node:ins:swap_usage Unknown instance, ins, job, ip, cls N/A
node:ins:swapin_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:swapout_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:tcp_active_opens_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:tcp_dropped_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:tcp_error Unknown instance, ins, job, ip, cls N/A
node:ins:tcp_error_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:tcp_insegs_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:tcp_outsegs_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:tcp_overflow_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:tcp_passive_opens_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:tcp_retrans_ratio1m Unknown instance, ins, job, ip, cls N/A
node:ins:tcp_retranssegs_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:tcp_segs_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:time_drift Unknown instance, ins, job, ip, cls N/A
node:ins:udp_in_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:udp_out_rate1m Unknown instance, ins, job, ip, cls N/A
node:ins:uptime Unknown instance, ins, job, ip, cls N/A
node_arp_entries gauge ip, device, ins, job, instance, cls ARP entries by device
node_boot_time_seconds gauge instance, ins, job, ip, cls Node boot time, in unixtime.
node_context_switches_total counter instance, ins, job, ip, cls Total number of context switches.
node_cooling_device_cur_state gauge instance, ins, job, type, ip, cls Current throttle state of the cooling device
node_cooling_device_max_state gauge instance, ins, job, type, ip, cls Maximum throttle state of the cooling device
node_cpu_guest_seconds_total counter ip, mode, ins, job, cpu, instance, cls Seconds the CPUs spent in guests (VMs) for each mode.
node_cpu_seconds_total counter ip, mode, ins, job, cpu, instance, cls Seconds the CPUs spent in each mode.
node_disk_discard_time_seconds_total counter ip, device, ins, job, instance, cls The total number of seconds spent by all discards.
node_disk_discarded_sectors_total counter ip, device, ins, job, instance, cls The total number of sectors discarded successfully.
node_disk_discards_completed_total counter ip, device, ins, job, instance, cls The total number of discards completed successfully.
node_disk_discards_merged_total counter ip, device, ins, job, instance, cls The total number of discards merged.
node_disk_filesystem_info gauge ip, usage, version, device, uuid, ins, type, job, instance, cls Info about disk filesystem.
node_disk_info gauge minor, ip, major, revision, device, model, serial, path, ins, job, instance, cls Info of /sys/block/<block_device>.
node_disk_io_now gauge ip, device, ins, job, instance, cls The number of I/Os currently in progress.
node_disk_io_time_seconds_total counter ip, device, ins, job, instance, cls Total seconds spent doing I/Os.
node_disk_io_time_weighted_seconds_total counter ip, device, ins, job, instance, cls The weighted # of seconds spent doing I/Os.
node_disk_read_bytes_total counter ip, device, ins, job, instance, cls The total number of bytes read successfully.
node_disk_read_time_seconds_total counter ip, device, ins, job, instance, cls The total number of seconds spent by all reads.
node_disk_reads_completed_total counter ip, device, ins, job, instance, cls The total number of reads completed successfully.
node_disk_reads_merged_total counter ip, device, ins, job, instance, cls The total number of reads merged.
node_disk_write_time_seconds_total counter ip, device, ins, job, instance, cls The total number of seconds spent by all writes.
node_disk_writes_completed_total counter ip, device, ins, job, instance, cls The total number of writes completed successfully.
node_disk_writes_merged_total counter ip, device, ins, job, instance, cls The number of writes merged.
node_disk_written_bytes_total counter ip, device, ins, job, instance, cls The total number of bytes written successfully.
node_dmi_info gauge bios_vendor, ip, product_family, product_version, product_uuid, system_vendor, bios_version, ins, bios_date, cls, job, product_name, instance, chassis_version, chassis_vendor, product_serial A metric with a constant ‘1’ value labeled by bios_date, bios_release, bios_vendor, bios_version, board_asset_tag, board_name, board_serial, board_vendor, board_version, chassis_asset_tag, chassis_serial, chassis_vendor, chassis_version, product_family, product_name, product_serial, product_sku, product_uuid, product_version, system_vendor if provided by DMI.
node_entropy_available_bits gauge instance, ins, job, ip, cls Bits of available entropy.
node_entropy_pool_size_bits gauge instance, ins, job, ip, cls Bits of entropy pool.
node_exporter_build_info gauge ip, version, revision, goversion, branch, ins, goarch, job, tags, instance, cls, goos A metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which node_exporter was built, and the goos and goarch for the build.
node_filefd_allocated gauge instance, ins, job, ip, cls File descriptor statistics: allocated.
node_filefd_maximum gauge instance, ins, job, ip, cls File descriptor statistics: maximum.
node_filesystem_avail_bytes gauge ip, device, mountpoint, ins, cls, job, instance, fstype Filesystem space available to non-root users in bytes.
node_filesystem_device_error gauge ip, device, mountpoint, ins, cls, job, instance, fstype Whether an error occurred while getting statistics for the given device.
node_filesystem_files gauge ip, device, mountpoint, ins, cls, job, instance, fstype Filesystem total file nodes.
node_filesystem_files_free gauge ip, device, mountpoint, ins, cls, job, instance, fstype Filesystem total free file nodes.
node_filesystem_free_bytes gauge ip, device, mountpoint, ins, cls, job, instance, fstype Filesystem free space in bytes.
node_filesystem_readonly gauge ip, device, mountpoint, ins, cls, job, instance, fstype Filesystem read-only status.
node_filesystem_size_bytes gauge ip, device, mountpoint, ins, cls, job, instance, fstype Filesystem size in bytes.
node_forks_total counter instance, ins, job, ip, cls Total number of forks.
node_hwmon_chip_names gauge chip_name, ip, ins, chip, job, instance, cls Annotation metric for human-readable chip names
node_hwmon_energy_joule_total counter sensor, ip, ins, chip, job, instance, cls Hardware monitor for joules used so far (input)
node_hwmon_sensor_label gauge sensor, ip, ins, chip, job, label, instance, cls Label for given chip and sensor
node_intr_total counter instance, ins, job, ip, cls Total number of interrupts serviced.
node_ipvs_connections_total counter instance, ins, job, ip, cls The total number of connections made.
node_ipvs_incoming_bytes_total counter instance, ins, job, ip, cls The total amount of incoming data.
node_ipvs_incoming_packets_total counter instance, ins, job, ip, cls The total number of incoming packets.
node_ipvs_outgoing_bytes_total counter instance, ins, job, ip, cls The total amount of outgoing data.
node_ipvs_outgoing_packets_total counter instance, ins, job, ip, cls The total number of outgoing packets.
node_load1 gauge instance, ins, job, ip, cls 1m load average.
node_load15 gauge instance, ins, job, ip, cls 15m load average.
node_load5 gauge instance, ins, job, ip, cls 5m load average.
node_memory_Active_anon_bytes gauge instance, ins, job, ip, cls Memory information field Active_anon_bytes.
node_memory_Active_bytes gauge instance, ins, job, ip, cls Memory information field Active_bytes.
node_memory_Active_file_bytes gauge instance, ins, job, ip, cls Memory information field Active_file_bytes.
node_memory_AnonHugePages_bytes gauge instance, ins, job, ip, cls Memory information field AnonHugePages_bytes.
node_memory_AnonPages_bytes gauge instance, ins, job, ip, cls Memory information field AnonPages_bytes.
node_memory_Bounce_bytes gauge instance, ins, job, ip, cls Memory information field Bounce_bytes.
node_memory_Buffers_bytes gauge instance, ins, job, ip, cls Memory information field Buffers_bytes.
node_memory_Cached_bytes gauge instance, ins, job, ip, cls Memory information field Cached_bytes.
node_memory_CommitLimit_bytes gauge instance, ins, job, ip, cls Memory information field CommitLimit_bytes.
node_memory_Committed_AS_bytes gauge instance, ins, job, ip, cls Memory information field Committed_AS_bytes.
node_memory_DirectMap1G_bytes gauge instance, ins, job, ip, cls Memory information field DirectMap1G_bytes.
node_memory_DirectMap2M_bytes gauge instance, ins, job, ip, cls Memory information field DirectMap2M_bytes.
node_memory_DirectMap4k_bytes gauge instance, ins, job, ip, cls Memory information field DirectMap4k_bytes.
node_memory_Dirty_bytes gauge instance, ins, job, ip, cls Memory information field Dirty_bytes.
node_memory_FileHugePages_bytes gauge instance, ins, job, ip, cls Memory information field FileHugePages_bytes.
node_memory_FilePmdMapped_bytes gauge instance, ins, job, ip, cls Memory information field FilePmdMapped_bytes.
node_memory_HardwareCorrupted_bytes gauge instance, ins, job, ip, cls Memory information field HardwareCorrupted_bytes.
node_memory_HugePages_Free gauge instance, ins, job, ip, cls Memory information field HugePages_Free.
node_memory_HugePages_Rsvd gauge instance, ins, job, ip, cls Memory information field HugePages_Rsvd.
node_memory_HugePages_Surp gauge instance, ins, job, ip, cls Memory information field HugePages_Surp.
node_memory_HugePages_Total gauge instance, ins, job, ip, cls Memory information field HugePages_Total.
node_memory_Hugepagesize_bytes gauge instance, ins, job, ip, cls Memory information field Hugepagesize_bytes.
node_memory_Hugetlb_bytes gauge instance, ins, job, ip, cls Memory information field Hugetlb_bytes.
node_memory_Inactive_anon_bytes gauge instance, ins, job, ip, cls Memory information field Inactive_anon_bytes.
node_memory_Inactive_bytes gauge instance, ins, job, ip, cls Memory information field Inactive_bytes.
node_memory_Inactive_file_bytes gauge instance, ins, job, ip, cls Memory information field Inactive_file_bytes.
node_memory_KReclaimable_bytes gauge instance, ins, job, ip, cls Memory information field KReclaimable_bytes.
node_memory_KernelStack_bytes gauge instance, ins, job, ip, cls Memory information field KernelStack_bytes.
node_memory_Mapped_bytes gauge instance, ins, job, ip, cls Memory information field Mapped_bytes.
node_memory_MemAvailable_bytes gauge instance, ins, job, ip, cls Memory information field MemAvailable_bytes.
node_memory_MemFree_bytes gauge instance, ins, job, ip, cls Memory information field MemFree_bytes.
node_memory_MemTotal_bytes gauge instance, ins, job, ip, cls Memory information field MemTotal_bytes.
node_memory_Mlocked_bytes gauge instance, ins, job, ip, cls Memory information field Mlocked_bytes.
node_memory_NFS_Unstable_bytes gauge instance, ins, job, ip, cls Memory information field NFS_Unstable_bytes.
node_memory_PageTables_bytes gauge instance, ins, job, ip, cls Memory information field PageTables_bytes.
node_memory_Percpu_bytes gauge instance, ins, job, ip, cls Memory information field Percpu_bytes.
node_memory_SReclaimable_bytes gauge instance, ins, job, ip, cls Memory information field SReclaimable_bytes.
node_memory_SUnreclaim_bytes gauge instance, ins, job, ip, cls Memory information field SUnreclaim_bytes.
node_memory_ShmemHugePages_bytes gauge instance, ins, job, ip, cls Memory information field ShmemHugePages_bytes.
node_memory_ShmemPmdMapped_bytes gauge instance, ins, job, ip, cls Memory information field ShmemPmdMapped_bytes.
node_memory_Shmem_bytes gauge instance, ins, job, ip, cls Memory information field Shmem_bytes.
node_memory_Slab_bytes gauge instance, ins, job, ip, cls Memory information field Slab_bytes.
node_memory_SwapCached_bytes gauge instance, ins, job, ip, cls Memory information field SwapCached_bytes.
node_memory_SwapFree_bytes gauge instance, ins, job, ip, cls Memory information field SwapFree_bytes.
node_memory_SwapTotal_bytes gauge instance, ins, job, ip, cls Memory information field SwapTotal_bytes.
node_memory_Unevictable_bytes gauge instance, ins, job, ip, cls Memory information field Unevictable_bytes.
node_memory_VmallocChunk_bytes gauge instance, ins, job, ip, cls Memory information field VmallocChunk_bytes.
node_memory_VmallocTotal_bytes gauge instance, ins, job, ip, cls Memory information field VmallocTotal_bytes.
node_memory_VmallocUsed_bytes gauge instance, ins, job, ip, cls Memory information field VmallocUsed_bytes.
node_memory_WritebackTmp_bytes gauge instance, ins, job, ip, cls Memory information field WritebackTmp_bytes.
node_memory_Writeback_bytes gauge instance, ins, job, ip, cls Memory information field Writeback_bytes.
node_netstat_Icmp6_InErrors unknown instance, ins, job, ip, cls Statistic Icmp6InErrors.
node_netstat_Icmp6_InMsgs unknown instance, ins, job, ip, cls Statistic Icmp6InMsgs.
node_netstat_Icmp6_OutMsgs unknown instance, ins, job, ip, cls Statistic Icmp6OutMsgs.
node_netstat_Icmp_InErrors unknown instance, ins, job, ip, cls Statistic IcmpInErrors.
node_netstat_Icmp_InMsgs unknown instance, ins, job, ip, cls Statistic IcmpInMsgs.
node_netstat_Icmp_OutMsgs unknown instance, ins, job, ip, cls Statistic IcmpOutMsgs.
node_netstat_Ip6_InOctets unknown instance, ins, job, ip, cls Statistic Ip6InOctets.
node_netstat_Ip6_OutOctets unknown instance, ins, job, ip, cls Statistic Ip6OutOctets.
node_netstat_IpExt_InOctets unknown instance, ins, job, ip, cls Statistic IpExtInOctets.
node_netstat_IpExt_OutOctets unknown instance, ins, job, ip, cls Statistic IpExtOutOctets.
node_netstat_Ip_Forwarding unknown instance, ins, job, ip, cls Statistic IpForwarding.
node_netstat_TcpExt_ListenDrops unknown instance, ins, job, ip, cls Statistic TcpExtListenDrops.
node_netstat_TcpExt_ListenOverflows unknown instance, ins, job, ip, cls Statistic TcpExtListenOverflows.
node_netstat_TcpExt_SyncookiesFailed unknown instance, ins, job, ip, cls Statistic TcpExtSyncookiesFailed.
node_netstat_TcpExt_SyncookiesRecv unknown instance, ins, job, ip, cls Statistic TcpExtSyncookiesRecv.
node_netstat_TcpExt_SyncookiesSent unknown instance, ins, job, ip, cls Statistic TcpExtSyncookiesSent.
node_netstat_TcpExt_TCPSynRetrans unknown instance, ins, job, ip, cls Statistic TcpExtTCPSynRetrans.
node_netstat_TcpExt_TCPTimeouts unknown instance, ins, job, ip, cls Statistic TcpExtTCPTimeouts.
node_netstat_Tcp_ActiveOpens unknown instance, ins, job, ip, cls Statistic TcpActiveOpens.
node_netstat_Tcp_CurrEstab unknown instance, ins, job, ip, cls Statistic TcpCurrEstab.
node_netstat_Tcp_InErrs unknown instance, ins, job, ip, cls Statistic TcpInErrs.
node_netstat_Tcp_InSegs unknown instance, ins, job, ip, cls Statistic TcpInSegs.
node_netstat_Tcp_OutRsts unknown instance, ins, job, ip, cls Statistic TcpOutRsts.
node_netstat_Tcp_OutSegs unknown instance, ins, job, ip, cls Statistic TcpOutSegs.
node_netstat_Tcp_PassiveOpens unknown instance, ins, job, ip, cls Statistic TcpPassiveOpens.
node_netstat_Tcp_RetransSegs unknown instance, ins, job, ip, cls Statistic TcpRetransSegs.
node_netstat_Udp6_InDatagrams unknown instance, ins, job, ip, cls Statistic Udp6InDatagrams.
node_netstat_Udp6_InErrors unknown instance, ins, job, ip, cls Statistic Udp6InErrors.
node_netstat_Udp6_NoPorts unknown instance, ins, job, ip, cls Statistic Udp6NoPorts.
node_netstat_Udp6_OutDatagrams unknown instance, ins, job, ip, cls Statistic Udp6OutDatagrams.
node_netstat_Udp6_RcvbufErrors unknown instance, ins, job, ip, cls Statistic Udp6RcvbufErrors.
node_netstat_Udp6_SndbufErrors unknown instance, ins, job, ip, cls Statistic Udp6SndbufErrors.
node_netstat_UdpLite6_InErrors unknown instance, ins, job, ip, cls Statistic UdpLite6InErrors.
node_netstat_UdpLite_InErrors unknown instance, ins, job, ip, cls Statistic UdpLiteInErrors.
node_netstat_Udp_InDatagrams unknown instance, ins, job, ip, cls Statistic UdpInDatagrams.
node_netstat_Udp_InErrors unknown instance, ins, job, ip, cls Statistic UdpInErrors.
node_netstat_Udp_NoPorts unknown instance, ins, job, ip, cls Statistic UdpNoPorts.
node_netstat_Udp_OutDatagrams unknown instance, ins, job, ip, cls Statistic UdpOutDatagrams.
node_netstat_Udp_RcvbufErrors unknown instance, ins, job, ip, cls Statistic UdpRcvbufErrors.
node_netstat_Udp_SndbufErrors unknown instance, ins, job, ip, cls Statistic UdpSndbufErrors.
node_network_address_assign_type gauge ip, device, ins, job, instance, cls Network device property: address_assign_type
node_network_carrier gauge ip, device, ins, job, instance, cls Network device property: carrier
node_network_carrier_changes_total counter ip, device, ins, job, instance, cls Network device property: carrier_changes_total
node_network_carrier_down_changes_total counter ip, device, ins, job, instance, cls Network device property: carrier_down_changes_total
node_network_carrier_up_changes_total counter ip, device, ins, job, instance, cls Network device property: carrier_up_changes_total
node_network_device_id gauge ip, device, ins, job, instance, cls Network device property: device_id
node_network_dormant gauge ip, device, ins, job, instance, cls Network device property: dormant
node_network_flags gauge ip, device, ins, job, instance, cls Network device property: flags
node_network_iface_id gauge ip, device, ins, job, instance, cls Network device property: iface_id
node_network_iface_link gauge ip, device, ins, job, instance, cls Network device property: iface_link
node_network_iface_link_mode gauge ip, device, ins, job, instance, cls Network device property: iface_link_mode
node_network_info gauge broadcast, ip, device, operstate, ins, job, adminstate, duplex, address, instance, cls Non-numeric data from /sys/class/net/, value is always 1.
node_network_mtu_bytes gauge ip, device, ins, job, instance, cls Network device property: mtu_bytes
node_network_name_assign_type gauge ip, device, ins, job, instance, cls Network device property: name_assign_type
node_network_net_dev_group gauge ip, device, ins, job, instance, cls Network device property: net_dev_group
node_network_protocol_type gauge ip, device, ins, job, instance, cls Network device property: protocol_type
node_network_receive_bytes_total counter ip, device, ins, job, instance, cls Network device statistic receive_bytes.
node_network_receive_compressed_total counter ip, device, ins, job, instance, cls Network device statistic receive_compressed.
node_network_receive_drop_total counter ip, device, ins, job, instance, cls Network device statistic receive_drop.
node_network_receive_errs_total counter ip, device, ins, job, instance, cls Network device statistic receive_errs.
node_network_receive_fifo_total counter ip, device, ins, job, instance, cls Network device statistic receive_fifo.
node_network_receive_frame_total counter ip, device, ins, job, instance, cls Network device statistic receive_frame.
node_network_receive_multicast_total counter ip, device, ins, job, instance, cls Network device statistic receive_multicast.
node_network_receive_nohandler_total counter ip, device, ins, job, instance, cls Network device statistic receive_nohandler.
node_network_receive_packets_total counter ip, device, ins, job, instance, cls Network device statistic receive_packets.
node_network_speed_bytes gauge ip, device, ins, job, instance, cls Network device property: speed_bytes
node_network_transmit_bytes_total counter ip, device, ins, job, instance, cls Network device statistic transmit_bytes.
node_network_transmit_carrier_total counter ip, device, ins, job, instance, cls Network device statistic transmit_carrier.
node_network_transmit_colls_total counter ip, device, ins, job, instance, cls Network device statistic transmit_colls.
node_network_transmit_compressed_total counter ip, device, ins, job, instance, cls Network device statistic transmit_compressed.
node_network_transmit_drop_total counter ip, device, ins, job, instance, cls Network device statistic transmit_drop.
node_network_transmit_errs_total counter ip, device, ins, job, instance, cls Network device statistic transmit_errs.
node_network_transmit_fifo_total counter ip, device, ins, job, instance, cls Network device statistic transmit_fifo.
node_network_transmit_packets_total counter ip, device, ins, job, instance, cls Network device statistic transmit_packets.
node_network_transmit_queue_length gauge ip, device, ins, job, instance, cls Network device property: transmit_queue_length
node_network_up gauge ip, device, ins, job, instance, cls Value is 1 if operstate is ‘up’, 0 otherwise.
node_nf_conntrack_entries gauge instance, ins, job, ip, cls Number of currently allocated flow entries for connection tracking.
node_nf_conntrack_entries_limit gauge instance, ins, job, ip, cls Maximum size of connection tracking table.
node_nf_conntrack_stat_drop gauge instance, ins, job, ip, cls Number of packets dropped due to conntrack failure.
node_nf_conntrack_stat_early_drop gauge instance, ins, job, ip, cls Number of dropped conntrack entries to make room for new ones, if maximum table size was reached.
node_nf_conntrack_stat_found gauge instance, ins, job, ip, cls Number of searched entries which were successful.
node_nf_conntrack_stat_ignore gauge instance, ins, job, ip, cls Number of packets seen which are already connected to a conntrack entry.
node_nf_conntrack_stat_insert gauge instance, ins, job, ip, cls Number of entries inserted into the list.
node_nf_conntrack_stat_insert_failed gauge instance, ins, job, ip, cls Number of entries for which list insertion was attempted but failed.
node_nf_conntrack_stat_invalid gauge instance, ins, job, ip, cls Number of packets seen which cannot be tracked.
node_nf_conntrack_stat_search_restart gauge instance, ins, job, ip, cls Number of conntrack table lookups which had to be restarted due to hashtable resizes.
node_os_info gauge id, ip, version, version_id, ins, instance, job, pretty_name, id_like, cls A metric with a constant ‘1’ value labeled by build_id, id, id_like, image_id, image_version, name, pretty_name, variant, variant_id, version, version_codename, version_id.
node_os_version gauge id, ip, ins, instance, job, id_like, cls Metric containing the major.minor part of the OS version.
node_processes_max_processes gauge instance, ins, job, ip, cls Limit on the number of PIDs
node_processes_max_threads gauge instance, ins, job, ip, cls Limit of threads in the system
node_processes_pids gauge instance, ins, job, ip, cls Number of PIDs
node_processes_state gauge state, instance, ins, job, ip, cls Number of processes in each state.
node_processes_threads gauge instance, ins, job, ip, cls Allocated threads in system
node_processes_threads_state gauge instance, ins, job, thread_state, ip, cls Number of threads in each state.
node_procs_blocked gauge instance, ins, job, ip, cls Number of processes blocked waiting for I/O to complete.
node_procs_running gauge instance, ins, job, ip, cls Number of processes in runnable state.
node_schedstat_running_seconds_total counter ip, ins, job, cpu, instance, cls Number of seconds CPU spent running a process.
node_schedstat_timeslices_total counter ip, ins, job, cpu, instance, cls Number of timeslices executed by CPU.
node_schedstat_waiting_seconds_total counter ip, ins, job, cpu, instance, cls Number of seconds spent by processes waiting for this CPU.
node_scrape_collector_duration_seconds gauge ip, collector, ins, job, instance, cls node_exporter: Duration of a collector scrape.
node_scrape_collector_success gauge ip, collector, ins, job, instance, cls node_exporter: Whether a collector succeeded.
node_selinux_enabled gauge instance, ins, job, ip, cls SELinux is enabled, 1 is true, 0 is false
node_sockstat_FRAG6_inuse gauge instance, ins, job, ip, cls Number of FRAG6 sockets in state inuse.
node_sockstat_FRAG6_memory gauge instance, ins, job, ip, cls Number of FRAG6 sockets in state memory.
node_sockstat_FRAG_inuse gauge instance, ins, job, ip, cls Number of FRAG sockets in state inuse.
node_sockstat_FRAG_memory gauge instance, ins, job, ip, cls Number of FRAG sockets in state memory.
node_sockstat_RAW6_inuse gauge instance, ins, job, ip, cls Number of RAW6 sockets in state inuse.
node_sockstat_RAW_inuse gauge instance, ins, job, ip, cls Number of RAW sockets in state inuse.
node_sockstat_TCP6_inuse gauge instance, ins, job, ip, cls Number of TCP6 sockets in state inuse.
node_sockstat_TCP_alloc gauge instance, ins, job, ip, cls Number of TCP sockets in state alloc.
node_sockstat_TCP_inuse gauge instance, ins, job, ip, cls Number of TCP sockets in state inuse.
node_sockstat_TCP_mem gauge instance, ins, job, ip, cls Number of TCP sockets in state mem.
node_sockstat_TCP_mem_bytes gauge instance, ins, job, ip, cls Number of TCP sockets in state mem_bytes.
node_sockstat_TCP_orphan gauge instance, ins, job, ip, cls Number of TCP sockets in state orphan.
node_sockstat_TCP_tw gauge instance, ins, job, ip, cls Number of TCP sockets in state tw.
node_sockstat_UDP6_inuse gauge instance, ins, job, ip, cls Number of UDP6 sockets in state inuse.
node_sockstat_UDPLITE6_inuse gauge instance, ins, job, ip, cls Number of UDPLITE6 sockets in state inuse.
node_sockstat_UDPLITE_inuse gauge instance, ins, job, ip, cls Number of UDPLITE sockets in state inuse.
node_sockstat_UDP_inuse gauge instance, ins, job, ip, cls Number of UDP sockets in state inuse.
node_sockstat_UDP_mem gauge instance, ins, job, ip, cls Number of UDP sockets in state mem.
node_sockstat_UDP_mem_bytes gauge instance, ins, job, ip, cls Number of UDP sockets in state mem_bytes.
node_sockstat_sockets_used gauge instance, ins, job, ip, cls Number of IPv4 sockets in use.
node_tcp_connection_states gauge state, instance, ins, job, ip, cls Number of connection states.
node_textfile_scrape_error gauge instance, ins, job, ip, cls 1 if there was an error opening or reading a file, 0 otherwise
node_time_clocksource_available_info gauge ip, device, ins, clocksource, job, instance, cls Available clocksources read from ‘/sys/devices/system/clocksource’.
node_time_clocksource_current_info gauge ip, device, ins, clocksource, job, instance, cls Current clocksource read from ‘/sys/devices/system/clocksource’.
node_time_seconds gauge instance, ins, job, ip, cls System time in seconds since epoch (1970).
node_time_zone_offset_seconds gauge instance, ins, job, time_zone, ip, cls System time zone offset in seconds.
node_timex_estimated_error_seconds gauge instance, ins, job, ip, cls Estimated error in seconds.
node_timex_frequency_adjustment_ratio gauge instance, ins, job, ip, cls Local clock frequency adjustment.
node_timex_loop_time_constant gauge instance, ins, job, ip, cls Phase-locked loop time constant.
node_timex_maxerror_seconds gauge instance, ins, job, ip, cls Maximum error in seconds.
node_timex_offset_seconds gauge instance, ins, job, ip, cls Time offset between the local system and the reference clock.
node_timex_pps_calibration_total counter instance, ins, job, ip, cls Pulse per second count of calibration intervals.
node_timex_pps_error_total counter instance, ins, job, ip, cls Pulse per second count of calibration errors.
node_timex_pps_frequency_hertz gauge instance, ins, job, ip, cls Pulse per second frequency.
node_timex_pps_jitter_seconds gauge instance, ins, job, ip, cls Pulse per second jitter.
node_timex_pps_jitter_total counter instance, ins, job, ip, cls Pulse per second count of jitter limit exceeded events.
node_timex_pps_shift_seconds gauge instance, ins, job, ip, cls Pulse per second interval duration.
node_timex_pps_stability_exceeded_total counter instance, ins, job, ip, cls Pulse per second count of stability limit exceeded events.
node_timex_pps_stability_hertz gauge instance, ins, job, ip, cls Pulse per second stability, average of recent frequency changes.
node_timex_status gauge instance, ins, job, ip, cls Value of the status array bits.
node_timex_sync_status gauge instance, ins, job, ip, cls Is clock synchronized to a reliable server (1 = yes, 0 = no).
node_timex_tai_offset_seconds gauge instance, ins, job, ip, cls International Atomic Time (TAI) offset.
node_timex_tick_seconds gauge instance, ins, job, ip, cls Seconds between clock ticks.
node_udp_queues gauge ip, queue, ins, job, exported_ip, instance, cls Amount of memory allocated in the kernel for UDP datagrams, in bytes.
node_uname_info gauge ip, sysname, version, domainname, release, ins, job, nodename, instance, cls, machine Labeled system information as provided by the uname system call.
node_up Unknown instance, ins, job, ip, cls N/A
node_vmstat_oom_kill unknown instance, ins, job, ip, cls /proc/vmstat information field oom_kill.
node_vmstat_pgfault unknown instance, ins, job, ip, cls /proc/vmstat information field pgfault.
node_vmstat_pgmajfault unknown instance, ins, job, ip, cls /proc/vmstat information field pgmajfault.
node_vmstat_pgpgin unknown instance, ins, job, ip, cls /proc/vmstat information field pgpgin.
node_vmstat_pgpgout unknown instance, ins, job, ip, cls /proc/vmstat information field pgpgout.
node_vmstat_pswpin unknown instance, ins, job, ip, cls /proc/vmstat information field pswpin.
node_vmstat_pswpout unknown instance, ins, job, ip, cls /proc/vmstat information field pswpout.
process_cpu_seconds_total counter instance, ins, job, ip, cls Total user and system CPU time spent in seconds.
process_max_fds gauge instance, ins, job, ip, cls Maximum number of open file descriptors.
process_open_fds gauge instance, ins, job, ip, cls Number of open file descriptors.
process_resident_memory_bytes gauge instance, ins, job, ip, cls Resident memory size in bytes.
process_start_time_seconds gauge instance, ins, job, ip, cls Start time of the process since unix epoch in seconds.
process_virtual_memory_bytes gauge instance, ins, job, ip, cls Virtual memory size in bytes.
process_virtual_memory_max_bytes gauge instance, ins, job, ip, cls Maximum amount of virtual memory available in bytes.
prometheus_remote_storage_exemplars_in_total counter instance, ins, job, ip, cls Exemplars in to remote storage, compare to exemplars out for queue managers.
prometheus_remote_storage_histograms_in_total counter instance, ins, job, ip, cls HistogramSamples in to remote storage, compare to histograms out for queue managers.
prometheus_remote_storage_samples_in_total counter instance, ins, job, ip, cls Samples in to remote storage, compare to samples out for queue managers.
prometheus_remote_storage_string_interner_zero_reference_releases_total counter instance, ins, job, ip, cls The number of times release has been called for strings that are not interned.
prometheus_sd_azure_failures_total counter instance, ins, job, ip, cls Number of Azure service discovery refresh failures.
prometheus_sd_consul_rpc_duration_seconds summary ip, call, quantile, ins, job, instance, cls, endpoint The duration of a Consul RPC call in seconds.
prometheus_sd_consul_rpc_duration_seconds_count Unknown ip, call, ins, job, instance, cls, endpoint N/A
prometheus_sd_consul_rpc_duration_seconds_sum Unknown ip, call, ins, job, instance, cls, endpoint N/A
prometheus_sd_consul_rpc_failures_total counter instance, ins, job, ip, cls The number of Consul RPC call failures.
prometheus_sd_consulagent_rpc_duration_seconds summary ip, call, quantile, ins, job, instance, cls, endpoint The duration of a Consul Agent RPC call in seconds.
prometheus_sd_consulagent_rpc_duration_seconds_count Unknown ip, call, ins, job, instance, cls, endpoint N/A
prometheus_sd_consulagent_rpc_duration_seconds_sum Unknown ip, call, ins, job, instance, cls, endpoint N/A
prometheus_sd_consulagent_rpc_failures_total Unknown instance, ins, job, ip, cls N/A
prometheus_sd_dns_lookup_failures_total counter instance, ins, job, ip, cls The number of DNS-SD lookup failures.
prometheus_sd_dns_lookups_total counter instance, ins, job, ip, cls The number of DNS-SD lookups.
prometheus_sd_file_read_errors_total counter instance, ins, job, ip, cls The number of File-SD read errors.
prometheus_sd_file_scan_duration_seconds summary quantile, instance, ins, job, ip, cls The duration of the File-SD scan in seconds.
prometheus_sd_file_scan_duration_seconds_count Unknown instance, ins, job, ip, cls N/A
prometheus_sd_file_scan_duration_seconds_sum Unknown instance, ins, job, ip, cls N/A
prometheus_sd_file_watcher_errors_total counter instance, ins, job, ip, cls The number of File-SD errors caused by filesystem watch failures.
prometheus_sd_kubernetes_events_total counter ip, event, ins, job, role, instance, cls The number of Kubernetes events handled.
prometheus_target_scrape_pool_exceeded_label_limits_total counter instance, ins, job, ip, cls Total number of times scrape pools hit the label limits, during sync or config reload.
prometheus_target_scrape_pool_exceeded_target_limit_total counter instance, ins, job, ip, cls Total number of times scrape pools hit the target limit, during sync or config reload.
prometheus_target_scrape_pool_reloads_failed_total counter instance, ins, job, ip, cls Total number of failed scrape pool reloads.
prometheus_target_scrape_pool_reloads_total counter instance, ins, job, ip, cls Total number of scrape pool reloads.
prometheus_target_scrape_pools_failed_total counter instance, ins, job, ip, cls Total number of scrape pool creations that failed.
prometheus_target_scrape_pools_total counter instance, ins, job, ip, cls Total number of scrape pool creation attempts.
prometheus_target_scrapes_cache_flush_forced_total counter instance, ins, job, ip, cls How many times a scrape cache was flushed due to getting big while scrapes are failing.
prometheus_target_scrapes_exceeded_body_size_limit_total counter instance, ins, job, ip, cls Total number of scrapes that hit the body size limit
prometheus_target_scrapes_exceeded_sample_limit_total counter instance, ins, job, ip, cls Total number of scrapes that hit the sample limit and were rejected.
prometheus_target_scrapes_exemplar_out_of_order_total counter instance, ins, job, ip, cls Total number of exemplar rejected due to not being out of the expected order.
prometheus_target_scrapes_sample_duplicate_timestamp_total counter instance, ins, job, ip, cls Total number of samples rejected due to duplicate timestamps but different values.
prometheus_target_scrapes_sample_out_of_bounds_total counter instance, ins, job, ip, cls Total number of samples rejected due to timestamp falling outside of the time bounds.
prometheus_target_scrapes_sample_out_of_order_total counter instance, ins, job, ip, cls Total number of samples rejected due to not being out of the expected order.
prometheus_template_text_expansion_failures_total counter instance, ins, job, ip, cls The total number of template text expansion failures.
prometheus_template_text_expansions_total counter instance, ins, job, ip, cls The total number of template text expansions.
prometheus_treecache_watcher_goroutines gauge instance, ins, job, ip, cls The current number of watcher goroutines.
prometheus_treecache_zookeeper_failures_total counter instance, ins, job, ip, cls The total number of ZooKeeper failures.
promhttp_metric_handler_errors_total counter ip, cause, ins, job, instance, cls Total number of internal errors encountered by the promhttp metric handler.
promhttp_metric_handler_requests_in_flight gauge instance, ins, job, ip, cls Current number of scrapes being served.
promhttp_metric_handler_requests_total counter ip, ins, code, job, instance, cls Total number of scrapes by HTTP status code.
promtail_batch_retries_total Unknown host, ip, ins, job, instance, cls N/A
promtail_build_info gauge ip, version, revision, goversion, branch, ins, goarch, job, tags, instance, cls, goos A metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which promtail was built, and the goos and goarch for the build.
promtail_config_reload_fail_total Unknown instance, ins, job, ip, cls N/A
promtail_config_reload_success_total Unknown instance, ins, job, ip, cls N/A
promtail_dropped_bytes_total Unknown host, ip, ins, job, reason, instance, cls N/A
promtail_dropped_entries_total Unknown host, ip, ins, job, reason, instance, cls N/A
promtail_encoded_bytes_total Unknown host, ip, ins, job, instance, cls N/A
promtail_file_bytes_total gauge path, instance, ins, job, ip, cls Number of bytes total.
promtail_files_active_total gauge instance, ins, job, ip, cls Number of active files.
promtail_mutated_bytes_total Unknown host, ip, ins, job, reason, instance, cls N/A
promtail_mutated_entries_total Unknown host, ip, ins, job, reason, instance, cls N/A
promtail_read_bytes_total gauge path, instance, ins, job, ip, cls Number of bytes read.
promtail_read_lines_total Unknown path, instance, ins, job, ip, cls N/A
promtail_request_duration_seconds_bucket Unknown host, ip, ins, job, status_code, le, instance, cls N/A
promtail_request_duration_seconds_count Unknown host, ip, ins, job, status_code, instance, cls N/A
promtail_request_duration_seconds_sum Unknown host, ip, ins, job, status_code, instance, cls N/A
promtail_sent_bytes_total Unknown host, ip, ins, job, instance, cls N/A
promtail_sent_entries_total Unknown host, ip, ins, job, instance, cls N/A
promtail_targets_active_total gauge instance, ins, job, ip, cls Number of active total.
promtail_up Unknown instance, ins, job, ip, cls N/A
request_duration_seconds_bucket Unknown instance, ins, job, status_code, route, ws, le, ip, cls, method N/A
request_duration_seconds_count Unknown instance, ins, job, status_code, route, ws, ip, cls, method N/A
request_duration_seconds_sum Unknown instance, ins, job, status_code, route, ws, ip, cls, method N/A
request_message_bytes_bucket Unknown instance, ins, job, route, le, ip, cls, method N/A
request_message_bytes_count Unknown instance, ins, job, route, ip, cls, method N/A
request_message_bytes_sum Unknown instance, ins, job, route, ip, cls, method N/A
response_message_bytes_bucket Unknown instance, ins, job, route, le, ip, cls, method N/A
response_message_bytes_count Unknown instance, ins, job, route, ip, cls, method N/A
response_message_bytes_sum Unknown instance, ins, job, route, ip, cls, method N/A
scrape_duration_seconds Unknown instance, ins, job, ip, cls N/A
scrape_samples_post_metric_relabeling Unknown instance, ins, job, ip, cls N/A
scrape_samples_scraped Unknown instance, ins, job, ip, cls N/A
scrape_series_added Unknown instance, ins, job, ip, cls N/A
tcp_connections gauge instance, ins, job, protocol, ip, cls Current number of accepted TCP connections.
tcp_connections_limit gauge instance, ins, job, protocol, ip, cls The max number of TCP connections that can be accepted (0 means no limit).
up Unknown instance, ins, job, ip, cls N/A

8 - FAQ

Pigsty NODE module frequently asked questions

How to configure NTP service?

If NTP is not configured, use a public NTP service or sync time with the admin node.

If your nodes already have NTP configured, you can leave it there by setting node_ntp_enabled to false.

Otherwise, if you have Internet access, you can use public NTP services such as pool.ntp.org.
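The two cases above can be sketched as a config fragment. This is a minimal illustration, assuming the parameters are set in the global vars of your inventory; the pool.ntp.org entry is just the public service named above and can be swapped for any pool you prefer.

```yaml
# Case 1: nodes already run their own NTP service, so leave it untouched
node_ntp_enabled: false

# Case 2 (alternative to case 1): let Pigsty configure chrony with a public pool
# node_ntp_enabled: true
# node_ntp_servers:                 # NTP servers in /etc/chrony.conf
#   - pool pool.ntp.org iburst
```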

If you don’t have Internet access, you can at least sync time with the admin node:

node_ntp_servers:                 # NTP servers in /etc/chrony.conf
  - pool cn.pool.ntp.org iburst
  - pool ${admin_ip} iburst       # assume non-admin nodes do not have internet access

How to force sync time on nodes?

Use chronyc to sync time. You have to configure the NTP service first.

ansible all -b -a 'chronyc -a makestep'     # sync time

You can replace all with any group or host IP address to limit execution scope.


Remote nodes are not accessible via SSH commands.

If the target machine sits behind an SSH jump host, or has customizations that prevent direct access via ssh ip, consider using Ansible connection parameters. A non-default SSH port can be specified with ansible_port, and an SSH alias with ansible_host.

pg-test:
  vars: { pg_cluster: pg-test }
  hosts:
    10.10.10.11: {pg_seq: 1, pg_role: primary, ansible_host: node-1 }
    10.10.10.12: {pg_seq: 2, pg_role: replica, ansible_port: 22223, ansible_user: admin }
    10.10.10.13: {pg_seq: 3, pg_role: offline, ansible_port: 22224 }

Password required for remote node SSH and SUDO

When performing deployments and changes, the admin user must have ssh and sudo privileges on all nodes. Passwordless access is not required.

You can pass in ssh and sudo passwords via the -k|-K parameters when executing the playbook, or even run the playbook as another user via -e ansible_user=<another_user>. However, Pigsty strongly recommends configuring passwordless SSH login with passwordless sudo for the admin user.


Create an admin user with an existing admin user

The following command uses an existing admin user on the node to create the dedicated admin user specified by node_admin_username:

./node.yml -k -K -e ansible_user=<another_admin> -t node_admin

Exposing node services with HAProxy

You can expose services with haproxy_services in node.yml.

For an example of exposing the MinIO service this way, see: Expose MinIO Service
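As a rough sketch of what such a definition might look like, the fragment below exposes a single MinIO backend through HAProxy. The field names (name, port, balance, options, servers) are assumed from Pigsty's haproxy_services convention and should be checked against the parameter reference for your version. The service name, listening port 9002, and backend address 10.10.10.10:9000 are hypothetical placeholders.

```yaml
haproxy_services:                   # services exposed by HAProxy on the node
  - name: minio                     # service name (hypothetical)
    port: 9002                      # port HAProxy listens on (hypothetical)
    balance: leastconn              # load-balancing algorithm
    options:                        # extra HAProxy directives for this service
      - option httpchk
      - http-check send meth OPTIONS uri /minio/health/live
      - http-check expect status 200
    servers:                        # backend servers (hypothetical address)
      - { name: minio-1, ip: 10.10.10.10, port: 9000, options: 'check port 9000' }
```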


Why are my nodes’ /etc/yum.repos.d/* files nuked?

Pigsty tries to include all dependencies in the local yum repo on the infra nodes. The repo file for it is added to each node according to node_repo_modules, and existing repo files are removed by default, as controlled by node_repo_remove. This prevents the node from pulling packages from Internet repos and avoids the unexpected issues that mixed repos can cause.

If you want to keep existing repo files during node init, just set node_repo_remove to false.

If you want to keep existing repo files during infra node local repo bootstrap, just set repo_remove to false.
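Both switches can be set together. The fragment below is a minimal sketch, assuming they are placed in the global vars of the inventory:

```yaml
node_repo_remove: false   # keep existing repo files during node init
repo_remove: false        # keep existing repo files during infra node local repo bootstrap
```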


Why did my shell prompt change, and how do I restore it?

The pigsty prompt is defined with the environment variable PS1 in /etc/profile.d/node.sh.

To restore your previous prompt, just remove that file and log in again.


Tencent OpenCloudOS Compatibility Issue

OpenCloudOS does not have the softdog module, so override node_kernel_modules in the global vars:

node_kernel_modules: [ br_netfilter, ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh ]