Module: INFRA

Optional standalone infrastructure that provides NTP, DNS, observability and other foundational services for PostgreSQL.

Configuration | Administration | Playbooks | Monitoring | Parameters


Overview

Every Pigsty deployment includes a set of infrastructure components that provide services for managed nodes and database clusters:

| Component | Port | Domain | Description |
|-----------|------|--------|-------------|
| Nginx | 80/443 | i.pigsty | Web service portal, local repo, and unified entry point |
| Grafana | 3000 | g.pigsty | Visualization platform for monitoring dashboards and data apps |
| VictoriaMetrics | 8428 | p.pigsty | Time-series database with VMUI, compatible with Prometheus API |
| VictoriaLogs | 9428 | - | Centralized log database, receives structured logs from Vector |
| VictoriaTraces | 10428 | - | Tracing and event storage for slow SQL / request tracing |
| VMAlert | 8880 | - | Alert rule evaluator, triggers alerts based on VictoriaMetrics metrics |
| AlertManager | 9059 | a.pigsty | Alert aggregation and dispatch, receives notifications from VMAlert |
| BlackboxExporter | 9115 | - | ICMP/TCP/HTTP blackbox probing |
| DNSMASQ | 53 | - | DNS server for internal domain resolution |
| Chronyd | 123 | - | NTP time server |
| PostgreSQL | 5432 | - | CMDB and default database |
| Ansible | - | - | Runs playbooks, orchestrates all infrastructure |

In Pigsty, the PGSQL module uses some services on INFRA nodes, specifically:

  • Database cluster/host node domains depend on DNSMASQ on INFRA nodes for resolution.
  • Installing software on database nodes uses the local yum/apt repo hosted by Nginx on INFRA nodes.
  • Database cluster/node monitoring metrics are scraped and stored by VictoriaMetrics on INFRA nodes, accessible via VMUI / PromQL.
  • Database and node runtime logs are collected by Vector and pushed to VictoriaLogs on INFRA, searchable in Grafana.
  • VMAlert evaluates alert rules based on metrics in VictoriaMetrics and forwards events to Alertmanager.
  • Users initiate management of database nodes from Infra/Admin nodes using Ansible or other tools:
    • Execute cluster creation, scaling, instance/cluster recycling
    • Create business users and databases, modify services, and apply HBA changes;
    • Execute log collection, garbage cleanup, backup, inspections, etc.
  • Database nodes sync time from the NTP server on INFRA/ADMIN nodes by default
  • If no dedicated cluster exists, the HA component Patroni uses etcd on INFRA nodes as the HA DCS.
  • If no dedicated cluster exists, the backup component pgbackrest uses MinIO on INFRA nodes as an optional centralized backup repository.

Nginx

Nginx is the access entry point for all WebUI services in Pigsty, using port 80 on the admin node by default.

Many infrastructure components with WebUI are exposed through Nginx, such as Grafana, VictoriaMetrics (VMUI), AlertManager, and HAProxy traffic management pages. Additionally, static file resources like yum/apt repos are served through Nginx.

Nginx routes access requests to corresponding upstream components based on domain names according to infra_portal configuration. If you use other domains or public domains, you can modify them here:

infra_portal:  # domain names and upstream servers
  home         : { domain: i.pigsty }
  grafana      : { domain: g.pigsty ,endpoint: "${admin_ip}:3000" , websocket: true }
  prometheus   : { domain: p.pigsty ,endpoint: "${admin_ip}:8428" }   # VMUI
  alertmanager : { domain: a.pigsty ,endpoint: "${admin_ip}:9059" }
  blackbox     : { endpoint: "${admin_ip}:9115" }
  vmalert      : { endpoint: "${admin_ip}:8880" }
  #logs         : { domain: logs.pigsty ,endpoint: "${admin_ip}:9428" }
  #minio        : { domain: sss.pigsty  ,endpoint: "${admin_ip}:9001" ,scheme: https ,websocket: true }

Pigsty strongly recommends using domain names to access Pigsty UI systems rather than direct IP+port access, for these reasons:

  • Using domains makes it easy to enable HTTPS traffic encryption, consolidate access to Nginx, audit all requests, and conveniently integrate authentication mechanisms.
  • Some components only listen on 127.0.0.1 by default, so they can only be accessed through Nginx proxy.
  • Domain names are easier to remember and provide additional configuration flexibility.

If you don't have an available internet domain or local DNS resolution, you can add static resolution records to /etc/hosts (macOS/Linux) or C:\Windows\System32\drivers\etc\hosts (Windows).
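
For example, assuming the default placeholder admin IP 10.10.10.10 and the default domains above, a single hosts entry is enough (replace the IP with your actual INFRA node address):

10.10.10.10 i.pigsty g.pigsty p.pigsty a.pigsty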

Nginx configuration parameters are at: Configuration: INFRA - NGINX


Local Software Repository

Pigsty creates a local software repository during installation to accelerate subsequent software installation.

This repository is served by Nginx, located by default at /www/pigsty, accessible via http://i.pigsty/pigsty.

Pigsty’s offline package is the entire software repository directory (yum/apt) compressed. When Pigsty tries to build a local repo, if it finds the local repo directory /www/pigsty already exists with the /www/pigsty/repo_complete marker file, it considers the local repo already built and skips downloading software from upstream, eliminating internet dependency.

The repo definition file is at /www/pigsty.repo, accessible by default via http://${admin_ip}/pigsty.repo

curl -L http://i.pigsty/pigsty.repo -o /etc/yum.repos.d/pigsty.repo

You can also use the file local repo directly without Nginx:

[pigsty-local]
name=Pigsty local $releasever - $basearch
baseurl=file:///www/pigsty/
enabled=1
gpgcheck=0

Local repository configuration parameters are at: Configuration: INFRA - REPO


Victoria Observability Suite

Pigsty v4.0 uses the VictoriaMetrics family to replace Prometheus/Loki, providing unified monitoring, logging, and tracing capabilities:

  • VictoriaMetrics listens on port 8428 by default, accessible via http://p.pigsty or https://i.pigsty/vmetrics/ for VMUI, compatible with Prometheus API.
  • VMAlert evaluates alert rules in /infra/rules/*.yml, listens on port 8880, and sends alert events to Alertmanager.
  • VictoriaLogs listens on port 9428, supports the https://i.pigsty/vlogs/ query interface. All nodes run Vector by default, pushing structured system logs, PostgreSQL logs, etc. to VictoriaLogs.
  • VictoriaTraces listens on port 10428 for slow SQL / Trace collection, Grafana accesses it as a Jaeger datasource.
  • Alertmanager listens on port 9059, accessible via http://a.pigsty or https://i.pigsty/alertmgr/ for managing alert notifications. After configuring SMTP, Webhook, etc., it can push messages.
  • Blackbox Exporter listens on port 9115 by default for Ping/TCP/HTTP probing, accessible via https://i.pigsty/blackbox/.

For more information, see: Configuration: INFRA - VICTORIA and Configuration: INFRA - PROMETHEUS.
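
As a quick sanity check, the Prometheus-compatible API can be queried directly with curl; this sketch assumes VictoriaMetrics is reachable on the admin node at its default port 8428:

curl -s 'http://10.10.10.10:8428/api/v1/query?query=up' | head   # Prometheus-compatible instant query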


Grafana

Grafana is the core of Pigsty’s WebUI, listening on port 3000 by default, accessible directly via IP:3000 or domain http://g.pigsty.

Pigsty comes with preconfigured datasources for VictoriaMetrics / Logs / Traces (vmetrics-*, vlogs-*, vtraces-*), and numerous dashboards with URL-based navigation for quick problem location.

Grafana can also be used as a general low-code visualization platform, so Pigsty installs plugins like ECharts and victoriametrics-datasource by default for building monitoring dashboards or inspection reports.

Grafana configuration parameters are at: Configuration: INFRA - GRAFANA.


Ansible

Pigsty installs Ansible on the meta node by default. Ansible is a popular operations tool with declarative configuration style and idempotent playbook design that greatly reduces system maintenance complexity.


DNSMASQ

DNSMASQ provides DNS resolution services within the environment. Domain names from other modules are registered with the DNSMASQ service on INFRA nodes.

DNS records are placed by default in the /etc/hosts.d/ directory on all INFRA nodes.
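
Additional records can be declared through the dns_records parameter; a minimal sketch, assuming the "IP domain" string format and the default admin IP:

dns_records:
  - "10.10.10.10 i.pigsty"
  - "10.10.10.10 g.pigsty"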

DNSMASQ configuration parameters are at: Configuration: INFRA - DNS


Chronyd

NTP service synchronizes time across all nodes in the environment (optional)

NTP configuration parameters are at: Configuration: NODE - NTP


Configuration

To install the INFRA module on a node, first add it to the infra group in the config inventory and assign an instance number infra_seq

# Configure single INFRA node
infra: { hosts: { 10.10.10.10: { infra_seq: 1 } }}

# Configure two INFRA nodes
infra:
  hosts:
    10.10.10.10: { infra_seq: 1 }
    10.10.10.11: { infra_seq: 2 }

Then use the infra.yml playbook to initialize the INFRA module on the nodes.
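
For example (a sketch; the -l limit flag is standard Ansible and is handy when adding a node to an existing deployment):

./infra.yml                  # initialize the INFRA module on all hosts in the infra group
./infra.yml -l 10.10.10.11   # or limit execution to a newly added node, e.g. the second node above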


Administration

Here are some administration tasks related to the INFRA module:


Install/Uninstall Infra Module

./infra.yml     # Install INFRA module on infra group
./infra-rm.yml  # Uninstall INFRA module from infra group

Manage Local Software Repository

You can use the following playbook subtasks to manage the local yum repo on Infra nodes:

./infra.yml -t repo              # Create local repo from internet or offline package

./infra.yml -t repo_dir          # Create local repo directory
./infra.yml -t repo_check        # Check if local repo already exists
./infra.yml -t repo_prepare      # If exists, use existing local repo
./infra.yml -t repo_build        # If not exists, build local repo from upstream
./infra.yml     -t repo_upstream     # Handle upstream repo files in /etc/yum.repos.d
./infra.yml     -t repo_remove       # If repo_remove == true, delete existing repo files
./infra.yml     -t repo_add          # Add upstream repo files to /etc/yum.repos.d (or /etc/apt/sources.list.d)
./infra.yml     -t repo_url_pkg      # Download packages from internet defined by repo_url_packages
./infra.yml     -t repo_cache        # Create upstream repo metadata cache with yum makecache / apt update
./infra.yml     -t repo_boot_pkg     # Install bootstrap packages like createrepo_c, yum-utils... (or dpkg-)
./infra.yml     -t repo_pkg          # Download packages & dependencies from upstream repos
./infra.yml     -t repo_create       # Create local repo with createrepo_c & modifyrepo_c
./infra.yml     -t repo_use          # Add newly built repo to /etc/yum.repos.d | /etc/apt/sources.list.d
./infra.yml -t repo_nginx        # If no nginx serving, start nginx as web server

The most commonly used commands are:

./infra.yml     -t repo_upstream     # Add upstream repos defined in repo_upstream to INFRA nodes
./infra.yml     -t repo_pkg          # Download packages and dependencies from upstream repos
./infra.yml     -t repo_create       # Create/update local yum repo with createrepo_c & modifyrepo_c
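
These tags can also be combined into a single run. A hedged example, where the package name is purely hypothetical and would normally be added to repo_extra_packages in the inventory rather than passed on the command line:

./infra.yml -t repo_upstream,repo_pkg,repo_create -e '{"repo_extra_packages":["some-extra-package"]}'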

Manage Infrastructure Components

You can use the following playbook subtasks to manage various infrastructure components on Infra nodes:

./infra.yml -t infra           # Configure infrastructure
./infra.yml -t infra_env       # Configure environment variables on admin node: env_dir, env_pg, env_var
./infra.yml -t infra_pkg       # Install software packages required by INFRA: infra_pkg_yum, infra_pkg_pip
./infra.yml -t infra_user      # Setup infra OS user group
./infra.yml -t infra_cert      # Issue certificates for infra components
./infra.yml -t dns             # Configure DNSMasq: dns_config, dns_record, dns_launch
./infra.yml -t nginx           # Configure Nginx: nginx_config, nginx_cert, nginx_static, nginx_launch, nginx_exporter
./infra.yml -t victoria        # Configure VictoriaMetrics/Logs/Traces: vmetrics|vlogs|vtraces|vmalert
./infra.yml -t alertmanager    # Configure AlertManager: alertmanager_config, alertmanager_launch
./infra.yml -t blackbox        # Configure Blackbox Exporter: blackbox_launch
./infra.yml -t grafana         # Configure Grafana: grafana_clean, grafana_config, grafana_plugin, grafana_launch, grafana_provision
./infra.yml -t infra_register  # Register infra components to VictoriaMetrics / Grafana

Other commonly used tasks include:

./infra.yml -t nginx_index                        # Re-render Nginx homepage content
./infra.yml -t nginx_config,nginx_reload          # Re-render Nginx portal config, expose new upstream services
./infra.yml -t vmetrics_config,vmetrics_launch    # Regenerate VictoriaMetrics main config and restart service
./infra.yml -t vlogs_config,vlogs_launch          # Re-render VictoriaLogs config
./infra.yml -t vmetrics_clean                     # Clean VictoriaMetrics storage data directory
./infra.yml -t grafana_plugin                     # Download Grafana plugins from internet

Playbooks

Pigsty provides three playbooks related to the INFRA module:

  • infra.yml: Initialize pigsty infrastructure on infra nodes
  • infra-rm.yml: Remove infrastructure components from infra nodes
  • deploy.yml: Complete one-time Pigsty installation on all nodes

infra.yml

The INFRA module playbook infra.yml initializes pigsty infrastructure on INFRA nodes

Executing this playbook completes the following tasks

  • Configure meta node directories and environment variables
  • Download and build a local software repository to accelerate subsequent installation. (If using offline package, skip download phase)
  • Add the current meta node as a regular node under Pigsty management
  • Deploy infrastructure components including VictoriaMetrics/Logs/Traces, VMAlert, Grafana, Alertmanager, Blackbox Exporter, etc.

This playbook executes on INFRA nodes by default

  • Pigsty uses the current node executing this playbook as Pigsty’s INFRA node and ADMIN node by default.
  • During configuration, Pigsty marks the current node as Infra/Admin node and replaces the placeholder IP 10.10.10.10 in config templates with the current node’s primary IP address.
  • Besides initiating management and hosting infrastructure, this node is no different from a regular managed node.
  • In single-node installation, ETCD is also installed on this node to provide DCS service

Notes about this playbook

  • This playbook is idempotent, but re-running it will wipe and redeploy infrastructure components on the meta node.
  • To preserve historical monitoring data, first set vmetrics_clean, vlogs_clean, and vtraces_clean to false (see the example after this list).
  • When offline repo /www/pigsty/repo_complete exists, this playbook skips downloading software from internet. Full execution takes about 5-8 minutes depending on machine configuration.
  • Downloading directly from upstream internet sources without offline package may take 10-20 minutes depending on your network conditions.
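
For example, to re-run the playbook while keeping existing monitoring data, the clean switches can be overridden on the command line (a sketch using standard Ansible extra-vars):

./infra.yml -e vmetrics_clean=false -e vlogs_clean=false -e vtraces_clean=false -e grafana_clean=false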



infra-rm.yml

The INFRA module playbook infra-rm.yml removes pigsty infrastructure from INFRA nodes

Common subtasks include:

./infra-rm.yml               # Remove INFRA module
./infra-rm.yml -t service    # Stop infrastructure services on INFRA
./infra-rm.yml -t data       # Remove remaining data on INFRA
./infra-rm.yml -t package    # Uninstall software packages installed on INFRA

deploy.yml

The INFRA module playbook deploy.yml performs a complete one-time Pigsty installation on all nodes

This playbook is described in more detail in Playbook: One-Time Installation.


Monitoring

Pigsty provides the following dashboards related to the INFRA module:

  • Pigsty Home: Pigsty monitoring system homepage
  • INFRA Overview: Pigsty infrastructure self-monitoring overview
  • Nginx Instance: Nginx metrics and logs
  • Grafana Instance: Grafana metrics and logs
  • VictoriaMetrics Instance: VictoriaMetrics scraping, querying, and storage metrics
  • VMAlert Instance: Alert rule evaluation and queue status
  • Alertmanager Instance: Alert aggregation, notification pipelines, and Silences
  • VictoriaLogs Instance: Log ingestion rate, query load, and index hits
  • VictoriaTraces Instance: Trace/KV storage and Jaeger interface
  • Logs Instance: Node log search based on Vector + VictoriaLogs
  • CMDB Overview: CMDB visualization
  • ETCD Overview: etcd metrics and logs


Parameters

The INFRA module has the following 10 parameter groups.

  • META: Pigsty metadata
  • CA: Self-signed PKI/CA infrastructure
  • INFRA_ID: Infrastructure portal, Nginx domains
  • REPO: Local software repository
  • INFRA_PACKAGE: Infrastructure software packages
  • NGINX: Nginx web server
  • DNS: DNSMASQ domain server
  • VICTORIA: VictoriaMetrics / Logs / Traces suite
  • PROMETHEUS: Alertmanager and Blackbox Exporter
  • GRAFANA: Grafana observability suite

Parameter Overview

For the latest default values, types, and hierarchy, please refer to the Parameter Reference to stay consistent with the Pigsty version.

1 - Architecture

INFRA module architecture, functional components, and responsibilities in Pigsty.

Architecture Overview

Standard Pigsty deployment includes an INFRA module that provides services for managed nodes and database clusters:

  • Nginx: Web server providing local repo services; reverse proxy consolidates Grafana, VMUI, Alertmanager web UI access.
  • Grafana: Visualization platform for monitoring metrics, logs, and tracing—hosts monitoring dashboards, inspection reports, and custom data apps.
  • VictoriaMetrics Suite: Unified observability platform.
    • VictoriaMetrics: Scrapes all monitoring metrics, Prometheus API-compatible, provides query interface via VMUI.
    • VMAlert: Evaluates alert rules, pushes events to Alertmanager.
    • VictoriaLogs: Centralized log collection and storage. All nodes run Vector by default, pushing system and database logs here.
    • VictoriaTraces: Collects slow SQL, service traces, and other trace data.
    • AlertManager: Aggregates alert events, dispatches notifications (email, Webhook, etc.).
    • BlackboxExporter: Probes IP/VIP/URL reachability via ICMP/TCP/HTTP.
  • DNSMASQ: Provides DNS resolution for internal domain names.
  • Chronyd: NTP time sync service ensuring consistent time across all nodes.
  • PostgreSQL: CMDB and default database.
  • Ansible: Runs playbooks, orchestrates all infrastructure.


The INFRA module is optional for PG HA; for example, Slim Install mode doesn't install it.

However, INFRA provides the supporting services needed for production-grade HA PG clusters and is strongly recommended for the full Pigsty DBaaS experience.

If you have existing infrastructure (Nginx, local repo, monitoring, DNS, NTP), you can disable the INFRA module and configure Pigsty to use it instead.


Nginx

Nginx is the entry point for all Pigsty web UIs, serving HTTP/HTTPS on ports 80/443 by default.

Infrastructure components with web UIs are exposed through Nginx, including Grafana, VictoriaMetrics (VMUI), AlertManager, and the HAProxy traffic console. Static resources such as the local yum/apt repo are also served via Nginx.

Nginx routes requests to upstream components by domain name according to the infra_portal configuration. Customize it here if you use other or public domains:

infra_portal:  # domain names and upstream servers
  home         : { domain: i.pigsty }
  grafana      : { domain: g.pigsty ,endpoint: "${admin_ip}:3000" , websocket: true }
  prometheus   : { domain: p.pigsty ,endpoint: "${admin_ip}:8428" }   # VMUI
  alertmanager : { domain: a.pigsty ,endpoint: "${admin_ip}:9059" }
  blackbox     : { endpoint: "${admin_ip}:9115" }
  vmalert      : { endpoint: "${admin_ip}:8880" }
  #logs         : { domain: logs.pigsty ,endpoint: "${admin_ip}:9428" }
  #minio        : { domain: sss.pigsty ,endpoint: "${admin_ip}:9001" ,scheme: https ,websocket: true }
  #pgadmin      : { domain: adm.pigsty ,endpoint: "127.0.0.1:8885" }
  #pgweb        : { domain: cli.pigsty ,endpoint: "127.0.0.1:8886" }
  #bytebase     : { domain: ddl.pigsty ,endpoint: "127.0.0.1:8887" }
  #jupyter      : { domain: lab.pigsty ,endpoint: "127.0.0.1:8888"   ,websocket: true }
  #gitea        : { domain: git.pigsty ,endpoint: "127.0.0.1:8889" }
  #wiki         : { domain: wiki.pigsty ,endpoint: "127.0.0.1:9002" }
  #noco         : { domain: noco.pigsty ,endpoint: "127.0.0.1:9003" }
  #supa         : { domain: supa.pigsty ,endpoint: "10.2.82.163:8000" ,websocket: true }
  #odoo         : { domain: odoo.pigsty ,endpoint: "127.0.0.1:8069"   ,websocket: true }
  #mm           : { domain: mm.pigsty ,endpoint: "10.2.82.163:8065" ,websocket: true }
  web.io:
    domain: en.pigsty
    path: "/www/web.io"
    certbot: pigsty.doc
    enforce_https: true
    config: |
      # rewrite /zh/ to /
      location /zh/ {
                rewrite ^/zh/(.*)$ /$1 permanent;
      }
  web.cc:
    domain: pigsty.cc
    path: "/www/web.cc"
    domains: [ zh.pigsty.cc ]
    certbot: pigsty.doc
    config: |
      # rewrite /zh/ to /
      location /zh/ {
                rewrite ^/zh/(.*)$ /$1 permanent;
      }
  repo:
    domain: pro.pigsty
    path: "/www/repo"
    index: true
    certbot: pigsty.doc

Pigsty strongly recommends domain access over IP+port:

  • Enables HTTPS encryption, consolidates to Nginx, audits all requests, integrates auth.
  • Some components only listen on 127.0.0.1—only accessible via Nginx proxy.
  • Domains easier to remember, extra flexibility.

If you have no available internet domain or local DNS resolution, add static records to /etc/hosts (macOS/Linux) or C:\Windows\System32\drivers\etc\hosts (Windows).

Nginx config: Configuration: INFRA - NGINX.


Local Software Repository

Pigsty creates a local software repository on INFRA nodes during installation to accelerate subsequent software installation.

The repo is served by Nginx, located at /www/pigsty by default, and accessible via http://i.pigsty/pigsty.

Pigsty's offline package is the entire built repo directory (yum/apt), compressed. When building the local repo, if /www/pigsty already exists with the /www/pigsty/repo_complete marker file, the repo is considered already built: upstream downloads are skipped, eliminating the internet dependency.

Repo definition file: /www/pigsty.repo, accessible via http://${admin_ip}/pigsty.repo.

curl -L http://i.pigsty/pigsty.repo -o /etc/yum.repos.d/pigsty.repo

Or use file local repo without Nginx:

[pigsty-local]
name=Pigsty local $releasever - $basearch
baseurl=file:///www/pigsty/
enabled=1
gpgcheck=0

Local repo config: Configuration: INFRA - REPO.


Victoria Observability Suite

Pigsty v4.0 uses VictoriaMetrics family—unified monitoring, logging, tracing:

  • VictoriaMetrics: Default port 8428, accessible via http://p.pigsty or https://i.pigsty/vmetrics/, Prometheus API-compatible.
  • VMAlert: Evaluates alert rules in /infra/rules/*.yml, port 8880, sends events to Alertmanager.
  • VictoriaLogs: Default port 9428, supports log search via https://i.pigsty/vlogs/. All nodes run Vector by default, pushing structured system logs, PG logs here.
  • VictoriaTraces: Port 10428 for slow SQL / Trace collection. Grafana accesses as Jaeger datasource.
  • AlertManager: Port 9059, accessible via http://a.pigsty or https://i.pigsty/alertmgr/ for managing alert notifications. Configure SMTP, Webhook, etc. to push messages.
  • Blackbox Exporter: Default port 9115 for Ping/TCP/HTTP probing, accessible via https://i.pigsty/blackbox/.

More: Configuration: INFRA - VICTORIA and Configuration: INFRA - PROMETHEUS.


Grafana

Grafana is the core of Pigsty's web UI, listening on port 3000 by default and accessible via IP:3000 or the domain http://g.pigsty.

Pigsty includes preconfigured datasources for VictoriaMetrics / Logs / Traces (vmetrics-*, vlogs-*, vtraces-*), plus numerous dashboards with URL navigation for quick problem location.

Grafana also serves as a general low-code visualization platform, so Pigsty installs plugins such as ECharts and victoriametrics-datasource by default for building monitoring dashboards and inspection reports.

Grafana config: Configuration: INFRA - GRAFANA.


Ansible

Pigsty installs Ansible on the meta node by default. Ansible is a popular ops tool whose declarative configuration style and idempotent playbook design greatly reduce system maintenance complexity.


DNSMASQ

DNSMASQ provides DNS resolution for internal Pigsty domain names. Other modules’ domain names register with DNSMASQ service on INFRA nodes.

DNS records: default location /etc/hosts.d/ on all INFRA nodes.

DNSMASQ config: Configuration: INFRA - DNS.


Chronyd

NTP service syncs time across all nodes in environment (optional).

NTP config: Configuration: NODE - NTP.

| Component | Port | Default Domain | Description |
|-----------|------|----------------|-------------|
| Nginx | 80/443 | i.pigsty | Web portal, local repo, unified entry |
| Grafana | 3000 | g.pigsty | Visualization platform, monitoring dashboards |
| VictoriaMetrics | 8428 | p.pigsty | TSDB, VMUI, Prometheus-compatible |
| VictoriaLogs | 9428 | - | Centralized log DB, Vector pushes logs |
| VictoriaTraces | 10428 | - | Tracing / slow SQL, Jaeger interface |
| VMAlert | 8880 | - | Alert rule evaluator |
| AlertManager | 9059 | a.pigsty | Alert aggregation, notifications |
| BlackboxExporter | 9115 | - | ICMP/TCP/HTTP probes |
| DNSMASQ | 53 | - | DNS server |
| Chronyd | 123 | - | NTP time server |

2 - Configuration

How to configure INFRA nodes? Customize Nginx, local repo, DNS, NTP, monitoring components.

Configuration Guide

INFRA is primarily monitoring infrastructure and is optional for PostgreSQL databases.

Unless you manually configure dependencies on DNS/NTP services on INFRA nodes, INFRA module failures typically don't affect PG cluster operations.

A single INFRA node suffices for most scenarios. For production environments, 2-3 INFRA nodes are recommended for HA.

For better resource utilization, the ETCD module (required by PG HA) can share nodes with the INFRA module.

Using more than 3 INFRA nodes provides little additional benefit, but more ETCD nodes (e.g., 5) can improve DCS availability.


Configuration Examples

Add node IPs to the infra group in the config inventory and assign each an INFRA instance number infra_seq.

Default single INFRA node config:

all:
  children:
    infra: { hosts: { 10.10.10.10: { infra_seq: 1 } }}

By default, the 10.10.10.10 placeholder is replaced with the current node's primary IP during configure.

Then use the infra.yml playbook to initialize the INFRA module on these nodes.

More Nodes

Two INFRA nodes config:

all:
  children:
    infra:
      hosts:
        10.10.10.10: { infra_seq: 1 }
        10.10.10.11: { infra_seq: 2 }

Three INFRA nodes config (with params):

all:
  children:
    infra:
      hosts:
        10.10.10.10: { infra_seq: 1 }
        10.10.10.11: { infra_seq: 2, repo_enabled: false }
        10.10.10.12: { infra_seq: 3, repo_enabled: false }
      vars:
        grafana_clean: false
        vmetrics_clean: false
        vlogs_clean: false
        vtraces_clean: false

INFRA High Availability

Most INFRA module components are stateless or hold identical state, so achieving HA mainly comes down to load balancing.

HA can be achieved with a Keepalived L2 VIP or HAProxy L4 load balancing; an L2 VIP is recommended for L2-reachable networks.

Config example:

infra:
  hosts:
    10.10.10.10: { infra_seq: 1 }
    10.10.10.11: { infra_seq: 2 }
    10.10.10.12: { infra_seq: 3 }
  vars:
    vip_enabled: true
    vip_vrid: 128
    vip_address: 10.10.10.8
    vip_interface: eth1

    infra_portal:
      home         : { domain: i.pigsty }
      grafana      : { domain: g.pigsty ,endpoint: "10.10.10.8:3000" , websocket: true }
      prometheus   : { domain: p.pigsty ,endpoint: "10.10.10.8:8428" }
      alertmanager : { domain: a.pigsty ,endpoint: "10.10.10.8:9059" }
      blackbox     : { endpoint: "10.10.10.8:9115" }
      vmalert      : { endpoint: "10.10.10.8:8880" }

Set VIP-related params and modify service endpoints in infra_portal.


Nginx Configuration

See Nginx Parameter Config and Tutorial: Nginx.


Local Repo Configuration

See Repo Parameter Config.


DNS Configuration

See DNS Parameter Config and Tutorial: DNS.


NTP Configuration

See NTP Parameter Config.

3 - Parameters

The INFRA module provides 10 sections with 70+ configurable parameters.

The INFRA module is responsible for deploying Pigsty’s infrastructure components: local software repository, Nginx, DNSMasq, VictoriaMetrics, VictoriaLogs, Grafana, Alertmanager, Blackbox Exporter, and other monitoring and alerting infrastructure.

Pigsty v4.0 uses VictoriaMetrics to replace Prometheus and VictoriaLogs to replace Loki, providing a superior observability solution.

| Section | Description |
|---------|-------------|
| META | Pigsty metadata: version, admin IP, region, language, proxy |
| CA | Self-signed CA certificate management |
| INFRA_ID | Infrastructure node identity and service portal |
| REPO | Local software repository configuration |
| INFRA_PACKAGE | Infrastructure node package installation |
| NGINX | Nginx web server and reverse proxy configuration |
| DNS | DNSMasq DNS server configuration |
| VICTORIA | VictoriaMetrics/Logs/Traces observability stack |
| PROMETHEUS | Alertmanager and Blackbox Exporter |
| GRAFANA | Grafana visualization platform configuration |

Parameter Overview

META parameters define Pigsty metadata, including version string, admin node IP, repository mirror region, default language, and proxy settings.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| version | string | G | Pigsty version string |
| admin_ip | ip | G | Admin node IP address |
| region | enum | G | Upstream mirror region: default,china,europe |
| language | enum | G | Default language: en or zh |
| proxy_env | dict | G | Global proxy environment variables |

CA parameters configure Pigsty’s self-signed CA certificate management, including CA creation, CA name, and certificate validity.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| ca_create | bool | G | Create CA if not exists? Default true |
| ca_cn | string | G | CA CN name, fixed as pigsty-ca |
| cert_validity | interval | G | Certificate validity, default 20 years |

INFRA_ID parameters define infrastructure node identity, including node sequence number, service portal configuration, and data directory.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| infra_seq | int | I | Infrastructure node sequence, REQUIRED |
| infra_portal | dict | G | Infrastructure services exposed via Nginx portal |
| infra_data | path | G | Infrastructure data directory, default /data/infra |

REPO parameters configure the local software repository, including repository enable switch, directory paths, upstream source definitions, and packages to download.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| repo_enabled | bool | G/I | Create local repo on this infra node? |
| repo_home | path | G | Repo home directory, default /www |
| repo_name | string | G | Repo name, default pigsty |
| repo_endpoint | url | G | Repo access endpoint: domain or ip:port |
| repo_remove | bool | G/A | Remove existing upstream repo definitions? |
| repo_modules | string | G/A | Enabled upstream repo modules, comma separated |
| repo_upstream | upstream[] | G | Upstream repo definitions |
| repo_packages | string[] | G | Packages to download from upstream |
| repo_extra_packages | string[] | G/C/I | Extra packages to download |
| repo_url_packages | string[] | G | Extra packages downloaded via URL |

INFRA_PACKAGE parameters define packages to install on infrastructure nodes, including RPM/DEB packages and PIP packages.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| infra_packages | string[] | G | Packages to install on infra nodes |
| infra_packages_pip | string | G | Pip packages to install on infra nodes |

NGINX parameters configure Nginx web server and reverse proxy, including enable switch, ports, SSL mode, certificates, and basic authentication.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| nginx_enabled | bool | G/I | Enable Nginx on this infra node? |
| nginx_clean | bool | G/A | Clean existing Nginx config during init? |
| nginx_exporter_enabled | bool | G/I | Enable nginx_exporter on this infra node? |
| nginx_exporter_port | port | G | nginx_exporter listen port, default 9113 |
| nginx_sslmode | enum | G | Nginx SSL mode: disable,enable,enforce |
| nginx_cert_validity | duration | G | Nginx self-signed cert validity, default 397d |
| nginx_home | path | G | Nginx content dir, default /www, symlink to nginx_data |
| nginx_data | path | G | Nginx actual data dir, default /data/nginx |
| nginx_users | dict | G | Nginx basic auth users: username-password dict |
| nginx_port | port | G | Nginx listen port, default 80 |
| nginx_ssl_port | port | G | Nginx SSL listen port, default 443 |
| certbot_sign | bool | G/A | Sign cert with certbot? |
| certbot_email | string | G/A | Certbot notification email address |
| certbot_options | string | G/A | Certbot extra command line options |

DNS parameters configure DNSMasq DNS server, including enable switch, listen port, and dynamic DNS records.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| dns_enabled | bool | G/I | Setup dnsmasq on this infra node? |
| dns_port | port | G | DNS server listen port, default 53 |
| dns_records | string[] | G | Dynamic DNS records resolved by dnsmasq |

VICTORIA parameters configure the VictoriaMetrics/Logs/Traces observability stack, including enable switches, ports, and data retention policies.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| vmetrics_enabled | bool | G/I | Enable VictoriaMetrics on this infra node? |
| vmetrics_clean | bool | G/A | Clean VictoriaMetrics data during init? |
| vmetrics_port | port | G | VictoriaMetrics listen port, default 8428 |
| vmetrics_scrape_interval | interval | G | Global scrape interval, default 10s |
| vmetrics_scrape_timeout | interval | G | Global scrape timeout, default 8s |
| vmetrics_options | arg | G | VictoriaMetrics extra CLI options |
| vlogs_enabled | bool | G/I | Enable VictoriaLogs on this infra node? |
| vlogs_clean | bool | G/A | Clean VictoriaLogs data during init? |
| vlogs_port | port | G | VictoriaLogs listen port, default 9428 |
| vlogs_options | arg | G | VictoriaLogs extra CLI options |
| vtraces_enabled | bool | G/I | Enable VictoriaTraces on this infra node? |
| vtraces_clean | bool | G/A | Clean VictoriaTraces data during init? |
| vtraces_port | port | G | VictoriaTraces listen port, default 10428 |
| vtraces_options | arg | G | VictoriaTraces extra CLI options |
| vmalert_enabled | bool | G/I | Enable VMAlert on this infra node? |
| vmalert_port | port | G | VMAlert listen port, default 8880 |
| vmalert_options | arg | G | VMAlert extra CLI options |

PROMETHEUS parameters configure Alertmanager and Blackbox Exporter, providing alert management and network probing capabilities.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| blackbox_enabled | bool | G/I | Setup blackbox_exporter on this infra node? |
| blackbox_port | port | G | blackbox_exporter listen port, default 9115 |
| blackbox_options | arg | G | blackbox_exporter extra CLI options |
| alertmanager_enabled | bool | G/I | Setup alertmanager on this infra node? |
| alertmanager_port | port | G | AlertManager listen port, default 9059 |
| alertmanager_options | arg | G | alertmanager extra CLI options |
| exporter_metrics_path | path | G | Exporter metrics path, default /metrics |

GRAFANA parameters configure the Grafana visualization platform, including enable switch, port, admin credentials, and data source configuration.

| Parameter | Type | Level | Description |
|-----------|------|-------|-------------|
| grafana_enabled | bool | G/I | Enable Grafana on this infra node? |
| grafana_port | port | G | Grafana listen port, default 3000 |
| grafana_clean | bool | G/A | Clean Grafana data during init? |
| grafana_admin_username | username | G | Grafana admin username, default admin |
| grafana_admin_password | password | G | Grafana admin password, default pigsty |
| grafana_auth_proxy | bool | G | Enable Grafana auth proxy? |
| grafana_pgurl | url | G | External PostgreSQL URL for Grafana persistence |
| grafana_view_password | password | G | Grafana metadb PG datasource password |

META

This section defines Pigsty deployment metadata: version string, admin node IP address, repository mirror region, default language, and HTTP(S) proxy for downloading packages.

version: v4.0.0                   # pigsty version string
admin_ip: 10.10.10.10             # admin node ip address
region: default                   # upstream mirror region: default,china,europe
language: en                      # default language: en or zh
proxy_env:                        # global proxy env when downloading packages
  no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.*,*.myqcloud.com,*.tsinghua.edu.cn"
  # http_proxy:  # set your proxy here: e.g http://user:pass@proxy.xxx
  # https_proxy: # set your proxy here: e.g http://user:pass@proxy.xxx
  # all_proxy:   # set your proxy here: e.g http://user:pass@proxy.xxx

version

name: version, type: string, level: G

Pigsty version string, default value is the current version: v4.0.0.

Pigsty uses this version string internally for feature control and content rendering. Do not modify this parameter arbitrarily.

Pigsty uses semantic versioning, and the version string typically starts with the character v, e.g., v4.0.0.

admin_ip

name: admin_ip, type: ip, level: G

Admin node IP address, default is the placeholder IP address: 10.10.10.10

The node specified by this parameter will be treated as the admin node, typically pointing to the first node where Pigsty is installed, i.e., the control node.

The default value 10.10.10.10 is a placeholder that will be replaced with the actual admin node IP address during configure.

Many parameters reference this parameter, such as the endpoints in infra_portal and repo_endpoint.

In these parameters, the string ${admin_ip} will be replaced with the actual value of admin_ip. Using this mechanism, you can specify different admin nodes for different nodes.

region

name: region, type: enum, level: G

Upstream mirror region, available options: default, china, europe, default is default

If a region other than default is set, and there’s a corresponding entry in repo_upstream with a matching baseurl, it will be used instead of the default baseurl.

For example, if your region is set to china, Pigsty will attempt to use Chinese mirror sites to accelerate downloads. If an upstream repository doesn’t have a corresponding China region mirror, the default upstream mirror site will be used instead. Additionally, URLs defined in repo_url_packages will be replaced from repo.pigsty.io to repo.pigsty.cc to use domestic mirrors.

language

name: language, type: enum, level: G

Default language setting, options are en (English) or zh (Chinese), default is en.

This parameter affects the language preference of some Pigsty-generated configurations and content, such as the initial language setting of Grafana dashboards.

If you are a Chinese user, it is recommended to set this parameter to zh for a better Chinese support experience.

proxy_env

name: proxy_env, type: dict, level: G

Global proxy environment variables used when downloading packages, default value specifies no_proxy, which is the list of addresses that should not use a proxy:

proxy_env:
  no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.aliyuncs.com,mirrors.tuna.tsinghua.edu.cn,mirrors.zju.edu.cn"
  #http_proxy: 'http://username:password@proxy.address'
  #https_proxy: 'http://username:password@proxy.address'
  #all_proxy: 'http://username:password@proxy.address'

When installing from the Internet in mainland China, certain packages may be blocked. You can use a proxy to solve this problem.

Note that if the Docker module is used, the proxy server configuration here will also be written to the Docker Daemon configuration file.

Note that if the -x parameter is specified during ./configure, the proxy configuration information in the current environment will be automatically filled into the generated pigsty.yaml file.


CA

Pigsty uses self-signed CA certificates to support advanced security features such as HTTPS access, PostgreSQL SSL connections, etc.

ca_create: true                   # create CA if not exists? default true
ca_cn: pigsty-ca                  # CA CN name, fixed as pigsty-ca
cert_validity: 7300d              # certificate validity, default 20 years

ca_create

name: ca_create, type: bool, level: G

Create CA if not exists? Default value is true.

When set to true, if the CA public-private key pair does not exist in the files/pki/ca directory, Pigsty will automatically create a new CA.

If you already have a CA public-private key pair, you can copy them to the files/pki/ca directory:

  • files/pki/ca/ca.crt: CA public key certificate
  • files/pki/ca/ca.key: CA private key file

Pigsty will use the existing CA key pair instead of creating a new one. If the CA does not exist and this parameter is set to false, an error will occur.
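
A minimal sketch of reusing an existing CA, assuming your key pair lives at the hypothetical paths /path/to/ca.crt and /path/to/ca.key; run this from the pigsty source directory before installation:

mkdir -p files/pki/ca
cp /path/to/ca.crt files/pki/ca/ca.crt   # existing CA public certificate
cp /path/to/ca.key files/pki/ca/ca.key   # existing CA private key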

Be sure to retain and backup the newly generated CA private key file during deployment, as it is crucial for issuing new certificates later.

Note: Pigsty v3.x used the ca_method parameter (with values create/recreate/copy), v4.0 simplifies this to the boolean ca_create.

ca_cn

name: ca_cn, type: string, level: G

CA CN (Common Name), fixed as pigsty-ca, not recommended to modify.

You can use the following command to view the Pigsty CA certificate details on a node:

openssl x509 -text -in /etc/pki/ca.crt

cert_validity

name: cert_validity, type: interval, level: G

Certificate validity period for issued certificates, default is 20 years, sufficient for most scenarios. Default value: 7300d

This parameter affects the validity of all certificates issued by the Pigsty CA, including:

  • PostgreSQL server certificates
  • Patroni API certificates
  • etcd server/client certificates
  • Other internal service certificates

Note: The validity of HTTPS certificates used by Nginx is controlled separately by nginx_cert_validity, because modern browsers have stricter requirements for website certificate validity (maximum 397 days).
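
To verify the validity window of an issued certificate, standard openssl tooling can be used; for example, on a node that has the Pigsty CA certificate installed:

openssl x509 -noout -dates -in /etc/pki/ca.crt   # print notBefore / notAfter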


INFRA_ID

Infrastructure identity and portal definition.

#infra_seq: 1                     # infra node sequence, REQUIRED identity parameter
infra_portal:                     # infrastructure services exposed via Nginx portal
  home : { domain: i.pigsty }     # default home server definition
infra_data: /data/infra           # infrastructure default data directory

infra_seq

name: infra_seq, type: int, level: I

Infrastructure node sequence number, REQUIRED identity parameter that must be explicitly specified on infrastructure nodes, so no default value is provided.

This parameter is used to uniquely identify each node in multi-infrastructure node deployments, typically using positive integers starting from 1.

Example configuration:

infra:
  hosts:
    10.10.10.10: { infra_seq: 1 }
    10.10.10.11: { infra_seq: 2 }

infra_portal

name: infra_portal, type: dict, level: G

Infrastructure services exposed via Nginx portal. The v4.0 default value is very concise:

infra_portal:
  home : { domain: i.pigsty }     # default home server definition

Pigsty will automatically configure the corresponding reverse proxies based on the actually enabled components. Users typically only need to define the home domain name.

Each record consists of a Key and a Value dictionary, where name is the key representing the component name, and the value is an object that can configure the following parameters (a combined example follows this list):

  • name: REQUIRED, specifies the name of the Nginx server
    • Default record: home is a fixed name, please do not modify it.
    • Used as part of the Nginx configuration file name, corresponding to: /etc/nginx/conf.d/<name>.conf
    • Nginx servers without a domain field will not generate configuration files but will be used as references.
  • domain: OPTIONAL, when the service needs to be exposed via Nginx, this is a REQUIRED field specifying the domain name to use
    • In Pigsty self-signed Nginx HTTPS certificates, the domain will be added to the SAN field of the Nginx SSL certificate
    • Pigsty web page cross-references will use the default domain name here
  • endpoint: Usually used as an alternative to path, specifies the upstream server address. Setting endpoint indicates this is a reverse proxy server
    • ${admin_ip} can be used as a placeholder in the configuration and will be dynamically replaced with admin_ip during deployment
    • Default reverse proxy servers use endpoint.conf as the configuration template
    • Reverse proxy servers can also configure websocket and scheme parameters
  • path: Usually used as an alternative to endpoint, specifies the local file server path. Setting path indicates this is a local web server
    • Local web servers use path.conf as the configuration template
    • Local web servers can also configure the index parameter to enable file index pages
  • certbot: Certbot certificate name; if configured, Certbot will be used to apply for certificates
    • If multiple servers specify the same certbot, Pigsty will merge certificate applications; the final certificate name will be this certbot value
  • cert: Certificate file path; if configured, will override the default certificate path
  • key: Certificate key file path; if configured, will override the default certificate key path
  • websocket: Whether to enable WebSocket support
    • Only reverse proxy servers can configure this parameter; if enabled, upstream WebSocket connections will be allowed
  • scheme: Protocol used by the upstream server; if configured, will override the default protocol
    • Default is http; if configured as https, it will force HTTPS connections to the upstream server
  • index: Whether to enable file index pages
    • Only local web servers can configure this parameter; if enabled, autoindex configuration will be enabled to automatically generate directory index pages
  • log: Nginx log file path
    • If specified, access logs will be written to this file; otherwise, the default log file will be used based on server type
    • Reverse proxy servers use /var/log/nginx/<name>.log as the default log file path
    • Local web servers use the default Access log
  • conf: Nginx configuration file path
    • Explicitly specifies the configuration template file to use, located in roles/infra/templates/nginx or templates/nginx directory
    • If this parameter is not specified, the default configuration template will be used
  • config: Nginx configuration code block
    • Configuration text directly injected into the Nginx Server configuration block
  • enforce_https: Redirect HTTP to HTTPS
    • Global configuration can be specified via nginx_sslmode: enforce
    • This configuration does not affect the default home server, which will always listen on both ports 80 and 443 to ensure compatibility
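
Combining the fields above, a condensed sketch based on the snippets shown earlier on this page (one plain domain, one reverse proxy with WebSocket, and one local file server with a directory index):

infra_portal:
  home    : { domain: i.pigsty }                                                  # default home server, keep the name
  grafana : { domain: g.pigsty ,endpoint: "${admin_ip}:3000" ,websocket: true }   # reverse proxy server
  repo    : { domain: pro.pigsty ,path: "/www/repo" ,index: true }                # local web server with index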

infra_data

name: infra_data, type: path, level: G

Infrastructure data directory, default value is /data/infra.

This directory is used to store data files for infrastructure components, including:

  • VictoriaMetrics time series database data
  • VictoriaLogs log data
  • VictoriaTraces trace data
  • Other infrastructure component persistent data

It is recommended to place this directory on a separate data disk for easier management and expansion.


REPO

This section is about local software repository configuration. Pigsty enables a local software repository (APT/YUM) on infrastructure nodes by default.

During initialization, Pigsty downloads all packages and their dependencies (specified by repo_packages) from the Internet upstream repository (specified by repo_upstream) to {{ nginx_home }} / {{ repo_name }} (default /www/pigsty). The total size of all software and dependencies is approximately 1GB.

When creating the local repository, if it already exists (determined by the presence of a marker file named repo_complete in the repository directory), Pigsty will consider the repository already built, skip the software download phase, and directly use the built repository.

If some packages download too slowly, you can set a download proxy using the proxy_env configuration to complete the initial download, or directly download the pre-packaged offline package, which is essentially a local software repository built on the same operating system.

repo_enabled: true                # create local repo on this infra node?
repo_home: /www                   # repo home directory, default /www
repo_name: pigsty                 # repo name, default pigsty
repo_endpoint: http://${admin_ip}:80 # repo access endpoint
repo_remove: true                 # remove existing upstream repo definitions
repo_modules: infra,node,pgsql    # enabled upstream repo modules
#repo_upstream: []                # upstream repo definitions (inherited from OS variables)
#repo_packages: []                # packages to download (inherited from OS variables)
#repo_extra_packages: []          # extra packages to download
repo_url_packages: []             # extra packages downloaded via URL

repo_enabled

name: repo_enabled, type: bool, level: G/I

Create a local software repository on this infrastructure node? Default is true, meaning all Infra nodes will set up a local software repository.

If you have multiple infrastructure nodes, you can keep only 1-2 nodes as software repositories; other nodes can set this parameter to false to avoid duplicate software download builds.

repo_home

name: repo_home, type: path, level: G

Local software repository home directory, defaults to Nginx’s root directory: /www.

This directory is actually a symlink pointing to nginx_data. It’s not recommended to modify this directory. If modified, it should be consistent with nginx_home.

repo_name

name: repo_name, type: string, level: G

Local repository name, default is pigsty. Changing this repository name is not recommended.

The final repository path is {{ repo_home }}/{{ repo_name }}, defaulting to /www/pigsty.

repo_endpoint

name: repo_endpoint, type: url, level: G

Endpoint used by other nodes to access this repository, default value: http://${admin_ip}:80.

Pigsty starts Nginx on infrastructure nodes at ports 80/443 by default, providing local software repository (static files) service.

If you modify nginx_port or nginx_ssl_port, or use a different infrastructure node from the control node, adjust this parameter accordingly.
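
A hedged sketch of keeping the two parameters in sync when Nginx is moved to a non-default port:

nginx_port: 8080
repo_endpoint: http://${admin_ip}:8080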

If you use a domain name, you can add resolution in node_default_etc_hosts, node_etc_hosts, or dns_records.

repo_remove

name: repo_remove, type: bool, level: G/A

Remove existing upstream repository definitions when building the local repository? Default value: true.

When this parameter is enabled, all existing repository files in /etc/yum.repos.d will be moved and backed up to /etc/yum.repos.d/backup. On Debian systems, /etc/apt/sources.list and /etc/apt/sources.list.d are removed and backed up to /etc/apt/backup.

Since existing OS sources have uncontrollable content, using Pigsty-validated upstream software sources can improve the success rate and speed of downloading packages from the Internet.

In certain situations (e.g., your OS is some EL/Deb compatible variant that uses private sources for many packages), you may need to keep existing upstream repository definitions. In such cases, set this parameter to false.

repo_modules

name: repo_modules, type: string, level: G/A

Which upstream repository modules will be added to the local software source, default value: infra,node,pgsql

When Pigsty attempts to add upstream repositories, it filters entries in repo_upstream based on this parameter’s value. Only entries whose module field matches this parameter’s value will be added to the local software source.

Modules are comma-separated. Available module lists can be found in the repo_upstream definitions; common modules include:

  • local: Local Pigsty repository
  • infra: Infrastructure packages (Nginx, Docker, etc.)
  • node: OS base packages
  • pgsql: PostgreSQL-related packages
  • extra: Extra PostgreSQL extensions
  • docker: Docker-related
  • redis: Redis-related
  • mongo: MongoDB-related
  • mysql: MySQL-related
  • etc…

repo_upstream

name: repo_upstream, type: upstream[], level: G

Where to download upstream packages when building the local repository? This parameter has no default value. If not explicitly specified by the user in the configuration file, it will be loaded from the repo_upstream_default variable defined in roles/node_id/vars based on the current node’s OS family.

Pigsty provides complete upstream repository definitions for different OS versions (EL8/9/10, Debian 11/12/13, Ubuntu 22/24), including:

  • OS base repositories (BaseOS, AppStream, EPEL, etc.)
  • PostgreSQL official PGDG repository
  • Pigsty extension repository
  • Various third-party software repositories (Docker, Nginx, Grafana, etc.)

Each upstream repository definition contains the following fields:

- name: pigsty-pgsql              # repository name
  description: 'Pigsty PGSQL'     # repository description
  module: pgsql                   # module it belongs to
  releases: [8,9,10]              # supported OS versions
  arch: [x86_64, aarch64]         # supported CPU architectures
  baseurl:                        # repository URL, configured by region
    default: 'https://repo.pigsty.io/yum/pgsql/el$releasever.$basearch'
    china: 'https://repo.pigsty.cc/yum/pgsql/el$releasever.$basearch'

Users typically don’t need to modify this parameter unless they have special repository requirements. For detailed repository definitions, refer to the configuration files for corresponding operating systems in the roles/node_id/vars/ directory.

repo_packages

name: repo_packages, type: string[], level: G

String array type, where each line is a space-separated list of software packages, specifying packages (and their dependencies) to download using repotrack or apt download.

This parameter has no default value, meaning its default state is undefined. If not explicitly defined, Pigsty will load the default from the repo_packages_default variable defined in roles/node_id/vars:

[ node-bootstrap, infra-package, infra-addons, node-package1, node-package2, pgsql-utility, extra-modules ]

Each element in this parameter will be translated according to the package_map in the above files, based on the specific OS distro major version. For example, on EL systems it translates to:

node-bootstrap:          "ansible python3 python3-pip python3-virtualenv python3-requests python3-jmespath python3-cryptography dnf-utils modulemd-tools createrepo_c sshpass"
infra-package:           "nginx dnsmasq etcd haproxy vip-manager node_exporter keepalived_exporter pg_exporter pgbackrest_exporter redis_exporter redis minio mcli pig"
infra-addons:            "grafana grafana-plugins grafana-victoriametrics-ds grafana-victorialogs-ds victoria-metrics victoria-logs victoria-traces vlogscli vmutils vector alertmanager"

As a convention, repo_packages typically includes packages unrelated to the PostgreSQL major version (such as Infra, Node, and PGDG Common parts), while PostgreSQL major version-related packages (kernel, extensions) are usually specified in repo_extra_packages to facilitate switching PG major versions.

repo_extra_packages

name: repo_extra_packages, type: string[], level: G/C/I

Used to specify additional packages to download without modifying repo_packages (typically PG major version-related packages), default value is an empty list.

If not explicitly defined, Pigsty will load the default from the repo_extra_packages_default variable defined in roles/node_id/vars:

[ pgsql-main ]

Elements in this parameter undergo package name translation, where $v will be replaced with pg_version, i.e., the current PG major version (default 18).

The pgsql-main here translates on EL systems to:

postgresql$v postgresql$v-server postgresql$v-libs postgresql$v-contrib postgresql$v-plperl postgresql$v-plpython3 postgresql$v-pltcl postgresql$v-llvmjit pg_repack_$v* wal2json_$v* pgvector_$v*

Users can typically specify PostgreSQL major version-related packages here without affecting the other PG version-independent packages defined in repo_packages.

repo_url_packages

name: repo_url_packages, type: object[] | string[], level: G

Packages downloaded directly from the Internet using URLs, default is an empty array: []

You can use URL strings directly as array elements in this parameter, or use object structures to explicitly specify URLs and filenames.

Note that this parameter is affected by the region variable. If you’re in mainland China, Pigsty will automatically replace URLs, changing repo.pigsty.io to repo.pigsty.cc.
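
Both forms can be mixed in one list. A hedged sketch with purely hypothetical URLs, assuming the object form takes url and name keys:

repo_url_packages:
  - https://repo.pigsty.io/etc/some-package.rpm                                   # plain URL string
  - { name: some-package.rpm ,url: https://repo.pigsty.io/etc/some-package.rpm }  # object with explicit filename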


INFRA_PACKAGE

These packages are installed only on INFRA nodes, including regular RPM/DEB packages and PIP packages.

infra_packages

name: infra_packages, type: string[], level: G

String array type, where each line is a space-separated list of software packages, specifying packages to install on Infra nodes.

This parameter has no default value, meaning its default state is undefined. If not explicitly specified by the user in the configuration file, Pigsty will load the default from the infra_packages_default variable defined in roles/node_id/vars based on the current node’s OS family.

v4.0 default value (EL operating systems):

infra_packages_default:
  - grafana,grafana-plugins,grafana-victorialogs-ds,grafana-victoriametrics-ds,victoria-metrics,victoria-logs,victoria-traces,vmutils,vlogscli,alertmanager
  - node_exporter,blackbox_exporter,nginx_exporter,pg_exporter,pev2,nginx,dnsmasq,ansible,etcd,python3-requests,redis,mcli,restic,certbot,python3-certbot-nginx

Default value (Debian/Ubuntu):

infra_packages_default:
  - grafana,grafana-plugins,grafana-victorialogs-ds,grafana-victoriametrics-ds,victoria-metrics,victoria-logs,victoria-traces,vmutils,vlogscli,alertmanager
  - node-exporter,blackbox-exporter,nginx-exporter,pg-exporter,pev2,nginx,dnsmasq,ansible,etcd,python3-requests,redis,mcli,restic,certbot,python3-certbot-nginx

Note: v4.0 uses the VictoriaMetrics suite to replace Prometheus and Loki, so the package list differs significantly from v3.x.

infra_packages_pip

name: infra_packages_pip, type: string, level: G

Additional packages to install using pip on Infra nodes, package names separated by commas. Default value is an empty string, meaning no additional python packages are installed.

Example:

infra_packages_pip: 'requests,boto3,awscli'

NGINX

Pigsty proxies all web service access through Nginx: Home Page, Grafana, VictoriaMetrics, etc., as well as other optional tools like PGWeb, Jupyter Lab, Pgadmin, Bytebase, and static resources and reports like pev, schemaspy, and pgbadger.

Most importantly, Nginx also serves as the web server for the local software repository (Yum/Apt), used to store and distribute Pigsty packages.

nginx_enabled: true               # enable Nginx on this infra node?
nginx_clean: false                # clean existing Nginx config during init?
nginx_exporter_enabled: true      # enable nginx_exporter?
nginx_exporter_port: 9113         # nginx_exporter listen port
nginx_sslmode: enable             # SSL mode: disable,enable,enforce
nginx_cert_validity: 397d         # self-signed cert validity
nginx_home: /www                  # Nginx content directory (symlink)
nginx_data: /data/nginx           # Nginx actual data directory
nginx_users: {}                   # basic auth users dictionary
nginx_port: 80                    # HTTP port
nginx_ssl_port: 443               # HTTPS port
certbot_sign: false               # sign cert with certbot?
certbot_email: [email protected]     # certbot email
certbot_options: ''               # certbot extra options

nginx_enabled

name: nginx_enabled, type: bool, level: G/I

Enable Nginx on this Infra node? Default value: true.

Nginx is a core component of Pigsty infrastructure, responsible for:

  • Providing local software repository service
  • Reverse proxying Grafana, VictoriaMetrics, and other web services
  • Hosting static files and reports

nginx_clean

name: nginx_clean, type: bool, level: G/A

Clean existing Nginx configuration during initialization? Default value: false.

When set to true, all existing configuration files under /etc/nginx/conf.d/ will be deleted during Nginx initialization, ensuring a clean start.

If you’re deploying for the first time or want to completely rebuild Nginx configuration, you can set this parameter to true.
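For example, a hedged sketch of a one-off override when re-running the infra playbook (the nginx tag is shown in the task list in the Playbook section; whether the clean step is covered by that tag depends on your Pigsty version):

./infra.yml -l infra -t nginx -e nginx_clean=true    # rebuild Nginx configuration from scratch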

nginx_exporter_enabled

name: nginx_exporter_enabled, type: bool, level: G/I

Enable nginx_exporter on this infrastructure node? Default value: true.

If this option is disabled, the /nginx health check stub will also be disabled. Consider disabling this when your Nginx version doesn’t support this feature.

nginx_exporter_port

name: nginx_exporter_port, type: port, level: G

nginx_exporter listen port, default value is 9113.

nginx_exporter is used to collect Nginx operational metrics for VictoriaMetrics to scrape and monitor.

nginx_sslmode

name: nginx_sslmode, type: enum, level: G

Nginx SSL operating mode. Three options: disable, enable, enforce, default value is enable, meaning SSL is enabled but not enforced.

  • disable: Only listen on the port specified by nginx_port to serve HTTP requests.
  • enable: Also listen on the port specified by nginx_ssl_port to serve HTTPS requests.
  • enforce: All links will be rendered to use https:// by default
    • Also redirect port 80 to port 443 for non-default servers in infra_portal

nginx_cert_validity

name: nginx_cert_validity, type: duration, level: G

Nginx self-signed certificate validity, default value is 397d (approximately 13 months).

Modern browsers require website certificate validity to be at most 397 days, hence this default value. Setting a longer validity is not recommended, as browsers may refuse to trust such certificates.

nginx_home

name: nginx_home, type: path, level: G

Nginx server static content directory, default: /www

This is a symlink that actually points to the nginx_data directory. This directory contains static resources and software repository files.

It’s best not to modify this parameter arbitrarily. If modified, it should be consistent with the repo_home parameter.

nginx_data

name: nginx_data, type: path, level: G

Nginx actual data directory, default is /data/nginx.

This is the actual storage location for Nginx static files; nginx_home is a symlink pointing to this directory.

It’s recommended to place this directory on a data disk for easier management of large package files.

nginx_users

name: nginx_users, type: dict, level: G

Nginx Basic Authentication user dictionary, default is an empty dictionary {}.

Format is { username: password } key-value pairs, for example:

nginx_users:
  admin: pigsty
  viewer: readonly

These users can be used to protect certain Nginx endpoints that require authentication.

nginx_port

name: nginx_port, type: port, level: G

Nginx default listening port (serving HTTP), default is port 80. It’s best not to modify this parameter.

When your server’s port 80 is occupied, you can consider using another port, but you need to also modify repo_endpoint and keep node_repo_local_urls consistent with the port used here.
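A hedged sketch of moving Nginx to another HTTP port while keeping the repo parameters in sync (the exact repo file path is an assumption; use whatever your deployment actually serves):

nginx_port: 8080                                                    # hypothetical alternative HTTP port
repo_endpoint: 'http://${admin_ip}:8080'                            # keep in sync with nginx_port
node_repo_local_urls: [ 'http://${admin_ip}:8080/pigsty.repo' ]     # hypothetical repo file path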

nginx_ssl_port

name: nginx_ssl_port, type: port, level: G

Nginx SSL default listening port, default is 443. It’s best not to modify this parameter.

certbot_sign

name: certbot_sign, type: bool, level: G/A

Use certbot to sign Nginx certificates during installation? Default value is false.

When set to true, Pigsty will use certbot to automatically apply for free SSL certificates from Let’s Encrypt during the execution of infra.yml and install.yml playbooks (in the nginx role).

For domains defined in infra_portal, if a certbot parameter is defined, Pigsty will use certbot to apply for a certificate for that domain. The certificate name will be the value of the certbot parameter. If multiple servers/domains specify the same certbot parameter, Pigsty will merge and apply for certificates for these domains, using the certbot parameter value as the certificate name.

Enabling this option requires:

  • The current node can be accessed through a public domain name, and DNS resolution is correctly pointed to the current node’s public IP
  • The current node can access the Let’s Encrypt API interface

This option is disabled by default. You can manually execute the make cert command after installation, which actually calls the rendered /etc/nginx/sign-cert script to update or apply for certificates using certbot.
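For illustration, a hedged sketch of requesting a single Let's Encrypt certificate for two public domains via the certbot parameter on infra_portal entries (pigsty-portal is an arbitrary certificate name, and the domains are placeholders):

certbot_sign: true
infra_portal:
  home    : { domain: h.example.com , certbot: pigsty-portal }
  grafana : { domain: g.example.com , endpoint: "${admin_ip}:3000" , websocket: true , certbot: pigsty-portal }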

certbot_email

name: certbot_email, type: string, level: G/A

Email address for receiving certificate expiration reminder emails, default value is [email protected].

When certbot_sign is set to true, it’s recommended to provide this parameter. Let’s Encrypt will send reminder emails to this address when certificates are about to expire.

certbot_options

name: certbot_options, type: string, level: G/A

Additional configuration parameters passed to certbot, default value is an empty string.

You can pass additional command-line options to certbot through this parameter, for example --dry-run, which makes certbot perform a preview and test without actually applying for certificates.


DNS

Pigsty enables DNSMASQ service on Infra nodes by default to resolve auxiliary domain names such as i.pigsty, m.pigsty, api.pigsty, etc., and optionally sss.pigsty for MinIO.

Resolution records are stored in the /etc/hosts.d/default file on Infra nodes. To use this DNS server, you must add nameserver <ip> to /etc/resolv.conf. The node_dns_servers parameter handles this.

dns_enabled: true                 # setup dnsmasq on this infra node?
dns_port: 53                      # DNS server listen port
dns_records:                      # dynamic DNS records
  - "${admin_ip} i.pigsty"
  - "${admin_ip} m.pigsty supa.pigsty api.pigsty adm.pigsty cli.pigsty ddl.pigsty"

dns_enabled

name: dns_enabled, type: bool, level: G/I

Enable DNSMASQ service on this Infra node? Default value: true.

If you don’t want to use the default DNS server (e.g., you already have an external DNS server, or your provider doesn’t allow you to use a DNS server), you can set this value to false to disable it, and use node_default_etc_hosts and node_etc_hosts static resolution records instead.
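A hedged sketch of disabling DNSMASQ and falling back to static records (the IP address and domain list are placeholders):

dns_enabled: false
node_etc_hosts:                       # static /etc/hosts records instead of dynamic DNS
  - "10.10.10.10 i.pigsty g.pigsty p.pigsty a.pigsty"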

dns_port

name: dns_port, type: port, level: G

DNSMASQ default listening port, default is 53. It’s not recommended to modify the default DNS service port.

dns_records

name: dns_records, type: string[], level: G

Dynamic DNS records resolved by dnsmasq, generally used to resolve auxiliary domain names to the admin node. These records are written to the /etc/hosts.d/default file on infrastructure nodes.

v4.0 default value:

dns_records:
  - "${admin_ip} i.pigsty"
  - "${admin_ip} m.pigsty supa.pigsty api.pigsty adm.pigsty cli.pigsty ddl.pigsty"

The ${admin_ip} placeholder is used here and will be replaced with the actual admin_ip value during deployment.

Common domain name purposes:

  • i.pigsty: Pigsty home page
  • m.pigsty: VictoriaMetrics Web UI
  • api.pigsty: API service
  • adm.pigsty: Admin service
  • Others customized based on actual deployment needs

VICTORIA

Pigsty v4.0 uses the VictoriaMetrics suite to replace Prometheus and Loki, providing a superior observability solution:

  • VictoriaMetrics: Replaces Prometheus as the time series database for storing monitoring metrics
  • VictoriaLogs: Replaces Loki as the log aggregation storage
  • VictoriaTraces: Distributed trace storage
  • VMAlert: Replaces Prometheus Alerting for alert rule evaluation

vmetrics_enabled: true            # enable VictoriaMetrics?
vmetrics_clean: false             # clean data during init?
vmetrics_port: 8428               # listen port
vmetrics_scrape_interval: 10s     # global scrape interval
vmetrics_scrape_timeout: 8s       # global scrape timeout
vmetrics_options: >-
  -retentionPeriod=15d
  -promscrape.fileSDCheckInterval=5s
vlogs_enabled: true               # enable VictoriaLogs?
vlogs_clean: false                # clean data during init?
vlogs_port: 9428                  # listen port
vlogs_options: >-
  -retentionPeriod=15d
  -retention.maxDiskSpaceUsageBytes=50GiB
  -insert.maxLineSizeBytes=1MB
  -search.maxQueryDuration=120s
vtraces_enabled: true             # enable VictoriaTraces?
vtraces_clean: false              # clean data during init?
vtraces_port: 10428               # listen port
vtraces_options: >-
  -retentionPeriod=15d
  -retention.maxDiskSpaceUsageBytes=50GiB
vmalert_enabled: true             # enable VMAlert?
vmalert_port: 8880                # listen port
vmalert_options: ''               # extra CLI options

vmetrics_enabled

name: vmetrics_enabled, type: bool, level: G/I

Enable VictoriaMetrics on this Infra node? Default value is true.

VictoriaMetrics is the core monitoring component in Pigsty v4.0, replacing Prometheus as the time series database, responsible for:

  • Scraping monitoring metrics from various exporters
  • Storing time series data
  • Providing PromQL-compatible query interface
  • Supporting Grafana data sources

vmetrics_clean

name: vmetrics_clean, type: bool, level: G/A

Clean existing VictoriaMetrics data during initialization? Default value is false.

When set to true, existing time series data will be deleted during initialization. Use this option carefully unless you’re sure you want to rebuild monitoring data.

vmetrics_port

name: vmetrics_port, type: port, level: G

VictoriaMetrics listen port, default value is 8428.

This port is used for:

  • HTTP API access
  • Web UI access
  • Prometheus-compatible remote write/read
  • Grafana data source connections

vmetrics_scrape_interval

name: vmetrics_scrape_interval, type: interval, level: G

VictoriaMetrics global metrics scrape interval, default value is 10s.

In production environments, 10-30 seconds is a suitable scrape interval. If you need finer monitoring data granularity, you can adjust this parameter, but it will increase storage and CPU overhead.

vmetrics_scrape_timeout

name: vmetrics_scrape_timeout, type: interval, level: G

VictoriaMetrics global scrape timeout, default is 8s.

Setting a scrape timeout helps prevent the monitoring system from being overwhelmed by slow scrapes. This parameter must be less than (and ideally close to) vmetrics_scrape_interval, so that a single scrape never runs longer than the scrape interval.

vmetrics_options

name: vmetrics_options, type: arg, level: G

VictoriaMetrics extra command line options, default value:

vmetrics_options: >-
  -retentionPeriod=15d
  -promscrape.fileSDCheckInterval=5s

Common parameter descriptions:

  • -retentionPeriod=15d: Data retention period, default 15 days
  • -promscrape.fileSDCheckInterval=5s: File service discovery refresh interval

You can add other VictoriaMetrics-supported parameters as needed.
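For example, a hedged sketch extending retention and capping memory usage (-memory.allowedPercent is an upstream VictoriaMetrics flag; verify it against your installed version):

vmetrics_options: >-
  -retentionPeriod=30d
  -promscrape.fileSDCheckInterval=5s
  -memory.allowedPercent=60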

vlogs_enabled

name: vlogs_enabled, type: bool, level: G/I

Enable VictoriaLogs on this Infra node? Default value is true.

VictoriaLogs replaces Loki as the log aggregation storage, responsible for:

  • Receiving log data from Vector
  • Storing and indexing logs
  • Providing log query interface
  • Supporting Grafana VictoriaLogs data source

vlogs_clean

name: vlogs_clean, type: bool, level: G/A

Clean existing VictoriaLogs data during initialization? Default value is false.

vlogs_port

name: vlogs_port, type: port, level: G

VictoriaLogs listen port, default value is 9428.

vlogs_options

name: vlogs_options, type: arg, level: G

VictoriaLogs extra command line options, default value:

vlogs_options: >-
  -retentionPeriod=15d
  -retention.maxDiskSpaceUsageBytes=50GiB
  -insert.maxLineSizeBytes=1MB
  -search.maxQueryDuration=120s

Common parameter descriptions:

  • -retentionPeriod=15d: Log retention period, default 15 days
  • -retention.maxDiskSpaceUsageBytes=50GiB: Maximum disk usage
  • -insert.maxLineSizeBytes=1MB: Maximum single log line size
  • -search.maxQueryDuration=120s: Maximum query execution time

vtraces_enabled

name: vtraces_enabled, type: bool, level: G/I

Enable VictoriaTraces on this Infra node? Default value is true.

VictoriaTraces is used for distributed trace data storage and query, supporting Jaeger, Zipkin, and other trace protocols.

vtraces_clean

name: vtraces_clean, type: bool, level: G/A

Clean existing VictoriaTraces data during initialization? Default value is false.

vtraces_port

name: vtraces_port, type: port, level: G

VictoriaTraces listen port, default value is 10428.

vtraces_options

name: vtraces_options, type: arg, level: G

VictoriaTraces extra command line options, default value:

vtraces_options: >-
  -retentionPeriod=15d
  -retention.maxDiskSpaceUsageBytes=50GiB

vmalert_enabled

name: vmalert_enabled, type: bool, level: G/I

Enable VMAlert on this Infra node? Default value is true.

VMAlert is responsible for alert rule evaluation, replacing Prometheus Alerting functionality, working with Alertmanager.

vmalert_port

name: vmalert_port, type: port, level: G

VMAlert listen port, default value is 8880.

vmalert_options

name: vmalert_options, type: arg, level: G

VMAlert extra command line options, default value is an empty string.


PROMETHEUS

This section now primarily contains Blackbox Exporter and Alertmanager configuration.

Note: Pigsty v4.0 uses VictoriaMetrics to replace Prometheus. The original prometheus_* and pushgateway_* parameters have been moved to the VICTORIA section.

blackbox_enabled: true            # enable blackbox_exporter?
blackbox_port: 9115               # blackbox_exporter listen port
blackbox_options: ''              # extra CLI options
alertmanager_enabled: true        # enable alertmanager?
alertmanager_port: 9059           # alertmanager listen port
alertmanager_options: ''          # extra CLI options
exporter_metrics_path: /metrics   # exporter metrics path

blackbox_enabled

name: blackbox_enabled, type: bool, level: G/I

Enable BlackboxExporter on this Infra node? Default value is true.

BlackboxExporter sends ICMP packets to node IP addresses, VIP addresses, and PostgreSQL VIP addresses to test network connectivity. It can also perform HTTP, TCP, DNS, and other probes.

blackbox_port

name: blackbox_port, type: port, level: G

Blackbox Exporter listen port, default value is 9115.

blackbox_options

name: blackbox_options, type: arg, level: G

BlackboxExporter extra command line options, default value: empty string.

alertmanager_enabled

name: alertmanager_enabled, type: bool, level: G/I

Enable AlertManager on this Infra node? Default value is true.

AlertManager is responsible for receiving alert notifications from VMAlert and performing alert grouping, inhibition, silencing, routing, and other processing.

alertmanager_port

name: alertmanager_port, type: port, level: G

AlertManager listen port, default value is 9059.

If you modify this port, ensure you update the alertmanager entry’s endpoint configuration in infra_portal accordingly (if defined).
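A hedged sketch of changing the port and keeping the portal entry in sync (9093 is an arbitrary example value):

alertmanager_port: 9093
infra_portal:
  alertmanager : { domain: a.pigsty , endpoint: "${admin_ip}:9093" }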

alertmanager_options

name: alertmanager_options, type: arg, level: G

AlertManager extra command line options, default value: empty string.

exporter_metrics_path

name: exporter_metrics_path, type: path, level: G

HTTP endpoint path where monitoring exporters expose metrics, default: /metrics. Not recommended to modify this parameter.

This parameter defines the standard path for all exporters to expose monitoring metrics.


GRAFANA

Pigsty uses Grafana as the monitoring system frontend. It can also serve as a data analysis and visualization platform, or as a low-code platform for developing and prototyping data applications.

grafana_enabled: true             # enable Grafana?
grafana_port: 3000                # Grafana listen port
grafana_clean: false              # clean data during init?
grafana_admin_username: admin     # admin username
grafana_admin_password: pigsty    # admin password
grafana_auth_proxy: false         # enable auth proxy?
grafana_pgurl: ''                 # external PostgreSQL URL
grafana_view_password: DBUser.Viewer  # PG datasource password

grafana_enabled

name: grafana_enabled, type: bool, level: G/I

Enable Grafana on Infra node? Default value: true, meaning all infrastructure nodes will install and enable Grafana by default.

grafana_port

name: grafana_port, type: port, level: G

Grafana listen port, default value is 3000.

If you need to access Grafana directly (not through Nginx reverse proxy), you can use this port.

grafana_clean

name: grafana_clean, type: bool, level: G/A

Clean Grafana data files during initialization? Default: false.

This operation removes /var/lib/grafana/grafana.db, ensuring a fresh Grafana installation.

If you want to preserve existing Grafana configuration (such as dashboards, users, data sources, etc.), set this parameter to false.

grafana_admin_username

name: grafana_admin_username, type: username, level: G

Grafana admin username, default is admin.

grafana_admin_password

name: grafana_admin_password, type: password, level: G

Grafana admin password, default is pigsty.

IMPORTANT: Be sure to change this password parameter before deploying to production!

grafana_auth_proxy

name: grafana_auth_proxy, type: bool, level: G

Enable Grafana auth proxy? Default is false.

When enabled, Grafana will trust user identity information passed by the reverse proxy (Nginx), enabling single sign-on (SSO) functionality.

This is typically used for integration with external identity authentication systems.

grafana_pgurl

name: grafana_pgurl, type: url, level: G

External PostgreSQL database URL for Grafana persistence storage. Default is an empty string.

If specified, Grafana will use this PostgreSQL database instead of the default SQLite database to store its configuration data.

Format example: postgres://grafana:password@pg-meta:5432/grafana?sslmode=disable

This is useful for scenarios requiring Grafana high availability deployment or data persistence.

grafana_view_password

name: grafana_view_password, type: password, level: G

Read-only user password used by Grafana metadb PG data source, default is DBUser.Viewer.

This password is used for Grafana to connect to the PostgreSQL CMDB data source to query metadata in read-only mode.

4 - Playbook

How to use built-in Ansible playbooks to manage the INFRA module, with a quick reference for common commands.

Pigsty provides four playbooks related to the INFRA module:

  • deploy.yml: Deploy all components on all nodes in one pass
  • infra.yml: Initialize Pigsty infrastructure on infra nodes
  • infra-rm.yml: Remove infrastructure components from infra nodes
  • install.yml: Perform a complete one-time installation of Pigsty on all nodes

deploy.yml

Deploy all components on all nodes in one pass, resolving INFRA/NODE circular dependency issues.

This playbook interleaves subtasks from infra.yml and node.yml, completing deployment of all components in the following order:

  1. id: Generate node and PostgreSQL identities
  2. ca: Create self-signed CA on localhost
  3. repo: Create local software repository on infra nodes
  4. node-init: Initialize nodes, HAProxy, and Docker
  5. infra: Initialize Nginx, DNS, VictoriaMetrics, Grafana, etc.
  6. node-monitor: Initialize node-exporter, vector
  7. etcd: Initialize etcd (required for PostgreSQL HA)
  8. minio: Initialize MinIO (optional)
  9. pgsql: Initialize PostgreSQL clusters
  10. pgsql-monitor: Initialize PostgreSQL monitoring

This playbook is equivalent to executing the following four playbooks sequentially:

./infra.yml -l infra    # Deploy infrastructure on infra group
./node.yml              # Initialize all nodes
./etcd.yml              # Initialize etcd cluster
./pgsql.yml             # Initialize PostgreSQL clusters

infra.yml

Initialize the infrastructure module on Infra nodes defined in the infra group of your configuration file.

This playbook performs the following tasks:

  • Configures directories and environment variables on Infra nodes
  • Downloads and creates a local software repository to accelerate subsequent installations
  • Incorporates the current Infra node as a common node managed by Pigsty
  • Deploys infrastructure components (VictoriaMetrics/Logs/Traces, VMAlert, Grafana, Alertmanager, Blackbox Exporter, etc.)

Playbook notes:

  • This is an idempotent playbook - repeated execution will overwrite infrastructure components on Infra nodes
  • To preserve historical monitoring data, make sure vmetrics_clean, vlogs_clean, and vtraces_clean are set to false beforehand
  • Likewise, if grafana_clean is set to true, Grafana dashboards and configuration changes will be lost
  • When the local software repository /www/pigsty/repo_complete exists, this playbook skips downloading software from the internet
  • Complete execution takes approximately 1-3 minutes, depending on machine configuration and network conditions

Available Tasks

# ca: create self-signed CA on localhost files/pki
#   - ca_dir        : create CA directory
#   - ca_private    : generate ca private key: files/pki/ca/ca.key
#   - ca_cert       : signing ca cert: files/pki/ca/ca.crt
#
# id: generate node identity
#
# repo: bootstrap a local yum repo from internet or offline packages
#   - repo_dir      : create repo directory
#   - repo_check    : check repo exists
#   - repo_prepare  : use existing repo if exists
#   - repo_build    : build repo from upstream if not exists
#     - repo_upstream    : handle upstream repo files in /etc/yum.repos.d
#       - repo_remove    : remove existing repo file if repo_remove == true
#       - repo_add       : add upstream repo files to /etc/yum.repos.d
#     - repo_url_pkg     : download packages from internet defined by repo_url_packages
#     - repo_cache       : make upstream yum cache with yum makecache
#     - repo_boot_pkg    : install bootstrap pkg such as createrepo_c,yum-utils,...
#     - repo_pkg         : download packages & dependencies from upstream repo
#     - repo_create      : create a local yum repo with createrepo_c & modifyrepo_c
#     - repo_use         : add newly built repo into /etc/yum.repos.d
#   - repo_nginx    : launch a nginx for repo if no nginx is serving
#
# node/haproxy/docker/monitor: setup infra node as a common node
#   - node_name, node_hosts, node_resolv, node_firewall, node_ca, node_repo, node_pkg
#   - node_feature, node_kernel, node_tune, node_sysctl, node_profile, node_ulimit
#   - node_data, node_admin, node_timezone, node_ntp, node_crontab, node_vip
#   - haproxy_install, haproxy_config, haproxy_launch, haproxy_reload
#   - docker_install, docker_admin, docker_config, docker_launch, docker_image
#   - haproxy_register, node_exporter, node_register, vector
#
# infra: setup infra components
#   - infra_env      : env_dir, env_pg, env_pgadmin, env_var
#   - infra_pkg      : infra_pkg_yum, infra_pkg_pip
#   - infra_user     : setup infra os user group
#   - infra_cert     : issue cert for infra components
#   - dns            : dns_config, dns_record, dns_launch
#   - nginx          : nginx_config, nginx_cert, nginx_static, nginx_launch, nginx_certbot, nginx_reload, nginx_exporter
#   - victoria       : vmetrics_config, vmetrics_launch, vlogs_config, vlogs_launch, vtraces_config, vtraces_launch, vmalert_config, vmalert_launch
#   - alertmanager   : alertmanager_config, alertmanager_launch
#   - blackbox       : blackbox_config, blackbox_launch
#   - grafana        : grafana_clean, grafana_config, grafana_launch, grafana_provision
#   - infra_register : register infra components to victoria
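For example, individual components can be re-initialized by running the playbook with tags from the list above; a hedged sketch, limiting execution to the infra group as shown earlier:

./infra.yml -l infra -t nginx        # re-render Nginx config, certs, and portal entries
./infra.yml -l infra -t victoria     # reconfigure and relaunch the VictoriaMetrics suite
./infra.yml -l infra -t grafana      # re-provision Grafana datasources and dashboards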

infra-rm.yml

Remove Pigsty infrastructure from Infra nodes defined in the infra group of your configuration file.

Common subtasks include:

./infra-rm.yml               # Remove the INFRA module
./infra-rm.yml -t service    # Stop infrastructure services on INFRA
./infra-rm.yml -t data       # Remove retained data on INFRA
./infra-rm.yml -t package    # Uninstall packages installed on INFRA

install.yml

Perform a complete one-time installation of Pigsty on all nodes.

This playbook is described in more detail in Playbook: One-Pass Installation.

5 - Monitoring

How to perform self-monitoring of infrastructure in Pigsty?

This document describes monitoring dashboards and alert rules for the INFRA module in Pigsty.


Dashboards

Pigsty provides the following monitoring dashboards for the Infra module:

Dashboard                  | Description
Pigsty Home                | Pigsty monitoring system homepage
INFRA Overview             | Pigsty infrastructure self-monitoring overview
Nginx Instance             | Nginx metrics and logs
Grafana Instance           | Grafana metrics and logs
VictoriaMetrics Instance   | VictoriaMetrics scraping/query status
VMAlert Instance           | Alert rule execution status
Alertmanager Instance      | Alert aggregation and notifications
VictoriaLogs Instance      | Log ingestion, querying, and indexing
Logs Instance              | View log information on a single node
VictoriaTraces Instance    | Trace storage and querying
Inventory CMDB             | CMDB visualization
ETCD Overview              | etcd cluster monitoring

Alert Rules

Pigsty provides the following two alert rules for the INFRA module:

Alert Rule | Description
InfraDown  | Infrastructure component is down
AgentDown  | Monitoring agent is down

You can modify or add new infrastructure alert rules in files/victoria/rules/infra.yml.

Alert Rule Configuration

################################################################
#                Infrastructure Alert Rules                    #
################################################################
- name: infra-alert
  rules:

    #==============================================================#
    #                       Infra Aliveness                        #
    #==============================================================#
    # infra components (victoria,grafana) down for 1m triggers a P1 alert
    - alert: InfraDown
      expr: infra_up < 1
      for: 1m
      labels: { level: 0, severity: CRIT, category: infra }
      annotations:
        summary: "CRIT InfraDown {{ $labels.type }}@{{ $labels.instance }}"
        description: |
          infra_up[type={{ $labels.type }}, instance={{ $labels.instance }}] = {{ $value  | printf "%.2f" }} < 1

    #==============================================================#
    #                       Agent Aliveness                        #
    #==============================================================#

    # agent aliveness are determined directly by exporter aliveness
    # including: node_exporter, pg_exporter, pgbouncer_exporter, haproxy_exporter
    - alert: AgentDown
      expr: agent_up < 1
      for: 1m
      labels: { level: 0, severity: CRIT, category: infra }
      annotations:
        summary: 'CRIT AgentDown {{ $labels.ins }}@{{ $labels.instance }}'
        description: |
          agent_up[ins={{ $labels.ins }}, instance={{ $labels.instance }}] = {{ $value  | printf "%.2f" }} < 1

6 - Metrics

Complete list of monitoring metrics provided by the Pigsty INFRA module

Note: Pigsty v4.0 has replaced Prometheus/Loki with VictoriaMetrics/Logs/Traces. The metric list below was generated for v3.x and is kept only as a reference when troubleshooting older versions. To get the current metric set, query https://p.pigsty (VMUI) or Grafana directly. Future versions will regenerate metric reference sheets consistent with the Victoria suite.

INFRA Metrics

The INFRA module has 964 available metrics.

Metric NameTypeLabelsDescription
alertmanager_alertsgaugeins, instance, ip, job, cls, stateHow many alerts by state.
alertmanager_alerts_invalid_totalcounterversion, ins, instance, ip, job, clsThe total number of received alerts that were invalid.
alertmanager_alerts_received_totalcounterversion, ins, instance, ip, status, job, clsThe total number of received alerts.
alertmanager_build_infogaugerevision, version, ins, instance, ip, tags, goarch, goversion, job, cls, branch, goosA metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which alertmanager was built, and the goos and goarch for the build.
alertmanager_cluster_alive_messages_totalcounterins, instance, ip, peer, job, clsTotal number of received alive messages.
alertmanager_cluster_enabledgaugeins, instance, ip, job, clsIndicates whether the clustering is enabled or not.
alertmanager_cluster_failed_peersgaugeins, instance, ip, job, clsNumber indicating the current number of failed peers in the cluster.
alertmanager_cluster_health_scoregaugeins, instance, ip, job, clsHealth score of the cluster. Lower values are better and zero means ’totally healthy’.
alertmanager_cluster_membersgaugeins, instance, ip, job, clsNumber indicating current number of members in cluster.
alertmanager_cluster_messages_pruned_totalcounterins, instance, ip, job, clsTotal number of cluster messages pruned.
alertmanager_cluster_messages_queuedgaugeins, instance, ip, job, clsNumber of cluster messages which are queued.
alertmanager_cluster_messages_received_size_totalcounterins, instance, ip, msg_type, job, clsTotal size of cluster messages received.
alertmanager_cluster_messages_received_totalcounterins, instance, ip, msg_type, job, clsTotal number of cluster messages received.
alertmanager_cluster_messages_sent_size_totalcounterins, instance, ip, msg_type, job, clsTotal size of cluster messages sent.
alertmanager_cluster_messages_sent_totalcounterins, instance, ip, msg_type, job, clsTotal number of cluster messages sent.
alertmanager_cluster_peer_infogaugeins, instance, ip, peer, job, clsA metric with a constant ‘1’ value labeled by peer name.
alertmanager_cluster_peers_joined_totalcounterins, instance, ip, job, clsA counter of the number of peers that have joined.
alertmanager_cluster_peers_left_totalcounterins, instance, ip, job, clsA counter of the number of peers that have left.
alertmanager_cluster_peers_update_totalcounterins, instance, ip, job, clsA counter of the number of peers that have updated metadata.
alertmanager_cluster_reconnections_failed_totalcounterins, instance, ip, job, clsA counter of the number of failed cluster peer reconnection attempts.
alertmanager_cluster_reconnections_totalcounterins, instance, ip, job, clsA counter of the number of cluster peer reconnections.
alertmanager_cluster_refresh_join_failed_totalcounterins, instance, ip, job, clsA counter of the number of failed cluster peer joined attempts via refresh.
alertmanager_cluster_refresh_join_totalcounterins, instance, ip, job, clsA counter of the number of cluster peer joined via refresh.
alertmanager_config_hashgaugeins, instance, ip, job, clsHash of the currently loaded alertmanager configuration.
alertmanager_config_last_reload_success_timestamp_secondsgaugeins, instance, ip, job, clsTimestamp of the last successful configuration reload.
alertmanager_config_last_reload_successfulgaugeins, instance, ip, job, clsWhether the last configuration reload attempt was successful.
alertmanager_dispatcher_aggregation_groupsgaugeins, instance, ip, job, clsNumber of active aggregation groups
alertmanager_dispatcher_alert_processing_duration_seconds_countUnknownins, instance, ip, job, clsN/A
alertmanager_dispatcher_alert_processing_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
alertmanager_http_concurrency_limit_exceeded_totalcounterins, instance, method, ip, job, clsTotal number of times an HTTP request failed because the concurrency limit was reached.
alertmanager_http_request_duration_seconds_bucketUnknownins, instance, method, ip, le, job, cls, handlerN/A
alertmanager_http_request_duration_seconds_countUnknownins, instance, method, ip, job, cls, handlerN/A
alertmanager_http_request_duration_seconds_sumUnknownins, instance, method, ip, job, cls, handlerN/A
alertmanager_http_requests_in_flightgaugeins, instance, method, ip, job, clsCurrent number of HTTP requests being processed.
alertmanager_http_response_size_bytes_bucketUnknownins, instance, method, ip, le, job, cls, handlerN/A
alertmanager_http_response_size_bytes_countUnknownins, instance, method, ip, job, cls, handlerN/A
alertmanager_http_response_size_bytes_sumUnknownins, instance, method, ip, job, cls, handlerN/A
alertmanager_integrationsgaugeins, instance, ip, job, clsNumber of configured integrations.
alertmanager_marked_alertsgaugeins, instance, ip, job, cls, stateHow many alerts by state are currently marked in the Alertmanager regardless of their expiry.
alertmanager_nflog_gc_duration_seconds_countUnknownins, instance, ip, job, clsN/A
alertmanager_nflog_gc_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
alertmanager_nflog_gossip_messages_propagated_totalcounterins, instance, ip, job, clsNumber of received gossip messages that have been further gossiped.
alertmanager_nflog_maintenance_errors_totalcounterins, instance, ip, job, clsHow many maintenances were executed for the notification log that failed.
alertmanager_nflog_maintenance_totalcounterins, instance, ip, job, clsHow many maintenances were executed for the notification log.
alertmanager_nflog_queries_totalcounterins, instance, ip, job, clsNumber of notification log queries were received.
alertmanager_nflog_query_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
alertmanager_nflog_query_duration_seconds_countUnknownins, instance, ip, job, clsN/A
alertmanager_nflog_query_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
alertmanager_nflog_query_errors_totalcounterins, instance, ip, job, clsNumber notification log received queries that failed.
alertmanager_nflog_snapshot_duration_seconds_countUnknownins, instance, ip, job, clsN/A
alertmanager_nflog_snapshot_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
alertmanager_nflog_snapshot_size_bytesgaugeins, instance, ip, job, clsSize of the last notification log snapshot in bytes.
alertmanager_notification_latency_seconds_bucketUnknownintegration, ins, instance, ip, le, job, clsN/A
alertmanager_notification_latency_seconds_countUnknownintegration, ins, instance, ip, job, clsN/A
alertmanager_notification_latency_seconds_sumUnknownintegration, ins, instance, ip, job, clsN/A
alertmanager_notification_requests_failed_totalcounterintegration, ins, instance, ip, job, clsThe total number of failed notification requests.
alertmanager_notification_requests_totalcounterintegration, ins, instance, ip, job, clsThe total number of attempted notification requests.
alertmanager_notifications_failed_totalcounterintegration, ins, instance, ip, reason, job, clsThe total number of failed notifications.
alertmanager_notifications_totalcounterintegration, ins, instance, ip, job, clsThe total number of attempted notifications.
alertmanager_oversize_gossip_message_duration_seconds_bucketUnknownins, instance, ip, le, key, job, clsN/A
alertmanager_oversize_gossip_message_duration_seconds_countUnknownins, instance, ip, key, job, clsN/A
alertmanager_oversize_gossip_message_duration_seconds_sumUnknownins, instance, ip, key, job, clsN/A
alertmanager_oversized_gossip_message_dropped_totalcounterins, instance, ip, key, job, clsNumber of oversized gossip messages that were dropped due to a full message queue.
alertmanager_oversized_gossip_message_failure_totalcounterins, instance, ip, key, job, clsNumber of oversized gossip message sends that failed.
alertmanager_oversized_gossip_message_sent_totalcounterins, instance, ip, key, job, clsNumber of oversized gossip message sent.
alertmanager_peer_positiongaugeins, instance, ip, job, clsPosition the Alertmanager instance believes it’s in. The position determines a peer’s behavior in the cluster.
alertmanager_receiversgaugeins, instance, ip, job, clsNumber of configured receivers.
alertmanager_silencesgaugeins, instance, ip, job, cls, stateHow many silences by state.
alertmanager_silences_gc_duration_seconds_countUnknownins, instance, ip, job, clsN/A
alertmanager_silences_gc_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
alertmanager_silences_gossip_messages_propagated_totalcounterins, instance, ip, job, clsNumber of received gossip messages that have been further gossiped.
alertmanager_silences_maintenance_errors_totalcounterins, instance, ip, job, clsHow many maintenances were executed for silences that failed.
alertmanager_silences_maintenance_totalcounterins, instance, ip, job, clsHow many maintenances were executed for silences.
alertmanager_silences_queries_totalcounterins, instance, ip, job, clsHow many silence queries were received.
alertmanager_silences_query_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
alertmanager_silences_query_duration_seconds_countUnknownins, instance, ip, job, clsN/A
alertmanager_silences_query_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
alertmanager_silences_query_errors_totalcounterins, instance, ip, job, clsHow many silence received queries did not succeed.
alertmanager_silences_snapshot_duration_seconds_countUnknownins, instance, ip, job, clsN/A
alertmanager_silences_snapshot_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
alertmanager_silences_snapshot_size_bytesgaugeins, instance, ip, job, clsSize of the last silence snapshot in bytes.
blackbox_exporter_build_infogaugerevision, version, ins, instance, ip, tags, goarch, goversion, job, cls, branch, goosA metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which blackbox_exporter was built, and the goos and goarch for the build.
blackbox_exporter_config_last_reload_success_timestamp_secondsgaugeins, instance, ip, job, clsTimestamp of the last successful configuration reload.
blackbox_exporter_config_last_reload_successfulgaugeins, instance, ip, job, clsBlackbox exporter config loaded successfully.
blackbox_module_unknown_totalcounterins, instance, ip, job, clsCount of unknown modules requested by probes
cortex_distributor_ingester_clientsgaugeins, instance, ip, job, clsThe current number of ingester clients.
cortex_dns_failures_totalUnknownins, instance, ip, job, clsN/A
cortex_dns_lookups_totalUnknownins, instance, ip, job, clsN/A
cortex_frontend_query_range_duration_seconds_bucketUnknownins, instance, method, ip, le, job, cls, status_codeN/A
cortex_frontend_query_range_duration_seconds_countUnknownins, instance, method, ip, job, cls, status_codeN/A
cortex_frontend_query_range_duration_seconds_sumUnknownins, instance, method, ip, job, cls, status_codeN/A
cortex_ingester_flush_queue_lengthgaugeins, instance, ip, job, clsThe total number of series pending in the flush queue.
cortex_kv_request_duration_seconds_bucketUnknownins, instance, role, ip, le, kv_name, type, operation, job, cls, status_codeN/A
cortex_kv_request_duration_seconds_countUnknownins, instance, role, ip, kv_name, type, operation, job, cls, status_codeN/A
cortex_kv_request_duration_seconds_sumUnknownins, instance, role, ip, kv_name, type, operation, job, cls, status_codeN/A
cortex_member_consul_heartbeats_totalUnknownins, instance, ip, job, clsN/A
cortex_prometheus_notifications_alertmanagers_discoveredgaugeins, instance, ip, user, job, clsThe number of alertmanagers discovered and active.
cortex_prometheus_notifications_dropped_totalUnknownins, instance, ip, user, job, clsN/A
cortex_prometheus_notifications_queue_capacitygaugeins, instance, ip, user, job, clsThe capacity of the alert notifications queue.
cortex_prometheus_notifications_queue_lengthgaugeins, instance, ip, user, job, clsThe number of alert notifications in the queue.
cortex_prometheus_rule_evaluation_duration_secondssummaryins, instance, ip, user, job, cls, quantileThe duration for a rule to execute.
cortex_prometheus_rule_evaluation_duration_seconds_countUnknownins, instance, ip, user, job, clsN/A
cortex_prometheus_rule_evaluation_duration_seconds_sumUnknownins, instance, ip, user, job, clsN/A
cortex_prometheus_rule_group_duration_secondssummaryins, instance, ip, user, job, cls, quantileThe duration of rule group evaluations.
cortex_prometheus_rule_group_duration_seconds_countUnknownins, instance, ip, user, job, clsN/A
cortex_prometheus_rule_group_duration_seconds_sumUnknownins, instance, ip, user, job, clsN/A
cortex_query_frontend_connected_schedulersgaugeins, instance, ip, job, clsNumber of schedulers this frontend is connected to.
cortex_query_frontend_queries_in_progressgaugeins, instance, ip, job, clsNumber of queries in progress handled by this frontend.
cortex_query_frontend_retries_bucketUnknownins, instance, ip, le, job, clsN/A
cortex_query_frontend_retries_countUnknownins, instance, ip, job, clsN/A
cortex_query_frontend_retries_sumUnknownins, instance, ip, job, clsN/A
cortex_query_scheduler_connected_frontend_clientsgaugeins, instance, ip, job, clsNumber of query-frontend worker clients currently connected to the query-scheduler.
cortex_query_scheduler_connected_querier_clientsgaugeins, instance, ip, job, clsNumber of querier worker clients currently connected to the query-scheduler.
cortex_query_scheduler_inflight_requestssummaryins, instance, ip, job, cls, quantileNumber of inflight requests (either queued or processing) sampled at a regular interval. Quantile buckets keep track of inflight requests over the last 60s.
cortex_query_scheduler_inflight_requests_countUnknownins, instance, ip, job, clsN/A
cortex_query_scheduler_inflight_requests_sumUnknownins, instance, ip, job, clsN/A
cortex_query_scheduler_queue_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
cortex_query_scheduler_queue_duration_seconds_countUnknownins, instance, ip, job, clsN/A
cortex_query_scheduler_queue_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
cortex_query_scheduler_queue_lengthUnknownins, instance, ip, user, job, clsN/A
cortex_query_scheduler_runninggaugeins, instance, ip, job, clsValue will be 1 if the scheduler is in the ReplicationSet and actively receiving/processing requests
cortex_ring_member_heartbeats_totalUnknownins, instance, ip, job, clsN/A
cortex_ring_member_tokens_ownedgaugeins, instance, ip, job, clsThe number of tokens owned in the ring.
cortex_ring_member_tokens_to_owngaugeins, instance, ip, job, clsThe number of tokens to own in the ring.
cortex_ring_membersgaugeins, instance, ip, job, cls, stateNumber of members in the ring
cortex_ring_oldest_member_timestampgaugeins, instance, ip, job, cls, stateTimestamp of the oldest member in the ring.
cortex_ring_tokens_totalgaugeins, instance, ip, job, clsNumber of tokens in the ring
cortex_ruler_clientsgaugeins, instance, ip, job, clsThe current number of ruler clients in the pool.
cortex_ruler_config_last_reload_successfulgaugeins, instance, ip, user, job, clsBoolean set to 1 whenever the last configuration reload attempt was successful.
cortex_ruler_config_last_reload_successful_secondsgaugeins, instance, ip, user, job, clsTimestamp of the last successful configuration reload.
cortex_ruler_config_updates_totalUnknownins, instance, ip, user, job, clsN/A
cortex_ruler_managers_totalgaugeins, instance, ip, job, clsTotal number of managers registered and running in the ruler
cortex_ruler_ring_check_errors_totalUnknownins, instance, ip, job, clsN/A
cortex_ruler_sync_rules_totalUnknownins, instance, ip, reason, job, clsN/A
deprecated_flags_inuse_totalUnknownins, instance, ip, job, clsN/A
go_cgo_go_to_c_calls_calls_totalUnknownins, instance, ip, job, clsN/A
go_cpu_classes_gc_mark_assist_cpu_seconds_totalUnknownins, instance, ip, job, clsN/A
go_cpu_classes_gc_mark_dedicated_cpu_seconds_totalUnknownins, instance, ip, job, clsN/A
go_cpu_classes_gc_mark_idle_cpu_seconds_totalUnknownins, instance, ip, job, clsN/A
go_cpu_classes_gc_pause_cpu_seconds_totalUnknownins, instance, ip, job, clsN/A
go_cpu_classes_gc_total_cpu_seconds_totalUnknownins, instance, ip, job, clsN/A
go_cpu_classes_idle_cpu_seconds_totalUnknownins, instance, ip, job, clsN/A
go_cpu_classes_scavenge_assist_cpu_seconds_totalUnknownins, instance, ip, job, clsN/A
go_cpu_classes_scavenge_background_cpu_seconds_totalUnknownins, instance, ip, job, clsN/A
go_cpu_classes_scavenge_total_cpu_seconds_totalUnknownins, instance, ip, job, clsN/A
go_cpu_classes_total_cpu_seconds_totalUnknownins, instance, ip, job, clsN/A
go_cpu_classes_user_cpu_seconds_totalUnknownins, instance, ip, job, clsN/A
go_gc_cycles_automatic_gc_cycles_totalUnknownins, instance, ip, job, clsN/A
go_gc_cycles_forced_gc_cycles_totalUnknownins, instance, ip, job, clsN/A
go_gc_cycles_total_gc_cycles_totalUnknownins, instance, ip, job, clsN/A
go_gc_duration_secondssummaryins, instance, ip, job, cls, quantileA summary of the pause duration of garbage collection cycles.
go_gc_duration_seconds_countUnknownins, instance, ip, job, clsN/A
go_gc_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
go_gc_gogc_percentgaugeins, instance, ip, job, clsHeap size target percentage configured by the user, otherwise 100. This value is set by the GOGC environment variable, and the runtime/debug.SetGCPercent function.
go_gc_gomemlimit_bytesgaugeins, instance, ip, job, clsGo runtime memory limit configured by the user, otherwise math.MaxInt64. This value is set by the GOMEMLIMIT environment variable, and the runtime/debug.SetMemoryLimit function.
go_gc_heap_allocs_by_size_bytes_bucketUnknownins, instance, ip, le, job, clsN/A
go_gc_heap_allocs_by_size_bytes_countUnknownins, instance, ip, job, clsN/A
go_gc_heap_allocs_by_size_bytes_sumUnknownins, instance, ip, job, clsN/A
go_gc_heap_allocs_bytes_totalUnknownins, instance, ip, job, clsN/A
go_gc_heap_allocs_objects_totalUnknownins, instance, ip, job, clsN/A
go_gc_heap_frees_by_size_bytes_bucketUnknownins, instance, ip, le, job, clsN/A
go_gc_heap_frees_by_size_bytes_countUnknownins, instance, ip, job, clsN/A
go_gc_heap_frees_by_size_bytes_sumUnknownins, instance, ip, job, clsN/A
go_gc_heap_frees_bytes_totalUnknownins, instance, ip, job, clsN/A
go_gc_heap_frees_objects_totalUnknownins, instance, ip, job, clsN/A
go_gc_heap_goal_bytesgaugeins, instance, ip, job, clsHeap size target for the end of the GC cycle.
go_gc_heap_live_bytesgaugeins, instance, ip, job, clsHeap memory occupied by live objects that were marked by the previous GC.
go_gc_heap_objects_objectsgaugeins, instance, ip, job, clsNumber of objects, live or unswept, occupying heap memory.
go_gc_heap_tiny_allocs_objects_totalUnknownins, instance, ip, job, clsN/A
go_gc_limiter_last_enabled_gc_cyclegaugeins, instance, ip, job, clsGC cycle the last time the GC CPU limiter was enabled. This metric is useful for diagnosing the root cause of an out-of-memory error, because the limiter trades memory for CPU time when the GC’s CPU time gets too high. This is most likely to occur with use of SetMemoryLimit. The first GC cycle is cycle 1, so a value of 0 indicates that it was never enabled.
go_gc_pauses_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
go_gc_pauses_seconds_countUnknownins, instance, ip, job, clsN/A
go_gc_pauses_seconds_sumUnknownins, instance, ip, job, clsN/A
go_gc_scan_globals_bytesgaugeins, instance, ip, job, clsThe total amount of global variable space that is scannable.
go_gc_scan_heap_bytesgaugeins, instance, ip, job, clsThe total amount of heap space that is scannable.
go_gc_scan_stack_bytesgaugeins, instance, ip, job, clsThe number of bytes of stack that were scanned last GC cycle.
go_gc_scan_total_bytesgaugeins, instance, ip, job, clsThe total amount space that is scannable. Sum of all metrics in /gc/scan.
go_gc_stack_starting_size_bytesgaugeins, instance, ip, job, clsThe stack size of new goroutines.
go_godebug_non_default_behavior_execerrdot_events_totalUnknownins, instance, ip, job, clsN/A
go_godebug_non_default_behavior_gocachehash_events_totalUnknownins, instance, ip, job, clsN/A
go_godebug_non_default_behavior_gocachetest_events_totalUnknownins, instance, ip, job, clsN/A
go_godebug_non_default_behavior_gocacheverify_events_totalUnknownins, instance, ip, job, clsN/A
go_godebug_non_default_behavior_http2client_events_totalUnknownins, instance, ip, job, clsN/A
go_godebug_non_default_behavior_http2server_events_totalUnknownins, instance, ip, job, clsN/A
go_godebug_non_default_behavior_installgoroot_events_totalUnknownins, instance, ip, job, clsN/A
go_godebug_non_default_behavior_jstmpllitinterp_events_totalUnknownins, instance, ip, job, clsN/A
go_godebug_non_default_behavior_multipartmaxheaders_events_totalUnknownins, instance, ip, job, clsN/A
go_godebug_non_default_behavior_multipartmaxparts_events_totalUnknownins, instance, ip, job, clsN/A
go_godebug_non_default_behavior_multipathtcp_events_totalUnknownins, instance, ip, job, clsN/A
go_godebug_non_default_behavior_panicnil_events_totalUnknownins, instance, ip, job, clsN/A
go_godebug_non_default_behavior_randautoseed_events_totalUnknownins, instance, ip, job, clsN/A
go_godebug_non_default_behavior_tarinsecurepath_events_totalUnknownins, instance, ip, job, clsN/A
go_godebug_non_default_behavior_tlsmaxrsasize_events_totalUnknownins, instance, ip, job, clsN/A
go_godebug_non_default_behavior_x509sha1_events_totalUnknownins, instance, ip, job, clsN/A
go_godebug_non_default_behavior_x509usefallbackroots_events_totalUnknownins, instance, ip, job, clsN/A
go_godebug_non_default_behavior_zipinsecurepath_events_totalUnknownins, instance, ip, job, clsN/A
go_goroutinesgaugeins, instance, ip, job, clsNumber of goroutines that currently exist.
go_infogaugeversion, ins, instance, ip, job, clsInformation about the Go environment.
go_memory_classes_heap_free_bytesgaugeins, instance, ip, job, clsMemory that is completely free and eligible to be returned to the underlying system, but has not been. This metric is the runtime’s estimate of free address space that is backed by physical memory.
go_memory_classes_heap_objects_bytesgaugeins, instance, ip, job, clsMemory occupied by live objects and dead objects that have not yet been marked free by the garbage collector.
go_memory_classes_heap_released_bytesgaugeins, instance, ip, job, clsMemory that is completely free and has been returned to the underlying system. This metric is the runtime’s estimate of free address space that is still mapped into the process, but is not backed by physical memory.
go_memory_classes_heap_stacks_bytesgaugeins, instance, ip, job, clsMemory allocated from the heap that is reserved for stack space, whether or not it is currently in-use. Currently, this represents all stack memory for goroutines. It also includes all OS thread stacks in non-cgo programs. Note that stacks may be allocated differently in the future, and this may change.
go_memory_classes_heap_unused_bytesgaugeins, instance, ip, job, clsMemory that is reserved for heap objects but is not currently used to hold heap objects.
go_memory_classes_metadata_mcache_free_bytesgaugeins, instance, ip, job, clsMemory that is reserved for runtime mcache structures, but not in-use.
go_memory_classes_metadata_mcache_inuse_bytesgaugeins, instance, ip, job, clsMemory that is occupied by runtime mcache structures that are currently being used.
go_memory_classes_metadata_mspan_free_bytesgaugeins, instance, ip, job, clsMemory that is reserved for runtime mspan structures, but not in-use.
go_memory_classes_metadata_mspan_inuse_bytesgaugeins, instance, ip, job, clsMemory that is occupied by runtime mspan structures that are currently being used.
go_memory_classes_metadata_other_bytesgaugeins, instance, ip, job, clsMemory that is reserved for or used to hold runtime metadata.
go_memory_classes_os_stacks_bytesgaugeins, instance, ip, job, clsStack memory allocated by the underlying operating system. In non-cgo programs this metric is currently zero. This may change in the future.In cgo programs this metric includes OS thread stacks allocated directly from the OS. Currently, this only accounts for one stack in c-shared and c-archive build modes, and other sources of stacks from the OS are not measured. This too may change in the future.
go_memory_classes_other_bytesgaugeins, instance, ip, job, clsMemory used by execution trace buffers, structures for debugging the runtime, finalizer and profiler specials, and more.
go_memory_classes_profiling_buckets_bytesgaugeins, instance, ip, job, clsMemory that is used by the stack trace hash map used for profiling.
go_memory_classes_total_bytesgaugeins, instance, ip, job, clsAll memory mapped by the Go runtime into the current process as read-write. Note that this does not include memory mapped by code called via cgo or via the syscall package. Sum of all metrics in /memory/classes.
go_memstats_alloc_bytescounterins, instance, ip, job, clsTotal number of bytes allocated, even if freed.
go_memstats_alloc_bytes_totalcounterins, instance, ip, job, clsTotal number of bytes allocated, even if freed.
go_memstats_buck_hash_sys_bytesgaugeins, instance, ip, job, clsNumber of bytes used by the profiling bucket hash table.
go_memstats_frees_totalcounterins, instance, ip, job, clsTotal number of frees.
go_memstats_gc_sys_bytesgaugeins, instance, ip, job, clsNumber of bytes used for garbage collection system metadata.
go_memstats_heap_alloc_bytesgaugeins, instance, ip, job, clsNumber of heap bytes allocated and still in use.
go_memstats_heap_idle_bytesgaugeins, instance, ip, job, clsNumber of heap bytes waiting to be used.
go_memstats_heap_inuse_bytesgaugeins, instance, ip, job, clsNumber of heap bytes that are in use.
go_memstats_heap_objectsgaugeins, instance, ip, job, clsNumber of allocated objects.
go_memstats_heap_released_bytesgaugeins, instance, ip, job, clsNumber of heap bytes released to OS.
go_memstats_heap_sys_bytesgaugeins, instance, ip, job, clsNumber of heap bytes obtained from system.
go_memstats_last_gc_time_secondsgaugeins, instance, ip, job, clsNumber of seconds since 1970 of last garbage collection.
go_memstats_lookups_totalcounterins, instance, ip, job, clsTotal number of pointer lookups.
go_memstats_mallocs_totalcounterins, instance, ip, job, clsTotal number of mallocs.
go_memstats_mcache_inuse_bytesgaugeins, instance, ip, job, clsNumber of bytes in use by mcache structures.
go_memstats_mcache_sys_bytesgaugeins, instance, ip, job, clsNumber of bytes used for mcache structures obtained from system.
go_memstats_mspan_inuse_bytesgaugeins, instance, ip, job, clsNumber of bytes in use by mspan structures.
go_memstats_mspan_sys_bytesgaugeins, instance, ip, job, clsNumber of bytes used for mspan structures obtained from system.
go_memstats_next_gc_bytesgaugeins, instance, ip, job, clsNumber of heap bytes when next garbage collection will take place.
go_memstats_other_sys_bytesgaugeins, instance, ip, job, clsNumber of bytes used for other system allocations.
go_memstats_stack_inuse_bytesgaugeins, instance, ip, job, clsNumber of bytes in use by the stack allocator.
go_memstats_stack_sys_bytesgaugeins, instance, ip, job, clsNumber of bytes obtained from system for stack allocator.
go_memstats_sys_bytesgaugeins, instance, ip, job, clsNumber of bytes obtained from system.
go_sched_gomaxprocs_threadsgaugeins, instance, ip, job, clsThe current runtime.GOMAXPROCS setting, or the number of operating system threads that can execute user-level Go code simultaneously.
go_sched_goroutines_goroutinesgaugeins, instance, ip, job, clsCount of live goroutines.
go_sched_latencies_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
go_sched_latencies_seconds_countUnknownins, instance, ip, job, clsN/A
go_sched_latencies_seconds_sumUnknownins, instance, ip, job, clsN/A
go_sql_stats_connections_blocked_secondsunknownins, instance, db_name, ip, job, clsThe total time blocked waiting for a new connection.
go_sql_stats_connections_closed_max_idleunknownins, instance, db_name, ip, job, clsThe total number of connections closed due to SetMaxIdleConns.
go_sql_stats_connections_closed_max_idle_timeunknownins, instance, db_name, ip, job, clsThe total number of connections closed due to SetConnMaxIdleTime.
go_sql_stats_connections_closed_max_lifetimeunknownins, instance, db_name, ip, job, clsThe total number of connections closed due to SetConnMaxLifetime.
go_sql_stats_connections_idlegaugeins, instance, db_name, ip, job, clsThe number of idle connections.
go_sql_stats_connections_in_usegaugeins, instance, db_name, ip, job, clsThe number of connections currently in use.
go_sql_stats_connections_max_opengaugeins, instance, db_name, ip, job, clsMaximum number of open connections to the database.
go_sql_stats_connections_opengaugeins, instance, db_name, ip, job, clsThe number of established connections both in use and idle.
go_sql_stats_connections_waited_forunknownins, instance, db_name, ip, job, clsThe total number of connections waited for.
go_sync_mutex_wait_total_seconds_totalUnknownins, instance, ip, job, clsN/A
go_threadsgaugeins, instance, ip, job, clsNumber of OS threads created.
grafana_access_evaluation_countunknownins, instance, ip, job, clsnumber of evaluation calls
grafana_access_evaluation_duration_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_access_evaluation_duration_countUnknownins, instance, ip, job, clsN/A
grafana_access_evaluation_duration_sumUnknownins, instance, ip, job, clsN/A
grafana_access_permissions_duration_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_access_permissions_duration_countUnknownins, instance, ip, job, clsN/A
grafana_access_permissions_duration_sumUnknownins, instance, ip, job, clsN/A
grafana_aggregator_discovery_aggregation_count_totalUnknownins, instance, ip, job, clsN/A
grafana_alerting_active_alertsgaugeins, instance, ip, job, clsamount of active alerts
grafana_alerting_active_configurationsgaugeins, instance, ip, job, clsThe number of active Alertmanager configurations.
grafana_alerting_alertmanager_config_matchgaugeins, instance, ip, job, clsThe total number of match entries
grafana_alerting_alertmanager_config_match_regaugeins, instance, ip, job, clsThe total number of matchRE entries
grafana_alerting_alertmanager_config_matchersgaugeins, instance, ip, job, clsThe total number of matchers
grafana_alerting_alertmanager_config_object_matchersgaugeins, instance, ip, job, clsThe total number of object_matchers
grafana_alerting_discovered_configurationsgaugeins, instance, ip, job, clsThe number of organizations we’ve discovered that require an Alertmanager configuration.
grafana_alerting_dispatcher_aggregation_groupsgaugeins, instance, ip, job, clsNumber of active aggregation groups
grafana_alerting_dispatcher_alert_processing_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_alerting_dispatcher_alert_processing_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_execution_time_millisecondssummaryins, instance, ip, job, cls, quantilesummary of alert execution duration
grafana_alerting_execution_time_milliseconds_countUnknownins, instance, ip, job, clsN/A
grafana_alerting_execution_time_milliseconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_nflog_gc_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_alerting_nflog_gc_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_nflog_gossip_messages_propagated_totalUnknownins, instance, ip, job, clsN/A
grafana_alerting_nflog_queries_totalUnknownins, instance, ip, job, clsN/A
grafana_alerting_nflog_query_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_alerting_nflog_query_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_alerting_nflog_query_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_nflog_query_errors_totalUnknownins, instance, ip, job, clsN/A
grafana_alerting_nflog_snapshot_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_alerting_nflog_snapshot_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_nflog_snapshot_size_bytesgaugeins, instance, ip, job, clsSize of the last notification log snapshot in bytes.
grafana_alerting_notification_latency_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_alerting_notification_latency_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_alerting_notification_latency_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_schedule_alert_rulesgaugeins, instance, ip, job, clsThe number of alert rules that could be considered for evaluation at the next tick.
grafana_alerting_schedule_alert_rules_hashgaugeins, instance, ip, job, clsA hash of the alert rules that could be considered for evaluation at the next tick.
grafana_alerting_schedule_periodic_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_alerting_schedule_periodic_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_alerting_schedule_periodic_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_schedule_query_alert_rules_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_alerting_schedule_query_alert_rules_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_alerting_schedule_query_alert_rules_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_scheduler_behind_secondsgaugeins, instance, ip, job, clsThe total number of seconds the scheduler is behind.
grafana_alerting_silences_gc_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_alerting_silences_gc_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_silences_gossip_messages_propagated_totalUnknownins, instance, ip, job, clsN/A
grafana_alerting_silences_queries_totalUnknownins, instance, ip, job, clsN/A
grafana_alerting_silences_query_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_alerting_silences_query_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_alerting_silences_query_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_silences_query_errors_totalUnknownins, instance, ip, job, clsN/A
grafana_alerting_silences_snapshot_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_alerting_silences_snapshot_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_silences_snapshot_size_bytesgaugeins, instance, ip, job, clsSize of the last silence snapshot in bytes.
grafana_alerting_state_calculation_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_alerting_state_calculation_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_alerting_state_calculation_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_alerting_state_history_writes_bytes_totalUnknownins, instance, ip, job, clsN/A
grafana_alerting_ticker_interval_secondsgaugeins, instance, ip, job, clsInterval at which the ticker is meant to tick.
grafana_alerting_ticker_last_consumed_tick_timestamp_secondsgaugeins, instance, ip, job, clsTimestamp of the last consumed tick in seconds.
grafana_alerting_ticker_next_tick_timestamp_secondsgaugeins, instance, ip, job, clsTimestamp of the next tick in seconds before it is consumed.
grafana_api_admin_user_created_totalUnknownins, instance, ip, job, clsN/A
grafana_api_dashboard_get_millisecondssummaryins, instance, ip, job, cls, quantilesummary for dashboard get duration
grafana_api_dashboard_get_milliseconds_countUnknownins, instance, ip, job, clsN/A
grafana_api_dashboard_get_milliseconds_sumUnknownins, instance, ip, job, clsN/A
grafana_api_dashboard_save_millisecondssummaryins, instance, ip, job, cls, quantilesummary for dashboard save duration
grafana_api_dashboard_save_milliseconds_countUnknownins, instance, ip, job, clsN/A
grafana_api_dashboard_save_milliseconds_sumUnknownins, instance, ip, job, clsN/A
grafana_api_dashboard_search_millisecondssummaryins, instance, ip, job, cls, quantilesummary for dashboard search duration
grafana_api_dashboard_search_milliseconds_countUnknownins, instance, ip, job, clsN/A
grafana_api_dashboard_search_milliseconds_sumUnknownins, instance, ip, job, clsN/A
grafana_api_dashboard_snapshot_create_totalUnknownins, instance, ip, job, clsN/A
grafana_api_dashboard_snapshot_external_totalUnknownins, instance, ip, job, clsN/A
grafana_api_dashboard_snapshot_get_totalUnknownins, instance, ip, job, clsN/A
grafana_api_dataproxy_request_all_millisecondssummaryins, instance, ip, job, cls, quantilesummary for dataproxy request duration
grafana_api_dataproxy_request_all_milliseconds_countUnknownins, instance, ip, job, clsN/A
grafana_api_dataproxy_request_all_milliseconds_sumUnknownins, instance, ip, job, clsN/A
grafana_api_login_oauth_totalUnknownins, instance, ip, job, clsN/A
grafana_api_login_post_totalUnknownins, instance, ip, job, clsN/A
grafana_api_login_saml_totalUnknownins, instance, ip, job, clsN/A
grafana_api_models_dashboard_insert_totalUnknownins, instance, ip, job, clsN/A
grafana_api_org_create_totalUnknownins, instance, ip, job, clsN/A
grafana_api_response_status_totalUnknownins, instance, ip, job, cls, codeN/A
grafana_api_user_signup_completed_totalUnknownins, instance, ip, job, clsN/A
grafana_api_user_signup_invite_totalUnknownins, instance, ip, job, clsN/A
grafana_api_user_signup_started_totalUnknownins, instance, ip, job, clsN/A
grafana_apiserver_audit_event_totalUnknownins, instance, ip, job, clsN/A
grafana_apiserver_audit_requests_rejected_totalUnknownins, instance, ip, job, clsN/A
grafana_apiserver_client_certificate_expiration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_apiserver_client_certificate_expiration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_apiserver_client_certificate_expiration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_apiserver_envelope_encryption_dek_cache_fill_percentgaugeins, instance, ip, job, cls[ALPHA] Percent of the cache slots currently occupied by cached DEKs.
grafana_apiserver_flowcontrol_seat_fair_fracgaugeins, instance, ip, job, cls[ALPHA] Fair fraction of server’s concurrency to allocate to each priority level that can use it
grafana_apiserver_storage_data_key_generation_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_apiserver_storage_data_key_generation_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_apiserver_storage_data_key_generation_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_apiserver_storage_data_key_generation_failures_totalUnknownins, instance, ip, job, clsN/A
grafana_apiserver_storage_envelope_transformation_cache_misses_totalUnknownins, instance, ip, job, clsN/A
grafana_apiserver_tls_handshake_errors_totalUnknownins, instance, ip, job, clsN/A
grafana_apiserver_webhooks_x509_insecure_sha1_totalUnknownins, instance, ip, job, clsN/A
grafana_apiserver_webhooks_x509_missing_san_totalUnknownins, instance, ip, job, clsN/A
grafana_authn_authn_failed_authentication_totalUnknownins, instance, ip, job, clsN/A
grafana_authn_authn_successful_authentication_totalUnknownins, instance, ip, client, job, clsN/A
grafana_authn_authn_successful_login_totalUnknownins, instance, ip, client, job, clsN/A
grafana_aws_cloudwatch_get_metric_data_totalUnknownins, instance, ip, job, clsN/A
grafana_aws_cloudwatch_get_metric_statistics_totalUnknownins, instance, ip, job, clsN/A
grafana_aws_cloudwatch_list_metrics_totalUnknownins, instance, ip, job, clsN/A
grafana_build_infogaugerevision, version, ins, instance, edition, ip, goversion, job, cls, branchA metric with a constant ‘1’ value labeled by version, revision, branch, and goversion from which Grafana was built
grafana_build_timestampgaugerevision, version, ins, instance, edition, ip, goversion, job, cls, branchA metric exposing when the binary was built in epoch
grafana_cardinality_enforcement_unexpected_categorizations_totalUnknownins, instance, ip, job, clsN/A
grafana_database_conn_idlegaugeins, instance, ip, job, clsThe number of idle connections
grafana_database_conn_in_usegaugeins, instance, ip, job, clsThe number of connections currently in use
grafana_database_conn_max_idle_closed_secondsunknownins, instance, ip, job, clsThe total number of connections closed due to SetConnMaxIdleTime
grafana_database_conn_max_idle_closed_totalUnknownins, instance, ip, job, clsN/A
grafana_database_conn_max_lifetime_closed_totalUnknownins, instance, ip, job, clsN/A
grafana_database_conn_max_opengaugeins, instance, ip, job, clsMaximum number of open connections to the database
grafana_database_conn_opengaugeins, instance, ip, job, clsThe number of established connections both in use and idle
grafana_database_conn_wait_count_totalUnknownins, instance, ip, job, clsN/A
grafana_database_conn_wait_duration_secondsunknownins, instance, ip, job, clsThe total time blocked waiting for a new connection
grafana_datasource_request_duration_seconds_bucketUnknowndatasource, ins, instance, method, ip, le, datasource_type, job, cls, codeN/A
grafana_datasource_request_duration_seconds_countUnknowndatasource, ins, instance, method, ip, datasource_type, job, cls, codeN/A
grafana_datasource_request_duration_seconds_sumUnknowndatasource, ins, instance, method, ip, datasource_type, job, cls, codeN/A
grafana_datasource_request_in_flightgaugedatasource, ins, instance, ip, datasource_type, job, clsA gauge of outgoing data source requests currently being sent by Grafana
grafana_datasource_request_totalUnknowndatasource, ins, instance, method, ip, datasource_type, job, cls, codeN/A
grafana_datasource_response_size_bytes_bucketUnknowndatasource, ins, instance, ip, le, datasource_type, job, clsN/A
grafana_datasource_response_size_bytes_countUnknowndatasource, ins, instance, ip, datasource_type, job, clsN/A
grafana_datasource_response_size_bytes_sumUnknowndatasource, ins, instance, ip, datasource_type, job, clsN/A
grafana_db_datasource_query_by_id_totalUnknownins, instance, ip, job, clsN/A
grafana_disabled_metrics_totalUnknownins, instance, ip, job, clsN/A
grafana_emails_sent_failedunknownins, instance, ip, job, clsNumber of emails Grafana failed to send
grafana_emails_sent_totalUnknownins, instance, ip, job, clsN/A
grafana_encryption_cache_reads_totalUnknownins, instance, method, ip, hit, job, clsN/A
grafana_encryption_ops_totalUnknownins, instance, ip, success, operation, job, clsN/A
grafana_environment_infogaugeversion, ins, instance, ip, job, cls, commitA metric with a constant ‘1’ value labeled by environment information about the running instance.
grafana_feature_toggles_infogaugeins, instance, ip, job, clsinfo metric that exposes what feature toggles are enabled or not
grafana_frontend_boot_css_time_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_frontend_boot_css_time_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_frontend_boot_css_time_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_frontend_boot_first_contentful_paint_time_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_frontend_boot_first_contentful_paint_time_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_frontend_boot_first_contentful_paint_time_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_frontend_boot_first_paint_time_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_frontend_boot_first_paint_time_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_frontend_boot_first_paint_time_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_frontend_boot_js_done_time_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_frontend_boot_js_done_time_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_frontend_boot_js_done_time_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_frontend_boot_load_time_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_frontend_boot_load_time_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_frontend_boot_load_time_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_frontend_plugins_preload_ms_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_frontend_plugins_preload_ms_countUnknownins, instance, ip, job, clsN/A
grafana_frontend_plugins_preload_ms_sumUnknownins, instance, ip, job, clsN/A
grafana_hidden_metrics_totalUnknownins, instance, ip, job, clsN/A
grafana_http_request_duration_seconds_bucketUnknownins, instance, method, ip, le, job, cls, status_code, handlerN/A
grafana_http_request_duration_seconds_countUnknownins, instance, method, ip, job, cls, status_code, handlerN/A
grafana_http_request_duration_seconds_sumUnknownins, instance, method, ip, job, cls, status_code, handlerN/A
grafana_http_request_in_flightgaugeins, instance, ip, job, clsA gauge of requests currently being served by Grafana.
grafana_idforwarding_idforwarding_failed_token_signing_totalUnknownins, instance, ip, job, clsN/A
grafana_idforwarding_idforwarding_token_signing_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_idforwarding_idforwarding_token_signing_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_idforwarding_idforwarding_token_signing_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_idforwarding_idforwarding_token_signing_from_cache_totalUnknownins, instance, ip, job, clsN/A
grafana_idforwarding_idforwarding_token_signing_totalUnknownins, instance, ip, job, clsN/A
grafana_instance_start_totalUnknownins, instance, ip, job, clsN/A
grafana_ldap_users_sync_execution_timesummaryins, instance, ip, job, cls, quantilesummary for LDAP users sync execution duration
grafana_ldap_users_sync_execution_time_countUnknownins, instance, ip, job, clsN/A
grafana_ldap_users_sync_execution_time_sumUnknownins, instance, ip, job, clsN/A
grafana_live_client_command_duration_secondssummaryins, instance, method, ip, job, cls, quantileClient command duration summary.
grafana_live_client_command_duration_seconds_countUnknownins, instance, method, ip, job, clsN/A
grafana_live_client_command_duration_seconds_sumUnknownins, instance, method, ip, job, clsN/A
grafana_live_client_num_reply_errorsunknownins, instance, method, ip, job, cls, codeNumber of errors in replies sent to clients.
grafana_live_client_num_server_disconnectsunknownins, instance, ip, job, cls, codeNumber of server initiated disconnects.
grafana_live_client_recoverunknownins, instance, ip, recovered, job, clsCount of recover operations.
grafana_live_node_action_countunknownaction, ins, instance, ip, job, clsNumber of node actions called.
grafana_live_node_buildgaugeversion, ins, instance, ip, job, clsNode build info.
grafana_live_node_messages_received_countunknownins, instance, ip, type, job, clsNumber of messages received.
grafana_live_node_messages_sent_countunknownins, instance, ip, type, job, clsNumber of messages sent.
grafana_live_node_num_channelsgaugeins, instance, ip, job, clsNumber of channels with one or more subscribers.
grafana_live_node_num_clientsgaugeins, instance, ip, job, clsNumber of clients connected.
grafana_live_node_num_nodesgaugeins, instance, ip, job, clsNumber of nodes in cluster.
grafana_live_node_num_subscriptionsgaugeins, instance, ip, job, clsNumber of subscriptions.
grafana_live_node_num_usersgaugeins, instance, ip, job, clsNumber of unique users connected.
grafana_live_transport_connect_countunknownins, instance, ip, transport, job, clsNumber of connections to specific transport.
grafana_live_transport_messages_sentunknownins, instance, ip, transport, job, clsNumber of messages sent over specific transport.
grafana_loki_plugin_parse_response_duration_seconds_bucketUnknownendpoint, ins, instance, ip, le, status, job, clsN/A
grafana_loki_plugin_parse_response_duration_seconds_countUnknownendpoint, ins, instance, ip, status, job, clsN/A
grafana_loki_plugin_parse_response_duration_seconds_sumUnknownendpoint, ins, instance, ip, status, job, clsN/A
grafana_page_response_status_totalUnknownins, instance, ip, job, cls, codeN/A
grafana_plugin_build_infogaugeversion, signature_status, ins, instance, plugin_type, ip, plugin_id, job, clsA metric with a constant ‘1’ value labeled by pluginId, pluginType and version from which Grafana plugin was built
grafana_plugin_request_duration_milliseconds_bucketUnknownendpoint, ins, instance, target, ip, le, plugin_id, job, clsN/A
grafana_plugin_request_duration_milliseconds_countUnknownendpoint, ins, instance, target, ip, plugin_id, job, clsN/A
grafana_plugin_request_duration_milliseconds_sumUnknownendpoint, ins, instance, target, ip, plugin_id, job, clsN/A
grafana_plugin_request_duration_seconds_bucketUnknownendpoint, ins, instance, target, ip, le, status, plugin_id, source, job, clsN/A
grafana_plugin_request_duration_seconds_countUnknownendpoint, ins, instance, target, ip, status, plugin_id, source, job, clsN/A
grafana_plugin_request_duration_seconds_sumUnknownendpoint, ins, instance, target, ip, status, plugin_id, source, job, clsN/A
grafana_plugin_request_size_bytes_bucketUnknownendpoint, ins, instance, target, ip, le, plugin_id, source, job, clsN/A
grafana_plugin_request_size_bytes_countUnknownendpoint, ins, instance, target, ip, plugin_id, source, job, clsN/A
grafana_plugin_request_size_bytes_sumUnknownendpoint, ins, instance, target, ip, plugin_id, source, job, clsN/A
grafana_plugin_request_totalUnknownendpoint, ins, instance, target, ip, status, plugin_id, job, clsN/A
grafana_process_cpu_seconds_totalUnknownins, instance, ip, job, clsN/A
grafana_process_max_fdsgaugeins, instance, ip, job, clsMaximum number of open file descriptors.
grafana_process_open_fdsgaugeins, instance, ip, job, clsNumber of open file descriptors.
grafana_process_resident_memory_bytesgaugeins, instance, ip, job, clsResident memory size in bytes.
grafana_process_start_time_secondsgaugeins, instance, ip, job, clsStart time of the process since unix epoch in seconds.
grafana_process_virtual_memory_bytesgaugeins, instance, ip, job, clsVirtual memory size in bytes.
grafana_process_virtual_memory_max_bytesgaugeins, instance, ip, job, clsMaximum amount of virtual memory available in bytes.
grafana_prometheus_plugin_backend_request_countunknownendpoint, ins, instance, ip, status, errorSource, job, clsThe total amount of prometheus backend plugin requests
grafana_proxy_response_status_totalUnknownins, instance, ip, job, cls, codeN/A
grafana_public_dashboard_request_countunknownins, instance, ip, job, clscounter for public dashboards requests
grafana_registered_metrics_totalUnknownins, instance, ip, stability_level, deprecated_version, job, clsN/A
grafana_rendering_queue_sizegaugeins, instance, ip, job, clssize of rendering queue
grafana_search_dashboard_search_failures_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_search_dashboard_search_failures_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_search_dashboard_search_failures_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_search_dashboard_search_successes_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
grafana_search_dashboard_search_successes_duration_seconds_countUnknownins, instance, ip, job, clsN/A
grafana_search_dashboard_search_successes_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
grafana_stat_active_usersgaugeins, instance, ip, job, clsnumber of active users
grafana_stat_total_orgsgaugeins, instance, ip, job, clstotal amount of orgs
grafana_stat_total_playlistsgaugeins, instance, ip, job, clstotal amount of playlists
grafana_stat_total_service_account_tokensgaugeins, instance, ip, job, clstotal amount of service account tokens
grafana_stat_total_service_accountsgaugeins, instance, ip, job, clstotal amount of service accounts
grafana_stat_total_service_accounts_role_nonegaugeins, instance, ip, job, clstotal amount of service accounts with no role
grafana_stat_total_teamsgaugeins, instance, ip, job, clstotal amount of teams
grafana_stat_total_usersgaugeins, instance, ip, job, clstotal amount of users
grafana_stat_totals_active_adminsgaugeins, instance, ip, job, clstotal amount of active admins
grafana_stat_totals_active_editorsgaugeins, instance, ip, job, clstotal amount of active editors
grafana_stat_totals_active_viewersgaugeins, instance, ip, job, clstotal amount of active viewers
grafana_stat_totals_adminsgaugeins, instance, ip, job, clstotal amount of admins
grafana_stat_totals_alert_rulesgaugeins, instance, ip, job, clstotal amount of alert rules in the database
grafana_stat_totals_annotationsgaugeins, instance, ip, job, clstotal amount of annotations in the database
grafana_stat_totals_correlationsgaugeins, instance, ip, job, clstotal amount of correlations
grafana_stat_totals_dashboardgaugeins, instance, ip, job, clstotal amount of dashboards
grafana_stat_totals_dashboard_versionsgaugeins, instance, ip, job, clstotal amount of dashboard versions in the database
grafana_stat_totals_data_keysgaugeins, instance, ip, job, cls, activetotal amount of data keys in the database
grafana_stat_totals_datasourcegaugeins, instance, ip, plugin_id, job, clstotal number of defined datasources, labeled by pluginId
grafana_stat_totals_editorsgaugeins, instance, ip, job, clstotal amount of editors
grafana_stat_totals_foldergaugeins, instance, ip, job, clstotal amount of folders
grafana_stat_totals_library_panelsgaugeins, instance, ip, job, clstotal amount of library panels in the database
grafana_stat_totals_library_variablesgaugeins, instance, ip, job, clstotal amount of library variables in the database
grafana_stat_totals_public_dashboardgaugeins, instance, ip, job, clstotal amount of public dashboards
grafana_stat_totals_rule_groupsgaugeins, instance, ip, job, clstotal amount of alert rule groups in the database
grafana_stat_totals_viewersgaugeins, instance, ip, job, clstotal amount of viewers
infra_upUnknownins, instance, ip, job, clsN/A
jaeger_tracer_baggage_restrictions_updates_totalUnknownresult, ins, instance, ip, job, clsN/A
jaeger_tracer_baggage_truncations_totalUnknownins, instance, ip, job, clsN/A
jaeger_tracer_baggage_updates_totalUnknownresult, ins, instance, ip, job, clsN/A
jaeger_tracer_finished_spans_totalUnknownins, instance, ip, sampled, job, clsN/A
jaeger_tracer_reporter_queue_lengthgaugeins, instance, ip, job, clsCurrent number of spans in the reporter queue
jaeger_tracer_reporter_spans_totalUnknownresult, ins, instance, ip, job, clsN/A
jaeger_tracer_sampler_queries_totalUnknownresult, ins, instance, ip, job, clsN/A
jaeger_tracer_sampler_updates_totalUnknownresult, ins, instance, ip, job, clsN/A
jaeger_tracer_span_context_decoding_errors_totalUnknownins, instance, ip, job, clsN/A
jaeger_tracer_started_spans_totalUnknownins, instance, ip, sampled, job, clsN/A
jaeger_tracer_throttled_debug_spans_totalUnknownins, instance, ip, job, clsN/A
jaeger_tracer_throttler_updates_totalUnknownresult, ins, instance, ip, job, clsN/A
jaeger_tracer_traces_totalUnknownins, instance, ip, sampled, job, cls, stateN/A
kv_request_duration_seconds_bucketUnknownins, instance, role, ip, le, kv_name, type, operation, job, cls, status_codeN/A
kv_request_duration_seconds_countUnknownins, instance, role, ip, kv_name, type, operation, job, cls, status_codeN/A
kv_request_duration_seconds_sumUnknownins, instance, role, ip, kv_name, type, operation, job, cls, status_codeN/A
legacy_grafana_alerting_ticker_interval_secondsgaugeins, instance, ip, job, clsInterval at which the ticker is meant to tick.
legacy_grafana_alerting_ticker_last_consumed_tick_timestamp_secondsgaugeins, instance, ip, job, clsTimestamp of the last consumed tick in seconds.
legacy_grafana_alerting_ticker_next_tick_timestamp_secondsgaugeins, instance, ip, job, clsTimestamp of the next tick in seconds before it is consumed.
logql_query_duration_seconds_bucketUnknownins, instance, query_type, ip, le, job, clsN/A
logql_query_duration_seconds_countUnknownins, instance, query_type, ip, job, clsN/A
logql_query_duration_seconds_sumUnknownins, instance, query_type, ip, job, clsN/A
loki_azure_blob_egress_bytes_totalUnknownins, instance, ip, job, clsN/A
loki_boltdb_shipper_apply_retention_last_successful_run_timestamp_secondsgaugeins, instance, ip, job, clsUnix timestamp of the last successful retention run
loki_boltdb_shipper_compact_tables_operation_duration_secondsgaugeins, instance, ip, job, clsTime (in seconds) spent in compacting all the tables
loki_boltdb_shipper_compact_tables_operation_last_successful_run_timestamp_secondsgaugeins, instance, ip, job, clsUnix timestamp of the last successful compaction run
loki_boltdb_shipper_compact_tables_operation_totalUnknownins, instance, ip, status, job, clsN/A
loki_boltdb_shipper_compactor_runninggaugeins, instance, ip, job, clsValue will be 1 if compactor is currently running on this instance
loki_boltdb_shipper_open_existing_file_failures_totalUnknownins, instance, ip, component, job, clsN/A
loki_boltdb_shipper_query_time_table_download_duration_secondsunknownins, instance, ip, component, job, cls, tableTime (in seconds) spent in downloading of files per table at query time
loki_boltdb_shipper_request_duration_seconds_bucketUnknownins, instance, ip, le, component, operation, job, cls, status_codeN/A
loki_boltdb_shipper_request_duration_seconds_countUnknownins, instance, ip, component, operation, job, cls, status_codeN/A
loki_boltdb_shipper_request_duration_seconds_sumUnknownins, instance, ip, component, operation, job, cls, status_codeN/A
loki_boltdb_shipper_tables_download_operation_duration_secondsgaugeins, instance, ip, component, job, clsTime (in seconds) spent in downloading updated files for all the tables
loki_boltdb_shipper_tables_sync_operation_totalUnknownins, instance, ip, status, component, job, clsN/A
loki_boltdb_shipper_tables_upload_operation_totalUnknownins, instance, ip, status, component, job, clsN/A
loki_build_infogaugerevision, version, ins, instance, ip, tags, goarch, goversion, job, cls, branch, goosA metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which loki was built, and the goos and goarch for the build.
loki_bytes_per_line_bucketUnknownins, instance, ip, le, job, clsN/A
loki_bytes_per_line_countUnknownins, instance, ip, job, clsN/A
loki_bytes_per_line_sumUnknownins, instance, ip, job, clsN/A
loki_cache_corrupt_chunks_totalUnknownins, instance, ip, job, clsN/A
loki_cache_fetched_keysunknownins, instance, ip, job, clsTotal count of keys requested from cache.
loki_cache_hitsunknownins, instance, ip, job, clsTotal count of keys found in cache.
loki_cache_request_duration_seconds_bucketUnknownins, instance, method, ip, le, job, cls, status_codeN/A
loki_cache_request_duration_seconds_countUnknownins, instance, method, ip, job, cls, status_codeN/A
loki_cache_request_duration_seconds_sumUnknownins, instance, method, ip, job, cls, status_codeN/A
loki_cache_value_size_bytes_bucketUnknownins, instance, method, ip, le, job, clsN/A
loki_cache_value_size_bytes_countUnknownins, instance, method, ip, job, clsN/A
loki_cache_value_size_bytes_sumUnknownins, instance, method, ip, job, clsN/A
loki_chunk_fetcher_cache_dequeued_totalUnknownins, instance, ip, job, clsN/A
loki_chunk_fetcher_cache_enqueued_totalUnknownins, instance, ip, job, clsN/A
loki_chunk_fetcher_cache_skipped_buffer_full_totalUnknownins, instance, ip, job, clsN/A
loki_chunk_fetcher_fetched_size_bytes_bucketUnknownins, instance, ip, le, source, job, clsN/A
loki_chunk_fetcher_fetched_size_bytes_countUnknownins, instance, ip, source, job, clsN/A
loki_chunk_fetcher_fetched_size_bytes_sumUnknownins, instance, ip, source, job, clsN/A
loki_chunk_store_chunks_per_query_bucketUnknownins, instance, ip, le, job, clsN/A
loki_chunk_store_chunks_per_query_countUnknownins, instance, ip, job, clsN/A
loki_chunk_store_chunks_per_query_sumUnknownins, instance, ip, job, clsN/A
loki_chunk_store_deduped_bytes_totalUnknownins, instance, ip, job, clsN/A
loki_chunk_store_deduped_chunks_totalUnknownins, instance, ip, job, clsN/A
loki_chunk_store_fetched_chunk_bytes_totalUnknownins, instance, ip, user, job, clsN/A
loki_chunk_store_fetched_chunks_totalUnknownins, instance, ip, user, job, clsN/A
loki_chunk_store_index_entries_per_chunk_bucketUnknownins, instance, ip, le, job, clsN/A
loki_chunk_store_index_entries_per_chunk_countUnknownins, instance, ip, job, clsN/A
loki_chunk_store_index_entries_per_chunk_sumUnknownins, instance, ip, job, clsN/A
loki_chunk_store_index_lookups_per_query_bucketUnknownins, instance, ip, le, job, clsN/A
loki_chunk_store_index_lookups_per_query_countUnknownins, instance, ip, job, clsN/A
loki_chunk_store_index_lookups_per_query_sumUnknownins, instance, ip, job, clsN/A
loki_chunk_store_series_post_intersection_per_query_bucketUnknownins, instance, ip, le, job, clsN/A
loki_chunk_store_series_post_intersection_per_query_countUnknownins, instance, ip, job, clsN/A
loki_chunk_store_series_post_intersection_per_query_sumUnknownins, instance, ip, job, clsN/A
loki_chunk_store_series_pre_intersection_per_query_bucketUnknownins, instance, ip, le, job, clsN/A
loki_chunk_store_series_pre_intersection_per_query_countUnknownins, instance, ip, job, clsN/A
loki_chunk_store_series_pre_intersection_per_query_sumUnknownins, instance, ip, job, clsN/A
loki_chunk_store_stored_chunk_bytes_totalUnknownins, instance, ip, user, job, clsN/A
loki_chunk_store_stored_chunks_totalUnknownins, instance, ip, user, job, clsN/A
loki_consul_request_duration_seconds_bucketUnknownins, instance, ip, le, kv_name, operation, job, cls, status_codeN/A
loki_consul_request_duration_seconds_countUnknownins, instance, ip, kv_name, operation, job, cls, status_codeN/A
loki_consul_request_duration_seconds_sumUnknownins, instance, ip, kv_name, operation, job, cls, status_codeN/A
loki_delete_request_lookups_failed_totalUnknownins, instance, ip, job, clsN/A
loki_delete_request_lookups_totalUnknownins, instance, ip, job, clsN/A
loki_discarded_bytes_totalUnknownins, instance, ip, reason, job, cls, tenantN/A
loki_discarded_samples_totalUnknownins, instance, ip, reason, job, cls, tenantN/A
loki_distributor_bytes_received_totalUnknownins, instance, retention_hours, ip, job, cls, tenantN/A
loki_distributor_ingester_appends_totalUnknownins, instance, ip, ingester, job, clsN/A
loki_distributor_lines_received_totalUnknownins, instance, ip, job, cls, tenantN/A
loki_distributor_replication_factorgaugeins, instance, ip, job, clsThe configured replication factor.
loki_distributor_structured_metadata_bytes_received_totalUnknownins, instance, retention_hours, ip, job, cls, tenantN/A
loki_experimental_features_in_use_totalUnknownins, instance, ip, job, clsN/A
loki_index_chunk_refs_totalUnknownins, instance, ip, status, job, clsN/A
loki_index_request_duration_seconds_bucketUnknownins, instance, ip, le, component, operation, job, cls, status_codeN/A
loki_index_request_duration_seconds_countUnknownins, instance, ip, component, operation, job, cls, status_codeN/A
loki_index_request_duration_seconds_sumUnknownins, instance, ip, component, operation, job, cls, status_codeN/A
loki_inflight_requestsgaugeins, instance, method, ip, route, job, clsCurrent number of inflight requests.
loki_ingester_autoforget_unhealthy_ingesters_totalUnknownins, instance, ip, job, clsN/A
loki_ingester_blocks_per_chunk_bucketUnknownins, instance, ip, le, job, clsN/A
loki_ingester_blocks_per_chunk_countUnknownins, instance, ip, job, clsN/A
loki_ingester_blocks_per_chunk_sumUnknownins, instance, ip, job, clsN/A
loki_ingester_checkpoint_creations_failed_totalUnknownins, instance, ip, job, clsN/A
loki_ingester_checkpoint_creations_totalUnknownins, instance, ip, job, clsN/A
loki_ingester_checkpoint_deletions_failed_totalUnknownins, instance, ip, job, clsN/A
loki_ingester_checkpoint_deletions_totalUnknownins, instance, ip, job, clsN/A
loki_ingester_checkpoint_duration_secondssummaryins, instance, ip, job, cls, quantileTime taken to create a checkpoint.
loki_ingester_checkpoint_duration_seconds_countUnknownins, instance, ip, job, clsN/A
loki_ingester_checkpoint_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
loki_ingester_checkpoint_logged_bytes_totalUnknownins, instance, ip, job, clsN/A
loki_ingester_chunk_age_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
loki_ingester_chunk_age_seconds_countUnknownins, instance, ip, job, clsN/A
loki_ingester_chunk_age_seconds_sumUnknownins, instance, ip, job, clsN/A
loki_ingester_chunk_bounds_hours_bucketUnknownins, instance, ip, le, job, clsN/A
loki_ingester_chunk_bounds_hours_countUnknownins, instance, ip, job, clsN/A
loki_ingester_chunk_bounds_hours_sumUnknownins, instance, ip, job, clsN/A
loki_ingester_chunk_compression_ratio_bucketUnknownins, instance, ip, le, job, clsN/A
loki_ingester_chunk_compression_ratio_countUnknownins, instance, ip, job, clsN/A
loki_ingester_chunk_compression_ratio_sumUnknownins, instance, ip, job, clsN/A
loki_ingester_chunk_encode_time_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
loki_ingester_chunk_encode_time_seconds_countUnknownins, instance, ip, job, clsN/A
loki_ingester_chunk_encode_time_seconds_sumUnknownins, instance, ip, job, clsN/A
loki_ingester_chunk_entries_bucketUnknownins, instance, ip, le, job, clsN/A
loki_ingester_chunk_entries_countUnknownins, instance, ip, job, clsN/A
loki_ingester_chunk_entries_sumUnknownins, instance, ip, job, clsN/A
loki_ingester_chunk_size_bytes_bucketUnknownins, instance, ip, le, job, clsN/A
loki_ingester_chunk_size_bytes_countUnknownins, instance, ip, job, clsN/A
loki_ingester_chunk_size_bytes_sumUnknownins, instance, ip, job, clsN/A
loki_ingester_chunk_stored_bytes_totalUnknownins, instance, ip, job, cls, tenantN/A
loki_ingester_chunk_utilization_bucketUnknownins, instance, ip, le, job, clsN/A
loki_ingester_chunk_utilization_countUnknownins, instance, ip, job, clsN/A
loki_ingester_chunk_utilization_sumUnknownins, instance, ip, job, clsN/A
loki_ingester_chunks_created_totalUnknownins, instance, ip, job, clsN/A
loki_ingester_chunks_flushed_totalUnknownins, instance, ip, reason, job, clsN/A
loki_ingester_chunks_stored_totalUnknownins, instance, ip, job, cls, tenantN/A
loki_ingester_client_request_duration_seconds_bucketUnknownins, instance, ip, le, operation, job, cls, status_codeN/A
loki_ingester_client_request_duration_seconds_countUnknownins, instance, ip, operation, job, cls, status_codeN/A
loki_ingester_client_request_duration_seconds_sumUnknownins, instance, ip, operation, job, cls, status_codeN/A
loki_ingester_limiter_enabledgaugeins, instance, ip, job, clsWhether the ingester’s limiter is enabled
loki_ingester_memory_chunksgaugeins, instance, ip, job, clsThe total number of chunks in memory.
loki_ingester_memory_streamsgaugeins, instance, ip, job, cls, tenantThe total number of streams in memory per tenant.
loki_ingester_memory_streams_labels_bytesgaugeins, instance, ip, job, clsTotal bytes of labels of the streams in memory.
loki_ingester_received_chunksunknownins, instance, ip, job, clsThe total number of chunks received by this ingester whilst joining.
loki_ingester_samples_per_chunk_bucketUnknownins, instance, ip, le, job, clsN/A
loki_ingester_samples_per_chunk_countUnknownins, instance, ip, job, clsN/A
loki_ingester_samples_per_chunk_sumUnknownins, instance, ip, job, clsN/A
loki_ingester_sent_chunksunknownins, instance, ip, job, clsThe total number of chunks sent by this ingester whilst leaving.
loki_ingester_shutdown_markergaugeins, instance, ip, job, cls1 if prepare shutdown has been called, 0 otherwise
loki_ingester_streams_created_totalUnknownins, instance, ip, job, cls, tenantN/A
loki_ingester_streams_removed_totalUnknownins, instance, ip, job, cls, tenantN/A
loki_ingester_wal_bytes_in_usegaugeins, instance, ip, job, clsTotal number of bytes in use by the WAL recovery process.
loki_ingester_wal_disk_full_failures_totalUnknownins, instance, ip, job, clsN/A
loki_ingester_wal_duplicate_entries_totalUnknownins, instance, ip, job, clsN/A
loki_ingester_wal_logged_bytes_totalUnknownins, instance, ip, job, clsN/A
loki_ingester_wal_records_logged_totalUnknownins, instance, ip, job, clsN/A
loki_ingester_wal_recovered_bytes_totalUnknownins, instance, ip, job, clsN/A
loki_ingester_wal_recovered_chunks_totalUnknownins, instance, ip, job, clsN/A
loki_ingester_wal_recovered_entries_totalUnknownins, instance, ip, job, clsN/A
loki_ingester_wal_recovered_streams_totalUnknownins, instance, ip, job, clsN/A
loki_ingester_wal_replay_activegaugeins, instance, ip, job, clsWhether the WAL is replaying
loki_ingester_wal_replay_duration_secondsgaugeins, instance, ip, job, clsTime taken to replay the checkpoint and the WAL.
loki_ingester_wal_replay_flushinggaugeins, instance, ip, job, clsWhether the wal replay is in a flushing phase due to backpressure
loki_internal_log_messages_totalUnknownins, instance, ip, level, job, clsN/A
loki_kv_request_duration_seconds_bucketUnknownins, instance, role, ip, le, kv_name, type, operation, job, cls, status_codeN/A
loki_kv_request_duration_seconds_countUnknownins, instance, role, ip, kv_name, type, operation, job, cls, status_codeN/A
loki_kv_request_duration_seconds_sumUnknownins, instance, role, ip, kv_name, type, operation, job, cls, status_codeN/A
loki_log_flushes_bucketUnknownins, instance, ip, le, job, clsN/A
loki_log_flushes_countUnknownins, instance, ip, job, clsN/A
loki_log_flushes_sumUnknownins, instance, ip, job, clsN/A
loki_log_messages_totalUnknownins, instance, ip, level, job, clsN/A
loki_logql_querystats_bytes_processed_per_seconds_bucketUnknownins, instance, range, ip, le, sharded, type, job, cls, status_code, latency_typeN/A
loki_logql_querystats_bytes_processed_per_seconds_countUnknownins, instance, range, ip, sharded, type, job, cls, status_code, latency_typeN/A
loki_logql_querystats_bytes_processed_per_seconds_sumUnknownins, instance, range, ip, sharded, type, job, cls, status_code, latency_typeN/A
loki_logql_querystats_chunk_download_latency_seconds_bucketUnknownins, instance, range, ip, le, type, job, cls, status_codeN/A
loki_logql_querystats_chunk_download_latency_seconds_countUnknownins, instance, range, ip, type, job, cls, status_codeN/A
loki_logql_querystats_chunk_download_latency_seconds_sumUnknownins, instance, range, ip, type, job, cls, status_codeN/A
loki_logql_querystats_downloaded_chunk_totalUnknownins, instance, range, ip, type, job, cls, status_codeN/A
loki_logql_querystats_duplicates_totalUnknownins, instance, ip, job, clsN/A
loki_logql_querystats_ingester_sent_lines_totalUnknownins, instance, ip, job, clsN/A
loki_logql_querystats_latency_seconds_bucketUnknownins, instance, range, ip, le, type, job, cls, status_codeN/A
loki_logql_querystats_latency_seconds_countUnknownins, instance, range, ip, type, job, cls, status_codeN/A
loki_logql_querystats_latency_seconds_sumUnknownins, instance, range, ip, type, job, cls, status_codeN/A
loki_panic_totalUnknownins, instance, ip, job, clsN/A
loki_querier_index_cache_corruptions_totalUnknownins, instance, ip, job, clsN/A
loki_querier_index_cache_encode_errors_totalUnknownins, instance, ip, job, clsN/A
loki_querier_index_cache_gets_totalUnknownins, instance, ip, job, clsN/A
loki_querier_index_cache_hits_totalUnknownins, instance, ip, job, clsN/A
loki_querier_index_cache_puts_totalUnknownins, instance, ip, job, clsN/A
loki_querier_query_frontend_clientsgaugeins, instance, ip, job, clsThe current number of clients connected to query-frontend.
loki_querier_query_frontend_request_duration_seconds_bucketUnknownins, instance, ip, le, operation, job, cls, status_codeN/A
loki_querier_query_frontend_request_duration_seconds_countUnknownins, instance, ip, operation, job, cls, status_codeN/A
loki_querier_query_frontend_request_duration_seconds_sumUnknownins, instance, ip, operation, job, cls, status_codeN/A
loki_querier_tail_activegaugeins, instance, ip, job, clsNumber of active tailers
loki_querier_tail_active_streamsgaugeins, instance, ip, job, clsNumber of active streams being tailed
loki_querier_tail_bytes_totalUnknownins, instance, ip, job, clsN/A
loki_querier_worker_concurrencygaugeins, instance, ip, job, clsNumber of concurrent querier workers
loki_querier_worker_inflight_queriesgaugeins, instance, ip, job, clsNumber of queries being processed by the querier workers
loki_query_frontend_log_result_cache_hit_totalUnknownins, instance, ip, job, clsN/A
loki_query_frontend_log_result_cache_miss_totalUnknownins, instance, ip, job, clsN/A
loki_query_frontend_partitions_bucketUnknownins, instance, ip, le, job, clsN/A
loki_query_frontend_partitions_countUnknownins, instance, ip, job, clsN/A
loki_query_frontend_partitions_sumUnknownins, instance, ip, job, clsN/A
loki_query_frontend_shard_factor_bucketUnknownins, instance, ip, le, mapper, job, clsN/A
loki_query_frontend_shard_factor_countUnknownins, instance, ip, mapper, job, clsN/A
loki_query_frontend_shard_factor_sumUnknownins, instance, ip, mapper, job, clsN/A
loki_query_scheduler_enqueue_countUnknownins, instance, ip, level, user, job, clsN/A
loki_rate_store_expired_streams_totalUnknownins, instance, ip, job, clsN/A
loki_rate_store_max_stream_rate_bytesgaugeins, instance, ip, job, clsThe maximum stream rate for any stream reported by ingesters during a sync operation. Sharded Streams are combined.
loki_rate_store_max_stream_shardsgaugeins, instance, ip, job, clsThe number of shards for a single stream reported by ingesters during a sync operation.
loki_rate_store_max_unique_stream_rate_bytesgaugeins, instance, ip, job, clsThe maximum stream rate for any stream reported by ingesters during a sync operation. Sharded Streams are considered separate.
loki_rate_store_stream_rate_bytes_bucketUnknownins, instance, ip, le, job, clsN/A
loki_rate_store_stream_rate_bytes_countUnknownins, instance, ip, job, clsN/A
loki_rate_store_stream_rate_bytes_sumUnknownins, instance, ip, job, clsN/A
loki_rate_store_stream_shards_bucketUnknownins, instance, ip, le, job, clsN/A
loki_rate_store_stream_shards_countUnknownins, instance, ip, job, clsN/A
loki_rate_store_stream_shards_sumUnknownins, instance, ip, job, clsN/A
loki_rate_store_streamsgaugeins, instance, ip, job, clsThe number of unique streams reported by all ingesters. Sharded streams are combined
loki_request_duration_seconds_bucketUnknownins, instance, method, ip, le, ws, route, job, cls, status_codeN/A
loki_request_duration_seconds_countUnknownins, instance, method, ip, ws, route, job, cls, status_codeN/A
loki_request_duration_seconds_sumUnknownins, instance, method, ip, ws, route, job, cls, status_codeN/A
loki_request_message_bytes_bucketUnknownins, instance, method, ip, le, route, job, clsN/A
loki_request_message_bytes_countUnknownins, instance, method, ip, route, job, clsN/A
loki_request_message_bytes_sumUnknownins, instance, method, ip, route, job, clsN/A
loki_response_message_bytes_bucketUnknownins, instance, method, ip, le, route, job, clsN/A
loki_response_message_bytes_countUnknownins, instance, method, ip, route, job, clsN/A
loki_response_message_bytes_sumUnknownins, instance, method, ip, route, job, clsN/A
loki_results_cache_version_comparisons_totalUnknownins, instance, ip, job, clsN/A
loki_store_chunks_downloaded_totalUnknownins, instance, ip, status, job, clsN/A
loki_store_chunks_per_batch_bucketUnknownins, instance, ip, le, status, job, clsN/A
loki_store_chunks_per_batch_countUnknownins, instance, ip, status, job, clsN/A
loki_store_chunks_per_batch_sumUnknownins, instance, ip, status, job, clsN/A
loki_store_series_totalUnknownins, instance, ip, status, job, clsN/A
loki_stream_sharding_countunknownins, instance, ip, job, clsTotal number of times the distributor has sharded streams
loki_tcp_connectionsgaugeins, instance, ip, protocol, job, clsCurrent number of accepted TCP connections.
loki_tcp_connections_limitgaugeins, instance, ip, protocol, job, clsThe max number of TCP connections that can be accepted (0 means no limit).
net_conntrack_dialer_conn_attempted_totalcounterins, instance, ip, dialer_name, job, clsTotal number of connections attempted by the dialer of a given name.
net_conntrack_dialer_conn_closed_totalcounterins, instance, ip, dialer_name, job, clsTotal number of connections closed which originated from the dialer of a given name.
net_conntrack_dialer_conn_established_totalcounterins, instance, ip, dialer_name, job, clsTotal number of connections successfully established by the dialer of a given name.
net_conntrack_dialer_conn_failed_totalcounterins, instance, ip, dialer_name, reason, job, clsTotal number of connections that failed to dial by the dialer of a given name.
net_conntrack_listener_conn_accepted_totalcounterins, instance, ip, listener_name, job, clsTotal number of connections opened to the listener of a given name.
net_conntrack_listener_conn_closed_totalcounterins, instance, ip, listener_name, job, clsTotal number of connections closed that were made to the listener of a given name.
nginx_connections_acceptedcounterins, instance, ip, job, clsAccepted client connections
nginx_connections_activegaugeins, instance, ip, job, clsActive client connections
nginx_connections_handledcounterins, instance, ip, job, clsHandled client connections
nginx_connections_readinggaugeins, instance, ip, job, clsConnections where NGINX is reading the request header
nginx_connections_waitinggaugeins, instance, ip, job, clsIdle client connections
nginx_connections_writinggaugeins, instance, ip, job, clsConnections where NGINX is writing the response back to the client
nginx_exporter_build_infogaugerevision, version, ins, instance, ip, tags, goarch, goversion, job, cls, branch, goosA metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which nginx_exporter was built, and the goos and goarch for the build.
nginx_http_requests_totalcounterins, instance, ip, job, clsTotal http requests
nginx_upgaugeins, instance, ip, job, clsStatus of the last metric scrape
plugins_active_instancesgaugeins, instance, ip, job, clsThe number of active plugin instances
plugins_datasource_instances_totalUnknownins, instance, ip, job, clsN/A
process_cpu_seconds_totalcounterins, instance, ip, job, clsTotal user and system CPU time spent in seconds.
process_max_fdsgaugeins, instance, ip, job, clsMaximum number of open file descriptors.
process_open_fdsgaugeins, instance, ip, job, clsNumber of open file descriptors.
process_resident_memory_bytesgaugeins, instance, ip, job, clsResident memory size in bytes.
process_start_time_secondsgaugeins, instance, ip, job, clsStart time of the process since unix epoch in seconds.
process_virtual_memory_bytesgaugeins, instance, ip, job, clsVirtual memory size in bytes.
process_virtual_memory_max_bytesgaugeins, instance, ip, job, clsMaximum amount of virtual memory available in bytes.
prometheus_api_remote_read_queriesgaugeins, instance, ip, job, clsThe current number of remote read queries being executed or waiting.
prometheus_build_infogaugerevision, version, ins, instance, ip, tags, goarch, goversion, job, cls, branch, goosA metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which prometheus was built, and the goos and goarch for the build.
prometheus_config_last_reload_success_timestamp_secondsgaugeins, instance, ip, job, clsTimestamp of the last successful configuration reload.
prometheus_config_last_reload_successfulgaugeins, instance, ip, job, clsWhether the last configuration reload attempt was successful.
prometheus_engine_queriesgaugeins, instance, ip, job, clsThe current number of queries being executed or waiting.
prometheus_engine_queries_concurrent_maxgaugeins, instance, ip, job, clsThe max number of concurrent queries.
prometheus_engine_query_duration_secondssummaryins, instance, ip, job, cls, quantile, sliceQuery timings
prometheus_engine_query_duration_seconds_countUnknownins, instance, ip, job, cls, sliceN/A
prometheus_engine_query_duration_seconds_sumUnknownins, instance, ip, job, cls, sliceN/A
prometheus_engine_query_log_enabledgaugeins, instance, ip, job, clsState of the query log.
prometheus_engine_query_log_failures_totalcounterins, instance, ip, job, clsThe number of query log failures.
prometheus_engine_query_samples_totalcounterins, instance, ip, job, clsThe total number of samples loaded by all queries.
prometheus_http_request_duration_seconds_bucketUnknownins, instance, ip, le, job, cls, handlerN/A
prometheus_http_request_duration_seconds_countUnknownins, instance, ip, job, cls, handlerN/A
prometheus_http_request_duration_seconds_sumUnknownins, instance, ip, job, cls, handlerN/A
prometheus_http_requests_totalcounterins, instance, ip, job, cls, code, handlerCounter of HTTP requests.
prometheus_http_response_size_bytes_bucketUnknownins, instance, ip, le, job, cls, handlerN/A
prometheus_http_response_size_bytes_countUnknownins, instance, ip, job, cls, handlerN/A
prometheus_http_response_size_bytes_sumUnknownins, instance, ip, job, cls, handlerN/A
prometheus_notifications_alertmanagers_discoveredgaugeins, instance, ip, job, clsThe number of alertmanagers discovered and active.
prometheus_notifications_dropped_totalcounterins, instance, ip, job, clsTotal number of alerts dropped due to errors when sending to Alertmanager.
prometheus_notifications_errors_totalcounterins, instance, ip, alertmanager, job, clsTotal number of errors sending alert notifications.
prometheus_notifications_latency_secondssummaryins, instance, ip, alertmanager, job, cls, quantileLatency quantiles for sending alert notifications.
prometheus_notifications_latency_seconds_countUnknownins, instance, ip, alertmanager, job, clsN/A
prometheus_notifications_latency_seconds_sumUnknownins, instance, ip, alertmanager, job, clsN/A
prometheus_notifications_queue_capacitygaugeins, instance, ip, job, clsThe capacity of the alert notifications queue.
prometheus_notifications_queue_lengthgaugeins, instance, ip, job, clsThe number of alert notifications in the queue.
prometheus_notifications_sent_totalcounterins, instance, ip, alertmanager, job, clsTotal number of alerts sent.
prometheus_readygaugeins, instance, ip, job, clsWhether Prometheus startup was fully completed and the server is ready for normal operation.
prometheus_remote_storage_exemplars_in_totalcounterins, instance, ip, job, clsExemplars in to remote storage, compare to exemplars out for queue managers.
prometheus_remote_storage_highest_timestamp_in_secondsgaugeins, instance, ip, job, clsHighest timestamp that has come into the remote storage via the Appender interface, in seconds since epoch.
prometheus_remote_storage_histograms_in_totalcounterins, instance, ip, job, clsHistogramSamples in to remote storage, compare to histograms out for queue managers.
prometheus_remote_storage_samples_in_totalcounterins, instance, ip, job, clsSamples in to remote storage, compare to samples out for queue managers.
prometheus_remote_storage_string_interner_zero_reference_releases_totalcounterins, instance, ip, job, clsThe number of times release has been called for strings that are not interned.
prometheus_rule_evaluation_duration_secondssummaryins, instance, ip, job, cls, quantileThe duration for a rule to execute.
prometheus_rule_evaluation_duration_seconds_countUnknownins, instance, ip, job, clsN/A
prometheus_rule_evaluation_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
prometheus_rule_evaluation_failures_totalcounterins, instance, ip, job, cls, rule_groupThe total number of rule evaluation failures.
prometheus_rule_evaluations_totalcounterins, instance, ip, job, cls, rule_groupThe total number of rule evaluations.
prometheus_rule_group_duration_secondssummaryins, instance, ip, job, cls, quantileThe duration of rule group evaluations.
prometheus_rule_group_duration_seconds_countUnknownins, instance, ip, job, clsN/A
prometheus_rule_group_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
prometheus_rule_group_interval_secondsgaugeins, instance, ip, job, cls, rule_groupThe interval of a rule group.
prometheus_rule_group_iterations_missed_totalcounterins, instance, ip, job, cls, rule_groupThe total number of rule group evaluations missed due to slow rule group evaluation.
prometheus_rule_group_iterations_totalcounterins, instance, ip, job, cls, rule_groupThe total number of scheduled rule group evaluations, whether executed or missed.
prometheus_rule_group_last_duration_secondsgaugeins, instance, ip, job, cls, rule_groupThe duration of the last rule group evaluation.
prometheus_rule_group_last_evaluation_samplesgaugeins, instance, ip, job, cls, rule_groupThe number of samples returned during the last rule group evaluation.
prometheus_rule_group_last_evaluation_timestamp_secondsgaugeins, instance, ip, job, cls, rule_groupThe timestamp of the last rule group evaluation in seconds.
prometheus_rule_group_rulesgaugeins, instance, ip, job, cls, rule_groupThe number of rules.
prometheus_sd_azure_cache_hit_totalcounterins, instance, ip, job, clsNumber of cache hit during refresh.
prometheus_sd_azure_failures_totalcounterins, instance, ip, job, clsNumber of Azure service discovery refresh failures.
prometheus_sd_consul_rpc_duration_secondssummaryendpoint, ins, instance, ip, job, cls, call, quantileThe duration of a Consul RPC call in seconds.
prometheus_sd_consul_rpc_duration_seconds_countUnknownendpoint, ins, instance, ip, job, cls, callN/A
prometheus_sd_consul_rpc_duration_seconds_sumUnknownendpoint, ins, instance, ip, job, cls, callN/A
prometheus_sd_consul_rpc_failures_totalcounterins, instance, ip, job, clsThe number of Consul RPC call failures.
prometheus_sd_discovered_targetsgaugeins, instance, ip, config, job, clsCurrent number of discovered targets.
prometheus_sd_dns_lookup_failures_totalcounterins, instance, ip, job, clsThe number of DNS-SD lookup failures.
prometheus_sd_dns_lookups_totalcounterins, instance, ip, job, clsThe number of DNS-SD lookups.
prometheus_sd_failed_configsgaugeins, instance, ip, job, clsCurrent number of service discovery configurations that failed to load.
prometheus_sd_file_mtime_secondsgaugeins, instance, ip, filename, job, clsTimestamp (mtime) of files read by FileSD. Timestamp is set at read time.
prometheus_sd_file_read_errors_totalcounterins, instance, ip, job, clsThe number of File-SD read errors.
prometheus_sd_file_scan_duration_secondssummaryins, instance, ip, job, cls, quantileThe duration of the File-SD scan in seconds.
prometheus_sd_file_scan_duration_seconds_countUnknownins, instance, ip, job, clsN/A
prometheus_sd_file_scan_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
prometheus_sd_file_watcher_errors_totalcounterins, instance, ip, job, clsThe number of File-SD errors caused by filesystem watch failures.
prometheus_sd_http_failures_totalcounterins, instance, ip, job, clsNumber of HTTP service discovery refresh failures.
prometheus_sd_kubernetes_events_totalcounterevent, ins, instance, role, ip, job, clsThe number of Kubernetes events handled.
prometheus_sd_kuma_fetch_duration_secondssummaryins, instance, ip, job, cls, quantileThe duration of a Kuma MADS fetch call.
prometheus_sd_kuma_fetch_duration_seconds_countUnknownins, instance, ip, job, clsN/A
prometheus_sd_kuma_fetch_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
prometheus_sd_kuma_fetch_failures_totalcounterins, instance, ip, job, clsThe number of Kuma MADS fetch call failures.
prometheus_sd_kuma_fetch_skipped_updates_totalcounterins, instance, ip, job, clsThe number of Kuma MADS fetch calls that result in no updates to the targets.
prometheus_sd_linode_failures_totalcounterins, instance, ip, job, clsNumber of Linode service discovery refresh failures.
prometheus_sd_nomad_failures_totalcounterins, instance, ip, job, clsNumber of nomad service discovery refresh failures.
prometheus_sd_received_updates_totalcounterins, instance, ip, job, clsTotal number of update events received from the SD providers.
prometheus_sd_updates_totalcounterins, instance, ip, job, clsTotal number of update events sent to the SD consumers.
prometheus_target_interval_length_secondssummaryins, instance, interval, ip, job, cls, quantileActual intervals between scrapes.
prometheus_target_interval_length_seconds_countUnknownins, instance, interval, ip, job, clsN/A
prometheus_target_interval_length_seconds_sumUnknownins, instance, interval, ip, job, clsN/A
prometheus_target_metadata_cache_bytesgaugeins, instance, ip, scrape_job, job, clsThe number of bytes that are currently used for storing metric metadata in the cache
prometheus_target_metadata_cache_entriesgaugeins, instance, ip, scrape_job, job, clsTotal number of metric metadata entries in the cache
prometheus_target_scrape_pool_exceeded_label_limits_totalcounterins, instance, ip, job, clsTotal number of times scrape pools hit the label limits, during sync or config reload.
prometheus_target_scrape_pool_exceeded_target_limit_totalcounterins, instance, ip, job, clsTotal number of times scrape pools hit the target limit, during sync or config reload.
prometheus_target_scrape_pool_reloads_failed_totalcounterins, instance, ip, job, clsTotal number of failed scrape pool reloads.
prometheus_target_scrape_pool_reloads_totalcounterins, instance, ip, job, clsTotal number of scrape pool reloads.
prometheus_target_scrape_pool_sync_totalcounterins, instance, ip, scrape_job, job, clsTotal number of syncs that were executed on a scrape pool.
prometheus_target_scrape_pool_target_limitgaugeins, instance, ip, scrape_job, job, clsMaximum number of targets allowed in this scrape pool.
prometheus_target_scrape_pool_targetsgaugeins, instance, ip, scrape_job, job, clsCurrent number of targets in this scrape pool.
prometheus_target_scrape_pools_failed_totalcounterins, instance, ip, job, clsTotal number of scrape pool creations that failed.
prometheus_target_scrape_pools_totalcounterins, instance, ip, job, clsTotal number of scrape pool creation attempts.
prometheus_target_scrapes_cache_flush_forced_totalcounterins, instance, ip, job, clsHow many times a scrape cache was flushed due to getting big while scrapes are failing.
prometheus_target_scrapes_exceeded_body_size_limit_totalcounterins, instance, ip, job, clsTotal number of scrapes that hit the body size limit
prometheus_target_scrapes_exceeded_native_histogram_bucket_limit_totalcounterins, instance, ip, job, clsTotal number of scrapes that hit the native histogram bucket limit and were rejected.
prometheus_target_scrapes_exceeded_sample_limit_totalcounterins, instance, ip, job, clsTotal number of scrapes that hit the sample limit and were rejected.
prometheus_target_scrapes_exemplar_out_of_order_totalcounterins, instance, ip, job, clsTotal number of exemplar rejected due to not being out of the expected order.
prometheus_target_scrapes_sample_duplicate_timestamp_totalcounterins, instance, ip, job, clsTotal number of samples rejected due to duplicate timestamps but different values.
prometheus_target_scrapes_sample_out_of_bounds_totalcounterins, instance, ip, job, clsTotal number of samples rejected due to timestamp falling outside of the time bounds.
prometheus_target_scrapes_sample_out_of_order_totalcounterins, instance, ip, job, clsTotal number of samples rejected due to not being out of the expected order.
prometheus_target_sync_failed_totalcounterins, instance, ip, scrape_job, job, clsTotal number of target sync failures.
prometheus_target_sync_length_secondssummaryins, instance, ip, scrape_job, job, cls, quantileActual interval to sync the scrape pool.
prometheus_target_sync_length_seconds_countUnknownins, instance, ip, scrape_job, job, clsN/A
prometheus_target_sync_length_seconds_sumUnknownins, instance, ip, scrape_job, job, clsN/A
prometheus_template_text_expansion_failures_totalcounterins, instance, ip, job, clsThe total number of template text expansion failures.
prometheus_template_text_expansions_totalcounterins, instance, ip, job, clsThe total number of template text expansions.
prometheus_treecache_watcher_goroutinesgaugeins, instance, ip, job, clsThe current number of watcher goroutines.
prometheus_treecache_zookeeper_failures_totalcounterins, instance, ip, job, clsThe total number of ZooKeeper failures.
prometheus_tsdb_blocks_loadedgaugeins, instance, ip, job, clsNumber of currently loaded data blocks
prometheus_tsdb_checkpoint_creations_failed_totalcounterins, instance, ip, job, clsTotal number of checkpoint creations that failed.
prometheus_tsdb_checkpoint_creations_totalcounterins, instance, ip, job, clsTotal number of checkpoint creations attempted.
prometheus_tsdb_checkpoint_deletions_failed_totalcounterins, instance, ip, job, clsTotal number of checkpoint deletions that failed.
prometheus_tsdb_checkpoint_deletions_totalcounterins, instance, ip, job, clsTotal number of checkpoint deletions attempted.
prometheus_tsdb_clean_startgaugeins, instance, ip, job, cls-1: lockfile is disabled. 0: a lockfile from a previous execution was replaced. 1: lockfile creation was clean
prometheus_tsdb_compaction_chunk_range_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
prometheus_tsdb_compaction_chunk_range_seconds_countUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_compaction_chunk_range_seconds_sumUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_compaction_chunk_samples_bucketUnknownins, instance, ip, le, job, clsN/A
prometheus_tsdb_compaction_chunk_samples_countUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_compaction_chunk_samples_sumUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_compaction_chunk_size_bytes_bucketUnknownins, instance, ip, le, job, clsN/A
prometheus_tsdb_compaction_chunk_size_bytes_countUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_compaction_chunk_size_bytes_sumUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_compaction_duration_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
prometheus_tsdb_compaction_duration_seconds_countUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_compaction_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_compaction_populating_blockgaugeins, instance, ip, job, clsSet to 1 when a block is currently being written to the disk.
prometheus_tsdb_compactions_failed_totalcounterins, instance, ip, job, clsTotal number of compactions that failed for the partition.
prometheus_tsdb_compactions_skipped_totalcounterins, instance, ip, job, clsTotal number of skipped compactions due to disabled auto compaction.
prometheus_tsdb_compactions_totalcounterins, instance, ip, job, clsTotal number of compactions that were executed for the partition.
prometheus_tsdb_compactions_triggered_totalcounterins, instance, ip, job, clsTotal number of triggered compactions for the partition.
prometheus_tsdb_data_replay_duration_secondsgaugeins, instance, ip, job, clsTime taken to replay the data on disk.
prometheus_tsdb_exemplar_exemplars_appended_totalcounterins, instance, ip, job, clsTotal number of appended exemplars.
prometheus_tsdb_exemplar_exemplars_in_storagegaugeins, instance, ip, job, clsNumber of exemplars currently in circular storage.
prometheus_tsdb_exemplar_last_exemplars_timestamp_secondsgaugeins, instance, ip, job, clsThe timestamp of the oldest exemplar stored in circular storage. Useful to check what time range the current exemplar buffer limit allows. For a typical setup this is usually the last timestamp for all exemplars; this does not hold if one series has a timestamp in the future compared to the rest.
prometheus_tsdb_exemplar_max_exemplarsgaugeins, instance, ip, job, clsTotal number of exemplars the exemplar storage can store, resizeable.
prometheus_tsdb_exemplar_out_of_order_exemplars_totalcounterins, instance, ip, job, clsTotal number of out of order exemplar ingestion failed attempts.
prometheus_tsdb_exemplar_series_with_exemplars_in_storagegaugeins, instance, ip, job, clsNumber of series with exemplars currently in circular storage.
prometheus_tsdb_head_active_appendersgaugeins, instance, ip, job, clsNumber of currently active appender transactions
prometheus_tsdb_head_chunksgaugeins, instance, ip, job, clsTotal number of chunks in the head block.
prometheus_tsdb_head_chunks_created_totalcounterins, instance, ip, job, clsTotal number of chunks created in the head
prometheus_tsdb_head_chunks_removed_totalcounterins, instance, ip, job, clsTotal number of chunks removed in the head
prometheus_tsdb_head_chunks_storage_size_bytesgaugeins, instance, ip, job, clsSize of the chunks_head directory.
prometheus_tsdb_head_gc_duration_seconds_countUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_head_gc_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_head_max_timegaugeins, instance, ip, job, clsMaximum timestamp of the head block. The unit is decided by the library consumer.
prometheus_tsdb_head_max_time_secondsgaugeins, instance, ip, job, clsMaximum timestamp of the head block.
prometheus_tsdb_head_min_timegaugeins, instance, ip, job, clsMinimum time bound of the head block. The unit is decided by the library consumer.
prometheus_tsdb_head_min_time_secondsgaugeins, instance, ip, job, clsMinimum time bound of the head block.
prometheus_tsdb_head_out_of_order_samples_appended_totalcounterins, instance, ip, job, clsTotal number of appended out of order samples.
prometheus_tsdb_head_samples_appended_totalcounterins, instance, ip, type, job, clsTotal number of appended samples.
prometheus_tsdb_head_seriesgaugeins, instance, ip, job, clsTotal number of series in the head block.
prometheus_tsdb_head_series_created_totalcounterins, instance, ip, job, clsTotal number of series created in the head
prometheus_tsdb_head_series_not_found_totalcounterins, instance, ip, job, clsTotal number of requests for series that were not found.
prometheus_tsdb_head_series_removed_totalcounterins, instance, ip, job, clsTotal number of series removed in the head
prometheus_tsdb_head_truncations_failed_totalcounterins, instance, ip, job, clsTotal number of head truncations that failed.
prometheus_tsdb_head_truncations_totalcounterins, instance, ip, job, clsTotal number of head truncations attempted.
prometheus_tsdb_isolation_high_watermarkgaugeins, instance, ip, job, clsThe highest TSDB append ID that has been given out.
prometheus_tsdb_isolation_low_watermarkgaugeins, instance, ip, job, clsThe lowest TSDB append ID that is still referenced.
prometheus_tsdb_lowest_timestampgaugeins, instance, ip, job, clsLowest timestamp value stored in the database. The unit is decided by the library consumer.
prometheus_tsdb_lowest_timestamp_secondsgaugeins, instance, ip, job, clsLowest timestamp value stored in the database.
prometheus_tsdb_mmap_chunk_corruptions_totalcounterins, instance, ip, job, clsTotal number of memory-mapped chunk corruptions.
prometheus_tsdb_mmap_chunks_totalcounterins, instance, ip, job, clsTotal number of chunks that were memory-mapped.
prometheus_tsdb_out_of_bound_samples_totalcounterins, instance, ip, type, job, clsTotal number of out of bound samples ingestion failed attempts with out of order support disabled.
prometheus_tsdb_out_of_order_samples_totalcounterins, instance, ip, type, job, clsTotal number of out of order samples ingestion failed attempts due to out of order being disabled.
prometheus_tsdb_reloads_failures_totalcounterins, instance, ip, job, clsNumber of times the database failed to reloadBlocks block data from disk.
prometheus_tsdb_reloads_totalcounterins, instance, ip, job, clsNumber of times the database reloaded block data from disk.
prometheus_tsdb_retention_limit_bytesgaugeins, instance, ip, job, clsMax number of bytes to be retained in the tsdb blocks, configured 0 means disabled
prometheus_tsdb_retention_limit_secondsgaugeins, instance, ip, job, clsHow long to retain samples in storage.
prometheus_tsdb_size_retentions_totalcounterins, instance, ip, job, clsThe number of times that blocks were deleted because the maximum number of bytes was exceeded.
prometheus_tsdb_snapshot_replay_error_totalcounterins, instance, ip, job, clsTotal number snapshot replays that failed.
prometheus_tsdb_storage_blocks_bytesgaugeins, instance, ip, job, clsThe number of bytes that are currently used for local storage by all blocks.
prometheus_tsdb_symbol_table_size_bytesgaugeins, instance, ip, job, clsSize of symbol table in memory for loaded blocks
prometheus_tsdb_time_retentions_totalcounterins, instance, ip, job, clsThe number of times that blocks were deleted because the maximum time limit was exceeded.
prometheus_tsdb_tombstone_cleanup_seconds_bucketUnknownins, instance, ip, le, job, clsN/A
prometheus_tsdb_tombstone_cleanup_seconds_countUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_tombstone_cleanup_seconds_sumUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_too_old_samples_totalcounterins, instance, ip, type, job, clsTotal number of out of order samples ingestion failed attempts with out of support enabled, but sample outside of time window.
prometheus_tsdb_vertical_compactions_totalcounterins, instance, ip, job, clsTotal number of compactions done on overlapping blocks.
prometheus_tsdb_wal_completed_pages_totalcounterins, instance, ip, job, clsTotal number of completed pages.
prometheus_tsdb_wal_corruptions_totalcounterins, instance, ip, job, clsTotal number of WAL corruptions.
prometheus_tsdb_wal_fsync_duration_secondssummaryins, instance, ip, job, cls, quantileDuration of write log fsync.
prometheus_tsdb_wal_fsync_duration_seconds_countUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_wal_fsync_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_wal_page_flushes_totalcounterins, instance, ip, job, clsTotal number of page flushes.
prometheus_tsdb_wal_segment_currentgaugeins, instance, ip, job, clsWrite log segment index that TSDB is currently writing to.
prometheus_tsdb_wal_storage_size_bytesgaugeins, instance, ip, job, clsSize of the write log directory.
prometheus_tsdb_wal_truncate_duration_seconds_countUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_wal_truncate_duration_seconds_sumUnknownins, instance, ip, job, clsN/A
prometheus_tsdb_wal_truncations_failed_totalcounterins, instance, ip, job, clsTotal number of write log truncations that failed.
prometheus_tsdb_wal_truncations_totalcounterins, instance, ip, job, clsTotal number of write log truncations attempted.
prometheus_tsdb_wal_writes_failed_totalcounterins, instance, ip, job, clsTotal number of write log writes that failed.
prometheus_web_federation_errors_totalcounterins, instance, ip, job, clsTotal number of errors that occurred while sending federation responses.
prometheus_web_federation_warnings_totalcounterins, instance, ip, job, clsTotal number of warnings that occurred while sending federation responses.
promhttp_metric_handler_requests_in_flightgaugeins, instance, ip, job, clsCurrent number of scrapes being served.
promhttp_metric_handler_requests_totalcounterins, instance, ip, job, cls, codeTotal number of scrapes by HTTP status code.
pushgateway_build_infogaugerevision, version, ins, instance, ip, tags, goarch, goversion, job, cls, branch, goosA metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which pushgateway was built, and the goos and goarch for the build.
pushgateway_http_requests_totalcounterins, instance, method, ip, job, cls, code, handlerTotal HTTP requests processed by the Pushgateway, excluding scrapes.
querier_cache_added_new_totalUnknownins, instance, ip, job, cache, clsN/A
querier_cache_added_totalUnknownins, instance, ip, job, cache, clsN/A
querier_cache_entriesgaugeins, instance, ip, job, cache, clsThe total number of entries
querier_cache_evicted_totalUnknownins, instance, ip, job, reason, cache, clsN/A
querier_cache_gets_totalUnknownins, instance, ip, job, cache, clsN/A
querier_cache_memory_bytesgaugeins, instance, ip, job, cache, clsThe current cache size in bytes
querier_cache_misses_totalUnknownins, instance, ip, job, cache, clsN/A
querier_cache_stale_gets_totalUnknownins, instance, ip, job, cache, clsN/A
ring_member_heartbeats_totalUnknownins, instance, ip, job, clsN/A
ring_member_tokens_ownedgaugeins, instance, ip, job, clsThe number of tokens owned in the ring.
ring_member_tokens_to_owngaugeins, instance, ip, job, clsThe number of tokens to own in the ring.
scrape_duration_secondsUnknownins, instance, ip, job, clsN/A
scrape_samples_post_metric_relabelingUnknownins, instance, ip, job, clsN/A
scrape_samples_scrapedUnknownins, instance, ip, job, clsN/A
scrape_series_addedUnknownins, instance, ip, job, clsN/A
upUnknownins, instance, ip, job, clsN/A

PING Metrics

The PING job has 54 metrics, provided by blackbox_exporter.

Metric NameTypeLabelsDescription
agent_upUnknownins, ip, job, instance, clsN/A
probe_dns_lookup_time_secondsgaugeins, ip, job, instance, clsReturns the time taken for probe dns lookup in seconds
probe_duration_secondsgaugeins, ip, job, instance, clsReturns how long the probe took to complete in seconds
probe_icmp_duration_secondsgaugeins, ip, job, phase, instance, clsDuration of icmp request by phase
probe_icmp_reply_hop_limitgaugeins, ip, job, instance, clsReplied packet hop limit (TTL for ipv4)
probe_ip_addr_hashgaugeins, ip, job, instance, clsSpecifies the hash of IP address. It’s useful to detect if the IP address changes.
probe_ip_protocolgaugeins, ip, job, instance, clsSpecifies whether probe ip protocol is IP4 or IP6
probe_successgaugeins, ip, job, instance, clsDisplays whether or not the probe was a success
scrape_duration_secondsUnknownins, ip, job, instance, clsN/A
scrape_samples_post_metric_relabelingUnknownins, ip, job, instance, clsN/A
scrape_samples_scrapedUnknownins, ip, job, instance, clsN/A
scrape_series_addedUnknownins, ip, job, instance, clsN/A
upUnknownins, ip, job, instance, clsN/A

PUSH Metrics

PushGateway provides 44 metrics.

Metric NameTypeLabelsDescription
agent_upUnknownjob, cls, instance, ins, ipN/A
go_gc_duration_secondssummaryjob, cls, instance, ins, quantile, ipA summary of the pause duration of garbage collection cycles.
go_gc_duration_seconds_countUnknownjob, cls, instance, ins, ipN/A
go_gc_duration_seconds_sumUnknownjob, cls, instance, ins, ipN/A
go_goroutinesgaugejob, cls, instance, ins, ipNumber of goroutines that currently exist.
go_infogaugejob, cls, instance, ins, ip, versionInformation about the Go environment.
go_memstats_alloc_bytescounterjob, cls, instance, ins, ipTotal number of bytes allocated, even if freed.
go_memstats_alloc_bytes_totalcounterjob, cls, instance, ins, ipTotal number of bytes allocated, even if freed.
go_memstats_buck_hash_sys_bytesgaugejob, cls, instance, ins, ipNumber of bytes used by the profiling bucket hash table.
go_memstats_frees_totalcounterjob, cls, instance, ins, ipTotal number of frees.
go_memstats_gc_sys_bytesgaugejob, cls, instance, ins, ipNumber of bytes used for garbage collection system metadata.
go_memstats_heap_alloc_bytesgaugejob, cls, instance, ins, ipNumber of heap bytes allocated and still in use.
go_memstats_heap_idle_bytesgaugejob, cls, instance, ins, ipNumber of heap bytes waiting to be used.
go_memstats_heap_inuse_bytesgaugejob, cls, instance, ins, ipNumber of heap bytes that are in use.
go_memstats_heap_objectsgaugejob, cls, instance, ins, ipNumber of allocated objects.
go_memstats_heap_released_bytesgaugejob, cls, instance, ins, ipNumber of heap bytes released to OS.
go_memstats_heap_sys_bytesgaugejob, cls, instance, ins, ipNumber of heap bytes obtained from system.
go_memstats_last_gc_time_secondsgaugejob, cls, instance, ins, ipNumber of seconds since 1970 of last garbage collection.
go_memstats_lookups_totalcounterjob, cls, instance, ins, ipTotal number of pointer lookups.
go_memstats_mallocs_totalcounterjob, cls, instance, ins, ipTotal number of mallocs.
go_memstats_mcache_inuse_bytesgaugejob, cls, instance, ins, ipNumber of bytes in use by mcache structures.
go_memstats_mcache_sys_bytesgaugejob, cls, instance, ins, ipNumber of bytes used for mcache structures obtained from system.
go_memstats_mspan_inuse_bytesgaugejob, cls, instance, ins, ipNumber of bytes in use by mspan structures.
go_memstats_mspan_sys_bytesgaugejob, cls, instance, ins, ipNumber of bytes used for mspan structures obtained from system.
go_memstats_next_gc_bytesgaugejob, cls, instance, ins, ipNumber of heap bytes when next garbage collection will take place.
go_memstats_other_sys_bytesgaugejob, cls, instance, ins, ipNumber of bytes used for other system allocations.
go_memstats_stack_inuse_bytesgaugejob, cls, instance, ins, ipNumber of bytes in use by the stack allocator.
go_memstats_stack_sys_bytesgaugejob, cls, instance, ins, ipNumber of bytes obtained from system for stack allocator.
go_memstats_sys_bytesgaugejob, cls, instance, ins, ipNumber of bytes obtained from system.
go_threadsgaugejob, cls, instance, ins, ipNumber of OS threads created.
process_cpu_seconds_totalcounterjob, cls, instance, ins, ipTotal user and system CPU time spent in seconds.
process_max_fdsgaugejob, cls, instance, ins, ipMaximum number of open file descriptors.
process_open_fdsgaugejob, cls, instance, ins, ipNumber of open file descriptors.
process_resident_memory_bytesgaugejob, cls, instance, ins, ipResident memory size in bytes.
process_start_time_secondsgaugejob, cls, instance, ins, ipStart time of the process since unix epoch in seconds.
process_virtual_memory_bytesgaugejob, cls, instance, ins, ipVirtual memory size in bytes.
process_virtual_memory_max_bytesgaugejob, cls, instance, ins, ipMaximum amount of virtual memory available in bytes.
pushgateway_build_infogaugejob, goversion, cls, branch, instance, tags, revision, goarch, ins, ip, version, goosA metric with a constant ‘1’ value labeled by version, revision, branch, goversion from which pushgateway was built, and the goos and goarch for the build.
pushgateway_http_requests_totalcounterjob, cls, method, code, handler, instance, ins, ipTotal HTTP requests processed by the Pushgateway, excluding scrapes.
scrape_duration_secondsUnknownjob, cls, instance, ins, ipN/A
scrape_samples_post_metric_relabelingUnknownjob, cls, instance, ins, ipN/A
scrape_samples_scrapedUnknownjob, cls, instance, ins, ipN/A
scrape_series_addedUnknownjob, cls, instance, ins, ipN/A
upUnknownjob, cls, instance, ins, ipN/A

7 - FAQ

Frequently asked questions about the Pigsty INFRA infrastructure module

What components are included in the INFRA module?

  • Ansible: Used for automation configuration, deployment, and daily operations.
  • Nginx: Exposes WebUIs like Grafana, VictoriaMetrics (VMUI), Alertmanager, and hosts local YUM/APT repositories.
  • Self-signed CA: Issues SSL/TLS certificates for components like Nginx, Patroni, pgBackRest.
  • VictoriaMetrics Suite: Replaces Prometheus/Loki, including VictoriaMetrics (TSDB), VMAlert (alert evaluation), VictoriaLogs (centralized logs), VictoriaTraces (tracing).
  • Vector: Node-side log collector, pushes system/database logs to VictoriaLogs.
  • AlertManager: Aggregates and dispatches alert notifications.
  • Grafana: Monitoring/visualization platform with numerous preconfigured dashboards and datasources.
  • Chronyd: Provides NTP time synchronization.
  • DNSMasq: Provides DNS registration and resolution.
  • ETCD: Acts as PostgreSQL HA DCS (can also be deployed on dedicated cluster).
  • PostgreSQL: Acts as CMDB on the admin node (optional).
  • Docker: Runs stateless tools or applications on nodes (optional).

How to re-register monitoring targets to VictoriaMetrics?

VictoriaMetrics uses static service discovery through the /infra/targets/<job>/*.yml directory. If target files are accidentally deleted, use the following commands to re-register:

./infra.yml  -t infra_register   # Re-render infra self-monitoring targets
./node.yml   -t node_register    # Re-render node / HAProxy / Vector targets
./etcd.yml   -t etcd_register    # Re-render etcd targets
./minio.yml  -t minio_register   # Re-render MinIO targets
./pgsql.yml  -t pg_register      # Re-render PGSQL/Patroni targets
./redis.yml  -t redis_register   # Re-render Redis targets

Other modules (like pg_monitor.yml, mongo.yml, mysql.yml) also provide corresponding *_register tags that can be executed as needed.
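
To verify the result, you can inspect the rendered target files and query the Prometheus-compatible targets API. This is only a quick sanity check, assuming the default /infra/targets path and VictoriaMetrics on port 8428 as described above; adjust paths and addresses to your environment:

ls /infra/targets/pgsql/                               # static target files rendered on the infra node
curl -s http://10.10.10.10:8428/api/v1/targets | head  # Prometheus-compatible targets API (availability may vary by VictoriaMetrics setup)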


How to re-register PostgreSQL datasources to Grafana?

PGSQL databases defined in pg_databases are registered as Grafana datasources by default (for use by PGCAT applications).

If you accidentally delete postgres datasources registered in Grafana, you can register them again using the following command:

# Register all pgsql databases (defined in pg_databases) as grafana datasources
./pgsql.yml -t register_grafana
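
To confirm the datasources are back, you can query the Grafana HTTP API. The snippet below is a sketch assuming the default g.pigsty domain and the default admin account; substitute your own credentials:

curl -s -u admin:pigsty http://g.pigsty/api/datasources | jq -r '.[].name'   # admin:pigsty is an assumed default, use your own credentials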

How to re-register node HAProxy admin pages to Nginx?

If you accidentally delete the registered haproxy proxy settings in /etc/nginx/conf.d/haproxy, you can restore them using the following command:

./node.yml -t register_nginx     # Register all haproxy admin page proxy settings to nginx on infra nodes

How to restore DNS registration records in DNSMASQ?

PGSQL cluster/instance domains are registered by default to /etc/hosts.d/<name> on infra nodes. You can restore them using the following command:

./pgsql.yml -t pg_dns    # Register pg DNS names to dnsmasq on infra nodes
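
You can then check resolution directly against DNSMASQ on the infra node. The pg-meta cluster name below is only an illustration; use one of your own cluster names:

dig @10.10.10.10 pg-meta +short    # query dnsmasq on the infra node for an example cluster domain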

How to expose new upstream services via Nginx?

Although you can access services directly via IP:Port, we recommend using domain names and accessing the various WebUI services through the Nginx proxy. This consolidates access entry points, reduces exposed ports, and facilitates access control and auditing.

If you want to expose new WebUI services through the Nginx portal, you can add service definitions to the infra_portal parameter. For example, here’s the Infra portal configuration used by Pigsty’s official demo, exposing several additional services:

infra_portal:
  home         : { domain: home.pigsty.cc }
  grafana      : { domain: demo.pigsty.io ,endpoint: "${admin_ip}:3000" ,websocket: true }
  prometheus   : { domain: p.pigsty.cc ,endpoint: "${admin_ip}:8428" }
  alertmanager : { domain: a.pigsty.cc ,endpoint: "${admin_ip}:9059" }
  blackbox     : { endpoint: "${admin_ip}:9115" }
  vmalert      : { endpoint: "${admin_ip}:8880" }
  # Additional web portals
  minio        : { domain: sss.pigsty  ,endpoint: "${admin_ip}:9001" ,scheme: https ,websocket: true }
  postgrest    : { domain: api.pigsty.cc  ,endpoint: "127.0.0.1:8884"   }
  pgadmin      : { domain: adm.pigsty.cc  ,endpoint: "127.0.0.1:8885"   }
  pgweb        : { domain: cli.pigsty.cc  ,endpoint: "127.0.0.1:8886"   }
  bytebase     : { domain: ddl.pigsty.cc  ,endpoint: "127.0.0.1:8887"   }
  gitea        : { domain: git.pigsty.cc  ,endpoint: "127.0.0.1:8889"   }
  wiki         : { domain: wiki.pigsty.cc ,endpoint: "127.0.0.1:9002"   }
  noco         : { domain: noco.pigsty.cc ,endpoint: "127.0.0.1:9003"   }
  supa         : { domain: supa.pigsty.cc ,endpoint: "127.0.0.1:8000", websocket: true }

After completing the Nginx upstream service definitions, use the following commands to register the new services with Nginx.

./infra.yml -t nginx_config           # Regenerate Nginx configuration files
./infra.yml -t nginx_launch           # Update and apply Nginx configuration

# You can also manually reload Nginx config with Ansible
ansible infra -b -a 'nginx -s reload'  # Reload Nginx config

If you want HTTPS access, you must delete files/pki/csr/pigsty.csr and files/pki/nginx/pigsty.{key,crt} to force regeneration of the Nginx SSL/TLS certificate so that it covers the new upstream domains. If you want to use certificates issued by an authoritative CA instead of Pigsty's self-signed CA, place them in the /etc/nginx/conf.d/cert/ directory and modify the corresponding configuration: /etc/nginx/conf.d/<name>.conf.
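
For example, a minimal sketch of forcing certificate regeneration after adding new domains, assuming the nginx tasks re-issue the certificate when the files are missing:

rm -f files/pki/csr/pigsty.csr files/pki/nginx/pigsty.key files/pki/nginx/pigsty.crt
./infra.yml -t nginx     # re-issue the self-signed certificate and reconfigure Nginx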


How to manually add upstream repo files to nodes?

Pigsty has a built-in wrapper script bin/repo-add that calls the ansible playbook node.yml to add repo files to corresponding nodes.

bin/repo-add <selector> [modules]
bin/repo-add 10.10.10.10           # Add node repo for node 10.10.10.10
bin/repo-add infra   node,infra    # Add node and infra repos for infra group
bin/repo-add infra   node,local    # Add node repo and local pigsty repo for infra group
bin/repo-add pg-test node,pgsql    # Add node and pgsql repos for pg-test group

8 - Administration

Infrastructure components and INFRA cluster administration SOP: create, destroy, scale out, scale in, certificates, repositories…

This section covers daily administration and operations for Pigsty deployments.


Create INFRA Module

Use infra.yml playbook to install INFRA module on infra group:

./infra.yml     # Install INFRA module on infra group

Uninstall INFRA Module

Use dedicated infra-rm.yml playbook to remove INFRA module from infra group:

./infra-rm.yml  # Remove INFRA module from infra group

Manage Local Repository

Pigsty includes local yum/apt repo for software packages. Manage repo configuration:

Repo Variables

VariableDescription
repo_enabledEnable local repo on node
repo_upstreamUpstream repos to include
repo_removeRemove upstream repos if true
repo_url_pkgExtra packages to download
repo_cleanClean repo cache (makecache)
repo_pkgPackages to include

Repo Tasks

./infra.yml -t repo              # Create or update repo

Repo location: /www/pigsty served by Nginx.

More: Configuration: INFRA - REPO

8.1 - Ansible

Using Ansible to run administration commands

Ansible is installed by default on all INFRA nodes and can be used to manage the entire deployment.

Pigsty implements automation based on Ansible, following the Infrastructure-as-Code philosophy.

Ansible knowledge is useful for managing databases and infrastructure, but not required. You only need to know how to execute Playbooks - YAML files that define a series of automated tasks.


Installation

Pigsty automatically installs ansible and its dependencies during the bootstrap process. For manual installation, use the following commands:

# Debian / Ubuntu
sudo apt install -y ansible python3-jmespath

# EL 10
sudo dnf install -y ansible python-jmespath

# EL 8/9
sudo dnf install -y ansible python3.12-jmespath

# EL 7
sudo yum install -y ansible python-jmespath

macOS

macOS users can install using Homebrew:

brew install ansible
pip3 install jmespath

Basic Usage

To run a playbook, simply execute ./path/to/playbook.yml. Here are the most commonly used Ansible command-line parameters:

PurposeParameterDescription
Where-l / --limit <pattern>Limit target hosts/groups/patterns
What-t / --tags <tags>Only run tasks with specified tags
How-e / --extra-vars <vars>Pass extra command-line variables
Config-i / --inventory <path>Specify inventory file path

Limiting Hosts

Use -l|--limit <pattern> to limit execution to specific groups, hosts, or patterns:

./node.yml                      # Execute on all nodes
./pgsql.yml -l pg-test          # Only execute on pg-test cluster
./pgsql.yml -l pg-*             # Execute on all clusters starting with pg-
./pgsql.yml -l 10.10.10.10      # Only execute on specific IP host

Running playbooks without host limits can be very dangerous! By default, most playbooks execute on all hosts. Use with caution!


Limiting Tasks

Use -t|--tags <tags> to only execute task subsets with specified tags:

./infra.yml -t repo           # Only execute tasks to create local repo
./infra.yml -t repo_upstream  # Only execute tasks to add upstream repos
./node.yml -t node_pkg        # Only execute tasks to install node packages
./pgsql.yml -t pg_hba         # Only execute tasks to render pg_hba.conf

Passing Variables

Use -e|--extra-vars <key=value> to override variables at runtime:

./pgsql.yml -e pg_clean=true         # Force clean existing PG instances
./pgsql-rm.yml -e pg_rm_pkg=false    # Keep packages when uninstalling
./node.yml -e '{"node_tune":"tiny"}' # Pass variables in JSON format
./pgsql.yml -e @/path/to/config.yml  # Load variables from YAML file

Specifying Inventory

By default, Ansible uses pigsty.yml in the current directory as the inventory. Use -i|--inventory <path> to specify a different config file:

./pgsql.yml -i files/pigsty/full.yml -l pg-test

Note: To permanently change the default config file path, modify the inventory parameter in ansible.cfg.
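
For example, a minimal ansible.cfg excerpt; the inventory path below is just an illustration:

[defaults]
inventory = files/pigsty/full.yml    # default inventory used when -i is not given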

8.2 - Playbooks

Built-in Ansible playbooks in Pigsty

Pigsty uses idempotent Ansible playbooks for management and control. Running them requires the ansible-playbook binary to be available in the system PATH, so install Ansible before executing any playbook.

Available Playbooks

ModulePlaybookPurpose
INFRAinstall.ymlOne-click Pigsty installation
INFRAinfra.ymlInitialize Pigsty infrastructure on infra nodes
INFRAinfra-rm.ymlRemove infrastructure components from infra nodes
INFRAcache.ymlCreate offline installation packages from target nodes
INFRAcert.ymlIssue certificates using Pigsty self-signed CA
NODEnode.ymlInitialize nodes, configure to desired state
NODEnode-rm.ymlRemove nodes from Pigsty
PGSQLpgsql.ymlInitialize HA PostgreSQL cluster, or add new replica
PGSQLpgsql-rm.ymlRemove PostgreSQL cluster, or remove replica
PGSQLpgsql-db.ymlAdd new business database to existing cluster
PGSQLpgsql-user.ymlAdd new business user to existing cluster
PGSQLpgsql-pitr.ymlPerform point-in-time recovery (PITR) on cluster
PGSQLpgsql-monitor.ymlMonitor remote PostgreSQL using local exporters
PGSQLpgsql-migration.ymlGenerate migration manual and scripts for PostgreSQL
PGSQLslim.ymlInstall Pigsty with minimal components
REDISredis.ymlInitialize Redis cluster/node/instance
REDISredis-rm.ymlRemove Redis cluster/node/instance
ETCDetcd.ymlInitialize ETCD cluster, or add new member
ETCDetcd-rm.ymlRemove ETCD cluster, or remove existing member
MINIOminio.ymlInitialize MinIO cluster
MINIOminio-rm.ymlRemove MinIO cluster
DOCKERdocker.ymlInstall Docker on nodes
DOCKERapp.ymlInstall applications using Docker Compose
FERRETmongo.ymlInstall Mongo/FerretDB on nodes

Deployment Strategy

The install.yml playbook orchestrates specialized playbooks in the following group order for complete deployment:

  • infra: infra.yml (-l infra)
  • nodes: node.yml
  • etcd: etcd.yml (-l etcd)
  • minio: minio.yml (-l minio)
  • pgsql: pgsql.yml

Circular Dependency Note: There is a weak circular dependency between NODE and INFRA: registering a NODE to INFRA requires INFRA to already exist, while the INFRA module itself depends on NODE to work. The solution is to initialize the infra nodes first, then add the other nodes. To complete the entire deployment in one pass, use install.yml.
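
The same order can be reproduced manually with individual playbook runs; this is simply the sequence listed above expressed as commands:

./infra.yml -l infra     # 1. initialize infrastructure on infra nodes
./node.yml               # 2. initialize all managed nodes
./etcd.yml  -l etcd      # 3. initialize the etcd cluster
./minio.yml -l minio     # 4. initialize the MinIO cluster (if defined)
./pgsql.yml              # 5. initialize PostgreSQL clusters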


Safety Notes

Most playbooks are idempotent: re-running them converges the targets to the declared state. As a consequence, some deployment playbooks may wipe existing databases and create new ones when the protection options are not enabled. Use extra caution with the pgsql, minio, and infra playbooks: read the documentation carefully and proceed with care.

Best Practices

  1. Read playbook documentation carefully before execution
  2. Press Ctrl-C immediately to stop when anomalies occur
  3. Test in non-production environments first
  4. Use -l parameter to limit target hosts, avoiding unintended hosts
  5. Use -t parameter to specify tags, executing only specific tasks

Dry-Run Mode

Use --check --diff options to preview changes without actually executing:

# Preview changes without execution
./pgsql.yml -l pg-test --check --diff

# Check specific tasks with tags
./pgsql.yml -l pg-test -t pg_config --check --diff

8.3 - Nginx Management

Nginx management, web portal configuration, web server, upstream services

Pigsty installs Nginx on INFRA nodes as the entry point for all web services, listening on standard ports 80/443.

In Pigsty, you can configure Nginx through the inventory to provide various services:

  • Expose web interfaces for monitoring components like Grafana, VictoriaMetrics (VMUI), Alertmanager, and VictoriaLogs
  • Serve static files (software repos, documentation sites, websites, etc.)
  • Proxy custom application services (internal apps, database management UIs, Docker application interfaces, etc.)
  • Automatically issue self-signed HTTPS certificates, or use Certbot to obtain free Let’s Encrypt certificates
  • Expose services through a single port using different subdomains for unified access

Basic Configuration

Customize Nginx behavior via infra_portal parameter:

infra_portal:
  home: { domain: i.pigsty }
  grafana      : { domain: g.pigsty ,endpoint: "${admin_ip}:3000" , websocket: true }
  prometheus   : { domain: p.pigsty ,endpoint: "${admin_ip}:8428" }
  alertmanager : { domain: a.pigsty ,endpoint: "${admin_ip}:9059" }
  blackbox     : { endpoint: "${admin_ip}:9115" }
  vmalert      : { endpoint: "${admin_ip}:8880" }

Server Parameters

ParameterDescription

8.4 - Software Repository

Managing local APT/YUM software repositories

Pigsty supports creating and managing local APT/YUM software repositories for offline deployment or accelerated package installation.


Quick Start

To add packages to the local repository:

  1. Add packages to repo_packages (default packages)
  2. Add packages to repo_extra_packages (extra packages)
  3. Run the build command:
./infra.yml -t repo_build   # Build local repo from upstream
./node.yml -t node_repo     # Refresh node repository cache

Package Aliases

Pigsty predefines common package combinations for batch installation:

EL Systems (RHEL/CentOS/Rocky)

AliasDescription
node-bootstrapAnsible, Python3 tools, SSH related
infra-packageNginx, etcd, HAProxy, monitoring exporters, MinIO
pgsql-utilityPatroni, pgBouncer, pgBackRest, PG tools
pgsqlFull PostgreSQL (server, client, extensions)
pgsql-miniMinimal PostgreSQL installation

Debian/Ubuntu Systems

AliasDescription
node-bootstrapAnsible, development tools
infra-packageInfrastructure components (Debian naming)
pgsql-clientPostgreSQL client
pgsql-serverPostgreSQL server and related packages

Playbook Tasks

Main Tasks

TaskDescription
repoCreate local repo from internet or offline packages
repo_buildBuild from upstream if not exists
repo_upstreamAdd upstream repository files
repo_pkgDownload packages and dependencies
repo_createCreate/update YUM or APT repository
repo_nginxStart Nginx file server

Complete Task List

./infra.yml -t repo_dir          # Create local repository directory
./infra.yml -t repo_check        # Check if local repo exists
./infra.yml -t repo_prepare      # Use existing repo directly
./infra.yml -t repo_build        # Build repo from upstream
./infra.yml -t repo_upstream     # Add upstream repositories
./infra.yml -t repo_remove       # Delete existing repo files
./infra.yml -t repo_add          # Add repo to system directory
./infra.yml -t repo_url_pkg      # Download packages from internet
./infra.yml -t repo_cache        # Create metadata cache
./infra.yml -t repo_boot_pkg     # Install bootstrap packages
./infra.yml -t repo_pkg          # Download packages and dependencies
./infra.yml -t repo_create       # Create local repository
./infra.yml -t repo_use          # Add new repo to system
./infra.yml -t repo_nginx        # Start Nginx file server

Common Operations

Add New Packages

# 1. Configure upstream repositories
./infra.yml -t repo_upstream

# 2. Download packages and dependencies
./infra.yml -t repo_pkg

# 3. Build local repository metadata
./infra.yml -t repo_create

Refresh Node Repositories

./node.yml -t node_repo    # Refresh repository cache on all nodes

Complete Repository Rebuild

./infra.yml -t repo        # Create repo from internet or offline packages

8.5 - Domain Management

Configure local or public domain names to access Pigsty services.

Use domain names instead of IP addresses to access Pigsty’s various web services.

Quick Start

Add the following static resolution records to /etc/hosts:

10.10.10.10 i.pigsty g.pigsty a.pigsty

Replace the IP address with your actual Pigsty node's IP.
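
For example, on a Linux client you could append the records and verify resolution like this (10.10.10.10 is a placeholder for your infra node):

echo '10.10.10.10 i.pigsty g.pigsty a.pigsty' | sudo tee -a /etc/hosts
ping -c1 g.pigsty        # should now resolve to the infra node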


Why Use Domain Names

  • Easier to remember than IP addresses
  • Flexible pointing to different IPs
  • Unified service management through Nginx
  • Support for HTTPS encryption
  • Prevent ISP hijacking in some regions
  • Allow access to internally bound services via proxy

DNS Mechanism

  • DNS Protocol: Resolves domain names to IP addresses. Multiple domains can point to same IP.

  • HTTP Protocol: Uses Host header to route requests to different sites on same port (80/443).
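
You can observe this Host-header routing directly with curl; a sketch assuming the default domains and infra node IP from the examples above:

curl -sI -H 'Host: g.pigsty' http://10.10.10.10/    # routed to the Grafana upstream
curl -sI -H 'Host: p.pigsty' http://10.10.10.10/    # routed to the VictoriaMetrics upstream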


Default Domains

8.6 - Module Management

INFRA module management SOP: define, create, destroy, scale out, scale in

This document covers daily management operations for the INFRA module, including installation, uninstallation, scaling, and component maintenance.


Install INFRA Module

Use the infra.yml playbook to install the INFRA module on the infra group:

./infra.yml     # Install INFRA module on infra group

Uninstall INFRA Module

Use the infra-rm.yml playbook to uninstall the INFRA module from the infra group:

./infra-rm.yml  # Uninstall INFRA module from infra group

Scale Out INFRA Module

Assign infra_seq to new nodes and add them to the infra group in the inventory:

all:
  children:
    infra:
      hosts:
        10.10.10.10: { infra_seq: 1 }  # Existing node
        10.10.10.11: { infra_seq: 2 }  # New node

Use the -l limit option to execute the playbook on the new node only:

./infra.yml -l 10.10.10.11    # Install INFRA module on new node

Manage Local Repository

Local repository management tasks:

./infra.yml -t repo              # Create repo from internet or offline packages
./infra.yml -t repo_upstream     # Add upstream repositories
./infra.yml -t repo_pkg          # Download packages and dependencies
./infra.yml -t repo_create       # Create local yum/apt repository

Complete subtask list:

./infra.yml -t repo_dir          # Create local repository directory
./infra.yml -t repo_check        # Check if local repo exists
./infra.yml -t repo_prepare      # Use existing repo directly
./infra.yml -t repo_build        # Build repo from upstream
./infra.yml -t repo_upstream     # Add upstream repositories
./infra.yml -t repo_remove       # Delete existing repo files
./infra.yml -t repo_add          # Add repo to system directory
./infra.yml -t repo_url_pkg      # Download packages from internet
./infra.yml -t repo_cache        # Create metadata cache
./infra.yml -t repo_boot_pkg     # Install bootstrap packages
./infra.yml -t repo_pkg          # Download packages and dependencies
./infra.yml -t repo_create       # Create local repository
./infra.yml -t repo_use          # Add new repo to system
./infra.yml -t repo_nginx        # Start Nginx file server

Manage Nginx

Nginx management tasks:

./infra.yml -t nginx                       # Reset Nginx component
./infra.yml -t nginx_index                 # Re-render homepage
./infra.yml -t nginx_config,nginx_reload   # Re-render config and reload

Request HTTPS certificate:

./infra.yml -t nginx_certbot,nginx_reload -e certbot_sign=true

Manage Infrastructure Components

Management commands for various infrastructure components:

./infra.yml -t infra           # Configure infrastructure
./infra.yml -t infra_env       # Configure environment variables
./infra.yml -t infra_pkg       # Install packages
./infra.yml -t infra_user      # Set up OS user
./infra.yml -t infra_cert      # Issue certificates
./infra.yml -t dns             # Configure DNSMasq
./infra.yml -t nginx           # Configure Nginx
./infra.yml -t victoria        # Configure VictoriaMetrics/Logs/Traces
./infra.yml -t alertmanager    # Configure AlertManager
./infra.yml -t blackbox        # Configure Blackbox Exporter
./infra.yml -t grafana         # Configure Grafana
./infra.yml -t infra_register  # Register to VictoriaMetrics/Grafana

Common maintenance commands:

./infra.yml -t nginx_index                        # Re-render homepage
./infra.yml -t nginx_config,nginx_reload          # Reconfigure and reload
./infra.yml -t vmetrics_config,vmetrics_launch    # Regenerate VictoriaMetrics config and restart
./infra.yml -t vlogs_config,vlogs_launch          # Update VictoriaLogs config
./infra.yml -t grafana_plugin                     # Download Grafana plugins

8.7 - CA and Certificates

Using self-signed CA or real HTTPS certificates

Pigsty uses a self-signed Certificate Authority (CA) by default for internal SSL/TLS encryption. This document covers the built-in self-signed CA, issuing certificates with it, and using an external CA or real HTTPS certificates.


Self-Signed CA

Pigsty automatically creates a self-signed CA during infrastructure initialization (infra.yml). The CA signs certificates for:

  • PostgreSQL server/client SSL
  • Patroni REST API
  • etcd cluster communication
  • MinIO cluster communication
  • Nginx HTTPS (fallback)
  • Infrastructure services

PKI Directory Structure

files/pki/
├── ca/
│   ├── ca.key                # CA private key (keep secure!)
│   └── ca.crt                # CA certificate
├── csr/                      # Certificate signing requests
│   ├── misc/                     # Miscellaneous certificates (cert.yml output)
│   ├── etcd/                     # ETCD certificates
│   ├── pgsql/                    # PostgreSQL certificates
│   ├── minio/                    # MinIO certificates
│   ├── nginx/                    # Nginx certificates
│   └── mongo/                    # FerretDB certificates
└── infra/                    # Infrastructure certificates

CA Variables

VariableDefaultDescription
ca_createtrueCreate CA if not exists, or abort
ca_cnpigsty-caCA certificate common name
cert_validity7300dDefault validity for issued certificates
Certificate Validity

CertificateValidityParameter
CA Certificate100 yearsHardcoded (36500 days)
Server/Client20 yearscert_validity (7300d)
Nginx HTTPS~1 yearnginx_cert_validity (397d)

Note: Browser vendors limit trust to certificates valid for at most 398 days, so Nginx uses a shorter validity for browser compatibility.
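
To inspect the actual validity of the generated certificates, you can use openssl; the paths below follow the PKI directory structure described above:

openssl x509 -in files/pki/ca/ca.crt        -noout -subject -dates   # CA certificate validity
openssl x509 -in files/pki/nginx/pigsty.crt -noout -dates            # Nginx HTTPS certificate validity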

Using External CA

To use your own enterprise CA instead of the auto-generated one:

1. Set ca_create: false in your configuration.

2. Place your CA files before running playbook:

mkdir -p files/pki/ca
cp /path/to/your/ca.key files/pki/ca/ca.key
cp /path/to/your/ca.crt files/pki/ca/ca.crt
chmod 600 files/pki/ca/ca.key
chmod 644 files/pki/ca/ca.crt

3. Run ./infra.yml


Backup CA Files

The CA private key is critical. Back it up securely:

# Backup with timestamp
tar -czvf pigsty-ca-$(date +%Y%m%d).tar.gz files/pki/ca/

Warning: If you lose the CA private key, you can no longer issue or renew certificates under this CA, and you'll need to regenerate the entire PKI.


Issue Certificates

Use cert.yml to issue additional certificates signed by Pigsty CA.

Basic Usage

# Issue certificate for database user (client cert)
./cert.yml -e cn=dbuser_dba

# Issue certificate for monitor user
./cert.yml -e cn=dbuser_monitor

Certificates generated in files/pki/misc/<cn>.{key,crt} by default.

Parameters

ParameterDefaultDescription
cnpigstyCommon Name (required)
san[DNS:localhost, IP:127.0.0.1]Subject Alternative Names
orgpigstyOrganization name
unitpigstyOrganizational unit name
expire7300dCertificate validity (20 years)
keyfiles/pki/misc/<cn>.keyPrivate key output path
crtfiles/pki/misc/<cn>.crtCertificate output path

Advanced Examples

# Issue certificate with custom SAN (DNS and IP)
./cert.yml -e cn=myservice -e san=DNS:myservice,IP:10.2.82.163
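
To check that an issued certificate chains back to the Pigsty CA (myservice refers to the example above):

openssl verify -CAfile files/pki/ca/ca.crt files/pki/misc/myservice.crt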
