1 - HA Drill: 2/3 Failure

How to recover from emergency scneraio with 2-node broken in a 3-node setup?

如果经典3节点高可用部署同时出现两台(多数主体)故障,系统通常无法自动完成故障切换,需要人工介入:

首先判断另外两台服务器的情况,如果短时间内可以拉起,优先选择拉起另外两台服务。否则进入 紧急止血流程

紧急止血流程假设您的管理节点故障,只有单台普通数据库节点存活,在这种情况下,最快的恢复操作流程为:

  • 调整 HAProxy 配置,将流量指向主库。
  • 关闭 Patroni,手动提升 PostgreSQL 从库为主库。

调整HAProxy配置

如果你通过其他方式绕开 HAProxy 访问集群,那么可以跳过这一步。 如果你通过 HAProxy 方式访问数据库集群,那么你需要调整负载均衡配置,将读写流量手工指向主库。

  • 编辑 /etc/haproxy/<pg_cluster>-primary.cfg 配置文件,其中 <pg_cluster> 为你的 PostgreSQL 集群名称,例如 pg-meta
  • 将健康检查配置选项注释,停止进行健康鉴擦好
  • 将服务器列表中,其他两台故障的机器注释掉,只保留当前主库服务器。
listen pg-meta-primary
    bind *:5433
    mode tcp
    maxconn 5000
    balance roundrobin

    # 注释掉以下四行健康检查配置
    #option httpchk                               # <---- remove this
    #option http-keep-alive                       # <---- remove this
    #http-check send meth OPTIONS uri /primary    # <---- remove this
    #http-check expect status 200                 # <---- remove this

    default-server inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100
    server pg-meta-1 10.10.10.10:6432 check port 8008 weight 100

    # 注释掉其他两台故障的机器
    #server pg-meta-2 10.10.10.11:6432 check port 8008 weight 100 <---- comment this
    #server pg-meta-3 10.10.10.12:6432 check port 8008 weight 100 <---- comment this

配置调整完成后,先不着急执行 systemctl reload haproxy 重载生效,等待后续主库提升后一起执行。 以上配置的效果是,HAProxy 将不再进行主库健康检查(默认使用 Patroni),而是直接将写入流量指向当前主库


手工提升备库

登陆目标服务器,切换至 dbsu 用户,执行 CHECKPOINT 刷盘后,关闭 Patroni,重启 PostgreSQL 并执行 Promote。

sudo su - postgres                     # 切换到数据库 dbsu 用户
psql -c 'checkpoint; checkpoint;'      # 两次 Checkpoint 刷脏页,避免PG后重启耗时过久
sudo systemctl stop patroni            # 关闭 Patroni
pg-restart                             # 重新拉起 PostgreSQL
pg-promote                             # 将 PostgreSQL 从库提升为主库
psql -c 'SELECT pg_is_in_recovery();'  # 如果结果为 f,表示已经提升为主库

如果你上面调整了 HAProxy 配置,那么现在可以执行 systemctl reload haproxy 重载 HAProxy 配置,将流量指向新的主库。

systemctl reload haproxy                # 重载 HAProxy 配置,将写入流量指向当前实例

避免脑裂

紧急止血后,第二优先级问题为:避免脑裂。用户应当防止另外两台服务器重新上线后,与当前主库形成脑裂,导致数据不一致。

简单的做法是:

  • 将另外两台服务器直接 断电/断网,确保它们不会在不受控的情况下再次上线。
  • 调整应用使用的数据库连接串,将其 HOST 直接指向唯一幸存服务器上的主库。

然后应当根据具体情况,决定下一步的操作:

  • A:这两台服务器是临时故障(比如断网断电),可以原地修复后继续服务
  • B:这两台故障服务器是永久故障(比如硬件损坏),将移除并下线。

临时故障后的复原

如果另外两台服务器是临时故障,可以修复后继续服务,那么可以按照以下步骤进行修复与重建:

  • 每次处理一台故障服务器,优先处理 管理节点 / INFRA 管理节点
  • 启动故障服务器,并在启动后关停 Patroni

ETCD 集群在法定人数恢复后,将恢复工作,此时可以启动幸存服务器(当前主库)上的 Patroni,接管现有 PostgreSQL,并重新获取集群领导者身份。 Patroni 启动后进入维护模式。

systemctl restart patroni
pg pause <pg_cluster>

在另外两台实例上以 postgres 用户身份创建 touch /pg/data/standby.signal 标记文件将其标记为从库,然后拉起 Patroni:

systemctl restart patroni

确认 Patroni 集群身份/角色正常后,退出维护模式:

pg resume <pg_cluster>

永久故障后的复原

出现永久故障后,首先需要恢复管理节点上的 ~/pigsty 目录,主要是需要 pigsty.ymlfiles/pki/ca/ca.key 两个核心文件。

如果您无法取回或没有备份这两个文件,您可以选择部署一套新的 Pigsty,并通过 备份集群 的方式将现有集群迁移至新部署中。

请定期备份 pigsty 目录(例如使用 Git 进行版本管理)。建议吸取教训,下次不要犯这样的错误。

配置修复

您可以将幸存的节点作为新的管理节点,将 ~/pigsty 目录拷贝到新的管理节点上,然后开始调整配置。 例如,将原本默认的管理节点 10.10.10.10 替换为幸存节点 10.10.10.12

all:
  vars:
    admin_ip: 10.10.10.12               # 使用新的管理节点地址
    node_etc_hosts: [10.10.10.12 h.pigsty a.pigsty p.pigsty g.pigsty sss.pigsty]
    infra_portal: {}                    # 一并修改其他引用旧管理节点 IP (10.10.10.10) 的配置

  children:

    infra:                              # 调整 Infra 集群
      hosts:
        # 10.10.10.10: { infra_seq: 1 } # 老的 Infra 节点
        10.10.10.12: { infra_seq: 3 }   # 新增 Infra 节点

    etcd:                               # 调整 ETCD 集群
      hosts:
        #10.10.10.10: { etcd_seq: 1 }   # 注释掉此故障节点
        #10.10.10.11: { etcd_seq: 2 }   # 注释掉此故障节点
        10.10.10.12: { etcd_seq: 3 }    # 保留幸存节点
      vars:
        etcd_cluster: etcd

    pg-meta:                            # 调整 PGSQL 集群配置
      hosts:
        #10.10.10.10: { pg_seq: 1, pg_role: primary }
        #10.10.10.11: { pg_seq: 2, pg_role: replica }
        #10.10.10.12: { pg_seq: 3, pg_role: replica , pg_offline_query: true }
        10.10.10.12: { pg_seq: 3, pg_role: primary , pg_offline_query: true }
      vars:
        pg_cluster: pg-meta

ETCD修复

然后执行以下命令,将 ETCD 重置为单节点集群:

./etcd.yml -e etcd_safeguard=false -e etcd_clean=true

根据 ETCD重载配置 的说明,调整对 ETCD Endpoint 的引用。

INFRA修复

如果幸存节点上没有 INFRA 模块,请在当前节点上配置新的 INFRA 模块并安装。执行以下命令,将 INFRA 模块部署到幸存节点上:

./infra.yml -l 10.10.10.12

修复当前节点的监控

./node.yml -t node_monitor

PGSQL修复

./pgsql.yml -t pg_conf                            # 重新生成 PG 配置文件
systemctl reload patroni                          # 在幸存节点上重载 Patroni 配置

各模块修复后,您可以参考标准扩容流程,将新的节点加入集群,恢复集群的高可用性。




2 - Nginx: Expose Web Service

How to expose, proxy, and forward web services using Nginx?

Pigsty will install Nginx on INFRA Node, as a Web service proxy.

Nginx is the access entry for all WebUI services of Pigsty, and it defaults to the use the 80/443 port on INFRA nodes.

Pigsty provides a global parameter infra_portal to configure Nginx proxy rules and corresponding upstream services.

infra_portal:  # domain names and upstream servers
  home         : { domain: h.pigsty }
  grafana      : { domain: g.pigsty ,endpoint: "${admin_ip}:3000" , websocket: true }
  prometheus   : { domain: p.pigsty ,endpoint: "${admin_ip}:9090" }
  alertmanager : { domain: a.pigsty ,endpoint: "${admin_ip}:9093" }
  blackbox     : { endpoint: "${admin_ip}:9115" }
  loki         : { endpoint: "${admin_ip}:3100" }
  #minio        : { domain: sss.pigsty  ,endpoint: "${admin_ip}:9001" ,scheme: https ,websocket: true }

If you access Nginx directly through the ip:port, it will route to h.pigsty, which is the Pigsty homepage (/www/ directory, served as software repo).

Because Nginx provides multiple services through the same port, it must be distinguished by the domain name (HOST header by the browser). Therefore, by default, Nginx only exposes services with the domain parameter.

And Pigsty will expose grafana, prometheus, and alertmanager services by default in addition to the home server.


How to configure nginx upstream?

Pigsty has a built-in configuration template full.yml, could be used as a reference, and also exposes some Web services in addition to the default services.

    infra_portal:                     # domain names and upstream servers
      home         : { domain: h.pigsty }
      grafana      : { domain: g.pigsty ,endpoint: "${admin_ip}:3000" , websocket: true }
      prometheus   : { domain: p.pigsty ,endpoint: "${admin_ip}:9090" }
      alertmanager : { domain: a.pigsty ,endpoint: "${admin_ip}:9093" }
      blackbox     : { endpoint: "${admin_ip}:9115" }
      loki         : { endpoint: "${admin_ip}:3100" }
      
      minio        : { domain: sss.pigsty  ,endpoint: "${admin_ip}:9001" ,scheme: https ,websocket: true }
      postgrest    : { domain: api.pigsty  ,endpoint: "127.0.0.1:8884" }
      pgadmin      : { domain: adm.pigsty  ,endpoint: "127.0.0.1:8885" }
      pgweb        : { domain: cli.pigsty  ,endpoint: "127.0.0.1:8886" }
      bytebase     : { domain: ddl.pigsty  ,endpoint: "127.0.0.1:8887" }
      jupyter      : { domain: lab.pigsty  ,endpoint: "127.0.0.1:8888", websocket: true }
      gitea        : { domain: git.pigsty  ,endpoint: "127.0.0.1:8889" }
      wiki         : { domain: wiki.pigsty ,endpoint: "127.0.0.1:9002" }
      noco         : { domain: noco.pigsty ,endpoint: "127.0.0.1:9003" }
      supa         : { domain: supa.pigsty ,endpoint: "10.10.10.10:8000", websocket: true }

Each record in infra_portal is a key-value pair, where the key is the name of the service, and the value is a dictionary. Currently, there are four available configuration items in the configuration dictionary:

  • endpoint: REQUIRED, specifies the address of the upstream service, which can be IP:PORT or DOMAIN:PORT.
    • In this parameter, you can use the placeholder ${admin_ip}, and Pigsty will fill in the value of admin_ip.
  • domain: OPTIONAL, specifies the domain name of the proxy. If not filled in, Nginx will not expose this service.
    • For services that need to know the endpoint address but do not want to expose them (such as Loki, Blackbox Exporter), you can leave the domain blank.
  • scheme: OPTIONAL, specifies the protocol (http/https) when forwarding, leave it blank to default to http.
    • For services that require HTTPS access (such as the MinIO management interface), you need to specify scheme: https.
  • websocket: OPTIONAL, specifies whether to enable WebSocket, leave it blank to default to off.
    • Services that require WebSocket (such as Grafana, Jupyter) need to be set to true to work properly.

If you need to add a new Web service exposed by Nginx, you need to add the corresponding record in the infra_portal parameter in the pigsty.yml file, and then execute the playbook to take effect:

./infra.yml -t nginx           # Tune nginx into desired state

To avoid service interruption, you can also execute the following tasks separately:

./infra.yml -t nginx_config    # re-generate nginx upstream config in /etc/nginx/conf.d
./infra.yml -t nginx_cert      # re-generate nginx ssl cert to include new domain names
nginx -s reload                # online reload nginx configuration

Or the simple way:

Nginx related configuration parameters are located at: Parameters: INFRA - NGINX


How to access via domain names?

Nginx distinguishes between different services using the domain name in the HOST header set by the browser. Thus, by default, except for the software repository, you need to access services via domain name.

You can directly access these services via IP address + port, but we recommend accessing all components through Nginx on ports 80/443 using domain names.

When accessing the Pigsty WebUI via domain name, you need to configure DNS resolution or modify the local /etc/hosts file for static resolution. There are several typical methods:

  • If your service needs to be publicly accessible, you should resolve the internet domain name via a DNS provider (Cloudflare, Aliyun DNS, etc.). Note that in this case, you usually need to modify the Pigsty infra_portalparameter, as the default *.pigsty is not suitable for public use.
  • If your service needs to be shared within an office network, you should resolve the internal domain name via an internal DNS provider (company internal DNS server) and point it to the IP of the Nginx server. You can request the network administrator to add the appropriate resolution records in the internal DNS server, or ask the system users to manually configure static DNS resolution records.
  • If your service is only for personal use or a few users (e.g., DBA), you can ask these users to use static domain name resolution records. On Linux/MacOS systems, modify the /etc/hosts file (requires sudo permissions) or C:\Windows\System32\drivers\etc\hosts (Windows) file.

We recommend ordinary single-machine users use the third method, adding the following resolution records on the machine used to access the web system via a browser:

<your_public_ip_address>  h.pigsty a.pigsty p.pigsty g.pigsty

The IP address here is the public IP address where the Pigsty service is installed, and then you can access Pigsty subsystems in the browser via a domain like: http://g.pigsty.

Other web services and custom domains can be added similarly. For example, the following are possible domain resolution records for the Pigsty sandbox demo:

10.10.10.10 h.pigsty a.pigsty p.pigsty g.pigsty
10.10.10.10 api.pigsty ddl.pigsty adm.pigsty cli.pigsty lab.pigsty
10.10.10.10 supa.pigsty noco.pigsty odoo.pigsty dify.pigsty

How to access via HTTPS?

If nginx_sslmode is set to enabled or enforced, you can trust self-signed ca: files/pki/ca/ca.crt to use https in your browser.

Pigsty will generate self-signed certs for Nginx, if you wish to access via HTTPS without “Warning”, here are some options:

  • Apply & add real certs from trusted CA: such as Let’s Encrypt
  • Trust your generated CA crt as root ca in your OS and browser
  • Type thisisunsafe in Chrome will supress the warning

You can access these web UI directly via IP + port. While the common best practice would be access them through Nginx and distinguish via domain names. You’ll need configure DNS records, or use the local static records (/etc/hosts) for that.


How to access Pigsty Web UI by domain name?

There are several options:

  1. Resolve internet domain names through a DNS service provider, suitable for systems accessible from the public internet.
  2. Configure internal network DNS server resolution records for internal domain name resolution.
  3. Modify the local machine’s /etc/hosts file to add static resolution records. (For Windows, it’s located at:)

We recommend the third method for common users. On the machine (which runs the browser), add the following record into /etc/hosts (sudo required) or C:\Windows\System32\drivers\etc\hosts in Windows:

<your_public_ip_address>  h.pigsty a.pigsty p.pigsty g.pigsty

You have to use the external IP address of the node here.


How to configure server side domain names?

The server-side domain name is configured with Nginx. If you want to replace the default domain name, simply enter the domain you wish to use in the parameter infra_portal. When you access the Grafana monitoring homepage via http://g.pigsty, it is actually accessed through the Nginx proxy to Grafana’s WebUI:

http://g.pigsty ️-> http://10.10.10.10:80 (nginx) -> http://10.10.10.10:3000 (grafana)

3 - Docker: Setup & Proxy

How to configure container support in Pigsty? and how to configure proxy for DockerHub?

Pigsty has a DOCKER module, which provides a set of playbooks to install and manage Docker on the target nodes.

This document will guide you through how to enable Docker support in Pigsty, and how to configure a proxy server for DockerHub.


Install Docker

To install docker on specified nodes, you can use the docker.yml playbook:

./docker.yml -l <ip|group|cls>

That’s it.


Proxy 101

Assuming you have a working HTTP(s) proxy server, if you wish to connect to docker hub or other registry sites via the proxy server:

You proxy service should provide you with something like

  • http://<ip|domain>:<port> | https://[user]:[pass]@<ip|domain>:<port>

For example, if you have a proxy server configuring like this:

export ALL_PROXY=http://192.168.0.106:8118
export HTTP_PROXY=http://192.168.0.106:8118
export HTTPS_PROXY=http://192.168.0.106:8118
export NO_PROXY="localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.*,*.myqcloud.com,*.tsinghua.edu.cn"

You can check the proxy server by using the curl command, for example:

curl -x http://192.168.0.106:8118 -I http://www.google.com

How to configure proxy for Docker Daemon?

If you wish to use a proxy server when Docker pulls images, you should specify the proxy_env parameter in the global variables of the pigsty.yml configuration file:

all:
  vars:
    proxy_env:                        # global proxy env when downloading packages
      no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.*,*.myqcloud.com,*.tsinghua.edu.cn"
      http_proxy: http://192.168.0.106:8118
      all_proxy: http://192.168.0.106:8118
      https_proxy: http://192.168.0.106:8118

And when the Docker playbook is executed, these configurations will be rendered as proxy configurations in /etc/docker/daemon.json:

{
  "proxies": {
    "http-proxy": "{{ proxy_env['http_proxy'] }}",
    "https-proxy": "{{ proxy_env['http_proxy'] }}",
    "no-proxy": "{{ proxy_env['no_proxy'] }}"
  }
}

Please note that Docker Daemon does not use the all_proxy parameter

If you wish to manually specify a proxy server, you can directly modify the proxies configuration in /etc/docker/daemon.json;

Or you can modify the service definition in /lib/systemd/system/docker.service (Debian/Ubuntu) and /usr/lib/systemd/system/docker.service to add environment variable declarations in the [Service] section

[Service]
Environment="HTTP_PROXY=http://192.168.0.106:8118"
Environment="HTTPS_PROXY=http://192.168.0.106:8118"
Environment="NO_PROXY=localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.*,*.myqcloud.com,*.tsinghua.edu.cn"

And restart dockerd service to take effect:

systemctl restart docker

How to use other registry?

You can specify other registry sites in the docker_registry_mirrors parameter.

It may look like this:

[ "https://mirror.ccs.tencentyun.com" ]         # tencent cloud mirror, intranet only
["https://registry.cn-hangzhou.aliyuncs.com"]   # aliyun cloud mirror, login required

You can also log in to other mirror sites, such as quay.io, by executing:

docker login quay.io
username #> # input your username
password #> # input your password

4 - Use PostgreSQL as Ansible Config Inventory CMDB

Use PostgreSQL instead of static YAML config file as Ansible config inventory

You can use PostgreSQL as a configuration source for Pigsty, replacing the static YAML configuration file.

There are some advantages to using CMDB as a dynamic inventory: metadata is presented in a highly structured way in the form of data tables, and consistency is ensured through database constraints. At the same time, using CMDB allows you to use third-party tools to edit and manage Pigsty metadata, making it easier to integrate with external systems.


Ansible Inventory

Pigsty’s default configuration file path is specified in ansible.cfg as inventory = pigsty.yml.

Changing this parameter will change the default configuration file path used by Ansible. If you point it to an executable script file, Ansible will use the dynamic inventory mechanism, execute the script, and expect the script to return a configuration file.

Using CMDB is implemented by editing the ansible.cfg in the Pigsty directory:

---
inventory = pigsty.yml
+++
inventory = inventory.sh

And the inventory.sh is a simple script that generates equivalent YAML/JSON configuration files from the records in the PostgreSQL CMDB.

You can use bin/inventory_cmdb to switch to CMDB inventory, and use bin/inventory_conf to switch back to the local YAML configuration file.

You may also need bin/inventory_load to load the YAML configuration file into the CMDB.


Load Configuration

Pigsty will init a CMDB (optional) on the default pg-meta cluster’s meta database, under pigsty schema.

the CMDB is available only after infra.yml is fully executed on the admin node.

You can load YAML config file into CMDB with bin/inventory_load:

usage: inventory_load [-h] [-p PATH] [-d CMDB_URL]

load config arguments

optional arguments:
  -h, --help            show this help message and exit„
  -p PATH, --path PATH  config path, ${PIGSTY_HOME}/pigsty.yml by default
  -d DATA, --data DATA  postgres cmdb pgurl, ${METADB_URL} by default

If you run bin/inventory_load without any arguments, it will load the default pigsty.yml into the default CMDB.

bin/inventory_load
bin/inventory_load -p conf/demo.yml
bin/inventory_load -p conf/prod.yml -d postgresql://dbuser_meta:[email protected]:5432/meta

You can switch to dynamic CMDB inventory with:

bin/inventory_cmdb

To switch back to local YAML config file:

bin/inventory_conf

5 - Use PG as Grafana Backend

Use PostgreSQL instead of default SQLite, as the backend storage for Grafana

You can use postgres as the database used by the Grafana backend.

In this tutorial, you will learn about the following.


TL; DR

vi pigsty.yml   # uncomment user/db definition: dbuser_grafana  grafana 
bin/pgsql-user  pg-meta  dbuser_grafana
bin/pgsql-db    pg-meta  grafana

psql postgres://dbuser_grafana:DBUser.Grafana@meta:5436/grafana -c \
  'CREATE TABLE t(); DROP TABLE t;'    # check pgurl connectivity
  
vi /etc/grafana/grafana.ini            # edit [database] section: type & url
systemctl restart grafana-server

Create Postgres Cluster

We can define a new database grafana on pg-meta. A Grafana-specific database cluster can also be created on a new machine node: pg-grafana.

Define Cluster

To create a new dedicated database cluster pg-grafana on two bare nodes 10.10.10.11, 10.10.10.12, define it in the config file.

pg-grafana: 
  hosts: 
    10.10.10.11: {pg_seq: 1, pg_role: primary}
    10.10.10.12: {pg_seq: 2, pg_role: replica}
  vars:
    pg_cluster: pg-grafana
    pg_databases:
      - name: grafana
        owner: dbuser_grafana
        revokeconn: true
        comment: grafana primary database
    pg_users:
      - name: dbuser_grafana
        password: DBUser.Grafana
        pgbouncer: true
        roles: [dbrole_admin]
        comment: admin user for grafana database

Create Cluster

Complete the creation of the database cluster pg-grafana with the following command: pgsql.yml.

bin/createpg pg-grafana # Initialize the pg-grafana cluster

This command calls Ansible Playbook pgsql.yml to create the database cluster.

. /pgsql.yml -l pg-grafana # The actual equivalent Ansible playbook command executed 

The business users and databases defined in pg_users and pg_databases are created automatically when the cluster is initialized. After creating the cluster using this configuration, the following connection string access database can be used.

postgres://dbuser_grafana:[email protected]:5432/grafana # direct connection to the primary
postgres://dbuser_grafana:[email protected]:5436/grafana # direct connection to the default service
postgres://dbuser_grafana:[email protected]:5433/grafana # Connect to the string read/write service

postgres://dbuser_grafana:[email protected]:5432/grafana # direct connection to the primary
postgres://dbuser_grafana:[email protected]:5436/grafana # Direct connection to default service
postgres://dbuser_grafana:[email protected]:5433/grafana # Connected string read/write service

By default, Pigsty is installed on a single meta node. Then the required users and databases for Grafana are created on the existing pg-meta database cluster instead of using the pg-grafana cluster.


Create Biz User

The convention for business object management is to create users first and then create the database.

Define User

To create a user dbuser_grafana on a pg-meta cluster, add the following user definition to pg-meta’s cluster definition.

Add location: all.children.pg-meta.vars.pg_users.

- name: dbuser_grafana
  password: DBUser.Grafana
  comment: admin user for grafana database
  pgbouncer: true
  roles: [ dbrole_admin ]

If you have defined a different password here, replace the corresponding parameter with the new password.

Create User

Complete the creation of the dbuser_grafana user with the following command.

bin/pgsql-user pg-meta dbuser_grafana # Create the `dbuser_grafana` user on the pg-meta cluster

Calls Ansible Playbook pgsql-user.yml to create the user

. /pgsql-user.yml -l pg-meta -e pg_user=dbuser_grafana # Ansible

The dbrole_admin role has the privilege to perform DDL changes in the database, which is precisely what Grafana needs.


Create Biz Database

Define database

Create business databases in the same way as business users. First, add the definition of the new database grafana to the cluster definition of pg-meta.

Add location: all.children.pg-meta.vars.pg_databases.

- { name: grafana, owner: dbuser_grafana, revokeconn: true }

Create database

Use the following command to complete the creation of the grafana database.

bin/pgsql-db pg-meta grafana # Create the `grafana` database on the `pg-meta` cluster

Calls Ansible Playbook pgsql-db.yml to create the database.

. /pgsql-db.yml -l pg-meta -e pg_database=grafana # The actual Ansible playbook to execute

Access Database

Check Connectivity

You can access the database using different services or access methods.

postgres://dbuser_grafana:DBUser.Grafana@meta:5432/grafana # Direct connection
postgres://dbuser_grafana:DBUser.Grafana@meta:5436/grafana # default service
postgres://dbuser_grafana:DBUser.Grafana@meta:5433/grafana # primary service

We will use the default service that accesses the database directly from the primary through the LB.

First, check if the connection string is reachable and if you have privileges to execute DDL commands.

psql postgres://dbuser_grafana:DBUser.Grafana@meta:5436/grafana -c \
  'CREATE TABLE t(); DROP TABLE t;'

Config Grafana

For Grafana to use the Postgres data source, you need to edit /etc/grafana/grafana.ini and modify the config entries.

[database]
;type = sqlite3
;host = 127.0.0.1:3306
;name = grafana
;user = root
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
;password =
;url =

Change the default config entries.

[database]
type = postgres
url = postgres://dbuser_grafana:DBUser.Grafana@meta/grafana

Subsequently, restart Grafana.

systemctl restart grafana-server

See from the monitor system that the new grafana database is already active, then Grafana has started using Postgres as the primary backend database. However, the original Dashboards and Datasources in Grafana have disappeared. You need to re-import Dashboards and Postgres Datasources.


Manage Dashboard

You can reload the Pigsty monitor dashboard by going to the files/ui dir in the Pigsty dir using the admin user and executing grafana.py init.

cd ~/pigsty/files/ui
. /grafana.py init # Initialize the Grafana monitor dashboard using the Dashboards in the current directory

Execution results in:

vagrant@meta:~/pigsty/files/ui
$ ./grafana.py init
Grafana API: admin:pigsty @ http://10.10.10.10:3000
init dashboard : home.json
init folder pgcat
init dashboard: pgcat / pgcat-table.json
init dashboard: pgcat / pgcat-bloat.json
init dashboard: pgcat / pgcat-query.json
init folder pgsql
init dashboard: pgsql / pgsql-replication.json
init dashboard: pgsql / pgsql-table.json
init dashboard: pgsql / pgsql-activity.json
init dashboard: pgsql / pgsql-cluster.json
init dashboard: pgsql / pgsql-node.json
init dashboard: pgsql / pgsql-database.json
init dashboard: pgsql / pgsql-xacts.json
init dashboard: pgsql / pgsql-overview.json
init dashboard: pgsql / pgsql-session.json
init dashboard: pgsql / pgsql-tables.json
init dashboard: pgsql / pgsql-instance.json
init dashboard: pgsql / pgsql-queries.json
init dashboard: pgsql / pgsql-alert.json
init dashboard: pgsql / pgsql-service.json
init dashboard: pgsql / pgsql-persist.json
init dashboard: pgsql / pgsql-proxy.json
init dashboard: pgsql / pgsql-query.json
init folder pglog
init dashboard: pglog / pglog-instance.json
init dashboard: pglog / pglog-analysis.json
init dashboard: pglog / pglog-session.json

This script detects the current environment (defined at ~/pigsty during installation), gets Grafana access information, and replaces the URL connection placeholder domain name (*.pigsty) in the monitor dashboard with the real one in use.

export GRAFANA_ENDPOINT=http://10.10.10.10:3000
export GRAFANA_USERNAME=admin
export GRAFANA_PASSWORD=pigsty

export NGINX_UPSTREAM_YUMREPO=yum.pigsty
export NGINX_UPSTREAM_CONSUL=c.pigsty
export NGINX_UPSTREAM_PROMETHEUS=p.pigsty
export NGINX_UPSTREAM_ALERTMANAGER=a.pigsty
export NGINX_UPSTREAM_GRAFANA=g.pigsty
export NGINX_UPSTREAM_HAPROXY=h.pigsty

As a reminder, using grafana.py clean will clear the target monitor dashboard, and using grafana.py load will load all the monitor dashboards in the current dir. When Pigsty’s monitor dashboard changes, you can use these two commands to upgrade all the monitor dashboards.


Manage DataSources

When creating a new PostgreSQL cluster with pgsql.yml or a new business database with pgsql-db.yml, Pigsty will register the new PostgreSQL data source in Grafana, and you can access the target database instance directly through Grafana using the default admin user. Most of the functionality of the application pgcat relies on this.

To register a Postgres database, you can use the register_grafana task in pgsql.yml.

./pgsql.yml -t register_grafana # Re-register all Postgres data sources in the current environment
./pgsql.yml -t register_grafana -l pg-test # Re-register all the databases in the pg-test cluster

Update Grafana Database

You can directly change the backend data source used by Grafana by modifying the Pigsty config file. Edit the grafana_database and grafana_pgurl parameters in pigsty.yml and change them.

grafana_database: postgres
grafana_pgurl: postgres://dbuser_grafana:DBUser.Grafana@meta:5436/grafana

Then re-execute the grafana task in infral.yml to complete the Grafana upgrade.

./infra.yml -t grafana

6 - Use PG as Prometheus Backend

Persist prometheus metrics with PostgreSQL + TimescaleDB through Promscale

It is not recommended to use PostgreSQL as a backend for Prometheus, but it is a good opportunity to understand the Pigsty deployment system.


Postgres Preparation

vi pigsty.yml # dbuser_prometheus  prometheus

pg_databases:                           # define business users/roles on this cluster, array of user definition
  - { name: prometheus, owner: dbuser_prometheus , revokeconn: true, comment: prometheus primary database }
pg_users:                           # define business users/roles on this cluster, array of user definition
  - {name: dbuser_prometheus , password: DBUser.Prometheus ,pgbouncer: true , createrole: true,  roles: [dbrole_admin], comment: admin user for prometheus database }

Create the database and user:

bin/pgsql-user  pg-meta  dbuser_prometheus
bin/pgsql-db    pg-meta  prometheus

Check the connection string:

psql postgres://dbuser_prometheus:[email protected]:5432/prometheus -c 'CREATE EXTENSION timescaledb;'

Promscale Configuration

Install promscale package:

yum install -y promscale 

If the package is not available in the default repository, you can download it directly:

wget https://github.com/timescale/promscale/releases/download/0.6.1/promscale_0.6.1_Linux_x86_64.rpm
sudo rpm -ivh promscale_0.6.1_Linux_x86_64.rpm

Edit the promscale configuration file /etc/sysconfig/promscale.conf

PROMSCALE_DB_HOST="127.0.0.1"
PROMSCALE_DB_NAME="prometheus"
PROMSCALE_DB_PASSWORD="DBUser.Prometheus"
PROMSCALE_DB_PORT="5432"
PROMSCALE_DB_SSL_MODE="disable"
PROMSCALE_DB_USER="dbuser_prometheus"

Launch promscale service, it will create schema in prometheus database.

# launch 
cat /usr/lib/systemd/system/promscale.service
systemctl start promscale && systemctl status promscale

Prometheus Configuration

Prometheus can use Remote Write/ Remote Read to store metrics in Postgres via Promscale.

Edit the Prometheus configuration file:

vi /etc/prometheus/prometheus.yml

Add the following configuration to the remote_write and remote_read sections:

remote_write:
  - url: "http://127.0.0.1:9201/write"
remote_read:
  - url: "http://127.0.0.1:9201/read"

Metrics are loaded into Postgres after Prometheus is restarted.

systemctl restart prometheus

7 - Bind a L2 VIP to Node Cluster with Keepalived

You can bind an optional L2 VIP on a node cluster with vip_enabled.

proxy:
  hosts:
    10.10.10.29: { nodename: proxy-1 } 
    10.10.10.30: { nodename: proxy-2 } # , vip_role: master }
  vars:
    node_cluster: proxy
    vip_enabled: true
    vip_vrid: 128
    vip_address: 10.10.10.99
    vip_interface: eth1
./node.yml -l proxy -t node_vip     # enable for the first time
./node.yml -l proxy -t vip_refresh  # refresh vip config (e.g. designated master) 

8 - Bind a L2 VIP to PostgreSQL Primary with VIP-Manager

You can define an OPTIONAL L2 VIP on a PostgreSQL cluster, provided that all nodes in the cluster are in the same L2 network.

This VIP works on Master-Backup mode and always points to the node where the primary instance of the database cluster is located.

This VIP is managed by the VIP-Manager, which reads the Leader Key written by Patroni from DCS (etcd) to determine whether it is the master.


Enable VIP

Define pg_vip_enabled parameter as true in the cluster level:

# pgsql 3 node ha cluster: pg-test
pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }   # primary instance, leader of cluster
    10.10.10.12: { pg_seq: 2, pg_role: replica }   # replica instance, follower of leader
    10.10.10.13: { pg_seq: 3, pg_role: replica, pg_offline_query: true } # replica with offline access
  vars:
    pg_cluster: pg-test           # define pgsql cluster name
    pg_users:  [{ name: test , password: test , pgbouncer: true , roles: [ dbrole_admin ] }]
    pg_databases: [{ name: test }]

    # 启用 L2 VIP
    pg_vip_enabled: true
    pg_vip_address: 10.10.10.3/24
    pg_vip_interface: eth1

Beware that pg_vip_address must be a valid IP address with subnet and available in the current L2 network.

Beware that pg_vip_interface must be a valid network interface name and should be the same as the one using IPv4 address in the inventory.

If the network interface name is different among cluster members, users should explicitly specify the pg_vip_interface parameter for each instance, for example:

pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary , pg_vip_interface: eth0  }
    10.10.10.12: { pg_seq: 2, pg_role: replica , pg_vip_interface: eth1  }
    10.10.10.13: { pg_seq: 3, pg_role: replica , pg_vip_interface: ens33 }
  vars:
    pg_cluster: pg-test           # define pgsql cluster name
    pg_users:  [{ name: test , password: test , pgbouncer: true , roles: [ dbrole_admin ] }]
    pg_databases: [{ name: test }]

    # 启用 L2 VIP
    pg_vip_enabled: true
    pg_vip_address: 10.10.10.3/24
    #pg_vip_interface: eth1

To refresh the VIP configuration and restart the VIP-Manager, use the following command:

./pgsql.yml -t pg_vip 

9 - Citus: HA Cluster

How to deploy a native HA citus cluster with Pigsty & Patroni?

Citus is a PostgreSQL extension that transforms PostgreSQL into a distributed database, enabling horizontal scaling across multiple nodes to handle large amounts of data and queries.

Since Patroni v3.0, native support for Citus high availability has been provided, simplifying the setup of Citus clusters. Pigsty also offers native support for this.

Note: The latest version of Citus (12.1.6) supports PostgreSQL versions 16, 15, and 14, but does not support PostgreSQL 17 and lacks official ARM64 support. Pigsty’s extension repository provides ARM64 packages for Citus, but caution is advised when using it on ARM architecture.


Citus Cluster

Pigsty natively supports Citus. Refer to conf/citus.yml.

This example uses a four-node sandbox with a Citus cluster named pg-citus, consisting of a two-node coordinator cluster pg-citus0 and two worker clusters pg-citus1 and pg-citus2.

pg-citus:
  hosts:
    10.10.10.10: { pg_group: 0, pg_cluster: pg-citus0 ,pg_vip_address: 10.10.10.2/24 ,pg_seq: 1, pg_role: primary }
    10.10.10.11: { pg_group: 0, pg_cluster: pg-citus0 ,pg_vip_address: 10.10.10.2/24 ,pg_seq: 2, pg_role: replica }
    10.10.10.12: { pg_group: 1, pg_cluster: pg-citus1 ,pg_vip_address: 10.10.10.3/24 ,pg_seq: 1, pg_role: primary }
    10.10.10.13: { pg_group: 2, pg_cluster: pg-citus2 ,pg_vip_address: 10.10.10.4/24 ,pg_seq: 1, pg_role: primary }
  vars:
    pg_mode: citus                            # pgsql cluster mode: citus
    pg_version: 16                            # Citus does not support pg16 yet
    pg_shard: pg-citus                        # Citus shard name: pg-citus
    pg_primary_db: citus                      # primary database used by Citus
    pg_vip_enabled: true                      # enable VIP for Citus cluster
    pg_vip_interface: eth1                    # VIP interface for all members
    pg_dbsu_password: DBUser.Postgres         # all DBSU passwords for Citus cluster
    pg_extensions: [ citus, postgis, pgvector, topn, pg_cron, hll ]  # install these extensions
    pg_libs: 'citus, pg_cron, pg_stat_statements' # Citus will be added automatically by Patroni
    pg_users: [{ name: dbuser_citus ,password: DBUser.Citus ,pgbouncer: true ,roles: [ dbrole_admin ]    }]
    pg_databases: [{ name: citus ,owner: dbuser_citus ,extensions: [ citus, vector, topn, pg_cron, hll ] }]
    pg_parameters:
      cron.database_name: citus
      citus.node_conninfo: 'sslmode=require sslrootcert=/pg/cert/ca.crt sslmode=verify-full'
    pg_hba_rules:
      - { user: 'all' ,db: all  ,addr: 127.0.0.1/32  ,auth: ssl   ,title: 'all user ssl access from localhost' }
      - { user: 'all' ,db: all  ,addr: intra         ,auth: ssl   ,title: 'all user ssl access from intranet'  }

Compared to a standard PostgreSQL cluster, Citus cluster configuration has some specific requirements. First, ensure that the Citus extension is downloaded, installed, loaded, and enabled. This involves the following four parameters:

  • repo_packages: Must include the citus extension, or you need to use a PostgreSQL offline package with the Citus extension.
  • pg_extensions: Must include the citus extension, meaning you need to install the citus extension on each node.
  • pg_libs: Must include the citus extension, and it must be first in the list, but now Patroni will automatically handle this.
  • pg_databases: Define a primary database with the citus extension installed.

Additionally, ensure the configuration for the Citus cluster is correct:

  • pg_mode: Must be set to citus to inform Patroni to use the Citus mode.
  • pg_primary_db: Specify the primary database name, which must have the citus extension (named citus here).
  • pg_shard: Specify a unified name as a prefix for all horizontal shard PG clusters (e.g., pg-citus).
  • pg_group: Specify a shard number, starting from zero for the coordinator cluster and incrementing for worker clusters.
  • pg_cluster: Must match the combination of [pg_shard] and [pg_group].
  • pg_dbsu_password: Set a non-empty plain-text password for proper Citus functionality.
  • pg_parameters: It is recommended to set the citus.node_conninfo parameter, which enforces SSL access and requires node-to-node client certificate verification.

Once configured, deploy the Citus cluster just like a regular PostgreSQL cluster using pgsql.yml.


Managing Citus Clusters

After defining the Citus cluster, use the same playbook pgsql.yml to deploy the Citus cluster:

./pgsql.yml -l pg-citus    # Deploy Citus cluster pg-citus

Any DBSU user (postgres) can use patronictl (alias: pg) to list the status of the Citus cluster:

$ pg list
+ Citus cluster: pg-citus ----------+---------+-----------+----+-----------+--------------------+
| Group | Member      | Host        | Role    | State     | TL | Lag in MB | Tags               |
+-------+-------------+-------------+---------+-----------+----+-----------+--------------------+
|     0 | pg-citus0-1 | 10.10.10.10 | Leader  | running   |  1 |           | clonefrom: true    |
|       |             |             |         |           |    |           | conf: tiny.yml     |
|       |             |             |         |           |    |           | spec: 20C.40G.125G |
|       |             |             |         |           |    |           | version: '16'      |
+-------+-------------+-------------+---------+-----------+----+-----------+--------------------+
|     1 | pg-citus1-1 | 10.10.10.11 | Leader  | running   |  1 |           | clonefrom: true    |
|       |             |             |         |           |    |           | conf: tiny.yml     |
|       |             |             |         |           |    |           | spec: 10C.20G.125G |
|       |             |             |         |           |    |           | version: '16'      |
+-------+-------------+-------------+---------+-----------+----+-----------+--------------------+
|     2 | pg-citus2-1 | 10.10.10.12 | Leader  | running   |  1 |           | clonefrom: true    |
|       |             |             |         |           |    |           | conf: tiny.yml     |
|       |             |             |         |           |    |           | spec: 10C.20G.125G |
|       |             |             |         |           |    |           | version: '16'      |
+-------+-------------+-------------+---------+-----------+----+-----------+--------------------+
|     2 | pg-citus2-2 | 10.10.10.13 | Replica | streaming |  1 |         0 | clonefrom: true    |
|       |             |             |         |           |    |           | conf: tiny.yml     |
|       |             |             |         |           |    |           | spec: 10C.20G.125G |
|       |             |             |         |           |    |           | version: '16'      |
+-------+-------------+-------------+---------+-----------+----+-----------+--------------------+

Each horizontal shard cluster can be treated as a separate PGSQL cluster, managed with the pg (patronictl) command. Note that when using pg to manage the Citus cluster, the --group parameter must be used to specify the cluster shard number:

pg list pg-citus --group 0   # Use --group 0 to specify the shard number

Citus has a system table called pg_dist_node to record node information, which Patroni automatically maintains.

PGURL=postgres://postgres:[email protected]/citus

psql $PGURL -c 'SELECT * FROM pg_dist_node;'       # View node information

Additionally, you can view user authentication information (restricted to superusers):

$ psql $PGURL -c 'SELECT * FROM pg_dist_authinfo;'   # View node authentication info (superuser only)

You can then access the Citus cluster with regular business users (e.g., dbuser_citus with DDL permissions):

psql postgres://dbuser_citus:[email protected]/citus -c 'SELECT * FROM pg_dist_node;'

Using the Citus Cluster

When using a Citus cluster, we highly recommend reading the Citus Official Documentation to understand its architecture and core concepts.

Key to this is understanding the five types of tables in Citus, their characteristics, and use cases:

  • Distributed Table
  • Reference Table
  • Local Table
  • Local Management Table
  • Schema Table

On the coordinator node, you can create distributed and reference tables and query them from any data node. Since version 11.2, any Citus database node can act as a coordinator.

We can use pgbench to create some tables, distributing the main table (pgbench_accounts) across the nodes, and using other smaller tables as reference tables:

PGURL=postgres://dbuser_citus:[email protected]/citus
pgbench -i $PGURL

psql $PGURL <<-EOF
SELECT create_distributed_table('pgbench_accounts', 'aid'); SELECT truncate_local_data_after_distributing_table('public.pgbench_accounts');
SELECT create_reference_table('pgbench_branches')         ; SELECT truncate_local_data_after_distributing_table('public.pgbench_branches');
SELECT create_reference_table('pgbench_history')          ; SELECT truncate_local_data_after_distributing_table('public.pgbench_history');
SELECT create_reference_table('pgbench_tellers')          ; SELECT truncate_local_data_after_distributing_table('public.pgbench_tellers');
EOF

Run read-write bench:

pgbench -nv -P1 -c10 -T500 postgres://dbuser_citus:[email protected]/citus      # 直连协调者 5432 端口
pgbench -nv -P1 -c10 -T500 postgres://dbuser_citus:[email protected]:6432/citus # 通过连接池,减少客户端连接数压力,可以有效提高整体吞吐。
pgbench -nv -P1 -c10 -T500 postgres://dbuser_citus:[email protected]/citus      # 任意 primary 节点都可以作为 coordinator
pgbench --select-only -nv -P1 -c10 -T500 postgres://dbuser_citus:[email protected]/citus # 可以发起只读查询

Production Deployment

Production citus deployment usually requires physical replication for both coordinator and each worker cluster.

For example, in simu.yml there’s a 10-node cluster cluster:

pg-citus: # citus group
  hosts:
    10.10.10.50: { pg_group: 0, pg_cluster: pg-citus0 ,pg_vip_address: 10.10.10.60/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.51: { pg_group: 0, pg_cluster: pg-citus0 ,pg_vip_address: 10.10.10.60/24 ,pg_seq: 1, pg_role: replica }
    10.10.10.52: { pg_group: 1, pg_cluster: pg-citus1 ,pg_vip_address: 10.10.10.61/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.53: { pg_group: 1, pg_cluster: pg-citus1 ,pg_vip_address: 10.10.10.61/24 ,pg_seq: 1, pg_role: replica }
    10.10.10.54: { pg_group: 2, pg_cluster: pg-citus2 ,pg_vip_address: 10.10.10.62/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.55: { pg_group: 2, pg_cluster: pg-citus2 ,pg_vip_address: 10.10.10.62/24 ,pg_seq: 1, pg_role: replica }
    10.10.10.56: { pg_group: 3, pg_cluster: pg-citus3 ,pg_vip_address: 10.10.10.63/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.57: { pg_group: 3, pg_cluster: pg-citus3 ,pg_vip_address: 10.10.10.63/24 ,pg_seq: 1, pg_role: replica }
    10.10.10.58: { pg_group: 4, pg_cluster: pg-citus4 ,pg_vip_address: 10.10.10.64/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.59: { pg_group: 4, pg_cluster: pg-citus4 ,pg_vip_address: 10.10.10.64/24 ,pg_seq: 1, pg_role: replica }
  vars:
    pg_mode: citus                            # pgsql cluster mode: citus
    pg_version: 16                            # citus does not have pg16 available
    pg_shard: pg-citus                        # citus shard name: pg-citus
    pg_primary_db: citus                      # primary database used by citus
    pg_vip_enabled: true                      # enable vip for citus cluster
    pg_vip_interface: eth1                    # vip interface for all members
    pg_dbsu_password: DBUser.Postgres         # enable dbsu password access for citus
    pg_extensions: [ citus, postgis, pgvector, topn, pg_cron, hll ]  # install these extensions
    pg_libs: 'citus, pg_cron, pg_stat_statements' # citus will be added by patroni automatically
    pg_users: [{ name: dbuser_citus ,password: DBUser.Citus ,pgbouncer: true ,roles: [ dbrole_admin ]    }]
    pg_databases: [{ name: citus ,owner: dbuser_citus ,extensions: [ citus, vector, topn, pg_cron, hll ] }]
    pg_parameters:
      cron.database_name: citus
      citus.node_conninfo: 'sslrootcert=/pg/cert/ca.crt sslmode=verify-full'
    pg_hba_rules:
      - { user: 'all' ,db: all  ,addr: 127.0.0.1/32  ,auth: ssl   ,title: 'all user ssl access from localhost' }
      - { user: 'all' ,db: all  ,addr: intra         ,auth: ssl   ,title: 'all user ssl access from intranet'  }

We’ll cover a range of advanced topics in subsequent tutorials:

  • Read-write separation
  • Failover handling
  • Consistent backup and restore
  • Advanced monitoring and troubleshooting
  • Connection pool

10 - Certbot: Free HTTPS Certs

How to use Certbot to apply for free Let’s Encrypt certificates

Pigsty comes with the Certbot tool pre-installed and enabled by default on the Infra node.

This means you can directly use the certbot command-line tool to request a genuine Let’s Encrypt free HTTPS certificate for your Nginx server and public domain, instead of using the self-signed HTTPS certificates provided by Pigsty.

To achieve this, you need to:

  1. Determine which domains require certificates
  2. Point these domains to your server
  3. Use Certbot to apply for certificates
  4. Configure a scheduled task to renew the certificates
  5. Be aware of some important considerations when applying for certificates

Here are the detailed steps:


Determine Which Domains Need Certificates

First, decide which “upstream services” need genuine public certificates.

infra_portal:
  home         : { domain: h.pigsty.cc }
  grafana      : { domain: g.pigsty.cc ,endpoint: "${admin_ip}:3000" ,websocket: true  }
  prometheus   : { domain: p.pigsty.cc ,endpoint: "${admin_ip}:9090" }
  alertmanager : { domain: a.pigsty.cc ,endpoint: "${admin_ip}:9093" }
  blackbox     : { endpoint: "${admin_ip}:9115" }
  loki         : { endpoint: "${admin_ip}:3100" }
  minio        : { domain: m.pigsty.cc    ,endpoint: "${admin_ip}:9001" ,scheme: https ,websocket: true }
  web          : { domain: pigsty.cc      ,path: "/www/web.cc" }
  repo         : { domain: repo.pigsty.cc ,path: "/www/repo"   }

For example, in infra_portal, suppose we need to expose the following five services:

  • Grafana monitoring dashboard at g.pigsty.cc
  • Prometheus time-series database at p.pigsty.cc
  • AlertManager alert dashboard at a.pigsty.cc
  • Pigsty documentation site at pigsty.cc, pointing to the local documentation directory
  • Pigsty software repository at repo.pigsty.cc, pointing to the software repository

In this example, we intentionally did not choose to apply for a genuine Let’s Encrypt certificate for the home page. The reason will be explained in the last section.


Point These Domains to Your Server

Next, you need to point the selected domains to your server’s public IP address. For example, if the Pigsty CC site’s IP address is 47.83.172.23, you can set the following A records for domain resolution in your domain registrar’s DNS control panel (e.g., Alibaba Cloud DNS Console):

47.83.172.23 pigsty.cc
47.83.172.23 g.pigsty.cc
47.83.172.23 p.pigsty.cc
47.83.172.23 a.pigsty.cc
47.83.172.23 repo.pigsty.cc 

After making the changes, you can verify using


Use Certbot to Apply for Certificates

The first time you apply, certbot will prompt you to enter your email and agree to the terms of service. Just follow the instructions.

$ certbot --nginx -d pigsty.cc -d repo.pigsty.cc -d g.pigsty.cc -d p.pigsty.cc -d a.pigsty.cc
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Enter email address (used for urgent renewal and security notices)
 (Enter 'c' to cancel): [email protected]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Please read the Terms of Service at
https://letsencrypt.org/documents/LE-SA-v1.4-April-3-2024.pdf. You must agree in
order to register with the ACME server. Do you agree?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(Y)es/(N)o: Y

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Would you be willing, once your first certificate is successfully issued, to
share your email address with the Electronic Frontier Foundation, a founding
partner of the Let's Encrypt project and the non-profit organization that
develops Certbot? We'd like to send you email about our work encrypting the web,
EFF news, campaigns, and ways to support digital freedom.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(Y)es/(N)o: N
Account registered.
Requesting a certificate for pigsty.cc and 4 more domains

Successfully received certificate.
Certificate is saved at: /etc/letsencrypt/live/pigsty.cc/fullchain.pem
Key is saved at:         /etc/letsencrypt/live/pigsty.cc/privkey.pem
This certificate expires on 2025-05-18.
These files will be updated when the certificate renews.
Certbot has set up a scheduled task to automatically renew this certificate in the background.

Deploying certificate
Successfully deployed certificate for pigsty.cc to /etc/nginx/conf.d/web.conf
Successfully deployed certificate for repo.pigsty.cc to /etc/nginx/conf.d/repo.conf
Successfully deployed certificate for g.pigsty.cc to /etc/nginx/conf.d/grafana.conf
Successfully deployed certificate for p.pigsty.cc to /etc/nginx/conf.d/prometheus.conf
Successfully deployed certificate for a.pigsty.cc to /etc/nginx/conf.d/alertmanager.conf
Congratulations! You have successfully enabled HTTPS on https://pigsty.cc, https://repo.pigsty.cc, https://g.pigsty.cc, https://p.pigsty.cc, and https://a.pigsty.cc

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
If you like Certbot, please consider supporting our work by:
 * Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
 * Donating to EFF:                    https://eff.org/donate-le
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

After the first application, you can skip these steps and use the command directly for future applications.


Configure a Scheduled Task to Renew the Certificate

By default, certificates are valid for three months, so you should renew them before they expire by using certbot renew.

To renew the certificate, run the following command:

certbot renew

Before executing for real, you can use the DryRun mode to test if the renewal works correctly:

certbot renew --dry-run

If you’ve modified the Nginx configuration file, make sure that Certbot’s changes do not interfere with your configuration.

You can set this command as a crontab job to run on the first day of each month at midnight to renew the certificate and print the log.


Caveats

Special attention should be paid to the SSL certificate for home. When you apply for a certificate for it, Certbot will modify the Nginx configuration file to redirect HTTP on port 80 to HTTPS on port 443. However, this will affect the default repo_upstream local software repository unless you make corresponding adjustments.