
Kernel Forks

How to use other PostgreSQL kernel forks, such as Citus, Babelfish, IvorySQL, and PolarDB, in Pigsty?

In Pigsty, you can replace the “native PG kernel” with different “flavors” of PostgreSQL forks to achieve special features and effects.

Pigsty supports various PostgreSQL kernels and compatible forks, enabling you to simulate different database systems while leveraging PostgreSQL’s ecosystem. Each kernel provides unique capabilities and compatibility layers.

| Kernel | Key Feature | Description |
|------------|------------------------|------------------------------------------------|
| PostgreSQL | Original Flavor | Vanilla PostgreSQL with 440 extensions |
| Citus | Horizontal Scaling | Distributed PostgreSQL via native extension |
| WiltonDB | SQL Server Compatible | SQL Server wire-protocol compatibility |
| IvorySQL | Oracle Compatible | Oracle syntax and PL/SQL compatibility |
| OpenHalo | MySQL Compatible | MySQL wire-protocol compatibility |
| Percona | Transparent Encryption | Percona Distribution with pg_tde |
| FerretDB | MongoDB Migration | MongoDB wire-protocol compatibility |
| OrioleDB | OLTP Optimization | Zheap, no bloat, S3 storage |
| PolarDB | Aurora-style RAC | RAC, China domestic compliance |
| Supabase | Backend as a Service | BaaS based on PostgreSQL, Firebase alternative |
| Cloudberry | MPP DW & Analytics | Massively parallel processing data warehouse |

1 - PostgreSQL

Vanilla PostgreSQL kernel with 440 extensions

PostgreSQL is the world’s most advanced and popular open-source database.

Pigsty supports PostgreSQL 13 ~ 18 and provides 440 PG extensions.


Quick Start

Install Pigsty using the pgsql configuration template.

./configure -c pgsql     # Use postgres kernel
./deploy.yml             # Set up everything with pigsty

Most configuration templates use the PostgreSQL kernel by default, for example:

  • meta : Default, postgres with core extensions (vector, postgis, timescale)
  • rich : postgres with all extensions installed
  • slim : postgres only, no monitoring infrastructure
  • full : 4-node sandbox for HA demonstration
  • pgsql : minimal postgres kernel configuration example
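For example, deploying the 4-node HA sandbox instead of the single-node default uses the same two commands with a different template (a sketch following the pattern above):

./configure -c full      # use the 4-node sandbox template
./deploy.yml             # set up the full sandbox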

Configuration

Vanilla PostgreSQL kernel requires no special adjustments:

pg-meta:
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary }
  vars:
    pg_cluster: pg-meta
    pg_users:
      - { name: dbuser_meta ,password: DBUser.Meta   ,pgbouncer: true ,roles: [dbrole_admin   ] ,comment: pigsty admin user }
      - { name: dbuser_view ,password: DBUser.Viewer ,pgbouncer: true ,roles: [dbrole_readonly] ,comment: read-only viewer  }
    pg_databases:
      - { name: meta, baseline: cmdb.sql ,comment: pigsty meta database ,schemas: [pigsty] ,extensions: [ vector ]}
    pg_hba_rules:
      - { user: dbuser_view , db: all ,addr: infra ,auth: pwd ,title: 'allow grafana dashboard access cmdb from infra nodes' }
    node_crontab: [ '00 01 * * * postgres /pg/bin/pg-backup full' ] # Full backup at 1 AM daily
    pg_packages: [ pgsql-main, pgsql-common ]   # pg kernel and common utilities
    #pg_extensions: [ pg18-time ,pg18-gis ,pg18-rag ,pg18-fts ,pg18-olap ,pg18-feat ,pg18-lang ,pg18-type ,pg18-util ,pg18-func ,pg18-admin ,pg18-stat ,pg18-sec ,pg18-fdw ,pg18-sim ,pg18-etl]

Version Selection

To use a different PostgreSQL major version, you can configure it using the -v parameter:

./configure -c pgsql            # Default is postgresql 18, no need to specify explicitly
./configure -c pgsql -v 17      # Use postgresql 17
./configure -c pgsql -v 16      # Use postgresql 16
./configure -c pgsql -v 15      # Use postgresql 15
./configure -c pgsql -v 14      # Use postgresql 14
./configure -c pgsql -v 13      # Use postgresql 13

If a PostgreSQL cluster is already installed, you need to uninstall it before installing a new version:

./pgsql-rm.yml # -l pg-meta
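After removal, redeploying with the target major version follows the same pattern as the initial install (a sketch combining the commands above; the -l limit assumes a cluster named pg-meta):

./pgsql-rm.yml -l pg-meta       # remove the existing pg-meta cluster
./configure -c pgsql -v 17      # regenerate configuration for postgresql 17
./pgsql.yml -l pg-meta          # provision the cluster on the new kernel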

Extension Ecosystem

Pigsty provides a rich extension ecosystem for PostgreSQL, including:

  • Time-series: timescaledb, pg_cron, periods
  • Geospatial: postgis, h3, pgrouting
  • Vector: pgvector, pgml, vchord
  • Search: pg_trgm, zhparser, pgroonga
  • Analytics: citus, pg_duckdb, pg_analytics
  • Features: age, pg_graphql, rum
  • Languages: plpython3u, pljava, plv8
  • Types: hstore, ltree, citext
  • Utilities: http, pg_net, pgjwt
  • Functions: pgcrypto, uuid-ossp, pg_uuidv7
  • Administration: pg_repack, pgagent, pg_squeeze
  • Statistics: pg_stat_statements, pg_qualstats, auto_explain
  • Security: pgaudit, pgcrypto, pgsodium
  • Foreign: postgres_fdw, mysql_fdw, oracle_fdw
  • Compatibility: orafce, babelfishpg_tds
  • Data: pglogical, wal2json, decoderbufs

For details, please refer to Extension Catalog.
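For example, installing and enabling an additional extension on an existing cluster follows the same pattern used throughout this document (a sketch using pgvector; the -e override form matches the Citus installation example later in this section):

./pgsql.yml -t pg_extension -e '{"pg_extensions":["pgvector"]}'   # install the extension package
psql meta -c 'CREATE EXTENSION vector;'                           # enable it in the meta database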

2 - Supabase

How to self-host Supabase with Pigsty, deploy an open-source Firebase alternative with a complete backend stack in one click.

Supabase — Build in a weekend, Scale to millions

Supabase is an open-source Firebase alternative that wraps PostgreSQL and provides authentication, out-of-the-box APIs, edge functions, real-time subscriptions, object storage, and vector embedding capabilities. This is a low-code all-in-one backend platform that lets you skip most backend development work, requiring only database design and frontend knowledge to quickly ship products!

Supabase’s motto is: “Build in a weekend, Scale to millions”. Indeed, Supabase is extremely cost-effective at small scales (4C8G), generous to the point of charity. But when you really do scale toward millions of users, you should seriously consider self-hosting Supabase, whether for functionality, performance, or cost reasons.

Pigsty provides you with a complete one-click self-hosting solution for Supabase. Self-hosted Supabase enjoys full PostgreSQL monitoring, IaC, PITR, and high availability, and compared to Supabase cloud services, it provides up to 440 out-of-the-box PostgreSQL extensions and can more fully utilize the performance and cost advantages of modern hardware.

For the complete self-hosting tutorial, please refer to: Supabase Self-Hosting Manual


Quick Start

Pigsty’s default supa.yml configuration template defines a single-node Supabase.

First, use Pigsty’s standard installation process to install the MinIO and PostgreSQL instances required for Supabase:

curl -fsSL https://repo.pigsty.io/get | bash
./bootstrap          # environment check, install dependencies
./configure -c supa  # Important: please modify passwords and other key information in the configuration file!
./deploy.yml         # install Pigsty, deploy PGSQL and MINIO!

Before deploying Supabase, please modify the Supabase parameters in the pigsty.yml configuration file according to your actual situation (mainly passwords!)

Then, run supabase.yml to complete the remaining work and deploy the stateless Supabase containers:

./supabase.yml       # install Docker and deploy stateless Supabase components!

For users in China, please configure appropriate Docker registry mirrors or proxy servers to pull DockerHub images from behind the GFW. The professional subscription also provides fully offline installation of Pigsty and Supabase without internet access.

Pigsty exposes web services through Nginx on the admin/INFRA node by default. Add a local DNS record pointing supa.pigsty to that node, then visit https://supa.pigsty in your browser to reach the Supabase Studio management interface.
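On your workstation, that DNS record can be as simple as an /etc/hosts entry (a sketch assuming the default infra node address 10.10.10.10 from the template):

sudo tee -a /etc/hosts <<< '10.10.10.10 supa.pigsty'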

Default username and password: supabase / pigsty


Architecture Overview

Pigsty uses the Docker Compose template provided by Supabase as a blueprint, extracting the stateless components to be handled by Docker Compose. The stateful database and object storage containers are replaced with external PostgreSQL clusters and MinIO services managed by Pigsty.

Supabase: Self-Hosting with Docker

After this transformation, Supabase itself is stateless: you can freely start and stop it, and even run multiple stateless Supabase deployments against the same PGSQL/MINIO simultaneously for scaling.

Pigsty uses a single-node PostgreSQL instance on the local machine as Supabase’s core backend database by default. For serious production deployments, we recommend using Pigsty to deploy a PG high-availability cluster with at least three nodes. Or at least use external object storage as a PITR backup repository for failover.

Pigsty uses the SNSD MinIO service on the local machine as file storage by default. For serious production environment deployments, you can use external S3-compatible object storage services, or use other multi-node multi-drive MinIO clusters independently deployed by Pigsty.


Configuration Details

When self-hosting Supabase, the directory app/supabase containing resources required for Docker Compose will be copied entirely to the target node (default supabase group) at /opt/supabase, and deployed in the background using docker compose up -d.

All configuration parameters are defined in the .env file and the docker-compose.yml template, but you usually don't need to modify these two files directly. Instead, specify .env entries via supa_config; they will automatically override or be appended to the final /opt/supabase/.env configuration file.

The most critical parameters here are jwt_secret, and the corresponding anon_key and service_role_key. For serious production use, please be sure to refer to the instructions and tools in the Supabase Self-Hosting Manual for settings. If you want to provide services using a domain name, you can specify your domain name in site_url, api_external_url, and supabase_public_url.
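For example, a sufficiently long random secret can be generated with standard tooling (a sketch; after changing jwt_secret, the anon_key and service_role_key must be re-signed against it as described in the manual):

openssl rand -hex 32     # 64 hex characters, well above the 32-character minimum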

Pigsty uses local MinIO by default. If you want to use S3 or MinIO as file storage, you need to configure parameters such as s3_bucket, s3_endpoint, s3_access_key, s3_secret_key.

Generally speaking, you also need to use an external SMTP service to send emails. Email services are not recommended for self-hosting, please consider using mature third-party services such as Mailchimp, Aliyun Mail Push, etc.

For users in mainland China, we recommend you configure docker_registry_mirrors mirror sites, or use proxy_env to specify available proxy servers to bypass GFW, otherwise pulling images from DockerHub may fail or be extremely slow!

# launch supabase stateless part with docker compose:
# ./supabase.yml
supabase:
  hosts:
    10.10.10.10: { supa_seq: 1 }  # instance id
  vars:
    supa_cluster: supa            # cluster name
    docker_enabled: true          # enable docker

    # use these to pull docker images via proxy and mirror registries
    #docker_registry_mirrors: ['https://docker.xxxxx.io']
    #proxy_env:   # add [OPTIONAL] proxy env to /etc/docker/daemon.json configuration file
    #  no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.*,*.myqcloud.com,*.tsinghua.edu.cn"
    #  #all_proxy: http://user:pass@host:port

    # these configuration entries will OVERWRITE or APPEND to /opt/supabase/.env file (src template: app/supabase/.env)
    # check https://github.com/pgsty/pigsty/blob/main/app/supabase/.env for default values
    supa_config:

      # IMPORTANT: CHANGE JWT_SECRET AND REGENERATE CREDENTIALS ACCORDINGLY!!!
      # https://supabase.com/docs/guides/self-hosting/docker#securing-your-services
      jwt_secret: your-super-secret-jwt-token-with-at-least-32-characters-long
      anon_key: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyAgCiAgICAicm9sZSI6ICJhbm9uIiwKICAgICJpc3MiOiAic3VwYWJhc2UtZGVtbyIsCiAgICAiaWF0IjogMTY0MTc2OTIwMCwKICAgICJleHAiOiAxNzk5NTM1NjAwCn0.dc_X5iR_VP_qT0zsiyj_I_OZ2T9FtRU2BBNWN8Bu4GE
      service_role_key: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyAgCiAgICAicm9sZSI6ICJzZXJ2aWNlX3JvbGUiLAogICAgImlzcyI6ICJzdXBhYmFzZS1kZW1vIiwKICAgICJpYXQiOiAxNjQxNzY5MjAwLAogICAgImV4cCI6IDE3OTk1MzU2MDAKfQ.DaYlNEoUrrEn2Ig7tqibS-PHK5vgusbcbo7X36XVt4Q
      dashboard_username: supabase
      dashboard_password: pigsty

      # postgres connection string (use the correct ip and port)
      postgres_host: 10.10.10.10
      postgres_port: 5436             # access via the 'default' service, which always routes to the primary postgres
      postgres_db: postgres
      postgres_password: DBUser.Supa  # password for supabase_admin and multiple supabase users

      # expose supabase via domain name
      site_url: http://supa.pigsty
      api_external_url: http://supa.pigsty
      supabase_public_url: http://supa.pigsty

      # if using s3/minio as file storage
      s3_bucket: supa
      s3_endpoint: https://sss.pigsty:9000
      s3_access_key: supabase
      s3_secret_key: S3User.Supabase
      s3_force_path_style: true
      s3_protocol: https
      s3_region: stub
      minio_domain_ip: 10.10.10.10  # sss.pigsty domain name will resolve to this ip statically

      # if using SMTP (optional)
      #smtp_admin_email: [email protected]
      #smtp_host: supabase-mail
      #smtp_port: 2500
      #smtp_user: fake_mail_user
      #smtp_pass: fake_mail_password
      #smtp_sender_name: fake_sender
      #enable_anonymous_users: false
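Once the playbook finishes, the stateless stack can be inspected with ordinary Docker Compose commands on the target node (a sketch, assuming the default /opt/supabase deployment directory):

cd /opt/supabase
docker compose ps        # list supabase service containers
docker compose logs -f   # follow service logs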



3 - Percona

Percona Postgres distribution with TDE transparent encryption support

Percona Postgres is a patched Postgres kernel with pg_tde (Transparent Data Encryption) extension.

It’s compatible with PostgreSQL 18.1 and available on all Pigsty-supported platforms.


Quick Start

Use Pigsty’s standard installation process with the pgtde configuration template.

curl -fsSL https://repo.pigsty.io/get | bash; cd ~/pigsty;
./configure -c pgtde     # Use percona postgres kernel
./deploy.yml             # Set up everything with pigsty

Configuration

The following parameters need to be adjusted to deploy a Percona cluster:

pg-meta:
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary }
  vars:
    pg_cluster: pg-meta
    pg_users:
      - { name: dbuser_meta ,password: DBUser.Meta   ,pgbouncer: true ,roles: [dbrole_admin   ] ,comment: pgsql admin user }
      - { name: dbuser_view ,password: DBUser.Viewer ,pgbouncer: true ,roles: [dbrole_readonly] ,comment: read-only viewer  }
    pg_databases:
      - name: meta
        baseline: cmdb.sql
        comment: pigsty tde database
        schemas: [pigsty]
        extensions: [ vector, postgis, pg_tde ,pgaudit, { name: pg_stat_monitor, schema: monitor } ]
    pg_hba_rules:
      - { user: dbuser_view , db: all ,addr: infra ,auth: pwd ,title: 'allow grafana dashboard access cmdb from infra nodes' }
    node_crontab: [ '00 01 * * * postgres /pg/bin/pg-backup full' ] # Full backup at 1 AM daily

    # Percona PostgreSQL TDE specific settings
    pg_packages: [ percona-main, pgsql-common ]  # Install percona postgres packages
    pg_libs: 'pg_tde, pgaudit, pg_stat_statements, pg_stat_monitor, auto_explain'

Extensions

Percona provides 80 available extensions, including pg_tde, pgvector, postgis, pgaudit, set_user, pg_stat_monitor, and other useful third-party extensions.

| Extension | Version | Description |
|-----------------|---------|--------------------------------------------------------------|
| pg_tde | 2.1 | Percona transparent data encryption access method |
| vector | 0.8.1 | Vector data type and ivfflat and hnsw access methods |
| postgis | 3.5.4 | PostGIS geometry and geography types and functions |
| pgaudit | 18.0 | Provides auditing functionality |
| pg_stat_monitor | 2.3 | PostgreSQL query performance monitoring tool |
| set_user | 4.2.0 | Similar to SET ROLE but with additional logging |
| pg_repack | 1.5.3 | Reorganize tables in PostgreSQL databases with minimal locks |
| hstore | 1.8 | Data type for storing sets of (key, value) pairs |
| ltree | 1.3 | Data type for hierarchical tree-like structures |
| pg_trgm | 1.6 | Text similarity measurement and index searching based on trigrams |

For the complete list of 80 extensions, please refer to the Percona Postgres official documentation.
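Once the cluster is up, encryption is opt-in per table via the tde_heap access method. The sketch below follows the general pattern from the pg_tde documentation; key-provider function names differ between pg_tde releases, so verify them against Percona's docs for your version:

-- enable the extension in the target database
CREATE EXTENSION IF NOT EXISTS pg_tde;
-- register a key provider and set the database key (names here are illustrative)
SELECT pg_tde_add_database_key_provider_file('local-keyring', '/var/lib/pg_tde/keyring.file');
SELECT pg_tde_set_key_using_database_key_provider('meta-key', 'local-keyring');
-- create an encrypted table with the tde_heap access method
CREATE TABLE secrets (id bigint PRIMARY KEY, payload text) USING tde_heap;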


Key Features

  • Transparent Data Encryption: Provides data-at-rest encryption using the pg_tde extension
  • PostgreSQL 18 Compatible: Based on the latest PostgreSQL 18 version
  • Enterprise Extensions: Includes enterprise-grade features like pgaudit, pg_stat_monitor
  • Complete Ecosystem: Supports popular extensions like pgvector, PostGIS

Note: Currently in the stable stage - still, thoroughly evaluate before production use.

4 - OpenHalo

MySQL compatible Postgres 14 fork

OpenHalo is an open-source PostgreSQL kernel that provides MySQL wire protocol compatibility.

OpenHalo is based on PostgreSQL 14.10 kernel version and provides wire protocol compatibility with MySQL 5.7.32-log / 8.0 versions.

Pigsty provides deployment support for OpenHalo on all supported Linux platforms.


Quick Start

Use Pigsty’s standard installation process with the mysql configuration template.

curl -fsSL https://repo.pigsty.io/get | bash; cd ~/pigsty;
./configure -c mysql    # Use MySQL (openHalo) configuration template
./deploy.yml            # Install, for production deployment please modify passwords in pigsty.yml first

For production deployment, ensure you modify the password parameters in the pigsty.yml configuration file before running the install playbook.


Configuration

pg-meta:
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary }
  vars:
    pg_cluster: pg-meta
    pg_users:
      - {name: dbuser_meta ,password: DBUser.Meta   ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: pigsty admin user }
      - {name: dbuser_view ,password: DBUser.Viewer ,pgbouncer: true ,roles: [dbrole_readonly] ,comment: read-only viewer for meta database }
    pg_databases:
      - {name: postgres, extensions: [aux_mysql]} # mysql compatible database
      - {name: meta ,baseline: cmdb.sql ,comment: pigsty meta database ,schemas: [pigsty]}
    pg_hba_rules:
      - {user: dbuser_view , db: all ,addr: infra ,auth: pwd ,title: 'allow grafana dashboard access cmdb from infra nodes'}
    node_crontab: [ '00 01 * * * postgres /pg/bin/pg-backup full' ] # Full backup at 1 AM daily

    # OpenHalo specific settings
    pg_mode: mysql                    # HaloDB's MySQL compatibility mode
    pg_version: 14                    # Current HaloDB compatible PG major version 14
    pg_packages: [ openhalodb, pgsql-common ]  # Install openhalodb instead of postgresql kernel

Usage

When accessing via the MySQL protocol, the actual connection uses the postgres database. Note that a “database” in MySQL corresponds to a “schema” in PostgreSQL, so USE mysql actually selects the mysql schema within the postgres database.

The username and password for MySQL are the same as in PostgreSQL. You can manage users and permissions using standard PostgreSQL methods.
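The mapping is easy to verify from both sides (a sketch; the table name t1 is illustrative):

mysql -h 127.0.0.1 -u dbuser_dba -e 'USE mysql; CREATE TABLE t1 (id INT PRIMARY KEY);'
psql postgres -c '\dt mysql.*'   # the same table appears as relation mysql.t1 in PostgreSQL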

Client Access

OpenHalo provides MySQL wire protocol compatibility, listening on port 3306 by default, allowing MySQL clients and drivers to connect directly.

Pigsty’s conf/mysql configuration installs the mysql client tool by default.

You can access MySQL using the following command:

mysql -h 127.0.0.1 -u dbuser_dba

Currently, OpenHalo officially ensures that Navicat can access this MySQL port properly, while connecting with JetBrains DataGrip currently results in errors.


Modification Notes

The OpenHalo kernel installed by Pigsty is based on the HaloTech-Co-Ltd/openHalo kernel with minor modifications:

  • Changed the default database name from halo0root back to postgres
  • Removed the 1.0. prefix from the default version number, restoring it to 14.10
  • Modified the default configuration file to enable MySQL compatibility and listen on port 3306 by default

Please note that Pigsty does not provide any warranty for using the OpenHalo kernel. Any issues or requirements encountered when using this kernel should be addressed with the original vendor.

Warning: Currently experimental - thoroughly evaluate before production use.

5 - OrioleDB

Next-generation OLTP engine for PostgreSQL

OrioleDB is a PostgreSQL storage engine extension that claims to provide 4x OLTP performance, no xid wraparound and table bloat issues, and “cloud-native” (data stored in S3) capabilities.

OrioleDB’s latest version is based on a patched PostgreSQL 17.0 kernel plus an additional extension.

You can run OrioleDB as an RDS using Pigsty. It’s compatible with PG 17 and available on all supported Linux platforms. The latest version is beta12, based on PG 17_11 patch.


Quick Start

Follow Pigsty’s standard installation process using the oriole configuration template.

curl -fsSL https://repo.pigsty.io/get | bash; cd ~/pigsty;
./configure -c oriole    # Use OrioleDB configuration template
./deploy.yml             # Install Pigsty with OrioleDB

For production deployment, ensure you modify the password parameters in the pigsty.yml configuration before running the install playbook.


Configuration

pg-meta:
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary }
  vars:
    pg_cluster: pg-meta
    pg_users:
      - {name: dbuser_meta ,password: DBUser.Meta   ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: pigsty admin user }
      - {name: dbuser_view ,password: DBUser.Viewer ,pgbouncer: true ,roles: [dbrole_readonly] ,comment: read-only viewer for meta database }
    pg_databases:
      - {name: meta ,baseline: cmdb.sql ,comment: pigsty meta database ,schemas: [pigsty], extensions: [orioledb]}
    pg_hba_rules:
      - {user: dbuser_view , db: all ,addr: infra ,auth: pwd ,title: 'allow grafana dashboard access cmdb from infra nodes'}
    node_crontab: [ '00 01 * * * postgres /pg/bin/pg-backup full' ] # Full backup at 1 AM daily

    # OrioleDB specific settings
    pg_mode: oriole                                         # oriole compatibility mode
    pg_packages: [ orioledb, pgsql-common ]                 # Install OrioleDB kernel
    pg_libs: 'orioledb, pg_stat_statements, auto_explain'   # Load OrioleDB extension

Usage

To use OrioleDB, you need to install the orioledb_17 and oriolepg_17 packages (currently only RPM versions are available).

Initialize TPC-B-like tables with pgbench at scale factor 100:

pgbench -is 100 meta                  # initialize pgbench tables at scale factor 100
pgbench -nv -P1 -c10 -S -T1000 meta   # read-only test, 10 clients
pgbench -nv -P1 -c50 -S -T1000 meta   # read-only test, 50 clients
pgbench -nv -P1 -c10    -T1000 meta   # read-write test, 10 clients
pgbench -nv -P1 -c50    -T1000 meta   # read-write test, 50 clients

Next, you can rebuild these tables using the orioledb storage engine and observe the performance difference:

-- Create OrioleDB tables
CREATE TABLE pgbench_accounts_o (LIKE pgbench_accounts INCLUDING ALL) USING orioledb;
CREATE TABLE pgbench_branches_o (LIKE pgbench_branches INCLUDING ALL) USING orioledb;
CREATE TABLE pgbench_history_o (LIKE pgbench_history INCLUDING ALL) USING orioledb;
CREATE TABLE pgbench_tellers_o (LIKE pgbench_tellers INCLUDING ALL) USING orioledb;

-- Copy data from regular tables to OrioleDB tables
INSERT INTO pgbench_accounts_o SELECT * FROM pgbench_accounts;
INSERT INTO pgbench_branches_o SELECT * FROM pgbench_branches;
INSERT INTO pgbench_history_o SELECT  * FROM pgbench_history;
INSERT INTO pgbench_tellers_o SELECT * FROM pgbench_tellers;

-- Drop original tables and rename OrioleDB tables
DROP TABLE pgbench_accounts, pgbench_branches, pgbench_history, pgbench_tellers;
ALTER TABLE pgbench_accounts_o RENAME TO pgbench_accounts;
ALTER TABLE pgbench_branches_o RENAME TO pgbench_branches;
ALTER TABLE pgbench_history_o RENAME TO pgbench_history;
ALTER TABLE pgbench_tellers_o RENAME TO pgbench_tellers;
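To confirm which storage engine each table now uses, read the access method from the standard PostgreSQL catalogs (a sketch):

SELECT c.relname, am.amname
FROM pg_class c JOIN pg_am am ON c.relam = am.oid
WHERE c.relname LIKE 'pgbench%';  -- expect amname = 'orioledb' for the rebuilt tables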

Key Features

  • No XID Wraparound: Eliminates transaction ID wraparound maintenance
  • No Table Bloat: Advanced storage management prevents table bloat
  • Cloud Storage: Native support for S3-compatible object storage
  • OLTP Optimized: Designed for transactional workloads
  • Improved Performance: Better space utilization and query performance

Note: Currently in Beta stage - thoroughly evaluate before production use.

6 - Citus

Deploy native high-availability Citus horizontally sharded clusters with Pigsty, seamlessly scaling PostgreSQL across multiple shards and accelerating OLTP/OLAP queries.

Pigsty natively supports Citus. This is a distributed horizontal scaling extension based on the native PostgreSQL kernel.


Installation

Citus is a PostgreSQL extension plugin that can be installed and enabled on a native PostgreSQL cluster following the standard plugin installation process.

./pgsql.yml -t pg_extension -e '{"pg_extensions":["citus"]}'

Configuration

To define a citus cluster, you need to specify the following parameters:

  • pg_mode must be set to citus instead of the default pgsql
  • You must define the shard name pg_shard and shard number pg_group on each shard cluster
  • You must define pg_primary_db to specify the database managed by Patroni
  • If you want to execute admin commands as the pg_dbsu user (postgres) instead of the default pg_admin_username, then pg_dbsu_password must be set to a non-empty plaintext password

Additionally, you need extra hba rules to allow SSL access from localhost and other data nodes.

You can define each Citus cluster as a separate group, like standard PostgreSQL clusters, as shown in conf/dbms/citus.yml:

all:
  children:
    pg-citus0: # citus shard 0
      hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus0 , pg_group: 0 }
    pg-citus1: # citus shard 1
      hosts: { 10.10.10.11: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus1 , pg_group: 1 }
    pg-citus2: # citus shard 2
      hosts: { 10.10.10.12: { pg_seq: 1, pg_role: primary } }
      vars: { pg_cluster: pg-citus2 , pg_group: 2 }
    pg-citus3: # citus shard 3
      hosts:
        10.10.10.13: { pg_seq: 1, pg_role: primary }
        10.10.10.14: { pg_seq: 2, pg_role: replica }
      vars: { pg_cluster: pg-citus3 , pg_group: 3 }
  vars:                               # Global parameters for all Citus clusters
    pg_mode: citus                    # pgsql cluster mode must be set to: citus
    pg_shard: pg-citus                # citus horizontal shard name: pg-citus
    pg_primary_db: meta               # citus database name: meta
    pg_dbsu_password: DBUser.Postgres # If using dbsu, you need to configure a password for it
    pg_users: [ { name: dbuser_meta ,password: DBUser.Meta ,pgbouncer: true ,roles: [ dbrole_admin ] } ]
    pg_databases: [ { name: meta ,extensions: [ { name: citus }, { name: postgis }, { name: timescaledb } ] } ]
    pg_hba_rules:
      - { user: 'all' ,db: all  ,addr: 127.0.0.1/32 ,auth: ssl ,title: 'all user ssl access from localhost' }
      - { user: 'all' ,db: all  ,addr: intra        ,auth: ssl ,title: 'all user ssl access from intranet'  }

You can also specify identity parameters for all Citus cluster members within a single group, as shown in prod.yml:

#==========================================================#
# pg-citus: 10 node citus cluster (5 x primary-replica pair)
#==========================================================#
pg-citus: # citus group
  hosts:
    10.10.10.50: { pg_group: 0, pg_cluster: pg-citus0 ,pg_vip_address: 10.10.10.60/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.51: { pg_group: 0, pg_cluster: pg-citus0 ,pg_vip_address: 10.10.10.60/24 ,pg_seq: 1, pg_role: replica }
    10.10.10.52: { pg_group: 1, pg_cluster: pg-citus1 ,pg_vip_address: 10.10.10.61/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.53: { pg_group: 1, pg_cluster: pg-citus1 ,pg_vip_address: 10.10.10.61/24 ,pg_seq: 1, pg_role: replica }
    10.10.10.54: { pg_group: 2, pg_cluster: pg-citus2 ,pg_vip_address: 10.10.10.62/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.55: { pg_group: 2, pg_cluster: pg-citus2 ,pg_vip_address: 10.10.10.62/24 ,pg_seq: 1, pg_role: replica }
    10.10.10.56: { pg_group: 3, pg_cluster: pg-citus3 ,pg_vip_address: 10.10.10.63/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.57: { pg_group: 3, pg_cluster: pg-citus3 ,pg_vip_address: 10.10.10.63/24 ,pg_seq: 1, pg_role: replica }
    10.10.10.58: { pg_group: 4, pg_cluster: pg-citus4 ,pg_vip_address: 10.10.10.64/24 ,pg_seq: 0, pg_role: primary }
    10.10.10.59: { pg_group: 4, pg_cluster: pg-citus4 ,pg_vip_address: 10.10.10.64/24 ,pg_seq: 1, pg_role: replica }
  vars:
    pg_mode: citus                    # pgsql cluster mode: citus
    pg_shard: pg-citus                # citus shard name: pg-citus
    pg_primary_db: test               # primary database used by citus
    pg_dbsu_password: DBUser.Postgres # all dbsu password access for citus cluster
    pg_vip_enabled: true
    pg_vip_interface: eth1
    pg_extensions: [ 'citus postgis timescaledb pgvector' ]
    pg_libs: 'citus, timescaledb, pg_stat_statements, auto_explain' # citus will be added by patroni automatically
    pg_users: [ { name: test ,password: test ,pgbouncer: true ,roles: [ dbrole_admin ] } ]
    pg_databases: [ { name: test ,owner: test ,extensions: [ { name: citus }, { name: postgis } ] } ]
    pg_hba_rules:
      - { user: 'all' ,db: all  ,addr: 10.10.10.0/24 ,auth: trust ,title: 'trust citus cluster members'        }
      - { user: 'all' ,db: all  ,addr: 127.0.0.1/32  ,auth: ssl   ,title: 'all user ssl access from localhost' }
      - { user: 'all' ,db: all  ,addr: intra         ,auth: ssl   ,title: 'all user ssl access from intranet'  }

Usage

You can access any node just like accessing a regular cluster:

pgbench -i postgres://test:test@pg-citus0/test
pgbench -nv -P1 -T1000 -c 2 postgres://test:test@pg-citus0/test

By default, changes you make to one Shard only occur on that cluster and are not synchronized to other Shards.

If you want to distribute writes across all Shards, you can use the API functions provided by Citus to mark tables as:

  • Distributed tables (automatic partitioning, requires specifying partition key)
  • Reference tables (full replication: does not require specifying partition key)

Starting from Citus 11.2, any Citus database node can play the role of coordinator, meaning any primary node can write:

psql -h pg-citus0 -d test -c "SELECT create_distributed_table('pgbench_accounts', 'aid'); SELECT truncate_local_data_after_distributing_table('public.pgbench_accounts');"
psql -h pg-citus0 -d test -c "SELECT create_reference_table('pgbench_branches')         ; SELECT truncate_local_data_after_distributing_table('public.pgbench_branches');"
psql -h pg-citus0 -d test -c "SELECT create_reference_table('pgbench_history')          ; SELECT truncate_local_data_after_distributing_table('public.pgbench_history');"
psql -h pg-citus0 -d test -c "SELECT create_reference_table('pgbench_tellers')          ; SELECT truncate_local_data_after_distributing_table('public.pgbench_tellers');"

After distributing the tables, you can also access them on other nodes:

psql -h pg-citus1 -d test -c '\dt+'

For example, a full table scan will show that the execution plan has become a distributed plan:

vagrant@meta-1:~$ psql -h pg-citus3 -d test -c 'explain select * from pgbench_accounts'
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Custom Scan (Citus Adaptive)  (cost=0.00..0.00 rows=100000 width=352)
   Task Count: 32
   Tasks Shown: One of 32
   ->  Task
         Node: host=10.10.10.52 port=5432 dbname=test
         ->  Seq Scan on pgbench_accounts_102008 pgbench_accounts  (cost=0.00..81.66 rows=3066 width=97)
(6 rows)

You can initiate writes from several different primary nodes:

pgbench -nv -P1 -T1000 -c 2 postgres://test:test@pg-citus1/test
pgbench -nv -P1 -T1000 -c 2 postgres://test:test@pg-citus2/test
pgbench -nv -P1 -T1000 -c 2 postgres://test:test@pg-citus3/test
pgbench -nv -P1 -T1000 -c 2 postgres://test:test@pg-citus4/test

When a node fails, the native high availability support provided by Patroni will promote the standby node and automatically take over.

test=# select * from  pg_dist_node;
 nodeid | groupid |  nodename   | nodeport | noderack | hasmetadata | isactive | noderole | nodecluster | metadatasynced | shouldhaveshards
--------+---------+-------------+----------+----------+-------------+----------+----------+-------------+----------------+------------------
      1 |       0 | 10.10.10.51 |     5432 | default  | t           | t        | primary  | default     | t              | f
      2 |       2 | 10.10.10.54 |     5432 | default  | t           | t        | primary  | default     | t              | t
      5 |       1 | 10.10.10.52 |     5432 | default  | t           | t        | primary  | default     | t              | t
      3 |       4 | 10.10.10.58 |     5432 | default  | t           | t        | primary  | default     | t              | t
      4 |       3 | 10.10.10.56 |     5432 | default  | t           | t        | primary  | default     | t              | t

7 - Babelfish

Create Microsoft SQL Server compatible PostgreSQL clusters using WiltonDB and Babelfish! (Wire protocol level compatibility)

Babelfish is an MSSQL (Microsoft SQL Server) compatibility solution based on PostgreSQL, open-sourced by AWS.


Overview

Pigsty allows users to create Microsoft SQL Server compatible PostgreSQL clusters using Babelfish and WiltonDB!

  • Babelfish: An MSSQL (Microsoft SQL Server) compatibility extension plugin open-sourced by AWS
  • WiltonDB: A PostgreSQL kernel distribution focusing on integrating Babelfish

Babelfish is a PostgreSQL extension, but it only works on a slightly modified PostgreSQL kernel fork. WiltonDB provides compiled fork kernel binaries and extension binary packages on EL/Ubuntu systems.

Pigsty can replace the native PostgreSQL kernel with WiltonDB, providing an out-of-the-box MSSQL compatible cluster. Using and managing an MSSQL cluster is no different from a standard PostgreSQL 15 cluster. You can use all the features provided by Pigsty, such as high availability, backup, monitoring, etc.

WiltonDB comes with several extension plugins including Babelfish, but cannot use native PostgreSQL extension plugins.

After the MSSQL compatible cluster starts, in addition to listening on the PostgreSQL default port, it also listens on the MSSQL default port 1433, providing MSSQL services via the TDS Wire Protocol on this port. You can connect to the MSSQL service provided by Pigsty using any MSSQL client, such as SQL Server Management Studio, or using the sqlcmd command-line tool.


Installation

WiltonDB conflicts with the native PostgreSQL kernel. Only one kernel can be installed on a node. Use the following command to install the WiltonDB kernel online.

./node.yml -t node_install -e '{"node_repo_modules":"local,mssql","node_packages":["wiltondb"]}'

Please note that WiltonDB is only available on EL and Ubuntu systems. Debian support is not currently provided.

The Pigsty Professional Edition provides offline installation packages for WiltonDB, which can be installed from local software sources.


Configuration

When installing and deploying the MSSQL module, please pay special attention to the following:

  • WiltonDB is available on EL (7/8/9) and Ubuntu (20.04/22.04), but not available on Debian systems.
  • WiltonDB is currently compiled based on PostgreSQL 15, so you need to specify pg_version: 15.
  • On EL systems, the wiltondb binary is installed by default in the /usr/bin/ directory, while on Ubuntu systems it is installed in the /usr/lib/postgresql/15/bin/ directory, which is different from the official PostgreSQL binary placement.
  • In WiltonDB compatibility mode, the HBA password authentication rule needs to use md5 instead of scram-sha-256. Therefore, you need to override Pigsty’s default HBA rule set and insert the md5 authentication rule required by SQL Server before the dbrole_readonly wildcard authentication rule.
  • WiltonDB can only be enabled for one primary database, and you should designate a user as the Babelfish superuser, allowing Babelfish to create databases and users. The default is mssql and dbuser_mssql. If you change this, please also modify the user in files/mssql.sql.
  • The WiltonDB TDS wire protocol compatibility plugin babelfishpg_tds needs to be enabled in shared_preload_libraries.
  • After enabling the WiltonDB extension, it listens on the MSSQL default port 1433. You can override Pigsty’s default service definitions to point the primary and replica services to port 1433 instead of 5432 / 6432.

The following parameters need to be configured for the MSSQL database cluster:

#----------------------------------#
# PGSQL & MSSQL (Babelfish & Wilton)
#----------------------------------#
# PG Installation
node_repo_modules: local,node,mssql # add mssql and os upstream repos
pg_mode: mssql                      # Microsoft SQL Server Compatible Mode
pg_libs: 'babelfishpg_tds, pg_stat_statements, auto_explain' # add babelfishpg_tds to shared_preload_libraries
pg_version: 15                      # The current WiltonDB major version is 15
pg_packages:
  - wiltondb                        # install forked version of postgresql with babelfishpg support
  - patroni pgbouncer pgbackrest pg_exporter pgbadger vip-manager
pg_extensions: []                   # do not install any vanilla postgresql extensions

# PG Provision
pg_default_hba_rules:               # overwrite default HBA rules for babelfish cluster
- {user: '${dbsu}'    ,db: all         ,addr: local     ,auth: ident ,title: 'dbsu access via local os user ident'  }
- {user: '${dbsu}'    ,db: replication ,addr: local     ,auth: ident ,title: 'dbsu replication from local os ident' }
- {user: '${repl}'    ,db: replication ,addr: localhost ,auth: pwd   ,title: 'replicator replication from localhost'}
- {user: '${repl}'    ,db: replication ,addr: intra     ,auth: pwd   ,title: 'replicator replication from intranet' }
- {user: '${repl}'    ,db: postgres    ,addr: intra     ,auth: pwd   ,title: 'replicator postgres db from intranet' }
- {user: '${monitor}' ,db: all         ,addr: localhost ,auth: pwd   ,title: 'monitor from localhost with password' }
- {user: '${monitor}' ,db: all         ,addr: infra     ,auth: pwd   ,title: 'monitor from infra host with password'}
- {user: '${admin}'   ,db: all         ,addr: infra     ,auth: ssl   ,title: 'admin @ infra nodes with pwd & ssl'   }
- {user: '${admin}'   ,db: all         ,addr: world     ,auth: ssl   ,title: 'admin @ everywhere with ssl & pwd'    }
- {user: dbuser_mssql ,db: mssql       ,addr: intra     ,auth: md5   ,title: 'allow mssql dbsu intranet access'     } # <--- use md5 auth method for mssql user
- {user: '+dbrole_readonly',db: all    ,addr: localhost ,auth: pwd   ,title: 'pgbouncer read/write via local socket'}
- {user: '+dbrole_readonly',db: all    ,addr: intra     ,auth: pwd   ,title: 'read/write biz user via password'     }
- {user: '+dbrole_offline' ,db: all    ,addr: intra     ,auth: pwd   ,title: 'allow etl offline tasks from intranet'}
pg_default_services:                # route primary & replica service to mssql port 1433
- { name: primary ,port: 5433 ,dest: 1433  ,check: /primary   ,selector: "[]" }
- { name: replica ,port: 5434 ,dest: 1433  ,check: /read-only ,selector: "[]" , backup: "[? pg_role == `primary` || pg_role == `offline` ]" }
- { name: default ,port: 5436 ,dest: postgres ,check: /primary   ,selector: "[]" }
- { name: offline ,port: 5438 ,dest: postgres ,check: /replica   ,selector: "[? pg_role == `offline` || pg_offline_query ]" , backup: "[? pg_role == `replica` && !pg_offline_query]"}

You can define MSSQL business databases and business users:

#----------------------------------#
# pgsql (singleton on current node)
#----------------------------------#
# this is an example single-node postgres cluster with postgis & timescaledb installed, with one biz database & two biz users
pg-meta:
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary } # <---- primary instance with read-write capability
  vars:
    pg_cluster: pg-meta
    pg_users:                           # create MSSQL superuser
      - {name: dbuser_mssql ,password: DBUser.MSSQL ,superuser: true, pgbouncer: true ,roles: [dbrole_admin], comment: superuser & owner for babelfish  }
    pg_primary_db: mssql                # use `mssql` as the primary sql server database
    pg_databases:
      - name: mssql
        baseline: mssql.sql             # init babelfish database & user
        extensions:
          - { name: uuid-ossp          }
          - { name: babelfishpg_common }
          - { name: babelfishpg_tsql   }
          - { name: babelfishpg_tds    }
          - { name: babelfishpg_money  }
          - { name: pg_hint_plan       }
          - { name: system_stats       }
          - { name: tds_fdw            }
        owner: dbuser_mssql
        parameters: { 'babelfishpg_tsql.migration_mode' : 'multi-db' }
        comment: babelfish cluster, a MSSQL compatible pg cluster

Access

You can use any SQL Server compatible client tool to access this database cluster.

Microsoft provides sqlcmd as the official command-line tool.

In addition, they also provide a Go version command-line tool go-sqlcmd.

Install go-sqlcmd:

curl -LO https://github.com/microsoft/go-sqlcmd/releases/download/v1.4.0/sqlcmd-v1.4.0-linux-amd64.tar.bz2
tar xjvf sqlcmd-v1.4.0-linux-amd64.tar.bz2
sudo mv sqlcmd* /usr/bin/

Quick start with go-sqlcmd:

$ sqlcmd -S 10.10.10.10,1433 -U dbuser_mssql -P DBUser.MSSQL
1> select @@version
2> go
version
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Babelfish for PostgreSQL with SQL Server Compatibility - 12.0.2000.8
Oct 22 2023 17:48:32
Copyright (c) Amazon Web Services
PostgreSQL 15.4 (EL 1:15.4.wiltondb3.3_2-2.el8) on x86_64-redhat-linux-gnu (Babelfish 3.3.0)

(1 row affected)

Using the service mechanism provided by Pigsty, you can use ports 5433 / 5434 to always connect to port 1433 on the primary/replica.

# Access port 5433 on any cluster member, pointing to port 1433 MSSQL port on the primary
sqlcmd -S 10.10.10.11,5433 -U dbuser_mssql -P DBUser.MSSQL

# Access port 5434 on any cluster member, pointing to port 1433 MSSQL port on any readable replica
sqlcmd -S 10.10.10.11,5434 -U dbuser_mssql -P DBUser.MSSQL

Extensions

Most of the PGSQL module’s extensions (apart from pure-SQL ones) cannot be used directly on the MSSQL module’s WiltonDB kernel; they would need to be recompiled against it.

Currently, WiltonDB comes with the following extension plugins. In addition to PostgreSQL Contrib extensions and the four BabelfishPG core extensions, it also provides three third-party extensions: pg_hint_plan, tds_fdw, and system_stats.

| Extension Name | Version | Description |
|----------------------|---------|---------------------------------------------------------------------|
| dblink | 1.2 | connect to other PostgreSQL databases from within a database |
| adminpack | 2.1 | administrative functions for PostgreSQL |
| dict_int | 1.0 | text search dictionary template for integers |
| intagg | 1.1 | integer aggregator and enumerator (obsolete) |
| dict_xsyn | 1.0 | text search dictionary template for extended synonym processing |
| amcheck | 1.3 | functions for verifying relation integrity |
| autoinc | 1.0 | functions for autoincrementing fields |
| bloom | 1.0 | bloom access method - signature file based index |
| fuzzystrmatch | 1.1 | determine similarities and distance between strings |
| intarray | 1.5 | functions, operators, and index support for 1-D arrays of integers |
| btree_gin | 1.3 | support for indexing common datatypes in GIN |
| btree_gist | 1.7 | support for indexing common datatypes in GiST |
| hstore | 1.8 | data type for storing sets of (key, value) pairs |
| hstore_plperl | 1.0 | transform between hstore and plperl |
| isn | 1.2 | data types for international product numbering standards |
| hstore_plperlu | 1.0 | transform between hstore and plperlu |
| jsonb_plperl | 1.0 | transform between jsonb and plperl |
| citext | 1.6 | data type for case-insensitive character strings |
| jsonb_plperlu | 1.0 | transform between jsonb and plperlu |
| jsonb_plpython3u | 1.0 | transform between jsonb and plpython3u |
| cube | 1.5 | data type for multidimensional cubes |
| hstore_plpython3u | 1.0 | transform between hstore and plpython3u |
| earthdistance | 1.1 | calculate great-circle distances on the surface of the Earth |
| lo | 1.1 | Large Object maintenance |
| file_fdw | 1.0 | foreign-data wrapper for flat file access |
| insert_username | 1.0 | functions for tracking who changed a table |
| ltree | 1.2 | data type for hierarchical tree-like structures |
| ltree_plpython3u | 1.0 | transform between ltree and plpython3u |
| pg_walinspect | 1.0 | functions to inspect contents of PostgreSQL Write-Ahead Log |
| moddatetime | 1.0 | functions for tracking last modification time |
| old_snapshot | 1.0 | utilities in support of old_snapshot_threshold |
| pgcrypto | 1.3 | cryptographic functions |
| pgrowlocks | 1.2 | show row-level locking information |
| pageinspect | 1.11 | inspect the contents of database pages at a low level |
| pg_surgery | 1.0 | extension to perform surgery on a damaged relation |
| seg | 1.4 | data type for representing line segments or floating-point intervals |
| pgstattuple | 1.5 | show tuple-level statistics |
| pg_buffercache | 1.3 | examine the shared buffer cache |
| pg_freespacemap | 1.2 | examine the free space map (FSM) |
| postgres_fdw | 1.1 | foreign-data wrapper for remote PostgreSQL servers |
| pg_prewarm | 1.2 | prewarm relation data |
| tcn | 1.0 | Triggered change notifications |
| pg_trgm | 1.6 | text similarity measurement and index searching based on trigrams |
| xml2 | 1.1 | XPath querying and XSLT |
| refint | 1.0 | functions for implementing referential integrity (obsolete) |
| pg_visibility | 1.2 | examine the visibility map (VM) and page-level visibility info |
| pg_stat_statements | 1.10 | track planning and execution statistics of all SQL statements executed |
| sslinfo | 1.2 | information about SSL certificates |
| tablefunc | 1.0 | functions that manipulate whole tables, including crosstab |
| tsm_system_rows | 1.0 | TABLESAMPLE method which accepts number of rows as a limit |
| tsm_system_time | 1.0 | TABLESAMPLE method which accepts time in milliseconds as a limit |
| unaccent | 1.1 | text search dictionary that removes accents |
| uuid-ossp | 1.1 | generate universally unique identifiers (UUIDs) |
| plpgsql | 1.0 | PL/pgSQL procedural language |
| babelfishpg_money | 1.1.0 | babelfishpg_money |
| system_stats | 2.0 | EnterpriseDB system statistics for PostgreSQL |
| tds_fdw | 2.0.3 | Foreign data wrapper for querying a TDS database (Sybase or Microsoft SQL Server) |
| babelfishpg_common | 3.3.3 | Transact SQL Datatype Support |
| babelfishpg_tds | 1.0.0 | TDS protocol extension |
| pg_hint_plan | 1.5.1 | |
| babelfishpg_tsql | 3.3.1 | Transact SQL compatibility |
  • The Pigsty Professional Edition provides offline installation capabilities for MSSQL compatible modules
  • Pigsty Professional Edition provides optional MSSQL compatible kernel extension porting and customization services, which can port extensions available in the PGSQL module to MSSQL clusters.

8 - IvorySQL

Use HighGo’s open-source IvorySQL kernel to achieve Oracle syntax/PLSQL compatibility based on PostgreSQL clusters.

IvorySQL is an open-source PostgreSQL kernel fork that aims to provide “Oracle compatibility” based on PG.


Overview

The IvorySQL kernel is supported in the Pigsty open-source version. Your server needs internet access to download relevant packages directly from IvorySQL’s official repository.

Please note that adding IvorySQL directly to Pigsty’s default software repository will affect the installation of the native PostgreSQL kernel. Pigsty Professional Edition provides offline installation solutions including the IvorySQL kernel.

The current latest version of IvorySQL is 5.0, corresponding to PostgreSQL version 18. Please note that IvorySQL is currently only available on EL8/EL9.

The last IvorySQL version supporting EL7 was 3.3, corresponding to PostgreSQL 16.3; the last version based on PostgreSQL 17 is IvorySQL 4.4.


Installation

If your environment has internet access, you can add the IvorySQL repository directly to the node using the following method, then execute the PGSQL playbook for installation:

./node.yml -t node_repo -e '{"node_repo_modules":"local,node,pgsql,ivory"}'

Configuration

The following parameters need to be configured for IvorySQL database clusters:

#----------------------------------#
# Ivory SQL Configuration
#----------------------------------#
node_repo_modules: local,node,pgsql,ivory  # add ivorysql upstream repo
pg_mode: ivory                    # IvorySQL Oracle Compatible Mode
pg_packages: [ 'ivorysql patroni pgbouncer pgbackrest pg_exporter pgbadger vip-manager' ]
pg_libs: 'liboracle_parser, pg_stat_statements, auto_explain'
pg_extensions: [ ]                # do not install any vanilla postgresql extensions

When using Oracle compatibility mode, the liboracle_parser library must be dynamically loaded, as in pg_libs above.


Client Access

IvorySQL behaves like its corresponding PostgreSQL major version, and any client tool compatible with the PostgreSQL wire protocol can access IvorySQL clusters.
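As a quick smoke test of the Oracle compatibility layer (a sketch, assuming Oracle compatible mode is active as configured above; sysdate and dual are Oracle constructs emulated by IvorySQL):

psql postgres -c 'SELECT sysdate FROM dual;'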


Extension List

Most of the PGSQL module’s extensions (apart from pure-SQL ones) cannot be used directly on the IvorySQL kernel. If you need them, please recompile and install them from source against the new kernel.

Currently, the IvorySQL kernel comes with 101 extension plugins.


Please note that Pigsty does not assume any warranty responsibility for using the IvorySQL kernel. Any issues or requirements encountered when using this kernel should be addressed with the original vendor.

9 - PolarDB PG

Using Alibaba Cloud’s open-source PolarDB for PostgreSQL kernel to provide domestic innovation qualification support, with Oracle RAC-like user experience.

Overview

Pigsty allows you to create PostgreSQL clusters with “domestic innovation qualification” credentials using PolarDB!

PolarDB for PostgreSQL is essentially equivalent to PostgreSQL 15. Any client tool compatible with the PostgreSQL wire protocol can access PolarDB clusters.

Pigsty’s PGSQL repository provides PolarDB PG open-source installation packages for EL7 / EL8, but they are not downloaded to the local software repository during Pigsty installation.

If you need offline installation support for PolarDB PG, please consider our professional subscription service


Installation

If your environment has internet access, you can add the Pigsty PGSQL and dependency repositories to the node using the following method:

node_repo_modules: local,node,pgsql

Then in pg_packages, replace the native postgresql package with polardb.


Configuration

The following parameters need special configuration for PolarDB database clusters:

#----------------------------------#
# PGSQL & PolarDB
#----------------------------------#
pg_version: 15
pg_packages: [ 'polardb patroni pgbouncer pgbackrest pg_exporter pgbadger vip-manager' ]
pg_extensions: [ ]                # do not install any vanilla postgresql extensions
pg_mode: polar                    # PolarDB Compatible Mode
pg_default_roles:                 # default roles and users in postgres cluster
  - { name: dbrole_readonly  ,login: false ,comment: role for global read-only access     }
  - { name: dbrole_offline   ,login: false ,comment: role for restricted read-only access }
  - { name: dbrole_readwrite ,login: false ,roles: [dbrole_readonly] ,comment: role for global read-write access }
  - { name: dbrole_admin     ,login: false ,roles: [pg_monitor, dbrole_readwrite] ,comment: role for object creation }
  - { name: postgres     ,superuser: true  ,comment: system superuser }
  - { name: replicator   ,superuser: true  ,replication: true ,roles: [pg_monitor, dbrole_readonly] ,comment: system replicator } # <- superuser is required for replication
  - { name: dbuser_dba   ,superuser: true  ,roles: [dbrole_admin]  ,pgbouncer: true ,pool_mode: session, pool_connlimit: 16 ,comment: pgsql admin user }
  - { name: dbuser_monitor ,roles: [pg_monitor] ,pgbouncer: true ,parameters: {log_min_duration_statement: 1000 } ,pool_mode: session ,pool_connlimit: 8 ,comment: pgsql monitor user }

Note particularly that PolarDB PG requires the replicator replication user to be a Superuser, unlike native PG.


Extension List

Most PGSQL module extensions (apart from pure-SQL ones) cannot be used directly on the PolarDB kernel. If needed, please recompile and install them from source against the new kernel.

Currently, the PolarDB kernel comes with the following 61 extension plugins. Apart from Contrib extensions, the additional extensions provided include:

  • polar_csn 1.0 : polar_csn
  • polar_monitor 1.2 : examine the polardb information
  • polar_monitor_preload 1.1 : examine the polardb information
  • polar_parameter_check 1.0 : kernel extension for parameter validation
  • polar_px 1.0 : Parallel Execution extension
  • polar_stat_env 1.0 : env stat functions for PolarDB
  • polar_stat_sql 1.3 : Kernel statistics gathering, and sql plan nodes information gathering
  • polar_tde_utils 1.0 : Internal extension for TDE
  • polar_vfs 1.0 : polar_vfs
  • polar_worker 1.0 : polar_worker
  • timetravel 1.0 : functions for implementing time travel
  • vector 0.5.1 : vector data type and ivfflat and hnsw access methods
  • smlar 1.0 : compute similary of any one-dimensional arrays

Complete list of available PolarDB plugins:

| name | version | comment |
|-----------------------|---------|---------------------------------------------------------------------|
| hstore_plpython2u | 1.0 | transform between hstore and plpython2u |
| dict_int | 1.0 | text search dictionary template for integers |
| adminpack | 2.0 | administrative functions for PostgreSQL |
| hstore_plpython3u | 1.0 | transform between hstore and plpython3u |
| amcheck | 1.1 | functions for verifying relation integrity |
| hstore_plpythonu | 1.0 | transform between hstore and plpythonu |
| autoinc | 1.0 | functions for autoincrementing fields |
| insert_username | 1.0 | functions for tracking who changed a table |
| bloom | 1.0 | bloom access method - signature file based index |
| file_fdw | 1.0 | foreign-data wrapper for flat file access |
| dblink | 1.2 | connect to other PostgreSQL databases from within a database |
| btree_gin | 1.3 | support for indexing common datatypes in GIN |
| fuzzystrmatch | 1.1 | determine similarities and distance between strings |
| lo | 1.1 | Large Object maintenance |
| intagg | 1.1 | integer aggregator and enumerator (obsolete) |
| btree_gist | 1.5 | support for indexing common datatypes in GiST |
| hstore | 1.5 | data type for storing sets of (key, value) pairs |
| intarray | 1.2 | functions, operators, and index support for 1-D arrays of integers |
| citext | 1.5 | data type for case-insensitive character strings |
| cube | 1.4 | data type for multidimensional cubes |
| hstore_plperl | 1.0 | transform between hstore and plperl |
| isn | 1.2 | data types for international product numbering standards |
| jsonb_plperl | 1.0 | transform between jsonb and plperl |
| dict_xsyn | 1.0 | text search dictionary template for extended synonym processing |
| hstore_plperlu | 1.0 | transform between hstore and plperlu |
| earthdistance | 1.1 | calculate great-circle distances on the surface of the Earth |
| pg_prewarm | 1.2 | prewarm relation data |
| jsonb_plperlu | 1.0 | transform between jsonb and plperlu |
| pg_stat_statements | 1.6 | track execution statistics of all SQL statements executed |
| jsonb_plpython2u | 1.0 | transform between jsonb and plpython2u |
| jsonb_plpython3u | 1.0 | transform between jsonb and plpython3u |
| jsonb_plpythonu | 1.0 | transform between jsonb and plpythonu |
| pg_trgm | 1.4 | text similarity measurement and index searching based on trigrams |
| pgstattuple | 1.5 | show tuple-level statistics |
| ltree | 1.1 | data type for hierarchical tree-like structures |
| ltree_plpython2u | 1.0 | transform between ltree and plpython2u |
| pg_visibility | 1.2 | examine the visibility map (VM) and page-level visibility info |
| ltree_plpython3u | 1.0 | transform between ltree and plpython3u |
| ltree_plpythonu | 1.0 | transform between ltree and plpythonu |
| seg | 1.3 | data type for representing line segments or floating-point intervals |
| moddatetime | 1.0 | functions for tracking last modification time |
| pgcrypto | 1.3 | cryptographic functions |
| pgrowlocks | 1.2 | show row-level locking information |
| pageinspect | 1.7 | inspect the contents of database pages at a low level |
| pg_buffercache | 1.3 | examine the shared buffer cache |
| pg_freespacemap | 1.2 | examine the free space map (FSM) |
| tcn | 1.0 | Triggered change notifications |
| plperl | 1.0 | PL/Perl procedural language |
| uuid-ossp | 1.1 | generate universally unique identifiers (UUIDs) |
| plperlu | 1.0 | PL/PerlU untrusted procedural language |
| refint | 1.0 | functions for implementing referential integrity (obsolete) |
| xml2 | 1.1 | XPath querying and XSLT |
| plpgsql | 1.0 | PL/pgSQL procedural language |
| plpython3u | 1.0 | PL/Python3U untrusted procedural language |
| pltcl | 1.0 | PL/Tcl procedural language |
| pltclu | 1.0 | PL/TclU untrusted procedural language |
| polar_csn | 1.0 | polar_csn |
| sslinfo | 1.2 | information about SSL certificates |
| polar_monitor | 1.2 | examine the polardb information |
| polar_monitor_preload | 1.1 | examine the polardb information |
| polar_parameter_check | 1.0 | kernel extension for parameter validation |
| polar_px | 1.0 | Parallel Execution extension |
| tablefunc | 1.0 | functions that manipulate whole tables, including crosstab |
| polar_stat_env | 1.0 | env stat functions for PolarDB |
| smlar | 1.0 | compute similary of any one-dimensional arrays |
| timetravel | 1.0 | functions for implementing time travel |
| tsm_system_rows | 1.0 | TABLESAMPLE method which accepts number of rows as a limit |
| polar_stat_sql | 1.3 | Kernel statistics gathering, and sql plan nodes information gathering |
| tsm_system_time | 1.0 | TABLESAMPLE method which accepts time in milliseconds as a limit |
| polar_tde_utils | 1.0 | Internal extension for TDE |
| polar_vfs | 1.0 | polar_vfs |
| polar_worker | 1.0 | polar_worker |
| unaccent | 1.1 | text search dictionary that removes accents |
| postgres_fdw | 1.0 | foreign-data wrapper for remote PostgreSQL servers |
  • Pigsty Professional Edition provides PolarDB offline installation support, extension plugin compilation support, and monitoring and management support specifically adapted for PolarDB clusters.
  • Pigsty collaborates with the Alibaba Cloud kernel team and can provide paid kernel backup support services.

10 - PolarDB Oracle

Using Alibaba Cloud’s commercial PolarDB for Oracle kernel (closed source, PG14, only available in special enterprise edition customization)

With PolarDB, Pigsty lets you create database clusters that carry “domestic innovation qualification” credentials!

According to the Security and Reliability Evaluation Results Announcement (No. 1, 2023), Appendix 3 (Centralized Databases), PolarDB v2.0 is an autonomous, controllable, secure, and reliable domestic innovation database.

PolarDB for Oracle is an Oracle-compatible version developed based on PolarDB for PostgreSQL. Both share the same kernel, distinguished by the --compatibility-mode parameter.
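For illustration, here is a hedged sketch of how the two dialects might be selected when initializing a cluster. The --compatibility-mode flag itself comes from PolarDB, but the command form and flag values below are assumptions; verify them against your kernel release:

# hypothetical: the same PolarDB kernel initialized in two dialects
initdb --compatibility-mode=oracle   -D /pg/data    # PolarDB for Oracle flavor (assumed value)
initdb --compatibility-mode=postgres -D /pg/data    # PolarDB for PostgreSQL flavor (assumed value)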

We collaborate with the Alibaba Cloud kernel team to provide a complete database solution based on PolarDB v2.0 kernel and Pigsty v3.0 RDS. Please contact sales for inquiries, or purchase on Alibaba Cloud Marketplace.

The PolarDB for Oracle kernel is currently only available on EL systems.


Extensions

Currently, the PolarDB 2.0 (Oracle-compatible) kernel ships with the following 188 extensions:

name | default_version | comment
cube | 1.5 | data type for multidimensional cubes
ip4r | 2.4 | NULL
adminpack | 2.1 | administrative functions for PostgreSQL
dict_xsyn | 1.0 | text search dictionary template for extended synonym processing
amcheck | 1.4 | functions for verifying relation integrity
autoinc | 1.0 | functions for autoincrementing fields
hstore | 1.8 | data type for storing sets of (key, value) pairs
bloom | 1.0 | bloom access method - signature file based index
earthdistance | 1.1 | calculate great-circle distances on the surface of the Earth
hstore_plperl | 1.0 | transform between hstore and plperl
bool_plperl | 1.0 | transform between bool and plperl
file_fdw | 1.0 | foreign-data wrapper for flat file access
bool_plperlu | 1.0 | transform between bool and plperlu
fuzzystrmatch | 1.1 | determine similarities and distance between strings
hstore_plperlu | 1.0 | transform between hstore and plperlu
btree_gin | 1.3 | support for indexing common datatypes in GIN
hstore_plpython2u | 1.0 | transform between hstore and plpython2u
btree_gist | 1.6 | support for indexing common datatypes in GiST
hll | 2.17 | type for storing hyperloglog data
hstore_plpython3u | 1.0 | transform between hstore and plpython3u
citext | 1.6 | data type for case-insensitive character strings
hstore_plpythonu | 1.0 | transform between hstore and plpythonu
hypopg | 1.3.1 | Hypothetical indexes for PostgreSQL
insert_username | 1.0 | functions for tracking who changed a table
dblink | 1.2 | connect to other PostgreSQL databases from within a database
decoderbufs | 0.1.0 | Logical decoding plugin that delivers WAL stream changes using a Protocol Buffer format
intagg | 1.1 | integer aggregator and enumerator (obsolete)
dict_int | 1.0 | text search dictionary template for integers
intarray | 1.5 | functions, operators, and index support for 1-D arrays of integers
isn | 1.2 | data types for international product numbering standards
jsonb_plperl | 1.0 | transform between jsonb and plperl
jsonb_plperlu | 1.0 | transform between jsonb and plperlu
jsonb_plpython2u | 1.0 | transform between jsonb and plpython2u
jsonb_plpython3u | 1.0 | transform between jsonb and plpython3u
jsonb_plpythonu | 1.0 | transform between jsonb and plpythonu
lo | 1.1 | Large Object maintenance
log_fdw | 1.0 | foreign-data wrapper for csvlog
ltree | 1.2 | data type for hierarchical tree-like structures
ltree_plpython2u | 1.0 | transform between ltree and plpython2u
ltree_plpython3u | 1.0 | transform between ltree and plpython3u
ltree_plpythonu | 1.0 | transform between ltree and plpythonu
moddatetime | 1.0 | functions for tracking last modification time
old_snapshot | 1.0 | utilities in support of old_snapshot_threshold
oracle_fdw | 1.2 | foreign data wrapper for Oracle access
oss_fdw | 1.1 | foreign-data wrapper for OSS access
pageinspect | 2.1 | inspect the contents of database pages at a low level
pase | 0.0.1 | ant ai similarity search
pg_bigm | 1.2 | text similarity measurement and index searching based on bigrams
pg_freespacemap | 1.2 | examine the free space map (FSM)
pg_hint_plan | 1.4 | controls execution plan with hinting phrases in comment of special form
pg_buffercache | 1.5 | examine the shared buffer cache
pg_prewarm | 1.2 | prewarm relation data
pg_repack | 1.4.8-1 | Reorganize tables in PostgreSQL databases with minimal locks
pg_sphere | 1.0 | spherical objects with useful functions, operators and index support
pg_cron | 1.5 | Job scheduler for PostgreSQL
pg_jieba | 1.1.0 | a parser for full-text search of Chinese
pg_stat_kcache | 2.2.1 | Kernel statistics gathering
pg_stat_statements | 1.9 | track planning and execution statistics of all SQL statements executed
pg_surgery | 1.0 | extension to perform surgery on a damaged relation
pg_trgm | 1.6 | text similarity measurement and index searching based on trigrams
pg_visibility | 1.2 | examine the visibility map (VM) and page-level visibility info
pg_wait_sampling | 1.1 | sampling based statistics of wait events
pgaudit | 1.6.2 | provides auditing functionality
pgcrypto | 1.3 | cryptographic functions
pgrowlocks | 1.2 | show row-level locking information
pgstattuple | 1.5 | show tuple-level statistics
pgtap | 1.2.0 | Unit testing for PostgreSQL
pldbgapi | 1.1 | server-side support for debugging PL/pgSQL functions
plperl | 1.0 | PL/Perl procedural language
plperlu | 1.0 | PL/PerlU untrusted procedural language
plpgsql | 1.0 | PL/pgSQL procedural language
plpython2u | 1.0 | PL/Python2U untrusted procedural language
plpythonu | 1.0 | PL/PythonU untrusted procedural language
plsql | 1.0 | Oracle compatible PL/SQL procedural language
pltcl | 1.0 | PL/Tcl procedural language
pltclu | 1.0 | PL/TclU untrusted procedural language
polar_bfile | 1.0 | The BFILE data type enables access to binary file LOBs that are stored in file systems outside Database
polar_bpe | 1.0 | polar_bpe
polar_builtin_cast | 1.1 | Internal extension for builtin casts
polar_builtin_funcs | 2.0 | implement polar builtin functions
polar_builtin_type | 1.5 | polar_builtin_type for PolarDB
polar_builtin_view | 1.5 | polar_builtin_view
polar_catalog | 1.2 | polardb pg extend catalog
polar_channel | 1.0 | polar_channel
polar_constraint | 1.0 | polar_constraint
polar_csn | 1.0 | polar_csn
polar_dba_views | 1.0 | polar_dba_views
polar_dbms_alert | 1.2 | implement polar_dbms_alert - supports asynchronous notification of database events.
polar_dbms_application_info | 1.0 | implement polar_dbms_application_info - record names of executing modules or transactions in the database.
polar_dbms_pipe | 1.1 | implements polar_dbms_pipe - package lets two or more sessions in the same instance communicate.
polar_dbms_aq | 1.2 | implement dbms_aq - provides an interface to Advanced Queuing.
polar_dbms_lob | 1.3 | implement dbms_lob - provides subprograms to operate on BLOBs, CLOBs, and NCLOBs.
polar_dbms_output | 1.2 | implement polar_dbms_output - enables you to send messages from stored procedures.
polar_dbms_lock | 1.0 | implement polar_dbms_lock - provides an interface to Oracle Lock Management services.
polar_dbms_aqadm | 1.3 | polar_dbms_aqadm - procedures to manage Advanced Queuing configuration and administration information.
polar_dbms_assert | 1.0 | implement polar_dbms_assert - provide an interface to validate properties of the input value.
polar_dbms_metadata | 1.0 | implement polar_dbms_metadata - provides a way for you to retrieve metadata from the database dictionary.
polar_dbms_random | 1.0 | implement polar_dbms_random - a built-in random number generator, not intended for cryptography
polar_dbms_crypto | 1.1 | implement dbms_crypto - provides an interface to encrypt and decrypt stored data.
polar_dbms_redact | 1.0 | implement polar_dbms_redact - provides an interface to mask data from queries by an application.
polar_dbms_debug | 1.1 | server-side support for debugging PL/SQL functions
polar_dbms_job | 1.0 | polar_dbms_job
polar_dbms_mview | 1.1 | implement polar_dbms_mview - enables to refresh materialized views.
polar_dbms_job_preload | 1.0 | polar_dbms_job_preload
polar_dbms_obfuscation_toolkit | 1.1 | implement polar_dbms_obfuscation_toolkit - enables an application to get data md5.
polar_dbms_rls | 1.1 | implement polar_dbms_rls - a fine-grained access control administrative built-in package
polar_multi_toast_utils | 1.0 | polar_multi_toast_utils
polar_dbms_session | 1.2 | implement polar_dbms_session - support to set preferences and security levels.
polar_odciconst | 1.0 | implement ODCIConst - Provide some built-in constants in Oracle.
polar_dbms_sql | 1.2 | implement polar_dbms_sql - provides an interface to execute dynamic SQL.
polar_osfs_toolkit | 1.0 | osfs library tools and functions extension
polar_dbms_stats | 14.0 | stabilize plans by fixing statistics
polar_monitor | 1.5 | monitor functions for PolarDB
polar_osfs_utils | 1.0 | osfs library utils extension
polar_dbms_utility | 1.3 | implement polar_dbms_utility - provides various utility subprograms.
polar_parameter_check | 1.0 | kernel extension for parameter validation
polar_dbms_xmldom | 1.0 | implement dbms_xmldom and dbms_xmlparser - support standard DOM interface and xml parser object
polar_parameter_manager | 1.1 | Extension to select parameters for manger.
polar_faults | 1.0.0 | simulate some database faults for end user or testing system.
polar_monitor_preload | 1.1 | examine the polardb information
polar_proxy_utils | 1.0 | Extension to provide operations about proxy.
polar_feature_utils | 1.2 | PolarDB feature utilization
polar_global_awr | 1.0 | PolarDB Global AWR Report
polar_publication | 1.0 | support polardb pg logical replication
polar_global_cache | 1.0 | polar_global_cache
polar_px | 1.0 | Parallel Execution extension
polar_serverless | 1.0 | polar serverless extension
polar_resource_manager | 1.0 | a background process that forcibly frees user session process memory
polar_sys_context | 1.1 | implement polar_sys_context - returns the value of parameter associated with the context namespace at the current instant.
polar_gpc | 1.3 | polar_gpc
polar_tde_utils | 1.0 | Internal extension for TDE
polar_gtt | 1.1 | polar_gtt
polar_utl_encode | 1.2 | implement polar_utl_encode - provides functions that encode RAW data into a standard encoded format
polar_htap | 1.1 | extension for PolarDB HTAP
polar_htap_db | 1.0 | extension for PolarDB HTAP database level operation
polar_io_stat | 1.0 | polar io stat in multi dimension
polar_utl_file | 1.0 | implement utl_file - support PL/SQL programs can read and write operating system text files
polar_ivm | 1.0 | polar_ivm
polar_sql_mapping | 1.2 | Record error sqls and mapping them to correct one
polar_stat_sql | 1.0 | Kernel statistics gathering, and sql plan nodes information gathering
tds_fdw | 2.0.2 | Foreign data wrapper for querying a TDS database (Sybase or Microsoft SQL Server)
xml2 | 1.1 | XPath querying and XSLT
polar_upgrade_catalogs | 1.1 | Upgrade catalogs for old version instance
polar_utl_i18n | 1.1 | polar_utl_i18n
polar_utl_raw | 1.0 | implement utl_raw - provides SQL functions for manipulating RAW datatypes.
timescaledb | 2.9.2 | Enables scalable inserts and complex queries for time-series data
polar_vfs | 1.0 | polar virtual file system for different storage
polar_worker | 1.0 | polar_worker
postgres_fdw | 1.1 | foreign-data wrapper for remote PostgreSQL servers
refint | 1.0 | functions for implementing referential integrity (obsolete)
roaringbitmap | 0.5 | support for Roaring Bitmaps
tsm_system_time | 1.0 | TABLESAMPLE method which accepts time in milliseconds as a limit
vector | 0.5.0 | vector data type and ivfflat and hnsw access methods
rum | 1.3 | RUM index access method
unaccent | 1.1 | text search dictionary that removes accents
seg | 1.4 | data type for representing line segments or floating-point intervals
sequential_uuids | 1.0.2 | generator of sequential UUIDs
uuid-ossp | 1.1 | generate universally unique identifiers (UUIDs)
smlar | 1.0 | compute similary of any one-dimensional arrays
varbitx | 1.1 | varbit functions pack
sslinfo | 1.2 | information about SSL certificates
tablefunc | 1.0 | functions that manipulate whole tables, including crosstab
tcn | 1.0 | Triggered change notifications
zhparser | 1.0 | a parser for full-text search of Chinese
address_standardizer | 3.3.2 | Ganos PostGIS address standardizer
address_standardizer_data_us | 3.3.2 | Ganos PostGIS address standardizer data us
ganos_fdw | 6.0 | Ganos Spatial FDW extension for POLARDB
ganos_geometry | 6.0 | Ganos geometry lite extension for POLARDB
ganos_geometry_pyramid | 6.0 | Ganos Geometry Pyramid extension for POLARDB
ganos_geometry_sfcgal | 6.0 | Ganos geometry lite sfcgal extension for POLARDB
ganos_geomgrid | 6.0 | Ganos geometry grid extension for POLARDB
ganos_importer | 6.0 | Ganos Spatial importer extension for POLARDB
ganos_networking | 6.0 | Ganos networking
ganos_pointcloud | 6.0 | Ganos pointcloud extension For POLARDB
ganos_pointcloud_geometry | 6.0 | Ganos_pointcloud LIDAR data and ganos_geometry data for POLARDB
ganos_raster | 6.0 | Ganos raster extension for POLARDB
ganos_scene | 6.0 | Ganos scene extension for POLARDB
ganos_sfmesh | 6.0 | Ganos surface mesh extension for POLARDB
ganos_spatialref | 6.0 | Ganos spatial reference extension for POLARDB
ganos_trajectory | 6.0 | Ganos trajectory extension for POLARDB
ganos_vomesh | 6.0 | Ganos volumn mesh extension for POLARDB
postgis_tiger_geocoder | 3.3.2 | Ganos PostGIS tiger geocoder
postgis_topology | 3.3.2 | Ganos PostGIS topology

11 - PostgresML

How to deploy PostgresML with Pigsty: ML, training, inference, Embedding, RAG inside DB.

PostgresML is a PostgreSQL extension that supports the latest large language models (LLM), vector operations, classical machine learning, and traditional Postgres application workloads.

PostgresML (pgml) is a PostgreSQL extension written in Rust. It can also run as a standalone Docker image; this document covers native deployment with Pigsty rather than the docker-compose template, and is provided for reference only.

PostgresML officially supports Ubuntu 22.04, while Pigsty also maintains RPM builds for EL 8/9 that you can use if you do not need CUDA and other NVIDIA-related features.

You need internet access on database nodes to download Python dependencies from PyPI and models from HuggingFace.
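If HuggingFace is not directly reachable from your nodes, one hedged workaround is to point huggingface_hub at a mirror via the HF_ENDPOINT environment variable; the mirror URL below is an assumption, not something Pigsty provides:

export HF_ENDPOINT=https://hf-mirror.com   # assumed third-party HuggingFace mirror, verify before use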


Configuration

PostgresML is an extension written in Rust, officially supporting Ubuntu. Pigsty maintains RPM versions of PostgresML on EL8 and EL9.

Creating a New Cluster

PostgresML 2.7.9 is available for PostgreSQL 15, supporting Ubuntu 22.04 (official), Debian 12, and EL 8/9 (maintained by Pigsty). To enable pgml, you first need to install the extension:

pg-meta:
  hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }
  vars:
    pg_cluster: pg-meta
    pg_users:
      - {name: dbuser_meta     ,password: DBUser.Meta     ,pgbouncer: true ,roles: [dbrole_admin]    ,comment: pigsty admin user }
      - {name: dbuser_view     ,password: DBUser.Viewer   ,pgbouncer: true ,roles: [dbrole_readonly] ,comment: read-only viewer for meta database }
    pg_databases:
      - { name: meta ,baseline: cmdb.sql ,comment: pigsty meta database ,schemas: [pigsty] ,extensions: [{name: postgis, schema: public}, {name: timescaledb}]}
    pg_hba_rules:
      - {user: dbuser_view , db: all ,addr: infra ,auth: pwd ,title: 'allow grafana dashboard access cmdb from infra nodes'}
    pg_libs: 'pgml, pg_stat_statements, auto_explain'
    pg_extensions: [ 'pgml_15 pgvector_15 wal2json_15 repack_15' ]  # el 8/9
    #pg_extensions: [ 'postgresql-pgml-15 postgresql-15-pgvector postgresql-15-wal2json postgresql-15-repack' ]  # ubuntu

On EL 8/9, the extension name is pgml_15, corresponding to the Ubuntu/Debian name postgresql-pgml-15. You also need to add pgml to pg_libs.

Enabling on an Existing Cluster

To enable pgml on an existing cluster, you can install it using Ansible’s package module:

ansible pg-meta -m package -b -a 'name=pgml_15'
# ansible el8,el9 -m package -b -a 'name=pgml_15'           # EL 8/9
# ansible u22 -m package -b -a 'name=postgresql-pgml-15'    # Ubuntu 22.04 jammy

Python Dependencies

You also need to install PostgresML’s Python dependencies on cluster nodes. Official tutorial: Installation Guide

Install Python and PIP

Ensure python3, pip, and venv are installed:

# Ubuntu 22.04 (python3.10), need to install pip and venv using apt
sudo apt install -y python3 python3-pip python3-venv

For EL 8 / EL9 and compatible distributions, you can use python3.11:

# EL 8/9, can upgrade the default pip and virtualenv
sudo yum install -y python3.11 python3.11-pip       # install latest python3.11
python3.11 -m pip install --upgrade pip virtualenv  # use python3.11 on EL8 / EL9

Using PyPI Mirrors

For users in mainland China, we recommend using Tsinghua University’s PyPI mirror.

pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple    # set global mirror (recommended)
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple some-package        # use for single installation

Install Dependencies

Create a Python virtual environment and use pip to install dependencies from requirements.txt and requirements-xformers.txt.

If you’re using EL 8/9, replace python3 with python3.11 in the following commands.

su - postgres;                          # create virtual environment as database superuser
mkdir -p /data/pgml; cd /data/pgml;     # create virtual environment directory
python3    -m venv /data/pgml           # create virtual environment (use python3.11 on EL 8/9)
source /data/pgml/bin/activate          # activate virtual environment

# write Python dependencies and install with pip
cat > /data/pgml/requirements.txt <<EOF
accelerate==0.22.0
auto-gptq==0.4.2
bitsandbytes==0.41.1
catboost==1.2
ctransformers==0.2.27
datasets==2.14.5
deepspeed==0.10.3
huggingface-hub==0.17.1
InstructorEmbedding==1.0.1
lightgbm==4.1.0
orjson==3.9.7
pandas==2.1.0
rich==13.5.2
rouge==1.0.1
sacrebleu==2.3.1
sacremoses==0.0.53
scikit-learn==1.3.0
sentencepiece==0.1.99
sentence-transformers==2.2.2
tokenizers==0.13.3
torch==2.0.1
torchaudio==2.0.2
torchvision==0.15.2
tqdm==4.66.1
transformers==4.33.1
xgboost==2.0.0
langchain==0.0.287
einops==0.6.1
pynvml==11.5.0
EOF

# install dependencies using pip in the virtual environment
python3 -m pip install -r /data/pgml/requirements.txt
python3 -m pip install xformers==0.0.21 --no-deps

# additionally, 3 Python packages need to be installed globally using sudo!
sudo python3 -m pip install xgboost lightgbm scikit-learn

Enable PostgresML

After installing the pgml extension and Python dependencies on all cluster nodes, you can enable pgml on the PostgreSQL cluster.

Use the patronictl command to configure the cluster, add pgml to shared_preload_libraries, and specify your virtual environment directory in pgml.venv:

shared_preload_libraries: pgml, timescaledb, pg_stat_statements, auto_explain
pgml.venv: '/data/pgml'
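
For example, a minimal sketch using Pigsty's pg alias for patronictl (the cluster name pg-meta is an assumption; substitute your own):

pg edit-config pg-meta    # edit the Patroni DCS config: append pgml to shared_preload_libraries, set pgml.venv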

Then restart the database cluster and create the extension using SQL commands:

CREATE EXTENSION vector;        -- also recommend installing pgvector!
CREATE EXTENSION pgml;          -- create PostgresML in the current database
SELECT pgml.version();          -- print PostgresML version information

If everything is normal, you should see output similar to the following:

# create extension pgml;
INFO:  Python version: 3.11.2 (main, Oct  5 2023, 16:06:03) [GCC 8.5.0 20210514 (Red Hat 8.5.0-18)]
INFO:  Scikit-learn 1.3.0, XGBoost 2.0.0, LightGBM 4.1.0, NumPy 1.26.1
CREATE EXTENSION

# SELECT pgml.version(); -- print PostgresML version information
 version
---------
 2.7.8

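As a quick smoke test, you can try a hedged example like the following; the model name is illustrative, and the first call downloads it from HuggingFace (which requires the internet access noted above):

-- generate a text embedding with a small sentence-transformer model
SELECT pgml.embed('intfloat/e5-small', 'hello world');
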
Done! For more details, please refer to the official PostgresML documentation: https://postgresml.org/docs/guides/use-cases/

12 - Greenplum

Deploy/Monitor Greenplum clusters with Pigsty, build Massively Parallel Processing (MPP) PostgreSQL data warehouse clusters!

Pigsty supports deploying Greenplum clusters and its derivative distribution YMatrixDB, and provides the capability to integrate existing Greenplum deployments into Pigsty monitoring.


Overview

Greenplum / YMatrix cluster deployment capabilities are only available in the professional/enterprise editions and are not currently open source.


Installation

Pigsty provides installation packages for Greenplum 6 (@el7) and Greenplum 7 (@el8). Open source users can install and configure them manually.

# EL 7 Only (Greenplum6)
./node.yml -t node_install  -e '{"node_repo_modules":"pgsql","node_packages":["open-source-greenplum-db-6"]}'

# EL 8 Only (Greenplum7)
./node.yml -t node_install  -e '{"node_repo_modules":"pgsql","node_packages":["open-source-greenplum-db-7"]}'

Configuration

To define a Greenplum cluster, you need to use pg_mode = gpsql and additional identity parameters pg_shard and gp_role.

#================================================================#
#                        GPSQL Clusters                          #
#================================================================#

#----------------------------------#
# cluster: mx-mdw (gp master)
#----------------------------------#
mx-mdw:
  hosts:
    10.10.10.10: { pg_seq: 1, pg_role: primary , nodename: mx-mdw-1 }
  vars:
    gp_role: master          # this cluster is used as greenplum master
    pg_shard: mx             # pgsql sharding name & gpsql deployment name
    pg_cluster: mx-mdw       # this master cluster name is mx-mdw
    pg_databases:
      - { name: matrixmgr , extensions: [ { name: matrixdbts } ] }
      - { name: meta }
    pg_users:
      - { name: meta , password: DBUser.Meta , pgbouncer: true }
      - { name: dbuser_monitor , password: DBUser.Monitor , roles: [ dbrole_readonly ], superuser: true }

    pgbouncer_enabled: true                # enable pgbouncer for greenplum master
    pgbouncer_exporter_enabled: false      # enable pgbouncer_exporter for greenplum master
    pg_exporter_params: 'host=127.0.0.1&sslmode=disable'  # use 127.0.0.1 as local monitor host

#----------------------------------#
# cluster: mx-sdw (gp segments)
#----------------------------------#
mx-sdw:
  hosts:
    10.10.10.11:
      nodename: mx-sdw-1        # greenplum segment node
      pg_instances:             # greenplum segment instances
        6000: { pg_cluster: mx-seg1, pg_seq: 1, pg_role: primary , pg_exporter_port: 9633 }
        6001: { pg_cluster: mx-seg2, pg_seq: 2, pg_role: replica , pg_exporter_port: 9634 }
    10.10.10.12:
      nodename: mx-sdw-2
      pg_instances:
        6000: { pg_cluster: mx-seg2, pg_seq: 1, pg_role: primary , pg_exporter_port: 9633  }
        6001: { pg_cluster: mx-seg3, pg_seq: 2, pg_role: replica , pg_exporter_port: 9634  }
    10.10.10.13:
      nodename: mx-sdw-3
      pg_instances:
        6000: { pg_cluster: mx-seg3, pg_seq: 1, pg_role: primary , pg_exporter_port: 9633 }
        6001: { pg_cluster: mx-seg1, pg_seq: 2, pg_role: replica , pg_exporter_port: 9634 }
  vars:
    gp_role: segment               # these are nodes for gp segments
    pg_shard: mx                   # pgsql sharding name & gpsql deployment name
    pg_cluster: mx-sdw             # these segment clusters name is mx-sdw
    pg_preflight_skip: true        # skip preflight check (since pg_seq / pg_role / pg_cluster do not exist at host level)
    pg_exporter_config: pg_exporter_basic.yml                             # use basic config to avoid segment server crash
    pg_exporter_params: 'options=-c%20gp_role%3Dutility&sslmode=disable'  # use gp_role = utility to connect to segments

Additionally, PG Exporter requires extra connection parameters to connect to Greenplum Segment instances for metric collection.
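For reference, the URL-encoded pg_exporter_params value above decodes into the following plain libpq connection parameters, which make the exporter open each segment connection in utility mode:

options=-c gp_role=utility   # connect to segment instances in utility mode
sslmode=disable              # no TLS on the monitoring connection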

13 - Cloudberry

Deploy/Monitor Cloudberry clusters with Pigsty, an MPP data warehouse cluster forked from Greenplum!

Installation

Pigsty provides installation packages for CloudberryDB. Open source users can install and configure it manually.

# EL 7 / EL 8
./node.yml -t node_install  -e '{"node_repo_modules":"pgsql","node_packages":["cloudberrydb"]}'

14 - Neon

Use Neon’s open-source Serverless PostgreSQL kernel to build flexible, scale-to-zero, forkable PG services.

Neon adopts a storage and compute separation architecture, providing seamless autoscaling, scale to zero, and unique database branching capabilities.

Neon official website: https://neon.tech/

Neon's compiled binaries are extremely large and are not yet available to open-source users; this offering is currently in the pilot stage. If you have requirements, please contact Pigsty sales.