Cloudberry
Cloudberry is an open-source MPP data warehouse kernel derived from the Greenplum ecosystem, suitable for large-scale parallel analytics workloads.
Overview
In Pigsty, Cloudberry uses gpsql mode and shares the same identity model, monitoring logic, and directory conventions as Greenplum / MatrixDB.
- Kernel package: `cloudberry`
- Mode identifier: `pg_mode: gpsql`
- Role flag: `gp_role: master | segment`
- Current repo version: Cloudberry 2.0.0
- Current version string: `PostgreSQL 14.4 (Apache Cloudberry 2.0.0-incubating build 1)`
- Default binary directory: `/usr/local/cloudberry`
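You can confirm which kernel a node is actually running by comparing the output of `SELECT version()` with the version string above. The following is a minimal Python sketch. It assumes the output begins with the `PostgreSQL X (Apache Cloudberry Y build N)` layout shown in the list. The function name `parse_cloudberry_version` is hypothetical and is not part of Pigsty or Cloudberry.

```python
import re
from typing import Optional, Tuple

# Matches the prefix of a Cloudberry version() string, e.g.
# "PostgreSQL 14.4 (Apache Cloudberry 2.0.0-incubating build 1) on x86_64-..."
# Assumption: Cloudberry keeps this "PostgreSQL X (Apache Cloudberry Y build N)" layout.
_VERSION_RE = re.compile(
    r"^PostgreSQL (?P<pg>\d+(?:\.\d+)*) \(Apache Cloudberry (?P<cb>\S+) build (?P<build>\d+)\)"
)


def parse_cloudberry_version(text: str) -> Optional[Tuple[str, str, int]]:
    """Return (postgres_version, cloudberry_version, build), or None for other kernels."""
    m = _VERSION_RE.match(text.strip())
    if m is None:
        return None
    return m.group("pg"), m.group("cb"), int(m.group("build"))


if __name__ == "__main__":
    sample = "PostgreSQL 14.4 (Apache Cloudberry 2.0.0-incubating build 1)"
    print(parse_cloudberry_version(sample))  # ('14.4', '2.0.0-incubating', 1)
```

A vanilla PostgreSQL version string does not match the pattern, so the function returns `None` for it.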
The important boundary: Pigsty currently handles package delivery, node management, monitoring onboarding, access control, and configuration orchestration for Cloudberry. For MPP cluster initialization, scale-out, rebalancing, and other Cloudberry-specific operations, you should still use the official Cloudberry toolchain.
According to the current release notes, Cloudberry is shipped only as RPM packages in the Pigsty repository, so it is currently available only on RPM-based (EL) systems.
Installation
There is no standalone one-click template for Cloudberry yet. The typical workflow is:
- Enroll the target nodes into Pigsty.
- Install the `cloudberry` kernel package.
- Describe the coordinator / segment topology with `gpsql` mode.
- Use Pigsty to unify monitoring, accounts, access control, and backup integration.
If you only need to install the kernel package on a node:
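```bash
# Configure the local, node, and pgsql repository modules on the target nodes
./node.yml -t node_repo -e '{"node_repo_modules":"local,node,pgsql"}'
# Install the cloudberry kernel package from those repositories
./node.yml -t node_install -e '{"node_packages":["cloudberry"]}'
```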
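After `node_install` finishes, a quick sanity check is to confirm that the binaries landed in the default directory listed in the overview. The following is a minimal sketch. It assumes the package follows the usual PostgreSQL layout, with `bin/postgres` and `bin/psql` under `/usr/local/cloudberry`. The helper name `missing_binaries` is hypothetical.

```python
import os
from pathlib import Path
from typing import List, Sequence

# Default Cloudberry binary directory from the overview; override if your package differs.
DEFAULT_PREFIX = "/usr/local/cloudberry"
# Assumption: the package follows the usual PostgreSQL bin/ layout.
EXPECTED_BINARIES = ("postgres", "psql")


def missing_binaries(prefix: str = DEFAULT_PREFIX,
                     names: Sequence[str] = EXPECTED_BINARIES) -> List[str]:
    """Return the expected binaries that are absent or not executable under prefix/bin."""
    bindir = Path(prefix) / "bin"
    return [n for n in names if not os.access(bindir / n, os.X_OK)]


if __name__ == "__main__":
    missing = missing_binaries()
    print("cloudberry binaries OK" if not missing else f"missing: {missing}")
```

Run the sketch on each node after installation. An empty list means both binaries are present and executable.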
If you are onboarding an existing Cloudberry cluster, it is usually better to keep its original initialization workflow and add the Pigsty inventory and monitoring configuration incrementally.
Configuration
Cloudberry uses `gpsql` mode rather than a dedicated `cloudberry` mode. Compared with vanilla PostgreSQL, you need to set at least two extra identity parameters, `pg_shard` and `gp_role`. To label shard groups explicitly, you can also add `pg_group`.
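You can check these identity rules mechanically before running a playbook. The following is a minimal Python sketch. It assumes each group's `vars` mapping has already been loaded into a dict, for example with a YAML parser. `check_gpsql_vars` is a hypothetical helper, not a Pigsty API, and it checks only the parameters named in the paragraph above.

```python
from typing import Any, Dict, List

VALID_GP_ROLES = {"master", "segment"}


def check_gpsql_vars(group_vars: Dict[str, Any]) -> List[str]:
    """Return a list of problems with the gpsql identity parameters of one group."""
    problems: List[str] = []
    if group_vars.get("pg_mode") != "gpsql":
        problems.append("pg_mode must be 'gpsql' for Cloudberry")
    if not group_vars.get("pg_shard"):
        problems.append("pg_shard is required")
    role = group_vars.get("gp_role")
    if role not in VALID_GP_ROLES:
        problems.append(f"gp_role must be one of {sorted(VALID_GP_ROLES)}, got {role!r}")
    # pg_group is optional: it only labels shard groups explicitly, so it is not checked.
    return problems


if __name__ == "__main__":
    coordinator = {"pg_cluster": "cb-mdw", "pg_mode": "gpsql", "pg_shard": "cb", "gp_role": "master"}
    print(check_gpsql_vars(coordinator))  # []
```

An empty list means the group declares a valid `gpsql` identity. A `cloudberry` mode or a `coordinator` role would both be flagged.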
Here is a minimal topology example with one coordinator and two segment hosts:
```yaml
all:
  children:
    cb-mdw:                   # coordinator (master) cluster
      hosts:
        10.10.10.10: { pg_seq: 1, pg_role: primary }
      vars:
        pg_cluster: cb-mdw
        pg_mode: gpsql
        pg_shard: cb
        gp_role: master
        pg_packages: [ cloudberry, pgsql-common ]
    cb-sdw:                   # segment hosts
      hosts:
        10.10.10.11:
          nodename: cb-sdw-1
          pg_instances:       # segment instances on this host, keyed by port
            6000: { pg_cluster: cb-seg1, pg_seq: 1, pg_role: primary, pg_exporter_port: 9633 }
        10.10.10.12:
          nodename: cb-sdw-2
          pg_instances:
            6000: { pg_cluster: cb-seg2, pg_seq: 1, pg_role: primary, pg_exporter_port: 9633 }
      vars:
        pg_cluster: cb-sdw
        pg_mode: gpsql
        pg_shard: cb
        gp_role: segment
        pg_preflight_skip: true
        pg_packages: [ cloudberry, pgsql-common ]
        pg_exporter_config: pg_exporter_basic.yml
        # exporter sessions on segments run with gp_role=utility
        pg_exporter_params: 'options=-c%20gp_role%3Dutility&sslmode=disable'
```
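`pg_exporter_params` is a URL-encoded query string that Pigsty appends to the exporter's connection URL as extra parameters. Decoded, `options=-c%20gp_role%3Dutility` passes `-c gp_role=utility` to the server, so the exporter's session runs in utility mode on each segment. The following Python sketch shows the encode/decode round trip with the standard library. It only illustrates the encoding and is not part of Pigsty.

```python
from urllib.parse import parse_qs, quote, urlencode

# libpq parameters the example wants the exporter session to use on segments.
params = {"options": "-c gp_role=utility", "sslmode": "disable"}

# quote_via=quote encodes the space as %20 (the default quote_plus would emit '+').
encoded = urlencode(params, quote_via=quote)
print(encoded)  # options=-c%20gp_role%3Dutility&sslmode=disable

# Decoding the inventory value yields the original parameters back.
decoded = {k: v[0] for k, v in parse_qs(encoded).items()}
print(decoded)  # {'options': '-c gp_role=utility', 'sslmode': 'disable'}
```

If you need other server settings for exporter sessions, append them to the same `options` value before encoding.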
Two details are easy to miss:
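- `gp_role: master` marks the coordinator / master node, which is where application traffic normally lands.
- `gp_role: segment` nodes usually need `pg_exporter` to connect in `utility` mode for monitoring. That is what the `pg_exporter_params` setting in the segment group does.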
Client Access
For application and BI access, Cloudberry still exposes the PostgreSQL wire protocol, so most PostgreSQL-compatible clients, drivers, and BI tools can connect without special handling.
But keep the following in mind:
- Applications and analytics queries should connect to the master / coordinator, not directly to segment nodes.
- Segment nodes are better treated as data/compute shards and monitoring targets.
- If you want a unified access endpoint, you can still use Pigsty’s HAProxy / PgBouncer / DNS service abstractions.
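If you generate client configuration from the inventory, the rule "connect to the master" means selecting hosts whose group sets `gp_role: master`. The following is a minimal sketch. It assumes the inventory has been loaded into a nested dict that mirrors the YAML topology in the Configuration section, and it looks only at group-level `vars`, as that example does. `coordinator_hosts` is a hypothetical helper.

```python
from typing import Any, Dict, List


def coordinator_hosts(inventory: Dict[str, Any]) -> List[str]:
    """Return hosts of groups whose vars declare gp_role: master (the coordinator)."""
    hosts: List[str] = []
    for group in inventory.get("all", {}).get("children", {}).values():
        if group.get("vars", {}).get("gp_role") == "master":
            hosts.extend(group.get("hosts", {}).keys())
    return hosts


if __name__ == "__main__":
    # Trimmed-down version of the example topology.
    inventory = {
        "all": {"children": {
            "cb-mdw": {"hosts": {"10.10.10.10": {"pg_seq": 1, "pg_role": "primary"}},
                       "vars": {"pg_cluster": "cb-mdw", "gp_role": "master"}},
            "cb-sdw": {"hosts": {"10.10.10.11": {}, "10.10.10.12": {}},
                       "vars": {"pg_cluster": "cb-sdw", "gp_role": "segment"}},
        }}
    }
    print(coordinator_hosts(inventory))  # ['10.10.10.10']
```

Segment hosts are never returned, so generated application endpoints always point at the coordinator.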
Extensions and Ecosystem
Cloudberry comes from the PostgreSQL ecosystem, but it is not simply "vanilla PostgreSQL plus a few extensions". The extension packages already available in Pigsty fall into two categories:
- Pure SQL objects or components with weak ABI coupling are usually easier to adapt.
- Extensions that depend on PGXS or the kernel C ABI often need separate validation or even recompilation against the Cloudberry version and toolchain.
If your workload depends on postgis, vector extensions, FDWs, auditing, or custom C extensions, validate them on the target Cloudberry version first rather than copying a vanilla PostgreSQL extension list unchanged.
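One way to start that validation is to compare the extensions your workload needs with what the target Cloudberry coordinator reports in `pg_available_extensions`. The following is a minimal sketch. It assumes you have already fetched the `name` column of that view, for example with `SELECT name FROM pg_available_extensions;` in psql. `missing_extensions` is a hypothetical helper. Availability only shows that a package is installed, so `CREATE EXTENSION` and a functional test on the target version are still needed.

```python
from typing import Iterable, List


def missing_extensions(required: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return required extension names absent from pg_available_extensions, sorted."""
    return sorted(set(required) - set(available))


if __name__ == "__main__":
    # Example inputs only; take `available` from the real Cloudberry coordinator.
    required = ["postgis", "vector", "postgres_fdw"]
    available = ["plpgsql", "postgres_fdw", "hstore"]
    print(missing_extensions(required, available))  # ['postgis', 'vector']
```

Any extension this returns needs a Cloudberry-specific build before your workload can rely on it.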
Notes
- Cloudberry currently has no dedicated Pigsty template, so you should model it manually with `gpsql` mode.
- The current delivery focus is packages, configuration, and monitoring; it does not replace the official Cloudberry MPP initialization and scale-out toolchain.
- Because this is an MPP distributed kernel, vanilla PostgreSQL operational assumptions about Patroni, PgBouncer, and pgBackRest do not automatically carry over to every node role.
- If you need horizontal PostgreSQL scaling rather than a full MPP warehouse, Citus is usually the better first choice.