pg4ml

Machine learning framework for PostgreSQL

Overview

PackageVersionCategoryLicenseLanguage
pg4ml2.0RAGAGPL-3.0C
IDExtensionBinLibLoadCreateTrustRelocSchema
1880pg4mlNoNoNoYesYesYes-
Relatedplpgsql tablefunc cube plpython3u pgml vectorize pg_summarize pg_tiktoken vector vchord vectorscale pg_strom

require python3

Version

TypeRepoVersionPG VerPackageDeps
EXTPIGSTY2.01817161514pg4mlplpgsql, tablefunc, cube, plpython3u
RPMPIGSTY2.01817161514pg4ml_$v-
DEBPIGSTY2.01817161514postgresql-$v-pg4ml-
OS / PGPG18PG17PG16PG15PG14
el8.x86_64
el8.aarch64
el9.x86_64
el9.aarch64
el10.x86_64
el10.aarch64
d12.x86_64
d12.aarch64
d13.x86_64
d13.aarch64
u22.x86_64
u22.aarch64
PIGSTY 2.0
PIGSTY 2.0
PIGSTY 2.0
PIGSTY 2.0
PIGSTY 2.0
u24.x86_64
u24.aarch64
PIGSTY 2.0
PIGSTY 2.0
PIGSTY 2.0
PIGSTY 2.0
PIGSTY 2.0

Build

You can build the RPM / DEB packages for pg4ml using pig build:

pig build pkg pg4ml         # build RPM / DEB packages

Install

You can install pg4ml directly. First, make sure the PGDG and PIGSTY repositories are added and enabled:

pig repo add pgsql -u          # Add repo and update cache

Install the extension using pig or apt/yum/dnf:

pig install pg4ml;          # Install for current active PG version
pig ext install -y pg4ml -v 18  # PG 18
pig ext install -y pg4ml -v 17  # PG 17
pig ext install -y pg4ml -v 16  # PG 16
pig ext install -y pg4ml -v 15  # PG 15
pig ext install -y pg4ml -v 14  # PG 14
dnf install -y pg4ml_18       # PG 18
dnf install -y pg4ml_17       # PG 17
dnf install -y pg4ml_16       # PG 16
dnf install -y pg4ml_15       # PG 15
dnf install -y pg4ml_14       # PG 14
apt install -y postgresql-18-pg4ml   # PG 18
apt install -y postgresql-17-pg4ml   # PG 17
apt install -y postgresql-16-pg4ml   # PG 16
apt install -y postgresql-15-pg4ml   # PG 15
apt install -y postgresql-14-pg4ml   # PG 14

Create Extension:

CREATE EXTENSION pg4ml CASCADE;  -- requires: plpgsql, tablefunc, cube, plpython3u

Usage

pg4ml: Machine learning framework for PostgreSQL. Source: README.md

pg4ml is a PostgreSQL extension that implements a machine learning framework entirely within the database using PL/pgSQL and PL/Python. It provides matrix operations, neural network construction and training, clustering algorithms, and scientific computing – all through SQL.


Prerequisites

  • PostgreSQL >= 14 with Python3 support
  • Required extensions: plpgsql, tablefunc, cube, plpython3u

Getting Started

CREATE EXTENSION pg4ml CASCADE;
-- This will also create the required dependencies: plpgsql, tablefunc, cube, plpython3u

Features

Matrix Operations

The framework provides a comprehensive matrix operation library under the sm_sc schema:

  • Element-wise operations: arithmetic, comparison, rounding, concatenation, boolean, bitwise, complex number, and broadcast operations
  • Matrix operations: multiplication, transpose, flip, rotate, concatenation
  • Construction: sampling, replacement, padding, character matching, random generation
  • Trigonometric functions: broadcast operations on matrices
  • Aggregation: slice-level aggregation, matrix-level aggregation, sorting by slice values, locating extremum positions

Slice Aggregation Examples

Average over vertical slices (groups of 2):

SELECT sm_sc.fv_aggr_slice_avg(
    array[[1.5, 11.5],
          [2.1, 12.1],
          [3.3, 13.3],
          [4.3, 14.3],
          [5.5, 15.5],
          [6.1, 16.1]],
    array[2, 1]
);
-- Returns: array[[1.8, 11.8],[3.8, 13.8],[5.8, 15.8]]

Max pooling over 2x3 blocks:

SELECT sm_sc.fv_aggr_slice_max(
    array[[2.3, 5.1, 8.2, 2.56, 3.33, -1.9],
          [3.25, 6.4, 6.6, 6.9, -2.65, -4.6],
          [-2.3, 5.1, -8.2, 2.56, -3.33, -1.9],
          [3.25, -6.4, -6.6, 6.9, -2.65, -4.6]],
    array[2, 3]
);
-- Returns: array[[8.2, 6.9],[5.1, 6.9]]

Neural Networks

The framework supports deep neural network construction and training:

  • Node and Path tables: sm_sc.tb_nn_node / sm_sc.tb_nn_path for defining network structure
  • Training input buffer: sm_sc.tb_nn_train_input_buff for receiving training data
  • Task management: sm_sc.tb_classify_task for deploying and managing training tasks
  • Activation functions, convolution, pooling, lambda operations
  • Loss functions, derivative computation, backpropagation
  • Inference: sm_sc.ft_nn_in_out for running test/validation data through a trained model

Clustering

  • K-means++: via sm_sc.prc_kmeans_pp procedure
  • DBSCAN: via sm_sc.prc_dbscan_pp procedure

Both use sm_sc.tb_cluster_task for task deployment and management.

Scientific Computing

  • Waveform processing
  • Computational graph JSON serialization/deserialization
  • Complex number operations
  • Linear algebra

Performance Tips

  • Enable debug mode with: SET session pg4ml._v_is_debug_check = '1';
  • Matrix multiplication uses plpython3u to call numpy for optimization
  • Adjust PostgreSQL parallel parameters for multi-threaded training:
    • max_parallel_workers_per_gather
    • force_parallel_mode
    • parallel_setup_cost, parallel_tuple_cost
  • Consider using pg_strom extension for GPU acceleration

Last Modified 2026-03-12: add pg extension catalog (95749bf)