pg4ml
Overview
| Package | Version | Category | License | Language |
|---|---|---|---|---|
| pg4ml | 2.0 | RAG | AGPL-3.0 | C |
| ID | Extension | Bin | Lib | Load | Create | Trust | Reloc | Schema |
|---|---|---|---|---|---|---|---|---|
| 1880 | pg4ml | No | No | No | Yes | Yes | Yes | - |
| Related | plpgsql, tablefunc, cube, plpython3u, pgml, vectorize, pg_summarize, pg_tiktoken, vector, vchord, vectorscale, pg_strom |
|---|---|
Requires python3.
Version
| Type | Repo | Version | PG Ver | Package | Deps |
|---|---|---|---|---|---|
| EXT | PIGSTY | 2.0 | 18, 17, 16, 15, 14 | pg4ml | plpgsql, tablefunc, cube, plpython3u |
| RPM | PIGSTY | 2.0 | 18, 17, 16, 15, 14 | pg4ml_$v | - |
| DEB | PIGSTY | 2.0 | 18, 17, 16, 15, 14 | postgresql-$v-pg4ml | - |
Build
You can build the RPM / DEB packages for pg4ml using pig build:
pig build pkg pg4ml # build RPM / DEB packages
Install
You can install pg4ml directly. First, make sure the PGDG and PIGSTY repositories are added and enabled:
pig repo add pgsql -u # Add repo and update cache
Install the extension using pig or apt/yum/dnf:
pig install pg4ml; # Install for current active PG version
pig ext install -y pg4ml -v 18 # PG 18
pig ext install -y pg4ml -v 17 # PG 17
pig ext install -y pg4ml -v 16 # PG 16
pig ext install -y pg4ml -v 15 # PG 15
pig ext install -y pg4ml -v 14 # PG 14
dnf install -y pg4ml_18 # PG 18
dnf install -y pg4ml_17 # PG 17
dnf install -y pg4ml_16 # PG 16
dnf install -y pg4ml_15 # PG 15
dnf install -y pg4ml_14 # PG 14
apt install -y postgresql-18-pg4ml # PG 18
apt install -y postgresql-17-pg4ml # PG 17
apt install -y postgresql-16-pg4ml # PG 16
apt install -y postgresql-15-pg4ml # PG 15
apt install -y postgresql-14-pg4ml # PG 14
Create Extension:
CREATE EXTENSION pg4ml CASCADE; -- requires: plpgsql, tablefunc, cube, plpython3u
Usage
pg4ml: Machine learning framework for PostgreSQL. Source: README.md
pg4ml is a PostgreSQL extension that implements a machine learning framework entirely within the database using PL/pgSQL and PL/Python. It provides matrix operations, neural network construction and training, clustering algorithms, and scientific computing – all through SQL.
Prerequisites
- PostgreSQL >= 14 with Python3 support
- Required extensions: plpgsql, tablefunc, cube, plpython3u
Getting Started
CREATE EXTENSION pg4ml CASCADE;
-- This will also create the required dependencies: plpgsql, tablefunc, cube, plpython3u
Features
Matrix Operations
The framework provides a comprehensive matrix operation library under the sm_sc schema:
- Element-wise operations: arithmetic, comparison, rounding, concatenation, boolean, bitwise, complex number, and broadcast operations
- Matrix operations: multiplication, transpose, flip, rotate, concatenation
- Construction: sampling, replacement, padding, character matching, random generation
- Trigonometric functions: broadcast operations on matrices
- Aggregation: slice-level aggregation, matrix-level aggregation, sorting by slice values, locating extremum positions
Slice Aggregation Examples
Average over vertical slices (windows of 2 rows by 1 column):
SELECT sm_sc.fv_aggr_slice_avg(
array[[1.5, 11.5],
[2.1, 12.1],
[3.3, 13.3],
[4.3, 14.3],
[5.5, 15.5],
[6.1, 16.1]],
array[2, 1]
);
-- Returns: array[[1.8, 11.8],[3.8, 13.8],[5.8, 15.8]]
Max pooling over 2x3 blocks:
SELECT sm_sc.fv_aggr_slice_max(
array[[2.3, 5.1, 8.2, 2.56, 3.33, -1.9],
[3.25, 6.4, 6.6, 6.9, -2.65, -4.6],
[-2.3, 5.1, -8.2, 2.56, -3.33, -1.9],
[3.25, -6.4, -6.6, 6.9, -2.65, -4.6]],
array[2, 3]
);
-- Returns: array[[8.2, 6.9],[5.1, 6.9]]
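To make the slice semantics concrete, here is a minimal pure-Python sketch of non-overlapping block aggregation that reproduces both results above. It is not pg4ml's implementation: the helper name `aggregate_blocks` is hypothetical, and the sketch assumes the matrix shape is an exact multiple of the window shape (how pg4ml treats ragged edges is not covered here).

```python
from statistics import fmean


def aggregate_blocks(matrix, window, reduce_fn):
    """Reduce each non-overlapping (rows x cols) block of `matrix` to one value.

    Hypothetical helper for illustration only; not part of pg4ml.
    """
    block_rows, block_cols = window
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Assumption: only shapes that divide evenly into blocks are handled.
    if n_rows % block_rows or n_cols % block_cols:
        raise ValueError("matrix shape must be a multiple of the window shape")
    result = []
    for top in range(0, n_rows, block_rows):
        out_row = []
        for left in range(0, n_cols, block_cols):
            # Collect every cell of the current block, then reduce it.
            block = [
                matrix[r][c]
                for r in range(top, top + block_rows)
                for c in range(left, left + block_cols)
            ]
            out_row.append(reduce_fn(block))
        result.append(out_row)
    return result


# Same inputs as the two SQL examples above.
averaged = aggregate_blocks(
    [[1.5, 11.5], [2.1, 12.1], [3.3, 13.3],
     [4.3, 14.3], [5.5, 15.5], [6.1, 16.1]],
    (2, 1),
    fmean,
)
pooled = aggregate_blocks(
    [[2.3, 5.1, 8.2, 2.56, 3.33, -1.9],
     [3.25, 6.4, 6.6, 6.9, -2.65, -4.6],
     [-2.3, 5.1, -8.2, 2.56, -3.33, -1.9],
     [3.25, -6.4, -6.6, 6.9, -2.65, -4.6]],
    (2, 3),
    max,
)

# Round the averages to hide floating-point noise before printing.
print([[round(v, 6) for v in row] for row in averaged])
# prints [[1.8, 11.8], [3.8, 13.8], [5.8, 15.8]]
print(pooled)
# prints [[8.2, 6.9], [5.1, 6.9]]
```

The window `(2, 1)` groups pairs of rows within each column, while `(2, 3)` covers 2x3 tiles, which is why the second call behaves like max pooling.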
Neural Networks
The framework supports deep neural network construction and training:
- Node and Path tables: sm_sc.tb_nn_node / sm_sc.tb_nn_path for defining network structure
- Training input buffer: sm_sc.tb_nn_train_input_buff for receiving training data
- Task management: sm_sc.tb_classify_task for deploying and managing training tasks
- Activation functions, convolution, pooling, lambda operations
- Loss functions, derivative computation, backpropagation
- Inference: sm_sc.ft_nn_in_out for running test/validation data through a trained model
Clustering
- K-means++: via the sm_sc.prc_kmeans_pp procedure
- DBSCAN: via the sm_sc.prc_dbscan_pp procedure
Both use sm_sc.tb_cluster_task for task deployment and management.
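For readers unfamiliar with K-means++, the sketch below shows the general seeding step that distinguishes it from plain K-means: each new center is drawn with probability proportional to its squared distance from the nearest center already chosen. This illustrates the textbook algorithm only; it makes no claim about how sm_sc.prc_kmeans_pp is implemented, and the function names and sample points are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seeds(points, k, rng):
    """Pick k initial centers using K-means++ weighting.

    Hypothetical helper for illustration; assumes `points` contains at
    least k distinct points.
    """
    # The first center is chosen uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest center.
        # Points already chosen get weight 0 and cannot be drawn again.
        weights = [
            min(squared_distance(p, c) for c in centers) for p in points
        ]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Made-up sample data: three loose groups in 2D.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (9.0, 0.1), (9.1, 0.3)]
seeds = kmeans_pp_seeds(points, 3, random.Random(42))
print(seeds)
```

Distance-weighted seeding tends to spread the initial centers across the data, which usually reduces the number of refinement iterations compared with uniform random initialization.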
Scientific Computing
- Waveform processing
- Computational graph JSON serialization/deserialization
- Complex number operations
- Linear algebra
Performance Tips
- Enable debug mode with: SET session pg4ml._v_is_debug_check = '1';
- Matrix multiplication uses plpython3u to call numpy for optimization
- Adjust PostgreSQL parallel parameters for multi-threaded training:
  - max_parallel_workers_per_gather
  - force_parallel_mode (renamed debug_parallel_query in PostgreSQL 16)
  - parallel_setup_cost, parallel_tuple_cost
- Consider using the pg_strom extension for GPU acceleration