This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Category: RAG
RAG: Vector Database with IVFFLAT, HNSW, DiskANN Indexes, AI & ML in SQL interface, Similarity Funcs, etc…
RAG category has 10 available extensions:
Name (Detail) |
Package (Repo) |
Repo |
Version |
Category |
License |
LOAD |
DDL |
TRUST |
RELOC |
Description |
vector |
pgvector |
PGDG |
0.8.0 |
RAG |
PostgreSQL |
|
|
|
|
vector data type and ivfflat and hnsw access methods |
vchord |
vchord |
PIGSTY |
0.2.0 |
RAG |
AGPLv3 |
|
|
|
|
Vector database plugin for Postgres, written in Rust |
vectorscale |
pgvectorscale |
PIGSTY |
0.5.1 |
RAG |
PostgreSQL |
|
|
|
|
pgvectorscale: Advanced indexing for vector data |
vectorize |
pg_vectorize |
PIGSTY |
0.21.1 |
RAG |
PostgreSQL |
|
|
|
|
The simplest way to do vector search on Postgres |
pg_similarity |
pg_similarity |
MIXED |
1.0 |
RAG |
BSD 3 |
|
|
|
|
support similarity queries |
smlar |
smlar |
PIGSTY |
1.0 |
RAG |
PostgreSQL |
|
|
|
|
Effective similarity search |
pg_summarize |
pg_summarize |
PIGSTY |
0.0.1 |
RAG |
PostgreSQL |
|
|
|
|
Text Summarization using LLMs. Built using pgrx |
pg_tiktoken |
pg_tiktoken |
PIGSTY |
0.0.1 |
RAG |
Apache-2.0 |
|
|
|
|
pg_tictoken: tiktoken tokenizer for use with OpenAI models in postgres |
pg4ml |
pg4ml |
PIGSTY |
2.0 |
RAG |
AGPLv3 |
|
|
|
|
Machine learning framework for PostgreSQL |
pgml |
pgml |
PIGSTY |
2.10.0 |
RAG |
MIT |
|
|
|
|
PostgresML: Run AL/ML workloads with SQL interface |
1 - vector
vector data type and ivfflat and hnsw access methods
Overview
PGDG 1st Party Extension: pgvector
: vector data type and ivfflat and hnsw access methods
- Latest Version: 0.8.0
- Postgres Support:
17
,16
,15
,14
,13
- Need Load: Shared library do not need explicit loading
- Need DDL: Need
CREATE EXTENSION
DDL
- Relocatable: Can not install to arbitrary schema
- Trusted: Untrusted, Require Superuser to Create
- Schemas: N/A
- Requires: N/A
RPM / DEB
- RPM Repo: PGDG
- RPM Name:
pgvector_$v*
- RPM Ver :
0.8.0
- RPM Deps: N/A
- DEB Repo: PGDG
- DEB Name:
postgresql-$v-pgvector
- DEB Ver :
0.8.0
- DEB Deps: N/A
Packages
Installation
Install vector
via the pig
CLI tool:
pig ext install pgvector; # Extension Namepig ext install vector; # normalized package name
Install pgvector
via Pigsty playbook:
./pgsql.yml -t pg_extension -e '{"pg_extensions": ["pgvector"]}' # -l <cls>
Install pgvector
RPM from YUM repo directly:
dnf install pgvector_17*;
dnf install pgvector_16*;
dnf install pgvector_15*;
dnf install pgvector_14*;
dnf install pgvector_13*;
Install pgvector
DEB from APT repo directly:
apt install postgresql-17-pgvector;
apt install postgresql-16-pgvector;
apt install postgresql-15-pgvector;
apt install postgresql-14-pgvector;
apt install postgresql-13-pgvector;
Enable vector
extension on PostgreSQL cluster:
2 - vchord
Vector database plugin for Postgres, written in Rust
Overview
PIGSTY 3rd Party Extension: vchord
: Vector database plugin for Postgres, written in Rust
- Latest Version: 0.2.0
- Postgres Support:
17
,16
,15
,14
- Need Load: Explicit Loading Required
- Need DDL: Need
CREATE EXTENSION
DDL
- Relocatable: Can not install to arbitrary schema
- Trusted: Untrusted, Require Superuser to Create
- Schemas: N/A
- Requires:
vector
RPM / DEB
- RPM Repo: PIGSTY
- RPM Name:
vchord_$v
- RPM Ver :
0.2.0
- RPM Deps:
pgvector_$v
- DEB Repo: PIGSTY
- DEB Name:
postgresql-$v-vchord
- DEB Ver :
0.1.0
- DEB Deps:
postgresql-$v-pgvector
Packages
Installation
Install vchord
via the pig
CLI tool:
Install vchord
via Pigsty playbook:
./pgsql.yml -t pg_extension -e '{"pg_extensions": ["vchord"]}' # -l <cls>
Install vchord
RPM from YUM repo directly:
dnf install vchord_17;
dnf install vchord_16;
dnf install vchord_15;
dnf install vchord_14;
Install vchord
DEB from APT repo directly:
apt install postgresql-17-vchord;
apt install postgresql-16-vchord;
apt install postgresql-15-vchord;
apt install postgresql-14-vchord;
Extension vchord
has to be added to shared_preload_libraries
shared_preload_libraries = 'vchord'; # add to pg cluster config
Enable vchord
extension on PostgreSQL cluster:
CREATE EXTENSION vchord CASCADE;
Usage
Add this extension to shared_preload_libraries in postgresql.conf
CREATE EXTENSION vchord CASCADE;
Create Index on embedding:
CREATE INDEX ON gist_train USING vchordrq (embedding vector_l2_ops) WITH (options = $$
residual_quantization = true
[build.internal]
lists = [4096]
spherical_centroids = false
$$);
Docs
Query
The query statement is exactly the same as pgvector. VectorChord supports any filter operation and WHERE/JOIN clauses like pgvecto.rs with VBASE.
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
Supported distance functions are:
- <-> - L2 distance
- <#> - (negative) inner product
- <=> - cosine distance
You can fine-tune the search performance by adjusting the probes
and epsilon
parameters:
-- Set probes to control the number of lists scanned.
-- Recommended range: 3%–10% of the total `lists` value.
SET vchordrq.probes = 100;
-- Set epsilon to control the reranking precision.
-- Larger value means more rerank for higher recall rate.
-- Don't change it unless you only have limited memory.
-- Recommended range: 1.0–1.9. Default value is 1.9.
SET vchordrq.epsilon = 1.9;
-- vchordrq relies on a projection matrix to optimize performance.
-- Add your vector dimensions to the `prewarm_dim` list to reduce latency.
-- If this is not configured, the first query will have higher latency as the matrix is generated on demand.
-- Default value: '64,128,256,384,512,768,1024,1536'
-- Note: This setting requires a database restart to take effect.
ALTER SYSTEM SET vchordrq.prewarm_dim = '64,128,256,384,512,768,1024,1536';
And for postgres’s setting
-- If using SSDs, set `effective_io_concurrency` to 200 for faster disk I/O.
SET effective_io_concurrency = 200;
-- Disable JIT (Just-In-Time Compilation) as it offers minimal benefit (1–2%)
-- and adds overhead for single-query workloads.
SET jit = off;
-- Allocate at least 25% of total memory to `shared_buffers`.
-- For disk-heavy workloads, you can increase this to up to 90% of total memory. You may also want to disable swap with network storage to avoid io hang.
-- Note: A restart is required for this setting to take effect.
ALTER SYSTEM SET shared_buffers = '8GB';
Indexing prewarm
To prewarm the index, you can use the following SQL. It will significantly improve performance when using limited memory.
-- vchordrq_prewarm(index_name::regclass) to prewarm the index into the shared buffer
SELECT vchordrq_prewarm('gist_train_embedding_idx'::regclass)"
Index Build Time
Index building can parallelized, and with external centroid precomputation, the total time is primarily limited by disk speed. Optimize parallelism using the following settings:
-- Set this to the number of CPU cores available for parallel operations.
SET max_parallel_maintenance_workers = 8;
SET max_parallel_workers = 8;
-- Adjust the total number of worker processes.
-- Note: A restart is required for this setting to take effect.
ALTER SYSTEM SET max_worker_processes = 8;
Indexing Progress
You can check the indexing progress by querying the pg_stat_progress_create_index
view.
SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "%" FROM pg_stat_progress_create_index;
External Index Precomputation
Unlike pure SQL, an external index precomputation will first do clustering outside and insert centroids to a PostgreSQL table. Although it might be more complicated, external build is definitely much faster on larger dataset (>5M).
To get started, you need to do a clustering of vectors using faiss
, scikit-learn
or any other clustering library.
The centroids should be preset in a table of any name with 3 columns:
- id(integer): id of each centroid, should be unique
- parent(integer, nullable): parent id of each centroid, should be NULL for normal clustering
- vector(vector): representation of each centroid,
pgvector
vector type
And example could be like this:
-- Create table of centroids
CREATE TABLE public.centroids (id integer NOT NULL UNIQUE, parent integer, vector vector(768));
-- Insert centroids into it
INSERT INTO public.centroids (id, parent, vector) VALUES (1, NULL, '{0.1, 0.2, 0.3, ..., 0.768}');
INSERT INTO public.centroids (id, parent, vector) VALUES (2, NULL, '{0.4, 0.5, 0.6, ..., 0.768}');
INSERT INTO public.centroids (id, parent, vector) VALUES (3, NULL, '{0.7, 0.8, 0.9, ..., 0.768}');
-- ...
-- Create index using the centroid table
CREATE INDEX ON gist_train USING vchordrq (embedding vector_l2_ops) WITH (options = $$
[build.external]
table = 'public.centroids'
$$);
To simplify the workflow, we provide end-to-end scripts for external index pre-computation, see scripts.
Limitations
- Data Type Support: Currently, only the
f32
data type is supported for vectors.
- Architecture Compatibility: The fast-scan kernel is optimized for x86_64 architectures. While it runs on aarch64, performance may be lower.
- KMeans Clustering: The built-in KMeans clustering is not yet fully optimized and may require substantial memory. We strongly recommend using external centroid precomputation for efficient index construction.
3 - vectorscale
pgvectorscale: Advanced indexing for vector data
Overview
PIGSTY 3rd Party Extension: pgvectorscale
: pgvectorscale: Advanced indexing for vector data
- Latest Version: 0.5.1
- Postgres Support:
17
,16
,15
,14
,13
- Need Load: Shared library do not need explicit loading
- Need DDL: Need
CREATE EXTENSION
DDL
- Relocatable: Can be installed into other schemas
- Trusted: Untrusted, Require Superuser to Create
- Schemas: N/A
- Requires:
vector
RPM / DEB
- RPM Repo: PIGSTY
- RPM Name:
pgvectorscale_$v
- RPM Ver :
0.5.1
- RPM Deps: N/A
- DEB Repo: PIGSTY
- DEB Name:
postgresql-$v-pgvectorscale
- DEB Ver :
0.5.1
- DEB Deps: N/A
Packages
Installation
Install vectorscale
via the pig
CLI tool:
pig ext install pgvectorscale; # Extension Namepig ext install vectorscale; # normalized package name
Install pgvectorscale
via Pigsty playbook:
./pgsql.yml -t pg_extension -e '{"pg_extensions": ["pgvectorscale"]}' # -l <cls>
Install pgvectorscale
RPM from YUM repo directly:
dnf install pgvectorscale_17;
dnf install pgvectorscale_16;
dnf install pgvectorscale_15;
dnf install pgvectorscale_14;
dnf install pgvectorscale_13;
Install pgvectorscale
DEB from APT repo directly:
apt install postgresql-17-pgvectorscale;
apt install postgresql-16-pgvectorscale;
apt install postgresql-15-pgvectorscale;
apt install postgresql-14-pgvectorscale;
apt install postgresql-13-pgvectorscale;
Enable vectorscale
extension on PostgreSQL cluster:
CREATE EXTENSION vectorscale CASCADE;
4 - vectorize
The simplest way to do vector search on Postgres
Overview
PIGSTY 3rd Party Extension: pg_vectorize
: The simplest way to do vector search on Postgres
- Latest Version: 0.21.1
- Postgres Support:
17
,16
,15
,14
- Need Load: Shared library do not need explicit loading
- Need DDL: Need
CREATE EXTENSION
DDL
- Relocatable: Can be installed into other schemas
- Trusted: Untrusted, Require Superuser to Create
- Schemas:
vectorize
- Requires:
pg_cron
, pgmq
, vector
RPM / DEB
- RPM Repo: PIGSTY
- RPM Name:
pg_vectorize_$v
- RPM Ver :
0.21.1
- RPM Deps:
pgmq_$v
, pg_cron_$v
, pgvector_$v
- DEB Repo: PIGSTY
- DEB Name:
postgresql-$v-pg-vectorize
- DEB Ver :
0.21.1
- DEB Deps:
postgresql-$v-pgmq
, postgresql-$v-pg-cron
, postgresql-$v-pgvector
Packages
Installation
Install vectorize
via the pig
CLI tool:
pig ext install pg_vectorize; # Extension Namepig ext install vectorize; # normalized package name
Install pg_vectorize
via Pigsty playbook:
./pgsql.yml -t pg_extension -e '{"pg_extensions": ["pg_vectorize"]}' # -l <cls>
Install pg_vectorize
RPM from YUM repo directly:
dnf install pg_vectorize_17;
dnf install pg_vectorize_16;
dnf install pg_vectorize_15;
dnf install pg_vectorize_14;
Install pg_vectorize
DEB from APT repo directly:
apt install postgresql-17-pg-vectorize;
apt install postgresql-16-pg-vectorize;
apt install postgresql-15-pg-vectorize;
apt install postgresql-14-pg-vectorize;
Enable vectorize
extension on PostgreSQL cluster:
CREATE EXTENSION vectorize CASCADE;
5 - pg_similarity
support similarity queries
Overview
MIXED 3rd Party Extension: pg_similarity
: support similarity queries
- Latest Version: 1.0
- Postgres Support:
17
,16
,15
,14
,13
- Need Load: Shared library do not need explicit loading
- Need DDL: Need
CREATE EXTENSION
DDL
- Relocatable: Can not install to arbitrary schema
- Trusted: Untrusted, Require Superuser to Create
- Schemas: N/A
- Requires: N/A
RPM / DEB
- RPM Repo: PIGSTY
- RPM Name:
pg_similarity_$v*
- RPM Ver :
1.0
- RPM Deps: N/A
- DEB Repo: PGDG
- DEB Name:
postgresql-$v-similarity
- DEB Ver :
1.0
- DEB Deps: N/A
Packages
Installation
Install pg_similarity
via the pig
CLI tool:
pig ext install pg_similarity
Install pg_similarity
via Pigsty playbook:
./pgsql.yml -t pg_extension -e '{"pg_extensions": ["pg_similarity"]}' # -l <cls>
Install pg_similarity
RPM from YUM repo directly:
dnf install pg_similarity_17*;
dnf install pg_similarity_16*;
dnf install pg_similarity_15*;
dnf install pg_similarity_14*;
dnf install pg_similarity_13*;
Install pg_similarity
DEB from APT repo directly:
apt install postgresql-17-similarity;
apt install postgresql-16-similarity;
apt install postgresql-15-similarity;
apt install postgresql-14-similarity;
apt install postgresql-13-similarity;
Enable pg_similarity
extension on PostgreSQL cluster:
CREATE EXTENSION pg_similarity;
6 - smlar
Effective similarity search
Overview
PIGSTY 3rd Party Extension: smlar
: Effective similarity search
- Latest Version: 1.0
- Postgres Support:
17
,16
,15
,14
,13
- Need Load: Shared library do not need explicit loading
- Need DDL: Need
CREATE EXTENSION
DDL
- Relocatable: Can not install to arbitrary schema
- Trusted: Unknown
- Schemas: N/A
- Requires: N/A
RPM / DEB
- RPM Repo: PIGSTY
- RPM Name:
smlar_$v*
- RPM Ver :
1.0
- RPM Deps: N/A
- DEB Repo: PIGSTY
- DEB Name:
postgresql-$v-smlar
- DEB Ver :
1.0
- DEB Deps: N/A
Packages
Installation
Install smlar
via the pig
CLI tool:
Install smlar
via Pigsty playbook:
./pgsql.yml -t pg_extension -e '{"pg_extensions": ["smlar"]}' # -l <cls>
Install smlar
RPM from YUM repo directly:
dnf install smlar_17*;
dnf install smlar_16*;
dnf install smlar_15*;
dnf install smlar_14*;
dnf install smlar_13*;
Install smlar
DEB from APT repo directly:
apt install postgresql-17-smlar;
apt install postgresql-16-smlar;
apt install postgresql-15-smlar;
apt install postgresql-14-smlar;
apt install postgresql-13-smlar;
Enable smlar
extension on PostgreSQL cluster:
7 - pg_summarize
Text Summarization using LLMs. Built using pgrx
Overview
PIGSTY 3rd Party Extension: pg_summarize
: Text Summarization using LLMs. Built using pgrx
- Latest Version: 0.0.1
- Postgres Support:
17
,16
,15
,14
,13
- Need Load: Shared library do not need explicit loading
- Need DDL: Need
CREATE EXTENSION
DDL
- Relocatable: Can be installed into other schemas
- Trusted: Untrusted, Require Superuser to Create
- Schemas: N/A
- Requires: N/A
RPM / DEB
- RPM Repo: PIGSTY
- RPM Name:
pg_summarize_$v
- RPM Ver :
0.0.1
- RPM Deps: N/A
- DEB Repo: PIGSTY
- DEB Name:
postgresql-$v-pg-summarize
- DEB Ver :
0.0.1
- DEB Deps: N/A
Packages
Installation
Install pg_summarize
via the pig
CLI tool:
pig ext install pg_summarize
Install pg_summarize
via Pigsty playbook:
./pgsql.yml -t pg_extension -e '{"pg_extensions": ["pg_summarize"]}' # -l <cls>
Install pg_summarize
RPM from YUM repo directly:
dnf install pg_summarize_17;
dnf install pg_summarize_16;
dnf install pg_summarize_15;
dnf install pg_summarize_14;
dnf install pg_summarize_13;
Install pg_summarize
DEB from APT repo directly:
apt install postgresql-17-pg-summarize;
apt install postgresql-16-pg-summarize;
apt install postgresql-15-pg-summarize;
apt install postgresql-14-pg-summarize;
apt install postgresql-13-pg-summarize;
Enable pg_summarize
extension on PostgreSQL cluster:
CREATE EXTENSION pg_summarize;
8 - pg_tiktoken
pg_tictoken: tiktoken tokenizer for use with OpenAI models in postgres
Overview
PIGSTY 3rd Party Extension: pg_tiktoken
: pg_tictoken: tiktoken tokenizer for use with OpenAI models in postgres
- Latest Version: 0.0.1
- Postgres Support:
17
,16
,15
,14
,13
- Need Load: Shared library do not need explicit loading
- Need DDL: Need
CREATE EXTENSION
DDL
- Relocatable: Can be installed into other schemas
- Trusted: Untrusted, Require Superuser to Create
- Schemas: N/A
- Requires: N/A
RPM / DEB
- RPM Repo: PIGSTY
- RPM Name:
pg_tiktoken_$v
- RPM Ver :
0.0.1
- RPM Deps: N/A
- DEB Repo: PIGSTY
- DEB Name:
postgresql-$v-pg-tiktoken
- DEB Ver :
0.0.1
- DEB Deps: N/A
Packages
Installation
Install pg_tiktoken
via the pig
CLI tool:
pig ext install pg_tiktoken
Install pg_tiktoken
via Pigsty playbook:
./pgsql.yml -t pg_extension -e '{"pg_extensions": ["pg_tiktoken"]}' # -l <cls>
Install pg_tiktoken
RPM from YUM repo directly:
dnf install pg_tiktoken_17;
dnf install pg_tiktoken_16;
dnf install pg_tiktoken_15;
dnf install pg_tiktoken_14;
dnf install pg_tiktoken_13;
Install pg_tiktoken
DEB from APT repo directly:
apt install postgresql-17-pg-tiktoken;
apt install postgresql-16-pg-tiktoken;
apt install postgresql-15-pg-tiktoken;
apt install postgresql-14-pg-tiktoken;
apt install postgresql-13-pg-tiktoken;
Enable pg_tiktoken
extension on PostgreSQL cluster:
CREATE EXTENSION pg_tiktoken;
9 - pg4ml
Machine learning framework for PostgreSQL
Overview
PIGSTY 3rd Party Extension: pg4ml
: Machine learning framework for PostgreSQL
- Latest Version: 2.0
- Postgres Support:
17
,16
,15
,14
,13
- Need Load: Shared library do not need explicit loading
- Need DDL: Need
CREATE EXTENSION
DDL
- Relocatable: Can not install to arbitrary schema
- Trusted: Trusted, Can be created by user with
CREATE
Privilege
- Schemas: N/A
- Requires:
plpgsql
, tablefunc
, cube
, plpython3u
RPM / DEB
- RPM Repo: PIGSTY
- RPM Name:
pg4ml_$v
- RPM Ver :
2.0
- RPM Deps: N/A
- DEB Repo: PIGSTY
- DEB Name:
postgresql-$v-pg4ml
- DEB Ver :
2.0
- DEB Deps: N/A
Packages
Installation
Install pg4ml
via the pig
CLI tool:
Install pg4ml
via Pigsty playbook:
./pgsql.yml -t pg_extension -e '{"pg_extensions": ["pg4ml"]}' # -l <cls>
Install pg4ml
RPM from YUM repo directly:
dnf install pg4ml_17;
dnf install pg4ml_16;
dnf install pg4ml_15;
dnf install pg4ml_14;
dnf install pg4ml_13;
Install pg4ml
DEB from APT repo directly:
apt install postgresql-17-pg4ml;
apt install postgresql-16-pg4ml;
apt install postgresql-15-pg4ml;
apt install postgresql-14-pg4ml;
apt install postgresql-13-pg4ml;
Enable pg4ml
extension on PostgreSQL cluster:
CREATE EXTENSION pg4ml CASCADE;
10 - pgml
PostgresML: Run AL/ML workloads with SQL interface
Overview
PIGSTY 3rd Party Extension: pgml
: PostgresML: Run AL/ML workloads with SQL interface
- Latest Version: 2.10.0
- Postgres Support:
17
,16
,15
,14
- Need Load: Explicit Loading Required
- Need DDL: Need
CREATE EXTENSION
DDL
- Relocatable: Can be installed into other schemas
- Trusted: Untrusted, Require Superuser to Create
- Schemas:
pgml
- Requires: N/A
RPM / DEB
- RPM Repo: None
- RPM Name:
N/A
- RPM Ver : N/A
- RPM Deps: N/A
- DEB Repo: PIGSTY
- DEB Name:
postgresql-$v-pgml
- DEB Ver :
2.10.0
- DEB Deps: N/A
Packages
Installation
Install pgml
via the pig
CLI tool:
Install pgml
via Pigsty playbook:
./pgsql.yml -t pg_extension -e '{"pg_extensions": ["pgml"]}' # -l <cls>
Install pgml
DEB from APT repo directly:
apt install postgresql-17-pgml;
apt install postgresql-16-pgml;
apt install postgresql-15-pgml;
apt install postgresql-14-pgml;
Extension pgml
has to be added to shared_preload_libraries
shared_preload_libraries = 'pgml'; # add to pg cluster config
Enable pgml
extension on PostgreSQL cluster:
Usage
After installing the pgml
extension and python dependencies on all cluster nodes, you can enable pgml
on the PostgreSQL cluster.
Configure cluster with patronictl
command and add pgml
to shared_preload_libraries
, and specify your venv
dir in pgml.venv
:
shared_preload_libraries: pgml, timescaledb, pg_stat_statements, auto_explain
pgml.venv: '/data/pgml'
After that, restart database cluster, and create extension with SQL command:
CREATE EXTENSION vector; -- nice to have pgvector installed too!
CREATE EXTENSION pgml; -- create PostgresML in current database
SELECT pgml.version(); -- print PostgresML version string
If it works, you should see something like:
# create extension pgml;
INFO: Python version: 3.11.2 (main, Oct 5 2023, 16:06:03) [GCC 8.5.0 20210514 (Red Hat 8.5.0-18)]
INFO: Scikit-learn 1.3.0, XGBoost 2.0.0, LightGBM 4.1.0, NumPy 1.26.1
CREATE EXTENSION
# SELECT pgml.version(); -- print PostgresML version string
version
---------
2.7.8
You are all set! Check PostgresML for more details: https://postgresml.org/docs/guides/use-cases/