vchord
Overview
PIGSTY 3rd Party Extension: vchord
Vector database plugin for Postgres, written in Rust
Information
- Extension ID: 1810
- Extension Name: vchord
- Package Name: vchord
- Category: RAG
- License: AGPLv3
- Website: https://github.com/tensorchord/VectorChord
- Language: Rust
- Extra Tags: pgrx
- Comment: N/A
Metadata
- Latest Version: 0.2.0
- Postgres Support: 17, 16, 15, 14
- Need Load: Explicit Loading Required
- Need DDL: Requires CREATE EXTENSION DDL
- Relocatable: Cannot install to an arbitrary schema
- Trusted: Untrusted, Requires Superuser to Create
- Schemas: N/A
- Requires: vector
RPM / DEB
- RPM Repo: PIGSTY
- RPM Name: vchord_$v
- RPM Ver: 0.2.0
- RPM Deps: pgvector_$v
- DEB Repo: PIGSTY
- DEB Name: postgresql-$v-vchord
- DEB Ver: 0.1.0
- DEB Deps: postgresql-$v-pgvector
Packages
OS | Arch | PG17 | PG16 | PG15 | PG14 | PG13 |
---|---|---|---|---|---|---|
el8 | x86_64 | vchord_17 PIGSTY 0.2.0 | vchord_16 PIGSTY 0.2.0 | vchord_15 PIGSTY 0.2.0 | vchord_14 PIGSTY 0.2.0 | |
el8 | aarch64 | vchord_17 PIGSTY 0.2.0 | vchord_16 PIGSTY 0.2.0 | vchord_15 PIGSTY 0.2.0 | vchord_14 PIGSTY 0.2.0 | |
el9 | x86_64 | vchord_17 PIGSTY 0.2.0 | vchord_16 PIGSTY 0.2.0 | vchord_15 PIGSTY 0.2.0 | vchord_14 PIGSTY 0.2.0 | |
el9 | aarch64 | vchord_17 PIGSTY 0.2.0 | vchord_16 PIGSTY 0.2.0 | vchord_15 PIGSTY 0.2.0 | vchord_14 PIGSTY 0.2.0 | |
d12 | x86_64 | postgresql-17-vchord PIGSTY 0.1.0 | postgresql-16-vchord PIGSTY 0.1.0 | postgresql-15-vchord PIGSTY 0.1.0 | postgresql-14-vchord PIGSTY 0.1.0 | |
d12 | aarch64 | postgresql-17-vchord PIGSTY 0.1.0 | postgresql-16-vchord PIGSTY 0.1.0 | postgresql-15-vchord PIGSTY 0.1.0 | postgresql-14-vchord PIGSTY 0.1.0 | |
u22 | x86_64 | postgresql-17-vchord PIGSTY 0.1.0 | postgresql-16-vchord PIGSTY 0.1.0 | postgresql-15-vchord PIGSTY 0.1.0 | postgresql-14-vchord PIGSTY 0.1.0 | |
u22 | aarch64 | postgresql-17-vchord PIGSTY 0.1.0 | postgresql-16-vchord PIGSTY 0.1.0 | postgresql-15-vchord PIGSTY 0.1.0 | postgresql-14-vchord PIGSTY 0.1.0 | |
u24 | x86_64 | postgresql-17-vchord PIGSTY 0.2.0 | postgresql-16-vchord PIGSTY 0.2.0 | postgresql-15-vchord PIGSTY 0.2.0 | postgresql-14-vchord PIGSTY 0.2.0 | |
u24 | aarch64 | postgresql-17-vchord PIGSTY 0.2.0 | postgresql-16-vchord PIGSTY 0.2.0 | postgresql-15-vchord PIGSTY 0.2.0 | postgresql-14-vchord PIGSTY 0.2.0 | |
Installation
Install vchord via the pig CLI tool:
pig ext install vchord
Install vchord via the Pigsty playbook:
./pgsql.yml -t pg_extension -e '{"pg_extensions": ["vchord"]}' # -l <cls>
Install the vchord RPM from the YUM repo directly:
dnf install vchord_17;
dnf install vchord_16;
dnf install vchord_15;
dnf install vchord_14;
Install the vchord DEB from the APT repo directly:
apt install postgresql-17-vchord;
apt install postgresql-16-vchord;
apt install postgresql-15-vchord;
apt install postgresql-14-vchord;
The vchord extension has to be added to shared_preload_libraries:
shared_preload_libraries = 'vchord'  # add to the pg cluster config
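On a standalone instance, one way to apply this is via ALTER SYSTEM (a minimal sketch, assuming vchord is the only library you preload; otherwise append it to the existing list):
-- shared_preload_libraries only takes effect after a full server restart
ALTER SYSTEM SET shared_preload_libraries = 'vchord';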
Enable the vchord extension on the PostgreSQL cluster:
CREATE EXTENSION vchord CASCADE;
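To verify the setup after a restart, a couple of generic catalog checks (not vchord-specific) can be used:
SHOW shared_preload_libraries;  -- should include 'vchord'
SELECT extname, extversion FROM pg_extension WHERE extname IN ('vector', 'vchord');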
Usage
- https://github.com/tensorchord/VectorChord
- Launch Blog: VectorChord: Store 400k Vectors for $1 in PostgreSQL
Add this extension to shared_preload_libraries in postgresql.conf, then create the extension:
CREATE EXTENSION vchord CASCADE;
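The index example below assumes a gist_train table with an embedding vector column; a minimal sketch of such a table (hypothetical name and dimension, adjust to your data) could be:
-- hypothetical example table; 768 dimensions is an assumption, match your embedding model
CREATE TABLE gist_train (id bigserial PRIMARY KEY, embedding vector(768) NOT NULL);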
Create an index on the embedding column:
CREATE INDEX ON gist_train USING vchordrq (embedding vector_l2_ops) WITH (options = $$
residual_quantization = true
[build.internal]
lists = [4096]
spherical_centroids = false
$$);
Docs
Query
The query statement is exactly the same as with pgvector. VectorChord supports any filter operation and WHERE/JOIN clauses, like pgvecto.rs with VBASE.
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
Supported distance functions are:
- <-> - L2 distance
- <#> - (negative) inner product
- <=> - cosine distance
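For example, the same query shape works with any of the three operators (reusing the items table from the query above):
SELECT * FROM items ORDER BY embedding <#> '[3,1,2]' LIMIT 5;  -- (negative) inner product
SELECT * FROM items ORDER BY embedding <=> '[3,1,2]' LIMIT 5;  -- cosine distance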
Query Performance Tuning
You can fine-tune the search performance by adjusting the probes and epsilon parameters:
-- Set probes to control the number of lists scanned.
-- Recommended range: 3%–10% of the total `lists` value.
SET vchordrq.probes = 100;
-- Set epsilon to control the reranking precision.
-- Larger value means more rerank for higher recall rate.
-- Don't change it unless you only have limited memory.
-- Recommended range: 1.0–1.9. Default value is 1.9.
SET vchordrq.epsilon = 1.9;
-- vchordrq relies on a projection matrix to optimize performance.
-- Add your vector dimensions to the `prewarm_dim` list to reduce latency.
-- If this is not configured, the first query will have higher latency as the matrix is generated on demand.
-- Default value: '64,128,256,384,512,768,1024,1536'
-- Note: This setting requires a database restart to take effect.
ALTER SYSTEM SET vchordrq.prewarm_dim = '64,128,256,384,512,768,1024,1536';
And for the Postgres settings:
-- If using SSDs, set `effective_io_concurrency` to 200 for faster disk I/O.
SET effective_io_concurrency = 200;
-- Disable JIT (Just-In-Time Compilation) as it offers minimal benefit (1–2%)
-- and adds overhead for single-query workloads.
SET jit = off;
-- Allocate at least 25% of total memory to `shared_buffers`.
-- For disk-heavy workloads, you can increase this to up to 90% of total memory. You may also want to disable swap with network storage to avoid io hang.
-- Note: A restart is required for this setting to take effect.
ALTER SYSTEM SET shared_buffers = '8GB';
Indexing prewarm
To prewarm the index, you can use the following SQL. It will significantly improve performance when using limited memory.
-- vchordrq_prewarm(index_name::regclass) to prewarm the index into the shared buffer
SELECT vchordrq_prewarm('gist_train_embedding_idx'::regclass);
Index Build Time
Index building can be parallelized, and with external centroid precomputation, the total time is primarily limited by disk speed. Optimize parallelism using the following settings:
-- Set this to the number of CPU cores available for parallel operations.
SET max_parallel_maintenance_workers = 8;
SET max_parallel_workers = 8;
-- Adjust the total number of worker processes.
-- Note: A restart is required for this setting to take effect.
ALTER SYSTEM SET max_worker_processes = 8;
Indexing Progress
You can check the indexing progress by querying the pg_stat_progress_create_index view.
SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "%" FROM pg_stat_progress_create_index;
External Index Precomputation
Unlike the pure SQL approach, external index precomputation first performs the clustering outside the database and then inserts the centroids into a PostgreSQL table. Although it is more complicated, an external build is much faster on larger datasets (>5M).
To get started, you need to cluster the vectors using faiss, scikit-learn, or any other clustering library.
The centroids should be stored in a table (any name) with 3 columns:
- id (integer): id of each centroid, should be unique
- parent (integer, nullable): parent id of each centroid, should be NULL for normal clustering
- vector (vector): representation of each centroid, using the pgvector vector type
An example could look like this:
-- Create table of centroids
CREATE TABLE public.centroids (id integer NOT NULL UNIQUE, parent integer, vector vector(768));
-- Insert centroids into it
INSERT INTO public.centroids (id, parent, vector) VALUES (1, NULL, '[0.1, 0.2, 0.3, ..., 0.768]');
INSERT INTO public.centroids (id, parent, vector) VALUES (2, NULL, '[0.4, 0.5, 0.6, ..., 0.768]');
INSERT INTO public.centroids (id, parent, vector) VALUES (3, NULL, '[0.7, 0.8, 0.9, ..., 0.768]');
-- ...
-- Create index using the centroid table
CREATE INDEX ON gist_train USING vchordrq (embedding vector_l2_ops) WITH (options = $$
[build.external]
table = 'public.centroids'
$$);
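Before building the index, a quick sanity check (a sketch, following the table and column names from the example above) is to confirm the centroid count matches what you clustered:
-- expect the number of top-level centroids to match the number of lists you clustered for
SELECT count(*) AS centroids, count(*) FILTER (WHERE parent IS NULL) AS top_level FROM public.centroids;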
To simplify the workflow, we provide end-to-end scripts for external index pre-computation, see scripts.
Limitations
- Data Type Support: Currently, only the f32 data type is supported for vectors.
- Architecture Compatibility: The fast-scan kernel is optimized for x86_64 architectures. While it runs on aarch64, performance may be lower.
- KMeans Clustering: The built-in KMeans clustering is not yet fully optimized and may require substantial memory. We strongly recommend using external centroid precomputation for efficient index construction.