smlar

Effective similarity search

Overview

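Package | Version | Category | License    | Language
--------+---------+----------+------------+---------
smlar   | 1.0     | RAG      | PostgreSQL | C

ID   | Extension | Bin | Lib | Load | Create | Trust | Reloc | Schema
-----+-----------+-----+-----+------+--------+-------+-------+-------
1850 | smlar     | No  | Yes | No   | Yes    | No    | Yes   | -

Related: pg_similarity, fuzzystrmatch, pg_trgm, intarray, vector, pg_bigm, unaccent, vchord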

The PG 18 build breakage is fixed in the fork at https://github.com/Vonng/smlar

Version

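Type | Repo   | Version | PG Ver             | Package             | Deps
-----+--------+---------+--------------------+---------------------+-----
EXT  | PIGSTY | 1.0     | 18, 17, 16, 15, 14 | smlar               | -
RPM  | PIGSTY | 1.0     | 18, 17, 16, 15, 14 | smlar_$v            | -
DEB  | PIGSTY | 1.0     | 18, 17, 16, 15, 14 | postgresql-$v-smlar | -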
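
OS / PG      | PG18       | PG17       | PG16       | PG15       | PG14
-------------+------------+------------+------------+------------+-----------
el8.x86_64   |            |            |            |            |
el8.aarch64  |            |            |            |            |
el9.x86_64   |            |            |            |            |
el9.aarch64  |            |            |            |            |
el10.x86_64  |            |            |            |            |
el10.aarch64 |            |            |            |            |
d12.x86_64   |            |            |            |            |
d12.aarch64  |            |            |            |            |
d13.x86_64   |            |            |            |            |
d13.aarch64  | PIGSTY 1.0 | PIGSTY 1.0 | PIGSTY 1.0 | PIGSTY 1.0 | PIGSTY 1.0
u22.x86_64   | PIGSTY 1.0 | PIGSTY 1.0 | PIGSTY 1.0 | PIGSTY 1.0 | PIGSTY 1.0
u22.aarch64  | PIGSTY 1.0 | PIGSTY 1.0 | PIGSTY 1.0 | PIGSTY 1.0 | PIGSTY 1.0
u24.x86_64   | PIGSTY 1.0 | PIGSTY 1.0 | PIGSTY 1.0 | PIGSTY 1.0 | PIGSTY 1.0
u24.aarch64  | PIGSTY 1.0 | PIGSTY 1.0 | PIGSTY 1.0 | PIGSTY 1.0 | PIGSTY 1.0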

Build

You can build the RPM / DEB packages for smlar using pig build:

pig build pkg smlar         # build RPM / DEB packages

Install

You can install smlar directly. First, make sure the PGDG and PIGSTY repositories are added and enabled:

pig repo add pgsql -u          # Add repo and update cache

Install the extension using pig or apt/yum/dnf:

pig install smlar;          # Install for current active PG version
pig ext install -y smlar -v 18  # PG 18
pig ext install -y smlar -v 17  # PG 17
pig ext install -y smlar -v 16  # PG 16
pig ext install -y smlar -v 15  # PG 15
pig ext install -y smlar -v 14  # PG 14
dnf install -y smlar_18       # PG 18
dnf install -y smlar_17       # PG 17
dnf install -y smlar_16       # PG 16
dnf install -y smlar_15       # PG 15
dnf install -y smlar_14       # PG 14
apt install -y postgresql-18-smlar   # PG 18
apt install -y postgresql-17-smlar   # PG 17
apt install -y postgresql-16-smlar   # PG 16
apt install -y postgresql-15-smlar   # PG 15
apt install -y postgresql-14-smlar   # PG 14

Create Extension:

CREATE EXTENSION smlar;
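
After creating the extension, a quick check can confirm that it is registered in the current database and that its functions resolve. The following is a minimal sketch that relies only on the core pg_extension catalog and the two-argument smlar() form described under Usage:

```sql
-- List the installed extension and its version (pg_extension is a core catalog).
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'smlar';

-- Smoke test: call the two-argument form on two small integer arrays.
SELECT smlar('{1,4,6}'::int[], '{5,4,6}'::int[]);
```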

Usage

smlar: Effective similarity search for PostgreSQL arrays. Source: README

The smlar extension provides effective similarity search on PostgreSQL arrays using configurable similarity formulas, GiST and GIN index support, and TF/IDF weighting.


Functions

float4 smlar(anyarray, anyarray)

Computes the similarity of two arrays. Both arrays must have the same element type.

float4 smlar(anyarray, anyarray, bool useIntersect)

Computes the similarity of two arrays whose elements are of a composite type. The composite type must look like this:

CREATE TYPE type_name AS (element_name anytype, weight_name FLOAT4);

When useIntersect is true, only the elements present in both arrays are used in the denominator.
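
As an illustration of the composite form, the sketch below defines a hypothetical type weighted_int (the type and field names are assumptions, chosen only to match the element-plus-float4-weight shape shown above) and compares two weighted arrays with useIntersect off and on. It makes no claim about the resulting values:

```sql
-- Hypothetical composite type: an element and its float4 weight.
CREATE TYPE weighted_int AS (elem int4, weight float4);

-- Compare the same pair of weighted arrays with useIntersect = false and true.
SELECT
    smlar(a, b, false) AS sml_all_elements,
    smlar(a, b, true)  AS sml_intersect_only
FROM (
    SELECT
        '{"(1,1.0)","(4,0.5)","(6,0.2)"}'::weighted_int[] AS a,
        '{"(5,1.0)","(4,0.5)","(6,0.2)"}'::weighted_int[] AS b
) AS t;
```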

float4 smlar(anyarray a, anyarray b, text formula)

Computes similarity of two arrays by a given formula. Predefined variables in formula:

  • N.i – number of common elements in both arrays (intersection)
  • N.a – number of unique elements in first array
  • N.b – number of unique elements in second array
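
Because the formula is plain text built from these three variables, other set-similarity measures can presumably be written the same way. The following sketch assumes the formula accepts ordinary arithmetic with parentheses; the Jaccard and Dice expressions are illustrations and do not come from the README:

```sql
-- Jaccard (Tanimoto) similarity: intersection size over union size.
SELECT smlar('{1,4,6}'::int[], '{5,4,6}'::int[], 'N.i / (N.a + N.b - N.i)');

-- Dice coefficient: twice the intersection size over the sum of both sizes.
SELECT smlar('{1,4,6}'::int[], '{5,4,6}'::int[], '2 * N.i / (N.a + N.b)');
```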

Example:

SELECT smlar('{1,4,6}'::int[], '{5,4,6}');
SELECT smlar('{1,4,6}'::int[], '{5,4,6}', 'N.i / sqrt(N.a * N.b)');
-- These two calls are equivalent.

anyarray % anyarray

Returns true if the similarity of the two arrays is greater than the limit set by smlar.threshold.
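
A short sketch of the operator in use, with the threshold set for the current session only. The value 0.6 is an arbitrary choice for illustration:

```sql
-- Set the similarity limit for this session.
SET smlar.threshold = 0.6;

-- Boolean test: are the two arrays similar under the current threshold?
SELECT '{1,4,6}'::int[] % '{5,4,6}'::int[] AS is_similar;

-- The underlying similarity value, for comparison with the threshold.
SELECT smlar('{1,4,6}'::int[], '{5,4,6}'::int[]) AS similarity;
```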

text[] tsvector2textarray(tsvector)

Converts a tsvector to a text array.
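
This function makes it possible to feed full-text search output into the array functions. As a sketch, using the core to_tsvector() function with the 'english' configuration (the sample sentences are made up):

```sql
-- Turn a document into an array of its lexemes.
SELECT tsvector2textarray(
    to_tsvector('english', 'The quick brown fox jumps over the lazy dog')
);

-- Compare two short texts by the lexemes they share.
SELECT smlar(
    tsvector2textarray(to_tsvector('english', 'quick brown fox')),
    tsvector2textarray(to_tsvector('english', 'lazy brown dog'))
);
```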

anyarray array_unique(anyarray)

Sorts the array and removes duplicate elements.
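
For example, on arrays with repeated elements (a minimal sketch; the input values are arbitrary):

```sql
-- Integer array with duplicates and no particular order.
SELECT array_unique('{3,1,3,2,1}'::int[]);

-- The same call on a text array.
SELECT array_unique('{b,a,b}'::text[]);
```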

float4 inarray(anyarray, anyelement)

Returns 1.0 if the second argument is present in the array given as the first argument, and zero otherwise.

float4 inarray(anyarray, anyelement, float4, float4)

Returns the third argument if the second argument is present in the array given as the first argument, and the fourth argument otherwise.
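
Both forms can be tried on a small array. In this sketch the values 2.5 and 0.1 are arbitrary, and the explicit float4 casts are a precaution rather than a documented requirement:

```sql
-- Two-argument form: 4 is present in the array, 5 is not.
SELECT inarray('{1,4,6}'::int[], 4) AS present,
       inarray('{1,4,6}'::int[], 5) AS absent;

-- Four-argument form: custom values for the "present" and "absent" cases.
SELECT inarray('{1,4,6}'::int[], 4, 2.5::float4, 0.1::float4) AS present,
       inarray('{1,4,6}'::int[], 5, 2.5::float4, 0.1::float4) AS absent;
```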


GUC Configuration Variables

smlar.threshold  FLOAT

Arrays whose similarity is lower than this threshold are not considered similar by the % operator.

smlar.persistent_cache  BOOL

If enabled, the cache of global statistics is stored in transaction-independent memory.

smlar.type  STRING

Type of similarity formula: cosine (default), tfidf, overlap.

smlar.stattable  STRING

Name of the table that stores set-wide statistics. The table should be defined as:

CREATE TABLE table_name (
    value   data_type UNIQUE,
    ndoc    int4 NOT NULL CHECK (ndoc > 0)  -- int4 or bigint
);

The row whose value is NULL holds the total number of documents. Used only for smlar.type = 'tfidf'.
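
One possible way to populate such a table is sketched below. The tables docs and docs_stat and their columns are hypothetical names invented for this example, and the counting query is an assumption about how ndoc is meant to be derived (the number of documents containing each element), not a procedure taken from the README:

```sql
-- Hypothetical document table: one int4[] of element ids per row.
CREATE TABLE docs (id serial PRIMARY KEY, tags int4[] NOT NULL);
INSERT INTO docs (tags) VALUES ('{1,4,6}'), ('{5,4,6}'), ('{4,7}');

-- Statistics table in the shape described above.
CREATE TABLE docs_stat (
    value int4 UNIQUE,
    ndoc  int4 NOT NULL CHECK (ndoc > 0)
);

-- One row per distinct element: the number of documents that contain it.
INSERT INTO docs_stat (value, ndoc)
SELECT elem, count(*)
FROM (SELECT DISTINCT id, unnest(tags) AS elem FROM docs) AS per_doc
GROUP BY elem;

-- The NULL-valued row carries the total number of documents.
INSERT INTO docs_stat (value, ndoc)
SELECT NULL, count(*) FROM docs;

-- Point smlar at the statistics and switch to TF/IDF for this session.
SET smlar.type = 'tfidf';
SET smlar.stattable = 'docs_stat';

SELECT id, smlar(tags, '{4,6}'::int4[]) AS sml
FROM docs
ORDER BY sml DESC;
```

The statistics are a snapshot, so they would need to be refreshed when the underlying documents change.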

smlar.tf_method  STRING

Calculation method for term frequency. Values:

  • "n" – simple counting of entries (default)
  • "log" – 1 + log(n)
  • "const" – TF is equal to 1

Used only for smlar.type = 'tfidf'.

smlar.idf_plus_one  BOOL

If false (default), calculate idf as log(d/df). If true, as log(1+d/df). Used only for smlar.type = 'tfidf'.
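
The TF/IDF settings can be changed per session with ordinary SET commands. A minimal sketch follows; the chosen values are illustrative only, and a statistics table named via smlar.stattable is assumed to exist already:

```sql
-- Switch the similarity formula to TF/IDF for this session.
SET smlar.type = 'tfidf';

-- Dampen term frequency with 1 + log(n) instead of raw counts.
SET smlar.tf_method = 'log';

-- Use log(1 + d/df) for the inverse document frequency.
SET smlar.idf_plus_one = on;

-- Inspect the effective values.
SHOW smlar.tf_method;
SHOW smlar.idf_plus_one;
```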

It is highly recommended to set the threshold explicitly in postgresql.conf:

smlar.threshold = 0.6  # or any other value > 0 and < 1

GiST/GIN Index Support

The % and && operations are supported with GiST and GIN indexes for many array types:

Array Type    | GIN operator class   | GiST operator class
--------------+----------------------+---------------------
bit[]         | _bit_sml_ops         |
bytea[]       | _bytea_sml_ops       | _bytea_sml_ops
char[]        | _char_sml_ops        | _char_sml_ops
cidr[]        | _cidr_sml_ops        | _cidr_sml_ops
date[]        | _date_sml_ops        | _date_sml_ops
float4[]      | _float4_sml_ops      | _float4_sml_ops
float8[]      | _float8_sml_ops      | _float8_sml_ops
inet[]        | _inet_sml_ops        | _inet_sml_ops
int2[]        | _int2_sml_ops        | _int2_sml_ops
int4[]        | _int4_sml_ops        | _int4_sml_ops
int8[]        | _int8_sml_ops        | _int8_sml_ops
interval[]    | _interval_sml_ops    | _interval_sml_ops
macaddr[]     | _macaddr_sml_ops     | _macaddr_sml_ops
money[]       | _money_sml_ops       |
numeric[]     | _numeric_sml_ops     | _numeric_sml_ops
oid[]         | _oid_sml_ops         | _oid_sml_ops
text[]        | _text_sml_ops        | _text_sml_ops
time[]        | _time_sml_ops        | _time_sml_ops
timestamp[]   | _timestamp_sml_ops   | _timestamp_sml_ops
timestamptz[] | _timestamptz_sml_ops | _timestamptz_sml_ops
timetz[]      | _timetz_sml_ops      | _timetz_sml_ops
varbit[]      | _varbit_sml_ops      |
varchar[]     | _varchar_sml_ops     | _varchar_sml_ops
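
To show how an operator class from this table is used, the sketch below indexes an int4[] column and runs an indexed similarity query. The table items, its columns, and the index names are hypothetical, and in practice one would normally create only one of the two indexes:

```sql
-- Hypothetical table with an int4[] column to search.
CREATE TABLE items (id serial PRIMARY KEY, tags int4[] NOT NULL);

-- GiST index using the operator class for int4[].
CREATE INDEX items_tags_gist ON items USING gist (tags _int4_sml_ops);

-- Alternative: GIN index with the same operator class name.
CREATE INDEX items_tags_gin ON items USING gin (tags _int4_sml_ops);

-- Rows similar to the query array, most similar first.
SET smlar.threshold = 0.6;

SELECT id, smlar(tags, '{1,4,6}'::int4[]) AS sml
FROM items
WHERE tags % '{1,4,6}'::int4[]
ORDER BY sml DESC
LIMIT 10;
```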

Last Modified 2026-03-12: add pg extension catalog (95749bf)