datasketches

Approximate analytics sketches and aggregates for PostgreSQL

Overview

PackageVersionCategoryLicenseLanguage
datasketches1.7.0FUNCApache-2.0C++
IDExtensionBinLibLoadCreateTrustRelocSchema
4690datasketchesNoYesNoYesNoYes-

Built against Apache DataSketches C++ core 5.0.0.

Version

TypeRepoVersionPG VerPackageDeps
EXTPIGSTY1.7.01817161514datasketches-
RPMPIGSTY1.7.01817161514datasketches_$v-
DEBPIGSTY1.7.01817161514postgresql-$v-datasketches-
OS / PGPG18PG17PG16PG15PG14
el8.x86_64
el8.aarch64
el9.x86_64
el9.aarch64
el10.x86_64
el10.aarch64
d12.x86_64
d12.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
d13.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
d13.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u22.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u22.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u24.x86_64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
u24.aarch64
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0
PIGSTY 1.7.0

Build

You can build the RPM / DEB packages for datasketches using pig build:

pig build pkg datasketches         # build RPM / DEB packages

Install

You can install datasketches directly. First, make sure the PGDG and PIGSTY repositories are added and enabled:

pig repo add pgsql -u          # Add repo and update cache

Install the extension using pig or apt/yum/dnf:

pig install datasketches;          # Install for current active PG version
pig ext install -y datasketches -v 18  # PG 18
pig ext install -y datasketches -v 17  # PG 17
pig ext install -y datasketches -v 16  # PG 16
pig ext install -y datasketches -v 15  # PG 15
pig ext install -y datasketches -v 14  # PG 14
dnf install -y datasketches_18       # PG 18
dnf install -y datasketches_17       # PG 17
dnf install -y datasketches_16       # PG 16
dnf install -y datasketches_15       # PG 15
dnf install -y datasketches_14       # PG 14
apt install -y postgresql-18-datasketches   # PG 18
apt install -y postgresql-17-datasketches   # PG 17
apt install -y postgresql-16-datasketches   # PG 16
apt install -y postgresql-15-datasketches   # PG 15
apt install -y postgresql-14-datasketches   # PG 14

Create Extension:

CREATE EXTENSION datasketches;

Usage

Sources: README, Apache DataSketches site PostgreSQL extension for approximate analytics sketches and aggregates.

CREATE EXTENSION datasketches;

The extension supports CPC, HLL, Theta, Array Of Doubles, KLL, Quantiles, and Frequent Strings sketches.

Sketch Families

  • CPC for compact distinct counting.
  • HLL for HyperLogLog-style distinct counting.
  • Theta for distinct counting with set operations such as union, intersection, and A-not-B.
  • Array Of Doubles for tuple sketches with arrays of double values per key.
  • KLL for quantiles, ranks, PMF, and CDF estimation.
  • Quantiles sketch for long-term support of distribution estimates.
  • Frequent strings for tracking the heaviest items by count or weight.

Examples

SELECT cpc_sketch_to_string(cpc_sketch_build(1));
SELECT cpc_sketch_distinct(id) FROM random_ints_100m;
SELECT cpc_sketch_get_estimate(cpc_sketch_union(sketch)) FROM cpc_sketch_test;
SELECT theta_sketch_get_estimate(theta_sketch_union(sketch)) FROM theta_sketch_test;
SELECT theta_sketch_get_estimate(theta_sketch_intersection(sketch1, sketch2)) FROM theta_set_op_test;
SELECT hll_sketch_get_estimate(hll_sketch_union(sketch)) FROM hll_sketch_test;
SELECT hll_sketch_get_estimate(hll_sketch_union(hll_sketch_build(1), hll_sketch_build(2)));
SELECT kll_float_sketch_get_quantile(kll_float_sketch_merge(sketch), 0.5) FROM kll_float_sketch_test;
SELECT frequent_strings_sketch_result_no_false_negatives(frequent_strings_sketch_build(9, value), 1000000) FROM zipf_1p1_8k_100m;

Core Operations

  • Build sketches with *_sketch_build(...).
  • Merge or aggregate them with *_sketch_union(...), *_sketch_merge(...), and sketch-specific set-operation helpers.
  • Read estimates with *_sketch_get_estimate(...) and distribution helpers such as kll_float_sketch_get_quantile(...).

Notes

  • The README says the extension targets PostgreSQL 9.6 and higher and depends on Boost 1.75 and DataSketches C++ core 5.0.0 or later.
  • The upstream examples emphasize additive analytics in data cubes, not exact replacement for normal aggregates.

Last Modified 2026-04-14: update extension catalog (29617e5)