pg_tiktoken

tiktoken tokenizer for use with OpenAI models in PostgreSQL

Overview

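| Package | Version | Category | License | Language |
|---------|---------|----------|---------|----------|
| pg_tiktoken | 0.0.1 | RAG | Apache-2.0 | Rust |

| ID | Extension | Bin | Lib | Load | Create | Trust | Reloc | Schema |
|----|-----------|-----|-----|------|--------|-------|-------|--------|
| 1870 | pg_tiktoken | No | Yes | No | Yes | No | No | - |

Related: vectorize, pg_summarize, pg4ml, pgml, vector, vchord, vectorscale, pg_graphql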

Version

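| Type | Repo | Version | PG Ver | Package | Deps |
|------|------|---------|--------|---------|------|
| EXT | PIGSTY | 0.0.1 | 18, 17, 16, 15, 14 | pg_tiktoken | - |
| RPM | PIGSTY | 0.0.1 | 18, 17, 16, 15, 14 | pg_tiktoken_$v | - |
| DEB | PIGSTY | 0.0.1 | 18, 17, 16, 15, 14 | postgresql-$v-pg-tiktoken | - |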

| OS / PG | PG18 | PG17 | PG16 | PG15 | PG14 |
|---------|------|------|------|------|------|
| el8.x86_64 | | | | | |
| el8.aarch64 | | | | | |
| el9.x86_64 | | | | | |
| el9.aarch64 | | | | | |
| el10.x86_64 | | | | | |
| el10.aarch64 | | | | | |
| d12.x86_64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| d12.aarch64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| d13.x86_64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| d13.aarch64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| u22.x86_64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| u22.aarch64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| u24.x86_64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| u24.aarch64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |

Build

You can build the RPM / DEB packages for pg_tiktoken using pig build:

pig build pkg pg_tiktoken         # build RPM / DEB packages

Install

You can install pg_tiktoken directly. First, make sure the PGDG and PIGSTY repositories are added and enabled:

pig repo add pgsql -u          # Add repo and update cache

Install the extension using pig or apt/yum/dnf:

pig install pg_tiktoken           # Install for current active PG version
pig ext install -y pg_tiktoken -v 18  # PG 18
pig ext install -y pg_tiktoken -v 17  # PG 17
pig ext install -y pg_tiktoken -v 16  # PG 16
pig ext install -y pg_tiktoken -v 15  # PG 15
pig ext install -y pg_tiktoken -v 14  # PG 14
dnf install -y pg_tiktoken_18       # PG 18
dnf install -y pg_tiktoken_17       # PG 17
dnf install -y pg_tiktoken_16       # PG 16
dnf install -y pg_tiktoken_15       # PG 15
dnf install -y pg_tiktoken_14       # PG 14
apt install -y postgresql-18-pg-tiktoken   # PG 18
apt install -y postgresql-17-pg-tiktoken   # PG 17
apt install -y postgresql-16-pg-tiktoken   # PG 16
apt install -y postgresql-15-pg-tiktoken   # PG 15
apt install -y postgresql-14-pg-tiktoken   # PG 14

Create the extension in the target database:

CREATE EXTENSION pg_tiktoken;

Usage

pg_tiktoken: tiktoken tokenizer for use with OpenAI models in PostgreSQL. Source: README.md

pg_tiktoken is a PostgreSQL extension that provides input tokenization using OpenAI’s tiktoken library. It allows you to count and encode tokens directly in SQL, which is useful for managing input length limits when working with OpenAI models.


Functions

tiktoken_count

Count the number of tokens for a given encoding or model:

SELECT tiktoken_count('p50k_edit', 'A long time ago in a galaxy far, far away');
 tiktoken_count
----------------
             11
(1 row)
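
As a rough illustration of the length-limit use case, the query below lists rows whose text would exceed a token budget. The `documents` table, its `id` and `content` columns, and the 8000-token budget are hypothetical placeholders and not part of pg_tiktoken; substitute the limit documented for the model you use.

```sql
-- Hypothetical table: documents(id, content). Not part of pg_tiktoken.
-- 8000 is a placeholder budget; use the limit documented for your model.
SELECT id,
       tiktoken_count('cl100k_base', content) AS token_count
FROM documents
WHERE tiktoken_count('cl100k_base', content) > 8000
ORDER BY token_count DESC;
```

Rows returned by this sketch are candidates for truncation or chunking before being sent to the model.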

tiktoken_encode

Get the token IDs for a given encoding or model:

SELECT tiktoken_encode('cl100k_base', 'A long time ago in a galaxy far, far away');
                  tiktoken_encode
----------------------------------------------------
 {32,1317,892,4227,304,264,34261,3117,11,3117,3201}
(1 row)

Both tiktoken_count and tiktoken_encode accept either an encoding name or an OpenAI model name as the first argument.
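
A minimal sketch of passing a model name instead of an encoding name. It assumes that `text-embedding-ada-002` resolves to `cl100k_base`, as the Supported Models table indicates; the column aliases are illustrative only.

```sql
-- Assumption: 'text-embedding-ada-002' maps to the cl100k_base encoding.
SELECT tiktoken_count('text-embedding-ada-002',
                      'A long time ago in a galaxy far, far away') AS by_model,
       tiktoken_count('cl100k_base',
                      'A long time ago in a galaxy far, far away') AS by_encoding,
       -- The length of the encoded array should match the count.
       cardinality(tiktoken_encode('cl100k_base',
                      'A long time ago in a galaxy far, far away')) AS encoded_length;
```

If the model name maps to the same encoding, all three columns should agree.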


Supported Models

| Encoding name | OpenAI models |
|---------------|---------------|
| cl100k_base | ChatGPT models, text-embedding-ada-002 |
| p50k_base | Code models, text-davinci-002, text-davinci-003 |
| p50k_edit | Edit models like text-davinci-edit-001, code-davinci-edit-001 |
| r50k_base (or gpt2) | GPT-3 models like davinci |

Last Modified 2026-03-12: add pg extension catalog (95749bf)