pg_tiktoken
Overview
| Package | Version | Category | License | Language |
|---|---|---|---|---|
| pg_tiktoken | 0.0.1 | RAG | Apache-2.0 | Rust |
| ID | Extension | Bin | Lib | Load | Create | Trust | Reloc | Schema |
|---|---|---|---|---|---|---|---|---|
| 1870 | pg_tiktoken | No | Yes | No | Yes | No | No | - |
| Related | vectorize pg_summarize pg4ml pgml vector vchord vectorscale pg_graphql |
|---|---|
Version
| Type | Repo | Version | PG Ver | Package | Deps |
|---|---|---|---|---|---|
| EXT | PIGSTY | 0.0.1 | 18, 17, 16, 15, 14 | pg_tiktoken | - |
| RPM | PIGSTY | 0.0.1 | 18, 17, 16, 15, 14 | pg_tiktoken_$v | - |
| DEB | PIGSTY | 0.0.1 | 18, 17, 16, 15, 14 | postgresql-$v-pg-tiktoken | - |
Build
You can build the RPM / DEB packages for pg_tiktoken using pig build:
pig build pkg pg_tiktoken # build RPM / DEB packages
Install
You can install pg_tiktoken directly. First, make sure the PGDG and PIGSTY repositories are added and enabled:
pig repo add pgsql -u # Add repo and update cache
Install the extension using pig or apt/yum/dnf:
pig install pg_tiktoken; # Install for current active PG version
pig ext install -y pg_tiktoken -v 18 # PG 18
pig ext install -y pg_tiktoken -v 17 # PG 17
pig ext install -y pg_tiktoken -v 16 # PG 16
pig ext install -y pg_tiktoken -v 15 # PG 15
pig ext install -y pg_tiktoken -v 14 # PG 14
dnf install -y pg_tiktoken_18 # PG 18
dnf install -y pg_tiktoken_17 # PG 17
dnf install -y pg_tiktoken_16 # PG 16
dnf install -y pg_tiktoken_15 # PG 15
dnf install -y pg_tiktoken_14 # PG 14
apt install -y postgresql-18-pg-tiktoken # PG 18
apt install -y postgresql-17-pg-tiktoken # PG 17
apt install -y postgresql-16-pg-tiktoken # PG 16
apt install -y postgresql-15-pg-tiktoken # PG 15
apt install -y postgresql-14-pg-tiktoken # PG 14
Create Extension:
CREATE EXTENSION pg_tiktoken;
Usage
pg_tiktoken: tiktoken tokenizer for use with OpenAI models in PostgreSQL. Source: README.md
pg_tiktoken is a PostgreSQL extension that provides input tokenization using OpenAI’s tiktoken library. It allows you to count and encode tokens directly in SQL, which is useful for managing input length limits when working with OpenAI models.
Functions
tiktoken_count
Count the tokens in a text string, using the given encoding or model:
SELECT tiktoken_count('p50k_edit', 'A long time ago in a galaxy far, far away');
tiktoken_count
----------------
11
(1 row)
tiktoken_encode
Return the token IDs for a text string, using the given encoding or model:
SELECT tiktoken_encode('cl100k_base', 'A long time ago in a galaxy far, far away');
tiktoken_encode
----------------------------------------------------
{32,1317,892,4227,304,264,34261,3117,11,3117,3201}
(1 row)
Both tiktoken_count and tiktoken_encode accept either an encoding name or an OpenAI model name as the first argument.
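Because the first argument can be either kind of name, the two forms can be compared side by side. The sketch below assumes that the model name `text-embedding-ada-002` resolves to the `cl100k_base` encoding, as the Supported Models table on this page indicates; the column aliases are illustrative only.

```sql
-- Same text, selected once by encoding name and once by model name.
-- Assumption: 'text-embedding-ada-002' maps to cl100k_base.
SELECT
    tiktoken_count('cl100k_base',
                   'A long time ago in a galaxy far, far away') AS by_encoding,
    tiktoken_count('text-embedding-ada-002',
                   'A long time ago in a galaxy far, far away') AS by_model;

-- The count should equal the length of the encoded token array.
SELECT
    tiktoken_count('cl100k_base',
                   'A long time ago in a galaxy far, far away')
    = array_length(
          tiktoken_encode('cl100k_base',
                          'A long time ago in a galaxy far, far away'),
          1) AS counts_match;
```

If the two counts differ, the model name is probably resolving to a different encoding than expected.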
Supported Models
| Encoding name | OpenAI models |
|---|---|
| cl100k_base | ChatGPT models, text-embedding-ada-002 |
| p50k_base | Code models, text-davinci-002, text-davinci-003 |
| p50k_edit | Edit models like text-davinci-edit-001, code-davinci-edit-001 |
| r50k_base (or gpt2) | GPT-3 models like davinci |
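A common use of these encodings is checking stored text against a model's input limit before sending it to an API. The following is a minimal sketch, not taken from the README: the `documents` table, its columns, and the 8191-token limit are all assumptions made for illustration. Substitute the encoding and limit documented for the model you actually use.

```sql
-- Hypothetical table for illustration; not part of pg_tiktoken.
CREATE TEMP TABLE documents (
    id   bigint PRIMARY KEY,
    body text NOT NULL
);

INSERT INTO documents (id, body) VALUES
    (1, 'A long time ago in a galaxy far, far away'),
    (2, 'Hello, world');

-- Token count per row, flagging rows over an assumed limit of 8191 tokens.
SELECT id,
       tiktoken_count('cl100k_base', body) AS tokens,
       tiktoken_count('cl100k_base', body) > 8191 AS over_limit
FROM documents
ORDER BY tokens DESC;
```

Rows flagged `over_limit` would need to be truncated or split into chunks before being submitted.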