zhparser

A parser for Chinese full-text search

Overview

Package   Version  Category  License     Language
zhparser  2.3      FTS       PostgreSQL  C

ID    Extension  Bin  Lib  Load  Create  Trust  Reloc  Schema
2130  zhparser   No   Yes  No    Yes     No     Yes    -

Related: pg_trgm rum pg_search pgroonga pgroonga_database pg_bigm pg_tokenizer vchord_bm25

Version

Type  Repo    Version  PG Ver              Package                 Deps
EXT   PIGSTY  2.3      18, 17, 16, 15, 14  zhparser                -
RPM   PIGSTY  2.3      18, 17, 16, 15, 14  zhparser_$v             -
DEB   PIGSTY  2.3      18, 17, 16, 15, 14  postgresql-$v-zhparser  -

OS / PG       PG18        PG17        PG16        PG15        PG14
el8.x86_64
el8.aarch64
el9.x86_64
el9.aarch64
el10.x86_64
el10.aarch64
d12.x86_64
d12.aarch64   PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3
d13.x86_64    PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3
d13.aarch64   PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3
u22.x86_64    PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3
u22.aarch64   PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3
u24.x86_64    PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3
u24.aarch64   PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3  PIGSTY 2.3

Build

You can build the RPM / DEB packages for zhparser using pig build:

pig build pkg zhparser         # build RPM / DEB packages

Install

You can install zhparser directly. First, make sure the PGDG and PIGSTY repositories are added and enabled:

pig repo add pgsql -u          # Add repo and update cache

Install the extension using pig or apt/yum/dnf:

pig install zhparser;          # Install for current active PG version
pig ext install -y zhparser -v 18  # PG 18
pig ext install -y zhparser -v 17  # PG 17
pig ext install -y zhparser -v 16  # PG 16
pig ext install -y zhparser -v 15  # PG 15
pig ext install -y zhparser -v 14  # PG 14
dnf install -y zhparser_18       # PG 18
dnf install -y zhparser_17       # PG 17
dnf install -y zhparser_16       # PG 16
dnf install -y zhparser_15       # PG 15
dnf install -y zhparser_14       # PG 14
apt install -y postgresql-18-zhparser   # PG 18
apt install -y postgresql-17-zhparser   # PG 17
apt install -y postgresql-16-zhparser   # PG 16
apt install -y postgresql-15-zhparser   # PG 15
apt install -y postgresql-14-zhparser   # PG 14
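
The native package names above follow the two patterns from the Version table: zhparser_$v for RPM and postgresql-$v-zhparser for DEB. As a minimal sketch for provisioning scripts, the helper below derives the package name from a package family and a PostgreSQL major version. The function name zhparser_pkg_name and the rpm/deb family labels are hypothetical; they are not part of pig or zhparser.

```shell
# Hypothetical helper: map a package family (rpm|deb) and a PG major version
# to the zhparser package name, using the patterns from the Version table.
zhparser_pkg_name() {
  family="$1"
  pgver="$2"
  case "$family" in
    rpm) printf 'zhparser_%s\n' "$pgver" ;;
    deb) printf 'postgresql-%s-zhparser\n' "$pgver" ;;
    *)   printf 'unknown package family: %s\n' "$family" >&2
         return 1 ;;
  esac
}

# Print the package names for PG 17 on both families.
zhparser_pkg_name rpm 17
zhparser_pkg_name deb 17
```

The result can be passed to the native package manager, for example dnf install -y "$(zhparser_pkg_name rpm 17)".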

Create Extension:

CREATE EXTENSION zhparser;

Usage

GitHub: amutu/zhparser

zhparser is a PostgreSQL extension for full-text search of Chinese, based on the Simple Chinese Word Segmentation (SCWS) library.

Features

  • Chinese text segmentation for PostgreSQL full-text search
  • Built on the SCWS (Simple Chinese Word Segmentation) library
  • Supports custom dictionaries (TXT and XDB formats)
  • Database-level custom word tables (since v2.1)
  • Multiple tunable parameters for segmentation behavior

Quick Start

-- Create the extension
CREATE EXTENSION zhparser;

-- Create a text search configuration using zhparser
CREATE TEXT SEARCH CONFIGURATION chinese (PARSER = zhparser);

-- Add token type mappings
ALTER TEXT SEARCH CONFIGURATION chinese ADD MAPPING FOR n,v,a,i,e,l WITH simple;

-- Test Chinese text segmentation
SELECT to_tsvector('chinese', '小明硕士毕业于中国科学院计算所,后在日本京都大学深造');

-- Create a table and index for Chinese full text search
CREATE TABLE articles (id serial PRIMARY KEY, title text, body text);

CREATE INDEX articles_body_idx ON articles
  USING gin (to_tsvector('chinese', body));

-- Query with Chinese full text search
SELECT * FROM articles
  WHERE to_tsvector('chinese', body) @@ to_tsquery('chinese', '中国');

Configuration Parameters

zhparser provides several GUC parameters to control segmentation behavior:

Parameter                    Default  Description
zhparser.punctuation_ignore  off      Ignore all punctuation
zhparser.seg_with_duality    off      Aggregate stray single characters using duality (two-character) segmentation
zhparser.dict_in_memory      off      Load the whole dictionary into memory
zhparser.multi_short         off      Compound segmentation of short words
zhparser.multi_duality       off      Duality compound segmentation of stray characters
zhparser.multi_zmain         off      Compound segmentation of important single characters
zhparser.multi_zall          off      Compound segmentation of all single characters
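
To compare segmentation results under different settings, it can help to generate a short session preamble. The sketch below prints SET statements for the chosen parameters followed by a test query. It assumes these switches can be changed per session with SET (check this against your zhparser version, in particular for zhparser.dict_in_memory) and that the chinese configuration from the Quick Start exists. The function name zhparser_set_sql is hypothetical.

```shell
# Hypothetical helper: print "SET zhparser.<name> = <value>;" for each
# parameter name given after the value.
zhparser_set_sql() {
  value="$1"
  shift
  for name in "$@"; do
    printf 'SET zhparser.%s = %s;\n' "$name" "$value"
  done
}

# Emit a session preamble followed by a test query against the
# 'chinese' configuration created in the Quick Start.
{
  zhparser_set_sql on punctuation_ignore multi_short
  printf "SELECT to_tsvector('chinese', '%s');\n" '小明硕士毕业于中国科学院计算所'
}
```

The output is plain SQL, so it can be reviewed first and then piped into psql against a database where the extension is installed.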

Token Types

zhparser supports the following token types from SCWS:

Code  Description
a     Adjective
b     Differentiation (区别词)
c     Conjunction
d     Adverb
e     Exclamation
f     Position word (方位词)
g     Root word (词根)
h     Prefix
i     Idiom
j     Abbreviation
k     Suffix
l     Temporary idiom
m     Numeral
n     Noun
o     Onomatopoeia
p     Preposition
q     Classifier
r     Pronoun
s     Space word (处所词)
t     Time word
u     Auxiliary
v     Verb
w     Punctuation
x     Unknown
y     Modal particle
z     Status word (状态词)
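
Only token types that are mapped to a dictionary end up in the tsvector; the Quick Start maps n, v, a, i, e, l to the simple dictionary. As a rough sketch for experimenting with other selections, the helper below builds the corresponding ADD MAPPING statement from a list of token codes. The function name zhparser_mapping_sql is hypothetical, and the helper always uses the simple dictionary, as the Quick Start does.

```shell
# Hypothetical helper: build an ADD MAPPING statement for a configuration
# name and a list of single-letter token codes from the table above.
zhparser_mapping_sql() {
  config="$1"
  shift
  codes=""
  for code in "$@"; do
    case "$code" in
      [a-z]) codes="${codes:+$codes,}$code" ;;   # join codes with commas
      *)     printf 'invalid token code: %s\n' "$code" >&2
             return 1 ;;
    esac
  done
  # Refuse to emit a statement with an empty code list.
  [ -n "$codes" ] || return 1
  printf 'ALTER TEXT SEARCH CONFIGURATION %s ADD MAPPING FOR %s WITH simple;\n' \
    "$config" "$codes"
}

# Reproduce the mapping used in the Quick Start.
zhparser_mapping_sql chinese n v a i e l
```

Adding codes such as m (numeral) or t (time word) to the list makes those tokens searchable as well, at the cost of a larger index.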

Custom Dictionaries

File-based Dictionaries

Place custom dictionary files in the share directory (typically $SHAREDIR/tsearch_data/):

  • TXT format: one word per line
  • XDB format: compiled SCWS dictionary format

Custom dictionaries take precedence over built-in dictionaries.

Database-level Custom Words (v2.1+)

-- Add custom words via zhparser's built-in table
INSERT INTO zhparser.zhprs_custom_word VALUES ('中国科学院计算所');

-- Reload custom dictionary (reconnect after sync to take effect)
SELECT sync_zhprs_custom_word();

-- Verify segmentation with custom word
SELECT to_tsvector('chinese', '小明硕士毕业于中国科学院计算所');
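
When many custom words need to be loaded, the INSERT statements can be generated from a word list. The sketch below reads one word per line from standard input, doubles single quotes for SQL, and appends the sync call shown above. The function name zhparser_custom_words_sql is hypothetical, and the single-column INSERT form is the one used in the example above.

```shell
# Hypothetical helper: read one word per line on stdin and print an INSERT
# for each word, followed by the sync call.
zhparser_custom_words_sql() {
  while IFS= read -r word || [ -n "$word" ]; do
    [ -n "$word" ] || continue                       # skip blank lines
    escaped=$(printf '%s' "$word" | sed "s/'/''/g")  # double single quotes
    printf "INSERT INTO zhparser.zhprs_custom_word VALUES ('%s');\n" "$escaped"
  done
  printf 'SELECT sync_zhprs_custom_word();\n'
}

# Generate SQL for two sample words.
printf '%s\n' '中国科学院计算所' '京都大学' | zhparser_custom_words_sql
```

The generated SQL can be piped into psql. As noted above, reconnect after the sync for the new words to take effect.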

Docker Quick Start

docker run --name pgzhparser -d \
  -e POSTGRES_PASSWORD=somepassword \
  zhparser/zhparser:bookworm-16

Last Modified 2026-03-12: add pg extension catalog (95749bf)