zhparser

a parser for full-text search of Chinese

Overview

Package	Version	Category	License	Language
`zhparser`	`2.3`	FTS	PostgreSQL	C

ID	Extension	Bin	Lib	Load	Create	Trust	Reloc	Schema
2130	`zhparser`	No	Yes	No	Yes	No	Yes	-

Related	`pg_trgm` `rum` `pg_search` `pgroonga` `pgroonga_database` `pg_bigm` `pg_tokenizer` `vchord_bm25`

Version

Type	Repo	Version	PG Ver	Package	Deps
EXT	PIGSTY	`2.3`	18, 17, 16, 15, 14	`zhparser`	-
RPM	PIGSTY	`2.3`	18, 17, 16, 15, 14	`zhparser_$v`	-
DEB	PIGSTY	`2.3`	18, 17, 16, 15, 14	`postgresql-$v-zhparser`	-

OS / PG	PG18	PG17	PG16	PG15	PG14
el8.x86_64	zhparser_18-2.3-1PIGSTY.el8.x86_64.rpm (4.7MiB)	zhparser_17-2.3-1PIGSTY.el8.x86_64.rpm (4.7MiB)	zhparser_16-2.3-1PIGSTY.el8.x86_64.rpm (4.7MiB)	zhparser_15-2.3-1PIGSTY.el8.x86_64.rpm (4.7MiB)	zhparser_14-2.3-1PIGSTY.el8.x86_64.rpm (4.7MiB)
el8.aarch64	zhparser_18-2.3-1PIGSTY.el8.aarch64.rpm (4.7MiB)	zhparser_17-2.3-1PIGSTY.el8.aarch64.rpm (4.7MiB)	zhparser_16-2.3-1PIGSTY.el8.aarch64.rpm (4.7MiB)	zhparser_15-2.3-1PIGSTY.el8.aarch64.rpm (4.7MiB)	zhparser_14-2.3-1PIGSTY.el8.aarch64.rpm (4.7MiB)
el9.x86_64	zhparser_18-2.3-1PIGSTY.el9.x86_64.rpm (4.3MiB)	zhparser_17-2.3-1PIGSTY.el9.x86_64.rpm (4.3MiB)	zhparser_16-2.3-1PIGSTY.el9.x86_64.rpm (4.3MiB)	zhparser_15-2.3-1PIGSTY.el9.x86_64.rpm (4.3MiB)	zhparser_14-2.3-1PIGSTY.el9.x86_64.rpm (4.3MiB)
el9.aarch64	zhparser_18-2.3-1PIGSTY.el9.aarch64.rpm (4.3MiB)	zhparser_17-2.3-1PIGSTY.el9.aarch64.rpm (4.3MiB)	zhparser_16-2.3-1PIGSTY.el9.aarch64.rpm (4.3MiB)	zhparser_15-2.3-1PIGSTY.el9.aarch64.rpm (4.3MiB)	zhparser_14-2.3-1PIGSTY.el9.aarch64.rpm (4.3MiB)
el10.x86_64	zhparser_18-2.3-1PIGSTY.el10.x86_64.rpm (4.3MiB)	zhparser_17-2.3-1PIGSTY.el10.x86_64.rpm (4.3MiB)	zhparser_16-2.3-1PIGSTY.el10.x86_64.rpm (4.3MiB)	zhparser_15-2.3-1PIGSTY.el10.x86_64.rpm (4.3MiB)	zhparser_14-2.3-1PIGSTY.el10.x86_64.rpm (4.3MiB)
el10.aarch64	zhparser_18-2.3-1PIGSTY.el10.aarch64.rpm (4.3MiB)	zhparser_17-2.3-1PIGSTY.el10.aarch64.rpm (4.3MiB)	zhparser_16-2.3-1PIGSTY.el10.aarch64.rpm (4.3MiB)	zhparser_15-2.3-1PIGSTY.el10.aarch64.rpm (4.3MiB)	zhparser_14-2.3-1PIGSTY.el10.aarch64.rpm (4.3MiB)
d12.x86_64	postgresql-18-zhparser_2.3-1PIGSTY~bookworm_amd64.deb (4.0MiB)	postgresql-17-zhparser_2.3-1PIGSTY~bookworm_amd64.deb (4.0MiB)	postgresql-16-zhparser_2.3-1PIGSTY~bookworm_amd64.deb (4.0MiB)	postgresql-15-zhparser_2.3-1PIGSTY~bookworm_amd64.deb (4.0MiB)	postgresql-14-zhparser_2.3-1PIGSTY~bookworm_amd64.deb (4.0MiB)
d12.aarch64	postgresql-18-zhparser_2.3-1PIGSTY~bookworm_arm64.deb (4.0MiB)	postgresql-17-zhparser_2.3-1PIGSTY~bookworm_arm64.deb (4.0MiB)	postgresql-16-zhparser_2.3-1PIGSTY~bookworm_arm64.deb (4.0MiB)	postgresql-15-zhparser_2.3-1PIGSTY~bookworm_arm64.deb (4.0MiB)	postgresql-14-zhparser_2.3-1PIGSTY~bookworm_arm64.deb (4.0MiB)
d13.x86_64	postgresql-18-zhparser_2.3-1PIGSTY~trixie_amd64.deb (4.0MiB)	postgresql-17-zhparser_2.3-1PIGSTY~trixie_amd64.deb (4.0MiB)	postgresql-16-zhparser_2.3-1PIGSTY~trixie_amd64.deb (4.0MiB)	postgresql-15-zhparser_2.3-1PIGSTY~trixie_amd64.deb (4.0MiB)	postgresql-14-zhparser_2.3-1PIGSTY~trixie_amd64.deb (4.0MiB)
d13.aarch64	postgresql-18-zhparser_2.3-1PIGSTY~trixie_arm64.deb (4.0MiB)	postgresql-17-zhparser_2.3-1PIGSTY~trixie_arm64.deb (4.0MiB)	postgresql-16-zhparser_2.3-1PIGSTY~trixie_arm64.deb (4.0MiB)	postgresql-15-zhparser_2.3-1PIGSTY~trixie_arm64.deb (4.0MiB)	postgresql-14-zhparser_2.3-1PIGSTY~trixie_arm64.deb (4.0MiB)
u22.x86_64	postgresql-18-zhparser_2.3-1PIGSTY~jammy_amd64.deb (4.3MiB)	postgresql-17-zhparser_2.3-1PIGSTY~jammy_amd64.deb (4.3MiB)	postgresql-16-zhparser_2.3-1PIGSTY~jammy_amd64.deb (4.3MiB)	postgresql-15-zhparser_2.3-1PIGSTY~jammy_amd64.deb (4.3MiB)	postgresql-14-zhparser_2.3-1PIGSTY~jammy_amd64.deb (4.3MiB)
u22.aarch64	postgresql-18-zhparser_2.3-1PIGSTY~jammy_arm64.deb (4.3MiB)	postgresql-17-zhparser_2.3-1PIGSTY~jammy_arm64.deb (4.3MiB)	postgresql-16-zhparser_2.3-1PIGSTY~jammy_arm64.deb (4.3MiB)	postgresql-15-zhparser_2.3-1PIGSTY~jammy_arm64.deb (4.3MiB)	postgresql-14-zhparser_2.3-1PIGSTY~jammy_arm64.deb (4.3MiB)
u24.x86_64	postgresql-18-zhparser_2.3-1PIGSTY~noble_amd64.deb (4.3MiB)	postgresql-17-zhparser_2.3-1PIGSTY~noble_amd64.deb (4.3MiB)	postgresql-16-zhparser_2.3-1PIGSTY~noble_amd64.deb (4.3MiB)	postgresql-15-zhparser_2.3-1PIGSTY~noble_amd64.deb (4.3MiB)	postgresql-14-zhparser_2.3-1PIGSTY~noble_amd64.deb (4.3MiB)
u24.aarch64	postgresql-18-zhparser_2.3-1PIGSTY~noble_arm64.deb (4.3MiB)	postgresql-17-zhparser_2.3-1PIGSTY~noble_arm64.deb (4.3MiB)	postgresql-16-zhparser_2.3-1PIGSTY~noble_arm64.deb (4.3MiB)	postgresql-15-zhparser_2.3-1PIGSTY~noble_arm64.deb (4.3MiB)	postgresql-14-zhparser_2.3-1PIGSTY~noble_arm64.deb (4.3MiB)

Build

You can build the RPM / DEB packages for zhparser using pig build:

pig build pkg zhparser         # build RPM / DEB packages

Install

You can install zhparser directly. First, make sure the PGDG and PIGSTY repositories are added and enabled:

pig repo add pgsql -u          # Add repo and update cache

Install the extension using pig or apt/yum/dnf:

pig install zhparser           # Install for current active PG version

pig ext install -y zhparser -v 18  # PG 18
pig ext install -y zhparser -v 17  # PG 17
pig ext install -y zhparser -v 16  # PG 16
pig ext install -y zhparser -v 15  # PG 15
pig ext install -y zhparser -v 14  # PG 14

dnf install -y zhparser_18       # PG 18
dnf install -y zhparser_17       # PG 17
dnf install -y zhparser_16       # PG 16
dnf install -y zhparser_15       # PG 15
dnf install -y zhparser_14       # PG 14

apt install -y postgresql-18-zhparser   # PG 18
apt install -y postgresql-17-zhparser   # PG 17
apt install -y postgresql-16-zhparser   # PG 16
apt install -y postgresql-15-zhparser   # PG 15
apt install -y postgresql-14-zhparser   # PG 14

Create Extension:

CREATE EXTENSION zhparser;
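
To confirm that the extension was created in the current database, the standard system catalogs can be queried. This is a minimal sketch that uses only core PostgreSQL catalogs (`pg_extension`, `pg_ts_parser`); it assumes the parser is registered under the name `zhparser`, as the Quick Start's `PARSER = zhparser` suggests.

```sql
-- Installed extension and its version in the current database
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'zhparser';

-- The text search parser registered by the extension
SELECT prsname
FROM pg_ts_parser
WHERE prsname = 'zhparser';
```

Each query should return one row once CREATE EXTENSION has succeeded; an empty result usually means the extension was created in a different database.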

Usage

GitHub: amutu/zhparser

zhparser is a PostgreSQL extension for full-text search of Chinese, based on the Simple Chinese Word Segmentation (SCWS) library.

Features

Chinese text segmentation for PostgreSQL full-text search
Built on the SCWS (Simple Chinese Word Segmentation) library
Supports custom dictionaries (TXT and XDB formats)
Database-level custom word tables (since v2.1)
Multiple tunable parameters for segmentation behavior

Quick Start

-- Create the extension
CREATE EXTENSION zhparser;

-- Create a text search configuration using zhparser
CREATE TEXT SEARCH CONFIGURATION chinese (PARSER = zhparser);

-- Add token type mappings
ALTER TEXT SEARCH CONFIGURATION chinese ADD MAPPING FOR n,v,a,i,e,l WITH simple;

-- Test Chinese text segmentation
SELECT to_tsvector('chinese', '小明硕士毕业于中国科学院计算所，后在日本京都大学深造');

-- Create a table and index for Chinese full text search
CREATE TABLE articles (id serial PRIMARY KEY, title text, body text);

CREATE INDEX articles_body_idx ON articles
  USING gin (to_tsvector('chinese', body));

-- Query with Chinese full text search
SELECT * FROM articles
  WHERE to_tsvector('chinese', body) @@ to_tsquery('chinese', '中国');
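
As an alternative to the expression index above, the tsvector can be stored in a generated column so that it is computed once at write time. The following is a sketch only, assuming the `chinese` configuration and the `articles` table from the Quick Start; the column name `body_tsv` and the index name are made up for illustration.

```sql
-- Store the parsed document alongside the row (body_tsv is a hypothetical name)
ALTER TABLE articles
  ADD COLUMN body_tsv tsvector
  GENERATED ALWAYS AS (to_tsvector('chinese', coalesce(body, ''))) STORED;

CREATE INDEX articles_body_tsv_idx ON articles USING gin (body_tsv);

-- Search the stored column and order matches by relevance
SELECT id, title, ts_rank(body_tsv, q) AS rank
FROM articles, to_tsquery('chinese', '中国') AS q
WHERE body_tsv @@ q
ORDER BY rank DESC
LIMIT 10;
```

The stored column trades extra disk space for not re-parsing `body` when rows are rechecked or ranked. Note that rows are parsed with the settings in effect when they are written, so changing segmentation parameters later may call for rewriting the column.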

Configuration Parameters

zhparser provides several GUC parameters to control segmentation behavior:

Parameter	Default	Description
`zhparser.punctuation_ignore`	`off`	Ignore all punctuation and other special symbols
`zhparser.seg_with_duality`	`off`	Group stray single characters into two-character (bigram) tokens
`zhparser.dict_in_memory`	`off`	Load the whole dictionary into memory
`zhparser.multi_short`	`off`	Compound segmentation: also emit the short words inside a long word
`zhparser.multi_duality`	`off`	Compound segmentation: also emit bigrams of stray characters
`zhparser.multi_zmain`	`off`	Compound segmentation: also emit important single characters
`zhparser.multi_zall`	`off`	Compound segmentation: also emit all single characters
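
The effect of a segmentation parameter is easiest to see by toggling it within a session and comparing the output. The sketch below assumes the `chinese` configuration from the Quick Start and that `zhparser.multi_short` can be changed per session; `mydb` is a placeholder database name, not something defined on this page.

```sql
-- Baseline segmentation with default settings
SELECT to_tsvector('chinese', '中国科学院计算所');

-- Enable short-word compound segmentation for this session only
SET zhparser.multi_short = on;
SELECT to_tsvector('chinese', '中国科学院计算所');
RESET zhparser.multi_short;

-- Persist the setting for new connections to one database
-- (mydb is a placeholder; may require elevated privileges)
ALTER DATABASE mydb SET zhparser.multi_short = on;
```

Whichever values are chosen, the same settings should be in effect when documents are indexed and when queries are parsed, otherwise the tokens may not match.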

Token Types

zhparser supports the following token types from SCWS:

Code	Description
`a`	Adjective
`b`	Differentiation (区别词)
`c`	Conjunction
`d`	Adverb
`e`	Exclamation
`f`	Position word (方位词)
`g`	Root word (词根)
`h`	Prefix
`i`	Idiom
`j`	Abbreviation
`k`	Suffix
`l`	Temporary idiom
`m`	Numeral
`n`	Noun
`o`	Onomatopoeia
`p`	Preposition
`q`	Classifier
`r`	Pronoun
`s`	Space word (处所词)
`t`	Time word
`u`	Auxiliary
`v`	Verb
`w`	Punctuation
`x`	Unknown
`y`	Modal particle
`z`	Status word (状态词)
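
The token types can also be listed from inside the database, and individual tokens can be inspected, with PostgreSQL's built-in text search debugging functions (`ts_token_type`, `ts_parse`, `ts_debug`). This sketch assumes the `chinese` configuration from the Quick Start exists for the last query.

```sql
-- List the token types exposed by the parser
SELECT tokid, alias, description
FROM ts_token_type('zhparser');

-- Show how a sentence is split and which type each token gets
SELECT t.alias, p.token
FROM ts_parse('zhparser', '小明硕士毕业于中国科学院计算所') AS p
JOIN ts_token_type('zhparser') AS t ON t.tokid = p.tokid;

-- Show which tokens survive the configuration's mappings
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('chinese', '小明硕士毕业于中国科学院计算所');
```

Tokens whose type has no mapping in the configuration appear in `ts_debug` with empty `dictionaries` and are dropped from the resulting tsvector, which helps when deciding which codes to list in ADD MAPPING.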

Custom Dictionaries

File-based Dictionaries

Place custom dictionary files in the PostgreSQL share directory (typically $SHAREDIR/tsearch_data/). Two formats are supported:

TXT format: one word per line
XDB format: compiled SCWS dictionary format

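Custom dictionaries take precedence over the built-in dictionary. The files to load are listed in the zhparser.extra_dicts parameter, which is read when a backend starts, so open a new connection after changing it.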

Database-level Custom Words (v2.1+)

-- Add custom words via zhparser's built-in table
INSERT INTO zhparser.zhprs_custom_word VALUES ('中国科学院计算所');

-- Sync custom words into the dictionary; open a new connection afterwards for the change to take effect
SELECT sync_zhprs_custom_word();

-- Verify segmentation with custom word
SELECT to_tsvector('chinese', '小明硕士毕业于中国科学院计算所');
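
Removing a custom word follows the same edit, sync, reconnect cycle. The sketch below first inspects the table's columns through `information_schema`, because the column name `word` used in the DELETE is an assumption rather than something stated on this page.

```sql
-- Check the actual column names and defaults before relying on them
SELECT column_name, data_type, column_default
FROM information_schema.columns
WHERE table_schema = 'zhparser'
  AND table_name = 'zhprs_custom_word'
ORDER BY ordinal_position;

-- Assumption: the word is stored in a column named "word"
DELETE FROM zhparser.zhprs_custom_word
WHERE word = '中国科学院计算所';

-- Sync again, then reconnect before re-testing segmentation
SELECT sync_zhprs_custom_word();
```

Existing tsvector values and indexes are not rewritten when the word list changes, so rows indexed under the old segmentation may need to be reindexed or updated.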

Docker Quick Start

docker run --name pgzhparser -d \
  -e POSTGRES_PASSWORD=somepassword \
  zhparser/zhparser:bookworm-16

Last Modified 2026-03-12: add pg extension catalog (95749bf)