ParadeDB: An Elasticsearch Alternative in the PG Ecosystem
ParadeDB: A New Player in the PostgreSQL Ecosystem
ParadeDB is an exciting new project backed by Y Combinator (S23 batch). Their slogan? “Postgres for Search & Analytics — Modern Elasticsearch Alternative built on Postgres.” In essence, it’s PostgreSQL optimized for search and analytics, aiming to be a drop-in replacement for Elasticsearch.
The PostgreSQL ecosystem continues to flourish with innovative extensions and derivatives. We’ve already seen FerretDB as an open-source MongoDB alternative, Babelfish for SQL Server, Supabase for Firebase, and NocoDB for AirTable. Now, we can add ParadeDB to the list as an open-source Elasticsearch alternative.
ParadeDB consists of three PostgreSQL extensions: pg_bm25, pg_analytics, and pg_sparse. Each extension can be used independently. I’ve packaged these extensions (v0.5.6) and will include them by default in the next Pigsty release, making them available out of the box for users.
I’ve translated ParadeDB’s official website introduction and four blog posts to introduce this rising star in the PostgreSQL ecosystem. Today’s post is the first one — an overview.
ParadeDB
We’re thrilled to introduce ParadeDB: a PostgreSQL database optimized for search scenarios. ParadeDB is the first PostgreSQL build designed to be an Elasticsearch alternative, offering lightning-fast full-text search, semantic search, and hybrid search capabilities on PostgreSQL tables.
What Problem Does ParadeDB Solve?
For many organizations, search remains an unsolved problem — despite giants like Elasticsearch in the market. Most developers who’ve worked with Elasticsearch know the pain of running, tuning, and managing it. While other search engine services exist, integrating external services with existing databases introduces complex challenges and costs associated with rebuilding indexes and data replication.
Developers seeking a unified source of truth and search capabilities have turned to PostgreSQL. While PG offers basic full-text search through tsvector and semantic search through pgvector, these tools fall short when dealing with large tables or complex queries:
- Sorting and keyword searches on large tables are painfully slow
- No BM25 scoring support
- No hybrid search capabilities combining vector and full-text search
- No real-time search — data must be manually reindexed or re-embedded
- Limited support for complex queries like faceting or relevance tuning
We’ve seen many engineering teams reluctantly layer Elasticsearch on top of PostgreSQL, only to abandon it due to its bloated nature, high costs, or complexity. We wondered: what if PostgreSQL had Elasticsearch-level search capabilities built-in? This would eliminate the dilemma of choosing between using PostgreSQL with limited search capabilities or maintaining separate services for source of truth and search.
Who Is ParadeDB For?
While Elasticsearch serves a wide range of use cases, we’re not trying to cover everything — at least not yet. We’re focusing on core scenarios, specifically serving users who want to perform search within PostgreSQL. ParadeDB is ideal for you if:
- You want to use PostgreSQL as your single source of truth and hate data replication between multiple services
- You need to perform full-text search on massive documents stored in PostgreSQL without compromising performance and scalability
- You want to combine ANN/similarity search with full-text search for more precise semantic matching
ParadeDB Product Overview
ParadeDB is a fully managed Postgres database with indexing and search capabilities for PostgreSQL tables that you won’t find in any other PostgreSQL provider:
Feature | Description |
---|---|
BM25 Full-Text Search | Full-text search supporting boolean, fuzzy, boosting, and keyword queries. Search results are scored using the BM25 algorithm. |
Faceted Search | PostgreSQL columns can be defined as facets for easy bucketing and metric collection. |
Hybrid Search | Search results can be scored considering both semantic relevance (vector search) and text relevance (BM25). |
Distributed Search | Tables can be sharded for parallel query acceleration. |
Generative Search | PostgreSQL columns can be fed into large language models (LLMs) for automatic summarization, classification, or text generation. |
Real-time Search | Text indexes and vector columns automatically stay in sync with underlying data. |
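To make the Hybrid Search row more concrete, here is a minimal sketch of one common way to blend text and vector relevance: rescale each score list to [0, 1] with min-max normalization, then take a weighted sum. This is an illustration under stated assumptions, not ParadeDB's actual implementation. The function names, the `bm25_weight` and `similarity_weight` parameters, and the sample scores are all hypothetical.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(bm25, similarity, bm25_weight=0.5, similarity_weight=0.5):
    """Return document indexes ordered by a weighted sum of normalized scores.

    Hypothetical helper: the weights and the normalization scheme are
    assumptions made for illustration only.
    """
    text = min_max(bm25)
    semantic = min_max(similarity)
    combined = [
        bm25_weight * t + similarity_weight * s
        for t, s in zip(text, semantic)
    ]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)


# Made-up scores for three documents: text relevance and vector similarity.
bm25 = [1.45, 0.47, 0.0]
similarity = [0.20, 0.90, 0.85]

ranking = hybrid_rank(bm25, similarity)
text_only = hybrid_rank(bm25, similarity, bm25_weight=1.0, similarity_weight=0.0)
```

Normalizing first matters because BM25 scores are unbounded while similarity scores usually live in a fixed range. Without rescaling, one signal would dominate the sum regardless of the weights.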
Unlike managed services like AWS RDS, ParadeDB is delivered as PostgreSQL extensions, requires no setup, integrates with the entire PG ecosystem, and is fully customizable. ParadeDB is open-source (AGPLv3) and provides a simple Docker Compose template for developers who need to self-host or customize.
How ParadeDB Is Built
At its core, ParadeDB is a standard Postgres database with custom extensions written in Rust that introduce enhanced search capabilities.
ParadeDB’s search engine is built on top of Tantivy, an open-source Rust search library inspired by Apache Lucene. Its indexes are stored natively in PostgreSQL as PG indexes, eliminating the need for cumbersome data replication/ETL work while maintaining transaction ACID guarantees.
ParadeDB introduces a new extension to the Postgres ecosystem: pg_bm25. This extension implements Rust-based full-text search in PostgreSQL using the BM25 scoring algorithm. ParadeDB comes pre-installed with this extension.
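For readers unfamiliar with BM25, the sketch below shows the textbook Okapi BM25 formula in plain Python. It is not how pg_bm25 or Tantivy computes scores internally. The whitespace tokenization, the `k1` and `b` defaults, the non-negative IDF variant, and the sample documents are assumptions chosen for illustration. It also assumes a non-empty collection of non-empty documents.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF variant with "1 +" inside the log, so it is never negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus, tokenized by whitespace (an assumption for brevity).
docs = [
    "postgres full text search".split(),
    "elasticsearch search cluster tuning".split(),
    "vector similarity with pgvector".split(),
]
scores = bm25_scores("postgres search".split(), docs)
```

Here the first document matches both query terms, the second matches only the more common term, and the third matches neither, so the scores fall in that order. Rare terms contribute more than common ones, which is the main thing plain term-frequency ranking lacks.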
What’s Next?
ParadeDB’s managed cloud version is currently in private beta. We aim to launch a self-service cloud platform in early 2024. If you’d like access to the private beta during this period, join our waitlist.
Our core team is focused on developing the open-source version of ParadeDB, which will be released in Winter 2023.
We’re building in public and are excited to share ParadeDB with the community. Stay tuned for future blog posts where we’ll dive deeper into the fascinating technical challenges behind ParadeDB.