What Makes PostgreSQL So Awesome?

By Ruohang Feng（@Vonng） | WeChat Official Account | 2018-06-10

Tags:

PostgreSQL’s slogan is “The World’s Most Advanced Open Source Relational Database”, but I find this tagline lacks punch. It also feels like a direct jab at MySQL’s “The World’s Most Popular Open Source Relational Database” - a bit too much like riding on their coattails. If you ask me, the description that truly captures PG’s essence would be: The Full-Stack Database That Does It All - one tool to rule them all.

pggood

The Full-Stack Database

Mature applications typically rely on numerous data components and functions: caching, OLTP, OLAP/batch processing/data warehousing, stream processing/message queuing, search indexing, NoSQL/document databases, geographic databases, spatial databases, time-series databases, and graph databases. Traditional architecture selection usually combines multiple components - typically something like Redis + MySQL + Greenplum/Hadoop + Kafka/Flink + ElasticSearch. This combo can handle most requirements, but the real headache comes from integrating these heterogeneous systems: endless boilerplate code just shuffling data from Component A to Component B.

In this ecosystem, MySQL can only play the role of an OLTP relational database. PostgreSQL, however, can wear multiple hats and handle them all:

OLTP: Transaction processing is PostgreSQL’s bread and butter
OLAP: Citus distributed plugin, ANSI SQL compatibility, window functions, CTEs, CUBE and other advanced analytics features, UDFs in any language
Stream Processing: PipelineDB extension, Notify-Listen, materialized views, rules system, and flexible stored procedures and functions
Time-Series Data: TimescaleDB plugin, partitioned tables, BRIN indices
Spatial Data: PostGIS extension (the silver bullet), built-in geometric type support, GiST indexes
Search Indexing: Full-text search indexing sufficient for simple scenarios; rich index types, support for functional indices, conditional indices
NoSQL: Native support for JSON, JSONB, XML, HStore, and Foreign Data Wrappers to NoSQL databases
Data Warehousing: Smooth migration to GreenPlum, DeepGreen, HAWK, and others in the PG ecosystem, using FDW for ETL
Graph Data: Recursive queries
Caching: Materialized views

ext

With Extensions as the Six Instruments, to honor Heaven, Earth, and the Four Directions.

With Greenplum to honor Heaven,

With Postgres-XL to honor Earth,

With Citus to honor the East,

With TimescaleDB to honor the South,

With PipelineDB to honor the West,

With PostGIS to honor the North.

— “The Rites of PG”

In Tantan’s (a popular dating app) legacy architecture, the entire system was designed around PostgreSQL. With millions of daily active users, millions of global DB-TPS, and hundreds of terabytes of data, they used PostgreSQL as their only data component. Independent data warehouses, message queues, and caches were only introduced later. And this is just the validated scale - further squeezing PostgreSQL’s potential is entirely feasible.

So, within a considerable scale, PostgreSQL can play the role of a jack-of-all-trades, one component serving as many. While it may not match specialized components in certain domains, it still performs admirably in all of them. And choosing a single data component can dramatically reduce project complexity, which means massive cost savings. It turns what would require ten people into something one person can handle.

Designing for scale you don’t need is wasted effort - a form of premature optimization. Only when no single software can meet all your requirements does the trade-off between splitting and integration become relevant. Integrating heterogeneous technologies is incredibly tricky work. If there’s one technology that can satisfy all your needs, using it is the best choice rather than trying to reinvent it with multiple components.

When business scale grows to a certain threshold, you may have no choice but to use a microservice/bus-based architecture and split database functionality into multiple components. But PostgreSQL’s existence significantly pushes back the threshold for this trade-off, and even after splitting, it continues to play a crucial role.

Operations-Friendly

Beyond its powerful features, another significant advantage of Pg is that it’s operations-friendly. It offers many practical capabilities:

DDL can be placed in transactions: dropping tables, TRUNCATE, creating functions, and indices can all be placed in transactions for atomic effect or rollback.

This enables some clever maneuvers, like swapping two tables via RENAME in a single transaction (the database equivalent of a chess castling move).
Concurrent creation and deletion of indices, adding non-null fields, and reorganizing indices and tables (without table locks).

This means you can make significant schema changes to production systems without downtime, optimizing indices as needed.
Various replication methods: segment replication, streaming replication, trigger-based replication, logical replication, plugin replication, and more.

This makes zero-downtime data migration remarkably easy: replicate, redirect reads, redirect writes - three steps, and your production migration is rock solid.
Diverse commit methods: asynchronous commits, synchronous commits, quorum-based synchronous commits.

This means Pg allows trade-offs between consistency and availability - for example, using synchronous commits for transaction databases and asynchronous commits for regular databases.
Comprehensive system views make building monitoring systems straightforward.
FDW (Foreign Data Wrappers) makes ETL incredibly simple - often just a single SQL statement.

FDW conveniently allows one instance to access data or metadata from other instances. It’s incredibly useful for cross-partition operations, database monitoring metric collection, data migration, and connecting to heterogeneous data systems.

Healthy Ecosystem

PostgreSQL’s ecosystem is thriving with an active community.

Compared to MySQL, PostgreSQL has a huge advantage in its friendly license. PG uses a PostgreSQL license similar to BSD/MIT, which essentially means as long as you’re not falsely claiming to be PostgreSQL, you can do whatever you want - even rebrand and sell it. No wonder so many “domestic databases” or “self-developed databases” are actually just rebranded or extended PG products.

Of course, many derivative products contribute back to the trunk. For instance, timescaledb, pipelinedb, and citus - originally “databases” based on PG - all eventually became native PG plugins. Often when you need some functionality, a quick search reveals existing plugins or implementations. That’s the beauty of open source - a certain idealism prevails.

PG’s code quality is exceptionally high, with crystal-clear comments. Reading the C code feels almost like reading Go - the code practically serves as documentation. You can learn a lot from it. In contrast, with other databases like MongoDB, one glance and I lost all interest in reading further.

As for MySQL, the community edition uses the GPL license, which is quite painful. Without GPL’s viral nature, would there be so many open-source MySQL derivatives? Plus, MySQL is in Oracle’s hands - letting someone else hold your future isn’t wise, especially when that someone is an industry toxin. Facebook’s React license controversy serves as a cautionary tale.

Challenges

Of course, there are some drawbacks or regrets:

Due to MVCC, the database needs periodic VACUUM maintenance to prevent performance degradation.
No good open-source cluster monitoring solution (or they’re ugly!), so you need to build your own.
Slow query logs are mixed with regular logs, requiring custom parsing.
Official Pg lacks good columnar storage, a minor disappointment for data analysis.

These are just minor issues. The real challenge might not be technical at all…

At the end of the day, MySQL truly is the most popular open-source relational database. Java developers, PHP developers - many people start with MySQL, making it harder to recruit PostgreSQL talent. Often you need to train people yourself. Looking at DB Engines popularity trends, though, the future looks bright.

dbrank

Final Thoughts

Learning PostgreSQL has been fascinating - it showed me that databases can do far more than just CRUD operations. While SQL Server and MySQL were my gateway to databases, it was PostgreSQL that truly revealed the magical world of database possibilities.

I’m writing this because an old post of mine on Zhihu was dug up, reminding me of my early days discovering PostgreSQL (https://www.zhihu.com/question/20010554/answer/94999834). Now that I’ve become a full-time PG DBA, I couldn’t resist adding more to that old grave. “The melon seller praising her own melons” - it’s only right that I praise PG. Hehe…

Full-stack engineers deserve full-stack databases.

I’ve compared MySQL and PostgreSQL myself and had the rare freedom to choose at Alibaba, a MySQL-dominated world. From a purely technical perspective, I believe PG absolutely crushes MySQL. Despite significant resistance, I eventually implemented and promoted PostgreSQL. I’ve used it for numerous projects, solving countless requirements (from small statistical reports to major revenue-generating initiatives). Most requirements were handled by PG alone, with occasional use of MQ and NoSQL (Redis, MongoDB, Cassandra/HBase). PG is truly addictive.

Eventually, my love for PostgreSQL led me to specialize in it full-time.

In my first job, I experienced the sweetness firsthand - with PostgreSQL, one person’s development efficiency rivals that of a small team:

Don’t want to write backends? PostGraphQL generates GraphQL APIs directly from database schema definitions, automatically listening for DDL changes and generating corresponding CRUD methods and stored procedure wrappers. Similar tools include PostgREST and pgrest. They’re perfectly usable for small to medium-sized applications, eliminating half the backend development work.
Need Redis functionality? Just use Pg - it can simulate standard features effortlessly, and you can skip the cache layer entirely. Implement Pub/Sub using Notify/Listen/Trigger to broadcast configuration changes and implement controls conveniently.
Need analytics? Window functions, complex JOINs, CUBE, GROUPING, custom aggregates, custom languages - it’s exhilarating. If you need to scale out, use the citus extension (or switch to Greenplum). It might lack columnar storage compared to data warehouses, but it has everything else.
Need geographic functionality? PostGIS is magical - solving complex geographic requirements in a single SQL line that would otherwise require thousands of lines of code.
Storing time-series data? The timescaledb extension may not match specialized time-series databases, but it still handles millions of records per second. I’ve used it to solve hardware sensor log storage and monitoring system metrics storage requirements.
For stream computing functionality, PipelineDB can define streaming views directly: UV, PV, real-time user profiles.
PostgreSQL’s FDW (Foreign Data Wrappers) is a powerful mechanism allowing access to various data sources through a unified SQL interface. Its applications are endless:
- The built-in file_fdw extension can connect any program’s output to a data table. The simplest application is system information monitoring.
- When managing multiple PostgreSQL instances, you can use the built-in postgres_fdw to import data dictionaries from all remote databases into a metadata database. You can access metadata from all database instances uniformly, pull real-time metrics from all databases with a single SQL statement - building monitoring systems becomes a breeze.
- I once used hbase_fdw and MongoFDW to wrap historical batch data from HBase and real-time data from MongoDB into PostgreSQL tables. A single view elegantly implemented a Lambda architecture combining batch and stream processing.
- Use redis_fdw for cache update pushing; mongo_fdw for data migration from MongoDB to PG; mysql_fdw to read MySQL data into data warehouses; implement cross-database or even cross-component JOINs; complete complex ETL operations with a single SQL line that would otherwise require hundreds of lines of code - how marvelous is that?
Rich types and method support: JSON for generating frontend JSON responses directly from the database - effortless and comfortable. Range types elegantly solve edge cases that would otherwise require programmatic handling. Other examples include arrays, multidimensional arrays, custom types, enumerations, network addresses, UUIDs, and ISBNs. These out-of-the-box data structures save programmers tremendous wheel-reinventing effort.
Rich index types: general-purpose Btree indices; Brin indices that significantly optimize sequential access; Hash indices for equality queries; GIN inverted indices; GIST generalized search trees efficiently supporting geographic and KNN queries; Bitmap simultaneously leveraging multiple independent indices; Bloom efficient filtering indices; conditional indices that can dramatically reduce index size; function indices that elegantly replace redundant fields. MySQL offers pathetically few index types by comparison.
Stable, reliable, correct, and efficient. MVCC easily implements snapshot isolation, while MySQL’s RR isolation level is deficient, unable to avoid PMP and G-single anomalies. Plus, implementations based on locks and rollback segments have various pitfalls; PostgreSQL can implement high-performance serializability through SSI.
Powerful replication: WAL segment replication, streaming replication (appearing in v9: synchronous, semi-synchronous, asynchronous), logical replication (appearing in v10: subscription/publication), trigger replication, third-party replication - every type of replication you could want.
Operations-friendly: DDL can be executed in transactions (rollback-capable), index creation without table locks, adding new columns (without default values) without table locks, cleaning/backup without table locks. System views and monitoring capabilities are comprehensive.
Numerous extensions, rich features, and extreme customizability. In PostgreSQL, you can write functions in any language: Python, Go, JavaScript, Java, Shell, etc. Rather than calling Pg a database, it’s more accurate to call it a development platform. I’ve experimented with many useless but fun things: in-database crawlers, recommendation systems, neural networks, web servers, and more. There are various third-party plugins with powerful functions or creative ideas: https://pgxn.org/.
PostgreSQL’s license is friendly - BSD lets you do whatever you want. No wonder so many databases are rebranded PG products. MySQL has GPL viral infection and remains under Oracle’s thumb.

←Previous
Next→

Last modified 2025-03-22: add postgres blogs (117ac1d)