Pigsty Blog
- Database
- Database in K8S: Pros & Cons
- NewSQL: Distributive Nonsense
- Is running postgres in docker a good idea?
- Cloud Exit
- S3: Elite to Mediocre
- Reclaim Hardware Bonus from the Cloud
- FinOps: Endgame Cloud-Exit
- SLA: Placebo or Insurance?
- EBS: Pig Slaughter Scam
- RDS: The Idiot Tax
- Postgres
- Self-Hosting Dify with PG, PGVector, and Pigsty
- PGCon.Dev 2024, The conf that shutdown PG for a week
- Postgres is eating the database world
- PostgreSQL Convention 2024
- PostgreSQL, The most successful database
- Releases
- Pigsty v3.0: Extension Exploding & Plugable Kernels
- v2.7: Extension Overwhelming
- v2.6: the OLAP New Challenger
- v2.5: Debian / Ubuntu / PG16
- v2.4: Monitoring Cloud RDS
- v2.3: Ecosystem Applications
- v2.2: Observability Overhaul
- v2.1: Vector Embedding & RAG
- v2.0: Free RDS PG Alternative
- v1.5.0 Release Note
- v1.4.0 Release Note
- v1.3.0 Release Note
- v1.2.0 Release Note
- v1.1.0 Release Note
- v1.0.0 Release Note
- v0.9.0 Release Note
- v0.8.0 Release Note
- v0.7.0 Release Note
- v0.6.0 Release Note
- v0.5.0 Release Note
- v0.4.0 Release Note
- v0.3.0 Release Note
Database
Database in K8S: Pros & Cons
Whether databases should be housed in Kubernetes/Docker remains highly controversial. While Kubernetes (k8s) excels in managing stateless applications, it has fundamental drawbacks with stateful services, especially databases like PostgreSQL and MySQL.
In the previous article, “Databases in Docker: Good or Bad,” we discussed the pros and cons of containerizing databases. Today, let’s delve into the trade-offs in orchestrating databases in K8S and explore why it’s not a wise decision.
Summary
Kubernetes (k8s) is an exceptional container orchestration tool aimed at helping developers better manage a vast array of complex stateless applications. Although it offers abstractions such as StatefulSet, PV, PVC, and local PV to support stateful services such as databases, these features are still insufficient for running production-level databases that demand higher reliability.
Databases are more like “pets” than “cattle” and require careful nurturing. Treating databases as “cattle” in K8S essentially turns external disk/file system/storage services into new “database pets.” Running databases on EBS/network storage presents significant disadvantages in reliability and performance. However, using high-performance local NVMe disks will make the database bound to nodes and non-schedulable, negating the primary purpose of putting them in K8S.
Placing databases in K8S results in a “lose-lose” situation: K8S loses the simplicity of statelessness and the flexibility to quickly relocate, schedule, destroy, and rebuild workloads, while databases sacrifice several crucial attributes (reliability, security, performance, and simplicity) in exchange for limited “elasticity” and utilization, something virtual machines can also achieve. For users outside public cloud vendors, the disadvantages far outweigh the benefits.
The “cloud-native frenzy,” exemplified by K8S, has become a distorted phenomenon: adopting k8s for the sake of k8s. Engineers add extra complexity to increase their irreplaceability, while managers fear being left behind by the industry and get caught up in deployment races. Using tanks for tasks that could be done with bicycles, just to gain experience or prove oneself, without considering whether the problem needs such “dragon-slaying” techniques, is architectural juggling that will eventually lead to adverse outcomes.
Until the reliability and performance of network storage surpass those of local storage, placing databases in K8S is an unwise choice. There are other ways to encapsulate the complexity of database management, such as RDS and open-source RDS alternatives like Pigsty, which run on bare metal or a bare OS. Users should make wise decisions based on their situations and needs, carefully weighing the pros and cons.
The Status Quo
K8S excels at orchestrating stateless application services, but its support for stateful services was initially very limited. Although running databases was never the intended purpose of K8S and Docker, the community’s zeal for expansion has been unstoppable. Evangelists depict K8S as the next-generation cloud operating system, asserting that databases will inevitably become regular applications within Kubernetes. Various abstractions have emerged to support stateful services: StatefulSet, PV, PVC, and local PV.
Countless cloud-native enthusiasts have attempted to migrate existing databases into K8S, resulting in a proliferation of CRDs and Operators for databases. Taking PostgreSQL as an example, there are already more than ten different K8S deployment solutions available: PGO, StackGres, CloudNativePG, PostgresOperator, PerconaOperator, CYBERTEC-pg-operator, TemboOperator, Kubegres, KubeDB, KubeBlocks, and so on. The CNCF landscape rapidly expands, turning into a playground of complexity.
However, complexity is a cost. With “cost reduction” becoming mainstream, voices of reflection have begun to emerge. Cloud-exit pioneers like DHH, who used K8S heavily in public clouds, abandoned it because of its excessive complexity when moving to self-hosted open-source solutions, relying only on Docker and a Ruby tool named Kamal instead. Many began to question whether stateful services like databases suit Kubernetes.
K8S itself, in its effort to support stateful applications, has become increasingly complex, straying from its original intention as a container orchestration platform. Tim Hockin, a co-founder of Kubernetes, also voiced his rare concerns at this year’s KubeCon in “K8s is Cannibalizing Itself!”: “Kubernetes has become too complex; it needs to learn restraint, or it will stop innovating and lose its base.”
Lose-Lose Situation
In the cloud-native realm, the analogy of “pets” versus “cattle” is often used for illustrating stateful services. “Pets,” like databases, need careful and individual care, while “cattle” represent disposable, stateless applications (Disposability).
Cloud Native Applications 12 Factors: Disposability
One of the leading architectural goals of K8S is to treat what can be treated as cattle as cattle. The attempt to “separate storage from computation” in databases follows this strategy: splitting stateful database services into state storage outside K8S and pure computation inside K8S. The state is stored on the EBS/cloud disk/distributed storage service, allowing the “stateless” database part to be freely created, destroyed, and scheduled in K8S.
Unfortunately, databases, especially OLTP databases, depend heavily on disk hardware, and network storage’s reliability and performance still lag behind local disks by orders of magnitude. Thus, K8S offers the local PV option, which lets containers use data volumes that reside directly on the host operating system, backed by high-performance, high-reliability local NVMe disks.
However, this presents a dilemma: should one use subpar cloud disks and tolerate poor database reliability/performance for K8S’s scheduling and orchestration capabilities? Or use high-performance local disks tied to host nodes, virtually losing all flexible scheduling abilities? The former is like stuffing an anchor into K8S’s small boat, slowing overall speed and agility; the latter is like anchoring and pinning the ship to a specific point.
Running a stateless K8S cluster is simple and reliable, as is running a stateful database on a physical machine’s bare operating system. Mixing the two, however, results in a lose-lose situation: K8S loses its stateless flexibility and casual scheduling abilities, while the database sacrifices core attributes like reliability, security, efficiency, and simplicity in exchange for elasticity, resource utilization, and Day1 delivery speed that are not fundamentally important to databases.
A vivid example of the former is the performance optimization of PostgreSQL@K8S, which KubeBlocks contributed. K8S experts employed various advanced methods to solve performance issues that did not exist on bare metal/bare OS at all. A fresh case of the latter is Didi’s K8S architecture juggling disaster; if it weren’t for putting the stateful MySQL in K8S, would rebuilding a stateless K8S cluster and redeploying applications take 12 hours to recover?
Pros and Cons
For serious technology decisions, the most crucial aspect is weighing the pros and cons. Here, in the order of “quality, security, performance, cost,” let’s discuss the technical trade-offs of placing databases in K8S versus classic bare metal/VM deployments. I don’t want to write a comprehensive paper that covers everything. Instead, I’ll raise some specific questions for consideration and discussion.
Quality
K8S, compared to physical deployments, introduces additional failure points and architectural complexity, increasing the blast radius and significantly prolonging the average recovery time of failures. In “Is it a Good Idea to Put Databases into Docker?”, we provided an argument about reliability, which can also apply to Kubernetes — K8S and Docker introduce additional and unnecessary dependencies and failure points to databases, lacking community failure knowledge accumulation and reliability track record (MTTR/MTBF).
In the cloud vendor classification system, K8S belongs to PaaS, while RDS belongs to a more fundamental layer, IaaS. Database services have higher reliability requirements than K8S; for instance, many companies’ cloud management platforms rely on an additional CMDB database. Where should this database be placed? You shouldn’t let K8S manage things it depends on, nor should you add unnecessary extra dependencies. The Alibaba Cloud global epic failure and Didi’s K8S architecture juggling disaster have taught us this lesson. Moreover, maintaining a separate database system inside K8S when there’s already one outside is even more unjustifiable.
Security
The database in a multi-tenant environment introduces additional attack surfaces, bringing higher risks and more complex audit compliance challenges. Does K8S make your database more secure? Maybe the complexity of K8S architecture juggling will deter script kiddies unfamiliar with K8S, but for real attackers, more components and dependencies often mean a broader attack surface.
In “BrokenSesame Alibaba Cloud PostgreSQL Vulnerability Technical Details”, security personnel escaped to the K8S host node using their own PostgreSQL container and accessed the K8S API and other tenants’ containers and data. This is clearly a K8S-specific issue — the risk is real, such attacks have occurred, and even Alibaba Cloud, a local cloud industry leader, has been compromised.
“The Attacker Perspective - Insights From Hacking Alibaba Cloud”
Performance
As stated in “Is it a Good Idea to Put Databases into Docker?”, whether it’s additional network overhead, Ingress bottlenecks, or underperforming cloud disks, all negatively impact database performance. For example, as revealed in “PostgreSQL@K8s Performance Optimization” — you need a considerable level of technical prowess to make database performance in K8S barely match that on bare metal.
Latency is measured in ms, not µs; I almost thought my eyes were deceiving me.
Another misconception about efficiency is resource utilization. Unlike offline analytical businesses, critical online OLTP databases should not aim to increase resource utilization but rather deliberately lower it to enhance system reliability and user experience. If there are many fragmented businesses, resource utilization can be improved through PDB/shared database clusters. K8S’s advocated elasticity efficiency is not unique to it — KVM/EC2 can also effectively address this issue.
Cost
In terms of cost, K8S and various Operators provide a decent abstraction, encapsulating some of the complexity of database management, which is attractive for teams without DBAs. However, the complexity reduced by using it to manage databases pales in comparison to the complexity introduced by using K8S itself. For instance, random IP address drifts and automatic Pod restarts may not be a big issue for stateless applications, but for databases, they are intolerable — many companies have had to attempt to modify kubelet to avoid this behavior, thereby introducing more complexity and maintenance costs.
As stated in the “Reducing Complexity Costs” section of “From Reducing Costs and Smiles to Reducing Costs and Efficiency”: intellectual power is hard to accumulate spatially. When a database encounters problems, it needs database experts to solve them; when Kubernetes has problems, it needs K8S experts to look into them. However, when you put a database into Kubernetes, the complexities combine and the state space explodes, while the intellectual bandwidth of individual database experts and K8S experts is hard to stack. You need a dual expert to solve the problem, and such experts are undoubtedly much rarer and more expensive than pure database experts. In the event of a failure, such architectural juggling is enough to cause major setbacks for most teams, including top public clouds and big companies.
The Cloud-Native Frenzy
An interesting question arises: if K8S is unsuitable for stateful databases, why are so many companies, including big players, rushing to do this? The reasons are not technical.
Google open-sourced its K8S battleship, modeled after its internal Borg spaceship, and managers, fearing being left behind, rushed to adopt it, thinking that using K8S would put them on par with Google. Ironically, Google doesn’t use K8S itself; open-sourcing it was more likely a move to disrupt AWS and mislead the industry. However, most companies don’t have Google’s manpower to operate such a battleship. More importantly, their problems might only need a simple vessel. Running MySQL + PHP or PostgreSQL + Go/Python on bare metal has already taken many companies to IPO.
Under modern hardware conditions, the complexity of most applications throughout their lifecycle doesn’t justify using K8S. Yet, the “cloud-native” frenzy, epitomized by K8S, has become a distorted phenomenon: adopting k8s just for the sake of k8s. Some engineers are looking for “advanced” and “cool” technologies used by big companies to fulfill their personal goals like job hopping or promotions or to increase their job security by adding complexity, not considering if these “dragon-slaying” techniques are necessary for solving their problems.
The cloud-native landscape is filled with fancy projects. Every new development team wants to introduce something new: Helm today, Kubevela tomorrow. They talk big about bright futures and peak efficiency, but in reality, they create a mountain of architectural complexities and a playground for “YAML Boys” - tinkering with the latest tech, inventing concepts, earning experience and reputation at the expense of users who bear the complexity and maintenance costs.
CNCF Landscape
The cloud-native movement’s philosophy is compelling - democratizing the elastic scheduling capabilities of public clouds for every user. K8S indeed excels in stateless applications. However, excessive enthusiasm has led K8S astray from its original intent and direction - simply doing well in orchestrating stateless applications, burdened by the ill-conceived support for stateful applications.
Making Wise Decisions
Years ago, when I first encountered K8S, I too was fervent. It was at TanTan: we had over twenty thousand cores and hundreds of database clusters, and I was eager to try putting databases in Kubernetes and to test all the available Operators. However, after two to three years of extensive research and architectural design, I calmed down and abandoned this madness. Instead, I architected our database service on bare metal and bare operating systems. For us, the benefits K8S brought to databases were negligible compared to the problems and hassles it introduced.
Should databases be put into K8S? It depends. For public cloud vendors who thrive on overselling resources, elasticity and utilization are crucial because they are directly linked to revenue and profit, while reliability and performance take a back seat. After all, an availability below three nines only means compensating with a 25% monthly credit. But for most users, including ourselves, the trade-offs are different: one-time Day1 setup, elasticity, and resource utilization aren’t the primary concerns; reliability, performance, and Day2 operation costs, the core database attributes, are what matter most.
We open-sourced our database service architecture as Pigsty, an out-of-the-box PostgreSQL distribution and a local-first RDS alternative. We didn’t choose the so-called “build once, run anywhere” approach of K8S and Docker. Instead, we adapted to different OS distros and major versions, and used Ansible to provide a K8S CRD-like IaC API that encapsulates management complexity. This was arduous, but it was the right thing to do: the world does not need another clumsy attempt at putting PostgreSQL into K8S, but it does need a production database service architecture that maximizes hardware performance and reliability.
Pigsty vs StackGres
Perhaps one day, when the reliability and performance of distributed network storage surpass local storage and mainstream databases have some native support for storage-computation separation, things might change again — K8S might become suitable for databases. But for now, I believe putting serious production OLTP databases into K8S is immature and inappropriate. I hope readers will make wise choices on this matter.
Reference
Database in Docker: Is that a good idea?
“What can we learn from DiDi’s Epic k8s Failure”
NewSQL: Distributive Nonsense
As hardware technology advances, the capacity and performance of standalone databases have reached unprecedented heights. In this transformative era, distributed (TP) databases appear utterly powerless, much like the “data middle platform,” donning the emperor’s new clothes in a state of self-deception.
- TL; DR
- The Pull of the Internet
- The Trade-Offs of Distributive
- The Impact of New Hardware
- The Predicament of False Needs
- The Struggles in Confusion
- References
TL; DR
The core trade-off of distributed databases is: “quality for quantity,” sacrificing functionality, performance, complexity, and reliability for greater data capacity and throughput. However, “what divides must eventually converge,” and hardware innovations have propelled centralized databases to new heights in capacity and throughput, rendering distributed (TP) databases obsolete.
Hardware, exemplified by NVMe SSDs, follows Moore’s Law, evolving at an exponential pace. Over a decade, performance has increased by tens of times, and prices have dropped significantly, improving the cost-performance ratio by three orders of magnitude. A single card can now hold 32TB+, with 4K random read/write IOPS reaching 1600K/600K, latency at 70µs/10µs, and a cost of less than 200 ¥/TB·year. Running a centralized database on a single machine can achieve one to two million point write/point query QPS.
Scenarios truly requiring distributed databases are few and far between, with typical mid-sized internet companies/banks handling request volumes ranging from tens to hundreds of thousands of QPS, and non-repetitive TP data at the hundred TB level. In the real world, over 99% of scenarios do not need distributed databases, and the remaining 1% can likely be addressed through classic engineering solutions like horizontal/vertical partitioning.
Top-tier internet companies might have a few genuine use cases, yet these companies have no intention to pay. The market simply cannot sustain so many distributed database cores, and the few products that do survive don’t necessarily rely on distribution as their selling point. HTAP and the integration of distributed and standalone databases represent the struggles of confused distributed TP database vendors seeking transformation, but they are still far from achieving product-market fit.
The Pull of the Internet
“Distributed database” is not a term with a strict definition. In a narrow sense, it highly overlaps with NewSQL databases such as CockroachDB, YugabyteDB, TiDB, OceanBase, and TDSQL; broadly speaking, classic databases like Oracle, PostgreSQL, MySQL, SQL Server, PolarDB, and Aurora, which span multiple physical nodes and use master-slave replication or shared storage, can also be considered distributed databases. In the context of this article, a distributed database refers to the former, specifically focusing on transactional processing (OLTP) distributed relational databases.
The rise of distributed databases stemmed from the rapid development of internet applications and the explosive growth of data volumes. In that era, traditional relational databases often encountered performance bottlenecks and scalability issues when dealing with massive data and high concurrency. Even Oracle with Exadata struggled in the face of voluminous CRUD operations, not to mention the prohibitively expensive annual hardware and software costs.
Internet companies embarked on a different path, building their infrastructure with free, open-source databases like MySQL. Veteran developers/DBAs might still recall the MySQL best practice: keep single-table records below 21 million to avoid rapid performance degradation. Correspondingly, database sharding became a widely recognized practice among large companies.
The basic idea here was “three cobblers with their wits combined equal Zhuge Liang,” using a bunch of inexpensive x86 servers + numerous sharded open-source database instances to create a massive CRUD simple data store. Thus, distributed databases often originated from internet company scenarios, evolving along the manual sharding → sharding middleware → distributed database path.
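The manual-sharding stage of that path can be sketched in a few lines. This is only a minimal illustration, not any particular middleware's algorithm: the shard count, the DSN strings, and the `shard_for` / `dsn_for` helpers are all hypothetical, and hash-modulo routing is just one common choice.

```python
import hashlib

# Hypothetical shard map: four independent MySQL instances.
SHARD_DSNS = [f"mysql://shard{i}.example.internal/app" for i in range(4)]


def shard_for(key: int, n_shards: int = len(SHARD_DSNS)) -> int:
    """Map a sharding key to a shard index using a stable hash."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


def dsn_for(key: int) -> str:
    """Return the connection string of the instance that owns this key."""
    return SHARD_DSNS[shard_for(key)]


if __name__ == "__main__":
    for user_id in (1, 42, 1_000_003):
        print(user_id, "->", dsn_for(user_id))
```

With routing like this, every cross-shard query, join, and transaction has to be handled in application code, which is the burden that sharding middleware and later distributed databases set out to hide.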
As an industry solution, distributed databases have successfully met the needs of internet companies. However, before abstracting and solidifying it into a product for external output, several questions need to be clarified:
Do the trade-offs from ten years ago still hold up today?
Are the scenarios of internet companies applicable to other industries?
Could distributed OLTP databases be a false necessity?
The Trade-Offs of Distributive
“Distributed,” along with buzzwords like “HTAP,” “compute-storage separation,” “Serverless,” and “lakehouse,” holds no inherent meaning for enterprise users. Practical clients focus on tangible attributes and capabilities: functionality, performance, security, reliability, return on investment, and cost-effectiveness. What truly matters is the trade-off: compared to classic centralized databases, what do distributed databases sacrifice, and what do they gain in return?
The database hierarchy-of-needs pyramid[1]
The core trade-off of distributed databases can be summarized as “quality for quantity”: sacrificing functionality, performance, complexity, and reliability to gain greater data capacity and request throughput.
NewSQL often markets itself on the concept of “distribution,” solving scalability issues through “distribution.” Architecturally, it typically features multiple peer data nodes and a coordinator, employing distributed consensus protocols like Paxos/Raft for replication, allowing for horizontal scaling by adding data nodes.
Firstly, due to their inherent limitations, distributed databases sacrifice many features, offering only basic and limited CRUD query support. Secondly, because distributed databases require multiple network RPCs to complete requests, their performance typically suffers a 70% or more degradation compared to centralized databases. Furthermore, distributed databases, consisting of DN/CN and TSO components among others, introduce significant complexity in operations and management. Lastly, in terms of high availability and disaster recovery, distributed databases do not offer a qualitative improvement over the classic centralized master-slave setup; instead, they introduce numerous additional failure points due to their complex components.
Sysbench throughput comparison[2]
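One way to read the “70% or more” figure is through a simple latency-bound model, in which the throughput of a single connection is inversely proportional to per-request latency. This is an illustrative assumption rather than a measurement, and the function names and sample latencies below are hypothetical.

```python
def overhead_ratio(degradation: float) -> float:
    """Extra latency per request, as a multiple of the local latency,
    implied by a given throughput degradation (0 <= degradation < 1)."""
    if not 0 <= degradation < 1:
        raise ValueError("degradation must be in [0, 1)")
    return degradation / (1 - degradation)


def degradation_from(local_us: float, rpc_us: float, n_rpcs: int) -> float:
    """Throughput lost when n_rpcs network round trips are added to a request
    that takes local_us microseconds on a single machine."""
    total_us = local_us + n_rpcs * rpc_us
    return 1 - local_us / total_us


if __name__ == "__main__":
    # A 70% loss means the added latency is about 2.3x the local latency.
    print(f"{overhead_ratio(0.70):.2f}")
    # Hypothetical numbers: 100 us local work plus three 100 us round trips.
    print(f"{degradation_from(100, 100, 3):.0%}")
```

Under this model, a handful of round trips that each cost about as much as the local work is already enough to reach the quoted degradation.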
In the past, the trade-offs of distributed databases were justified: the internet required larger data storage capacities and higher access throughputs—a must-solve problem, and these drawbacks were surmountable. But today, hardware advancements have rendered the “quantity” question obsolete, thus erasing the raison d’être of distributed databases along with the very problem they sought to solve.
Times have changed, My lord!
The Impact of New Hardware
Moore’s Law posits that every 18 to 24 months, processor performance doubles while costs halve. This principle largely applies to storage as well. From 2013 to 2023, spanning 5 to 6 cycles, we should see performance and cost differences of dozens of times compared to a decade ago. Is this the case?
Let’s examine the performance metrics of a typical SSD from 2013 and compare them with those of a typical PCIe Gen4 NVMe SSD from 2022. The SSD’s 4K random read/write IOPS have jumped from 60K/40K to 1600K/600K, while prices have plummeted from $2220/TB to $40/TB. Performance has improved by 15 to 26 times, and prices have dropped 56-fold[3,4,5], validating the rule of thumb at the order-of-magnitude level.
HDD/SSD Performance in 2013
NVMe Gen4 SSD in 2022
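The comparison can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the constant and function names are made up for illustration.

```python
# Figures quoted in the text: 4K random IOPS and price per TB.
IOPS_2013 = {"read": 60_000, "write": 40_000}
IOPS_2022 = {"read": 1_600_000, "write": 600_000}
USD_PER_TB_2013 = 2220
USD_PER_TB_2022 = 40


def moore_expectation(cycles: int) -> int:
    """Improvement factor expected after a number of doubling cycles."""
    return 2 ** cycles


def observed_gains() -> dict:
    """Observed improvement factors between the two drives."""
    return {
        "read_iops": IOPS_2022["read"] / IOPS_2013["read"],
        "write_iops": IOPS_2022["write"] / IOPS_2013["write"],
        "price_drop": USD_PER_TB_2013 / USD_PER_TB_2022,
    }


if __name__ == "__main__":
    print("expected:", moore_expectation(5), "to", moore_expectation(6))
    for name, factor in observed_gains().items():
        print(f"{name}: {factor:.1f}x")
```

Five to six doublings give 32x to 64x, while the observed factors come out at about 27x for reads, 15x for writes, and 55.5x for price.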
A decade ago, mechanical hard drives dominated the market. A 1TB hard drive cost about seven or eight hundred yuan, and a 64GB SSD was even more expensive. Ten years later, a mainstream 3.2TB enterprise-grade NVMe SSD costs just three thousand yuan. Considering a five-year warranty, the monthly cost per TB is only 16 yuan, with an annual cost under 200 yuan. For reference, cloud providers’ reputedly cost-effective S3 object storage costs 1800¥/TB·year.
Price per unit of SSD/HDD from 2013 to 2030 with predictions
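The per-TB cost quoted above follows from simple amortization. Here is a minimal sketch using the article's own numbers (3000 yuan, 3.2TB, five years, and 1800 yuan per TB per year for S3); the function name is hypothetical, and the calculation counts only the drive itself.

```python
def amortized_cost_per_tb(price_yuan: float, capacity_tb: float, years: float):
    """Return (yuan per TB per month, yuan per TB per year) for one drive."""
    per_tb_year = price_yuan / capacity_tb / years
    return per_tb_year / 12, per_tb_year


if __name__ == "__main__":
    monthly, yearly = amortized_cost_per_tb(3000, 3.2, 5)
    s3_yearly = 1800  # yuan per TB per year, as quoted in the text
    print(f"NVMe: {monthly:.1f} yuan/TB/month, {yearly:.1f} yuan/TB/year")
    print(f"S3 costs {s3_yearly / yearly:.1f}x as much per TB per year")
```

This gives roughly 15.6 yuan per TB per month and 187.5 yuan per TB per year, so the quoted S3 price is about 9.6 times higher.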
The typical fourth-generation local NVMe disk can reach a maximum capacity of 32TB to 64TB, offering 70µs/10µs 4K random read/write latencies, and 1600K/600K read/write IOPS, with the fifth generation boasting an astonishing bandwidth of several GB/s per card.
Equipping a classic Dell 64C / 512G server with such a card, factoring in five years of IDC depreciation, the total cost is under one hundred thousand yuan. Such a server running PostgreSQL sysbench can nearly reach one million QPS for single-point writes and two million QPS for point queries without issue.
What does this mean? For a typical mid-sized internet company/bank, the demand for database requests is usually in the tens of thousands to hundreds of thousands of QPS, with non-repeated TP data volumes fluctuating around hundreds of TBs. Considering hardware storage compression cards can achieve several times compression ratio, such scenarios might now be manageable by a centralized database on a single machine and card under modern hardware conditions[6].
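The sizing argument can be written down as a rough feasibility check. The sketch below is an assumption-laden illustration: the class and function names are hypothetical, “several times” compression is taken as 4x, and the 50% utilization ceiling is an arbitrary safety margin; only the card capacity and QPS limits come from the text.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    data_tb: float    # non-repeated TP data volume
    read_qps: float
    write_qps: float


@dataclass
class SingleNode:
    disk_tb: float = 64.0              # upper end of the card capacity quoted above
    compression: float = 4.0           # ASSUMPTION: "several times" taken as 4x
    max_read_qps: float = 2_000_000    # point-query figure quoted above
    max_write_qps: float = 1_000_000   # point-write figure quoted above


def fits_single_node(w: Workload, node: SingleNode,
                     max_utilization: float = 0.5) -> bool:
    """True if the data fits on disk and QPS stays under the utilization ceiling."""
    capacity_ok = w.data_tb <= node.disk_tb * node.compression
    read_ok = w.read_qps <= node.max_read_qps * max_utilization
    write_ok = w.write_qps <= node.max_write_qps * max_utilization
    return capacity_ok and read_ok and write_ok


if __name__ == "__main__":
    node = SingleNode()
    mid_sized = Workload(data_tb=200, read_qps=300_000, write_qps=100_000)
    oversized = Workload(data_tb=2000, read_qps=5_000_000, write_qps=1_000_000)
    print(fits_single_node(mid_sized, node))
    print(fits_single_node(oversized, node))
```

Keeping utilization deliberately low reflects the point that OLTP systems should trade utilization for reliability; a different ceiling or compression ratio will shift the boundary.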
Previously, users might have had to invest millions in high-end storage solutions like Exadata, then spend a fortune on Oracle commercial database licenses and vendor support services. Now, achieving similar outcomes starts with just a few thousand yuan for an enterprise-grade SSD card. Open-source Oracle alternatives like PostgreSQL can smoothly run single tables of up to 32TB and no longer suffer from the limitations that once forced MySQL into partitioning. High-performance database services, once luxury items restricted to the intelligence and banking sectors, have become affordable for all industries[7].
Cost-effectiveness is the primary product strength. The cost-effectiveness of high-performance, large-capacity storage has improved by three orders of magnitude over a decade, making the once-highlighted value of distributed databases appear weak in the face of such remarkable hardware evolution.
The Predicament of False Needs
Nowadays, sacrificing functionality, performance, and simplicity for scalability is a false need in most scenarios.
With the support of modern hardware, over 99% of real-world scenarios do not exceed the capabilities of a centralized, single-machine database. The remaining scenarios can likely be addressed through classical engineering methods like horizontal or vertical splitting. This holds true even for internet companies: even among the global top firms, scenarios where a transactional (TP) single table exceeds several tens of TBs are still rare.
Google Spanner, the forefather of NewSQL, was designed to solve the problem of massive data scalability, but how many enterprises actually handle data volumes comparable to Google’s? In terms of data volume, the lifetime TP data volume for the vast majority of enterprises will not exceed the bottleneck of a centralized database, which continues to grow exponentially with Moore’s Law. Regarding request throughput, many enterprises have enough database performance headroom to implement all their business logic in stored procedures and run it smoothly within the database.
“Premature optimization is the root of all evil”: designing for unneeded scale is a waste of effort. If volume is no longer an issue, then sacrificing other attributes for unneeded volume becomes meaningless.
“Premature optimization is the root of all evil”
In many subfields of databases, distributed technology is not a pseudo-requirement. If you need a highly reliable, disaster-resilient, simple, low-frequency KV store for metadata, then a distributed etcd is a suitable choice. If you require a globally distributed table for arbitrary reads and writes across different locations and are willing to endure significant performance degradation, then YugabyteDB might be a good choice. And for ensuring transparency and preventing tampering and repudiation, blockchain is fundamentally a leaderless distributed ledger database.
For large-scale data analytics (OLAP), distributed technology is indispensable (though this is usually referred to as data warehousing, MPP); however, in the transaction processing (OLTP) domain, distributed technology is largely unnecessary: OLTP databases are like working memory, characterized by being small, fast, and feature-rich. Even in very large business systems, the active working set at any one moment is not particularly large. A basic rule of thumb for OLTP system design is: If your problem can be solved within a single machine, don’t bother with distributed databases.
OLTP databases have a history spanning several decades, and the existing cores have reached a mature stage. Standards in the TP domain are gradually converging towards three wire protocols: PostgreSQL, MySQL, and Oracle. If “distribution” merely means tinkering with database auto-sharding and adding global transactions, it’s definitely a dead end. If a “distributed” database manages to break through, it’s likely not because of the “pseudo-requirement” of “distribution,” but rather due to new features, open-source ecosystems, compatibility, ease of use, domestic innovation, and self-reliance.
The Struggles in Confusion
The greatest challenge for distributed databases stems from the market structure: internet companies, the most likely candidates to use distributed TP databases, are paradoxically the least likely to pay for them. Internet companies can serve as high-quality users or even contributors, offering case studies, feedback, and PR, but paying for software runs against their cultural instincts. Even leading distributed database vendors face the predicament of being applauded but not paid.
In a recent casual conversation, an engineer at a distributed database company revealed that during a POC with a client, a query that Oracle completed in 10 seconds still ran an order of magnitude slower on their distributed database, even after throwing in various resources and dirty hacks. Even openGauss, which forked from PostgreSQL 9.2 a decade ago, can outperform many distributed databases in certain scenarios, not to mention PostgreSQL 15 and Oracle 23c with ten more years of advancement. This gap is so significant that even the vendors themselves are left puzzled about the future direction of distributed databases.
Thus, some distributed databases have started pivoting towards self-rescue, with HTAP being a prime example: while transaction processing in a distributed setting is suboptimal, analytics can benefit greatly. So, why not combine the two? A single system capable of handling both transactions and analytics! However, engineers in the real world understand that AP systems and TP systems each have their own patterns, and forcibly merging two diametrically opposed systems will only result in both tasks failing to succeed. Whether it’s classic ETL/CDC pushing and pulling to specialized solutions like ClickHouse/Greenplum/Doris, or logical replication to a dedicated in-memory columnar store, any of these approaches is more reliable than using a chimera HTAP database.
Another idea is monolithic-distributed integration: if you can’t beat them, join them by adding a monolithic mode to avoid the high costs of network RPCs, ensuring that in 99% of scenarios where distributed capabilities are unnecessary, they aren’t completely outperformed by centralized databases — even if distributed isn’t needed, it’s essential to stay in the game and prevent others from taking the lead! But the fundamental issue here is the same as with HTAP: forcing heterogeneous data systems together is pointless. If there was value in doing so, why hasn’t anyone created a monolithic binary that integrates all heterogeneous databases into a do-it-all behemoth — the Database Jack-of-all-trades? Because it violates the KISS principle: Keep It Simple, Stupid!
The plight of distributed databases is similar to that of Middle Data Platforms: originating from internal scenarios at major internet companies and solving domain-specific problems. Once riding the wave of the internet industry, the discussion of databases was dominated by distributed technologies, enjoying a moment of pride. However, due to excessive hype and promises of unrealistic capabilities, they failed to meet user expectations, ending in disappointment and becoming akin to the emperor’s new clothes.
There are still many areas within the TP database field worthy of focus: leveraging new hardware and actively embracing changes in underlying architectures like CXL, RDMA, and NVMe; providing simple and intuitive declarative interfaces to make database usage and management more convenient; offering more intelligent automatic monitoring and control systems to minimize operational tasks; or developing Babelfish-style compatibility plugins for MySQL/Oracle, aiming for a unified relational database wire protocol. Even investing in better support services would be more meaningful than chasing the false need for “distributed” features.
Time changes, and a wise man adapts. It is hoped that distributed database vendors will find their Product-Market Fit and focus on what users truly need.
References
[1] The Database Hierarchy of Needs Pyramid: https://mp.weixin.qq.com/s/1xR92Z67kvvj2_NpUMie1Q
[2] How Powerful Is PostgreSQL?: https://mp.weixin.qq.com/s/651zXDKGwFy8i0Owrmm-Xg
[3] SSD Performance in 2013: https://www.snia.org/sites/default/files/SNIASSSI.SSDPerformance-APrimer2013.pdf
[4] 2022 Micron NVMe SSD Spec: https://media-www.micron.com/-/media/client/global/documents/products/product-flyer/9400_nvme_ssd_product_brief.pdf
[5] 2013-2030 SSD Pricing: https://blocksandfiles.com/2021/01/25/wikibon-ssds-vs-hard-drives-wrights-law/
[6] Single Instance with 100TB: https://mp.weixin.qq.com/s/JSQPzep09rDYbM-x5ptsZA
[7] EBS: Scam: https://mp.weixin.qq.com/s/UxjiUBTpb1pRUfGtR9V3ag
[8] Middle Platform: An Exercise in Utter Self-Deception: https://mp.weixin.qq.com/s/VgTU7NcOwmrX-nbrBBeH_w
Is running postgres in docker a good idea?
For stateless app services, containers are an almost perfect devops solution. However, for stateful services like databases, it’s not so straightforward. Whether production databases should be containerized remains controversial.
From a developer’s perspective, I’m a big fan of Docker & Kubernetes and believe that they might be the future standard for software deployment and operations. But as a database administrator, I think hosting production databases in Docker/K8S is still a bad idea.
What problems does Docker solve?
Docker is described with terms like lightweight, standardized, portable, cost-effective, efficient, automated, integrated, and high-performance in operations. These claims are valid, as Docker indeed simplifies both development and operations. This explains why many companies are eager to containerize their software and services. However, this enthusiasm sometimes goes to the extreme of containerizing everything, including production databases.
Containers were originally designed for stateless apps, where temporary data produced by the app is logically part of the container. A service is created with a container and destroyed after use. These apps are stateless, with the state typically stored outside in a database, reflecting the classic architecture and philosophy of containerization.
But when it comes to containerizing the production database itself, the scenario changes: databases are stateful. To maintain their state without losing it when the container stops, database containers need to “punch a hole” through to the underlying OS; these holes are known as data volumes.
Such containers are no longer ephemeral entities that can be freely created, destroyed, moved, or transferred; they become bound to the underlying environment. Thus, the many advantages of using containers for traditional apps are not applicable to database containers.
Reliability
Getting software up & running is one thing; ensuring its reliability is another. Databases, central to information systems, are often critical, with failure leading to catastrophic consequences. This reflects common experience: while office software crashes can be tolerated and resolved with restarts, document loss or corruption is unresolvable and disastrous. Database failure without replica & backups can be terminal, particularly for internet/finance companies.
Reliability is the paramount attribute for databases. It’s the system’s ability to function correctly during adversity (hardware/software faults, human error), i.e. fault tolerance and resilience. Unlike liveness attributes such as performance, reliability is a safety attribute: it can only be proven over time or falsified by failures, and it is often overlooked until disaster strikes.
Docker’s description notably omits “reliability”, the crucial attribute for a database.
Reliability Proof
As mentioned, reliability lacks a definitive measure. Confidence in a system’s reliability builds over time through consistent, correct operation (MTTF). Deploying databases on bare metal has been a long-standing practice, proven reliable over decades. Docker, despite revolutionizing DevOps, has a mere ten-year track record, which is insufficient for establishing reliability, especially for mission-critical production databases. In essence, there haven’t been enough “guinea pigs” to clear the minefield.
Community Knowledge
Improving reliability hinges on learning from failures. Failures are invaluable, turning unknowns into knowns and forming the bedrock of operational knowledge. Community experience with failures is predominantly based on bare-metal deployments, with a plethora of issues well-trodden over decades. Encountering a problem often means finding a well-documented solution, thanks to previous experiences. However, add “Docker” to the mix, and the pool of useful information shrinks significantly. This implies a lower success rate in data recovery and longer times to resolve complex issues when they arise.
A subtle reality is that, without compelling reasons, businesses and individuals are generally reluctant to share experiences with failures. Failures can tarnish a company’s reputation, potentially exposing sensitive data or reflecting poorly on the organization and team. Moreover, insights from failures are often the result of costly lessons and financial losses, representing core value for operations personnel, thus public documentation on failures is scarce.
Extra Failure Point
Running databases in Docker doesn’t reduce the chances of hardware failures, software bugs, or human errors. Hardware issues persist with or without Docker. Software defects, mainly application bugs, aren’t lessened by containerization, and the same goes for human errors. In fact, Docker introduces extra components, complexity, and failure points, decreasing overall system reliability.
Consider this simple scenario: if the Docker daemon crashes, the database process dies. Such incidents, albeit rare, are non-existent on bare-metal.
Moreover, the failure points from an additional component like Docker aren’t limited to Docker itself. Issues could arise from interactions between Docker and the database, the OS, orchestration systems, VMs, networks, or disks. For evidence, see the issue tracker for the official PostgreSQL Docker image: https://github.com/docker-library/postgres/issues?q=.
Intellectual power doesn’t easily stack: a team’s intellect relies on its few seasoned members and is limited by their communication overhead. Database issues require database experts; container issues, container experts. However, when databases are deployed on Kubernetes & Docker, merging the expertise of database and K8S specialists is challenging. You need a dual expert to resolve issues, and such individuals are rarer than specialists in one domain.
Moreover, one man’s meat is another man’s poison. Certain Docker features might turn into bugs under specific conditions.
Unnecessary Isolation
Docker provides process-level isolation, which generally benefits applications by reducing interaction-related issues, thereby enhancing system reliability. However, this isolation isn’t always advantageous for databases.
A subtle real-world case involved starting two PostgreSQL servers on the same data directory, either both on the host or one on the host and another inside a container. On bare metal, the second instance would fail to start, as PostgreSQL recognizes the existing instance and refuses to launch; however, Docker’s isolation allows the second instance to start obliviously, potentially toasting the data files if proper fencing mechanisms (like host port or PID file exclusivity) aren’t in place.
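Why isolation can defeat such a guard is easier to see in a minimal sketch. This is not PostgreSQL's actual startup code, only an assumed simplification of a PID-file style check; the function names and the reduced lock-file handling are hypothetical. The point is that a PID written by a server on the host may simply not exist inside a container's separate PID namespace, so a check that blocks a second start on bare metal can pass there.

```python
import os


def pid_is_visible(pid: int) -> bool:
    """True if a process with this PID exists in the *current* PID namespace."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False     # no such process visible from here
    except PermissionError:
        return True      # the process exists but belongs to another user
    return True


def lock_file_blocks_start(lock_path: str) -> bool:
    """True if the lock file names a process that still appears to be running."""
    try:
        with open(lock_path) as f:
            first_line = f.readline().strip()
    except FileNotFoundError:
        return False     # no lock file: nothing stops a new server
    if not first_line.isdigit():
        return False     # an unreadable lock file is treated as stale in this sketch
    return pid_is_visible(int(first_line))
```

Run on the host, `lock_file_blocks_start` returns True while the first server is alive. Run inside a container that mounts the same directory, the recorded PID is looked up in a different namespace and will usually not be found, so the guard lets the second server through.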
Do databases need isolation? Absolutely, but not this kind. Databases often demand dedicated physical machines for performance reasons, with only the database process and essential tools running. Even in containers, they’re typically bound exclusively to physical/virtual machines. Thus, the type of isolation Docker provides is somewhat irrelevant for such deployments, though it is a handy feature for cloud providers to efficiently oversell in a multi-tenant environment.
Maintainability
Docker simplifies day-one setup, but brings far more trouble to day-two operations.
The bulk of software expenses isn’t in initial development but in ongoing maintenance, which includes fixing vulnerabilities, ensuring operational continuity, handling outages, upgrading versions, repaying technical debt, and adding new features. Maintainability is crucial for the quality of life in operations work. Docker shines in this aspect with its infrastructure-as-code approach, effectively turning operational knowledge into reusable code, accumulating it in a streamlined manner rather than scattered across various installation/setup documents. This holds especially for stateless applications with frequently changing logic. Docker and Kubernetes facilitate deployment, scaling, publishing, and rolling upgrades, allowing Devs to perform Ops tasks, and Ops to handle DBA duties (somewhat convincingly).
Day 1 Setup
Perhaps Docker’s greatest strength is the standardization of environment configuration. A standardized environment aids in delivering changes, discussing issues, and reproducing bugs. Using binary images (essentially materialized Dockerfile installation scripts) is quicker and easier to manage than running installation scripts. Not having to rebuild complex, dependency-heavy extensions each time is a notable advantage.
Unfortunately, databases don’t behave like typical business applications with frequent updates, and creating new instances or delivering environments is a rare operation. Additionally, DBAs often accumulate various installation and configuration scripts, making environment setup almost as fast as using Docker. Thus, Docker’s advantage in environment configuration isn’t as pronounced, falling into the “nice to have” category. Of course, in the absence of a dedicated DBA, using Docker images might still be preferable as they encapsulate some operational experience.
Typically, it’s not unusual for databases to run continuously for months or years after initialization. The primary aspect of database management isn’t creating new instances or delivering environments, but the day-to-day operations — Day2 Operation. Unfortunately, Docker doesn’t offer much benefit in this area and can introduce additional complications.
Day2 Operation
Docker can significantly streamline the maintenance of stateless apps, enabling easy create/destroy, version upgrades, and scaling. However, does this extend to databases?
Unlike app containers, database containers can’t be freely destroyed or created. Docker doesn’t enhance the operational experience for databases; tools like Ansible are more beneficial. Often, operations require executing scripts inside containers via docker exec, adding unnecessary complexity.
CLI tools often struggle with Docker integration. For instance, docker exec mixes stderr and stdout, breaking pipeline-dependent commands. In bare-metal deployments, certain ETL tasks for PostgreSQL can be easily done with a single Bash line.
psql <src-url> -c 'COPY tbl TO STDOUT' | psql <dst-url> -c 'COPY tbl FROM STDIN'
Yet, without proper client binaries on the host, one must awkwardly use Docker’s binaries like:
# -t is omitted on purpose: a pseudo-TTY would rewrite line endings in the piped COPY data
docker exec -i srcpg gosu postgres bash -c "psql -c \"COPY tbl TO STDOUT\" 2>/dev/null" |\
docker exec -i dstpg gosu postgres psql -c 'COPY tbl FROM STDIN;'
complicating simple commands like physical backups, which require layers of command wrapping:
docker exec -i postgres_pg_1 gosu postgres bash -c 'pg_basebackup -Xf -Ft -c fast -D - 2>/dev/null' | tar -xC /tmp/backup/basebackup
That is four layers of wrapping: docker, gosu, bash, pg_basebackup.
Client-side applications (psql, pg_basebackup, pg_dump) can bypass these issues with version-matched client tools on the host, but server-side solutions lack such workarounds. Upgrading containerized database software shouldn’t necessitate host server binary upgrades.
Docker advocates for easy software versioning; updating a minor database version is straightforward by tweaking the Dockerfile and restarting the container. However, major version upgrades requiring state modification are more complex in Docker, often leading to convoluted processes like those in https://github.com/tianon/docker-postgres-upgrade.
If database containers can’t be scheduled, scaled, or maintained as easily as AppServers, why use them in production? While stateless apps benefit from Docker and Kubernetes’ scaling ease, stateful applications like databases don’t enjoy such flexibility. Replicating a large production database is time-consuming and manual, which calls into question the value of using docker run for such operations.
Docker’s awkwardness in hosting production databases stems from the stateful nature of databases, which requires additional setup steps. Setting up a new PostgreSQL replica, for instance, involves cloning a data directory locally and starting the postmaster process. A container lifecycle tied to a single process complicates database scaling and replication, leading to inelegant and complex solutions. The container’s process isolation is a leaky abstraction here: it fails to neatly cover the multiprocess, multitasking nature of databases, introducing unnecessary complexity and hurting maintainability.
In conclusion, while Docker can improve system maintainability in some aspects, like simplifying new instance creation, the introduced complexities often undermine these benefits.
Tooling
Databases require tools for maintenance, including a variety of operational scripts, deployment, backup, archiving, failover, version upgrades, plugin installation, connection pooling, performance analysis, monitoring, tuning, inspection, and repair. Most of these tools are designed for bare-metal deployments. Like databases, these tools need thorough and careful testing. Getting something to run versus ensuring its stable, long-term, and correct operation are distinct levels of reliability.
A simple example is plugin and package management. PostgreSQL offers many useful plugins, such as PostGIS. On bare metal, installing this plugin is as easy as executing yum install followed by create extension postgis. However, in Docker, following best practices requires making changes at the image level to persist the extension beyond container restarts. This necessitates modifying the Dockerfile, rebuilding the image, pushing it to the server, and restarting the database container, undeniably a more cumbersome process.
Package management is a core aspect of OS distributions. Docker complicates this, as many PostgreSQL binaries are distributed not as RPM/DEB packages but as Docker images with pre-installed extensions. This raises a significant issue: how to consolidate multiple disparate images if one needs to use two, three, or over a hundred extensions from the PostgreSQL ecosystem? Compared to reliable OS package management, building Docker images invariably requires more time and effort to function properly.
Take monitoring as another example. In traditional bare-metal deployment, machine metrics are crucial for database monitoring. Monitoring in containers differs subtly from that on bare metal, and oversight can lead to pitfalls. For instance, the sum of various CPU mode durations always equals 100% on bare metal, but this assumption doesn’t necessarily hold in containers. Moreover, monitoring tools relying on the /proc filesystem may yield metrics in containers that differ significantly from those on bare metal. While such issues are solvable (e.g., mounting the /proc filesystem inside the container), complex and ugly workarounds are generally unwelcome compared to straightforward solutions.
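To make the bare-metal assumption concrete, here is a rough sketch of how a monitoring agent might turn two samples of the aggregate cpu line in /proc/stat into per-mode percentages. The function names are hypothetical and the sample lines in the usage note are made up. Only the first eight counters are used, since guest time is already included in user and nice. On a host these shares add up to 100% by construction; an agent that reads container-scoped counters instead cannot rely on the same identity.

```python
CPU_MODES = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu' line of /proc/stat into a mode -> ticks mapping."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Guest columns (fields 9 and 10) are skipped: they are already part of user/nice.
    return dict(zip(CPU_MODES, (int(v) for v in fields[1:9])))


def mode_shares(before: dict, after: dict) -> dict:
    """Percentage of elapsed CPU time spent in each mode between two samples."""
    deltas = {mode: after[mode] - before[mode] for mode in CPU_MODES}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {mode: 100.0 * delta / total for mode, delta in deltas.items()}
```

With two made-up samples such as "cpu  100 0 50 800 50 0 0 0 0 0" and "cpu  160 0 70 1500 70 0 0 0 0 0", the resulting shares sum to 100, which is exactly the invariant that dashboards built for bare metal tend to take for granted.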
Similar issues arise with some failure detection tools and common system commands. Theoretically, these could be executed directly on the host, but can we guarantee that the results in the container will be identical to those on bare metal? More frustrating is the emergency troubleshooting process, where necessary tools might be missing in the container, and with no external network access, the Dockerfile→Image→Restart path can be exasperating.
Treating Docker like a VM, many tools may still function, but this defeats much of Docker’s purpose, reducing it to just another package manager. Some argue that Docker enhances system reliability through standardized deployment, given the more controlled environment. While this is true, I believe that if the personnel managing the database understand how to configure the database environment, there’s no fundamental difference between scripting environment initialization in a Shell script or in a Dockerfile.
Scalability
Performance is another point that people care about a lot. From the performance perspective, the basic principle of database deployment is: the closer to the hardware, the better. Additional isolation and abstraction layers are bad for database performance. More isolation means more overhead, even if it is just an additional memcpy in the kernel.
For performance-seeking scenarios, some databases choose to bypass the operating system’s page management mechanism and operate the disk directly, while some may even use FPGAs or GPUs to speed up query processing. Docker is a lightweight container, so the performance penalty is small and may not matter in performance-insensitive scenarios, but an extra abstraction layer can only make performance worse, never better.
Summary
Container and orchestration technologies are valuable for operations, bridging the gap between software and services by aiming to codify and modularize operational expertise and capabilities. Container technology is poised to become the future of package management, while orchestration evolves into a “data center distributed cluster operating system,” forming the underlying infrastructure runtime for all software. As more challenges are addressed, confidently running both stateful and stateless applications in containers will become feasible. However, for databases, this remains an ideal rather than a practical option, especially in production.
It’s crucial to reiterate that the above discussion applies specifically to production databases. For development and testing, despite the existence of Vagrant-based virtual machine sandboxes, I advocate for Docker use—many developers are unfamiliar with configuring local test database environments, and Docker provides a clearer, simpler solution. For stateless production applications or those with non-critical derivative state data (like Redis caches), Docker is a good choice. But for core relational databases in production, where data integrity is paramount, one should carefully consider the risks and benefits: What’s the value of using Docker here? Can it handle potential issues? Are you prepared to assume the responsibility if things go wrong?
Every technological decision involves balancing pros and cons, like the core trade-off here of sacrificing reliability for maintainability with Docker. Some scenarios may warrant this, such as cloud providers optimizing for containerization to oversell resources, where container isolation, high resource utilization, and management convenience align well. Here, the benefits might outweigh the drawbacks. However, in many cases, reliability is the top priority, and compromising it for maintainability is not advisable. Moreover, it’s debatable whether using Docker significantly eases database management; sacrificing long-term operational maintainability for short-term deployment ease is unwise.
In conclusion, containerizing production databases is likely not a prudent choice.
Cloud Exit
S3: Elite to Mediocre
Object storage (S3) has been a defining service of cloud computing, once hailed as a paragon of cost reduction in the cloud era. Unfortunately, with the evolution of hardware and the emergence of resource clouds (Cloudflare R2) and open-source alternatives (MinIO), the once “cost-effective” object storage services have lost their value for money, becoming as much a “cash cow” as EBS. In our “Mudslide of Cloud Computing” series, we’ve already delved into the cost structure of cloud-based EC2 compute power, EBS disks, and RDS databases. Today, let’s examine the anchor of cloud services: object storage.
From Cost Reduction to Cash Cow
Object storage, known on AWS as Simple Storage Service (hereafter S3), was once the cloud’s flagship product for cost-effectiveness.
A decade ago, hardware was expensive; managing to use a bunch of several hundred GB mechanical hard drives to build a reliable storage service and design an elegant HTTP API was a significant barrier. Therefore, compared to those “enterprise IT” storage solutions, the cost-effective S3 seemed very attractive.
However, the field of computer hardware is quite unique—with a Moore’s Law that sees prices halve every two years. AWS S3 has indeed seen several price reductions in its history. The table below organizes the main post-reduction prices for S3 standard tier storage, along with the reference unit prices for enterprise-grade HDD/SSD in the corresponding years.
Date | $/GB·Month | ¥/TB·5 years | HDD ¥/TB | SSD ¥/TB |
---|---|---|---|---|
2006.03 | 0.150 | 63000 | 2800 | |
2010.11 | 0.140 | 58800 | 1680 | |
2012.12 | 0.095 | 39900 | 420 | 15400 |
2014.04 | 0.030 | 12600 | 371 | 9051 |
2016.12 | 0.023 | 9660 | 245 | 3766 |
2023.12 | 0.023 | 9660 | 105 | 280 |
Price Ref | $/GB·Month | ¥/TB·5 years | Purchased NVMe SSD | ¥/TB |
---|---|---|---|---|
S3 Express | 0.160 | 67200 | DHH 12T | 1400 |
EBS io2 | 0.125 + IOPS | 114000 | Shannon 3.2T | 900 |
It’s not hard to see that the unit price of S3’s standard tier dropped from $0.15/GB·month in 2006 to $0.023/GB·month in 2023, a reduction to 15% of the original or a 6-fold decrease, which sounds good. However, when you consider that the price of the underlying HDDs for S3 dropped to 3.7% of their original, a whopping 26-fold decrease, the trickery becomes apparent.
The resource premium multiple of S3 increased from 7 times in 2006 to 30 times today!
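The arithmetic behind these multiples can be reproduced from the table. The sketch below uses the table's figures directly. One caveat: dividing S3's five-year cost by the price of a single HDD gives roughly 22x and 92x, so to land near the 7x and 30x quoted here I assume three stored copies of the data. That replica count is my assumption for illustration, not something the table states.

```python
# Figures copied from the price table above (RMB per TB).
S3_FIVE_YEAR_COST = {"2006": 63000, "2023": 9660}  # S3 standard tier, 5 years of storage
HDD_PRICE = {"2006": 2800, "2023": 105}            # enterprise HDD purchase price

REPLICAS = 3  # assumption: three copies of every object


def fold_decrease(old: float, new: float) -> float:
    """How many times cheaper the new price is compared to the old one."""
    return old / new


def premium(year: str, replicas: int = REPLICAS) -> float:
    """S3 five-year cost relative to buying `replicas` times the raw HDD capacity."""
    return S3_FIVE_YEAR_COST[year] / (HDD_PRICE[year] * replicas)


if __name__ == "__main__":
    print("S3 price decrease: %.1fx" % fold_decrease(0.150, 0.023))
    print("HDD price decrease: %.1fx" % fold_decrease(2800, 105))
    for year in ("2006", "2023"):
        print(year, "premium: %.1fx raw, %.1fx with replicas"
              % (premium(year, 1), premium(year)))
```

Whatever replica count one picks, the ratio between the two years stays the same: the premium roughly quadrupled between 2006 and 2023.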
In 2023, when we re-calculate the costs, it’s clear that the value for money of storage services like S3/EBS has changed dramatically—cloud computing power EC2 compared to building one’s own servers has a 5 – 10 times premium, while cloud block storage EBS has a several dozen to a hundred times premium compared to local SSDs. Cloud-based S3 compared to ordinary HDDs also has about a thirty times resource premium. And as the anchor of cloud services, the prices of S3/EBS/EC2 are passed on to almost all cloud services—completely stripping cloud services of their cost-effectiveness.
The core issue here is: The price of hardware resources drops exponentially according to Moore’s Law, but the savings are not passed through the cloud providers’ intermediary layer to the end-user service prices. To not advance is to go back; failing to reduce prices at the pace of Moore’s Law is effectively a price increase. Taking S3 as an example, over the past decade, cloud providers’ S3 has nominally reduced prices by 6-fold, but hardware resources have become 26 times cheaper, so how should we view this pricing now?
Cost, Performance, Throughput
Despite the high premiums of cloud services, if it represents an irreplaceable best choice, the use by high-value, price-insensitive top-tier customers is not affected even with a high premium and low cost-effectiveness. However, it’s not just about cost; the performance of storage hardware also follows Moore’s Law. Over time, building one’s own S3 has started to show a significant advantage in performance.
The performance of S3 is mainly reflected in its throughput. AWS S3’s 100 Gb/s network provides up to 12.5 GB/s of access bandwidth, which is indeed commendable. Such throughput was undoubtedly impressive a decade ago. However, today, an enterprise-level 12 TB NVMe SSD, costing less than ¥20,000, can achieve 14 GB/s of read/write bandwidth. 100Gb switches and network cards have also become very common, making such performance readily achievable.
In another key performance indicator, latency, S3 is significantly outperformed by local disks. The first-byte latency of the S3 standard tier is quite poor, ranging between 100-200ms according to the documentation. Of course, AWS launched a “High-Performance S3”, S3 Express One Zone, at re:Invent 2023, which can achieve millisecond-level latency, addressing this shortcoming. However, it still falls far short of the NVMe’s 4K random read/write latency of 55µs/9µs.
S3 Express’s millisecond-level latency sounds good, but when we compare it to a self-built NVMe SSD + MinIO setup, this “millisecond-level” performance is embarrassingly inadequate. Modern NVMe SSDs achieve 4K random read/write latencies of 55µs/9µs. With a thin layer of MinIO forwarding, the first-byte output latency is at least an order of magnitude better than S3 Express. If standard tier S3 is used for comparison, the performance gap widens to three orders of magnitude.
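As a quick sanity check on the "orders of magnitude" wording, the ratios can be computed directly. The sketch takes the 55µs read latency and the lower end of the 100-200ms range from the text. The 1 ms figure for S3 Express is my assumption for "millisecond-level", and the raw device latency ignores whatever the MinIO layer and the network add on top.

```python
import math

NVME_READ_US = 55          # 4K random read latency quoted in the text
S3_STANDARD_US = 100_000   # lower end of the 100-200 ms first-byte range in the text
S3_EXPRESS_US = 1_000      # assumption: "millisecond-level" taken as 1 ms


def orders_of_magnitude(slow_us: float, fast_us: float) -> float:
    """Base-10 logarithm of the latency ratio between two systems."""
    return math.log10(slow_us / fast_us)


if __name__ == "__main__":
    print("S3 standard vs NVMe: %.1f orders"
          % orders_of_magnitude(S3_STANDARD_US, NVME_READ_US))
    print("S3 Express vs NVMe: %.1f orders"
          % orders_of_magnitude(S3_EXPRESS_US, NVME_READ_US))
```

Under these assumptions the standard tier sits a little over three orders of magnitude behind the raw device and S3 Express a little over one, which matches the comparison made above.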
The gap in performance is just one aspect; the cost is even more crucial. The price of standard tier S3 has remained unchanged since 2016 at $0.023/GB·month, equating to 161 RMB/TB·month. The higher-tier S3 Express One Zone is an order of magnitude more expensive, at $0.16/GB·month, equating to 1120 RMB/TB·month. For reference, we can compare the data from “Reclaiming the Dividends of Computer Hardware” and “Is Cloud Storage a Cash Cow?”:
Factor | Local PCI-E NVME SSD | Aliyun ESSD PL3 | AWS io2 Block Express |
---|---|---|---|
Cost | 14.5 RMB/TB·month (5-year amortization / 3.2T MLC) 5-year warranty, ¥3000 retail | 3200 RMB/TB·month (Original price 6400 RMB, monthly package 4000 RMB) 50% discount for 3-year upfront payment | 1900 RMB/TB·month Best discount for the largest specification 65536GB 256K IOPS |
Capacity | 32TB | 32 TB | 64 TB |
IOPS | 4K random read: 600K ~ 1.1M 4K random write 200K ~ 350K | Max 4K random read: 1M | 16K random IOPS: 256K |
Latency | 4K random read: 75µs 4K random write: 15µs | 4K random read: 200µs | Random IO: 500µs (assumed 16K) |
Reliability | UBER < 1e-18, equivalent to 18 nines MTBF: 2 million hours 5DWPD, over three years | Data reliability: 9 nines Storage and Data Reliability | Durability: 99.999%, 5 nines (0.001% annual failure rate) io2 details |
SLA | 5-year warranty, direct replacement for issues | Aliyun RDS SLA Availability 99.99%: 15% monthly fee 99%: 30% monthly fee 95%: 100% monthly fee | Amazon RDS SLA Availability 99.95%: 15% monthly fee 99%: 25% monthly fee 95%: 100% monthly fee |
The local NVMe SSD example used here is the Shannon DirectIO G5i 3.2TB MLC enterprise-level SSD, which we use extensively. Brand-new retail units pulled from systems are priced at ¥2788 (available on Xianyu!), translating to a monthly cost per TB of 14.5 RMB over 60 months (5 years). Even if we calculate using the Inspur list price of ¥4388, the cost per TB·month is only about 22.9 RMB. If this example is not convincing enough, we can refer to the 12 TB Gen4 NVMe enterprise-level SSDs purchased by DHH in “Is It Time to Give Up on Cloud Computing?”, priced at $2390 each, with a cost per TB·month of roughly 23 RMB.
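These per-TB monthly figures are plain amortization, and can be checked in a few lines. The exchange rate of 7 RMB per USD is an assumption, inferred from the article's own conversion of $0.023/GB·month to 161 RMB/TB·month; the helper name is hypothetical.

```python
USD_TO_RMB = 7.0  # assumption: implied by 0.023 USD/GB·month = 161 RMB/TB·month
MONTHS = 60       # five-year amortization


def rmb_per_tb_month(price_rmb: float, capacity_tb: float, months: int = MONTHS) -> float:
    """Purchase price spread evenly over capacity and service life."""
    return price_rmb / capacity_tb / months


if __name__ == "__main__":
    drives = {
        "Shannon 3.2T, retail": (2788, 3.2),
        "Shannon 3.2T, Inspur list": (4388, 3.2),
        "DHH 12T Gen4": (2390 * USD_TO_RMB, 12),
    }
    for name, (price, capacity) in drives.items():
        print("%s: %.1f RMB/TB·month" % (name, rmb_per_tb_month(price, capacity)))
```

Note that this only covers the drive itself. Servers, power, and replicas are left out, which is why the comparison that follows multiplies the hardware cost by three.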
So, why are NVMe SSDs, which outperform by several orders of magnitude, priced roughly 7 times cheaper than standard tier S3 (161 vs 23) and still about 16 times cheaper than S3 Express even after tripling the hardware cost (1120 vs 23 × 3)? If I were to use such hardware (even accounting for triple replication) + open-source software to build an object storage service, could I achieve a three orders of magnitude improvement in cost-effectiveness? (This doesn’t even account for the reliability advantages of SSDs over HDDs.)
It’s worth noting that the comparison above focuses solely on the cost of storage space. The cost of data transfer in and out of object storage is also a significant expense, with some tiers charging not for storage but for retrieval traffic. Additionally, there are issues of SSD reliability compared to HDD, data sovereignty in the cloud, etc., which will not be elaborated further here.
Of course, cloud providers might argue that their S3 service is not just about storage hardware resources but an out-of-the-box service. This includes software intellectual property and maintenance labor costs. They may claim that self-hosting has a higher failure rate, is riskier, and incurs significant operational labor costs. Unfortunately, these arguments might have been valid in 2006 or 2013, but they seem rather ludicrous today.
Self-Hosted OSS S3
A decade and a half ago, the vast majority of users lacked the IT capabilities to self-host, and there were no mature open-source alternatives to S3. Users could tolerate the premium for this high technology. However, as various cloud providers and IDCs began offering object storage, and even open-source free object storage solutions like MinIO emerged, the market shifted from a seller’s to a buyer’s market. The logic of value pricing turned into cost pricing, and the unyielding premium on resources naturally faced scrutiny — what extra value does it actually provide to justify such significant costs?
Proponents of cloud storage claim that moving to the cloud is cheaper, simpler, and faster than self-hosting. For individual webmasters and small to medium-sized internet companies within the cloud’s suitable spectrum, this claim certainly holds. If your data scale is only a few dozen GBs, or you have some medium-scale overseas business and CDN needs, I would not recommend jumping on the bandwagon to self-host object storage. You should instead turn to Cloudflare and use R2 — perhaps the best solution.
However, for the truly high-value, medium-to-large scale customers who contribute the majority of revenue, these value propositions do not necessarily hold. If you are primarily using local storage for TB/PB scale data, then you should seriously consider the cost and benefits of self-hosting object storage services — which has become very simple, stable, and mature with open-source software. Storage service reliability mainly depends on disk redundancy: apart from occasional hard drive failures (HDD AFR 1%, SSD 0.2-0.3%), requiring you (or a maintenance service provider) to replace parts, there isn’t much additional burden.
If the open-source Ceph, which mixes EBS/S3 capabilities, is considered somewhat operationally complex and not fully feature-complete; then the fully S3-compatible object storage service MinIO can be considered truly plug-and-play — a standalone binary without external dependencies, requiring only a few configuration parameters to quickly set up, transforming server disk arrays into a standard local S3-compatible service, even integrating AWS’s AK/SK/IAM compatible implementations!
From an operational management perspective, the operational complexity of Redis is an order of magnitude lower than PostgreSQL, and MinIO’s operational complexity is another order of magnitude lower than Redis. It’s so simple that I could spend less than a week to integrate MinIO deployment/monitoring as an add-on into our open-source PostgreSQL RDS solution, serving as an optional central backup storage repository.
At Tantan, several MinIO clusters were built and maintained this way: holding 25PB of data, possibly the largest scale of MinIO deployment in China at the time. How many people were needed for maintenance? Just a fraction of one operations engineer’s working time was enough, and the overall self-hosting cost was about half of the cloud list price. Practice proves the point: if anyone tells you that self-hosting object storage is difficult and expensive, try it yourself. Within a few hours, these sales FUD tactics will fall apart.
Therefore, for object storage services, among the cloud’s three core value propositions: “cheaper, simpler, faster”, the “simpler” part may not hold, and “cheaper” has gone in the opposite direction, probably only leaving “faster” — indeed, no one can beat the cloud on this point. You can indeed apply for PB-level storage services across all regions of the world in less than a minute on the cloud, which is amazing! However, you also have to pay a high premium for this privilege, several to dozens of times over. For enterprises of a certain scale, compared to the cost of operations increasing several times, waiting a couple of weeks or making a one-time capital investment is not a big deal.
Summary
The exponential decline in hardware costs has not been fully reflected in the service prices of cloud providers, turning public clouds from universally beneficial infrastructure into monopolistic profit centers.
However, the tide is turning. Hardware is becoming interesting again, and cloud providers can no longer indefinitely hide this advantage. The savvy are starting to crunch the numbers, and the bold have already taken action. Pioneers like Elon Musk and DHH have fully realized this, moving away from the cloud to reap millions in financial benefits, enjoy performance gains, and gain more operational independence. More and more people are beginning to notice this, following in the footsteps of these pioneers to make the wise choice and reclaim their hardware dividends.
References
[1] 2006: https://aws.amazon.com/cn/blogs/aws/amazon_s3/
[2] 2010: http://aws.typepad.com/aws/2010/11/what-can-i-say-another-amazon-s3-price-reduction.html
[3] 2012: http://aws.typepad.com/aws/2012/11/amazon-s3-price-reduction-december-1-2012.html
[5] 2016: https://aws.amazon.com/ru/blogs/aws/aws-storage-update-s3-glacier-price-reductions/
[6] 2023: https://aws.amazon.com/cn/s3/pricing
[7] First-byte Latency: https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html
[8] Storage & Reliability: https://help.aliyun.com/document_detail/476273.html
[9] EBS io2 Spec: https://aws.amazon.com/cn/blogs/storage/achieve-higher-database-performance-using-amazon-ebs-io2-block-express-volumes/
[10] Aliyun RDS SLA: https://terms.aliyun.com/legal-agreement/terms/suit_bu1_ali_cloud/suit_bu1_ali_cloud201910310944_35008.html?spm=a2c4g.11186623.0.0.270e6e37n8Exh5
[11] Amazon RDS SLA: https://d1.awsstatic.com/legal/amazonrdsservice/Amazon-RDS-Service-Level-Agreement-Chinese.pdf
Reclaim Hardware Bonus from the Cloud
Hardware is interesting again, with the AI wave fueling a GPU frenzy. However, the intrigue isn’t limited to GPUs —— developments in CPUs and SSDs remain largely unnoticed by the majority of devs. A whole generation of developers is obscured by cloud hype and marketing noise.
Hardware performance is skyrocketing, and costs are plummeting, turning the public cloud from a decent service into a cash cow. These shifts necessitate a reevaluation of technology and software. It’s time to get back to basics and reclaim the hardware dividend that belongs to users.
Revolutionary New Hardware
If you’ve been unaware of computer hardware for a while, the specs of the latest gear might shock you.
Once, Intel’s CPUs saw marginal gains each generation, allowing old PCs to remain viable year after year. However, CPU evolution has recently accelerated, with significant leaps in core counts and regular 20-30% improvements in single-core performance.
For instance, AMD’s recently released desktop CPU, the Threadripper 7995WX, is a performance beast with 96 cores and 192 threads at speeds ranging from 2.5 to 5.1 GHz, retailing on Amazon for $5600. The server CPU series, EPYC, includes the previous generation EPYC Genoa 9654, with 96 cores and 192 threads at speeds ranging from 2.4 to 3.55 GHz, priced at $3940 on Amazon. This year’s new EPYC 9754 goes even further, offering a single CPU with 128 cores and 256 threads. This means a standard dual-socket server could have an astonishing 512 threads! If we consider cloud computing/container platforms’ 500% overselling rate, this could virtualize more than two thousand five hundred 1-core virtual machines.
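The thread and VM counts above are simple multiplication. A minimal sketch, assuming the 500% overselling rate means each hardware thread is sold five times (the function name is illustrative, not from any platform's API):

```python
# Figures taken from the paragraph above.
THREADS_PER_CPU = 256   # EPYC 9754: 128 cores / 256 threads
SOCKETS = 2             # standard dual-socket server
OVERSELL_FACTOR = 5     # assumption: "500% overselling" = each thread sold 5 times


def sellable_vcpus(threads_per_cpu: int, sockets: int, oversell: int) -> int:
    """Number of 1-core VMs a host can advertise at a given oversell factor."""
    return threads_per_cpu * sockets * oversell


hardware_threads = THREADS_PER_CPU * SOCKETS
one_core_vms = sellable_vcpus(THREADS_PER_CPU, SOCKETS, OVERSELL_FACTOR)

print(f"hardware threads per server: {hardware_threads}")
print(f"1-core VMs at 5x oversell:   {one_core_vms}")
```

This gives 512 hardware threads and 2,560 one-core VMs, in line with the "more than two thousand five hundred" figure.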
SSD/NVMe storage has seen even more dramatic generational jumps. Speeds have escalated from Gen2’s 500MB/s to Gen3’s 2.5GB/s, and now Gen4’s mainstream 7GB/s, with Gen5 at 14GB/s emerging. The Gen6 specification has been released, with Gen7 on the horizon, as I/O bandwidth roughly doubles with each generation.
Consider the Gen5 NVMe SSD KIOXIA CM7, which offers 128K sequential read bandwidth of 14GB/s and write bandwidth of 7GB/s, with 4K random IOPS of 2.7M for reads and 600K for writes. It’s doubtful that many database software packages can fully utilize this insane read/write bandwidth and IOPS. For context, HDDs generally hover around a read/write bandwidth of a few hundred MB/s, with 7200 RPM drives achieving IOPS in the tens and 15000 RPM drives in the low hundreds. In random IOPS, NVMe SSDs are already four orders of magnitude ahead of HDDs: 10,000x better.
In terms of 4K random read/write (RandRW) response times, which are of utmost concern for databases, NVMe SSDs reached 55/9 µs for reads and writes several generations ago. Meanwhile, HDD seek time usually measures around 10ms, with an average rotational latency between 2ms and 4ms depending on spindle speed, meaning a single I/O operation typically takes over a dozen milliseconds. Comparing a dozen-plus milliseconds with 55/9 µs, NVMe SSDs are two to three orders of magnitude faster than mechanical disks: up to roughly 1000x faster!
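The HDD side of this comparison can be reproduced from first principles. The sketch below assumes the usual model in which average rotational latency is the time for half a revolution, and takes the 10 ms seek time and the 55/9 µs NVMe figures from the paragraph above:

```python
SEEK_MS = 10.0         # typical HDD seek time quoted above
NVME_READ_US = 55.0    # NVMe 4K random read latency quoted above
NVME_WRITE_US = 9.0    # NVMe 4K random write latency quoted above


def avg_rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency: time for half a revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def hdd_io_latency_ms(rpm: int, seek_ms: float = SEEK_MS) -> float:
    """Approximate single random I/O latency: seek plus rotational delay."""
    return seek_ms + avg_rotational_latency_ms(rpm)


for rpm in (7200, 15000):
    hdd_us = hdd_io_latency_ms(rpm) * 1000
    print(
        f"{rpm} RPM: {hdd_us / 1000:.1f} ms per I/O, "
        f"{hdd_us / NVME_READ_US:.0f}x slower than NVMe reads, "
        f"{hdd_us / NVME_WRITE_US:.0f}x slower than NVMe writes"
    )
```

Under these assumptions the gap comes out at a few hundred times for reads and well over a thousand times for writes.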
Besides computing and storage, network hardware has also improved significantly. 40GbE and 100GbE are now commonplace — a 100GbE optical module network card costs just about several hundred dollars, offering a network transfer speed of 12 GB/s, a hundred times faster than the gigabit network cards familiar to older programmers.
1.6T Ethernet is already on the radar.
As computing, storage, and networking hardware evolve exponentially following Moore’s Law, hardware becomes fascinating again. But the real intrigue lies in how these technological leaps will impact the world.
Distributed Losing Favor
The landscape of hardware has undergone monumental changes over the past decade, rendering many assumptions in the software realm obsolete, such as those concerning distributed databases.
Today, the capabilities of a standard x86 server have reached astonishing levels. An intriguing back-of-the-envelope calculation roughly demonstrates the feasibility of running the entirety of Twitter on a modern server (Dell PowerEdge R740xd, with 32 cores, 768GB RAM, 6TB NVMe, 360TB HDD, GPU slots, and 4x40GbE networking). You wouldn’t do this in production, where two or three servers would be safer for redundancy, but the calculation does raise an interesting question: is scalability still a real issue?
At the turn of the century, an Apache server could barely handle a few hundred concurrent requests. The best software struggled with tens of thousands of concurrent connections — the industry’s notorious C10K problem, where handling several thousand connections was seen as a feat. However, with the advent of Epoll and Nginx in 2003/2004, “high concurrency” ceased to be a challenge — any novice who learned to configure Nginx could achieve what masters only dreamed of a few years earlier. “Customers in the Eyes of Cloud Providers: Poor, Idle, and Lacking Love” details this evolution.
As of 2023, the impact of hardware has once again revolutionized distributed databases: Scalability, much like the C10K problem two decades ago, has become a solved issue of the past. If a service like Twitter can run on a single server, then 99.xxxx+% of services will not exceed the scalability needs that such a server can provide throughout their entire lifecycle. This means the once-prized “distributed” technology boasted by big tech companies has become redundant with the advent of new hardware — Anyone still discussing partitioning, distributed databases, and high concurrency on a massive scale is living in the past, having ceased to learn and grow over the past decade.
The foundational assumption of distributed databases — that a single machine’s processing power is insufficient to support the load — has been shattered by contemporary hardware. Centralized databases don’t even need to lift a finger; their capacity automatically scales to meet demands that most services will never reach in their lifetime. Some might argue that services like WeChat or Alipay require distributed databases, but setting aside whether distributed databases are the only solution, assuming these rare extreme cases can sustain a couple of distributed TP kernels, distributed OLTP databases will no longer be the main direction for database development as network hardware becomes more cost-effective than disk storage. Alibaba’s choice of a distributed path for its database progeny, OceanBase, versus its current preference for centralized architectures with PolarDB, serves as a telling example.
In the realm of big data analytics (OLAP), distributed systems might have been essential, but now even this is questionable — for the majority of companies, their entire database volume could potentially be processed on a single server. Scenarios that previously demanded “distributed data warehouses” might now be addressed by running PostgreSQL or DuckDB on a modern server. True, large internet companies may have PB/ZB-level data scenarios, but even for core internet services, it’s rare for a single service’s data volume to exceed a single machine’s processing limits. For instance, BreachForums’ recent leak of 5 years of Taobao shopping records (2015-2020, 8.2 billion records) compressed to 600GB, and similarly, the data sizes for JD.com’s billions and Pinduoduo’s 14.5 billion records are on par. Moreover, companies like Dell or Inspur offer PB-level NVMe all-flash storage cabinets, capable of housing the entire U.S. insurance industry’s historical data and analysis tasks in a single box for less than $200,000.
The core trade-off of distributed databases is “quality for quantity,” sacrificing functionality, performance, complexity, and reliability in exchange for greater data capacity and throughput. However, “premature optimization is the root of all evil,” and designing for unnecessary scale is futile. If scale is no longer an issue, then sacrificing other attributes for unneeded capacity, incurring extra complexity and costs, becomes utterly pointless.
Cost of Owning Servers
With new hardware boasting such powerful performance, what about the cost? Moore’s Law states that every 18 to 24 months, processor performance doubles while the cost halves. Compared to a decade ago, new hardware is not only more powerful but also cheaper.
In “DHH: The Cloud-Exit Odyssey”, we have a fresh, publicly documented purchase to draw on. DHH and 37signals bought a batch of physical machines for their move away from the cloud in 2023: 20 servers from Dell, totaling roughly 4,000 vCPUs, 7,680GB of memory, and 384TB of NVMe storage, among other things, for a total expenditure of $500,000.
The specific configuration of each server was as follows: Dell R7625 server, 192 vCPU / 384 GB memory: two AMD EPYC 9454 processors (48 cores/96 threads, 2.75 GHz), 2 GB of memory per vCPU (16 x 32GB), a 12 TB NVMe Gen4 SSD, plus other components, at a cost of about $20,000 per server ($19,980). Amortized over five years, that is $333 per month.
To verify the validity of this quote, we can directly refer to the retail market prices of the core components: the CPU is the EPYC 9654, with a current retail price of $3,725 each, totaling $7,450 for two. 32GB DDR5 ECC server memory, retailing at $128 per stick, 16 sticks total $2,048. Enterprise-grade NVMe SSD 12TB, priced at $2,390. 100G optical module 100GbE QSFP28 priced at $1,804, adding up to around $13,692, plus the server barebone, power supply, system disk, RAID card, fans, etc., the total price of $20,000 is reasonable.
Of course, a server is not just made up of CPUs, memory, hard drives, and network cards; we also need to consider the total cost of ownership. Data centers must supply these machines with electricity, rack space, and networking, and we have to budget for maintenance fees and spare capacity (prices in the US). These costs are roughly equal to the monthly hardware amortization, so the comprehensive monthly cost of a server with 192C / 384G / 12T NVMe storage is $666, or about $3.5 / vCPU·month.
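The quote check and the per-vCPU figure can be retraced in a few lines. This is a sketch using only the prices quoted above; doubling the hardware amortization to get TCO is the text's own rule of thumb, not a measured figure:

```python
# Retail component prices quoted above (USD).
components = {
    "2x CPU": 2 * 3725,
    "16x 32GB DDR5 ECC": 16 * 128,
    "12TB enterprise NVMe SSD": 2390,
    "100GbE QSFP28 optical module": 1804,
}
component_total = sum(components.values())

SERVER_PRICE = 19_980          # quoted price per server
AMORTIZATION_MONTHS = 5 * 12   # five-year amortization
VCPUS = 192

hardware_monthly = SERVER_PRICE / AMORTIZATION_MONTHS
# Assumption from the text: power, rack, network, and maintenance
# cost about as much again as the hardware amortization.
tco_monthly = 2 * hardware_monthly
per_vcpu_month = tco_monthly / VCPUS

print(f"core components:    ${component_total:,}")
print(f"hardware per month: ${hardware_monthly:.0f}")
print(f"TCO per month:      ${tco_monthly:.0f}")
print(f"per vCPU-month:     ${per_vcpu_month:.2f}")
```

The components sum to $13,692, leaving roughly $6,300 for the chassis, power supply, system disk, RAID card, and fans within the $19,980 quote.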
I believe DHH’s figures are accurate, because at Tantan we chose from day one to build our own IDC / resource cloud, and after several rounds of cost optimization we arrived at a similar price: for our database server model (Dell R730, 64 vCPU / 512GB / 3.2 TB NVMe SSD), including manpower, maintenance, electricity, and network, the five-year TCO was about $10,400, a core-month cost of $2.71 / vCPU·month. Here is a table for reference on the price per unit of computing power:
BM / EC2 / ECS Specs | $ / vCPU·Month |
---|---|
DHH’s self-hosted vCPU·Month Price (192C 384G) | 3.5 |
TanTan IDC self-hosted DC (64C 384G) | 2.7 |
TanTan container platform (container, oversold 500%) | 1.0 |
Aliyun ECS family c 2x (us-east-1), hourly | 23.8 |
Aliyun ECS family c 2x (us-east-1), monthly | 18.2 |
Aliyun ECS family c 2x (us-east-1), yearly | 15.6 |
Aliyun ECS family c 2x (us-east-1), 3-year upfront | 10.0 |
Aliyun ECS family c 2x (us-east-1), 5-year upfront | 6.9 |
AWS C5N.METAL 96C (On Demand) | 35.0 |
AWS C5N.METAL 96C (1y Reserve, All Upfront) | 20.6 |
AWS C5N.METAL 96C (3y Reserve, All Upfront) | 12.8 |
Cloud Rental Price
For reference, we can compare the cost to leasing compute power from AWS EC2. A monthly expense of $666 gets you, at best, a c6in.4xlarge on-demand instance without storage (16 cores, 32 GB, 3.5 GHz); while the on-demand cost for a c7a.metal instance, which has similar compute and memory specifications (192C/384G) but excludes EBS storage, is $7,200 per month, 10.8 times the comprehensive local build cost; the lowest monthly cost for a 3-year reserved instance goes down to $2,756, which is still 4.1 times the cost of building your own server. If we calculate the cost per core-month, the price for the majority of AWS EC2 instances ranges between $10 and $30 (roughly one to two hundred CNY), leading us to a rough conclusion: the unit price of cloud compute is 5 to 10 times that of self-built solutions.
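The multiples in this comparison follow directly from the quoted monthly prices. A small sketch, with the dictionary keys chosen here purely for illustration:

```python
SELF_HOSTED_MONTHLY = 666   # comprehensive monthly cost of the 192C/384G server (USD)
VCPUS = 192

# Monthly c7a.metal prices quoted above (USD), EBS storage excluded.
cloud_monthly = {
    "c7a.metal on-demand": 7200,
    "c7a.metal 3-year reserved": 2756,
}

for plan, price in cloud_monthly.items():
    multiple = price / SELF_HOSTED_MONTHLY
    per_vcpu = price / VCPUS
    print(f"{plan}: {multiple:.1f}x self-hosted, ${per_vcpu:.1f} per vCPU-month")
```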
Note that these prices do not include the hundredfold premium for EBS cloud storage. In “Is Cloud Disk a Rip-off?”, we’ve already detailed the cost comparison between enterprise SSDs and equivalent cloud disks. Here, we can provide two updated reference values: the cost per TB-month for the 12TB enterprise NVMe SSD purchased by DHH (with a five-year warranty) is 24 CNY, while the cost per TB-month for a retail Samsung consumer SSD 990Pro on GameStop can reach an astonishing 6.6 CNY… Meanwhile, the corresponding block storage TB-month cost on AWS and Alibaba Cloud, even after full discounts, is respectively 1,900 and 3,200 CNY. In the most outrageous scenarios (6400 vs 6.6), the premium can even reach a thousandfold. However, a more apples-to-apples comparison results in: the unit price of cloud block storage is 100 to 200 times that of self-built solutions (and the performance is not as good as local disks).
EC2 and EBS prices can be considered the anchor of cloud service pricing, for example, the premium rate of cloud databases RDS that mainly use EC2 and EBS compared to local self-built solutions fluctuates between the two, depending on your storage usage: the unit price of cloud databases is dozens of times that of self-built solutions. For more details, refer to “Is Cloud Database a Dumb Tax?”.
Of course, we can’t deny the cost advantages of public clouds for micro instances and startups. For example, the nano instances on public clouds, pieced together from 1-2 core, 0.5-2 GB configurations, really can be offered to users at a core-month cost of a few dollars. In “Exploiting Alibaba Cloud ECS for a Digital Homestead,” I recommended exploiting Alibaba Cloud’s Double 11 virtual machine deals for this reason. For instance, a 2C 2G server’s compute cost, calculated with a 500% overselling, is 84 CNY per year, and the cost for 40G cloud disk storage, calculated with triple replication, is about 20 CNY per year, making the annual cost for these two parts over a hundred CNY. This doesn’t include the cost of a public IP or the more valuable 3M bandwidth (for example, if you could fully utilize 3M bandwidth 24 hours a day, that would mean 32G of data per day, costing about 25 CNY). The list price for such cloud servers is ¥1500 per year, so the ¥99 price, with low-cost renewal for four years, can indeed be considered a loss-leading benefit.
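The bandwidth figure is easy to verify. A sketch assuming "3M" means 3 megabits per second and decimal units throughout:

```python
BANDWIDTH_MBIT_PER_S = 3          # assumption: "3M bandwidth" = 3 Mbit/s
SECONDS_PER_DAY = 24 * 60 * 60


def daily_transfer_gb(mbit_per_s: float) -> float:
    """Data moved in one day at full utilization, in decimal gigabytes."""
    megabits = mbit_per_s * SECONDS_PER_DAY
    megabytes = megabits / 8
    return megabytes / 1000


daily_gb = daily_transfer_gb(BANDWIDTH_MBIT_PER_S)
print(f"{daily_gb:.1f} GB per day")
```

Running flat out, the link moves a little over 32 GB a day, matching the figure in the text.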
However, when your business can no longer be covered by a bunch of micro instances, you really should do the math again carefully: in several key examples, the cost of cloud services is extremely high, whether for large physical machine databases, large NVMe storage, or just the latest and fastest compute. The rental price for such production-grade resources is so high that a few months’ rent could equal the cost of buying them outright. In such cases, you really should just buy the donkey!
Reclaim Hardware Bonus from Cloud
I still remember on April 1, 2019, when the domestic value-added tax in China was officially reduced from 16% to 13%, Apple’s official website immediately implemented a price reduction across the board, with the maximum discount reaching 8% — several iconic iPhone models were reduced by 500 yuan, effectively passing the tax cut benefits to the users. However, many manufacturers chose to turn a deaf ear and maintain their original prices, pocketing the benefits for themselves — why would they want to distribute this newfound wealth to the less fortunate? A similar situation has occurred in the cloud computing domain — the exponential decrease in hardware costs has not been fully reflected in the service prices of cloud providers, gradually turning public cloud from a universally accessible infrastructure into a monopolistic cash cow.
In the old days, developers had to deeply understand hardware to write code. However, the older generation of engineers and programmers, who had a keen sense of hardware, have mostly retired, changed positions, moved into management, or stopped learning. Subsequently, as operating systems and compiler technologies advanced and various VM programming languages emerged, software no longer needed to concern itself with how hardware executed instructions. Then came services like EC2, which encapsulated computing power, and S3/EBS, which encapsulated storage, leading applications to interact with HTTP APIs rather than system calls. Software and hardware diverged into two separate realms, each going its own way. An entire new generation of engineers grew up in the cloud environment, shielded from an understanding of computer hardware.
However, things are beginning to change, with hardware becoming interesting again, and cloud providers are unable to perpetually hide this dividend — the wise are starting to crunch the numbers, and the brave have already taken action. Pioneers like Musk and DHH have fully recognized this, moving off the cloud and onto solid ground — directly generating tens of millions of dollars in financial benefits, with returns in performance, and gaining more independence in operations. More and more people will come to the same realization, following in the footsteps of these trailblazers to make the wise choice of reclaiming their hardware bonus from the cloud.
FinOps: Endgame Cloud-Exit
At the SACC 2023 FinOps session, I fiercely criticized cloud vendors. This is a transcript of my speech, introducing the ultimate FinOps concept — cloud-exit and its best practice.
TL;DR
Misaligned FinOps Focus: Total Cost = Unit Price x Quantity. FinOps efforts are centered around reducing the quantity of wasted resources, deliberately ignoring the elephant in the room — cloud resource unit price.
Public Cloud as a Slaughterhouse: Attract customers with cheap EC2/S3, then slaughter them with EBS/RDS. The cost of cloud compute is five times that of in-house, while block storage costs can be over a hundred times more, making it the ultimate cost assassin.
The Endgame of FinOps is Going Off-Cloud: For enterprises of a certain scale, the cost of in-house IDC is around 10% of the list price of cloud services. Going off-cloud is both the endgame of orthodox FinOps and the starting point of true FinOps.
In-house Capabilities Determine Bargaining Power: Users with in-house capabilities can negotiate extremely low discounts even without going off-cloud, while companies without in-house capabilities can only pay a high “no-expert tax” to public cloud vendors.
Databases are Key to In-House Transition: Migrating stateless applications on K8S and data warehouses is relatively easy. The real challenge is building databases in-house without compromising quality and security.
Misaligned FinOps Focus
Compared to the amount of waste, the unit price of resources is the key point.
The FinOps Foundation states that FinOps focuses on “cloud cost optimization”. However, we believe that emphasizing only public clouds deliberately narrows this concept — the focus should be on the cost control and optimization of all resources, not just those on public clouds — including “hybrid clouds” and “private clouds”. Even without using public clouds, some FinOps methodologies can still be applied to the entire K8S cloud-native ecosystem. Because of this, many involved in FinOps are led astray — their focus is limited to reducing the quantity of cloud resource waste, neglecting a very important issue: unit price.
Total cost depends on two factors: Quantity × Unit Price. Compared to quantity, unit price might be the key to cost reduction and efficiency improvement. Previous speakers mentioned that about 1/3 of cloud resources are wasted on average, which is the optimization space for FinOps. However, if you use non-elastic services on public clouds, the unit price of the resources you use is already several to dozens of times higher, making the wasted portion negligible in comparison.
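To make the quantity-versus-price argument concrete, here is a toy comparison. All numbers are hypothetical and normalized: 100 resource units, one third of them wasted, and a cloud unit price five times the self-hosted one (the compute multiple cited in this article):

```python
def total_cost(quantity: float, unit_price: float) -> float:
    """Total cost = quantity x unit price."""
    return quantity * unit_price


PROVISIONED_UNITS = 100      # hypothetical resource quantity
WASTE_FRACTION = 1 / 3       # share of resources wasted, as cited above
CLOUD_UNIT_PRICE = 5.0       # hypothetical: 5x the self-hosted price
SELF_HOSTED_UNIT_PRICE = 1.0

baseline = total_cost(PROVISIONED_UNITS, CLOUD_UNIT_PRICE)

# Classic FinOps: eliminate every bit of waste, keep paying cloud prices.
finops_only = total_cost(PROVISIONED_UNITS * (1 - WASTE_FRACTION), CLOUD_UNIT_PRICE)

# Attack the unit price instead, leaving all the waste untouched.
self_hosted_wasteful = total_cost(PROVISIONED_UNITS, SELF_HOSTED_UNIT_PRICE)

print(f"cloud, as-is:             {baseline:.0f}")
print(f"cloud, zero waste:        {finops_only:.0f}")
print(f"self-hosted, still waste: {self_hosted_wasteful:.0f}")
```

Even a perfect waste-reduction program leaves the bill at about two thirds of the baseline, while the unit-price change alone brings it to one fifth.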
In the first stop of my career, I experienced a FinOps movement firsthand. Our BU was among the first internal users of Alibaba Cloud and also where the “data middle platform” concept originated. Alibaba Cloud sent over a dozen engineers to help us migrate to the cloud. After migrating to ODPS, our annual storage and computing costs were 70 million, and through FinOps methods like health scoring, we did optimize and save tens of millions. However, running the same services with an in-house Hadoop suite in our data center cost less than 10 million annually — savings are good, but they’re nothing compared to the multiplied resource costs.
As cost reduction and efficiency become the main theme, cloud repatriation is becoming a trend. Alibaba, the inventor of the middle platform concept, has already started dismantling its own middle platform. Yet, many companies are still falling into the trap of the slaughterhouse, repeating the old path of cloud migration - cloud repatriation.
Public Clouds: A Slaughterhouse in Disguise
Attract customers with cheap EC2/S3, then slaughter them with EBS/RDS pricing.
The elasticity touted by public clouds is designed for their business model: low startup costs, exorbitant maintenance costs. Low initial costs lure users onto the cloud, and its good elasticity can adapt to business growth at any time. However, once the business stabilizes, vendor lock-in occurs, making it difficult to switch providers without incurring high costs, turning maintenance into a financial nightmare for users. This model is colloquially known as a pig slaughterhouse.
To slaughter pigs, one must first raise them. You can’t catch a wolf without putting your child at risk. Hence, for new users, startups, and small businesses, public clouds offer sweet deals, even at a loss, to make noise and attract business. New users get first-time discounts, startups receive free or half-price credits, and there’s a sophisticated pricing strategy. Taking AWS RDS pricing as an example, the mini models with 1 or 2 cores are priced at just a few dollars per core per month, translating to a few hundred yuan per year (excluding storage). This is an affordable option for those needing a low-usage database for small data storage.
However, even a slight increase in configuration leads to an order-of-magnitude increase in the price per core-month, which jumps to twenty, thirty, or even a hundred dollars and sometimes more, not to mention the shocking EBS prices. Users may only realize what has happened when the exorbitant bill arrives.
Compared to in-house solutions, the price of cloud resources is generally several to more than ten times higher, with a rent-to-buy ratio ranging from a few days to several months. For example, the cost of a physical server core month in an IDC, including all costs for network, electricity, maintenance, and IT staff, is about 19 yuan. Using a K8S container private cloud, the cost of a virtual core month is only 7 yuan.
In contrast, the price per core month for Alibaba Cloud’s ECS is a couple of hundred yuan, and for AWS EC2, it’s two to three hundred yuan. If you “don’t care about elasticity” and prepay for three years, you can usually get a discount of about 50-60%. But no matter how you calculate it, the price difference between cloud computing power and local in-house computing power is there and significant.
The pricing of cloud storage resources is even more outrageous. A common 3.2 TB enterprise-grade NVMe SSD, with its formidable performance, reliability, and cost-effectiveness, has a wholesale price of just over ¥3000, significantly outperforming older storage solutions. However, for the same storage on the cloud, providers dare to charge 100 times the price. Compared to direct hardware procurement, the cost of AWS EBS io2 is 120 times higher, while Alibaba Cloud’s ESSD PL3 is 200 times higher.
Using a 3.2TB enterprise-grade PCI-E SSD card as a benchmark, the rent-to-buy ratio on AWS is 15 days, while on Alibaba Cloud it’s less than 5 days, meaning renting for this period allows you to purchase the entire disk outright. If you opt for a three-year prepaid purchase on Alibaba Cloud with the maximum discount of 50%, the three-year rental fee could buy over 120 similar disks.
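The rent-to-buy figures can be approximated as follows. This sketch assumes a ¥3,000 disk price, a 30-day month, and the per-TB-month block storage prices quoted earlier in "Cloud Rental Price" (¥1,900 for AWS, ¥6,400 list and ¥3,200 discounted for Alibaba Cloud); the function name is illustrative:

```python
DISK_PRICE_CNY = 3000    # assumption: "just over ¥3000" rounded down
DISK_TB = 3.2
DAYS_PER_MONTH = 30      # simplifying assumption

# Cloud block storage prices in CNY per TB-month, as quoted earlier in the text.
AWS_IO2 = 1900
ALIYUN_PL3_LIST = 6400
ALIYUN_PL3_DISCOUNTED = 3200   # three-year prepaid, 50% off


def rent_to_buy_days(price_per_tb_month: float) -> float:
    """Days of cloud rent that add up to the purchase price of the disk."""
    daily_rent = price_per_tb_month * DISK_TB / DAYS_PER_MONTH
    return DISK_PRICE_CNY / daily_rent


aws_days = rent_to_buy_days(AWS_IO2)
aliyun_days = rent_to_buy_days(ALIYUN_PL3_LIST)

# Disks purchasable with a three-year prepaid Alibaba Cloud rental fee.
three_year_rent = ALIYUN_PL3_DISCOUNTED * DISK_TB * 36
disks_for_three_year_rent = three_year_rent / DISK_PRICE_CNY

print(f"AWS rent-to-buy:           {aws_days:.1f} days")
print(f"Alibaba Cloud rent-to-buy: {aliyun_days:.1f} days")
print(f"disks per 3-year rent:     {disks_for_three_year_rent:.0f}")
```

With these inputs the results land close to the 15-day, under-5-day, and 120-disk figures above.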
The price markup ratio of cloud databases (RDS) falls between that of cloud disks and cloud servers. For example, using RDS for PostgreSQL on AWS, a 64C / 256GB RDS costs $25,817 per month, equivalent to 180,000 yuan per month. One month’s rent is enough to purchase two servers with much better performance for in-house use. The rent-to-buy ratio is not even a month; renting for just over ten days would be enough to purchase an entire server.
Any rational enterprise user can see the folly in this: If the procurement of such services is not for short-term, temporary needs, then it definitely qualifies as a significant financial misjudgment.
Payment Model | Price | Cost Per Year (¥10k) |
---|---|---|
Self-hosted IDC (Single Physical Server) | ¥75k / 5 years | 1.5 |
Self-hosted IDC (2-3 Server HA Cluster) | ¥150k / 5 years | 3.0 ~ 4.5 |
Alibaba Cloud RDS (On-demand) | ¥87.36/hour | 76.5 |
Alibaba Cloud RDS (Monthly) | ¥42k / month | 50 |
Alibaba Cloud RDS (Yearly, 15% off) | ¥425,095 / year | 42.5 |
Alibaba Cloud RDS (3-year, 50% off) | ¥750,168 / 3 years | 25 |
AWS (On-demand) | $25,817 / month | 217 |
AWS (1-year, no upfront) | $22,827 / month | 191.7 |
AWS (3-year, full upfront) | $120k + $17.5k/month | 175 |
AWS China/Ningxia (On-demand) | ¥197,489 / month | 237 |
AWS China/Ningxia (1-year, no upfront) | ¥143,176 / month | 171 |
AWS China/Ningxia (3-year, full upfront) | ¥647k + ¥116k/month | 160.6 |
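The "Cost Per Year" column can be roughly reproduced by annualizing each payment model. The sketch below assumes an exchange rate of 7 CNY per USD (implied by $25,817 ≈ ¥180,000 in the text) and spreads upfront payments evenly over the contract term; the helper names are illustrative:

```python
USD_TO_CNY = 7.0   # assumption, implied by the text's $25,817 ~ 180,000 yuan


def annual_cost(hourly=0.0, monthly=0.0, upfront=0.0, term_years=1):
    """Annualized cost: hourly and monthly fees plus upfront spread over the term."""
    return hourly * 24 * 365 + monthly * 12 + upfront / term_years


def in_10k_cny(amount, rate=1.0):
    """Convert to units of 10,000 CNY, rounded to one decimal place."""
    return round(amount * rate / 10_000, 1)


rows = {
    "Self-hosted IDC (single server)": in_10k_cny(
        annual_cost(upfront=75_000, term_years=5)),
    "Alibaba Cloud RDS (on-demand)": in_10k_cny(
        annual_cost(hourly=87.36)),
    "AWS (on-demand)": in_10k_cny(
        annual_cost(monthly=25_817), USD_TO_CNY),
    "AWS (3-year)": in_10k_cny(
        annual_cost(monthly=17_500, upfront=120_000, term_years=3), USD_TO_CNY),
}

for name, cost in rows.items():
    print(f"{name}: {cost}")
```

Small differences from the table come down to rounding and the exact exchange rate used.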
Comparing the costs of self-hosting versus using a cloud database:
Method | Cost Per Year (¥10k) |
---|---|
Self-hosted Servers 64C / 384G / 3.2TB NVME SSD 660K IOPS (2-3 servers) | 3.0 ~ 4.5 |
Alibaba Cloud RDS PG High-Availability pg.x4m.8xlarge.2c, 64C / 256GB / 3.2TB ESSD PL3 | 25 ~ 50 |
AWS RDS PG High-Availability db.m5.16xlarge, 64C / 256GB / 3.2TB io1 x 80k IOPS | 160 ~ 217 |
RDS pricing compared to self-hosting, see “Is Cloud Database an idiot Tax?”
Any meaningful cost reduction and efficiency initiative cannot ignore this issue: if unit prices can be cut by anywhere from half to a factor of 200, then focusing on a 30% reduction in waste is not the priority. As long as your main business is on the cloud, traditional FinOps is like scratching an itch through a boot; migrating off the cloud is the real focal point of FinOps.
The Endgame of FinOps is Exiting from the Cloud
The well-fed do not understand the pangs of hunger, human joys and sorrows are not universally shared.
I spent five years at Tantan, a Nordic-style internet startup founded by a Swede. Nordic engineers have a characteristic pragmatism. When it comes to choosing between cloud and on-premise solutions, they are not swayed by hype or marketing but make decisions based on quantitative analysis of pros and cons. We meticulously calculated the costs of building our own infrastructure versus using the cloud, and the straightforward conclusion was that the total cost of on-premise solutions (including labor) generally falls between 10% and 100% of the list price for cloud services.
Thus, from its inception, Tantan chose to build its own infrastructure. Apart from overseas compliance businesses, CDN, and a very small amount of elastic services using public clouds, the main part of our operations was entirely hosted in IDC-managed data centers. Our database was not small, with 13K cores for PostgreSQL and 12K cores for Redis, 4.5 million QPS, and 300TB of unique transactional data. The annual cost for these two parts was less than 10 million yuan, including salaries for two DBAs, one network engineer, network and electricity, managed hosting fees, and hardware amortized over five years. At this scale, cloud databases would cost at least 50 to 60 million yuan even with significant discounts, not to mention the even more expensive big data sector.
However, digitalization in enterprises is phased, and different companies are at different stages. Many internet companies have reached the stage where they are fully engaged in building cloud-native K8S ecosystems. At this stage, focusing on resource utilization, mixed online and offline deployments, and reducing waste are reasonable demands and directions where FinOps should concentrate its efforts. Yet, for the vast majority of enterprises outside the digital realm, the urgent need is not reducing waste but lowering the unit cost of resources: Dell servers can be discounted by 50%, IDC virtual machines by 50%, and even cloud services can be heavily discounted. Are these companies still paying the list price, or even facing several times the markup in rebates? A great many companies are still being severely exploited due to information asymmetry and lack of capability.
Enterprises should evaluate their scale and stage, assess their business, and weigh the pros and cons accordingly. For small-scale startups, the cloud can indeed save a lot of manpower costs, which is very attractive — but please be cautious not to be locked in by vendors due to the convenience offered. If your annual cloud expenditure has already exceeded 1 million yuan, it’s time to seriously consider the benefits of descending from the cloud — many businesses do not require the elasticity for massive concurrent spikes or training AI models. Paying a premium for temporary/sudden needs or overseas compliance is reasonable, but paying several times to tens of times more for unnecessary elasticity is wasteful. You can keep the truly elastic parts of your operations on the public cloud and transfer those parts that do not require elasticity to IDCs. Just by doing this, the cost savings could be astonishing.
Descending from the cloud is the ultimate goal of traditional FinOps and the starting point of true FinOps.
Self-Hosting Matters
“To seek peace through struggle is to preserve peace; to seek peace through compromise is to lose peace.”
When the times are favorable, the world joins forces; when fortune fades, even heroes lose their freedom: During the bubble phase, it was easy to disregard spending heavily in the cloud. However, in an economic downturn, cost reduction and efficiency become central themes. An increasing number of companies are realizing that using cloud services is essentially paying a “no-expert tax” and “protection money”. Consequently, a trend of “cloud repatriation” has emerged, with 37Signals’ DHH being one of the most notable proponents. Correspondingly, the revenue growth rate of major cloud providers worldwide has been experiencing a continuous decline, with Alibaba Cloud’s revenue even starting to shrink in the first quarter of 2023.
See: “Why Cloud Computing Hasn’t Yet Hit Its Stride in Earning Profits”
The underlying trend is the emergence of open-source alternatives, breaking down the technical barriers of public clouds; the advent of resource clouds/IDC2.0, offering a cost-effective alternative to public cloud resources; and the release of technical talent through large layoffs, along with the future of AI models, giving every industry the opportunity to possess the expert knowledge and capability required for self-hosting. Taken together, these trends make IDC2.0 plus open-source self-hosting increasingly competitive: bypassing the public cloud intermediaries and working directly with IDCs is clearly a more economical choice.
Public cloud providers are not incapable of selling IDC resources profitably. Given their higher level of expertise compared to IDCs, they should, in theory, leverage their technological advantages and economies of scale to offer cheaper resources than IDC self-hosting. However, the harsh reality is that resource clouds can offer users virtual machines at an 80% discount, while public clouds cannot. Even though Moore’s Law keeps driving down the cost of storage and compute exponentially, public clouds are actually increasing their prices every year!
Well-informed major clients, especially those capable of migrating at will, can indeed negotiate for 80% off the list prices with public clouds, a feat unlikely for smaller clients — in this sense, clouds are essentially subsidizing large clients by bleeding small and medium-sized clients dry. Cloud vendors offer massive discounts to large clients while fleecing small and medium-sized clients and developers, completely contradicting the original intention and vision of cloud computing.
Clouds lure in users with low initial prices, but once users are deeply locked in, the slaughter begins — the previously discussed discounts and benefits disappear at each renewal. Escaping the cloud entails a significant cost, leaving users in a dilemma between a rock and a hard place, forced to continue paying protection money.
However, for users with the capability to self-host, capable of flexibly moving between multi-cloud and on-premises hybrid clouds, this is not an issue: The trump card in negotiations is the ability to go off-cloud or migrate to another cloud at any time. This is more effective than any argument — as the saying goes, “To seek peace through struggle is to preserve peace; to seek peace through compromise is to lose peace.” The extent of cost reduction depends on your bargaining power, which in turn depends on your ability to self-host.
Self-hosting might seem daunting, but it is not difficult for those who know how. The key is addressing the core issues of resources and capabilities. In 2023, due to the emergence of resource clouds and open-source alternatives, these issues have become much simpler than before.
In terms of resources, IDC and resource clouds have solved the problem adequately. The aforementioned IDC self-hosting doesn’t mean buying land and building data centers from scratch but directly using the hosting services of resource clouds/IDCs — you might only need a network engineer to plan the network, with other maintenance tasks managed by the provider.
If you prefer not to hassle, IDCs can directly sell you virtual machines at 20% of the list price, or you can rent a physical server with 64C/256G for a couple thousand a month; whether renting an entire data center or just a single colocation space, it’s all feasible. A retail colocation space with comprehensive services can be had for about five thousand a year. If you can run K8S or virtualization on a couple of hundred-core physical servers, why bother with elastic ECS?
FinOps Leads to Cloud-Exit
Building your own infrastructure comes with the added perk of extreme FinOps—utilizing out-of-warranty or even second-hand servers. Servers are typically depreciated over three to five years, yet it’s not rare to see them operational for eight to ten years. This contrasts with cloud services, where you’re just consuming resources; owning your server translates to tangible assets, making any extended use essentially a gain.
For instance, a new 64-core, 256GB server could cost around $7,000, but after a year or two, the price for such “electronic waste” drops to merely $400. By replacing the most failure-prone components with brand new enterprise-grade 3.2TB NVMe SSDs (costing $390), you could secure the entire setup for just $800.
In such scenarios, your vCPU·Month price could plummet to less than $0.15, a figure legendary in the gaming industry, where server costs can dip to mere cents. With Kubernetes (K8S) orchestration and database high-availability switching, reliability can be assured through parallel operation of multiple such servers, achieving an astonishing cost-efficiency ratio.
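As a rough check on the vCPU·Month figure, the sketch below divides the hardware cost by the number of vCPU-months delivered. It covers hardware only (no power, hosting, or labor), and the vCPU counts and service lifetimes are assumptions, since the text does not state which ones the estimate uses.

```python
def vcpu_month_price(hardware_cost_usd: float, vcpus: int, service_months: int) -> float:
    """Hardware-only cost per vCPU per month (ignores power, hosting, labor)."""
    return hardware_cost_usd / (vcpus * service_months)


# Assumption A: 64 physical cores with 2 hardware threads each = 128 vCPUs,
# amortized over 5 years (60 months).
five_year = vcpu_month_price(800, 128, 60)

# Assumption B: 64 cores counted as 64 vCPUs, kept in service for 8 years.
eight_year = vcpu_month_price(800, 64, 96)

print(f"A: ${five_year:.3f} per vCPU-month")
print(f"B: ${eight_year:.3f} per vCPU-month")
```

Both readings come out below $0.15 per vCPU·Month for the $800 refurbished server described above.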
In terms of capability, with the emergence of sufficiently robust open-source alternatives, the difficulty of self-hosting has dramatically decreased compared to a few years ago.
For example, Kubernetes/OpenStack/SealOS are open-source alternatives to cloud providers’ EC2/ECS/VPS management software; MinIO/Ceph aim to replace S3/OSS; while Pigsty and various database operators serve as open-source substitutes for RDS cloud database management. There’s a plethora of open-source software available for effectively utilizing these resources, along with numerous commercial entities offering transparently priced support services.
Your operations should ideally converge to using just virtual machines and object storage, the lowest common denominator across all cloud providers. Ideally, all applications should run on Kubernetes, which can operate in any environment—be it a cloud-hosted K8S, ECS, dedicated servers, or your own data center. External states like database backups and big data warehouses should be managed with compute-storage separation, using MinIO/S3 storage.
Such a CloudNative tech stack theoretically enables operation and flexible migration across any resource environment, thus avoiding vendor lock-in and maintaining control. This allows you to either significantly cut costs by moving off the cloud or leverage it to negotiate discounts with public cloud providers.
However, self-hosting isn’t without risks, with RDS representing a major potential vulnerability.
Database: The Biggest Risk Factor
Cloud databases may not be the most expensive line item, but they are definitely the most deeply locked-in and challenging to migrate.
Quality, security, efficiency, and cost represent different levels of a hierarchical pyramid of needs. The goal of FinOps is to reduce costs and increase efficiency without compromising quality and security.
Stateless apps on K8S or offline big data platforms pose little fatal risk when migrating. Especially if you have already achieved big data compute-storage separation and stateless app cloud-native transformation, moving these components is generally not too troublesome. Offline big data platforms can afford a few hours of downtime, while stateless apps can be updated through blue-green deployments and canary releases. The database, serving as the working memory, is prone to major issues when migrated.
Most IT system architectures are centered around the database, making it the key risk point in cloud migration, particularly with OLTP databases/RDS. Many users hesitate to move off the cloud and self-host due to the lack of reliable database services — traditional Kubernetes Operators don’t fully replicate the cloud database experience: hosting OLTP databases on K8S/containers with EBS is not yet a mature practice.
There’s a growing demand for a viable open-source alternative to RDS, and that’s precisely what we aim to address: enabling users to establish a local RDS service in any environment that matches or exceeds cloud databases — Pigsty, a free open-source alternative to RDS PG. It empowers users to effectively utilize PostgreSQL, the world’s most advanced and successful database.
Pigsty is a non-profit, open-source software powered by community love. It offers a ready-to-use, feature-rich PostgreSQL distribution with automatic high availability, PITR, top-tier monitoring systems, Infrastructure as Code, cloud-based Terraform templates, local Vagrant sandbox for one-click installation, and SOP manuals for various operations, enabling quick RDS self-setup without needing a professional DBA.
Although Pigsty is a database distribution, it enables users to practice ultimate FinOps—running production-level PostgreSQL RDS services anywhere (ECS, resource clouds, data center servers, or even local laptop VMs) at almost pure resource cost. It turns the cost of cloud database capabilities from being proportional to marginal resource costs to nearly zero in fixed learning costs.
Perhaps it’s the socialist ethos of Nordic companies that nurtures such pure free software. Our goal isn’t profit but to promote a philosophy: to democratize the expertise of using the advanced open-source database PostgreSQL for everyone, not just cloud monopolies. Cloud providers monopolize open-source expertise and roles, exploiting free open-source software, and we aim to break this monopoly—Freedom is not free. You shouldn’t concede the world to those you despise but rather overturn their table.
This is the essence of FinOps—empowering users with viable alternatives and the ability to self-host, thus negotiating with cloud providers from a position of strength.
References
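[1] Why Isn’t Cloud Computing More Profitable Than Digging Sand? (云计算为啥还没挖沙子赚钱?)
[2] Is Cloud Database an Idiot Tax? (云数据库是不是智商税?)
[3] Is the Cloud SLA a Placebo? (云SLA是不是安慰剂?)
[4] Is Cloud Disk a Pig Butchering Scam? (云盘是不是杀猪盘?)
[5] Paradigm Shift: From Cloud to Local-First (范式转移:从云到本地优先)
[6] Has the Pig Butchering Scam Really Cut Its Prices? (杀猪盘真的降价了吗?)
[8] Garbage Tencent Cloud CDN: From Getting Started to Giving Up (垃圾腾讯云CDN:从入门到放弃)
[9] Cloud RDS: From Dropping the Database to Running Away (云RDS:从删库到跑路)
[10] Are Distributed Databases a False Need? (分布式数据库是伪需求吗?)
[11] Are Microservices a Stupid Idea? (微服务是不是个蠢主意?)
[12] A Better Open-Source RDS Alternative: Pigsty (更好的开源RDS替代:Pigsty)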
SLA: Placebo or Insurance?
In the world of cloud computing, Service Level Agreements (SLAs) are seen as a cloud provider’s commitment to the quality of its services. However, a closer examination of these SLAs reveals that they might not offer the safety net one might expect: you might think you’ve insured your database for peace of mind, but in reality, you’ve bought a placebo that provides emotional comfort rather than actual coverage.
Insurance Policy or Placebo?
One of the reasons many users opt for cloud services is for the “safety net” they supposedly provide, often referring to the SLA when asked what this “safety net” entails. Cloud experts liken purchasing cloud services to buying insurance: certain failures might never occur throughout many companies’ lifespans, but should they happen, the consequences could be catastrophic. In such cases, a cloud service provider’s SLA is supposed to act as this safety net. Yet, when we actually review these SLAs, we find that this “policy” isn’t as useful as one might think.
Data is the lifeline of many businesses, and cloud storage serves as the foundation for nearly all data storage on the public cloud. Let’s take cloud storage services as an example. Many cloud service providers boast of their cloud storage services having nine nines of data reliability [1]. However, upon examining their SLAs, we find that these crucial promises are conspicuously absent from the SLAs [2].
What is typically included in the SLAs is the service’s availability. Even this promised availability is superficial, paling in comparison to the core business reliability metrics in the real world, with compensation schemes that are practically negligible in the face of common downtime losses. Compared to an insurance policy, SLAs more closely resemble placebos that offer emotional value.
Subpar Availability
The key metric used in cloud SLAs is availability. Cloud service availability is typically represented as the proportion of time a resource can be accessed from the outside, usually over a one-month period. If a user cannot access the resource over the Internet due to a problem on the cloud provider’s end, the resource is considered unavailable/down.
Taking the industry benchmark AWS as an example, most of its services use a similar SLA template [3]. The SLA for a single virtual machine on AWS is as follows [4]. This means that in the best-case scenario, if an EC2 instance on AWS is unavailable for less than roughly 3.6 hours in a month (99.5% availability), AWS compensates nothing. In the worst-case scenario, only when the unavailability exceeds 36 hours (95% availability) can you receive a 100% credit return.
Instance-Level SLA
For each individual Amazon EC2 instance (“Single EC2 Instance”), AWS will use commercially reasonable efforts to make the Single EC2 Instance available with an Instance-Level Uptime Percentage of at least 99.5%, in each case during any monthly billing cycle (the “Instance-Level SLA”). In the event any Single EC2 Instance does not meet the Instance-Level SLA, you will be eligible to receive a Service Credit as described below.
Instance-Level Uptime Percentage | Service Credit Percentage |
---|---|
Less than 99.5% but equal to or greater than 99.0% | 10% |
Less than 99.0% but equal to or greater than 95.0% | 30% |
Less than 95.0% | 100% |
Note: In addition to the Instance-Level SLA, AWS will not charge you for any Single EC2 Instance that is Unavailable for more than six minutes of a clock hour. This applies automatically and you do not need to request credit for any such hour with more than six minutes of Unavailability.
For some internet companies, a 15-minute service outage is enough to jeopardize bonuses, and a 30-minute outage is enough to cost leaders their jobs. Core systems that run normally most of the time can actually achieve five nines, six nines, or, with no recorded downtime at all, effectively “infinite nines”. It is disappointing that cloud providers, themselves incubated by major internet companies, commit to such inferior availability figures.
What’s more outrageous is that these compensations are not automatically provided to you after a failure occurs. Users are required to measure downtime themselves, submit evidence for claims within a specific timeframe (usually two months), and request compensation to receive any. This requires users to collect monitoring metrics and log evidence to negotiate with cloud providers, and the compensation returned is not in cash but in vouchers/duration compensations — meaning virtually no real loss for the cloud providers and no actual value for the users, with almost no chance of compensating for the actual losses incurred during service interruptions.
Is the “Safety Net” Meaningful?
For businesses, a “safety net” means minimizing losses as much as possible when failures occur. Unfortunately, SLAs are of little help in this regard.
The impact of service unavailability on business varies by industry, time, and duration. A brief outage of a few seconds to minutes might not significantly affect general industries; however, long-term outages (several hours to several days) can severely affect revenue and reputation.
In the Uptime Institute’s 2021 data center survey [5], several of the most severe outages cost respondents an average of nearly $1 million, not including the worst 2% of cases, which suffered losses exceeding $40 million.
However, SLA compensations are a drop in the ocean compared to these business losses. Taking the t4g.nano virtual machine instance in the us-east-1 region as an example, priced at about $3 per month. If the unavailability is less than 7 hours and 18 minutes (99% monthly availability), AWS will pay 10% of the monthly cost of that virtual machine, a total compensation of 30 cents. If the virtual machine is unavailable for less than 36 hours (95% availability within a month), the compensation is only 30%, less than $1. Only if the unavailability exceeds a day and a half can users receive a full refund for the month: $3. Even if compensating for thousands of instances, this is virtually negligible compared to the losses.
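The credit tiers can be turned into a small calculator to see how little is actually at stake. The sketch below encodes the Instance-Level SLA tiers quoted earlier. The 730-hour month is an assumption (it matches the 7 hours 18 minutes figure for 99%), and the function names are illustrative rather than part of any AWS tooling.

```python
HOURS_PER_MONTH = 730  # assumed average month length (8760 h / 12)


def service_credit_pct(uptime_pct: float) -> int:
    """Credit percentage for a given monthly uptime, per the quoted SLA tiers."""
    if uptime_pct >= 99.5:
        return 0
    if uptime_pct >= 99.0:
        return 10
    if uptime_pct >= 95.0:
        return 30
    return 100


def credit_usd(downtime_hours: float, monthly_bill_usd: float) -> float:
    """Service credit (issued as a voucher, not cash) for one instance in one month."""
    uptime_pct = 100 * (1 - downtime_hours / HOURS_PER_MONTH)
    return monthly_bill_usd * service_credit_pct(uptime_pct) / 100


if __name__ == "__main__":
    for hours in (1, 5, 20, 40):
        print(f"{hours:>2} h down -> ${credit_usd(hours, 3.0):.2f} credit")
```

For a $3 instance the credit is zero for an hour of downtime and tops out at $3 once the outage passes a day and a half. Even across a thousand such instances the ceiling is about $3,000, far below the outage losses cited above.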
In contrast, the traditional insurance industry genuinely provides coverage for its customers. For instance, SF Express charges 1% of the item’s value for insurance, but if the item is lost, they compensate the full amount. Similarly, commercial health insurance costing tens of thousands yearly can cover millions in medical expenses. “Insurance” in this industry truly means you get what you pay for.
Cloud service providers charge far more than the BOM for their expensive services (see: “Are Public Clouds a Pig Butchering Scam?” [7]), but when service issues arise, their so-called “safety net” compensation is merely vouchers, which is clearly unfair.
Vanished Reliability
Some people use cloud services to “pass the buck,” absolving themselves of responsibility. However, some critical responsibilities cannot be shifted to external IT suppliers, such as data security. Users might tolerate temporary service unavailability, but the damage caused by lost or corrupted data is often unacceptable. Blindly trusting exaggerated promises can have severe consequences, potentially a matter of life and death for a startup.
In storage products offered by various cloud providers, it’s common to see promises of nine nines of reliability [1], implying a one in a billion chance of data loss when using cloud disks. Examining actual reports on cloud provider disk failure rates [6] casts doubt on these figures. However, as long as providers are bold enough to make, stand by, and honor such claims, there shouldn’t be an issue.
Yet, upon examining the SLAs of various cloud providers, this promise disappears! [2]
In the sensational 2018 case “The Disaster Tencent Cloud Brought to a Startup Company!” [8], the startup believed the cloud provider’s promises and stored data on server hard drives, only to encounter what was termed “silent disk errors”: “Years of accumulated data were lost, causing nearly ten million yuan in losses.” Tencent Cloud expressed apologies to the company, willing to compensate the actual expenses incurred on Tencent Cloud totaling 3,569 yuan and, with the aim of helping the business quickly recover, promised an additional compensation of 132,900 yuan.
What Exactly is an SLA
Having discussed this far, proponents of cloud services might play their last card: although the post-failure “safety net” is a facade, what users need is to avoid failures as much as possible. According to the SLA promises, there is a 99.99% probability of avoiding failures, which is of the most value to users.
However, SLAs are deliberately confused with the actual reliability of the service: Users should not consider SLAs as reliable indicators of service availability — not even as accurate records of past availability levels. For providers, an SLA is not a real commitment to reliability or a track record but a marketing tool designed to convince buyers that the cloud provider can host critical business applications.
The Uptime Institute’s annual data center failure analysis report shows that many cloud services perform below their published SLAs. The analysis of failures in 2022 found that efforts to contain the frequency of failures have failed, and the cost and consequences of failures are worsening [9].
Key Findings Include:
- High outage rates haven’t changed significantly. One in five organizations report experiencing a “serious” or “severe” outage (involving significant financial losses, reputational damage, compliance breaches and in some severe cases, loss of life) in the past three years, marking a slight upward trend in the prevalence of major outages. According to Uptime’s 2022 Data Center Resiliency Survey, 80% of data center managers and operators have experienced some type of outage in the past three years – a marginal increase over the norm, which has fluctuated between 70% and 80%.
- The proportion of outages costing over $100,000 has soared in recent years. Over 60% of failures result in at least $100,000 in total losses, up substantially from 39% in 2019. The share of outages that cost upwards of $1 million increased from 11% to 15% over that same period.
- Power-related problems continue to dog data center operators. Power-related outages account for 43% of outages that are classified as significant (causing downtime and financial loss). The single biggest cause of power incidents is uninterruptible power supply (UPS) failures.
- Networking issues are causing a large portion of IT outages. According to Uptime’s 2022 Data Center Resiliency Survey, networking-related problems have been the single biggest cause of all IT service downtime incidents – regardless of severity – over the past three years. Outages attributed to software, network and systems issues are on the rise due to complexities from the increasing use of cloud technologies, software-defined architectures and hybrid, distributed architectures.
- The overwhelming majority of human error-related outages involve ignored or inadequate procedures. Nearly 40% of organizations have suffered a major outage caused by human error over the past three years. Of these incidents, 85% stem from staff failing to follow procedures or from flaws in the processes and procedures themselves.
- External IT providers cause most major public outages. The more workloads that are outsourced to external providers, the more these operators account for high-profile, public outages. Third-party, commercial IT operators (including cloud, hosting, colocation, telecommunication providers, etc.) account for 63% of all publicly reported outages that Uptime has tracked since 2016. In 2021, commercial operators caused 70% of all outages.
- Prolonged downtime is becoming more common in publicly reported outages. The gap between the beginning of a major public outage and full recovery has stretched significantly over the last five years. Nearly 30% of these outages in 2021 lasted more than 24 hours, a disturbing increase from just 8% in 2017.
- Public outage trends suggest there will be at least 20 serious, high-profile IT outages worldwide each year. Of the 108 publicly reported outages in 2021, 27 were serious or severe. This ratio has been fairly consistent since the Uptime Intelligence team began cataloging major outages in 2016, indicating that roughly one-fourth of publicly recorded outages each year are likely to be serious or severe.
Rather than compensating users, SLAs are more of a “punishment” for cloud providers when their service quality fails to meet standards. The deterrent effect of the punishment depends on the certainty and severity of the punishment. Monthly duration/voucher compensations impose virtually no real cost on cloud providers, making the severity of the punishment nearly zero; compensation also requires users to submit evidence and get approval from the cloud provider, meaning the certainty is not high either.
Compared to experts and engineers who might lose bonuses and jobs due to failures, the punishment of SLAs for cloud providers is akin to a slap on the wrist. If the punishment is meaningless, then cloud providers have no incentive to improve service quality. When users encounter problems, they can only sit and wait helplessly, and the service attitude towards small customers in particular is arrogant and dismissive compared to self-built/third-party service companies.
More subtly, cloud providers have absolute power over the SLA agreement: they reserve the right to unilaterally adjust and revise SLAs and merely notify users when the changes take effect, leaving users with only the right to stop using the service, without any participation or choice. As a default “take-it-or-leave-it” clause, it blocks any possibility for users to seek meaningful compensation.
Thus, SLAs are not an insurance policy against losses for users. In the worst-case scenario, it’s an unavoidable loss; at best, it provides emotional comfort. Therefore, when choosing cloud services, we need to be vigilant and fully understand the contents of their SLAs to make informed decisions.
Reference
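[1] Alibaba Cloud ESSD cloud disk (阿里云 ESSD云盘)
[2] Alibaba Cloud SLA summary page (阿里云 SLA 汇总页)
[3] AWS SLA summary page (AWS SLA 汇总页)
[7] Are Public Clouds a Pig Butchering Scam? (公有云是不是杀猪盘)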
EBS: Pig Slaughter Scam
We have already answered the question: Is RDS an Idiot Tax? But compared to the hundredfold markup of public cloud block storage, cloud databases seem almost reasonable. This article uses real data to reveal the true business model of public cloud: “Cheap” EC2/S3 to attract customers, and “Expensive” EBS/RDS to fleece them. Such practices have led public clouds to diverge from their original mission and vision.
TL;DR
EC2/S3/EBS pricing serves as the anchor for all cloud services pricing. While the pricing for EC2/S3 might still be considered reasonable, EBS pricing is outright extortionate. The best block storage services offered by public cloud providers are essentially on par with off-the-shelf PCI-E NVMe SSDs in terms of performance specifications. Yet, compared to direct hardware purchases, the cost of AWS EBS can be up to 60 times higher, and Alibaba Cloud’s ESSD can reach up to 100 times higher.
Why is there such a staggering markup for plug-and-play disk hardware? Cloud providers fail to justify the exorbitant prices. When considering the design and pricing models of other cloud storage services, there’s only one plausible explanation: The high markup on EBS is a deliberately set barrier, intended to fleece cloud database customers.
With EC2 and EBS serving as the pricing anchors for cloud databases, their markups are several and several dozen times higher, respectively, thus supporting the exorbitant profit margins of cloud databases. However, such monopolistic profits are unsustainable: the impact of IDC 2.0/telecom/national cloud on IaaS; private cloud/cloud-native/open source as alternatives to PaaS; and the tech industry’s massive layoffs, AI disruption, and the impact of China’s low labor costs on cloud services (through IT outsourcing/shared expertise). If public clouds continue to adhere to their current fleecing model, diverging from their original mission of providing fundamental compute and storage infrastructure, they will inevitably face increasingly severe competition and challenges from the aforementioned forces.
WHAT a Scam!
When you use a microwave at home to heat up a ready-to-eat braised chicken rice meal costing 10 yuan, you wouldn’t mind if a restaurant charges you 30 yuan for microwaving the same meal and serving it to you, considering the costs of rent, utilities, labor, and service. But what if the restaurant charges you 1000 yuan for the same dish, claiming: “What we offer is not just braised chicken rice, but a reliable and flexible dining service”, with the chef controlling the quality and cooking time, pay-per-portion so you get exactly as much as you want, pay-per-need so you get as much as you eat, with options to switch to hot and spicy soup or skewers if you don’t feel like chicken, claiming it’s all worth the price. Wouldn’t you feel the urge to give the owner a piece of your mind? This is exactly what’s happening with block storage!
With hardware technology evolving rapidly, PCI-E NVMe SSDs have reached a new level of performance across various metrics. A common 3.2 TB enterprise-grade MLC SSD offers incredible performance, reliability, and value for money, costing less than ¥3000, significantly outperforming older storage solutions.
Aliyun ESSD PL3 and our own IDC’s procured PCI-E NVMe SSDs come from the same supplier. Hence, their maximum capacity and IOPS limitations are identical. AWS’s top-tier block storage solution, io2 Block Express, also shares similar specifications and metrics. Cloud providers’ highest-end storage solutions utilize these 32TB single cards, leading to a maximum capacity limit of 32TB (64TB for AWS), which suggests a high degree of hardware consistency underneath.
However, compared to direct hardware procurement, the cost of AWS EBS io2 is up to 120 times higher, while Aliyun’s ESSD PL3 is up to 200 times higher. Taking a 3.2TB enterprise-grade PCI-E SSD card as a reference, the ratio of on-demand rental to purchase price is 15 days on AWS and less than 5 days on Aliyun, meaning you could own the entire disk after renting it for this duration. If you opt for a three-year prepaid purchase on Aliyun, taking advantage of the maximum 50% discount, the rental fees over three years could buy over 120 disks of the same model.
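The rent-to-buy ratios translate directly into "how many disks the rent would have bought". The sketch below is one possible reading of the figures, not necessarily how they were derived: it assumes a five-year service life for the AWS comparison and takes 4.5 days as a stand-in for "less than 5 days" on Aliyun. The helper name is hypothetical.

```python
def disks_affordable(rental_days: float, days_to_buy_one_disk: float,
                     discount: float = 0.0) -> float:
    """How many disks the rental fees paid over `rental_days` would buy outright.

    days_to_buy_one_disk: days of on-demand rent that equal one disk's purchase price.
    discount: fraction taken off the on-demand price (0.5 = 50% off).
    """
    return rental_days * (1 - discount) / days_to_buy_one_disk


# AWS io2 on-demand, assumed 5-year service life, 15 days of rent per disk.
aws_5y = disks_affordable(5 * 365, 15)

# Aliyun ESSD PL3, 3-year prepaid at 50% off, assumed 4.5 days of rent per disk.
aliyun_3y = disks_affordable(3 * 365, 4.5, discount=0.5)

print(f"AWS, 5 years on-demand:      ~{aws_5y:.0f} disks")
print(f"Aliyun, 3 years at 50% off:  ~{aliyun_3y:.0f} disks")
```

Under these assumptions both cases come out at roughly 120 disks, in line with the figures quoted above.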
Is that SSD made of gold?
Cloud providers argue that block storage should be compared to SAN rather than local DAS, which should be compared to instance storage (Host Storage) on the cloud. However, public cloud instance storage is generally ephemeral (Ephemeral Storage), with data being wiped once the instance is paused/stopped【7,11】, making it unsuitable for serious production databases. Cloud providers themselves advise against storing critical data on it. Therefore, the only viable option for database storage is EBS block storage. Products like DBFS, which have similar performance and cost metrics to EBS, are also included in this category.
Ultimately, users care not about whether the underlying hardware is SAN, SSD, or HDD; the real priorities are tangible metrics: latency, IOPS, reliability, and cost. Comparing local options with the best cloud solutions poses no issue, especially when the top-tier cloud storage uses the same local disks.
Some “experts” claim that cloud block storage is stable and reliable, offering multi-replica redundancy and error correction. In the past, Share Everything databases required SAN storage, but many databases now operate on a Share Nothing architecture. Redundancy is managed at the database instance level, eliminating the need for triple-replica storage redundancy, especially since enterprise-grade disks already possess strong self-correction capabilities and safety redundancy (UBER < 1e-18). With redundancy already in place at the database level, multi-replica block storage becomes an unnecessary waste for databases. Even if cloud providers did use two additional replicas for redundancy, it would only reduce the markup from 200x to 66x, without fundamentally changing the situation.
“Experts” also liken purchasing “cloud services” to buying insurance: “An annual failure rate of 0.02% may seem negligible to most, but a single incident can be devastating, with the cloud provider offering a safety net.” This sounds appealing, but a closer look at cloud providers’ EBS SLAs reveals no guarantees for reliability. ESSD cloud disk promotions mention 9 nines of data reliability, but such claims are conspicuously absent from the SLAs. Cloud providers only guarantee availability, and even then, the guarantees are modest, as illustrated by the AWS EBS SLA:
In plain language: if the service is down for a day and a half in a month (95% availability), you get a 100% coupon for that month’s service fee; seven hours of downtime (99%) yields a 30% coupon; and a few minutes of downtime (99.9% for a single disk, 99.99% for a region) earns a 10% coupon. Cloud providers charge a hundredfold more, yet offer mere coupons as compensation for significant outages. Applications that can’t tolerate even a few minutes of downtime wouldn’t benefit from these meager coupons, reminiscent of the past incident, “The Disaster Tencent Cloud Brought to a Startup Company.”
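To make the tiers concrete, here is a minimal sketch that maps monthly downtime to availability and then to the coupon percentage described above. The function names are hypothetical, a 30-day month is assumed, and the exact boundary handling (strict "less than") is an assumption rather than a reading of the SLA text.

```python
# Illustrative sketch of the compensation tiers paraphrased above.
# Assumptions: 30-day month, strict "<" at each boundary, hypothetical names.

def availability_pct(downtime_hours: float, days_in_month: int = 30) -> float:
    """Monthly availability in percent for a given amount of downtime."""
    total_hours = days_in_month * 24
    return 100.0 * (1.0 - downtime_hours / total_hours)

def coupon_pct(avail_pct: float, top_tier: float = 99.9) -> int:
    """Coupon (percent of the monthly fee) for a given availability.

    top_tier is 99.9 for a single disk; the text cites 99.99 for a region.
    """
    if avail_pct < 95.0:
        return 100
    if avail_pct < 99.0:
        return 30
    if avail_pct < top_tier:
        return 10
    return 0

for hours in (40, 8, 1, 0):
    a = availability_pct(hours)
    print(f"{hours:>3} h down -> {a:.2f}% available -> {coupon_pct(a)}% coupon")
```

In a 30-day month, 95% availability corresponds to 36 hours of downtime and 99% to 7.2 hours, which is where the "day and a half" and "seven hours" figures come from.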
SF Express offers 1% insurance for parcels, compensating for losses with real money. Annual commercial health insurance plans costing tens of thousands can cover millions in expenses when issues arise. The insurance industry should not be insulted; it operates on a principle of value for money. Thus, an SLA is not an insurance policy against losses for users. At worst, it’s a bitter pill to swallow without recourse; at best, it provides emotional comfort.
The premium charged for cloud database services might be justified by “expert manpower,” but this rationale falls flat for plug-and-play disks, with cloud providers unable to explain the exorbitant price markup. When pressed, their engineers might only say:
“We’re just following AWS; that’s how they designed it.”
Why Such Pricing?
Even engineers within public cloud services may not fully grasp the rationale behind their pricing strategies, and those who do are unlikely to share. However, this does not prevent us from deducing the reasoning behind such decisions from the design of the product itself.
Storage follows a de facto standard: POSIX file system + block storage. Whether it’s database files, images, audio, or video, they all use the same file system interface to store data on disks. But AWS’s “divine intervention” splits this into two distinct services: S3 (Simple Storage Service) and EBS (Elastic Block Store). Many “followers” have imitated AWS’s product design and pricing model, yet the logic and principles behind such actions remain elusive.
Figure: Aliyun EBS vs. OSS comparison
S3, standing for Simple Storage Service, is a simplified alternative to file system/storage: sacrificing strong consistency, directory management, and access latency for the sake of low cost and massive scalability. It offers a simple, high-latency, high-throughput flat KV storage service, detached from standard storage services. This aspect, being cost-effective, serves as a major allure for users to migrate to the cloud, thus becoming possibly the only de facto cloud computing standard across all public cloud providers.
Databases, on the other hand, require low latency, strong consistency, high quality, high performance, and random read/write block storage, which is encapsulated in the EBS service: Elastic Block Store. This segment becomes the forbidden fruit for cloud providers: reluctant to let users dabble. Because EBS serves as the pricing anchor for RDS — the barrier and moat for cloud databases.
For IaaS providers, who make their living by selling resources, there’s not much room for price inflation, as costs can be precisely calculated against the BOM. However, for PaaS services like cloud databases, which include “services,” labor/development costs are significantly marked up, allowing for astronomical pricing and high profits. Despite storage, computing, and networking making up half of the revenue for domestic public cloud IaaS, their gross margin stands only at 15% to 20%. In contrast, public cloud PaaS, represented by cloud databases, can achieve gross margins of 50% or higher, vastly outperforming the IaaS model.
If users opt to use IaaS resources (EC2/EBS) to build their own databases, it represents a significant loss of profit for cloud providers. Thus, cloud providers go to great lengths to prevent this scenario. But how is such a product designed to meet this need?
First, instance storage, which is best suited for self-hosted databases, must come with various restrictions: instances that are hibernated/stopped are reclaimed and wiped, preventing serious production database services from running on EC2’s built-in disks. Second, although EBS’s performance and reliability might slightly lag behind local NVMe SSD storage, it is still viable for databases, so it has to be restricted too. Users cannot be left with no option at all, so the restriction takes the form of exorbitant pricing! As compensation, the secondary, cheaper, massive storage option, S3, can be priced more affordably to lure customers.
Of course, to make customers bite, some cloud computing KOLs promote the accompanying “public cloud-native” philosophy: “EC2 is not suitable for stateful applications. Please store state in S3 or RDS and other managed services, as these are the ‘best practices’ for using our cloud.”
These four points are well summarized, but what public clouds will not disclose is the cost of these “best practices.” To put these four points in layman’s terms, they form a carefully designed trap for customers:
- Dump ordinary files in S3! (With such cost-effective S3, who needs EBS?)
- Don’t build your own database! (Forget about tinkering with open-source alternatives using instance storage)
- Please deeply use the vendor’s proprietary identity system (vendor lock-in)
- Faithfully contribute to the cloud database! (Once users are locked in, the time to “slaughter” arrives)
How It Is Done
The business model of public clouds can be summarized as: Attract customers with cheap EC2/S3, make a killing with EBS/RDS.
To slaughter the pig, you first need to raise it. No pains, no gains. Thus, for new users, startups, and small-to-medium enterprises, public clouds spare no effort in offering sweeteners, even at a loss, to drum up business. New users enjoy a significant discount on their first order, startups receive free or half-price credits, and the pricing strategy is subtly crafted.
Taking AWS RDS pricing as an example, the unit price for mini models with 1 to 2 cores is only a few dollars per core per month, which translates to three to four hundred yuan per year (excluding storage): If you need a low-usage database for minor storage, this might be the most straightforward and affordable choice.
However, as soon as you raise the configuration even slightly, the price per core per month jumps by an order of magnitude, reaching twenty to a hundred dollars, with the potential to skyrocket by dozens of times, and that’s before the doubling effect of the astonishing EBS prices. Users only realize what has happened when they are faced with a suddenly astronomical bill.
For instance, using RDS for PostgreSQL on AWS, the price for a 64C / 256GB db.m5.16xlarge RDS for one month is $25,817, which is equivalent to about 180,000 yuan per month. The monthly rent is enough for you to buy two servers with even better performance and set them up on your own. The rent-to-buy ratio doesn’t even last a month; renting for just over ten days is enough to buy the whole server for yourself.
Payment Model | Price | Cost Per Year (¥10k) |
---|---|---|
Self-hosted IDC (Single Physical Server) | ¥75k / 5 years | 1.5 |
Self-hosted IDC (2-3 Server HA Cluster) | ¥150k / 5 years | 3.0 ~ 4.5 |
Alibaba Cloud RDS (On-demand) | ¥87.36/hour | 76.5 |
Alibaba Cloud RDS (Monthly) | ¥42k / month | 50 |
Alibaba Cloud RDS (Yearly, 15% off) | ¥425,095 / year | 42.5 |
Alibaba Cloud RDS (3-year, 50% off) | ¥750,168 / 3 years | 25 |
AWS (On-demand) | $25,817 / month | 217 |
AWS (1-year, no upfront) | $22,827 / month | 191.7 |
AWS (3-year, full upfront) | $120k + $17.5k/month | 175 |
AWS China/Ningxia (On-demand) | ¥197,489 / month | 237 |
AWS China/Ningxia (1-year, no upfront) | ¥143,176 / month | 171 |
AWS China/Ningxia (3-year, full upfront) | ¥647k + ¥116k/month | 160.6 |
Comparing the costs of self-hosting versus using a cloud database:
Method | Cost Per Year (¥10k) |
---|---|
Self-hosted Servers 64C / 384G / 3.2TB NVME SSD 660K IOPS (2-3 servers) | 3.0 ~ 4.5 |
Alibaba Cloud RDS PG High-Availability pg.x4m.8xlarge.2c, 64C / 256GB / 3.2TB ESSD PL3 | 25 ~ 50 |
AWS RDS PG High-Availability db.m5.16xlarge, 64C / 256GB / 3.2TB io1 x 80k IOPS | 160 ~ 217 |
RDS pricing compared to self-hosting, see “Is Cloud Database an idiot Tax?”
Any rational business user can see the logic here: if the purchase of such a service is not for short-term, temporary needs, it is a major financial misstep.
This is not just the case with Relational Database Services / RDS, but with all sorts of cloud databases. MongoDB, ClickHouse, Cassandra: any service built on EC2 / EBS plays the same game. Take the popular NoSQL document database MongoDB as an example:
Only a product manager with a decade-long cerebral thrombosis could come up with this kind of pricing
Five years is the typical depreciation period for servers, and with the maximum discount, a 12-node (64C 512G) configuration is priced at twenty-three million. A small fraction of this quote alone could easily cover five years of hardware maintenance, and you could still afford a team of MongoDB experts to customize and set things up as you wish.
Fine dining restaurants charge a 15% service fee on dishes, and users can understand and support this reasonable profit margin. If cloud databases charge a few tens of percent on top of hardware resources for service fees and elasticity premiums (let’s not even start on software costs for cloud services that piggyback on open-source), it can be justified as pricing for productive elements, with the problems solved and services provided being worth the money.
However, charging several hundred or even thousands of percent as a premium falls into the category of destructive element distribution: cloud providers bank on the fact that once users are onboard, they have no alternatives, and migration would incur significant costs, so they can confidently proceed with the slaughter! In this sense, the money users pay is not for the service, but rather a compulsory levy of a “no-expert tax” and “protection money”.
The Forgotten Vision
Facing accusations of “slaughtering the pig,” cloud vendors often defend themselves by saying: “Oh, what you’re seeing is the list price. Sure, it’s said to be a minimum of 50% off, but for major customers, there are no limits to the discounts.” As a rule of thumb, the cost of self-hosting hovers around 5% to 10% of current cloud list prices. Only if discounts deep enough to reach that level (roughly 90% off or more) can be sustained long-term do cloud services become more competitive than self-hosting.
Professional and knowledgeable large customers, especially those capable of migrating at any time, can indeed negotiate steep discounts of up to 80% with public clouds, while smaller customers naturally lack bargaining power and are unlikely to secure such deals.
However, cloud computing should not turn into ‘calculating clouds’: if cloud providers can only offer massive discounts to large enterprises while “shearing the sheep” and “slaughtering the pig” when dealing with small and medium-sized customers and developers, they are essentially robbing the poor to subsidize the rich. This practice completely contradicts the original intent and vision of cloud computing and is unsustainable in the long run.
When cloud computing first emerged, the focus was on the cloud hardware / IaaS layer: computing power, storage, bandwidth. Cloud hardware represents the founding story of cloud vendors: to make computing and storage resources as accessible as utilities, with themselves playing the role of infrastructure providers. This is a compelling vision: public cloud vendors can reduce hardware costs and spread labor costs through economies of scale; ideally, while keeping a profit for themselves, they can offer storage and computing power that is more cost-effective and flexible than IDC prices.
On the other hand, cloud software (PaaS / SaaS) follows a fundamentally different business logic: cloud hardware relies on economies of scale to optimize overall efficiency and earn money through resource pooling and overselling, which represents a progress in efficiency. Cloud software, however, relies on sharing expertise and charging service fees for outsourced operations and maintenance. Many services on the public cloud are essentially wrappers around free open-source software, relying on monopolizing expertise and exploiting information asymmetry to charge exorbitant insurance fees, which constitutes a transfer of value.
Unfortunately, for the sake of obfuscation, both cloud software and cloud hardware are branded under the “cloud” title. Thus, the narrative of cloud computing mixes breaking resource monopolies with establishing expertise monopolies: it combines the idealistic glow of democratizing computing power across millions of households with the greed of monopolizing and unethically profiting from it.
Public cloud providers that abandon platform neutrality and their original intent of being infrastructure providers, indulging in PaaS / SaaS / and even application layer profiteering, will sink in a bottomless competition.
Where to Go
Monopolistic profits vanish as competition emerges, plunging public cloud providers into a grueling battle.
At the infrastructure level, telecom operators, state-owned clouds, and IDC 1.5/2.0 have entered the fray, offering highly competitive IaaS services. These services include turnkey network and electricity hosting and maintenance, with high-end servers available for either purchase and hosting or direct rental at actual prices, showing no fear in terms of flexibility.
IDC 2.0’s new server rental model: Actual price rental, ownership transfers to the user after a full term
On the software front, what once were the technical barriers of public clouds, various management software / PaaS solutions, have seen excellent open-source alternatives emerge. OpenStack / Kubernetes have replaced EC2, MinIO / Ceph have taken the place of S3, and on RDS, open-source alternatives like Pigsty and various K8S Operators have appeared.
The whole “cloud-native” movement, in essence, is the open-source ecosystem’s response to the challenge of public cloud “freeloading”: users and developers have created a complete set of local-priority public cloud open-source alternatives to avoid being exploited by public cloud providers.
The term “CloudNative” is aptly named, reflecting different perspectives: public clouds see it as being “born on the public cloud,” while private clouds think of it as “running cloud-like services locally.” Ironically, the biggest proponents of Kubernetes are the public clouds themselves, akin to a salesman crafting his own noose.
In the context of economic downturn, cost reduction and efficiency gains have become the main theme. Massive layoffs in the tech sector, coupled with the future large-scale impact of AI on intellectual industries, will release a large amount of related talent. Additionally, the low-wage advantage in our era will significantly alleviate the scarcity and high cost of building one’s own talent pool. Compared with cloud service fees, labor is increasingly the cheaper side of the equation.
Considering these trends, the combination of IDC2.0 and open-source self-building is becoming increasingly competitive: for organizations with a bit of scale and talent reserves, bypassing public clouds as middlemen and directly collaborating with IDCs is clearly a more economical choice.
Staying true to the original mission is essential. Public clouds do an admirable job at the cloud hardware / IaaS level: apart from being outrageously expensive, there aren’t many issues, and the offerings are indeed solid. If they could return to their original vision and truly excel as providers of basic infrastructure, akin to utilities, selling resources might not offer high margins, but it would let them earn their money with dignity. Continuing down the path of exploitation, however, will ultimately lead customers to vote with their feet.
References
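【2】Is Cloud Database an Idiot Tax?
【3】Paradigm Shift: From Cloud to Local-First
【7】AWS Instance Store
【9】AWS EBS SLA
【12】Aliyun: Cloud Disk Overview
【13】Block Storage and Cloud Disks Illustrated
【15】Why Does Cloud Computing Still Earn Less Than Sand Mining?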
RDS: The Idiot Tax
Winter is coming, and big tech companies are starting to lay off employees, entering a mode of cost reduction and efficiency increase. As public cloud databases, often referred to as the “slaughterhouse knives” of the cloud, continue to face scrutiny, one might wonder if their story can go on.
Recently, a thought-provoking article by DHH, the co-founder of Basecamp & HEY, has sparked widespread discussion. Its core message can be summed up in one line:
“We spend $500,000 a year on cloud databases (RDS/ES). Do you have any idea how many powerful servers that kind of money could buy?
We’re moving off the cloud. Goodbye!”
So, how many powerful servers can $500,000 buy?
Absurd Pricing
Sharpening the knives for the sheep and pigs
Let’s rephrase the question: how much do servers and RDS (Relational Database Service) cost?
Taking the physical server model we heavily use as an example: Dell R730, 64 cores, 384GB of memory, equipped with a 3.2 TB MLC NVME SSD. A server like this, running a standard production-level PostgreSQL, can handle up to hundreds of thousands of TPS (Transactions Per Second), and read-only queries can reach four to five hundred thousand. How much does it cost? Including electricity, internet, IDC (Internet Data Center) hosting, and maintenance fees, and amortizing the cost over a 5-year depreciation period, the total lifecycle cost is around ¥75,000, or ¥15,000 per year. Of course, for production use, high availability is a must, so a typical database cluster would need two to three physical servers, amounting to an annual cost of ¥30,000 to ¥45,000.
This calculation does not include the cost of DBA (Database Administrator) salaries: managing tens of thousands of cores with just two or three people is not that expensive.
If you directly purchase a cloud database of this specification, what would the cost be? Let’s look at the pricing from Alibaba Cloud in China. Since the basic version is practically unusable for production (for reference, see: “Cloud Database: From Deletion to Desertion”), we’ll choose the high-availability version, which usually involves two to three instances. For a dedicated 64-core, 256GB instance running PostgreSQL 15 on x86 in the East China 1 availability zone, with a 3.2TB ESSD PL3 cloud disk added, the annual cost ranges from ¥250,000 (on a 3-year contract) to ¥750,000 (on-demand), with storage costs accounting for about a third.
Let’s also consider AWS, the leading public cloud provider. The closest equivalent on AWS is the db.m5.16xlarge, also with 64 cores and 256GB across multiple availability zones. Similarly, we add a 3.2TB io1 SSD disk with up to 80,000 IOPS, and review the global and China-specific pricing from AWS. The overall cost ranges from 1.6 million to 2.17 million yuan per year, with storage costs accounting for about half. The table below summarizes the costs:
Payment Model | Price | Cost Per Year (¥10k) |
---|---|---|
Self-hosted IDC (Single Physical Server) | ¥75k / 5 years | 1.5 |
Self-hosted IDC (2-3 Server HA Cluster) | ¥150k / 5 years | 3.0 ~ 4.5 |
Alibaba Cloud RDS (On-demand) | ¥87.36/hour | 76.5 |
Alibaba Cloud RDS (Monthly) | ¥42k / month | 50 |
Alibaba Cloud RDS (Yearly, 15% off) | ¥425,095 / year | 42.5 |
Alibaba Cloud RDS (3-year, 50% off) | ¥750,168 / 3 years | 25 |
AWS (On-demand) | $25,817 / month | 217 |
AWS (1-year, no upfront) | $22,827 / month | 191.7 |
AWS (3-year, full upfront) | $120k + $17.5k/month | 175 |
AWS China/Ningxia (On-demand) | ¥197,489 / month | 237 |
AWS China/Ningxia (1-year, no upfront) | ¥143,176 / month | 171 |
AWS China/Ningxia (3-year, full upfront) | ¥647k + ¥116k/month | 160.6 |
Comparing the costs of self-hosting versus using a cloud database:
Method | Cost Per Year (¥10k) |
---|---|
Self-hosted Servers 64C / 384G / 3.2TB NVME SSD 660K IOPS (2-3 servers) | 3.0 ~ 4.5 |
Alibaba Cloud RDS PG High-Availability pg.x4m.8xlarge.2c, 64C / 256GB / 3.2TB ESSD PL3 | 25 ~ 50 |
AWS RDS PG High-Availability db.m5.16xlarge, 64C / 256GB / 3.2TB io1 x 80k IOPS | 160 ~ 217 |
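The "Cost Per Year" column can be reproduced from the listed prices with simple annualization. The sketch below is illustrative only: the helper name is hypothetical, and the exchange rate of 7 yuan per US dollar is an assumption inferred from the table (it is what makes $25,817 per month come out near 2.17 million yuan per year).

```python
# Illustrative sketch: annualize the price models listed in the table above.
# Assumptions: the helper name and USD_TO_CNY = 7.0 (inferred, not stated).

USD_TO_CNY = 7.0
HOURS_PER_YEAR = 8760

def annual_cost(upfront: float = 0.0, monthly: float = 0.0, hourly: float = 0.0,
                term_years: float = 1.0, fx: float = 1.0) -> float:
    """Average yearly cost in yuan over the contract term."""
    total = (upfront
             + monthly * 12 * term_years
             + hourly * HOURS_PER_YEAR * term_years)
    return total / term_years * fx

plans = {
    "Self-hosted IDC (single server)": annual_cost(upfront=75_000, term_years=5),
    "Alibaba Cloud RDS (on-demand)": annual_cost(hourly=87.36),
    "Alibaba Cloud RDS (3-year)": annual_cost(upfront=750_168, term_years=3),
    "AWS (on-demand)": annual_cost(monthly=25_817, fx=USD_TO_CNY),
    "AWS (3-year, full upfront)": annual_cost(upfront=120_000, monthly=17_500,
                                              term_years=3, fx=USD_TO_CNY),
}

baseline = plans["Self-hosted IDC (single server)"]
for name, cost in plans.items():
    print(f"{name}: {cost / 10_000:.1f} (10k yuan/year), {cost / baseline:.0f}x single server")
```

Comparing against a single self-hosted server overstates the gap somewhat, since a production cluster needs two to three servers; dividing the multiples by two or three gives the fairer comparison.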
So, the question arises, if the cost of using a cloud database for one year is enough to buy several or even more than a dozen better-performing servers, what then is the real benefit of using a cloud database? Of course, large public cloud customers can usually receive business discounts, but even with discounts, the magnitude of the cost difference is hard to ignore.
Is using a cloud database essentially paying a “tax” for lack of better judgment?
Comfort Zone
No Silver Bullet
Databases are the heart of data-intensive applications, and since applications follow the lead of databases, choosing the right database requires great care. Evaluating a database involves many dimensions: reliability, security, simplicity, scalability, extensibility, observability, maintainability, cost-effectiveness, and more. What clients truly care about are these attributes, not the fluffy tech hype: decoupling of compute and storage, Serverless, HTAP, cloud-native, hyper-converged… These buzzwords only become meaningful once translated into the language of engineering: what is sacrificed, and what is gained in return.
Public cloud proponents like to gild it: cost-saving, flexible elasticity, reliable security, a panacea for enterprise digital transformation, a revolution from horse-drawn carriage to automobile, good, fast, and cheap, and so on. Unfortunately, few of these claims are realistic. Cutting through the fluff, the only real advantage of cloud databases over professional database services is elasticity, specifically in two aspects: low startup costs and strong scalability.
Low startup costs mean that users don’t need to build data centers, hire and train personnel, or purchase servers to get started; strong scalability refers to the ease of upgrading or downgrading configurations and scaling capacity. Thus, the core scenarios where public cloud truly fits are these two:
- Initial stages, simple applications with minimal traffic
- Workloads with no predictable pattern, experiencing drastic fluctuations
The former mainly includes simple websites, personal blogs, small apps and tools, and demos/PoCs; the latter includes low-frequency data analysis/model training, flash sales or ticket rushes, traffic surges from breaking celebrity scandals, and other special scenarios.
The business model of the public cloud is essentially renting: renting servers, bandwidth, storage, experts. It’s fundamentally no different from renting houses, cars, or power banks. Of course, “renting servers and outsourcing operations” doesn’t sound very appealing, hence the name “cloud”; in practice, it is closer to a cyber landlord. The characteristic of the renting model is its elasticity.
The renting model has its benefits. For example, shared power banks can meet temporary, small-scale charging needs when you are out and about. However, for people who travel daily between home and work, using shared power banks to charge phones and computers every day is undoubtedly absurd, especially when a few hours of rental fees add up to the price of buying one outright. Renting a car can perfectly meet temporary, emergency, or one-off transportation needs: traveling or hauling goods on short notice. But if your travel needs are frequent and local, purchasing your own car might be the most convenient and cost-effective choice.
The key issue is the rent-to-own ratio, with houses taking decades, cars a few years, but public cloud servers usually only a few months. If your business can sustain for more than a few months, why rent instead of buying outright?
Thus, the money cloud vendors make comes either from VC-funded tech startups seeking explosive growth, from entities in gray areas where the rent-seeking space exceeds the cloud premium, from the foolishly wealthy, or from a mishmash of webmasters, students, VPN individual users. Smart, high-net-worth enterprise customers, who could enjoy a comfortable, affordable big house, why would they choose to squeeze into rental cube apartments?
If your business fits within the suitable spectrum for the public cloud, that’s fantastic; but paying several times or even more than a tenfold premium for unnecessary flexibility and elasticity is purely a tax on lack of intelligence.
The Cost Assassin
Profit margins lie in information asymmetry, but you can’t fool everyone forever.
The elasticity of public clouds is designed for their business model: low startup costs, high maintenance costs. Low startup costs lure users to the cloud, and the excellent elasticity adapts to business growth at any time. However, once the business stabilizes, vendor lock-in occurs, making it difficult to switch providers, and the high maintenance costs become unbearable for users. This model is colloquially known as the pig slaughtering scam.
In the first stop of my career, I had such a pig slaughtering experience that remains vivid in my memory. As one of the first internal BUs forced onto A Cloud, A Cloud directly sent engineers to handhold us through the cloud migration process. We replaced our self-built big data/database suite with ODPS. The service was indeed decent, but the annual cost of storage and computing soared from tens of millions to nearly a hundred million, almost transferring all profits to A Cloud, making it the ultimate cost assassin.
At my next stop, the situation was entirely different. We managed a PostgreSQL and Redis database cluster with 25,000 cores and 4.5 million QPS. For databases of this size, if charged by AWS RCU/WCU, the cost would be billions annually; even with a long-term, yearly subscription and a substantial business discount, it would still cost at least fifty to sixty million. Yet, we had only two or three DBAs and a few hundred servers, with a total annual cost of manpower and assets of less than ten million.
Here, we can calculate the unit cost in a simple way: the comprehensive cost of using one core (including memory/disk) for a month, termed as core·month. We have calculated the costs of self-built server types and compared them with the quotes from cloud providers, with the following rough results:
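Hardware Computing Power | Unit Price (per core·month) |
---|---|
Self-hosted IDC (dedicated physical server A1: 64C384G) | 19 |
Self-hosted IDC (dedicated physical server B1: 40C64G) | 26 |
Self-hosted IDC (dedicated physical server C2: 8C16G) | 38 |
Self-hosted IDC (containers, 200% oversold) | 17 |
Self-hosted IDC (containers, 500% oversold) | 7 |
UCloud elastic VM (8C16G, oversold) | 25 |
Alibaba Cloud ECS, 2x memory (dedicated, not oversold) | 107 |
Alibaba Cloud ECS, 4x memory (dedicated, not oversold) | 138 |
Alibaba Cloud ECS, 8x memory (dedicated, not oversold) | 180 |
AWS C5D.METAL 96C 200G (monthly, no upfront) | 100 |
AWS C5D.METAL 96C 200G (3-year prepaid) | 80 |
Database | |
AWS RDS PostgreSQL db.T2 (4x) | 440 |
AWS RDS PostgreSQL db.M5 (4x) | 611 |
AWS RDS PostgreSQL db.R6G (8x) | 786 |
AWS RDS PostgreSQL db.M5 24xlarge | 1328 |
Alibaba Cloud RDS PG, 2x memory (dedicated) | 260 |
Alibaba Cloud RDS PG, 4x memory (dedicated) | 320 |
Alibaba Cloud RDS PG, 8x memory (dedicated) | 410 |
Oracle database license | 10000 |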
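As a rough check on the core·month metric, the sketch below derives it from figures quoted earlier in this article: a 64-core server costing about 75,000 over five years, and a 25,000-core fleet costing less than ten million per year. The function name is hypothetical, and the fleet figure is only an upper bound because the article gives "less than ten million".

```python
# Illustrative sketch: unit cost per core per month ("core·month").
# The helper name is hypothetical; inputs are figures quoted in the article.

def core_month_cost(total_cost: float, months: int, cores: int) -> float:
    """Comprehensive cost of one core (with its share of memory/disk) for one month."""
    return total_cost / months / cores

# Single self-hosted server: about 75,000 over a 5-year life, 64 cores.
a1 = core_month_cost(75_000, months=60, cores=64)

# Whole fleet: under 10 million per year (people and assets) for 25,000 cores.
fleet_upper_bound = core_month_cost(10_000_000, months=12, cores=25_000)

# Markup of a cloud database quote relative to the self-hosted unit cost.
rds_m5_quote = 611  # AWS RDS PostgreSQL db.M5 (4x), from the table
markup = rds_m5_quote / a1

print(f"A1 server: {a1:.1f} per core-month")
print(f"Fleet incl. manpower: at most {fleet_upper_bound:.1f} per core-month")
print(f"db.M5 markup over A1: ~{markup:.0f}x")
```

The single-server result lands between 19 and 20, matching the A1 row of the table, which suggests the table's self-hosted rows were derived in much the same way.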
So, the question arises, why can server hardware priced at twenty units be sold for hundreds, and why does installing cloud database software on it multiply the price? Is it because the operations are made of gold, or is the server made of gold?
A common response is: databases are the crown jewels of foundational software, embodying countless intangible intellectual property, blah blah. Thus, it’s reasonable for the software to be priced much higher than the hardware. This reasoning might be acceptable for top-tier commercial databases like Oracle, or console games from Sony and Nintendo.
But for cloud databases (RDS for PostgreSQL/MySQL/…) on public clouds, which are essentially rebranded and modified open-source database kernels with added control software and shared DBA services, this markup is absurd: the database kernel is free. Is your control software made of gold, or are your DBAs made of gold?
The secret of public clouds lies here: they acquire customers with ‘cheap’ S3 and EC2, then “slaughter the pig” with RDS.
Although IaaS (storage, computing, network) accounts for nearly half of the revenue of domestic public clouds, its gross margin is only 15% to 20%. Revenue from public cloud PaaS may be lower, but its gross margin can reach 50%, utterly outperforming the resource-selling IaaS. And the most representative of PaaS services is the cloud database.
Normally, if you’re not using public cloud as just an IDC 2.0 or CDN provider, the most expensive service would be the database. Are the storage, computing, and networking resources on the public cloud expensive? Strictly speaking, not outrageously so. The cost of hosting and maintaining a physical machine in an IDC is about twenty to thirty units per core·month, while the price of using one CPU core for a month on the public cloud ranges from seventy to two hundred units. Considering various discounts and promotions, as well as the premium for elasticity, that is barely within an acceptable range.
However, cloud databases are outrageously expensive, with the price for the same computing power per month being several times to over ten times higher than the corresponding hardware. For the cheaper Alibaba Cloud, the price per core·month ranges from two hundred to four hundred units, and for the more expensive AWS, it can reach seven to eight hundred or even more than a thousand.
If you’re only using one or two cores of RDS, then it might not be worth the hassle to switch, just consider it a tax. But if your business scales up and you’re still not moving away from the cloud, then you’re really paying a tax on intelligence.
Good Enough?
Make no mistake, RDS is just a mediocre solution.
When it comes to the cost of cloud databases/cloud servers, if you manage to bring this up with a sales representative, their pitch usually shifts to: Yes, we are expensive, but we are good!
But, is RDS really that good?
It could be argued that for toy applications, small websites, personal hosting, and self-built databases by those without technical knowledge, RDS might be good enough. However, from the perspective of high-value clients and database experts, RDS is seen as nothing more than a barely passable, communal pot meal.
At its core, the public cloud stems from the operational capabilities that overflowed from major tech companies. People within these companies are well aware of their own technological capabilities, so there’s no need for any undue idolization. (Google might be an exception).
Take performance as an example, where the core metric is latency/response time, especially tail latency, which directly impacts user experience: nobody wants to wait several seconds for a screen swipe to register. Here, disks play a crucial role.
In our production environment, we use local NVMe SSDs, with a typical 4K write latency of 15µs and read latency of 94µs. Consequently, the response time for a simple query on PostgreSQL is usually between 100 and 300µs, and the response time on the application side typically ranges from 200 to 600µs. For simple queries, our SLO is a response within 1ms for hits and within 10ms for misses, with anything over 10ms considered a slow query that needs optimization.
AWS’s EBS service, when tested with fio, shows disastrously poor performance【6】, with default gp3 read/write latencies at 40ms and io1 at 10ms, a difference of nearly three orders of magnitude. Moreover, the maximum IOPS is only eighty thousand. RDS uses EBS for storage, and if even a single disk access takes 10ms, it’s just not workable. io2 does use the same kind of NVMe SSDs as we do, but remote block storage has double the latency compared to local disks.
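The "orders of magnitude" claim can be checked directly from the latencies quoted above. The following is a minimal sketch, assuming the figures in the text are representative; the variable and function names are my own.

```python
import math

# Latencies quoted in the text, converted to microseconds.
LOCAL_NVME_US = {"write": 15, "read": 94}
EBS_US = {"gp3": 40_000, "io1": 10_000}  # 40 ms and 10 ms

def orders_of_magnitude(slow_us, fast_us):
    """log10 of the latency ratio: 3.0 means a thousandfold gap."""
    return math.log10(slow_us / fast_us)

for volume, ebs_latency in EBS_US.items():
    for op, local_latency in LOCAL_NVME_US.items():
        gap = orders_of_magnitude(ebs_latency, local_latency)
        ratio = ebs_latency / local_latency
        print(f"{volume} vs local NVMe {op}: {ratio:.0f}x ({gap:.1f} orders of magnitude)")
```

Compared against the 15µs local write latency, io1 comes out just under three orders of magnitude slower and gp3 somewhat over three; against the 94µs read latency the gap is smaller but still two orders or more.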
Indeed, sometimes cloud providers do offer sufficiently good local NVMe SSDs, but they cunningly impose various restrictions to prevent users from using EC2 to build their own databases. AWS restricts this by offering NVMe SSDs only as Ephemeral Storage, which is wiped clean when the EC2 instance is stopped or terminated, rendering it unusable. Alibaba Cloud, on the other hand, sells at exorbitant prices, with its ESSD PL3 being 200 times more expensive than direct hardware purchases. For reference, take a 3.2TB enterprise-grade PCIe SSD card: AWS’s rental ratio is one month, while Alibaba Cloud’s is nine days, meaning that renting for that period costs as much as purchasing the entire disk. If purchasing on Alibaba Cloud with the maximum three-year discount of 50% off, three years of rent could buy 123 of the same disks, nearly 400TB in total.
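The rental figures above can be sanity-checked with simple arithmetic. The sketch below assumes a 365-day year, takes "one month" as 30 days, and reads the "rental ratio" as the number of days of rent that equals the purchase price of one disk. These are my reading of the text, not numbers from a vendor price list.

```python
DISK_TB = 3.2          # capacity of the reference enterprise SSD
DAYS_PER_YEAR = 365    # assumption: ignore leap years

def disks_affordable(rent_days, payback_days):
    """How many disks the rent paid over `rent_days` would buy outright,
    if `payback_days` of rent equals the price of one disk."""
    return rent_days / payback_days

three_years = 3 * DAYS_PER_YEAR
aliyun = disks_affordable(three_years, payback_days=9)   # "nine days"
aws = disks_affordable(three_years, payback_days=30)     # "one month", taken as 30 days

print(f"Alibaba Cloud: about {aliyun:.0f} disks, {aliyun * DISK_TB:.0f} TB")
print(f"AWS: about {aws:.1f} disks, {aws * DISK_TB:.0f} TB")
```

With a nine-day payback, three years of rent comes to roughly 122 disks, close to the 123 quoted above, which suggests the nine-day ratio already reflects the discounted price; the small difference is presumably rounding in the underlying prices.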
Observability is another example where no RDS monitoring can be considered “good”. Just look at the number of monitoring metrics: while knowing whether a service is dead or alive may require only a few metrics, root cause analysis benefits from as many metrics as possible to build a good context. Most RDS services only provide basic monitoring metrics and rudimentary dashboards. For example, the so-called “enhanced monitoring” of Alibaba Cloud RDS PG【7】 includes only a few pitiful metrics. AWS’s PostgreSQL-related metrics also number fewer than 100, while our own monitoring system includes over 800 types of host metrics, 610 types for PGSQL databases, and 257 types for REDIS, totaling around three thousand metrics, dwarfing those of RDS.
Public Demo: https://demo.pigsty.cc
As for reliability, I used to have basic trust in the reliability of RDS, until the scandal in A Cloud’s Hong Kong data center a month ago. The rented data center had a fire suppression incident with water spraying, OSS malfunction, and numerous RDS services became unusable and could not be switched over; then, A Cloud’s entire Region’s control services crashed due to a single AZ failure, making a mockery of the idea of remote disaster recovery for cloud databases.
Of course, this is not to say that self-hosting would not have these issues, but a reasonably reliable IDC would not commit such egregious errors. Security needs no further discussion: after recent high-profile incidents such as the infamous SHGA, and with AK/SK hardcoded in a bunch of sample code, is cloud RDS more secure? Don’t make me laugh. At least traditional architecture has a VPN bastion as a layer of protection, while databases exposed on the public network with weak passwords are all too common, undeniably increasing the attack surface.
Another widespread criticism of cloud databases is their extensibility. RDS does not grant users dbsu permissions, meaning users cannot install extension plugins in the database. PostgreSQL’s charm lies in its extensions; without extensions, PostgreSQL is like cola without ice, yogurt without sugar. A more severe issue is that in some failure scenarios, users even lose the ability to help themselves, as seen in the real case of “Cloud Database: From Deleting Databases to Running Away”: WAL archiving and PITR, basic functionalities, are charged features in RDS. Regarding maintainability, some say cloud databases are convenient as they can be created and destroyed with just a few clicks, but those people have likely never experienced the ordeal of receiving SMS verification codes for restarting each database. With Database as Code style management tools, true engineers would never resort to such “ClickOps”.
However, everything has its rationale for existence, and cloud databases are not entirely without merit. In terms of scalability, cloud databases have indeed reached new heights, such as various Serverless offerings, but this is more about saving money and overselling for cloud providers, offering little real benefit to users.
The Obsolescence of DBAs?
Dominated by cloud vendors, hard to hire, and now obsolete?
Another pitch from cloud databases is that with RDS, you don’t need a DBA anymore!
For instance, this infamous article, “Why Are You Still Hiring DBAs?”, argues: We have autonomous database services! RDS and DAS can solve these database-related issues for you, making DBAs redundant, haha. I believe anyone who seriously reviews these so-called “autonomous services” or “AI4DB” official documents will not buy into this nonsense: Can a module, hardly a decent monitoring system, truly autonomize database management? This is simply a pipe dream.
DBA, Database Administrator, historically also known as database coordinators or database programmers, is a role that spans across development and operations teams, covering responsibilities related to DA, SA, Dev, Ops, and SRE. They manage everything related to data and databases: setting management policies and operational standards, planning hardware and software architecture, coordinating database management, verifying table schema designs, optimizing SQL queries, analyzing execution plans, and even handling emergencies and data recovery.
The first value of a DBA is in security fallback: They are the guardians of a company’s core data assets and can potentially inflict fatal damage on the company. There’s a joke at Ant Financial that besides regulatory bodies, DBAs could bring Alipay down. Executives often fail to recognize the importance of DBAs until a database incident occurs, and a group of CXOs anxiously watches the DBA firefighting and fixing… Compared to the cost of a database failure, such as a nationwide flight halt, YouTube downtime, or a factory’s day-long shutdown, the cost of hiring a DBA seems trivial.
The second value of a DBA is in model design and optimization. Many companies do not care if their queries perform poorly, thinking “hardware is cheap,” and solve problems by throwing money at hardware. However, improperly tuned queries/SQL or poorly designed data models and table structures can degrade performance by orders of magnitude. At some scale, the cost of stacking hardware becomes prohibitively expensive compared to hiring a competent DBA. Frankly, I believe the largest IT expenditure in most companies is due to developers not using databases correctly.
A DBA’s basic skill is managing DBs, but their essence lies in Administration: managing the entropy created by developers requires more than just technical skills. “Autonomous databases” might help analyze loads and create indexes, but they cannot understand business needs or push for table structure optimization, and this is something unlikely to be replaced by cloud services in the next two to three decades.
Whether it’s public cloud vendors, cloud-native/private clouds represented by Kubernetes, or local open-source RDS alternatives like Pigsty, their core value is to use software as much as possible, not manpower, to deal with system complexity. So, will cloud software revolutionize operations and DBA roles?
Cloud is not a maintenance-free outsourcing magic. According to the law of complexity conservation, the only way for the roles of system administrators or database administrators to disappear is for them to be rebranded as “DevOps Engineers” or SREs. Good cloud software can shield you from mundane operational tasks and solve 70% of routine issues, but there will always be complex problems that only humans can handle. You might need fewer people to manage these cloud services, but you still need people【12】. After all, you need knowledgeable individuals to coordinate and manage, so you don’t get exploited by cloud vendors.
In large organizations, a good DBA is crucial. However, excellent DBAs are quite rare and in high demand, leading to this role being outsourced in most organizations: either to professional database service companies or to cloud database RDS service teams. Organizations unable to find DBA providers must internally assign this responsibility to their development/operations staff, until the company grows large enough or suffers enough setbacks for some Dev/Ops to develop the necessary skills.
DBAs won’t become obsolete; they will just be monopolized by cloud vendors to provide services.
The Shadow of Monopoly
In the 2020s, the enemy of computing freedom is cloud software.
Beyond the “obsolescence of DBAs,” the emergence of the cloud harbors a larger threat. We should be concerned about a scenario where public clouds (or “Fruit Clouds”) grow dominant, controlling both hardware and operators up and down the stream, monopolizing computation, storage, networking, and top-tier expert resources to become the de facto standards. If all top-tier DBAs are poached by cloud vendors to provide centralized shared expert services, ordinary business organizations will completely lose the capability to utilize databases effectively, eventually left with no choice but to be “taxed” by public clouds. Ultimately, all IT resources would be concentrated in the hands of cloud vendors, who, by controlling a critical few, could control the entire internet. This is undeniably contrary to the original intent behind the creation of the internet.
Let me quote Martin Kleppmann:
In the 2020s, the enemy of freedom in computing is cloud software
i.e. software that runs primarily on the vendor’s servers, with all your data also stored on those servers. This cloud software may have a client-side component (a mobile app, or the JavaScript running in your web browser), but it only works in conjunction with the vendor’s server. And there are lots of problems with cloud software:
- If the company providing the cloud software goes out of business or decides to discontinue a product, the software stops working, and you are locked out of the documents and data you created with that software. This is an especially common problem with software made by a startup, which may get acquired by a bigger company that has no interest in continuing to maintain the startup’s product.
- Google and other cloud services may suddenly suspend your account with no warning and no recourse, for example if an automated system thinks you have violated its terms of service. Even if your own behaviour has been faultless, someone else may have hacked into your account and used it to send malware or phishing emails without your knowledge, triggering a terms of service violation. Thus, you could suddenly find yourself permanently locked out of every document you ever created on Google Docs or another app.
- With software that runs on your own computer, even if the software vendor goes bust, you can continue running it forever (in a VM/emulator if it’s no longer compatible with your OS, and assuming it doesn’t need to contact a server to check for a license check). For example, the Internet Archive has a collection of over 100,000 historical software titles that you can run in an emulator inside your web browser! In contrast, if cloud software gets shut down, there is no way for you to preserve it, because you never had a copy of the server-side software, neither as source code nor in compiled form.
- The 1990s problem of not being able to customise or extend software you use is aggravated further in cloud software. With closed-source software that runs on your own computer, at least someone could reverse-engineer the file format it uses to store its data, so that you could load it into alternative software (think pre-OOXML Microsoft Office file formats, or Photoshop files before the spec was published). With cloud software, not even that is possible, since the data is only stored in the cloud, not in files on your own computer.
If all software was free and open source, these problems would all be solved. However, making the source code available is not actually necessary to solve the problems with cloud software; even closed-source software avoids the aforementioned problems, as long as it is running on your own computer rather than the vendor’s cloud server. Note that the Internet Archive is able to keep historical software working without ever having its source code: for purposes of preservation, running the compiled machine code in an emulator is just fine. Maybe having the source code would make it a little easier, but it’s not crucial. The important thing is having a copy of the software at all.
My collaborators and I have previously argued for local-first software, which is a response to these problems with cloud software. Local-first software runs on your own computer, and stores its data on your local hard drive, while also retaining the convenience of cloud software, such as real-time collaboration and syncing your data across all of your devices. It is nice for local-first software to also be open source, but this is not necessary: 90% of its benefits apply equally to closed-source local-first software.
Cloud software, not closed-source software, is the real threat to software freedom, because the harm from being suddenly locked out of all of your data at the whim of a cloud provider is much greater than the harm from not being able to view and modify the source code of your software. For that reason, it is much more important and pressing that we make local-first software ubiquitous. If, in that process, we can also make more software open-source, then that would be nice, but that is less critical. Focus on the biggest and most urgent challenges first.
However, where there is action, there is reaction; local-first software began to emerge as a countermeasure to cloud software. For instance, the Cloud Native movement, represented by Kubernetes, is a prime example. “Cloud Native,” as interpreted by cloud vendors, means “software that is natively developed in a public cloud environment”; but its real significance should be “local,” as in the opposite of “Cloud” — “Local” cloud / private cloud / proprietary cloud / native cloud, the name doesn’t matter. What matters is that it can run anywhere the user desires (including on cloud servers), not just exclusively in public clouds!
Open-source projects, like Kubernetes, have democratized resource scheduling/smart operations capabilities previously unique to public clouds, enabling enterprises to run ‘cloud’-like capabilities locally. For stateless applications, it already serves as a sufficiently robust “cloud operating system” kernel. Open-source alternatives like Ceph/Minio offer S3 object storage solutions, leaving only one question unanswered: how to manage and deploy stateful, production-grade database services?
The era is calling for an open-source alternative to RDS.
Answer & Solution
Pigsty —— Battery-Included, Local-First PostgreSQL Distribution as an Open-Source RDS Alternative
I envision a future where everyone has the factual right to freely use superior services, not confined within the pens (Pigsty) provided by a few public cloud vendors, feeding on subpar offerings. This is why I created Pigsty — a better, open-source, free alternative to PostgreSQL RDS. It enables users to launch a database service better than cloud RDS with just one click, anywhere (including on cloud servers).
Pigsty is a comprehensive complement to PostgreSQL, and a spicy critique of cloud databases. Its name signifies “pigpen,” but it also stands for Postgres In Great STYle, symbolizing PostgreSQL at its peak. It is a solution distilled from best practices in managing and using PostgreSQL, entirely based on open source software and capable of running anywhere. Born from real-world, high-standard PostgreSQL clusters, it was developed to fulfill the database management needs of Tantan, performing valuable work across eight dimensions:
Observability is akin to heaven; as heaven maintains vigor through movement, a gentleman should constantly strive for self-improvement; Pigsty utilizes a modern observability tech stack to create an unparalleled monitoring system for PostgreSQL, offering a comprehensive overview from global dashboards to granular historical metrics for individual tables/indexes/functions, enabling users to see through the system and control everything. Additionally, Pigsty’s monitoring system can operate independently to monitor third-party database instances.
Controllability is akin to earth; as earth’s nature is broad and bearing, a gentleman should carry the world with broad virtue; Pigsty provides Database as Code capabilities: describing the state of database clusters through expressive declarative interfaces and employing idempotent scripts for deployment and adjustments. This allows users to customize finely without worrying about implementation details, freeing their mental capacity and lowering the barrier from expert to novice level in database operations and management.
Scalability is like water; as water flows and encompasses all, a gentleman should maintain virtue consistently; Pigsty offers pre-configured tuning templates (OLTP / OLAP / CRIT / TINY), automatically optimizes system parameters, and can infinitely scale read capabilities through cascading replication. It also utilizes Pgbouncer for connection pool optimization to handle massive concurrent connections; Pigsty ensures PostgreSQL’s performance is maximized under modern hardware conditions: achieving tens of thousands of concurrent connections, million-level single-point query QPS, and hundred thousand-level single transaction TPS.
Maintainability is like fire; as fire illuminates, a great person should illuminate the surroundings; Pigsty allows for online instance addition or removal for scaling, Switchover/rolling upgrades for scaling up or down, and offers a downtime-free migration solution based on logical replication, minimizing maintenance windows to sub-second levels, thus enhancing the system’s evolvability, availability, and maintainability to a new standard.
Security is like thunder; as thunder signifies awe, a gentleman should reflect and be cautious; Pigsty offers an access control model following the principle of least privilege, along with various security features: synchronous commit for replication to prevent data loss, data directory checksums to prevent corruption, SSL encryption for network traffic to prevent eavesdropping, and AES-256 for remote backups to prevent data leaks. As long as physical hardware and passwords are secure, users need not worry about database security.
Simplicity is like wind; as wind follows its path, a gentleman should decree and act accordingly; Using Pigsty is no more difficult than any cloud database. It aims to deliver complete RDS functionality with the least complexity, allowing users to choose and combine modules as needed. Pigsty offers a Vagrant-based local development testing sandbox and Terraform cloud IaC for one-click deployment templates, enabling offline installation on any new EL node and complete environment replication.
Reliability is like a mountain; as a mountain stands firm, a gentleman should be steadfast in thought; Pigsty provides a high-availability architecture with self-healing capabilities to address hardware issues and offers out-of-the-box PITR for recovery from accidental data deletion and software flaws, verified through long-term, large-scale production environment operation and high-availability drills.
Extensibility is like a lake; as a lake reflects beauty, a gentleman should discuss and practice with friends; Pigsty deeply integrates core PostgreSQL ecosystem extensions like PostGIS, TimescaleDB, Citus, PGVector, and numerous extension plugins. It offers a modular design of the Prometheus/Grafana observability tech stack, and high-availability deployment of MINIO, ETCD, Redis, Greenplum, etc., in combination with PostgreSQL.
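The "Database as Code" idea in the Controllability item above (declare the desired state, then converge to it with idempotent operations) can be illustrated with a toy reconciliation loop. This is a conceptual sketch only and not Pigsty's implementation; the function names and the in-memory model of a cluster are hypothetical.

```python
def plan(desired, actual):
    """Compare declared state with observed state and list the actions
    needed to converge. Both are dicts of {database: set(extensions)}."""
    actions = []
    for db, extensions in sorted(desired.items()):
        if db not in actual:
            actions.append(("create_database", db))
        for ext in sorted(extensions - actual.get(db, set())):
            actions.append(("create_extension", db, ext))
    return actions

def converge(actions, actual):
    """Apply actions to an in-memory model (a stand-in for running real DDL)."""
    for action in actions:
        if action[0] == "create_database":
            actual.setdefault(action[1], set())
        elif action[0] == "create_extension":
            actual.setdefault(action[1], set()).add(action[2])
    return actual

desired = {"app": {"postgis"}}   # what the config file declares
actual = {}                      # what currently exists

first = plan(desired, actual)
converge(first, actual)
second = plan(desired, actual)   # re-running finds nothing left to do

print(first)
print(second)
```

The point of the design is that the second run produces an empty plan: re-applying the same declaration is safe, which is what lets users edit a config file and re-run a playbook instead of issuing one-off commands.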
More importantly, Pigsty is entirely open-source and free software, licensed under AGPL v3.0. Powered by passion, you can run a fully functional, even superior RDS service at the cost of mere hardware expenses per month. Whether you are a beginner or a seasoned DBA, managing a massive cluster or a small setup, whether you’re already using RDS or have set up databases locally, if you are a PostgreSQL user, Pigsty will be beneficial to you, completely free. You can focus on the most interesting or valuable parts of your business and leave the routine tasks to the software.
RDS Cost and Scale Cost Curve
Pigsty allows you to practice the ultimate FinOps principle: running production-level PostgreSQL RDS database services anywhere (ECS, resource cloud, data center servers, even virtual machines on a local laptop) at prices close to pure resource costs. It turns the capability cost of cloud databases from a marginal cost proportional to resources into a fixed learning cost that is virtually zero.
If you can use a better RDS service at a fraction of the cost, then continuing to use cloud databases is truly just a tax on your intellect.
Reference
【1】Why we’re leaving the cloud
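【2】After ten years of being “ripped off” in the cloud, they finally gave up: is the first wave of “cloud exit” coming this winter?
【3】Aliyun RDS for PostgreSQL Pricing
【5】AWS Pricing Calculator (Ningxia, China)
【8】Why Are You Still Hiring DBAs?
【11】Pigsty: A Me-Better RDS PostgreSQL Alternative
【12】Pigsty v2 Released: A Better Open-Source RDS PG Alternative
【13】It’s time to say goodbye to the GPL
【14】Are Cloud Databases an Idiot Tax?
【16】Riding the Hype: Do You Need DBAs and Cloud Databases?
【17】Why Aren’t You Hiring DBAs?
【18】Is DBA Still a Good Job?
【19】Cloud RDS: From Deleting Databases to Running Away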
Postgres
Self-Hosting Dify with PG, PGVector, and Pigsty
Dify – The Innovation Engine for GenAI Applications
Dify is an open-source LLM app development platform that orchestrates LLM apps, from agents to complex AI workflows, with a RAG engine. It claims to be more production-ready than LangChain.
Of course, workflow orchestration software like this needs a database underneath: Dify uses PostgreSQL for metadata storage, as well as Redis for caching and a dedicated vector database. You can pull the Docker images and play locally, but for production deployment this setup won’t suffice: there’s no HA, backup, PITR, monitoring, and many other things.
Fortunately, Pigsty provides a battery-included, production-grade, highly available PostgreSQL cluster, along with the Redis and S3 (MinIO) capabilities that Dify needs, as well as Nginx to expose the Web service, making it the perfect companion for Dify.
Offload the stateful part to Pigsty, and you only need to pull up the stateless blue circle part with a simple docker compose up.
BTW, I have to criticize the design of the Dify template. Since the metadata is already stored in PostgreSQL, why not add pgvector to use it as a vector database? What’s even more baffling is that pgvector is a separate image and container. Why not just use a PG image with pgvector included?
Dify “supports” a bunch of flashy vector databases, but since PostgreSQL is already chosen, using pgvector as the default vector database is the natural choice. Similarly, I think the Dify team should consider removing Redis. Celery task queues can use PostgreSQL as backend storage, so having multiple databases is unnecessary. Entities should not be multiplied without necessity.
Therefore, the Pigsty-provided Dify Docker Compose template has made some adjustments to the official example. It removes the db and redis database images, using instances managed by Pigsty. The vector database is fixed to use pgvector, reusing the same PostgreSQL instance.
In the end, the architecture is simplified to three stateless containers: dify-api, dify-web, and dify-worker, which can be created and destroyed at will. There are also two optional containers, ssrf_proxy and nginx, for providing proxy and some security features.
There’s a bit of state management left with file system volumes, storing things like private keys. Regular backups are sufficient.
Pigsty Preparation
Let’s take the single-node installation of Pigsty as an example. Suppose you have a machine with the IP address 10.10.10.10 and Pigsty already installed.
We need to define the database clusters required in the Pigsty configuration file pigsty.yml.
Here, we define a cluster named pg-meta, which includes a superuser named dbuser_dify (the implementation is a bit rough, as the migration script executes CREATE EXTENSION, which requires dbsu privilege for now), a database named dify with the pgvector extension installed, and a specific HBA (host-based authentication) rule allowing users to access the database from anywhere using a password (you can also restrict it to a more precise range, such as the Docker subnet 172.0.0.0/8).
Additionally, a standard single-instance Redis cluster redis-dify with the password redis.dify is defined.
pg-meta:
  hosts: { 10.10.10.10: { pg_seq: 1, pg_role: primary } }
  vars:
    pg_cluster: pg-meta
    pg_users: [ { name: dbuser_dify ,password: DBUser.Dify ,superuser: true ,pgbouncer: true ,roles: [ dbrole_admin ] } ]
    pg_databases: [ { name: dify, owner: dbuser_dify, extensions: [ { name: pgvector } ] } ]
    pg_hba_rules: [ { user: dbuser_dify , db: all ,addr: world ,auth: pwd ,title: 'allow dify user world pwd access' } ]

redis-dify:
  hosts: { 10.10.10.10: { redis_node: 1 , redis_instances: { 6379: { } } } }
  vars: { redis_cluster: redis-dify ,redis_password: 'redis.dify' ,redis_max_memory: 64MB }
For demonstration purposes, we use single-instance configurations. You can refer to the Pigsty documentation to deploy high availability PG and Redis clusters. After defining the clusters, use the following commands to create the PG and Redis clusters:
bin/pgsql-add pg-meta # create the dify database cluster
bin/redis-add redis-dify # create redis cluster
Alternatively, you can define a new business user and business database on an existing PostgreSQL cluster, such as pg-meta, and create them with the following commands:
bin/pgsql-user pg-meta dbuser_dify # create dify biz user
bin/pgsql-db pg-meta dify # create dify biz database
You should be able to access PostgreSQL and Redis with the following connection strings, adjusting the connection information as needed:
psql postgres://dbuser_dify:DBUser.Dify@10.10.10.10:5432/dify -c 'SELECT 1'
redis-cli -u redis://redis.dify@10.10.10.10:6379/0 ping
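If you change the user, password, or host in the cluster definitions, the connection strings have to change with them. Here is a minimal Python sketch that derives both URIs from those values and percent-encodes the credentials; the helper names are hypothetical, and the Redis form follows the password-only layout used with redis-cli -u here (some client libraries expect a different layout, so check yours).

```python
from urllib.parse import quote

def pg_url(user, password, host, port, database):
    """Build a PostgreSQL connection URI, percent-encoding the credentials."""
    return (f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{database}")

def redis_url(password, host, port, db=0):
    """Build a Redis URI that carries only a password (no username)."""
    return f"redis://{quote(password, safe='')}@{host}:{port}/{db}"

# Values taken from the pg-meta and redis-dify cluster definitions.
print(pg_url("dbuser_dify", "DBUser.Dify", "10.10.10.10", 5432, "dify"))
print(redis_url("redis.dify", "10.10.10.10", 6379))
```

Percent-encoding matters once a password contains characters such as @, : or /, which would otherwise be read as URI delimiters.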
Once you confirm these connection strings are working, you’re all set to start deploying Dify.
For demonstration purposes, we’re using direct IP connections. For a multi-node high availability PG cluster, please refer to the service access section.
The above assumes you are already a Pigsty user familiar with deploying PostgreSQL and Redis clusters. You can skip the next section and proceed to see how to configure Dify.
Starting from Scratch
If you’re already familiar with setting up Pigsty, feel free to skip this section.
Prepare a fresh Linux x86_64 node that runs a compatible OS, then run the following as a sudo-able user:
curl -fsSL https://repo.pigsty.io/get | bash
It will download the Pigsty source to your home directory; then run configure and install to finish the installation.
cd ~/pigsty # enter the pigsty source dir
./bootstrap # download bootstrap pkgs & ansible [optional]
./configure # pre-check and config templating [optional]
# change pigsty.yml, adding those cluster definitions above into all.children
./install.yml # install pigsty according to pigsty.yml
You should insert the above PostgreSQL cluster and Redis cluster definitions into the pigsty.yml file, then run install.yml to complete the installation.
Redis Deploy
Pigsty does not deploy Redis in install.yml, so you have to run the redis.yml playbook to install Redis explicitly:
./redis.yml
Docker Deploy
Pigsty does not deploy Docker by default, so you need to install Docker with the docker.yml playbook.
./docker.yml
Dify Configuration
You can configure Dify in the .env file.
All parameters are self-explanatory and filled in with default values that work directly in the Pigsty sandbox environment. Fill in the database connection information according to your actual configuration, consistent with the PG/Redis cluster configuration above.
Changing the SECRET_KEY field is recommended. You can generate a strong key with openssl rand -base64 42:
# meta parameter
DIFY_PORT=8001 # expose dify nginx service with port 8001 by default
LOG_LEVEL=INFO # The log level for the application. Supported values are `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`
SECRET_KEY=sk-9f73s3ljTXVcMT3Blb3ljTqtsKiGHXVcMT3BlbkFJLK7U # A secret key for signing and encryption, gen with `openssl rand -base64 42`
# postgres credential
PG_USERNAME=dbuser_dify
PG_PASSWORD=DBUser.Dify
PG_HOST=10.10.10.10
PG_PORT=5432
PG_DATABASE=dify
# redis credential
REDIS_HOST=10.10.10.10
REDIS_PORT=6379
REDIS_USERNAME=''
REDIS_PASSWORD=redis.dify
# minio/s3 [OPTIONAL] when STORAGE_TYPE=s3
STORAGE_TYPE=local
S3_ENDPOINT='https://sss.pigsty'
S3_BUCKET_NAME='infra'
S3_ACCESS_KEY='dba'
S3_SECRET_KEY='S3User.DBA'
S3_REGION='us-east-1'
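Before starting the containers, it can help to confirm that the .env file actually contains the keys you expect and to generate a fresh SECRET_KEY. The sketch below is a simplified stand-in: it is not how Docker Compose itself parses the file, the helper names are hypothetical, and the sample deliberately omits PG_PASSWORD to show the check firing.

```python
import base64
import os
import shlex

def generate_secret_key(num_bytes=42):
    """Rough equivalent of `openssl rand -base64 42`: random bytes, base64-encoded."""
    return base64.b64encode(os.urandom(num_bytes)).decode("ascii")

def parse_env(text):
    """Parse simple KEY=VALUE lines, honoring quotes and dropping # comments."""
    env = {}
    for line in text.splitlines():
        # shlex removes surrounding quotes and strips trailing "# ..." comments.
        tokens = shlex.split(line, comments=True)
        if not tokens or "=" not in tokens[0]:
            continue
        key, _, value = tokens[0].partition("=")
        env[key] = value
    return env

sample = """\
# postgres credential
PG_USERNAME=dbuser_dify
PG_HOST=10.10.10.10
PG_PORT=5432      # default PostgreSQL port
REDIS_USERNAME=''
S3_ENDPOINT='https://sss.pigsty'
"""

env = parse_env(sample)
required = ["PG_USERNAME", "PG_PASSWORD", "PG_HOST"]
missing = [key for key in required if key not in env]

print(missing)                     # keys that still need to be filled in
print(len(generate_secret_key()))  # 42 random bytes encode to 56 characters
```

Note that this parser treats an unquoted # as the start of a comment, so values containing # must be quoted.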
Now we can pull up dify with docker compose:
cd pigsty/app/dify && make up
Expose Dify Service via Nginx
Dify exposes its web/API via its own Nginx on port 80 by default, while Pigsty uses port 80 for its own Nginx. Therefore, we expose Dify via port 8001 by default, and use Pigsty’s Nginx to forward to this port.
Change infra_portal in pigsty.yml, adding the new dify line:
infra_portal:                     # domain names and upstream servers
  home         : { domain: h.pigsty }
  grafana      : { domain: g.pigsty ,endpoint: "${admin_ip}:3000" , websocket: true }
  prometheus   : { domain: p.pigsty ,endpoint: "${admin_ip}:9090" }
  alertmanager : { domain: a.pigsty ,endpoint: "${admin_ip}:9093" }
  blackbox     : { endpoint: "${admin_ip}:9115" }
  loki         : { endpoint: "${admin_ip}:3100" }
  dify         : { domain: dify.pigsty ,endpoint: "10.10.10.10:8001", websocket: true }
Then expose dify web service via Pigsty’s Nginx server:
./infra.yml -t nginx
Don’t forget to add dify.pigsty to your DNS or local /etc/hosts (C:\Windows\System32\drivers\etc\hosts on Windows) to access it via the domain name.
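For a local sandbox without DNS, the hosts-file entry can be derived from the same infra_portal definition. This is a small illustrative sketch: the helper name is hypothetical, the dictionary is a hand-copied subset of the portal entries, and it assumes every domain should resolve to the single node 10.10.10.10.

```python
def hosts_line(ip, portal):
    """Build one hosts-file line mapping every portal domain to `ip`.
    Entries without a `domain` key (e.g. blackbox, loki) are skipped."""
    domains = [entry["domain"] for entry in portal.values() if "domain" in entry]
    return f"{ip} {' '.join(domains)}"

# Hand-copied subset of infra_portal, expressed as a Python dict.
infra_portal = {
    "home": {"domain": "h.pigsty"},
    "grafana": {"domain": "g.pigsty", "endpoint": "${admin_ip}:3000", "websocket": True},
    "blackbox": {"endpoint": "${admin_ip}:9115"},
    "dify": {"domain": "dify.pigsty", "endpoint": "10.10.10.10:8001", "websocket": True},
}

print(hosts_line("10.10.10.10", infra_portal))
```

Append the printed line to the hosts file on the machine running your browser, not only on the Pigsty node.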
PGCon.Dev 2024, The conf that shutdown PG for a week
PGCon.Dev, once known as PGCon—the annual must-attend gathering for PostgreSQL hackers and key forum for its future direction, has been held in Ottawa since its inception in 2007.
This year marks a new chapter as the original organizer, Dan, hands over the reins to a new team, and the event moves to SFU’s Harbour Centre in Vancouver, kicking off a new era with grandeur.
How engaging was this event? Peter Eisentraut, member of the PostgreSQL core team, noted that during PGCon.Dev there were no code commits to PostgreSQL, resulting in the longest pause in twenty years: a whopping week, a historic coding ceasefire! Why? Because all the developers were at the conference! The last few interruptions of this kind occurred in the early days of the project, twenty years ago.
I’ve been embracing PostgreSQL for a decade, but attending a global PG Hacker conference in person was a first for me, and I’m immensely grateful for the organizer’s efforts. PGCon.Dev 2024 wrapped up on May 31st, though this post comes a bit delayed as I’ve been exploring Vancouver and Banff National Park ;)
Day Zero: Extension Summit
Day zero is for leadership meetings, and I’ve signed up for the afternoon’s Extension Ecosystem Summit.
Maybe this summit is somewhat subtly related to my recent post, “Postgres is eating the database world,” highlighting PostgreSQL’s thriving extension ecosystem as a unique and critical success factor and drawing the community’s attention.
I participated in David Wheeler’s Binary Packaging session along with other PostgreSQL community leaders, where current RPM/APT maintainers showed some hesitation toward new standards like PGXN v2. In the latter half of the summit, I attended a session led by Yurii Rashkovskii, discussing extension directory structures, metadata, naming conflicts, version control, and binary distribution ideas.
Prior to this summit, the PostgreSQL community had held six mini-summits discussing these topics intensely, with visions for the extension ecosystem’s future development shared by various speakers. Recordings of these sessions are available on YouTube.
After the summit, I had a chance to chat with Devrim, the RPM maintainer, about extension packaging, which was quite enlightening.
“Keith Fan Group” – from Devrim on Extension Summit
Day One: Brilliant Talks and Bar Social
The core of PGCon.Dev lies in its sessions. Unlike some China domestic conferences with mundane product pitches or irrelevant tech details, PGCon.Dev presentations are genuinely engaging and substantive. The official program kicked off on May 29th, after a day of closed-door leadership meetings and the Ecosystem Summit on the 28th.
The opening was co-hosted by Jonathan Katz, one of the seven PostgreSQL core team members and a chief product manager at AWS RDS, and Melanie Plageman, a recent PG committer from Microsoft. A highlight was when Andres Freund, the developer who uncovered the famous xz backdoor, was celebrated as a superhero on stage.
Following the opening, the regular session tracks began. Although conference videos aren’t out yet, I’m confident they’ll “soon” be available on YouTube. Most sessions had three tracks running simultaneously; here are some highlights I chose to attend.
Pushing the Boundaries of PG Extensions
Yurii’s talk, “Pushing the Boundaries of PG Extensions,” tackled what kind of extension APIs PostgreSQL should offer. PostgreSQL boasts robust extensibility, but the current extension API set is decades old, from the 9.x era. Yurii’s proposal aims to address issues with the existing extension mechanisms. Challenges such as installing multiple versions of an extension simultaneously, avoiding database restarts post-extension installations, managing extensions as seamlessly as data, and handling dependencies among extensions were discussed.
Yurii and Viggy, founders of Omnigres, aim to transform PostgreSQL into a full-fledged application development platform, including hosting HTTP servers directly within the database. They designed a new extension API and management system for PostgreSQL to achieve this. Their innovative improvements represent the forefront of exploration into PostgreSQL’s core extension mechanisms.
I had a great conversation with Viggy and Yurii. Yurii walked me through compiling and installing Omni. I plan to support the Omni extension series in the next version of Pigsty, making this powerful application development framework plug-and-play.
Anarchy in DBMS
Abigale Kim from CMU, under the mentorship of celebrity professor Andy Pavlo, delivered the talk “Anarchy in the Database—A Survey and Evaluation of DBMS Extensibility.” This topic intrigued me since Pigsty’s primary value proposition is about PostgreSQL’s extensibility.
Kim’s research revealed interesting insights: PostgreSQL is the most extensible DBMS, supporting 9 out of 10 extensibility points, closely followed by DuckDB. With 375+ available extensions, PostgreSQL significantly outpaces other databases.
Kim’s quantitative analysis of compatibility levels among these extensions resulted in a compatibility matrix, unveiling conflicts—most notably, powerful extensions like TimescaleDB and Citus are prone to clashes. This information is very valuable for users and distribution maintainers. Read the detailed study.
I joked with Kim that — now I could brag about PostgreSQL’s extensibility with her research data.
How PostgreSQL is Misused and Abused
The first-afternoon session featured Karen Jex from CrunchyData, an unusual perspective from a user — and a female DBA. Karen shared common blunders by PostgreSQL beginners. While I knew all of what was discussed, it reaffirmed that beginners worldwide make similar mistakes — an enlightening perspective for PG Hackers, who found the session quite engaging.
PostgreSQL and the AI Ecosystem
The second-afternoon session by Bruce Momjian, co-founder of the PGDG and a core committee member from the start, was unexpectedly about using PostgreSQL’s multi-dimensional arrays and queries to implement neural network inference and training.
Haha, some ArgParser code. I see it
During the lunch, Bruce explained that Jonathan Katz needed a topic to introduce the vector database extension PGVector in the PostgreSQL ecosystem, so Bruce was roped in to “fill the gap.” Check out Bruce’s presentation.
PB-Level PostgreSQL Deployments
The third afternoon session by Chris Travers discussed their transition from using ElasticSearch for data storage—with a poor experience and high maintenance for 1PB over 30 days retention, to a horizontally scaled PostgreSQL cluster perfectly handling 10PB of data. Normally, PostgreSQL comfort levels on a single machine range from several dozen to a few hundred TB. Deployments at the PB scale, especially at 10PB, even within a horizontally scaled cluster, are exceptionally rare. While the practice itself is standard—partitioning and sharding—the scale of data managed is truly impressive.
Highlight: When Hardware and Database Collide
Undoubtedly, the standout presentation of the event, Margo Seltzer’s talk “When Hardware and Database Collide” was not only the most passionate and compelling talk I’ve attended live but also a highlight across all conferences.
Professor Margo Seltzer, formerly of Harvard and now at UBC, a member of the National Academy of Engineering and the creator of BerkeleyDB, delivered a powerful discourse on the core challenges facing databases today. She pinpointed that the bottleneck for databases has shifted from disk I/O to main memory speed. Emerging hardware technologies like HBM and CXL could be the solution, posing new challenges for PostgreSQL hackers to tackle.
This was a refreshing divergence from China’s typically monotonous academic talks, leaving a profound impact and inspiration. Once the conference video is released, I highly recommend checking out her energizing presentation.
WetBar Social
Following Margo’s session, the official Social Event took place at Rogue Kitchen & Wetbar, just a street away from the venue at Waterfront Station, boasting views of the Pacific and iconic Vancouver landmarks.
The informal setting was perfect for engaging with new and old peers. Conversations with notable figures like Devrim, Tomasz, Yurii, and Keith were particularly enriching. As an RPM maintainer, I had an extensive and fruitful discussion with Devrim, resolving many longstanding queries.
The atmosphere was warm and familiar, with many reconnecting after long periods. A couple of beers in, conversations flowed even more freely among fellow PostgreSQL enthusiasts. The event concluded with an invitation from Melanie for a board game session, which I regretfully declined due to my limited English in such interactive settings.
Day 2: Debate, Lunch, and Lightning Talks
Multi-Threading Postgres
The warmth from the previous night’s socializing carried over into the next day, marked by the eagerly anticipated session on “Multi-threaded PostgreSQL,” which was packed to capacity. The discussion, initiated by Heikki, centered on the pros and cons of PostgreSQL’s process and threading models, along with detailed implementation plans and current progress.
The threading model promises numerous benefits: cheaper connections (akin to a built-in connection pool), shared relation and plan caches, dynamic adjustment of shared memory, config changes without restarts, more aggressive Vacuum operations, runtime Explain Analyze, and easier memory usage limits per connection. However, there’s significant opposition, maybe led by Tom Lane, concerned about potential bugs, loss of isolation benefits from the multi-process model, and extensive incompatibilities requiring many extensions to be rewritten.
Heikki laid out a detailed plan to transition to the threading model over five to seven years, aiming for a seamless shift without intermediate states. Intriguingly, he cited Tom Lane’s critical comment in his presentation:
For the record, I think this will be a disaster. There is far too much code that will get broken, largely silently, and much of it is not under our control. – regards, tom lane
Although Tom Lane smiled benignly without voicing any objections, the strongest dissent at the conference came not from him but from an extension maintainer. The elder developer, who maintained several extensions, raised concerns about compatibility, specifically regarding memory allocation and usage. Heikki suggested that extension authors should adapt their work to a new model during a transition grace period of about five years. This suggestion visibly upset the maintainer, who left the meeting in anger.
Given the proposed threading model’s significant impact on the existing extension ecosystem, I’m skeptical about this change. At the conference, I consulted on the threading model with Heikki, Tom Lane, and other hackers. The community’s overall stance is one of curious & cautious observation. So far, the only progress is in PG 17, where the fork-exec-related code has been refactored and global variables marked for future modifications. Any real implementation would likely not occur until at least PG 20+.
Hallway Track
The sessions on the second day were slightly less intense than the first, so many attendees chose the “Hallway Track”—engaging in conversations in the corridors and lobby. I’m usually not great at networking as an introvert, but the vibrant atmosphere quickly drew me in. Eye contact alone was enough to spark conversations, like triggering NPC dialogue in an RPG. I also managed to subtly promote Pigsty to every corner of the PG community.
Despite being a first-timer at PGCon.Dev, I was surprised by the recognition and attention I received, largely thanks to the widely read article, “PostgreSQL is eating the Database world.” Many recognized me by my badge Vonng / Pigsty.
A simple yet effective networking trick is never to underestimate small gifts’ effect. I handed out gold-plated Slonik pins, PostgreSQL’s mascot, which became a coveted item at the conference. Everyone who talked with me received one, and those who didn’t have one were left asking where to get one. LOL
Anyway, I’m glad to have made many new friends and connections.
Multinational Community Lunch
As for lunch, HighGo hosted key participants from the American, European, Japanese, and Chinese PostgreSQL communities at a Cantonese restaurant in Vancouver. The conversation ranged from serious technical discussions to lighter topics. I made the acquaintance of Tatsuro Yamada, who gave the talk “Advice is seldom welcome but efficacious”, and Kyotaro Horiguchi, a core contributor to PostgreSQL known for his work on WAL replication and multibyte string processing and the author of pg_hint_plan.
Another major contributor to the PostgreSQL community, Mark Wong organizes PGUS and has developed a series of PostgreSQL monitoring extensions. He also manages community merchandise like contributor coins, shirts, and stickers. He even handcrafted a charming yarn elephant mascot, which was so beloved that one was sneakily “borrowed” at the last PG Conf US.
Bruce, already a familiar face in the PG Chinese community, Andreas Scherbaum from Germany, organizer of the European PG conferences, and Miao Jian, founder of HighGo, representing the only Chinese database company at PGCon.Dev, all shared insightful stories and discussions about the challenges and nuances of developing databases in their respective regions.
On returning to the conference venue, I had a conversation with Jan Wieck, a PostgreSQL Hackers Emeritus. He shared his story of participating in the PostgreSQL project from the early days and encouraged me to get more involved in the PostgreSQL community, reminding me its future depends on the younger generation.
Making PG Hacking More Inclusive
At PGCon.Dev, a special session on community building chaired by Robert Haas featured three new PostgreSQL contributors sharing their journeys and challenges, notably the barriers for non-native English speakers, timezone differences, and emotionally charged email communications.
Robert emphasized in a post-conference blog his desire to see more developers from India and Japan rise to senior positions within PostgreSQL’s ranks, noting the underrepresentation from these countries despite their significant developer communities.
While we’re at it, I’d really like to see more people from India and Japan in senior positions within the project. We have very large developer communities from both countries, but there is no one from either of those countries on the core team, and they’re also underrepresented in other senior positions. At the risk of picking specific examples to illustrate a general point, there is no one from either country on the infrastructure team or the code of conduct committee. We do have a few committers from those countries, which is very good, and I was pleased to see Amit Kapila on the 2024.pgconf.dev organizing commitee, but, overall, I think we are still not where we should be. Part of getting people involved is making them feel like they are not alone, and part of it is also making them feel like progression is possible. Let’s try harder to do that.
Frankly, the lack of mention of China in discussions about inclusivity at PGCon.Dev, in favor of India and Japan, left a bittersweet taste. But I think China deserves the snub, given its poor international community engagement.
China has hundreds of “domestic/national” databases, many mere forks of PostgreSQL, yet the only notable Chinese contributor to PostgreSQL is Richard Guo from PieCloudDB, recently promoted to PG Committer. At the conference, the Chinese presence was minimal, totaling five attendees, including myself. It’s regrettable that China’s understanding and adoption of PostgreSQL lag behind the global standard by about 10-15 years.
I hope my involvement can bootstrap and enhance Chinese participation in the global PostgreSQL ecosystem, making their users, developers, products, and open-source projects more recognized and accepted worldwide.
Lightning Talks
Yesterday’s event closed with a series of lightning talks—5 minutes max per speaker, or you’re out. Concise and punchy, the session wrapped up 11 topics in just 45 minutes. Keith shared improvements to PG Monitor, and Peter Eisentraut discussed SQL standard updates. But from my perspective, the highlight was Devrim Gündüz’s talk on PG RPMs, which lived up to his promise of a “big reveal” made at the bar the previous night, packing a 75-slide presentation into 5 lively minutes.
Speaking of packaging: although PostgreSQL is open-source, most users rely on official pre-compiled binary packages rather than building from source. I maintain 34 RPM extensions for Pigsty, my Postgres distribution, but much of the ecosystem, including over a hundred other extensions, is managed by Devrim in the official PGDG repo. His efforts ensure quality for the world’s most advanced and popular database.
Devrim is a fascinating character: a Turkish native living in London, a part-time DJ, and the maintainer of the PGDG RPM repository, sporting a PostgreSQL logo tattoo. After an engaging chat about the PGDG repository, he shared insights on how extensions are added, highlighting the community-driven nature of PGXN and recent popular additions like pgvector (which I suggested, haha).
Interestingly, with the latest Pigsty v2.7 release, four of the extensions I package (pgsql-http, pgsql-gzip, pg_net, pg_bigm) were adopted into the PGDG official repository. Devrim admitted to scouring Pigsty’s extension list for good picks, though he humorously dismissed any hopes for my Rust pgrx extensions making the cut, reaffirming his commitment to not blending Go and Rust plugins into the official repository. Our conversation was so enriching that I’ve committed myself to becoming a “PG Extension Hunter,” scouting and recommending new plugins for official inclusion.
Day 3: Unconference
One of the highlights of PGCon.Dev is the Unconference, a self-organized meeting with no predefined agenda, driven by attendee-proposed topics. On day three, Joseph Conway facilitated the session where anyone could pitch topics for discussion, which were then voted on by participants. My proposal for a Built-in Prometheus Metrics Exporter was merged into a broader Observability topic spearheaded by Jeremy.
The top-voted topics were Multithreading (42 votes), Observability (35 votes), and Enhanced Community Engagement (35 votes). Observability features were a major focus, reflecting the community’s priority. I proposed integrating a contrib monitoring extension in PostgreSQL to directly expose metrics via HTTP endpoint, using pg_exporter as a blueprint but embedded to overcome the limitations of external components, especially during crash recovery scenarios.
There’s a clear focus on observability among the community. As the author of pg_exporter, I proposed developing a first-party monitoring extension. This extension would integrate Prometheus monitoring endpoints directly into PostgreSQL, exposing metrics via HTTP without needing external components.
The rationale for this proposal is straightforward. While pg_exporter works well, it’s an external component that adds management complexity. Additionally, in scenarios where PostgreSQL is recovering from a crash and cannot accept new connections, external tools struggle to access internal states. An in-kernel extension could seamlessly capture this information.
The suggested implementation involves a background worker process similar to the bgw_replstatus extension. This process would listen on an additional port to expose monitoring metrics through HTTP, using pg_exporter as a blueprint. Metrics would primarily be defined via a Collector configuration table, except for a few critical system indicators.
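To make the idea concrete, here is a minimal Python sketch of the shape of such an endpoint. It is not the proposed extension (which would be a background worker inside PostgreSQL) and not pg_exporter's code. The `COLLECTORS` list is a made-up stand-in for the Collector configuration table, the sample values are invented, and the port in the usage note is an arbitrary choice. What it does show accurately is the Prometheus text exposition format served over plain HTTP.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the "Collector configuration table": each entry
# names a metric, its type, a help string, and a callable returning
# (labels, value) samples. The values here are invented for illustration.
COLLECTORS = [
    {
        "name": "pg_up",
        "type": "gauge",
        "help": "Whether the server is reachable (1) or not (0).",
        "collect": lambda: [({}, 1)],
    },
    {
        "name": "pg_backends",
        "type": "gauge",
        "help": "Backends per database (sample values).",
        "collect": lambda: [({"datname": "postgres"}, 3), ({"datname": "dify"}, 5)],
    },
]


def render_metrics(collectors) -> str:
    """Render all collectors in the Prometheus text exposition format."""
    lines = []
    for c in collectors:
        lines.append(f"# HELP {c['name']} {c['help']}")
        lines.append(f"# TYPE {c['name']} {c['type']}")
        for labels, value in c["collect"]():
            label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            series = f"{c['name']}{{{label_text}}}" if label_text else c["name"]
            lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the rendered metrics on GET /metrics and 404 elsewhere."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COLLECTORS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 9630) -> None:
    """Block forever, serving metrics on the given (arbitrary) port."""
    HTTPServer((host, port), MetricsHandler).serve_forever()
```

Calling `serve()` and then fetching `http://127.0.0.1:9630/metrics` returns the rendered text. In the actual proposal, the interesting part is that the collecting side lives inside the server and can therefore still report state while new connections are refused.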
This idea garnered attention from several PostgreSQL hackers at the event. Developers from EDB and CloudNativePG are evaluating whether pg_exporter could be directly integrated into their distributions as part of their monitoring solutions. And finally, an Observability Special Interest Group (SIG) was formed by attendees interested in observability, planning to continue discussions through a mailing list.
Issue: Support for LoongArch Architecture
During the last two days, I have had some discussions with PG Hackers about some Chinese-specific issues.
A notable suggestion was supporting the LoongArch architecture in the PGDG global repository, which was enthusiastically backed by some local chip and OS manufacturers. Despite the interest, Devrim indicated a “No” due to the lack of support for LoongArch in the OS distros used in the PG community, like CentOS 7, Rocky 8/9, and Debian 10/11/12. Tomasz Rybak was more receptive, noting potential future support if LoongArch runs on Debian 13.
In summary, official PG RPMs might not yet support LoongArch, but APT has a chance, contingent on broader OS support for mainstream open-source Linux distributions.
Issue: Server-side Chinese Character Encoding
At the recent conference, Jeremy Schneider presented an insightful talk on collation rules that resonated with me. He highlighted the pitfalls of not using C.UTF8 for collation, a practice I’ve advocated for based on my own research, and which is detailed in his presentation here.
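A quick way to see what C / C.UTF8 ordering means in practice is the sketch below. It is my own illustration, not taken from Jeremy's talk, and the word list is arbitrary. Strings are compared by their encoded bytes, which for UTF-8 gives the same order as comparing Unicode code points, so the result does not depend on any OS locale library.

```python
words = ["banana", "Apple", "apple", "Zebra", "élan"]

# Python compares str values by Unicode code point.
by_code_point = sorted(words)

# Byte-wise comparison of the UTF-8 encoding, i.e. what a memcmp-style
# "C" collation does, yields exactly the same order.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

if __name__ == "__main__":
    print(by_code_point)
    print(by_code_point == by_utf8_bytes)
```

All uppercase ASCII letters sort before lowercase ones, and accented letters come last. That is not "dictionary" order, but it is stable across machines and library upgrades, which is the property that matters for indexes.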
Post-talk, I discussed further with Jeremy and Peter Eisentraut the nuances of character sets in China, especially the challenges posed by the mandatory GB18030 standard, which PostgreSQL can handle on the client side but not the server side. There are also some issues with 20 Chinese characters not working in the convert_to + gb18030 encoding mapping.
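For readers unfamiliar with GB18030, the sketch below shows the kind of conversion involved, using Python's built-in `gb18030` codec rather than PostgreSQL's `convert_to`. It does not reproduce the problematic characters mentioned above, which I have not listed here. The helper name `to_gb18030` and the sample strings are my own.

```python
def to_gb18030(text: str) -> bytes:
    """Encode text as GB18030, roughly what convert_to(text, 'GB18030') returns."""
    return text.encode("gb18030")


# Common Han characters take 2 bytes, ASCII takes 1, and characters outside
# the Basic Multilingual Plane (here U+20000) take 4 bytes.
samples = ["中文", "PostgreSQL", "\U00020000"]

if __name__ == "__main__":
    for s in samples:
        encoded = to_gb18030(s)
        # GB18030 covers all of Unicode, so the round trip must be lossless.
        assert encoded.decode("gb18030") == s
        print(f"{s!r}: utf-8={len(s.encode('utf-8'))}B gb18030={len(encoded)}B hex={encoded.hex()}")
```

A mapping bug of the kind discussed would show up as a character whose encoded bytes differ between implementations, or whose round trip fails.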
Closing
The event closed with Jonathan Katz and Melanie Plageman wrapping up an exceptional conference that leaves us looking forward to next year’s PGCon.Dev 2025 in Canada, possibly in Vancouver, Toronto, Ottawa, or Montreal.
Inspired by the engagement at this conference, I’m considering presenting on Pigsty or PostgreSQL observability next year.
Notably, following the conference, Pigsty’s international CDN traffic spiked significantly, highlighting the growing global reach of our PostgreSQL distribution, which really made my day.
Pigsty CDN Traffic Growth after PGCon.Dev 2024
Some slides are available on the official site, and there are some blog posts about PGCon.Dev 2024 as well.
Postgres is eating the database world
PostgreSQL isn’t just a simple relational database; it’s a data management framework with the potential to engulf the entire database realm. The trend of “Using Postgres for Everything” is no longer limited to a few elite teams but is becoming a mainstream best practice.
OLAP’s New Challenger
In a 2016 database meetup, I argued that a significant gap in the PostgreSQL ecosystem was the lack of a sufficiently good columnar storage engine for OLAP workloads. While PostgreSQL itself offers lots of analysis features, its performance in full-scale analysis on larger datasets doesn’t quite measure up to dedicated real-time data warehouses.
Consider ClickBench, an analytics performance benchmark, where we’ve documented the performance of PostgreSQL, its ecosystem extensions, and derivative databases. The untuned PostgreSQL performs poorly (x1050), but it can reach (x47) with optimization. Additionally, there are three analysis-related extensions: columnar store Hydra (x42), time-series TimescaleDB (x103), and distributed Citus (x262).
ClickBench c6a.4xlarge, 500gb gp2 results in relative time
This performance can’t be considered bad, especially compared to pure OLTP databases like MySQL and MariaDB (x3065, x19700); however, its third-tier performance is not “good enough,” lagging behind the first-tier OLAP components like Umbra, ClickHouse, Databend, SelectDB (x3~x4) by an order of magnitude. It’s a tough spot - not satisfying enough to use, but too good to discard.
However, the arrival of ParadeDB and DuckDB changed the game!
ParadeDB’s native PG extension pg_analytics achieves second-tier performance (x10), narrowing the gap to the top tier to just 3–4x. Given the additional benefits, this level of performance discrepancy is often acceptable - ACID, freshness and real-time data without ETL, no additional learning curve, no maintenance of separate services, not to mention its ElasticSearch grade full-text search capabilities.
DuckDB focuses on pure OLAP, pushing analysis performance to the extreme (x3.2) — excluding the academically focused, closed-source database Umbra, DuckDB is arguably the fastest for practical OLAP performance. It’s not a PG extension, but PostgreSQL can fully leverage DuckDB’s analysis performance boost as an embedded file database through projects like DuckDB FDW and pg_quack.
The emergence of ParadeDB and DuckDB propels PostgreSQL’s analysis capabilities to the top tier of OLAP, filling the last crucial gap in its analytic performance.
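The "x" figures above are relative times, so lower is better, and the gaps between tiers are simple ratios. The sketch below redoes that arithmetic with the numbers quoted in this section. Using DuckDB (x3.2) as the baseline is my choice for illustration, and the dictionary labels are shorthand rather than official ClickBench entry names.

```python
# Relative times as quoted in the text above (lower is better).
RELATIVE_TIME = {
    "PostgreSQL (untuned)": 1050,
    "PostgreSQL (tuned)": 47,
    "Hydra": 42,
    "TimescaleDB": 103,
    "Citus": 262,
    "MySQL": 3065,
    "MariaDB": 19700,
    "ParadeDB pg_analytics": 10,
    "DuckDB": 3.2,
}


def gap(system: str, baseline: str = "DuckDB") -> float:
    """How many times slower `system` is than `baseline`."""
    return RELATIVE_TIME[system] / RELATIVE_TIME[baseline]


if __name__ == "__main__":
    for name in sorted(RELATIVE_TIME, key=RELATIVE_TIME.get):
        print(f"{name:24s} x{RELATIVE_TIME[name]:<8g} {gap(name):8.1f}x vs DuckDB")
```

With these inputs, pg_analytics comes out about 3.1 times slower than DuckDB (the "3–4x" gap), while tuned PostgreSQL is about 14.7 times slower, which is the order-of-magnitude gap described earlier.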
The Pendulum of Database Realm
The distinction between OLTP and OLAP didn’t exist at the inception of databases. The separation of OLAP data warehouses from databases emerged in the 1990s due to traditional OLTP databases struggling to support analytics scenarios’ query patterns and performance demands.
For a long time, best practice in data processing involved using MySQL/PostgreSQL for OLTP workloads and syncing data to specialized OLAP systems like Greenplum, ClickHouse, Doris, Snowflake, etc., through ETL processes.
DDIA, Martin Kleppmann, ch3, The republic of OLTP & Kingdom of OLAP
Like many “specialized databases,” the strength of dedicated OLAP systems often lies in performance, achieving 1-3 orders of magnitude improvement over native PG or MySQL. The cost, however, is redundant data, excessive data movement, lack of agreement on data values among distributed components, extra labor expense for specialized skills, extra licensing costs, limited query language power, programmability and extensibility, limited tool integration, and poor data integrity and availability compared with a complete DBMS.
However, as the saying goes, “What goes around comes around”. With hardware improving over thirty years following Moore’s Law, performance has increased exponentially while costs have plummeted. In 2024, a single x86 machine can have hundreds of cores (512 vCPU EPYC 9754x2), several TBs of RAM, a single NVMe SSD can hold up to 64TB, and a single all-flash rack can reach 2PB; object storage like S3 offers virtually unlimited storage.
Hardware advancements have solved the data volume and performance issue, while database software developments (PostgreSQL, ParadeDB, DuckDB) have addressed access method challenges. This puts the fundamental assumptions of the analytics sector — the so-called “big data” industry — under scrutiny.
As DuckDB’s manifesto "Big Data is Dead" suggests, the era of big data is over. Most people don’t have that much data, and most data is seldom queried. The frontier of big data recedes as hardware and software evolve, rendering “big data” unnecessary for 99% of scenarios.
If 99% of use cases can now be handled on a single machine with standalone DuckDB or PostgreSQL (and its replicas), what’s the point of using dedicated analytics components? If every smartphone can send and receive texts freely, what’s the point of pagers? (With the caveat that North American hospitals still use pagers, indicating that maybe less than 1% of scenarios might genuinely need “big data.”)
The shift in fundamental assumptions is steering the database world from a phase of diversification back to convergence, from a big bang to a mass extinction. In this process, a new era of unified, multi-modeled, super-converged databases will emerge, reuniting OLTP and OLAP. But who will lead this monumental task of reconsolidating the database field?
PostgreSQL: The Database World Eater
There are a plethora of niches in the database realm: time-series, geospatial, document, search, graph, vector databases, message queues, and object databases. PostgreSQL makes its presence felt across all these domains.
A case in point is the PostGIS extension, which sets the de facto standard in geospatial databases; the TimescaleDB extension puts “generic” time-series databases in an awkward position; and the vector extension, PGVector, turns the dedicated vector database niche into a punchline.
This isn’t the first time; we’re witnessing it again in the oldest and largest subdomain: OLAP analytics. But PostgreSQL’s ambition doesn’t stop at OLAP; it’s eyeing the entire database world!
What makes PostgreSQL so capable? Sure, it’s advanced, but so is Oracle; it’s open-source, as is MySQL. PostgreSQL’s edge comes from being both advanced and open-source, allowing it to compete with Oracle/MySQL. But its true uniqueness lies in its extreme extensibility and thriving extension ecosystem.
TimescaleDB survey: what is the main reason you choose to use PostgreSQL
PostgreSQL isn’t just a relational database; it’s a data management framework capable of engulfing the entire database galaxy. Besides being open-source and advanced, its core competitiveness stems from extensibility, i.e., its infra’s reusability and extension’s composability.
The Magic of Extreme Extensibility
PostgreSQL allows users to develop extensions, leveraging the database’s common infra to deliver features at minimal cost. For instance, the vector database extension pgvector, with just several thousand lines of code, is negligible in complexity compared to PostgreSQL’s millions of lines. Yet, this “insignificant” extension achieves complete vector data types and indexing capabilities, outperforming lots of specialized vector databases.
Why? Because pgvector’s creators didn’t need to worry about the database’s general additional complexities: ACID, recovery, backup & PITR, high availability, access control, monitoring, deployment, 3rd-party ecosystem tools, client drivers, etc., which require millions of lines of code to solve well. They only focused on the essential complexity of their problem.
For example, ElasticSearch was developed on the Lucene search library, while the Rust ecosystem has an improved next-gen full-text search library, Tantivy, as a Lucene alternative. ParadeDB only needs to wrap and connect it to PostgreSQL’s interface to offer search services comparable to ElasticSearch. More importantly, it can stand on the shoulders of PostgreSQL, leveraging the entire PG ecosystem’s united strength (e.g., mixed searches with PG Vector) to “unfairly” compete with another dedicated database.
Pigsty has 255 extensions available. And there are 1000+ more in the ecosystem
The extensibility brings another huge advantage: the composability of extensions, allowing different extensions to work together, creating a synergistic effect where 1+1 ≫ 2. For instance, TimescaleDB can be combined with PostGIS for spatio-temporal data support; the BM25 extension for full-text search can be combined with the PGVector extension, providing hybrid search capabilities.
Furthermore, the distributed extension Citus can transparently transform a standalone cluster into a horizontally partitioned distributed database cluster. This capability can be orthogonally combined with other features, making PostGIS a distributed geospatial database, PGVector a distributed vector database, ParadeDB a distributed full-text search database, and so on.
What’s more powerful is that extensions evolve independently, without the cumbersome need for main branch merges and coordination. This allows for scaling — PG’s extensibility lets numerous teams explore database possibilities in parallel, with all extensions being optional, not affecting the core functionality’s reliability. Those features that are mature and robust have the chance to be stably integrated into the main branch.
PostgreSQL achieves both foundational reliability and agile functionality through the magic of extreme extensibility, making it an outlier in the database world and changing the game rules of the database landscape.
Game Changer in the DB Arena
The emergence of PostgreSQL has shifted the paradigms in the database domain: Teams endeavoring to craft a “new database kernel” now face a formidable trial — how to stand out against the open-source, feature-rich Postgres. What’s their unique value proposition?
Until a revolutionary hardware breakthrough occurs, the advent of practical, new, general-purpose database kernels seems unlikely. No singular database can match the overall prowess of PG, bolstered by all its extensions — not even Oracle, given PG’s ace of being open-source and free.
A niche database product might carve out a space for itself if it can outperform PostgreSQL by an order of magnitude in specific aspects (typically performance). However, it usually doesn’t take long before the PostgreSQL ecosystem spawns open-source extension alternatives. Opting to develop a PG extension rather than a whole new database gives teams a crushing speed advantage in playing catch-up!
Following this logic, the PostgreSQL ecosystem is poised to snowball, accruing advantages and inevitably moving towards a monopoly, mirroring the Linux kernel’s status in server OS within a few years. Developer surveys and database trend reports confirm this trajectory.
PostgreSQL has long been the favorite database in HackerNews & StackOverflow. Many new open-source projects default to PostgreSQL as their primary, if not only, database choice. And many new-gen companies are going All in PostgreSQL.
As “Radical Simplicity: Just Use Postgres” says, simplifying tech stacks, reducing components, accelerating development, lowering risks, and adding more features can all be achieved by “Just Use Postgres.” Postgres can replace many backend technologies, including MySQL, Kafka, RabbitMQ, ElasticSearch, Mongo, and Redis, effortlessly serving millions of users. Just Use Postgres is no longer limited to a few elite teams but is becoming a mainstream best practice.
What Else Can Be Done?
The endgame for the database domain seems predictable. But what can we do, and what should we do?
PostgreSQL is already a near-perfect database kernel for the vast majority of scenarios, making the idea of a kernel “bottleneck” absurd. Forks of PostgreSQL and MySQL that tout kernel modifications as selling points are essentially going nowhere.
This is similar to the situation with the Linux OS kernel today; despite the plethora of Linux distros, everyone opts for the same kernel. Forking the Linux kernel is seen as creating unnecessary difficulties, and the industry frowns upon it.
Accordingly, the main conflict is no longer the database kernel itself but two directions— database extensions and services! The former pertains to internal extensibility, while the latter relates to external composability. Much like the OS ecosystem, the competitive landscape will concentrate on database distributions. In the database domain, only those distributions centered around extensions and services stand a chance for ultimate success.
The kernel business remains lukewarm, with MariaDB, the MySQL fork started by MySQL’s original creator, nearing delisting, while AWS, profiting from offering services and extensions on top of the free kernel, thrives. Investment has flowed into numerous PG ecosystem extensions and service distributions: Citus, TimescaleDB, Hydra, PostgresML, ParadeDB, FerretDB, StackGres, Aiven, Neon, Supabase, Tembo, PostgresAI, and our own PG distro, Pigsty.
A dilemma within the PostgreSQL ecosystem is the independent evolution of many extensions and tools, lacking a unifier to synergize them. For instance, Hydra releases its own package and Docker image, and so does PostgresML, each distributing PostgreSQL images with their own extensions and only their own. These images and packages are far from comprehensive database services like AWS RDS.
Even service providers and ecosystem integrators like AWS fall short in front of numerous extensions, unable to include many due to various reasons (AGPLv3 license, security challenges with multi-tenancy), thus failing to leverage the synergistic amplification potential of PostgreSQL ecosystem extensions.
| Extension Category | Pigsty RDS & PGDG | AWS RDS PG | Aliyun RDS PG |
|---|---|---|---|
| Add Extension | Free to Install | Not Allowed | Not Allowed |
| Geo Spatial | PostGIS 3.4.2 | PostGIS 3.4.1 | PostGIS 3.3.4 |
| Time Series | TimescaleDB 2.14.2 | | |
| Distributive | Citus 12.1 | | |
| AI / ML | PostgresML 2.8.1 | | |
| Columnar | Hydra 1.1.1 | | |
| Vector | PGVector 0.6 | PGVector 0.6 | pase 0.0.1 |
| Sparse Vector | PG Sparse 0.5.6 | | |
| Full-Text Search | pg_bm25 0.5.6 | | |
| Graph | Apache AGE 1.5.0 | | |
| GraphQL | PG GraphQL 1.5.0 | | |
| Message Queue | pgq 3.5.0 | | |
| OLAP | pg_analytics 0.5.6 | | |
| DuckDB | duckdb_fdw 1.1 | | |
| CDC | wal2json 2.5.3 | | wal2json 2.5 |
| Bloat Control | pg_repack 1.5.0 | pg_repack 1.5.0 | pg_repack 1.4.8 |
| Point Cloud | PG PointCloud 1.2.5 | | Ganos PointCloud 6.1 |

Many important extensions are not available on Cloud RDS (PG 16, 2024-02-29)
Extensions are the soul of PostgreSQL. A Postgres without the freedom to use extensions is like cooking without salt, a giant constrained.
Addressing this issue is one of our primary goals.
Our Resolution: Pigsty
Despite earlier exposure to MySQL, Oracle, and MSSQL, when I first used PostgreSQL in 2015, I was convinced of its future dominance in the database realm. Nearly a decade later, I’ve transitioned from a user and administrator to a contributor and developer, witnessing PG’s march toward that goal.
Interactions with diverse users revealed that the database field’s shortcoming isn’t the kernel anymore — PostgreSQL is already sufficient. The real issue is leveraging the kernel’s capabilities, which is the reason behind RDS’s booming success.
However, I believe this capability should be as accessible as free software, like the PostgreSQL kernel itself — available to every user, not just renting from cyber feudal lords.
Thus, I created Pigsty, a battery-included, local-first PostgreSQL distribution as an open-source RDS Alternative, which aims to harness the collective power of PostgreSQL ecosystem extensions and democratize access to production-grade database services.
Pigsty stands for PostgreSQL in Great STYle, representing the zenith of PostgreSQL.
We’ve defined six core propositions addressing the central issues in PostgreSQL database services:
Extensible Postgres, Reliable Infras, Observable Graphics, Available Services, Maintainable Toolbox, and Composable Modules.
The initials of these value propositions offer another acronym for Pigsty:
Postgres, Infras, Graphics, Service, Toolbox, Yours.
Your graphical Postgres infrastructure service toolbox.
Extensible PostgreSQL is the linchpin of this distribution. In the recently launched Pigsty v2.6, we integrated DuckDB FDW and ParadeDB extensions, massively boosting PostgreSQL’s analytical capabilities and ensuring every user can easily harness this power.
Our aim is to integrate the strengths within the PostgreSQL ecosystem, creating a synergistic force akin to the Ubuntu of the database world. I believe the kernel debate is settled, and the real competitive frontier lies here.
- PostGIS: Provides geospatial data types and indexes, the de facto standard for GIS (& pgPointCloud, pgRouting).
- TimescaleDB: Adds time-series, continuous aggregates, distributed, columnar storage, and automatic compression capabilities.
- PGVector: Supports AI vectors/embeddings and ivfflat, hnsw vector indexes (& pg_sparse for sparse vectors).
- Citus: Transforms classic master-slave PG clusters into horizontally partitioned distributed database clusters.
- Hydra: Adds columnar storage and analytics, rivaling ClickHouse’s analytic capabilities.
- ParadeDB: Elevates full-text search and mixed retrieval to ElasticSearch levels (& zhparser for Chinese tokenization).
- Apache AGE: Graph database extension, adding Neo4J-like OpenCypher query support to PostgreSQL.
- PG GraphQL: Adds native built-in GraphQL query language support to PostgreSQL.
- DuckDB FDW: Enables direct access to DuckDB’s powerful embedded analytic database files through PostgreSQL (& DuckDB CLI).
- Supabase: An open-source Firebase alternative based on PostgreSQL, providing a complete app development storage solution.
- FerretDB: An open-source MongoDB alternative based on PostgreSQL, compatible with MongoDB APIs/drivers.
- PostgresML: Facilitates classic machine learning algorithms, calling, deploying, and training AI models with SQL.
Developers, your choices will shape the future of the database world. I hope my work helps you better utilize the world’s most advanced open-source database kernel: PostgreSQL.
Read in Pigsty’s Blog | GitHub Repo: Pigsty | Official Website
PostgreSQL Convention 2024
- 0x00 Background
- 0x01 Naming Convention
- 0x02 Design Convention
- 0x03 Query Convention
- 0x04 Admin Convention
Roughly translated from PostgreSQL Convention 2024 with Google.
0x00 Background
No Rules, No Lines
PostgreSQL is very powerful, but using it well requires cooperation among backend developers, operations engineers, and DBAs.
This article compiles a development/operations convention based on the principles and characteristics of the PostgreSQL database, hoping to reduce the confusion you encounter when using PostgreSQL: good for you, good for me, good for everyone.
The first version of this article is mainly for PostgreSQL 9.4 - PostgreSQL 10. The latest version has been updated and adjusted for PostgreSQL 15/16.
0x01 Naming Convention
There are only two hard problems in computer science: cache invalidation and naming.
Generic naming rules (Generic)
- This rule applies to all objects in the database, including: database names, table names, index names, column names, function names, view names, sequence names, aliases, etc.
- The object name must use only lowercase letters, underscores, and numbers, and the first letter must be a lowercase letter.
- The length of the object name must not exceed 63 characters, and names must uniformly use the `snake_case` style.
- The use of SQL reserved words is prohibited; use `select pg_get_keywords();` to obtain a list of reserved keywords.
- Dollar signs (`$`) are prohibited, Chinese characters are prohibited, and names must not begin with `pg`.
- Improve your wording taste and be honest and elegant; do not use pinyin, do not use uncommon words, and do not use niche abbreviations.
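The generic rules above lend themselves to a mechanical check. The following is a minimal Python sketch, not part of the original convention: the function name `is_valid_object_name` and the small `RESERVED_SAMPLE` set are assumptions for illustration only, and a real check would load the full keyword list from `pg_get_keywords()`.

```python
import re

# Lowercase letter first, then lowercase letters, digits, or underscores.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")

# Hypothetical sample only; the authoritative list comes from pg_get_keywords().
RESERVED_SAMPLE = {"select", "table", "user", "order", "group"}

MAX_NAME_LENGTH = 63  # length limit stated in the rules above


def is_valid_object_name(name: str) -> bool:
    """Return True if `name` passes the generic naming rules sketched here."""
    if len(name) > MAX_NAME_LENGTH:
        return False
    if NAME_PATTERN.fullmatch(name) is None:
        # Rejects uppercase letters, `$`, non-ASCII characters, and leading digits.
        return False
    if name.startswith("pg"):
        return False
    return name not in RESERVED_SAMPLE
```

Such a check could run in CI against migration files before they reach a DBA for review.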
Cluster naming rules (Cluster)
- The name of the PostgreSQL cluster will be used as the namespace of the cluster resource and must be a valid DNS domain name without any dots or underscores.
- The cluster name should start with a lowercase letter, contain only lowercase letters, numbers, and minus signs, and conform to the regular expression `[a-z][a-z0-9-]*`.
- PostgreSQL database cluster naming usually follows a three-part structure, `pg-<biz>-<tld>`: database type / business name / business line or environment.
- `biz` should be the English word that best represents the characteristics of the business. It should consist only of lowercase letters and numbers, and should not contain hyphens (`-`).
- When using a backup cluster to build a delayed replica of an existing cluster, the `biz` name should be `<biz>delay`, for example `pg-testdelay`.
- When branching an existing cluster, you can add a number at the end of `biz`: for example, from `pg-user1` you can branch `pg-user2`, `pg-user3`, etc.
- For horizontally sharded clusters, the `biz` name should include `shard` followed by the shard number, for example `pg-testshard1`, `pg-testshard2`, …
- `<tld>` is the top-level business line and can also be used to distinguish different environments, for example `-tt`, `-dev`, `-uat`, `-prod`, etc. It can be omitted if not required.
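As a rough illustration of the cluster naming rules, the sketch below composes a `pg-<biz>-<tld>` name and validates it against the regular expression given above. The helper name `cluster_name` and the `BIZ_PATTERN` constant are assumptions made for this example, not something defined by Pigsty.

```python
import re
from typing import Optional

# Regular expression quoted in the cluster naming rules.
CLUSTER_PATTERN = re.compile(r"[a-z][a-z0-9-]*")
# biz: lowercase letters and digits only, no hyphens (hypothetical helper pattern).
BIZ_PATTERN = re.compile(r"[a-z0-9]+")


def cluster_name(biz: str, tld: Optional[str] = None) -> str:
    """Compose pg-<biz>[-<tld>] and validate the result."""
    if BIZ_PATTERN.fullmatch(biz) is None:
        raise ValueError(f"invalid biz part: {biz!r}")
    name = f"pg-{biz}" if tld is None else f"pg-{biz}-{tld}"
    if CLUSTER_PATTERN.fullmatch(name) is None:
        raise ValueError(f"invalid cluster name: {name!r}")
    return name
```

For example, `cluster_name("test", "tt")` yields `pg-test-tt`, while a `biz` containing a hyphen or underscore is rejected.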
Service naming rules (Service)
- Each PostgreSQL cluster will provide 2 to 6 types of external services, which use fixed naming rules by default.
- The service name is prefixed with the cluster name and suffixed with the service type, for example `pg-test-primary`, `pg-test-replica`.
- Read-write services are uniformly named with the `primary` suffix, and read-only services are uniformly named with the `replica` suffix. These two services are required.
- ETL pulls and individual user queries use the `offline` suffix, and direct connections to the primary / ETL writes use the `default` suffix. These are optional services.
- The synchronous read service uses the `standby` suffix, and the delayed replica service uses the `delayed` suffix. A small number of core databases can provide these services.
Instance naming rules (Instance)
- A PostgreSQL cluster consists of at least one instance, and each instance has a unique instance number assigned from zero or one within the cluster.
- The instance name is composed of the cluster name and the instance number joined with a hyphen (`-`), for example: `pg-test-1`, `pg-test-2`.
- Once assigned, the instance number cannot be modified until the instance is offline and destroyed, and cannot be reassigned for use.
- The instance name will be used as the `ins` label in monitoring system data and will be attached to all data of this instance.
- If you are using a host/database 1:1 exclusive deployment, the node Hostname can use the database instance name.
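The service and instance naming rules are simple string compositions. A minimal sketch, assuming hypothetical helper names `service_name` and `instance_name` (the suffix lists are taken from the rules above):

```python
# Suffixes named in the service naming rules.
REQUIRED_SERVICES = ("primary", "replica")
OPTIONAL_SERVICES = ("offline", "default", "standby", "delayed")


def service_name(cluster: str, kind: str) -> str:
    """Return <cluster>-<kind> for one of the known service types."""
    if kind not in REQUIRED_SERVICES + OPTIONAL_SERVICES:
        raise ValueError(f"unknown service type: {kind!r}")
    return f"{cluster}-{kind}"


def instance_name(cluster: str, seq: int) -> str:
    """Return <cluster>-<seq>; instance numbers start from zero or one."""
    if seq < 0:
        raise ValueError("instance number must not be negative")
    return f"{cluster}-{seq}"
```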
Database naming rules (Database)
- The database name should be consistent with the cluster and application, and must be a highly distinguishable English word.
- The name is constructed in the form `<tld>_<biz>`, where `<tld>` is the top-level business line. It can also be used to distinguish different environments and can be omitted if not used.
- `<biz>` is the specific business name. For example, the `pg-test-tt` cluster can use the database name `tt_test` or `test`. This is not mandatory, i.e. it is allowed to create other databases whose `<biz>` differs from the cluster name.
- For sharded databases, the `<biz>` section must end with `shard` but should not contain the shard number. For example, `pg-testshard1` and `pg-testshard2` should both use `testshard`.
- Multiple parts are joined with `-`. For example: `<biz>-chat-shard`, `<biz>-payment`, etc., with no more than three parts in total.
Role naming convention (Role/User)
- There is only one database superuser (`dbsu`): `postgres`. The user used for streaming replication is named `replicator`.
- The users used for monitoring are uniformly named `dbuser_monitor`, and the superuser used for daily management is `dbuser_dba`.
- The business user used by a program/service defaults to `dbuser_<biz>` as the username, for example `dbuser_test`. Access from different services should be differentiated using separate business users.
- Database users applied for by individuals uniformly use `dbp_<name>`, where `name` is the standard user name in LDAP.
- The default permission groups are fixed as: `dbrole_readonly`, `dbrole_readwrite`, `dbrole_admin`, `dbrole_offline`.
Schema naming rules (Schema)
- The business uniformly uses a global `<prefix>` as the schema name, as short as possible, and it is set as the first element of `search_path` by default.
- `<prefix>` must not be `public` or `monitor`, and must not conflict with any schema name used by PostgreSQL extensions, such as `timescaledb`, `citus`, `repack`, `graphql`, `net`, `cron`, … The special names `dba` and `trash` should also be avoided.
- Sharding schema naming adopts the form `rel_<partition_total_num>_<partition_index>`. The middle part is the total number of shards, which is currently fixed at 8192. The suffix is the shard number, counting from 0, such as `rel_8192_0`, …, `rel_8192_11`, etc.
- Creating additional schemas, or using schema names other than `<prefix>`, will require R&D to explain their necessity.
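The sharding schema pattern `rel_<partition_total_num>_<partition_index>` can be sketched as a small helper. The function name `shard_schema_name` is an assumption for illustration; the total of 8192 and the zero-based index come from the rule above.

```python
TOTAL_SHARDS = 8192  # total number of shards, fixed per the rule above


def shard_schema_name(index: int, total: int = TOTAL_SHARDS) -> str:
    """Return rel_<total>_<index> for a zero-based shard index."""
    if not 0 <= index < total:
        raise ValueError(f"shard index {index} outside 0..{total - 1}")
    return f"rel_{total}_{index}"
```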
Relationship naming rules (Relation)
- The first priority for relationship naming is to have clear meaning. Do not use ambiguous abbreviations or be too lengthy. Follow general naming rules.
- Table names should use plural nouns and be consistent with historical conventions. Words with irregular plural forms should be avoided as much as possible.
- Views use `v_` as the naming prefix, materialized views use `mv_`, and temporary tables use `tmp_`.
- Inherited or partitioned tables should be prefixed by the parent table name and suffixed by the child table attributes (rules, shard ranges, etc.).
- Time range partitions use the start of the interval as the naming suffix. If the first partition has no lower bound, R&D specifies a sufficiently distant point in time. Year-level partition: `tbl_2023`, month-level partition: `tbl_202304`, day-level partition: `tbl_20230405`, hour-level partition: `tbl_2023040518`. The default partition ends with `_default`.
- Hash partitions are named with the remainder as the suffix of the partition table name, and list partitions are given a reasonable partition table name, manually specified by the R&D team, that corresponds to the list item.
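The time-range partition suffixes follow directly from the start of each interval. A minimal sketch, where `partition_name` and `default_partition_name` are hypothetical helper names and the granularities match the examples given above:

```python
from datetime import datetime

# strftime formats for each partition granularity.
SUFFIX_FORMATS = {
    "year": "%Y",
    "month": "%Y%m",
    "day": "%Y%m%d",
    "hour": "%Y%m%d%H",
}


def partition_name(parent: str, start: datetime, granularity: str) -> str:
    """Name a partition after the start of its time interval."""
    return f"{parent}_{start.strftime(SUFFIX_FORMATS[granularity])}"


def default_partition_name(parent: str) -> str:
    """The default partition ends with _default."""
    return f"{parent}_default"
```

Calling `partition_name("tbl", datetime(2023, 4, 5, 18), "hour")` gives `tbl_2023040518`, matching the hour-level example.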
Index naming rules (Index)
- When creating an index, the index name should be specified explicitly and consistent with the PostgreSQL default naming rules.
- Index names are prefixed with the table name. Primary key indexes end with `_pkey`, unique indexes end with `_key`, ordinary indexes end with `_idx`, and indexes used for `EXCLUDE` constraints end with `_excl`.
- When using a conditional index or function index, the function and condition used should be reflected in the index name, for example `tbl_md5_title_idx`, `tbl_ts_ge_2023_idx`, but the length limit cannot be exceeded.
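To make the suffix rules concrete, here is a small sketch that builds index names in the same shape as PostgreSQL's default names. The helper `index_name` and its `kind` labels are assumptions for illustration; the 63-character limit is the identifier limit mentioned in the generic rules.

```python
from typing import Sequence

MAX_IDENTIFIER_LENGTH = 63

# Suffix per index kind, as listed in the rules above.
SUFFIXES = {"primary": "pkey", "unique": "key", "index": "idx", "exclude": "excl"}


def index_name(table: str, parts: Sequence[str] = (), kind: str = "index") -> str:
    """Build <table>[_<part>...]_<suffix>; primary keys omit the column parts."""
    suffix = SUFFIXES[kind]
    pieces = [table] if kind == "primary" else [table, *parts]
    name = "_".join([*pieces, suffix])
    if len(name) > MAX_IDENTIFIER_LENGTH:
        raise ValueError(f"index name exceeds {MAX_IDENTIFIER_LENGTH} characters: {name}")
    return name
```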
Field naming rules (Attribute)
- It is prohibited to use system column reserved field names: `oid`, `xmin`, `xmax`, `cmin`, `cmax`, `ctid`.
- Primary key columns are usually named `id` or have `id` as a suffix.
- The conventional name for the creation time field is `created_time`, and the conventional name for the last modification time field is `updated_time`.
- It is recommended to use `is_`, `has_`, etc. as the prefix for Boolean fields.
- Additional flexible JSONB fields use `extra` as the column name.
- The remaining field names must be consistent with existing table naming conventions, and any field naming that breaks conventions should be accompanied by written design instructions and explanations.
Enumeration item naming (Enum)
- Enumeration items should use `camelCase` by default, but other styles are allowed.
Function naming rules (Function)
- Function names start with verbs: `select`, `insert`, `delete`, `update`, `upsert`, `create`, …
- Important parameters can be reflected in the function name through suffixes such as `_by_ids` and `_by_user_ids`.
- Avoid function overloading and try to keep only one function with the same name.
- It is forbidden to overload function signatures through integer types such as `BIGINT/INTEGER/SMALLINT`, which may cause ambiguity when calling.
- Use named parameters for variables in stored procedures and functions, and avoid positional parameters (`$1`, `$2`, …).
- If the parameter name conflicts with the object name, add `_` before the parameter, for example `_user_id`.
Comment specifications (Comment)
- Try your best to provide comments (`COMMENT`) for various objects. Comments should be in English, concise, and fit on one line.
- When the object’s schema or content semantics change, be sure to update the comments to keep them in sync with the actual situation.
0x02 Design Convention
To each his own
Things to note when creating a table
- The DDL statement for creating a table needs to use the standard format, with SQL keywords in uppercase letters and other words in lowercase letters.
- Use lowercase letters uniformly in field names/table names/aliases, and try not to be case-sensitive. If you encounter a mixed case, or a name that conflicts with SQL keywords, you need to use double quotation marks for quoting.
- Use specialized types (NUMERIC, ENUM, INET, MONEY, JSON, UUID, …) where applicable, and avoid using the `TEXT` type as much as possible. The `TEXT` type is not conducive to the database’s understanding of the data. Using specialized types improves data storage, query, indexing, and calculation efficiency, as well as maintainability.
- Optimizing column layout and alignment types can have additional performance/storage gains.
- Unique constraints must be guaranteed by the database, and any unique column must have a corresponding unique constraint.
- `EXCLUDE` constraints are generalized unique constraints that can be used to ensure data integrity in low-frequency update scenarios.
Partition table considerations
- If a single table exceeds hundreds of TB, or the monthly incremental data exceeds more than ten GB, you can consider table partitioning.
- A guideline for partitioning is to keep the size of each partition within the comfortable range of 1GB to 64GB.
- Tables that can be partitioned by time range should be partitioned by time range first. Commonly used granularities include: decade, year, month, day, and hour. The partitions required in the future should be created at least three months in advance.
- For extremely skewed data distributions, different time granularities can be combined, for example: 1900 - 2000 as one large partition, 2000 - 2020 as year partitions, and after 2020 as month partitions. When using time partitioning, the table name uses the value of the lower limit of the partition (if infinity, use a value that is far enough back).
Notes on wide tables
- Wide tables (for example, tables with dozens of fields) can be considered for vertical splitting, with mutual references to the main table through the same primary key.
- Because of the PostgreSQL MVCC mechanism, write amplification is more pronounced for wide tables, so avoid frequent updates to wide tables.
- In Internet scenarios, it is allowed to appropriately lower the normalization level and reduce multi-table connections to improve performance.
Primary key considerations
- Every table must have an identity column, and in principle it must have a primary key. The minimum requirement is to have a non-null unique constraint.
- The identity column is used to uniquely identify any tuple in the table, and logical replication and many third-party tools depend on it.
- If the primary key contains multiple columns, it should be specified on a separate line after the column list in the table DDL, using `PRIMARY KEY(a,b,...)`.
- In principle, it is recommended to use integer or `UUID` types for primary keys. Text types with limited length can be used with caution. Using other types requires explicit explanation and evaluation.
- The primary key usually uses a single integer column. In principle, `BIGINT` is recommended; use `INTEGER` with caution, and `SMALLINT` is not allowed.
- The primary key should use `GENERATED ALWAYS AS IDENTITY` to generate unique values. `SERIAL` and `BIGSERIAL` are only allowed when compatibility with PG versions below 10 is required.
- The primary key can use the `UUID` type, and UUID v1/v7 is recommended. Use UUIDv4 as the primary key with caution, as random UUIDs have poor locality and a collision probability.
- When using a string column as a primary key, you should add a length limit. `VARCHAR(64)` is generally used; use of longer strings should be explained and evaluated.
- In principle, `INSERT/UPDATE` must not modify the value of the primary key column, and `INSERT RETURNING` can be used to return the automatically generated primary key value.
Foreign key considerations
- When defining a foreign key, the reference must explicitly set the corresponding action: `SET NULL`, `SET DEFAULT`, `CASCADE`. Use cascading operations with caution.
- The columns referenced by foreign keys need to be primary key columns in other tables/this table.
- Internet businesses, especially partition tables and horizontal shard libraries, use foreign keys with caution and can be solved at the application layer.
Null/Default Value Considerations
- If there is no distinction between zero and null values in the field semantics, null values are not allowed and a `NOT NULL` constraint must be configured for the column.
- If a field has a default value semantically, a `DEFAULT` value should be configured.
Numeric type considerations
- Use `INTEGER` for regular numeric fields. Use `BIGINT` for numeric columns whose capacity is uncertain.
- Don’t use `SMALLINT` without special reasons. The performance and storage improvements are very small, but there will be many additional problems.
- Note that the SQL standard does not provide unsigned integers, and values exceeding `INTMAX` but not exceeding `UINTMAX` need to be stored in the next larger type. Do not store values larger than `INT64MAX` in a `BIGINT` column, as they will overflow into negative numbers.
- `REAL` represents a 4-byte floating point number, and `FLOAT` represents an 8-byte floating point number. Floating point numbers can only be used in scenarios where the final precision doesn’t matter, such as geographic coordinates. Remember not to use equality comparisons on floating point numbers, except for zero values.
- Use `NUMERIC` for exact numeric types. If possible, use `NUMERIC(p)` and `NUMERIC(p,s)` to set the number of significant digits and the number of digits in the decimal part. For example, the temperature in Celsius (`37.0`) can be stored using the `NUMERIC(3,1)` type, with 3 significant digits and 1 decimal place.
- Use `MONEY` for currency values.
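The difference between floating point and exact numeric types, and the meaning of `NUMERIC(p,s)`, can be illustrated outside the database. The sketch below uses Python's `decimal` module as a stand-in for `NUMERIC`; the helper `fits_numeric` is a hypothetical name, and it assumes values are rounded half away from zero to `s` decimal places before the range check.

```python
from decimal import Decimal, ROUND_HALF_UP


def fits_numeric(value: str, precision: int, scale: int) -> bool:
    """Check whether `value`, rounded to `scale` decimals, fits NUMERIC(precision, scale)."""
    step = Decimal(1).scaleb(-scale)  # e.g. 0.1 for scale=1
    rounded = Decimal(value).quantize(step, rounding=ROUND_HALF_UP)
    limit = Decimal(10) ** (precision - scale)  # e.g. 100 for NUMERIC(3,1)
    return abs(rounded) < limit


# Binary floating point cannot represent 0.1 exactly, so equality tests are unreliable.
float_sum_is_exact = (0.1 + 0.2 == 0.3)

# Exact decimal arithmetic behaves as expected.
decimal_sum_is_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```

Under these assumptions, `37.0` fits `NUMERIC(3,1)` while `100.0` does not, since only two digits remain before the decimal point.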
Text type considerations
- PostgreSQL text types include `char(n)`, `varchar(n)`, and `text`. By default, the `text` type can be used, which does not limit the string length, but is limited by the maximum field length of 1GB.
- If conditions permit, it is preferable to use the `varchar(n)` type to set a maximum string length. This will introduce minimal additional checking overhead, but can avoid some dirty data and corner cases.
- Avoid `char(n)`: this type exists for compatibility with the SQL standard, has unintuitive behavior (space padding and truncation), and has no storage or performance advantages.
Time type considerations
- There are only two ways to store time: with time zone (`TIMESTAMPTZ`) and without time zone (`TIMESTAMP`).
- It is recommended to use `TIMESTAMPTZ`. If you use `TIMESTAMP` storage, you must store UTC (zero-offset) time.
- Use `now() AT TIME ZONE 'UTC'` to generate UTC time. Do not simply strip the time zone with `now()::TIMESTAMP`.
- Uniformly use ISO-8601 format for time input and output, such as `2006-01-02 15:04:05`, to avoid DMY and MDY problems.
- Users in China can uniformly use `Asia/Hong_Kong` as the +8 time zone, because the Shanghai time zone abbreviation `CST` is ambiguous.
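The difference between converting to UTC and merely dropping the offset can be shown with plain Python datetimes. This is only an analogy to the SQL expressions above, not their implementation: `to_utc_naive` and `strip_offset` are hypothetical helper names, and a fixed +8 offset stands in for a session time zone in China.

```python
from datetime import datetime, timedelta, timezone

PLUS_8 = timezone(timedelta(hours=8))  # fixed +08:00 offset used for illustration


def to_utc_naive(ts: datetime) -> datetime:
    """Analogous to `ts AT TIME ZONE 'UTC'`: convert to UTC, then drop the offset."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware timestamp")
    return ts.astimezone(timezone.utc).replace(tzinfo=None)


def strip_offset(ts: datetime) -> datetime:
    """Analogous to a bare cast: keeps the local wall-clock time and loses the offset."""
    return ts.replace(tzinfo=None)


local = datetime(2006, 1, 2, 15, 4, 5, tzinfo=PLUS_8)
utc_value = to_utc_naive(local)      # wall clock moves back by eight hours
stripped_value = strip_offset(local)  # wall clock unchanged, offset silently lost
```

The two results differ by eight hours, which is exactly the kind of silent error the rule is meant to prevent.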
Notes on enumeration types
- Fields that are more stable and have a small value space (within tens to hundreds) should use enumeration types instead of integers and strings.
- Enumerations are internally implemented using dynamic integers, which have readability advantages over integers and performance, storage, and maintainability advantages over strings.
- Enumeration items can only be added, not deleted, but existing enumeration values can be renamed. `ALTER TYPE <enum_name>` is used to modify enumerations.
UUID type considerations
- Please note that the fully random UUIDv4 has poor locality when used as a primary key. Consider using UUIDv1/v7 instead if possible.
- Some UUID generation/processing functions require additional extensions, such as `uuid-ossp`, `pg_uuidv7`, etc. If you have this requirement, please specify it during configuration.
JSON type considerations
- Unless there is a special reason, always use the binary `JSONB` type and related functions instead of the text version `JSON`.
- Note the subtle differences between atomic types in JSON and their PostgreSQL counterparts: the zero character `\u0000` is not allowed in the `text` type corresponding to a JSON string, and `NaN` and `infinity` are not allowed in the `numeric` type corresponding to a JSON number. Boolean values only accept the lowercase literals `true` and `false`.
- Please note that `null` in the JSON standard and `NULL` in the SQL standard are not the same concept.
Array type considerations
- When storing a small number of elements, array fields can be used instead of storing each element separately.
- Suitable for storing data with a relatively small number of elements and infrequent changes. If the number of elements in the array is very large or changes frequently, consider using a separate table to store the data and using foreign key associations.
- For high-dimensional floating-point arrays, consider using the dedicated data types provided by the `pgvector` extension.
GIS type considerations
- The GIS type uses the srid=4326 reference coordinate system by default.
- Longitude and latitude coordinate points should use the Geography type, without explicitly specifying the reference coordinate system (SRID 4326).
Trigger considerations
- Triggers will increase the complexity and maintenance cost of the database system, and their use is discouraged in principle. The use of rule systems is prohibited and such requirements should be replaced by triggers.
- Typical scenarios for triggers are to automatically set `updated_time` to the current timestamp after a row is modified, to record additions, deletions, and modifications of a table to another log table, or to maintain business consistency between two tables.
- Operations in triggers are transactional, meaning if the trigger or operations in the trigger fail, the entire transaction is rolled back, so test and prove the correctness of your triggers thoroughly. Special attention needs to be paid to recursive calls, deadlocks in complex query execution, and the execution sequence of multiple triggers.
Stored procedure/function considerations
- Functions/stored procedures are suitable for encapsulating transactions, reducing concurrency conflicts, reducing network round-trips, reducing the amount of returned data, and executing a small amount of custom logic.
- Stored procedures are not suitable for complex calculations, and are not suitable for trivial/frequent type conversion and packaging. In critical high-load systems, unnecessary computationally intensive logic in the database should be removed, such as using SQL in the database to convert WGS84 to other coordinate systems. Calculation logic closely related to data acquisition and filtering can use functions/stored procedures: for example, geometric relationship judgment in PostGIS.
- Replaced functions and stored procedures that are no longer in use should be taken offline in a timely manner to avoid conflicts with future functions.
- Use a unified syntax format for function creation. The signature (function name and parameters) occupies a separate line, the return value starts on a separate line, and the language is the first label. Be sure to mark the function volatility level: `IMMUTABLE`, `STABLE`, `VOLATILE`. Add attribute tags, such as `RETURNS NULL ON NULL INPUT`, `PARALLEL SAFE`, `ROWS 1`, etc. Note that `ROWS` is only accepted for functions that return a set.

    CREATE OR REPLACE FUNCTION
      nspname.myfunc(arg1_ TEXT, arg2_ INTEGER)
      RETURNS SETOF INTEGER  -- set-returning, so that ROWS 1 is accepted
    LANGUAGE SQL
    STABLE
    PARALLEL SAFE
    ROWS 1
    RETURNS NULL ON NULL INPUT
    AS $function$
    SELECT 1;
    $function$;
Use sensible Locale options
- `en_US.UTF8` is used by default and must not be changed without special reasons.
- The default `collate` rule must be `C`, to avoid string indexing problems.
- https://mp.weixin.qq.com/s/SEXcyRFmdXNI7rpPUB3Zew
Use reasonable character encoding and localization configuration
- Character encoding must be `UTF8`; any other character encoding is strictly prohibited.
- `C` must be used as the default `LC_COLLATE` collation; any special requirements must be explicitly specified in the DDL/query clause.
- `LC_CTYPE` uses `en_US.UTF8` by default; some extensions, such as `pg_trgm`, rely on character set information to work properly.
Notes on indexing
- All online queries must design corresponding indexes according to their access patterns, and full table scans are not allowed except for very small tables.
- Indexes have a price, and it is not allowed to create unused indexes. Indexes that are no longer used should be cleaned up in time.
- When building a composite (multi-column) index, columns with high distinctiveness and selectivity should be placed first, such as ID, timestamp, etc.
- GiST index can be used to solve the nearest neighbor query problem, and traditional B-tree index cannot provide good support for KNN problem.
- For data whose values are linearly related to the storage order of the heap table, if the usual query is a range query, it is recommended to use the BRIN index. The most typical scenario is append-only time series data, where a BRIN index is more efficient than a B-tree.
- When retrieving against JSONB/array fields, you can use GIN indexes to speed up queries.
Clarify the order of null values in B-tree indexes
- If there is a sorting requirement on a nullable column, `NULLS FIRST` or `NULLS LAST` needs to be explicitly specified in the query and index.
- Note that the default rule for `DESC` sorting is `NULLS FIRST`, that is, null values appear first in the sort, which is generally not the desired behavior.
- The sorting conditions of the index must match the query, such as: `CREATE INDEX ON tbl (id DESC NULLS LAST);`
Disable indexing on large fields
- The size of an indexed field cannot exceed 2KB (1/3 of the page capacity). Be careful when creating indexes on text types: text to be indexed should use varchar(n) types with length constraints.
- When a text type is used as a primary key, a maximum length must be set. In principle, the length should not exceed 64 characters; special cases must be explicitly evaluated and justified.
- If large fields need to be indexed, consider hashing the large field and building a functional index on the hash, or use another type of index (GIN).
Make the most of functional indexes
- Any redundant fields that can be inferred from other fields in the same row can be replaced using functional indexes.
- For statements that often use expressions as query conditions, you can use expression or function indexes to speed up queries.
- Typical scenarios: build a hash functional index on a large field, and build a reverse functional index on text columns that require left fuzzy queries.
Take advantage of partial indexes
- For the part of the query where the query conditions are fixed, partial indexes can be used to reduce the index size and improve query efficiency.
- If a field to be indexed in a query has only a limited number of values, several corresponding partial indexes can also be established.
- If the columns in some indexes are frequently updated, watch for bloat in those indexes.
0x03 Query Convention
The limits of my language mean the limits of my world.
—Ludwig Wittgenstein
Use service access
- Access to the production database must go through domain-name-based services; direct connection using IP addresses is strictly prohibited.
- Services are accessed through a VIP; LVS/HAProxy hides role changes of cluster members, so a primary-replica switchover does not require an application restart.
Read and write separation
- Internet business scenario: write requests must go to the primary and be accessed through the Primary service.
- In principle, read requests go to replicas and are accessed through the Replica service.
- Exceptions: if you need “Read Your Write” consistency guarantees and significant replication lag is detected, read requests may access the primary, or you may ask the DBA to provide a Standby service.
Separation of speed and slowness
- Queries within 1 millisecond in production are called fast queries, and queries that exceed 1 second in production are called slow queries.
- Slow queries must go to the offline replica (the Offline service/instance), and a timeout should be set during execution.
- In principle, the execution time of online general queries in production should be controlled within 1ms.
- If the execution time of an online general query in production exceeds 10ms, the technical solution needs to be modified and optimized before going online.
- Online queries should be configured with a Timeout of the order of 10ms or faster to avoid avalanches caused by accumulation.
- Pulling ETL data from the primary is prohibited; use the Offline service to retrieve data from a dedicated instance.
Use connection pool
- Production applications must access the database through a connection pool, reaching PostgreSQL through a Pgbouncer proxy deployed 1:1 with it. Offline services and individual users are strictly prohibited from using the connection pool directly.
- Pgbouncer uses Transaction Pooling mode by default. Some session-level features (such as LISTEN/NOTIFY) may be unavailable, so special attention is required. Pgbouncer versions before 1.21 do not support Prepared Statements in this mode. In special scenarios, you can use Session Pooling or bypass the connection pool and access the database directly, which requires special DBA review and approval.
- When using a connection pool, modifying connection state is prohibited, including modifying connection parameters, modifying the search path, changing roles, and changing databases. If such a modification is unavoidable, the connection must be destroyed completely afterwards, because putting a modified connection back into the pool spreads the contamination. Using pg_dump to dump data through Pgbouncer is strictly prohibited.
Configure active timeout for query statements
- Applications should configure active timeouts for all statements and proactively cancel requests after the timeout to avoid avalanches (for example, with Go's context).
- Statements that are executed periodically must be configured with a timeout shorter than the execution period to avoid avalanches.
- HAProxy is configured with a default connection timeout of 24 hours, which is used to roll over expired long-lived connections. Do not run SQL that takes more than 1 day on offline instances; if this is required, the DBA will make a special adjustment.
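As a rough illustration of an application-side active timeout, the sketch below wraps a call in Python's asyncio.wait_for. The run_query coroutine is a hypothetical stand-in that only sleeps; it is not a real driver call, and the delays and timeouts are arbitrary values chosen for the demo.

```python
import asyncio


async def run_query(delay_s: float) -> str:
    # Hypothetical stand-in for a real database call; it only sleeps.
    await asyncio.sleep(delay_s)
    return "ok"


async def query_with_timeout(delay_s: float, timeout_s: float) -> str:
    # wait_for cancels the awaited task once timeout_s has elapsed.
    try:
        return await asyncio.wait_for(run_query(delay_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        return "timeout"


def demo() -> tuple:
    fast = asyncio.run(query_with_timeout(0.001, 0.5))  # finishes in time
    slow = asyncio.run(query_with_timeout(0.5, 0.01))   # exceeds the timeout
    return fast, slow
```

With a real driver, cancelling on the client side is only half of the job: the statement should also be cancelled on the server, and how that is done depends on the driver in use.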
Pay attention to replication latency
- Applications must be aware of replication lag between the primary and replicas and properly handle situations where the lag exceeds reasonable limits.
- Under normal circumstances, replication lag is on the order of 100µs / tens of KB, but in extreme cases replicas may lag by minutes or hours. Applications should be aware of this and have a degradation plan: read from the primary instead, retry later, or report an error directly.
Retry failed transactions
- Queries may be killed due to concurrency contention, administrator commands, etc. Applications need to be aware of this and retry if necessary.
- When the application reports a large number of errors in the database, it can trigger the circuit breaker to avoid an avalanche. But be careful to distinguish the type and nature of errors.
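One way to "distinguish the type and nature of errors" is to retry only on SQLSTATE codes that signal a transient conflict. The sketch below assumes a hypothetical DbError exception carrying a sqlstate attribute; real drivers expose the code under their own names. The retryable set (40001 serialization_failure, 40P01 deadlock_detected), the attempt count, and the backoff are illustrative choices, not values from this convention.

```python
import time

# SQLSTATE codes treated as transient here: serialization_failure, deadlock_detected.
RETRYABLE_SQLSTATES = {"40001", "40P01"}


class DbError(Exception):
    """Hypothetical driver error carrying a SQLSTATE code."""

    def __init__(self, sqlstate: str):
        super().__init__(sqlstate)
        self.sqlstate = sqlstate


def run_with_retry(txn, max_attempts=3, base_delay_s=0.01, sleep=time.sleep):
    """Run txn(); retry only retryable errors, with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DbError as err:
            # Non-retryable errors and the final attempt propagate to the caller.
            if err.sqlstate not in RETRYABLE_SQLSTATES or attempt == max_attempts:
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))
```

The sleep parameter is injectable so the backoff can be skipped in tests; errors such as a unique violation (23505) are raised immediately rather than retried.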
Reconnect after disconnection
- The database connection may be terminated for various reasons, and the application must have a reconnection mechanism.
- SELECT 1 can be used as a heartbeat query to detect whether the connection is still responding and to keep it alive periodically.
Online service application code prohibits execution of DDL
- It is strictly forbidden to execute DDL in production applications; do not attempt anything drastic from application code.
- Exception scenario: Creating new time partitions for partitioned tables can be carefully managed by the application.
- Special exception: Databases used by office systems, such as Gitlab/Jira/Confluence, etc., can grant application DDL permissions.
SELECT statement explicitly specifies column names
- Avoid using SELECT *, or * in a RETURNING clause. Use a specific field list and do not return unused fields. When the table structure changes (for example, a new column is added), queries that use column wildcards are likely to encounter column mismatch errors.
- After maintenance, the column order of some tables may change. For example, after upgrading an INTEGER primary key id to BIGINT, the id column becomes the last column. This can only be fixed during maintenance and migration. Developers should resist the compulsion to adjust the column order and should specify the column order explicitly in the SELECT statement.
- Exception: Wildcards are allowed when a stored procedure returns a specific table row type.
Disable online query full table scan
- Exceptions: constant minimal table, extremely low-frequency operations, table/return result set is very small (within 100 records/100 KB).
- Using negative operators such as != and <> on the first-level filter condition will result in a full table scan and must be avoided.
Disallow long waits in transactions
- Transactions must be committed or rolled back as soon as possible after being started. Transactions left Idle in Transaction for more than 10 minutes will be forcibly killed.
- Applications should enable AutoCommit to avoid a BEGIN that is never paired with a later ROLLBACK or COMMIT.
- Try to use the transaction infrastructure provided by the standard library, and do not control transactions manually unless absolutely necessary.
Things to note when using count
- count(*) is the standard syntax for counting rows and has nothing to do with null values.
- count(col) counts the number of non-null records in column col. NULL values in this column are not counted.
- count(distinct col) deduplicates col before counting; null values are also ignored, so only the number of non-null distinct values is counted.
- count((col1, col2)) counts over multiple columns; a row is still counted even if all the counted columns are null, so (NULL,NULL) is valid.
- count(distinct (col1, col2)) performs a multi-column deduplicated count; a row is still counted even if all the counted columns are null, so (NULL,NULL) is valid.
Things to note when using aggregate functions
- All aggregate functions except count return NULL when there are no input rows, whereas count(col) returns 0 in that case, as an exception.
- If returning null from an aggregate function is not expected, use coalesce to set a default value.
Handle null values with caution
- Clearly distinguish between zero values and null values. Use IS NULL to test for null values, and use the regular = operator to test for zero values.
- When a null value is used as a function input parameter, it should carry a type modifier; otherwise, it is impossible to tell which overloaded function should be used.
- Pay attention to null comparison logic: the result of any comparison operation involving null values is unknown, so take care when null is involved in Boolean operations:
  - or: TRUE or NULL returns TRUE due to logical short-circuiting.
  - and: FALSE and NULL returns FALSE due to logical short-circuiting.
  - In other cases, as long as an operand is NULL, the result is NULL.
- The result of a logical comparison between a null value and any value is null; for example, NULL=NULL returns NULL, not TRUE/FALSE.
- For equality comparisons involving null values and non-null values, use IS DISTINCT FROM.
- NULL values and aggregate functions: when all input values are NULL, the aggregate function returns NULL.
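The three-valued logic described above can be modelled in a few lines of Python, using None to stand in for NULL. This is only a simulation for building intuition, not how PostgreSQL implements it; the function names are made up for this sketch.

```python
def sql_and(a, b):
    # FALSE dominates AND: FALSE AND NULL is FALSE.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def sql_or(a, b):
    # TRUE dominates OR: TRUE OR NULL is TRUE.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def sql_eq(a, b):
    # Any ordinary comparison involving NULL yields NULL.
    if a is None or b is None:
        return None
    return a == b


def is_distinct_from(a, b):
    # Null-safe comparison: always returns a Boolean, never NULL.
    if a is None or b is None:
        return (a is None) != (b is None)
    return a != b
```

For example, sql_eq(None, None) returns None, while is_distinct_from(None, None) returns False, which is why the null-safe form is preferred when either side may be null.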
Watch for gaps in sequence numbers
- When using Serial types, INSERT, UPSERT, and other operations consume sequence numbers, and this consumption is not rolled back when the transaction fails.
- When using an INTEGER as the primary key on a table with frequent insertion conflicts, pay attention to the risk of integer overflow.
The cursor must be closed promptly after use
Repeated queries using prepared statements
- Prepared Statements should be used for repeated queries to eliminate the CPU overhead of database hard parsing. Pgbouncer versions earlier than 1.21 cannot support this feature in transaction pooling mode, please pay special attention.
- Prepared statements will modify the connection status. Please pay attention to the impact of the connection pool on prepared statements.
Choose the appropriate transaction isolation level
- The default isolation level is Read Committed, which is suitable for most simple read and write transactions. For ordinary transactions, choose the lowest isolation level that meets the requirements.
- For write transactions that require transaction-level consistent snapshots, use the Repeatable Read isolation level.
- For write transactions that have strict requirements on correctness (such as money-related ones), use the Serializable isolation level.
- When a concurrency conflict occurs at the Repeatable Read (RR) or Serializable (SR) isolation level, the application should actively retry depending on the error type.
Do not use count when judging the existence of a result
- SELECT 1 FROM tbl WHERE xxx LIMIT 1 is faster than count for judging whether any rows meet the condition.
- The existence result can be converted to a Boolean value using SELECT exists(SELECT * FROM tbl WHERE xxx LIMIT 1).
Use the RETURNING clause to retrieve the modified results in one go
- The RETURNING clause can be used after INSERT, UPDATE, and DELETE statements to effectively reduce the number of database interactions.
Use UPSERT to simplify logic
- When the business has an insert-failure-update sequence of operations, consider using UPSERT instead.
Use advisory locks to deal with hotspot concurrency
- For extremely high-frequency concurrent writes to a single row (spikes), advisory locks should be used to lock the record ID.
- If high concurrency contention can be resolved at the application level, don’t do it at the database level.
Optimize IN operator
- Use an EXISTS clause instead of the IN operator for better performance.
- Use =ANY(ARRAY[1,2,3,4]) instead of IN (1,2,3,4) for better results.
for better results. - Control the size of the parameter list. In principle, it should not exceed 10,000. If it exceeds, you can consider batch processing.
It is not recommended to use left fuzzy search
- Left fuzzy search, such as WHERE col LIKE '%xxx', cannot make full use of a B-tree index. If necessary, a functional index on the reverse expression can be used.
Use arrays instead of temporary tables
- Consider using an array instead of a temporary table, for example when obtaining corresponding records for a series of IDs. =ANY(ARRAY[1,2,3]) is better than a temporary table JOIN.
0x04 Administration Convention
Use Pigsty to build PostgreSQL cluster and infrastructure
- The production environment uses the Pigsty trunk version uniformly, and deploys the database on x86_64 machines and CentOS 7.9 / RockyLinux 8.8 operating systems.
- The pigsty.yml configuration file usually contains highly sensitive and important confidential information. Use Git for version management and strictly control access permissions.
- The CA private key and other certificates generated under files/pki should be kept safe, regularly backed up to a secure area for storage and archiving, and placed under strict access control.
- No password may be left at its default value; make sure every password has been changed to a new one of sufficient strength.
- Strictly control access rights to management nodes and configuration code warehouses, and only allow DBA login and access.
Monitoring system is a must
- Any deployment must have a monitoring system, and the production environment uses at least two sets of Infra nodes to provide redundancy.
Properly plan the cluster architecture according to needs
- Any production database cluster managed by a DBA must have at least one online slave database for online failover.
- The oltp template is used by default; analytical databases use the olap template, financial databases use the crit template, and micro virtual machines (within four cores) use the tiny template.
- For businesses whose annual data volume exceeds 1TB, or for clusters whose write TPS exceeds 30,000 to 50,000, you can consider building a horizontal sharding cluster.
Configure cluster high availability using Patroni and Etcd
- The production database cluster uses Patroni as the high-availability component and etcd as the DCS.
- etcd uses a dedicated virtual machine cluster of 3 to 5 nodes, strictly spread across different cabinets.
- Patroni Failsafe mode must be turned on to ensure that the cluster primary can continue to work when etcd fails.
Configure cluster PITR using pgBackRest and MinIO
- The production database cluster uses pgBackRest as the backup recovery/PITR solution and MinIO as the backup storage warehouse.
- MinIO uses a multi-node multi-disk cluster, and can also use S3/OSS/COS services instead. Password encryption must be set for cold backup.
- All database clusters perform a local full backup every day, retain the backup and WAL of the last week, and save a full backup every other month.
- When a WAL archiving error occurs, you should check the backup warehouse and troubleshoot the problem in time.
Core business database configuration considerations
- The core business cluster needs to configure at least two online slave libraries, one of which is a dedicated offline query instance.
- The core business cluster needs to build a delayed slave cluster with a 24-hour delay for emergency data recovery.
- Core business clusters usually use asynchronous commit, while money-related ones use synchronous commit.
Financial database configuration considerations
- The financial database cluster requires at least two online replicas, one of which is a dedicated synchronous Standby instance, with Standby service access enabled.
- Money-related databases must use the crit template with RPO = 0, enable synchronous commit to ensure zero data loss, and enable Watchdog as appropriate.
- Money-related databases must turn on data checksums and, if appropriate, turn on full DML logs.
Use reasonable character encoding and localization configuration
- The character encoding must be UTF8; any other character encoding is strictly prohibited.
- LC_COLLATE must use C as the default collation; any special requirements must be specified explicitly in the DDL or query clause.
- LC_CTYPE defaults to en_US.UTF8; some extensions, such as pg_trgm, rely on character set information to work properly.
Business database management considerations
- Multiple different databases are allowed to be created in the same cluster, and Ansible scripts must be used to create new business databases.
- All business databases must exist synchronously in the Pgbouncer connection pool.
Business user management considerations
- Different businesses/services must use different database users, and Ansible scripts must be used to create new business users.
- All production business users must be synchronized in the user list file of the Pgbouncer connection pool.
- Individual users should set a password with a default validity period of 90 days and change it regularly.
- Individual users may only access authorized offline instances, or replicas with pg_offline_query enabled, and only from the jump server.
Notes on extension management
- When installing a new extension, you must first use yum/apt to install the extension binary package for the corresponding major version on all instances of the cluster.
- Before enabling the extension, confirm whether it needs to be added to shared_preload_libraries. If so, arrange a rolling restart.
- Note that the order of entries in shared_preload_libraries matters: citus, timescaledb, and pgml are usually placed first.
- pg_stat_statements and auto_explain are required plugins and must be enabled in all clusters.
- Install extensions uniformly using dbsu, and create them in the business database with CREATE EXTENSION.
Database XID and age considerations
- Pay attention to the age of the database and tables to avoid running out of XID transaction numbers. If the usage exceeds 20%, you should pay attention; if it exceeds 50%, you should intervene immediately.
- When handling XID age, execute VACUUM FREEZE on tables one by one, in order of age from largest to smallest.
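The thresholds and the processing order above can be sketched as two small helpers. This assumes that "usage" means the XID age divided by the roughly 2.1 billion (2^31) transaction wraparound limit; the function names and the sample ages are hypothetical.

```python
XID_WRAPAROUND_LIMIT = 2 ** 31  # about 2.1 billion transactions


def xid_usage_level(age: int) -> str:
    """Classify an XID age against the 20% / 50% thresholds (assumed basis: 2^31)."""
    usage = age / XID_WRAPAROUND_LIMIT
    if usage > 0.5:
        return "intervene"
    if usage > 0.2:
        return "watch"
    return "ok"


def freeze_order(table_ages: dict) -> list:
    """Return table names sorted by XID age, oldest first."""
    return sorted(table_ages, key=table_ages.get, reverse=True)
```

In practice the ages would come from the database itself, for example age(datfrozenxid) per database and age(relfrozenxid) per table, and the resulting order decides which table receives VACUUM FREEZE first.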
Database table and index expansion considerations
- Pay attention to the bloat rate of tables and indexes to avoid index performance degradation, and use pg_repack to handle table/index bloat online.
- Generally speaking, indexes and tables whose bloat rate exceeds 50% can be considered for reorganization.
- When dealing with bloated tables exceeding 100GB, take special care and choose a low-traffic period.
Database restart considerations
- Before restarting the database, execute CHECKPOINT twice to force dirty pages to be flushed, which can speed up the restart.
- Before restarting the database, run pg_ctl reload to reload the configuration and confirm that the configuration file is valid.
- To restart the database, use pg_ctl restart, or use patronictl to restart the entire cluster at once.
- Using kill -9 to shut down any database process is strictly prohibited.
Replication latency considerations
- Monitor replication latency, especially when using replication slots.
New slave database data warm-up
- When adding a new replica instance to a high-load business cluster, warm up the new instance and raise its HAProxy weight gradually in steps: 4, 8, 16, 32, 64, and 100. Hot data can be loaded into memory using pg_prewarm.
Database publishing process
- Online database release requires several evaluation stages: R&D self-test, supervisor review, QA review (optional), and DBA review.
- During the R&D self-test phase, R&D should ensure that changes are executed correctly in the development and pre-release environments.
- If a new table is created, the record order magnitude, daily data increment estimate, and read and write throughput magnitude estimate should be given.
- If it is a new function, the average execution time and extreme case descriptions should be given.
- If it is a schema change, all upstream and downstream dependencies must be sorted out.
- If it is a data change and record revision, a rollback SQL must be given.
- The R&D Team Leader needs to evaluate and review changes and be responsible for the content of the changes.
- The DBA evaluates and reviews the form and impact of the release, gives review opinions, and then either rejects the change or implements it in a unified manner.
Data work order format
- Database changes are made through the platform, with one work order for each change.
- The title is clear: a certain business needs to perform action yy in database xx.
- The goal is clear: what operations need to be performed on which instances in each step, and how to verify the results.
- Rollback plan: every change must provide a rollback plan, and newly created objects also need a cleanup script.
- Every change must be recorded and archived with complete approval records: first approved by the R&D supervisor (TL review), then by the DBA.
Database change release considerations
- Using a unified release window, changes of the day will be collected uniformly at 16:00 every day and executed sequentially; requirements confirmed by TL after 16:00 will be postponed to the next day. Database release is not allowed after 19:00. For emergency releases, please ask TL to make special instructions and send a copy to the CTO for approval before execution.
- Database DDL and DML changes are uniformly executed remotely using the administrator user dbuser_dba to ensure that the default permissions work properly.
- When a business administrator executes DDL personally, they must run SET ROLE dbrole_admin first and then execute the release, to ensure the default permissions.
- Every change requires a rollback plan before it can be executed; the few operations that cannot be rolled back (such as adding enum values) must be handled with special caution.
- Database changes use the psql command-line tool, connected to the cluster primary, using \i to execute scripts or \e to execute manually in batches.
Things to note when deleting tables
- Before a production data table is dropped with DROP, it should be renamed first and left to cool for 1 to 3 days to ensure that it is no longer accessed.
- When cleaning up a table, sort out all dependencies, including directly and indirectly dependent objects: triggers, foreign key references, etc.
- Tables pending deletion are usually parked in the trash schema by changing the schema name with ALTER TABLE SET SCHEMA.
- In high-load business clusters, when removing particularly large tables (> 100G), choose off-peak periods to avoid competing for I/O.
Things to note when creating and deleting indexes
- You must use CREATE INDEX CONCURRENTLY to create indexes and DROP INDEX CONCURRENTLY to remove them.
- When rebuilding an index, always create a new index first, then remove the old index, and rename the new index to match the old one.
- After index creation fails, remove the INVALID index promptly. After modifying an index, use analyze to re-collect statistics on the table.
- When the business is idle, you can enable parallel index creation and set maintenance_work_mem to a larger value to speed up index creation.
Make schema changes carefully
- Try to avoid full table rewrite changes as much as possible. Full table rewrite is allowed for tables within 1GB. The DBA should notify all relevant business parties when the changes are made.
- When adding new columns to an existing table, avoid using VOLATILE functions in default values, to avoid a full table rewrite.
- When changing a column type, all functions and views that depend on that type should be rebuilt if necessary, and statistics should be refreshed with ANALYZE.
Control the batch size of data writing
- Large batch write operations should be divided into small batches to avoid generating a large amount of WAL or occupying I/O at one time.
- After a large batch UPDATE is executed, run VACUUM to reclaim the space occupied by dead tuples.
- Executing a DDL statement essentially modifies the system catalog, so the number of DDL statements in a batch must also be controlled.
Data loading considerations
- Use COPY to load data, which can be executed in parallel if necessary.
- You can temporarily turn off autovacuum before loading data, disable triggers as needed, and create constraints and indexes after loading.
- Increase maintenance_work_mem and max_wal_size.
- After loading is complete, run vacuum verbose analyze table.
Notes on database migration and major version upgrades
- The production environment uniformly uses the standard migration script logic and achieves zero-downtime cluster migration and major version upgrades through blue-green deployment.
- For clusters that can tolerate downtime, pg_dump | psql logical export and import can be used to upgrade with the service stopped.
Data Accidental Deletion/Accidental Update Process
- After an accident occurs, immediately assess whether operations must be halted to stop the bleeding, assess the scale of the impact, and decide how to handle it.
- If the R&D side has a way to recover, give priority to having R&D make corrections through a SQL release; otherwise, use pageinspect and pg_dirtyread to rescue data from the damaged table.
- If there is a delayed replica, extract data from it for repair. First confirm the point in time of the accidental deletion, then advance the delayed replica to that XID and extract the data.
- If a large amount of data was accidentally deleted or overwritten, perform an in-place PITR rollback to a specific time after communicating with the business side and reaching agreement.
Data corruption processing process
- Confirm whether the replica's data can be used for recovery. If the replica's data is intact, switch over to the replica first.
- Temporarily turn off autovacuum, locate the root cause of the error, replace the failed disk, and add a new replica.
- If the system catalog is damaged, use pg_filedump to recover data from the table binary files.
- If the CLOG is damaged, use dd to generate fake commit records.
Things to note when the database connection is full
- When connections are exhausted (an avalanche), immediately run a connection-killing query to treat the symptoms and stop the loss, using pg_cancel_backend or pg_terminate_backend.
- Use pg_terminate_backend to abort all normal backend processes, starting at once per second (psql \watch 1), and confirm the connection status from the monitoring system. If the accumulation continues, keep increasing the execution frequency of the connection-killing query, for example to once every 0.1 seconds, until there is no more accumulation.
- After confirming from the monitoring system that the bleeding has stopped, try stopping the connection killing. If the accumulation reappears, resume it immediately. Then analyze the root cause right away and take corresponding action (upgrade, rate limiting, adding indexes, etc.).
PostgreSQL, The most successful database
The StackOverflow 2023 Survey, featuring feedback from 90K developers across 185 countries, is out. PostgreSQL topped all three survey categories (used, loved, and wanted), earning its title as the undisputed “Decathlete Database” – it’s hailed as the “Linux of Database”!
What makes a database “successful”? It’s a mix of features, quality, security, performance, and cost, but success is mainly about adoption and legacy. The size, preference, and needs of its user base are what truly shape its ecosystem’s prosperity. StackOverflow’s annual surveys for seven years have provided a window into tech trends.
PostgreSQL is now the world’s most popular database.
PostgreSQL is developers’ favorite database!
PostgreSQL sees the highest demand among users!
Popularity (used) reflects the past, love (loved) indicates the present, and demand (wanted) suggests the future. These metrics vividly showcase the vitality of a technology. PostgreSQL stands strong in both stock and potential, unlikely to be rivaled soon.
As a dedicated user, community member, expert, evangelist, and contributor to PostgreSQL, witnessing this moment is profoundly moving. Let’s delve into the “Why” and “What” behind this phenomenon.
Source: Community Survey
Developers define the success of databases, and StackOverflow’s survey, with popularity, love, and demand metrics, captures this directly.
“Which database environments have you done extensive development work in over the past year, and which do you want to work in over the next year? If you both worked with the database and want to continue to do so, please check both boxes in that row.”
Each database in the survey had two checkboxes: one for current use, marking the user as “Used,” and one for future interest, marking them as “Wanted.” Those who checked both were labeled as “Loved/Admired.”
The percentage of “Used” respondents represents popularity or usage rate, shown as a bar chart, while “Wanted” indicates demand or desire, marked with blue dots. “Loved/Admired” shows as red dots, indicating love or reputation. In 2023, PostgreSQL outstripped MySQL in popularity, becoming the world’s most popular database, and led by a wide margin in demand and reputation.
Reviewing seven years of data and plotting the top 10 databases on a scatter chart of popularity vs. net love percentage (2*love% - 100), we gain insights into the database field’s evolution and sense of scale.
X: Popularity, Y: Net Love Index (2 * loved - 100)
The 2023 snapshot shows PostgreSQL in the top right, popular and loved, while MySQL, popular yet less favored, sits in the bottom right. Redis, moderately popular but much loved, is in the top left, and Oracle, neither popular nor loved, is in the bottom left. In the middle lie SQLite, MongoDB, and SQL Server.
Trends indicate PostgreSQL’s growing popularity and love; MySQL’s love remains flat with falling popularity. Redis and SQLite are progressing, MongoDB is peaking and declining, and the commercial RDBMSs SQL Server and Oracle are on a downward trend.
The takeaway: PostgreSQL’s standing in the database realm, akin to Linux in server OS, seems unshakeable for the foreseeable future.
Historical Accumulation: Popularity
PostgreSQL — The world’s most popular database
Popularity is the percentage of total users who have used a technology in the past year. It reflects the accumulated usage over the past year and is a core metric of factual significance.
In 2023, PostgreSQL, branded as the “most advanced,” surpassed the “most popular” database MySQL for the first time among all developers, with a usage rate of 45.6% versus 41.1%: a lead of 4.5 percentage points, or 1.1 times MySQL’s usage rate. Among professional developers (about three-quarters of the sample), PostgreSQL had already overtaken MySQL in 2022, with a 0.8 percentage point lead (46.5% vs 45.7%); in 2023 this gap widened to 49.1% vs 40.6%, a lead of 8.5 percentage points, which puts PostgreSQL’s usage rate at 1.2 times MySQL’s among professional developers.
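The lead and ratio figures quoted above follow directly from the usage rates. As a quick check (the helper name is made up for illustration):

```python
def lead_and_ratio(a_pct, b_pct):
    """Return (lead in percentage points, usage ratio) of a over b."""
    return round(a_pct - b_pct, 1), round(a_pct / b_pct, 2)

# PostgreSQL vs MySQL usage rates quoted in the text
print(lead_and_ratio(45.6, 41.1))  # all respondents, 2023
print(lead_and_ratio(46.5, 45.7))  # professional developers, 2022
print(lead_and_ratio(49.1, 40.6))  # professional developers, 2023
```

The lead is a difference in percentage points, while the ratio is a relative measure, so the two numbers should not be read as the same kind of percentage.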
Over the past years, MySQL enjoyed the top spot in database popularity, proudly claiming the title of the “world’s most popular open-source relational database.” However, PostgreSQL has now claimed the crown. Compared to PostgreSQL and MySQL, other databases are not in the same league in terms of popularity.
The key trend to note is that among the top-ranked databases, only PostgreSQL has shown a consistent increase in popularity, demonstrating strong growth momentum, while all other databases have seen a decline in usage. As time progresses, the gap in popularity between PostgreSQL and other databases will likely widen, making it hard for any challenger to displace PostgreSQL in the near future.
Notably, TiDB, the flagship of China’s “domestic databases,” has entered the StackOverflow rankings for the first time, placing last (32nd) with a 0.2% usage rate.
Popularity reflects the current scale and potential of a database, while love indicates its future growth potential.
Current Momentum: Love
PostgreSQL — The database developers love the most
Love or admiration is a measure of the percentage of users who are willing to continue using a technology, acting as an annual “retention rate” metric that reflects the user’s opinion and evaluation of the technology.
In 2023, PostgreSQL retained its title as the most loved database by developers. While Redis had been the favorite in previous years, PostgreSQL overtook Redis in 2022, becoming the top choice. PostgreSQL and Redis have maintained close reputation scores (around 70%), significantly outpacing other contenders.
In the 2022 PostgreSQL community survey, the majority of existing PostgreSQL users reported increased usage and deeper engagement, highlighting the stability of its core user base.
Redis, known for its simplicity and ease of use as a data structure cache server, is often paired with the relational database PostgreSQL, enjoying considerable popularity (20%, ranking sixth) among developers. Cross-analysis shows a strong connection between the two: 86% of Redis users are interested in using PostgreSQL, and 30% of PostgreSQL users want to use Redis. Other databases with positive reviews include SQLite, MongoDB, and SQL Server. MySQL and ElasticSearch receive mixed feedback, hovering around the 50% mark. The least favored databases include Access, IBM DB2, CouchDB, Couchbase, and Oracle.
Not all potential can be converted into kinetic energy. While user affection is significant, it doesn’t always translate into action, leading to the third metric of interest – demand.
Future Trends: Demand
PostgreSQL - The Most Wanted Database
The demand rate, or the level of desire, represents the percentage of users who will actually opt for a technology in the coming year. PostgreSQL stands out in demand/desire, significantly outpacing other databases with a 42.3% rate for the second consecutive year, showing relentless growth and widening the gap with its competitors.
In 2023, some databases saw notable demand increases, likely driven by the surge in large language model AI, spearheaded by OpenAI’s ChatGPT. This demand for intelligence has, in turn, fueled the need for robust data infrastructure. A decade ago, support for NoSQL features like JSONB/GIN laid the groundwork for PostgreSQL’s explosive growth during the internet boom. Today, the introduction of pgvector, the first vector extension built on a mature database, grants PostgreSQL a ticket into the AI era, setting the stage for growth in the next decade.
But Why?
PostgreSQL leads in demand, usage, and love, with the right mix of timing, location, and human support, making it arguably the most successful database with no visible challengers in the near future. The secret to its success lies in its slogan: “The World’s Most Advanced Open Source Relational Database.”
Relational databases are so prevalent and crucial that they might dwarf the combined significance of other types like key-value, document, search engine, time-series, graph, and vector databases. Typically, “database” implicitly refers to “relational database,” where no other category dares claim mainstream status. Last year’s “Why PostgreSQL Will Be the Most Successful Database?” delves into the competitive landscape of relational databases—a tripartite dominance. Excluding Microsoft’s relatively isolated SQL Server, the database scene, currently in a phase of consolidation, has three key players rooted in WireProtocol: Oracle, MySQL, and PostgreSQL, mirroring a “Three Kingdoms” saga in the relational database realm.
Oracle/MySQL are waning, while PostgreSQL is thriving. Oracle is an established commercial DB with deep tech history, rich features, and strong support, favored by well-funded, risk-averse enterprises, especially in finance. Yet, it’s pricey and infamous for litigious practices. MS SQL Server shares similar traits with Oracle. Commercial databases are facing a slow decline due to the open-source wave.
MySQL, popular yet beleaguered, lags in stringent transaction processing and data analysis compared to PostgreSQL. Its agile development approach is also outperformed by NoSQL alternatives. Oracle’s dominance, sibling rivalry with MariaDB, and competition from NewSQL players like TiDB/OB contribute to its decline.
Oracle, no doubt skilled, lacks integrity, hence “talented but unprincipled.” MySQL, despite its open-source merit, is limited in capability and sophistication, hence “limited talent, weak ethics.” PostgreSQL, embodying both capability and integrity, aligns with the open-source rise, popular demand, and advanced stability, epitomizing “talented and principled.”
Open Source & Advanced
The primary reasons for choosing PostgreSQL, as reflected in the TimescaleDB community survey, are its open-source nature and stability. Open-source implies free use, potential for modification, no vendor lock-in, and no “chokepoint” issues. Stability means reliable, consistent performance with a proven track record in large-scale production environments. Experienced developers value these attributes highly.
Broadly, aspects like extensibility, ecosystem, community, and protocols fall under “open-source.” Stability, ACID compliance, SQL support, scalability, and availability define “advanced.” These resonate with PostgreSQL’s slogan: “The world’s most advanced open source relational database.”
The Virtue of Open Source
Powered by developers worldwide. Friendly BSD license, thriving ecosystem, extensive expansion. A robust Oracle alternative, leading the charge.
What is “virtue”? It’s the manifestation of “the way,” and this way is open source. PostgreSQL stands as a venerable giant among open-source projects, epitomizing global collaborative success.
Back in the day, developing software/information services required exorbitantly priced commercial databases. Just the software licensing fees could hit six or seven figures, not to mention similar costs for hardware and service subscriptions. Oracle’s licensing fee per CPU core could reach hundreds of thousands annually, prompting even giants like Alibaba to seek IOE alternatives. The rise of open-source databases like PostgreSQL and MySQL offered a fresh choice.
Open-source databases, free of charge, spurred an industry revolution: from tens of thousands per core per month for commercial licenses to a mere 20 bucks per core per month for hardware. Databases became accessible to regular businesses, enabling the provision of free information services.
Open source has been monumental: the history of the internet is a history of open-source software. The prosperity of the IT industry and the plethora of free information services owe much to open-source initiatives. Open source represents a form of successful Communism in software, with the industry’s core means of production becoming communal property, available to developers worldwide as needed. Developers contribute according to their abilities, embracing the ethos of mutual benefit.
An open-source programmer’s work encapsulates the intellect of countless top-tier developers. Programmers command high salaries because they are not mere laborers but contractors orchestrating software and hardware. They own the core means of production: software from the public domain and readily available server hardware. Thus, a few skilled engineers can swiftly tackle domain-specific problems leveraging the open-source ecosystem.
Open source synergizes community efforts, drastically reducing redundancy and propelling technical advancements at an astonishing pace. Its momentum, now unstoppable, continues to grow like a snowball. Open source dominates foundational software, and the industry now views insular development or so-called “self-reliance” in software, especially in foundational aspects, as a colossal joke.
For PostgreSQL, open source is its strongest asset against Oracle.
Oracle is advanced, but PostgreSQL holds its own. It’s the most Oracle-compatible open-source database, natively supporting 85% of Oracle’s features, with specialized distributions reaching 96% compatibility. However, the real game-changer is cost: PG’s open-source nature and significant cost advantage provide a substantial ecological niche. It doesn’t need to surpass Oracle in features; being “90% right at a fraction of the cost” is enough to outcompete Oracle.
PostgreSQL is like an open-source “Oracle,” the only real threat to Oracle’s dominance. As a leader in the “de-Oracle” movement, PG has spawned numerous “domestically controllable” database companies. According to CITIC, 36% of “domestic databases” are based on PG modifications or rebranding, with Huawei’s openGauss and GaussDB as prime examples. Crucially, PostgreSQL uses a BSD-Like license, permitting such adaptations — you can rebrand and sell without deceit. This open attitude is something Oracle-acquired, GPL-licensed MySQL can’t match.
The Talent of Advancement
The talent of PG lies in its advancement. Specializing in multiple areas, PostgreSQL offers a full-stack, multi-model approach: “Self-managed, autonomous driving temporal-geospatial AI vector distributed document graph with full-text search, programmable hyper-converged, federated stream-batch processing in a single HTAP Serverless full-stack platform database”, covering almost all database needs with a single component.
PostgreSQL is not just a traditional OLTP “relational database” but a multi-modal database. For SMEs, a single PostgreSQL component can cover the vast majority of their data needs: OLTP, OLAP, time-series, GIS, tokenization and full-text search, JSON/XML documents, NoSQL features, graphs, vectors, and more.
Emperor of Databases — Self-managed, autonomous driving temporal-geospatial AI vector distributed document graph with full-text search, programmable hyper-converged, federated stream-batch processing in a single HTAP Serverless full-stack platform database.
The superiority of PostgreSQL is not only in its acclaimed kernel stability but also in its powerful extensibility. The plugin system transforms PostgreSQL from a single-threaded evolving database kernel to a platform with countless parallel-evolving extensions, exploring all possibilities simultaneously like quantum computing. PostgreSQL is omnipresent in every niche of data processing.
For instance, PostGIS for geospatial databases, TimescaleDB for time-series, Citus for distributed/columnar/HTAP databases, PGVector for AI vector databases, AGE for graph databases, PipelineDB for stream processing, and the ultimate trick — using Foreign Data Wrappers (FDW) for unified SQL access to all heterogeneous external databases. Thus, PG is a true full-stack database platform, far more advanced than a simple OLTP system like MySQL.
Up to a considerable scale, PostgreSQL can play multiple roles with a single component, greatly reducing project complexity and cost. Remember, designing for unneeded scale is futile and an example of premature optimization. If one technology can meet all needs, it is a better choice than reimplementing the same functions with multiple components.
Take Tantan as an example: at 2.5 million TPS and 200 TB of unique TP data, a PostgreSQL-only stack remains stable and reliable, covering a wide range of functions beyond its primary OLTP role, including caching, OLAP, batch processing, and even message queuing. However, as the user base approaches tens of millions of daily active users, these additional functions will eventually need to be handled by dedicated components.
PostgreSQL’s advancement is also evident in its thriving ecosystem. Centered around the database kernel, there are specialized variants and “higher-level databases” built on it, like Greenplum, Supabase (an open-source alternative to Firebase), and the specialized graph database edgedb, among others. There are various open-source/commercial/cloud distributions integrating tools, like different RDS versions and the plug-and-play Pigsty; horizontally, there are even powerful mimetic components/versions emulating other databases without changing client drivers, like babelfish for SQL Server, FerretDB for MongoDB, and EnterpriseDB/IvorySQL for Oracle compatibility.
PostgreSQL’s advanced features are its core competitive strength against MySQL, another open-source relational database.
Advancement is PostgreSQL’s core competitive edge over MySQL.
MySQL’s slogan is “the world’s most popular open-source relational database,” characterized by being rough, fierce, and fast, catering to internet companies. These companies prioritize simplicity (mainly CRUD), data consistency and accuracy less than traditional sectors like banking, and can tolerate data inaccuracies over service downtime, unlike industries that cannot afford financial discrepancies.
However, times change, and PostgreSQL has rapidly advanced, surpassing MySQL in speed and robustness, leaving only “roughness” as MySQL’s remaining trait.
MySQL allows partial transaction commits by default, shocked
MySQL allows partial transaction commits by default, revealing a gap between “popular” and “advanced.” Popularity fades with obsolescence, while advancement gains popularity through innovation. In times of change, without advanced features, popularity is fleeting. Research shows MySQL’s pride in “popularity” cannot stand against PostgreSQL’s “advanced” superiority.
Advancement and open-source are PostgreSQL’s success secrets. While Oracle is advanced and MySQL is open-source, PostgreSQL boasts both. With the right conditions, success is inevitable.
Looking Ahead
The PostgreSQL database kernel’s role in the database ecosystem mirrors the Linux kernel’s in the operating system domain. For databases, particularly OLTP, the battle of kernels has settled—PostgreSQL is now a perfect engine.
However, users need more than an engine; they need the complete car, driving capabilities, and traffic services. The database competition has shifted from software to Software enabled Service—complete database distributions and services. The race for PostgreSQL-based distributions is just beginning. Who will be the PostgreSQL equivalent of Debian, RedHat, or Ubuntu?
This is why we created Pigsty: to develop a batteries-included, open-source, local-first PostgreSQL distribution, making it easy for everyone to access and utilize a quality database service. Due to space limits, the detailed story is for another time.
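References
2022-08 “How Powerful Is PostgreSQL?”
2022-07 “Why PostgreSQL Will Be the Most Successful Database?”
2022-06 “StackOverflow 2022 Annual Database Survey”
2021-05 “Why PostgreSQL Rocks!”
2021-05 “Why PostgreSQL Has a Bright Future”
2018 “What Is PostgreSQL Good For?”
2023 “A Better Open-Source RDS Alternative: Pigsty”
2023 “Tracking 7 Years of StackOverflow Survey Data”
2022 “State of PostgreSQL Community Survey Report 2022”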
Releases
Pigsty v3.0: Extension Exploding & Pluggable Kernels
Get started with:
curl -fsSL https://repo.pigsty.io/get | bash
cd ~/pigsty; ./bootstrap; ./configure; ./install.yml
Highlight Features
Extension Exploding:
Pigsty now has an unprecedented 336 available extensions for PostgreSQL. This includes 121 extension RPM packages and 133 DEB packages, surpassing the total number of extensions provided by the PGDG official repository (135 RPM/109 DEB). Pigsty has also cross-ported extensions that were previously unique to either the EL or the DEB ecosystem, bringing the extension ecosystems of the two major distribution families into close alignment.
A crude list of the extension ecosystem is as follows:
- timescaledb periods temporal_tables emaj table_version pg_cron pg_later pg_background pg_timetable
- postgis pgrouting pointcloud pg_h3 q3c ogr_fdw geoip #pg_geohash #mobilitydb
- pgvector pgvectorscale pg_vectorize pg_similarity pg_tiktoken pgml #smlar
- pg_search pg_bigm zhparser hunspell
- hydra pg_lakehouse pg_duckdb duckdb_fdw pg_fkpart pg_partman plproxy #pg_strom citus
- pg_hint_plan age hll rum pg_graphql pg_jsonschema jsquery index_advisor hypopg imgsmlr pg_ivm pgmq pgq #rdkit
- pg_tle plv8 pllua plprql pldebugger plpgsql_check plprofiler plsh #pljava plr pgtap faker dbt2
- prefix semver pgunit md5hash asn1oid roaringbitmap pgfaceting pgsphere pg_country pg_currency pgmp numeral pg_rational pguint ip4r timestamp9 chkpass #pg_uri #pgemailaddr #acl #debversion #pg_rrule
- topn pg_gzip pg_http pg_net pg_html5_email_address pgsql_tweaks pg_extra_time pg_timeit count_distinct extra_window_functions first_last_agg tdigest aggs_for_arrays pg_arraymath pg_idkit pg_uuidv7 permuteseq pg_hashids
- sequential_uuids pg_math pg_random pg_base36 pg_base62 floatvec pg_financial pgjwt pg_hashlib shacrypt cryptint pg_ecdsa pgpcre icu_ext envvar url_encode #pg_zstd #aggs_for_vecs #quantile #lower_quantile #pgqr #pg_protobuf
- pg_repack pg_squeeze pg_dirtyread pgfincore pgdd ddlx pg_prioritize pg_checksums pg_readonly safeupdate pg_permissions pgautofailover pg_catcheck preprepare pgcozy pg_orphaned pg_crash pg_cheat_funcs pg_savior table_log pg_fio #pgpool pgagent
- pg_profile pg_show_plans pg_stat_kcache pg_stat_monitor pg_qualstats pg_store_plans pg_track_settings pg_wait_sampling system_stats pg_meta pgnodemx pg_sqlog bgw_replstatus pgmeminfo toastinfo pagevis powa pg_top #pg_statviz #pgexporter_ext #pg_mon
- passwordcheck supautils pgsodium pg_vault anonymizer pg_tde pgsmcrypto pgaudit pgauditlogtofile pg_auth_mon credcheck pgcryptokey pg_jobmon logerrors login_hook set_user pg_snakeoil pgextwlist pg_auditor noset #sslutils
- wrappers multicorn mysql_fdw tds_fdw sqlite_fdw pgbouncer_fdw mongo_fdw redis_fdw pg_redis_pubsub kafka_fdw hdfs_fdw firebird_fdw aws_s3 log_fdw #oracle_fdw #db2_fdw
- orafce pgtt session_variable pg_statement_rollback pg_dbms_metadata pg_dbms_lock pgmemcache #pg_dbms_job #wiltondb
- pglogical pgl_ddl_deploy pg_failover_slots wal2json wal2mongo decoderbufs decoder_raw mimeo pgcopydb pgloader pg_fact_loader pg_bulkload pg_comparator pgimportdoc pgexportdoc #repmgr #slony
- gis-stack rag-stack fdw-stack fts-stack etl-stack feat-stack olap-stack supa-stack stat-stack json-stack
Pluggable Kernels:
Pigsty v3 allows you to replace the PostgreSQL kernel, currently supporting Babelfish (SQL Server compatible, with wire protocol emulation), IvorySQL (Oracle compatible), and RAC PolarDB for PostgreSQL. Additionally, self-hosted Supabase is now available on Debian systems. You can emulate MSSQL (via WiltonDB), Oracle (via IvorySQL), Oracle RAC (via PolarDB), MongoDB (via FerretDB), and Firebase (via Supabase) in Pigsty with production-grade PostgreSQL clusters featuring HA, IaC, PITR, and monitoring.
Pro Edition:
We now offer Pigsty Pro, a professional edition that provides value-added services on top of the open-source features. The professional edition includes additional modules: MSSQL, Oracle, Mongo, K8S, Victoria, Kafka, etc., and offers broader support for PG major versions, operating systems, and chip architectures. It provides offline installation packages customized for precise minor versions of all operating systems, and support for legacy systems like EL7, Debian 11, Ubuntu 20.04.
Major Changes
This Pigsty release updates the major version number from 2.x to 3.0, with several significant changes:
- Primary supported operating systems updated to: EL 8 / EL 9 / Debian 12 / Ubuntu 22.04
  - EL7 / Debian 11 / Ubuntu 20.04 systems are now deprecated and no longer supported.
  - Users needing to run on these systems should consider our subscription service.
- Default to online installation, offline packages are no longer provided to resolve minor OS version compatibility issues.
  - The bootstrap process will no longer prompt for downloading offline packages, but if /tmp/pkg.tgz exists, it will still use the offline package automatically.
  - For offline installation needs, please create offline packages yourself or consider our pro version.
- Unified adjustment of upstream software repositories used by Pigsty, address changes, and GPG signing and verification for all packages.
  - Standard repository: https://repo.pigsty.io/{apt/yum}
  - Domestic mirror: https://repo.pigsty.cc/{apt/yum}
- API parameter changes and configuration template changes
  - Configuration templates for EL and Debian systems are now consolidated, with differing parameters managed in the roles/node_id/vars/ directory.
  - Configuration directory changes, all configuration file templates are now placed in the conf directory and categorized into default, dbms, demo, build.
- Docker is now completely treated as a separate module, and will not be downloaded by default
- New beta module: KAFKA
- New beta module: KUBE
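The repository and installation-mode rules in the list above can be sketched as follows. This is a minimal illustration, not Pigsty's actual implementation: the two hosts and the /tmp/pkg.tgz path come from the notes, while the function names, dictionary keys, and region labels are hypothetical.

```python
from pathlib import Path

# Hosts are quoted from the release notes; all names below are hypothetical.
REPO_HOSTS = {"default": "repo.pigsty.io", "china": "repo.pigsty.cc"}
REPO_KINDS = {"el": "yum", "debian": "apt", "ubuntu": "apt"}

def repo_base_url(os_family: str, region: str = "default") -> str:
    """Pick the package repository URL for an OS family and region."""
    host = REPO_HOSTS.get(region, REPO_HOSTS["default"])
    return f"https://{host}/{REPO_KINDS[os_family]}"

def install_mode(offline_pkg: str = "/tmp/pkg.tgz") -> str:
    """Return 'offline' when a local package tarball exists, else 'online'."""
    return "offline" if Path(offline_pkg).is_file() else "online"

print(repo_base_url("el"))
print(repo_base_url("debian", "china"))
print(install_mode("/nonexistent/pkg.tgz"))
```

The point of the sketch is that online installation is the default path, and the offline package is only an opt-in fallback triggered by the presence of the tarball.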
Other New Features
- Epic enhancement of PG OLAP analysis capabilities: DuckDB 1.0.0, DuckDB FDW, and PG Lakehouse, Hydra have been ported to the Debian system.
- Strengthened PG vector search and full-text search capabilities: Vectorscale provides DiskANN vector indexing, Hunspell dictionary support, pg_search 0.8.6.
- Resolved package build issues for ParadeDB, now available on Debian/Ubuntu.
- All required extensions for Supabase are now available on Debian/Ubuntu, making Supabase self-hostable across all OSes.
- Provided capability for scenario-based pre-configured extension stacks. If you’re unsure which extensions to install, we offer extension recommendation packages (Stacks) tailored for specific application scenarios.
- Created metadata tables, documentation, indexes, and name mappings for all PostgreSQL ecosystem extensions, ensuring alignment and usability for both EL and Debian systems.
- Enhanced proxy_env parameter functionality to mitigate DockerHub ban issues, simplifying configuration.
- Established a new dedicated software repository offering all extension plugins for versions 12-17, with the PG16 extension repository implemented by default in Pigsty.
- Upgraded existing software repositories, employing standard signing and verification mechanisms to ensure package integrity and security. The APT repository adopts a new standard layout built through reprepro.
- Provided sandbox environments for 1, 2, 3, 4, 43 nodes: meta, dual, trio, full, prod, and quick configuration templates for 7 major OS Distros.
- Add PostgreSQL 17 and pgBouncer 1.23 metrics support in pg_exporter config, adding related dashboard panels.
- Add logs panel for PGSQL Pgbouncer / PGSQL Patroni Dashboard
- Add new playbook cache.yml to make offline packages, instead of bash bin/cache and bin/release-pkg
API Changes
- New parameter option: pg_mode now has several new options:
  - pgsql: Standard PostgreSQL high availability cluster.
  - citus: Citus horizontally distributed PostgreSQL native high availability cluster.
  - gpsql: Monitoring for Greenplum and GP compatible databases (Pro edition).
  - mssql: Install WiltonDB / Babelfish to provide Microsoft SQL Server compatibility mode for standard PostgreSQL high availability clusters, with wire protocol level support, extensions unavailable.
  - ivory: Install IvorySQL to provide Oracle compatibility for PostgreSQL high availability clusters, supporting Oracle syntax/data types/functions/stored procedures, extensions unavailable (Pro edition).
  - polar: Install PolarDB for PostgreSQL (PG RAC) open-source version to support localization database capabilities, extensions unavailable (Pro edition).
- New parameter: pg_parameters, used to specify parameters in postgresql.auto.conf at the instance level, overriding cluster configurations for personalized settings on different instance members.
- New parameter: pg_files, used to specify additional files to be written to the PostgreSQL data directory, to support license feature required by some kernel forks.
- New parameter: repo_extra_packages, used to specify additional packages to download, to be used in conjunction with repo_packages, facilitating the specification of extension lists unique to OS versions.
- Parameter renaming: patroni_citus_db renamed to pg_primary_db, used to specify the primary database in the cluster (used in Citus mode).
- Parameter enhancement: Proxy server configurations in proxy_env will be written to the Docker Daemon to address internet access issues, and the configure -x option will automatically write the proxy server configuration of the current environment.
- Parameter enhancement: Allow using path item in infra_portal entries, to expose local dir as web service rather than proxy to another upstream.
- Parameter enhancement: The repo_url_packages in repo.pigsty.io will automatically switch to repo.pigsty.cc when the region is China, addressing internet access issues. Additionally, the downloaded file name can now be specified.
- Parameter enhancement: The extension field in pg_databases.extensions now supports both dictionary and extension name string modes. The dictionary mode offers version support, allowing the installation of specific extension versions.
- Parameter enhancement: If the repo_upstream parameter is not explicitly overridden, it will extract the default value for the corresponding system from rpm.yml or deb.yml.
- Parameter enhancement: If the repo_packages parameter is not explicitly overridden, it will extract the default value for the corresponding system from rpm.yml or deb.yml.
- Parameter enhancement: If the infra_packages parameter is not explicitly overridden, it will extract the default value for the corresponding system from rpm.yml or deb.yml.
- Parameter enhancement: If the node_default_packages parameter is not explicitly overridden, it will extract the default value for the corresponding system from rpm.yml or deb.yml.
- Parameter enhancement: The extensions specified in pg_packages and pg_extensions will now perform a lookup and translation from the pg_package_map defined in rpm.yml or deb.yml.
- Parameter enhancement: Packages specified in node_packages and pg_extensions will be upgraded to the latest version upon installation. The default value in node_packages is now [openssh-server], helping to fix the OpenSSH CVE.
- Parameter enhancement: pg_dbsu_uid will automatically adjust to 26 (EL) or 543 (Debian) based on the operating system type, avoiding manual adjustments.
- pgBouncer Parameter update, max_prepared_statements = 128 enabled prepared statement support in transaction pooling mode, and set server_lifetime to 600.
- Patroni template parameter update, uniformly increase max_worker_processes +8 available backend processes, increase max_wal_senders and max_replication_slots to 50, and increase the OLAP template temporary file size limit to 1/5 of the main disk.
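A few of the parameter behaviors listed above can be modeled in plain Python. This is a hedged sketch and not Pigsty's code: the helper names are invented, the way an extension entry is rendered into SQL is an assumption, and only the uid values (26 and 543), the string/dictionary entry modes, and the "fall back to per-OS defaults unless overridden" rule are taken from the notes.

```python
# Hypothetical helpers; none of these names come from Pigsty itself.
DBSU_UID = {"el": 26, "debian": 543}  # values quoted in the notes above

def normalize_extension(entry):
    """Accept 'name' or {'name': ..., 'version': ...} and return a dict."""
    if isinstance(entry, str):
        return {"name": entry}
    if isinstance(entry, dict) and "name" in entry:
        return dict(entry)
    raise ValueError(f"invalid extension entry: {entry!r}")

def create_extension_sql(entry):
    """Render a CREATE EXTENSION statement, pinning the version if given."""
    ext = normalize_extension(entry)
    name = ext["name"]
    sql = f'CREATE EXTENSION IF NOT EXISTS "{name}"'
    if "version" in ext:
        version = ext["version"]
        sql += f" VERSION '{version}'"
    return sql + ";"

def resolve(param, user_config, os_defaults):
    """Use the explicit value when present, else the per-OS default."""
    return user_config[param] if param in user_config else os_defaults[param]

print(create_extension_sql("postgis"))
print(create_extension_sql({"name": "vector", "version": "0.7.3"}))
print(resolve("node_packages", {}, {"node_packages": ["openssh-server"]}))
```

Accepting both a bare string and a dictionary keeps the common case short while still allowing a specific version to be pinned when needed.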
Software Upgrade
The main components of Pigsty are upgraded to the following versions (as of the release time):
- PostgreSQL 16.4, 15.8, 14.13, 13.16, 12.20
- pg_exporter : 0.7.0
- Patroni: 3.3.2
- pgBouncer: 1.23.1
- pgBackRest: 2.53.1
- duckdb : 1.0.0
- etcd : 3.5.15
- pg_timetable: 5.9.0
- ferretdb: 1.23.1
- vip-manager: 2.6.0
- minio: 20240817012454
- mcli: 20240817113350
- grafana : 11.1.4
- loki : 3.1.1
- promtail : 3.0.0
- prometheus : 2.54.0
- pushgateway : 1.9.0
- alertmanager : 0.27.0
- blackbox_exporter : 0.25.0
- nginx_exporter : 1.3.0
- node_exporter : 1.8.2
- keepalived_exporter : 0.7.0
- pgbackrest_exporter : 0.18.0
- mysqld_exporter : 0.15.1
- redis_exporter : v1.62.0
- kafka_exporter : 1.8.0
- mongodb_exporter : 0.40.0
- VictoriaMetrics : 1.102.1
- VictoriaLogs : v0.28.0
- sealos: 5.0.0
- vector : 0.40.0
The complete list of PostgreSQL extensions can be found here.
Extension (URL) | Alias | Repo | Version | Category | License | LOAD |
DDL |
TRUST |
RELOC |
Description |
---|---|---|---|---|---|---|---|---|---|---|
timescaledb | timescaledb |
PGDG | 2.15.3 | TIME |
Timescale | Enables scalable inserts and complex queries for time-series data (Apache 2 Edition) | ||||
periods | periods |
PGDG | 1.2 | TIME |
PostgreSQL | Provide Standard SQL functionality for PERIODs and SYSTEM VERSIONING | ||||
temporal_tables | temporal_tables |
PGDG | 1.2.2 | TIME |
BSD 2 | temporal tables | ||||
emaj | emaj |
PGDG | 4.4.0 | TIME |
GPLv3 | E-Maj extension enables fine-grained write logging and time travel on subsets of the database. | ||||
table_version | table_version |
PGDG | 1.10.3 | TIME |
BSD 3 | PostgreSQL table versioning extension | ||||
pg_cron | pg_cron |
PGDG | 1.6 | TIME |
PostgreSQL | Job scheduler for PostgreSQL | ||||
pg_later | pg_later |
PIGSTY | 0.1.1 | TIME |
PostgreSQL | pg_later: Run queries now and get results later | ||||
pg_background | pg_background |
PGDG | 1.0 | TIME |
GPLv3 | Run SQL queries in the background | ||||
pg_timetable | pg_timetable |
PGDG | 5.9.0 | TIME |
PostgreSQL | Advanced scheduling for PostgreSQL | ||||
postgis | postgis | PGDG | 3.4.2 | GIS | GPLv2 | | | | | PostGIS geometry and geography spatial types and functions |
postgis_topology | postgis | PGDG | 3.4.2 | GIS | GPLv2 | | | | | PostGIS topology spatial types and functions |
postgis_raster | postgis | PGDG | 3.4.2 | GIS | GPLv2 | | | | | PostGIS raster types and functions |
postgis_sfcgal | postgis | PGDG | 3.4.2 | GIS | GPLv2 | | | | | PostGIS SFCGAL functions |
postgis_tiger_geocoder | postgis | PGDG | 3.4.2 | GIS | GPLv2 | | | | | PostGIS tiger geocoder and reverse geocoder |
address_standardizer | postgis | PGDG | 3.4.2 | GIS | GPLv2 | | | | | Used to parse an address into constituent elements. Generally used to support geocoding address normalization step. |
address_standardizer_data_us | postgis | PGDG | 3.4.2 | GIS | GPLv2 | | | | | Address Standardizer US dataset example |
pgrouting | pgrouting | PGDG | 3.6.0 | GIS | GPLv2 | | | | | pgRouting Extension |
pointcloud | pointcloud | PIGSTY | 1.2.5 | GIS | BSD 3 | | | | | data type for lidar point clouds |
pointcloud_postgis | pointcloud | PGDG | 1.2.5 | GIS | BSD 3 | | | | | integration for pointcloud LIDAR data and PostGIS geometry data |
h3 | pg_h3 | PGDG | 4.1.3 | GIS | Apache-2.0 | | | | | H3 bindings for PostgreSQL |
h3_postgis | pg_h3 | PGDG | 4.1.3 | GIS | Apache-2.0 | | | | | H3 PostGIS integration |
q3c | q3c | PIGSTY | 2.0.1 | GIS | GPLv2 | | | | | q3c sky indexing plugin |
ogr_fdw | ogr_fdw | PGDG | 1.1 | GIS | MIT | | | | | foreign-data wrapper for GIS data access |
geoip | geoip | PGDG | 0.3.0 | GIS | BSD 2 | | | | | IP-based geolocation query |
pg_geohash | pg_geohash | PIGSTY | 1.0 | GIS | MIT | | | | | Handle geohash based functionality for spatial coordinates |
mobilitydb | mobilitydb | PGDG | 1.1.1 | GIS | GPLv3 | | | | | MobilityDB geospatial trajectory data management & analysis platform |
earthdistance | earthdistance | CONTRIB | 1.1 | GIS | PostgreSQL | | | | | calculate great-circle distances on the surface of the Earth |
vector | pgvector | PGDG | 0.7.3 | RAG | PostgreSQL | | | | | vector data type and ivfflat and hnsw access methods |
vectorscale | pgvectorscale | PIGSTY | 0.2.0 | RAG | PostgreSQL | | | | | pgvectorscale: Advanced indexing for vector data |
vectorize | pg_vectorize | PIGSTY | 0.17.0 | RAG | PostgreSQL | | | | | The simplest way to do vector search on Postgres |
pg_similarity | pg_similarity | PIGSTY | 1.0 | RAG | BSD 3 | | | | | support similarity queries |
smlar | smlar | PIGSTY | 1.0 | RAG | PostgreSQL | | | | | Effective similarity search |
pg_tiktoken | pg_tiktoken | PIGSTY | 0.0.1 | RAG | Apache-2.0 | | | | | pg_tiktoken: tiktoken tokenizer for use with OpenAI models in postgres |
pgml | pgml | PIGSTY | 2.9.3 | RAG | MIT | | | | | PostgresML: Run AI/ML workloads with SQL interface |
pg_search | pg_search | PIGSTY | 0.9.1 | FTS | AGPLv3 | | | | | pg_search: Full text search for PostgreSQL using BM25 |
pg_bigm | pg_bigm | PGDG | 1.2 | FTS | PostgreSQL | | | | | create 2-gram (bigram) index for faster full text search. |
zhparser | zhparser | PIGSTY | 2.2 | FTS | PostgreSQL | | | | | a parser for full-text search of Chinese |
hunspell_cs_cz | hunspell_cs_cz | PIGSTY | 1.0 | FTS | PostgreSQL | | | | | Czech Hunspell Dictionary |
hunspell_de_de | hunspell_de_de | PIGSTY | 1.0 | FTS | PostgreSQL | | | | | German Hunspell Dictionary |
hunspell_en_us | hunspell_en_us | PIGSTY | 1.0 | FTS | PostgreSQL | | | | | en_US Hunspell Dictionary |
hunspell_fr | hunspell_fr | PIGSTY | 1.0 | FTS | PostgreSQL | | | | | French Hunspell Dictionary |
hunspell_ne_np | hunspell_ne_np | PIGSTY | 1.0 | FTS | PostgreSQL | | | | | Nepali Hunspell Dictionary |
hunspell_nl_nl | hunspell_nl_nl | PIGSTY | 1.0 | FTS | PostgreSQL | | | | | Dutch Hunspell Dictionary |
hunspell_nn_no | hunspell_nn_no | PIGSTY | 1.0 | FTS | PostgreSQL | | | | | Norwegian (norsk) Hunspell Dictionary |
hunspell_pt_pt | hunspell_pt_pt | PIGSTY | 1.0 | FTS | PostgreSQL | | | | | Portuguese Hunspell Dictionary |
hunspell_ru_ru | hunspell_ru_ru | PIGSTY | 1.0 | FTS | PostgreSQL | | | | | Russian Hunspell Dictionary |
hunspell_ru_ru_aot | hunspell_ru_ru_aot | PIGSTY | 1.0 | FTS | PostgreSQL | | | | | Russian Hunspell Dictionary (from AOT.ru group) |
fuzzystrmatch | fuzzystrmatch | CONTRIB | 1.2 | FTS | PostgreSQL | | | | | determine similarities and distance between strings |
pg_trgm | pg_trgm | CONTRIB | 1.6 | FTS | PostgreSQL | | | | | text similarity measurement and index searching based on trigrams |
citus | citus | PGDG | 12.1-1 | OLAP | AGPLv3 | | | | | Distributed PostgreSQL as an extension |
citus_columnar | citus | PGDG | 11.3-1 | OLAP | AGPLv3 | | | | | Citus columnar storage engine |
columnar | hydra | PIGSTY | 11.1-11 | OLAP | AGPLv3 | | | | | Hydra Columnar extension |
pg_lakehouse | pg_lakehouse | PIGSTY | 0.9.0 | OLAP | AGPLv3 | | | | | pg_lakehouse: An analytical query engine for Postgres |
pg_duckdb | pg_duckdb | PIGSTY | 0.0.1 | OLAP | MIT | | | | | DuckDB Embedded in Postgres |
duckdb_fdw | duckdb_fdw | PIGSTY | 1.0.0 | OLAP | MIT | | | | | DuckDB Foreign Data Wrapper |
parquet_s3_fdw | parquet_s3_fdw | PIGSTY | 0.3.1 | OLAP | MIT | | | | | foreign-data wrapper for parquet on S3 |
pg_fkpart | pg_fkpart | PGDG | 1.7 | OLAP | GPLv2 | | | | | Table partitioning by foreign key utility |
pg_partman | pg_partman | PGDG | 5.1.0 | OLAP | PostgreSQL | | | | | Extension to manage partitioned tables by time or ID |
plproxy | plproxy | PGDG | 2.11.0 | OLAP | BSD 0 | | | | | Database partitioning implemented as procedural language |
pg_strom | pg_strom | PGDG | 5.1 | OLAP | PostgreSQL | | | | | PG-Strom - big-data processing acceleration using GPU and NVME |
tablefunc | tablefunc | CONTRIB | 1.0 | OLAP | PostgreSQL | | | | | functions that manipulate whole tables, including crosstab |
age | age | PIGSTY | 1.5.0 | FEAT | Apache-2.0 | | | | | AGE graph database extension |
hll | hll | PGDG | 2.18 | FEAT | Apache-2.0 | | | | | type for storing hyperloglog data |
rum | rum | PGDG | 1.3 | FEAT | PostgreSQL | | | | | RUM index access method |
pg_graphql | pg_graphql | PIGSTY | 1.5.7 | FEAT | Apache-2.0 | | | | | pg_graphql: GraphQL support |
pg_jsonschema | pg_jsonschema | PIGSTY | 0.3.1 | FEAT | Apache-2.0 | | | | | PostgreSQL extension providing JSON Schema validation |
jsquery | jsquery | PGDG | 1.1 | FEAT | PostgreSQL | | | | | data type for jsonb inspection |
pg_hint_plan | pg_hint_plan | PGDG | 1.6.0 | FEAT | BSD 3 | | | | | Give PostgreSQL ability to manually force some decisions in execution plans. |
hypopg | hypopg | PGDG | 1.4.1 | FEAT | PostgreSQL | | | | | Hypothetical indexes for PostgreSQL |
index_advisor | index_advisor | PIGSTY | 0.2.0 | FEAT | PostgreSQL | | | | | Query index advisor |
imgsmlr | imgsmlr | PIGSTY | 1.0 | FEAT | PostgreSQL | | | | | Image similarity with haar |
pg_ivm | pg_ivm | PGDG | 1.8 | FEAT | PostgreSQL | | | | | incremental view maintenance on PostgreSQL |
pgmq | pgmq | PIGSTY | 1.2.1 | FEAT | PostgreSQL | | | | | A lightweight message queue. Like AWS SQS and RSMQ but on Postgres. |
pgq | pgq | PGDG | 3.5.1 | FEAT | ISC | | | | | Generic queue for PostgreSQL |
rdkit | rdkit | PGDG | 4.3.0 | FEAT | BSD 3 | | | | | Cheminformatics functionality for PostgreSQL. |
bloom | bloom | CONTRIB | 1.0 | FEAT | PostgreSQL | | | | | bloom access method - signature file based index |
pg_tle | pg_tle | PIGSTY | 1.2.0 | LANG | Apache-2.0 | | | | | Trusted Language Extensions for PostgreSQL |
plv8 | plv8 | PIGSTY | 3.2.2 | LANG | PostgreSQL | | | | | PL/JavaScript (v8) trusted procedural language |
plluau | pllua | PGDG | 2.0 | LANG | MIT | | | | | Lua as an untrusted procedural language |
hstore_plluau | pllua | PGDG | 1.0 | LANG | MIT | | | | | Hstore transform for untrusted Lua |
pllua | pllua | PGDG | 2.0 | LANG | MIT | | | | | Lua as a procedural language |
hstore_pllua | pllua | PGDG | 1.0 | LANG | MIT | | | | | Hstore transform for Lua |
plprql | plprql | PIGSTY | 0.1.0 | LANG | Apache-2.0 | | | | | Use PRQL in PostgreSQL - Pipelined Relational Query Language |
pldbgapi | pldebugger | PGDG | 1.1 | LANG | Artistic | | | | | server-side support for debugging PL/pgSQL functions |
plpgsql_check | plpgsql_check | PGDG | 2.7 | LANG | MIT | | | | | extended check for plpgsql functions |
plprofiler | plprofiler | PGDG | 4.2 | LANG | Artistic | | | | | server-side support for profiling PL/pgSQL functions |
plsh | plsh | PGDG | 2 | LANG | MIT | | | | | PL/sh procedural language |
pljava | pljava | PGDG | 1.6.6 | LANG | BSD 3 | | | | | PL/Java procedural language (https://tada.github.io/pljava/) |
plr | plr | PGDG | 8.4.6 | LANG | GPLv2 | | | | | load R interpreter and execute R script from within a database |
pgtap | pgtap | PGDG | 1.3.1 | LANG | PostgreSQL | | | | | Unit testing for PostgreSQL |
faker | faker | PGDG | 0.5.3 | LANG | PostgreSQL | | | | | Wrapper for the Faker Python library |
dbt2 | dbt2 | PGDG | 0.45.0 | LANG | Artistic | | | | | OSDL-DBT-2 test kit |
pltcl | pltcl | CONTRIB | 1.0 | LANG | PostgreSQL | | | | | PL/Tcl procedural language |
pltclu | pltcl | CONTRIB | 1.0 | LANG | PostgreSQL | | | | | PL/TclU untrusted procedural language |
plperl | plperl | CONTRIB | 1.0 | LANG | PostgreSQL | | | | | PL/Perl procedural language |
bool_plperl | plperl | CONTRIB | 1.0 | LANG | PostgreSQL | | | | | transform between bool and plperl |
hstore_plperl | plperl | CONTRIB | 1.0 | LANG | PostgreSQL | | | | | transform between hstore and plperl |
jsonb_plperl | plperl | CONTRIB | 1.0 | LANG | PostgreSQL | | | | | transform between jsonb and plperl |
plperlu | plperlu | CONTRIB | 1.0 | LANG | PostgreSQL | | | | | PL/PerlU untrusted procedural language |
bool_plperlu | plperlu | CONTRIB | 1.0 | LANG | PostgreSQL | | | | | transform between bool and plperlu |
jsonb_plperlu | plperlu | CONTRIB | 1.0 | LANG | PostgreSQL | | | | | transform between jsonb and plperlu |
hstore_plperlu | plperlu | CONTRIB | 1.0 | LANG | PostgreSQL | | | | | transform between hstore and plperlu |
plpgsql | plpgsql | CONTRIB | 1.0 | LANG | PostgreSQL | | | | | PL/pgSQL procedural language |
plpython3u | plpython3u | CONTRIB | 1.0 | LANG | PostgreSQL | | | | | PL/Python3U untrusted procedural language |
jsonb_plpython3u | plpython3u | CONTRIB | 1.0 | LANG | PostgreSQL | | | | | transform between jsonb and plpython3u |
ltree_plpython3u | plpython3u | CONTRIB | 1.0 | LANG | PostgreSQL | | | | | transform between ltree and plpython3u |
hstore_plpython3u | plpython3u | CONTRIB | 1.0 | LANG | PostgreSQL | | | | | transform between hstore and plpython3u |
prefix | prefix | PGDG | 1.2.0 | TYPE | PostgreSQL | | | | | Prefix Range module for PostgreSQL |
semver | semver | PGDG | 0.32.1 | TYPE | PostgreSQL | | | | | Semantic version data type |
unit | pgunit | PGDG | 7 | TYPE | GPLv3 | | | | | SI units extension |
md5hash | md5hash | PIGSTY | 1.0.1 | TYPE | BSD 2 | | | | | type for storing 128-bit binary data inline |
asn1oid | asn1oid | PIGSTY | 1 | TYPE | GPLv3 | | | | | asn1oid extension |
roaringbitmap | roaringbitmap | PIGSTY | 0.5 | TYPE | Apache-2.0 | | | | | support for Roaring Bitmaps |
pgfaceting | pgfaceting | PIGSTY | 0.2.0 | TYPE | BSD 3 | | | | | fast faceting queries using an inverted index |
pg_sphere | pgsphere | PIGSTY | 1.5.1 | TYPE | BSD 3 | | | | | spherical objects with useful functions, operators and index support |
country | pg_country | PIGSTY | 0.0.3 | TYPE | PostgreSQL | | | | | Country data type, ISO 3166-1 |
currency | pg_currency | PIGSTY | 0.0.3 | TYPE | MIT | | | | | Custom PostgreSQL currency type in 1Byte |
pgmp | pgmp | PGDG | 1.1 | TYPE | LGPLv3 | | | | | Multiple Precision Arithmetic extension |
numeral | numeral | PIGSTY | 1 | TYPE | GPLv2 | | | | | numeral datatypes extension |
pg_rational | pg_rational | PIGSTY | 0.0.2 | TYPE | MIT | | | | | bigint fractions |
uint | pguint | PGDG | 0 | TYPE | PostgreSQL | | | | | unsigned integer types |
ip4r | ip4r | PGDG | 2.4 | TYPE | PostgreSQL | | | | | IPv4/v6 and IPv4/v6 range index type for PostgreSQL |
uri | pg_uri | PIGSTY | 1.20151224 | TYPE | PostgreSQL | | | | | URI Data type for PostgreSQL |
pgemailaddr | pgemailaddr | PIGSTY | 0 | TYPE | PostgreSQL | | | | | Email address type for PostgreSQL |
acl | acl | PIGSTY | 1.0.4 | TYPE | BSD-2 | | | | | ACL Data type |
debversion | debversion | PGDG | 1.1 | TYPE | PostgreSQL | | | | | Debian version number data type |
pg_rrule | pg_rrule | PGDG | 0.2.0 | TYPE | MIT | | | | | RRULE field type for PostgreSQL |
timestamp9 | timestamp9 | PGDG | 1.4.0 | TYPE | MIT | | | | | timestamp nanosecond resolution |
chkpass | chkpass | PIGSTY | 1.0 | TYPE | PostgreSQL | | | | | data type for auto-encrypted passwords |
isn | isn | CONTRIB | 1.2 | TYPE | PostgreSQL | | | | | data types for international product numbering standards |
seg | seg | CONTRIB | 1.4 | TYPE | PostgreSQL | | | | | data type for representing line segments or floating-point intervals |
cube | cube | CONTRIB | 1.5 | TYPE | PostgreSQL | | | | | data type for multidimensional cubes |
ltree | ltree | CONTRIB | 1.2 | TYPE | PostgreSQL | | | | | data type for hierarchical tree-like structures |
hstore | hstore | CONTRIB | 1.8 | TYPE | PostgreSQL | | | | | data type for storing sets of (key, value) pairs |
citext | citext | CONTRIB | 1.6 | TYPE | PostgreSQL | | | | | data type for case-insensitive character strings |
xml2 | xml2 | CONTRIB | 1.1 | TYPE | PostgreSQL | | | | | XPath querying and XSLT |
topn | topn | PGDG | 2.6.0 | FUNC | AGPLv3 | | | | | type for top-n JSONB |
gzip | pg_gzip | PGDG | 1.0 | FUNC | MIT | | | | | gzip and gunzip functions. |
zstd | pg_zstd | PIGSTY | 1.1.0 | FUNC | ISC | | | | | Zstandard compression algorithm implementation in PostgreSQL |
http | pg_http | PGDG | 1.6 | FUNC | MIT | | | | | HTTP client for PostgreSQL, allows web page retrieval inside the database. |
pg_net | pg_net | PGDG | 0.8.0 | FUNC | Apache-2.0 | | | | | Async HTTP Requests |
pg_html5_email_address | pg_html5_email_address | PIGSTY | 1.2.3 | FUNC | PostgreSQL | | | | | PostgreSQL email validation that is consistent with the HTML5 spec |
pgsql_tweaks | pgsql_tweaks | PGDG | 0.10.3 | FUNC | PostgreSQL | | | | | Some functions and views for daily usage |
pg_extra_time | pg_extra_time | PGDG | 1.1.3 | FUNC | PostgreSQL | | | | | Some date time functions and operators that, |
timeit | pg_timeit | PIGSTY | 1.0 | FUNC | PostgreSQL | | | | | High-accuracy timing of SQL expressions |
count_distinct | count_distinct | PGDG | 3.0.1 | FUNC | BSD 2 | | | | | An alternative to COUNT(DISTINCT …) aggregate, usable with HashAggregate |
extra_window_functions | extra_window_functions | PGDG | 1.0 | FUNC | PostgreSQL | | | | | Extra Window Functions for PostgreSQL |
first_last_agg | first_last_agg | PIGSTY | 0.1.4 | FUNC | PostgreSQL | | | | | first() and last() aggregate functions |
tdigest | tdigest | PGDG | 1.4.1 | FUNC | Apache-2.0 | | | | | Provides tdigest aggregate function. |
aggs_for_vecs | aggs_for_vecs | PIGSTY | 1.3.0 | FUNC | MIT | | | | | Aggregate functions for array inputs |
aggs_for_arrays | aggs_for_arrays | PIGSTY | 1.3.2 | FUNC | MIT | | | | | Various functions for computing statistics on arrays of numbers |
arraymath | pg_arraymath | PIGSTY | 1.1 | FUNC | MIT | | | | | Array math and operators that work element by element on the contents of arrays |
quantile | quantile | PIGSTY | 1.1.7 | FUNC | BSD | | | | | Quantile aggregation function |
lower_quantile | lower_quantile | PIGSTY | 1.0.0 | FUNC | BSD-2 | | | | | Lower quantile aggregate function |
pg_idkit | pg_idkit | PIGSTY | 0.2.3 | FUNC | Apache-2.0 | | | | | multi-tool for generating new/niche universally unique identifiers (ex. UUIDv6, ULID, KSUID) |
pg_uuidv7 | pg_uuidv7 | PGDG | 1.5 | FUNC | MPLv2 | | | | | pg_uuidv7: create UUIDv7 values in postgres |
permuteseq | permuteseq | PIGSTY | 1.2 | FUNC | PostgreSQL | | | | | Pseudo-randomly permute sequences with a format-preserving encryption on elements |
pg_hashids | pg_hashids | PIGSTY | 1.3 | FUNC | MIT | | | | | Short unique id generator for PostgreSQL, using hashids |
sequential_uuids | sequential_uuids | PGDG | 1.0.2 | FUNC | MIT | | | | | generator of sequential UUIDs |
pg_math | pg_math | PIGSTY | 1.0 | FUNC | GPLv3 | | | | | GSL statistical functions for postgresql |
random | pg_random | PIGSTY | 2.0.0-dev | FUNC | PostgreSQL | | | | | random data generator |
base36 | pg_base36 | PIGSTY | 1.0.0 | FUNC | MIT | | | | | Integer Base36 types |
base62 | pg_base62 | PIGSTY | 0.0.1 | FUNC | MIT | | | | | Base62 extension for PostgreSQL |
floatvec | floatvec | PIGSTY | 1.0.1 | FUNC | MIT | | | | | Math for vectors (arrays) of numbers |
financial | pg_financial | PIGSTY | 1.0.1 | FUNC | PostgreSQL | | | | | Financial aggregate functions |
pgjwt | pgjwt | PIGSTY | 0.2.0 | FUNC | MIT | | | | | JSON Web Token API for Postgresql |
pg_hashlib | pg_hashlib | PIGSTY | 1.1 | FUNC | PostgreSQL | | | | | Stable hash functions for Postgres |
shacrypt | shacrypt | PIGSTY | 1.1 | FUNC | PostgreSQL | | | | | Implements SHA256-CRYPT and SHA512-CRYPT password encryption schemes |
cryptint | cryptint | PIGSTY | 1.0.0 | FUNC | PostgreSQL | | | | | Encryption functions for int and bigint values |
pguecc | pg_ecdsa | PIGSTY | 1.0 | FUNC | BSD-2 | | | | | uECC bindings for Postgres |
pgpcre | pgpcre | PIGSTY | 1 | FUNC | PostgreSQL | | | | | Perl Compatible Regular Expression functions |
icu_ext | icu_ext | PIGSTY | 1.8 | FUNC | PostgreSQL | | | | | Access ICU functions |
pgqr | pgqr | PIGSTY | 1.0 | FUNC | BSD-3 | | | | | QR Code generator from PostgreSQL |
envvar | envvar | PIGSTY | 1.0.0 | FUNC | PostgreSQL | | | | | Fetch the value of an environment variable |
pg_protobuf | pg_protobuf | PIGSTY | 1.0 | FUNC | MIT | | | | | Protobuf support for PostgreSQL |
url_encode | url_encode | PIGSTY | 1.2.5 | FUNC | PostgreSQL | | | | | url_encode, url_decode functions |
refint | refint | CONTRIB | 1.0 | FUNC | PostgreSQL | | | | | functions for implementing referential integrity (obsolete) |
autoinc | autoinc | CONTRIB | 1.0 | FUNC | PostgreSQL | | | | | functions for autoincrementing fields |
insert_username | insert_username | CONTRIB | 1.0 | FUNC | PostgreSQL | | | | | functions for tracking who changed a table |
moddatetime | moddatetime | CONTRIB | 1.0 | FUNC | PostgreSQL | | | | | functions for tracking last modification time |
tsm_system_time | tsm_system_time | CONTRIB | 1.0 | FUNC | PostgreSQL | | | | | TABLESAMPLE method which accepts time in milliseconds as a limit |
dict_xsyn | dict_xsyn | CONTRIB | 1.0 | FUNC | PostgreSQL | | | | | text search dictionary template for extended synonym processing |
tsm_system_rows | tsm_system_rows | CONTRIB | 1.0 | FUNC | PostgreSQL | | | | | TABLESAMPLE method which accepts number of rows as a limit |
tcn | tcn | CONTRIB | 1.0 | FUNC | PostgreSQL | | | | | Triggered change notifications |
uuid-ossp | uuid-ossp | CONTRIB | 1.1 | FUNC | PostgreSQL | | | | | generate universally unique identifiers (UUIDs) |
btree_gist | btree_gist | CONTRIB | 1.7 | FUNC | PostgreSQL | | | | | support for indexing common datatypes in GiST |
btree_gin | btree_gin | CONTRIB | 1.3 | FUNC | PostgreSQL | | | | | support for indexing common datatypes in GIN |
intarray | intarray | CONTRIB | 1.5 | FUNC | PostgreSQL | | | | | functions, operators, and index support for 1-D arrays of integers |
intagg | intagg | CONTRIB | 1.1 | FUNC | PostgreSQL | | | | | integer aggregator and enumerator (obsolete) |
dict_int | dict_int | CONTRIB | 1.0 | FUNC | PostgreSQL | | | | | text search dictionary template for integers |
unaccent | unaccent | CONTRIB | 1.1 | FUNC | PostgreSQL | | | | | text search dictionary that removes accents |
pg_repack | pg_repack | PGDG | 1.5.0 | ADMIN | BSD 3 | | | | | Reorganize tables in PostgreSQL databases with minimal locks |
pg_squeeze | pg_squeeze | PGDG | 1.6 | ADMIN | BSD 2 | | | | | A tool to remove unused space from a relation. |
pg_dirtyread | pg_dirtyread | PIGSTY | 2 | ADMIN | BSD 3 | | | | | Read dead but unvacuumed rows from table |
pgfincore | pgfincore | PGDG | 1.3.1 | ADMIN | BSD 3 | | | | | examine and manage the os buffer cache |
pgdd | pgdd | PIGSTY | 0.5.2 | ADMIN | MIT | | | | | An in-database data dictionary providing database introspection via standard SQL query syntax. Developed using pgx (https://github.com/zombodb/pgx). |
ddlx | ddlx | PGDG | 0.27 | ADMIN | PostgreSQL | | | | | DDL eXtractor functions |
prioritize | pg_prioritize | PGDG | 1.0 | ADMIN | PostgreSQL | | | | | get and set the priority of PostgreSQL backends |
pg_checksums | pg_checksums | PGDG | 1.1 | ADMIN | BSD 2 | | | | | Activate/deactivate/verify checksums in offline Postgres clusters |
pg_readonly | pg_readonly | PGDG | 1.0.0 | ADMIN | PostgreSQL | | | | | cluster database read only |
safeupdate | safeupdate | PGDG | 1.4 | ADMIN | ISC | | | | | Require criteria for UPDATE and DELETE |
pg_permissions | pg_permissions | PGDG | 1.3 | ADMIN | BSD 2 | | | | | view object permissions and compare them with the desired state |
pgautofailover | pgautofailover | PGDG | 2.1 | ADMIN | PostgreSQL | | | | | pg_auto_failover |
pg_catcheck | pg_catcheck | PGDG | 1.4.0 | ADMIN | BSD 3 | | | | | Diagnosing system catalog corruption |
pre_prepare | preprepare | PIGSTY | 0.4 | ADMIN | PostgreSQL | | | | | Pre Prepare your Statement server side |
pgcozy | pgcozy | PIGSTY | 1.0 | ADMIN | PostgreSQL | | | | | Pre-warming shared buffers according to previous pg_buffercache snapshots for PostgreSQL. |
pg_orphaned | pg_orphaned | PIGSTY | 1.0 | ADMIN | PostgreSQL | | | | | Deal with orphaned files |
pg_crash | pg_crash | PIGSTY | 1.0 | ADMIN | BSD-3 | | | | | Send random signals to random processes |
pg_cheat_funcs | pg_cheat_funcs | PIGSTY | 1.0 | ADMIN | PostgreSQL | | | | | Provides cheat (but useful) functions |
pg_savior | pg_savior | PIGSTY | 0.0.1 | ADMIN | Apache-2.0 | | | | | Postgres extension to save OOPS mistakes |
table_log | table_log | PIGSTY | 0.6.1 | ADMIN | PostgreSQL | | | | | record table modification logs and PITR for table/row |
pg_fio | pg_fio | PIGSTY | 1.0 | ADMIN | BSD-3 | | | | | PostgreSQL File I/O Functions |
pgpool_adm | pgpool | PGDG | 1.5 | ADMIN | PostgreSQL | | | | | Administrative functions for pgPool |
pgpool_recovery | pgpool | PGDG | 1.4 | ADMIN | PostgreSQL | | | | | recovery functions for pgpool-II for V4.3 |
pgpool_regclass | pgpool | PGDG | 1.0 | ADMIN | PostgreSQL | | | | | replacement for regclass |
pgagent | pgagent | PGDG | 4.2 | ADMIN | PostgreSQL | | | | | A PostgreSQL job scheduler |
vacuumlo | vacuumlo | CONTRIB | 16.3 | ADMIN | PostgreSQL | | | | | utility program that will remove any orphaned large objects from a PostgreSQL database |
pg_prewarm | pg_prewarm | CONTRIB | 1.2 | ADMIN | PostgreSQL | | | | | prewarm relation data |
oid2name | oid2name | CONTRIB | 16.3 | ADMIN | PostgreSQL | | | | | utility program that helps administrators to examine the file structure used by PostgreSQL |
lo | lo | CONTRIB | 1.1 | ADMIN | PostgreSQL | | | | | Large Object maintenance |
basic_archive | basic_archive | CONTRIB | 16.3 | ADMIN | PostgreSQL | | | | | an example of an archive module |
basebackup_to_shell | basebackup_to_shell | CONTRIB | 16.3 | ADMIN | PostgreSQL | | | | | adds a custom basebackup target called shell |
old_snapshot | old_snapshot | CONTRIB | 1.0 | ADMIN | PostgreSQL | | | | | utilities in support of old_snapshot_threshold |
adminpack | adminpack | CONTRIB | 2.1 | ADMIN | PostgreSQL | | | | | administrative functions for PostgreSQL |
amcheck | amcheck | CONTRIB | 1.3 | ADMIN | PostgreSQL | | | | | functions for verifying relation integrity |
pg_surgery | pg_surgery | CONTRIB | 1.0 | ADMIN | PostgreSQL | | | | | extension to perform surgery on a damaged relation |
pg_profile | pg_profile | PGDG | 4.6 | STAT | BSD 2 | | | | | PostgreSQL load profile repository and report builder |
pg_show_plans | pg_show_plans | PGDG | 2.1 | STAT | PostgreSQL | | | | | show query plans of all currently running SQL statements |
pg_stat_kcache | pg_stat_kcache | PGDG | 2.2.3 | STAT | BSD 3 | | | | | Kernel statistics gathering |
pg_stat_monitor | pg_stat_monitor | PGDG | 2.0 | STAT | BSD 3 | | | | | The pg_stat_monitor is a PostgreSQL Query Performance Monitoring tool, based on PostgreSQL contrib module pg_stat_statements. pg_stat_monitor provides aggregated statistics, client information, plan details including plan, and histogram information. |
pg_qualstats | pg_qualstats | PGDG | 2.1.0 | STAT | BSD 3 | | | | | An extension collecting statistics about quals |
pg_store_plans | pg_store_plans | PGDG | 1.8 | STAT | BSD 3 | | | | | track plan statistics of all SQL statements executed |
pg_track_settings | pg_track_settings | PGDG | 2.1.2 | STAT | PostgreSQL | | | | | Track settings changes |
pg_wait_sampling | pg_wait_sampling | PGDG | 1.1 | STAT | PostgreSQL | | | | | sampling based statistics of wait events |
system_stats | system_stats | PGDG | 2.0 | STAT | PostgreSQL | | | | | EnterpriseDB system statistics for PostgreSQL |
meta | pg_meta | PIGSTY | 0.4.0 | STAT | BSD-2 | | | | | Normalized, friendlier system catalog for PostgreSQL |
pgnodemx | pgnodemx | PIGSTY | 1.6 | STAT | Apache-2.0 | | | | | Capture node OS metrics via SQL queries |
pg_proctab | pgnodemx | PIGSTY | 0.0.10-compat | STAT | BSD 3 | | | | | PostgreSQL extension to access the OS process table |
pg_sqlog | pg_sqlog | PIGSTY | 1.6 | STAT | BSD 3 | | | | | Provide SQL interface to logs |
bgw_replstatus | bgw_replstatus | PGDG | 1.0.6 | STAT | PostgreSQL | | | | | Small PostgreSQL background worker to report whether a node is a replication master or standby |
pgmeminfo | pgmeminfo | PGDG | 1.0 | STAT | MIT | | | | | show memory usage |
toastinfo | toastinfo | PIGSTY | 1 | STAT | PostgreSQL | | | | | show details on toasted datums |
pg_mon | pg_mon | PIGSTY | 1.0 | STAT | MIT | | | | | PostgreSQL extension to enhance query monitoring |
pg_statviz | pg_statviz | PGDG | 0.6 | STAT | BSD 3 | | | | | stats visualization and time series analysis |
pgexporter_ext | pgexporter_ext | PGDG | 0.2.3 | STAT | BSD 3 | | | | | pgexporter extension for extra metrics |
pg_top | pg_top | PGDG | 3.7.0 | STAT | BSD 3 | | | | | Monitor PostgreSQL processes similar to unix top |
pagevis | pagevis | PIGSTY | 0.1 | STAT | MIT | | | | | Visualise database pages in ascii code |
powa | powa | PGDG | 4.2.2 | STAT | PostgreSQL | | | | | PostgreSQL Workload Analyser-core |
pageinspect | pageinspect | CONTRIB | 1.12 | STAT | PostgreSQL | | | | | inspect the contents of database pages at a low level |
pgrowlocks | pgrowlocks | CONTRIB | 1.2 | STAT | PostgreSQL | | | | | show row-level locking information |
sslinfo | sslinfo | CONTRIB | 1.2 | STAT | PostgreSQL | | | | | information about SSL certificates |
pg_buffercache | pg_buffercache | CONTRIB | 1.4 | STAT | PostgreSQL | | | | | examine the shared buffer cache |
pg_walinspect | pg_walinspect | CONTRIB | 1.1 | STAT | PostgreSQL | | | | | functions to inspect contents of PostgreSQL Write-Ahead Log |
pg_freespacemap | pg_freespacemap | CONTRIB | 1.2 | STAT | PostgreSQL | | | | | examine the free space map (FSM) |
pg_visibility | pg_visibility | CONTRIB | 1.2 | STAT | PostgreSQL | | | | | examine the visibility map (VM) and page-level visibility info |
pgstattuple | pgstattuple | CONTRIB | 1.5 | STAT | PostgreSQL | | | | | show tuple-level statistics |
auto_explain | auto_explain | CONTRIB | 16.3 | STAT | PostgreSQL | | | | | Provides a means for logging execution plans of slow statements automatically |
pg_stat_statements | pg_stat_statements | CONTRIB | 1.10 | STAT | PostgreSQL | | | | | track planning and execution statistics of all SQL statements executed |
passwordcheck_cracklib | passwordcheck | PGDG | 3.0.0 | SEC | LGPLv2 | | | | | Strengthen PostgreSQL user password checks with cracklib |
supautils | supautils | PIGSTY | 3.1.9 | SEC | Apache-2.0 | | | | | Extension that secures a cluster on a cloud environment |
pgsodium | pgsodium | PGDG | 3.1.9 | SEC | BSD 3 | | | | | Postgres extension for libsodium functions |
supabase_vault | pg_vault | PIGSTY | 0.2.8 | SEC | Apache-2.0 | | | | | Supabase Vault Extension |
anon | anonymizer | PGDG | 1.3.2 | SEC | PostgreSQL | | | | | Data anonymization tools |
pg_tde | pg_tde | PIGSTY | 1.0 | SEC | MIT | | | | | pg_tde access method |
pgsmcrypto | pgsmcrypto | PIGSTY | 0.1.0 | SEC | MIT | | | | | PostgreSQL SM Algorithm Extension |
pgaudit | pgaudit | PGDG | 16.0 | SEC | PostgreSQL | | | | | provides auditing functionality |
pgauditlogtofile | pgauditlogtofile | PGDG | 1.6 | SEC | PostgreSQL | | | | | pgAudit addon to redirect audit log to an independent file |
pg_auth_mon | pg_auth_mon | PGDG | 1.1 | SEC | MIT | | | | | monitor connection attempts per user |
credcheck | credcheck | PGDG | 2.7.0 | SEC | MIT | | | | | credcheck - postgresql plain text credential checker |
pgcryptokey | pgcryptokey | PGDG | 1.0 | SEC | PostgreSQL | | | | | cryptographic key management |
pg_jobmon | pg_jobmon | PGDG | 1.4.1 | SEC | PostgreSQL | | | | | Extension for logging and monitoring functions in PostgreSQL |
logerrors | logerrors | PGDG | 2.1 | SEC | BSD 3 | | | | | Function for collecting statistics about messages in logfile |
login_hook | login_hook | PGDG | 1.5 | SEC | GPLv3 | | | | | login_hook - hook to execute login_hook.login() at login time |
set_user | set_user | PGDG | 4.0.1 | SEC | PostgreSQL | | | | | similar to SET ROLE but with added logging |
pg_snakeoil | pg_snakeoil | PIGSTY | 1 | SEC | PostgreSQL | | | | | The PostgreSQL Antivirus |
pgextwlist | pgextwlist | PIGSTY | 1.17 | SEC | PostgreSQL | | | | | PostgreSQL Extension Whitelisting |
pg_auditor | pg_auditor | PIGSTY | 0.2 | SEC | BSD-3 | | | | | Audit data changes and provide flashback ability |
sslutils | sslutils | PIGSTY | 1.3 | SEC | PostgreSQL | | | | | A Postgres extension for managing SSL certificates through SQL |
noset | noset | PIGSTY | 0.3.0 | SEC | AGPLv3 | | | | | Module for blocking SET variables for non-super users. |
sepgsql | sepgsql | CONTRIB | 16.3 | SEC | PostgreSQL | | | | | label-based mandatory access control (MAC) based on SELinux security policy. |
auth_delay | auth_delay | CONTRIB | 16.3 | SEC | PostgreSQL | | | | | pause briefly before reporting authentication failure |
pgcrypto | pgcrypto | CONTRIB | 1.3 | SEC | PostgreSQL | | | | | cryptographic functions |
passwordcheck | passwordcheck | CONTRIB | 16.3 | SEC | PostgreSQL | | | | | checks user passwords and rejects weak passwords |
wrappers | wrappers | PIGSTY | 0.4.1 | FDW | Apache-2.0 | | | | | Foreign data wrappers developed by Supabase |
multicorn | multicorn | PGDG | 3.0 | FDW | PostgreSQL | | | | | Fetch foreign data in Python in your PostgreSQL server. |
mysql_fdw | mysql_fdw | PGDG | 1.2 | FDW | BSD 3 | | | | | Foreign data wrapper for querying a MySQL server |
oracle_fdw | oracle_fdw | PGDG | 1.2 | FDW | PostgreSQL | | | | | foreign data wrapper for Oracle access |
tds_fdw | tds_fdw | PGDG | 2.0.3 | FDW | PostgreSQL | | | | | Foreign data wrapper for querying a TDS database (Sybase or Microsoft SQL Server) |
db2_fdw | db2_fdw | PGDG | 6.0.1 | FDW | PostgreSQL | | | | | foreign data wrapper for DB2 access |
sqlite_fdw | sqlite_fdw | PGDG | 1.1 | FDW | PostgreSQL | | | | | SQLite Foreign Data Wrapper |
pgbouncer_fdw | pgbouncer_fdw | PGDG | 1.1.0 | FDW | PostgreSQL | | | | | Extension for querying PgBouncer stats from normal SQL views & running pgbouncer commands from normal SQL functions |
mongo_fdw | mongo_fdw | PGDG | 1.1 | FDW | LGPLv3 | | | | | foreign data wrapper for MongoDB access |
redis_fdw | redis_fdw | PIGSTY | 1.0 | FDW | PostgreSQL | | | | | Foreign data wrapper for querying a Redis server |
redis | pg_redis_pubsub | PIGSTY | 0.0.1 | FDW | MIT | | | | | Send redis pub/sub messages to Redis from PostgreSQL Directly |
kafka_fdw | kafka_fdw | PIGSTY | 0.0.3 | FDW | PostgreSQL | | | | | kafka Foreign Data Wrapper for CSV formatted messages |
hdfs_fdw | hdfs_fdw | PGDG | 2.0.5 | FDW | BSD 3 | | | | | foreign-data wrapper for remote hdfs servers |
firebird_fdw | firebird_fdw | PIGSTY | 1.4.0 | FDW | PostgreSQL | | | | | Foreign data wrapper for Firebird |
aws_s3 | aws_s3 | PIGSTY | 0.0.1 | FDW | Apache-2.0 | | | | | aws_s3 postgres extension to import/export data from/to s3 |
log_fdw | log_fdw | PIGSTY | 1.4 | FDW | Apache-2.0 | | | | | foreign-data wrapper for Postgres log file access |
dblink | dblink | CONTRIB | 1.2 | FDW | PostgreSQL | | | | | connect to other PostgreSQL databases from within a database |
file_fdw | file_fdw | CONTRIB | 1.0 | FDW | PostgreSQL | | | | | foreign-data wrapper for flat file access |
postgres_fdw | postgres_fdw | CONTRIB | 1.1 | FDW | PostgreSQL | | | | | foreign-data wrapper for remote PostgreSQL servers |
orafce | orafce | PGDG | 4.10 | SIM | BSD 0 | | | | | Functions and operators that emulate a subset of functions and packages from the Oracle RDBMS |
pgtt | pgtt | PGDG | 4.0.0 | SIM | ISC | | | | | Extension to add Global Temporary Tables feature to PostgreSQL |
session_variable | session_variable | PIGSTY | 3.3 | SIM | GPLv3 | | | | | Registration and manipulation of session variables and constants |
pg_statement_rollback | pg_statement_rollback | PGDG | 1.4 | SIM | ISC | | | | | Server side rollback at statement level for PostgreSQL like Oracle or DB2 |
pg_dbms_metadata | pg_dbms_metadata | PGDG | 1.0.0 | SIM | PostgreSQL | | | | | Extension to add Oracle DBMS_METADATA compatibility to PostgreSQL |
pg_dbms_lock | pg_dbms_lock | PGDG | 1.0.0 | SIM | PostgreSQL | | | | | Extension to add Oracle DBMS_LOCK full compatibility to PostgreSQL |
pg_dbms_job | pg_dbms_job | PGDG | 1.5.0 | SIM | PostgreSQL | | | | | Extension to add Oracle DBMS_JOB full compatibility to PostgreSQL |
babelfishpg_common | babelfishpg_common | WILTON | 3.3.3 | SIM | Apache-2.0 | | | | | SQL Server Transact SQL Datatype Support |
babelfishpg_tsql | babelfishpg_tsql | WILTON | 3.3.1 | SIM | Apache-2.0 | | | | | SQL Server Transact SQL compatibility |
babelfishpg_tds | babelfishpg_tds | WILTON | 1.0.0 | SIM | Apache-2.0 | | | | | SQL Server TDS protocol extension |
babelfishpg_money | babelfishpg_money | WILTON | 1.1.0 | SIM | Apache-2.0 | | | | | SQL Server Money Data Type |
pgmemcache | pgmemcache | PGDG | 2.3.0 | SIM | MIT | | | | | memcached interface |
pglogical | pglogical |
PGDG | 2.4.4 | ETL |
PostgreSQL | PostgreSQL Logical Replication | ||||
pglogical_origin | pglogical |
PGDG | 1.0.0 | ETL |
PostgreSQL | Dummy extension for compatibility when upgrading from Postgres 9.4 | ||||
pglogical_ticker | pglogical |
PGDG | 1.4 | ETL |
PostgreSQL | Have an accurate view on pglogical replication delay | ||||
pgl_ddl_deploy | pgl_ddl_deploy |
PGDG | 2.2 | ETL |
MIT | automated ddl deployment using pglogical | ||||
pg_failover_slots | pg_failover_slots |
PIGSTY | 1.0.1 | ETL |
PostgreSQL | PG Failover Slots extension | ||||
wal2json | wal2json |
PGDG | 2.5.3 | ETL |
BSD 3 | Change data capture in JSON format | ||||
wal2mongo | wal2mongo |
PIGSTY | 1.0.7 | ETL |
Apache-2.0 | PostgreSQL logical decoding output plugin for MongoDB | ||||
decoderbufs | decoderbufs |
PGDG | 0.1.0 | ETL |
MIT | Logical decoding plugin that delivers WAL stream changes using a Protocol Buffer format | ||||
decoder_raw | decoder_raw |
PIGSTY | 1.0 | ETL |
PostgreSQL | Output plugin for logical replication in Raw SQL format | ||||
test_decoding | test_decoding |
CONTRIB | 16.3 | ETL |
PostgreSQL | SQL-based test/example module for WAL logical decoding | ||||
mimeo | mimeo |
PIGSTY | 1.5.1 | ETL |
PostgreSQL | Extension for specialized, per-table replication between PostgreSQL instances | ||||
repmgr | repmgr |
PGDG | 5.4 | ETL |
GPLv3 | Replication manager for PostgreSQL | ||||
pgcopydb | pgcopydb |
PGDG | 0.15 | ETL |
PostgreSQL | Copy a Postgres database to a target Postgres server | ||||
pgloader | pgloader |
PGDG | 3.6.10 | ETL |
PostgreSQL | Migrate to PostgreSQL in a single command! | ||||
pg_fact_loader | pg_fact_loader |
PGDG | 2.0 | ETL |
MIT | build fact tables with Postgres | ||||
pg_bulkload | pg_bulkload |
PGDG | 3.1.21 | ETL |
BSD 3 | pg_bulkload is a high speed data loading utility for PostgreSQL | ||||
pg_comparator | pg_comparator |
PGDG | 2.2.5 | ETL |
BSD 3 | Comparison of PostgreSQL databases between testing and production services. | ||||
pgimportdoc | pgimportdoc |
PGDG | 0.1.4 | ETL |
BSD 2 | command line utility for importing XML, JSON, BYTEA document to PostgreSQL | ||||
pgexportdoc | pgexportdoc |
PGDG | 0.1.4 | ETL |
BSD 2 | export XML, JSON and BYTEA documents from PostgreSQL |
Docker Application
Pigsty now offers out-of-the-box Dify and Odoo Docker Compose templates.
There are two new beta modules available in the Pigsty Pro version:
- KAFKA: Deploy a high-availability Kafka cluster supported by the KRaft protocol.
- KUBE: Deploy a Kubernetes cluster managed by Pigsty using cri-dockerd or containerd.
Bug Fix
- Fixed CVE-2024-6387 by automatically repairing during the Pigsty installation process using the default value [openssh-server] in node_packages.
- Fixed memory consumption issues caused by Loki parsing Nginx log tag cardinality being too large.
- Fixed bootstrap failure caused by upstream Ansible dependency changes in EL8 systems (python3.11-jmespath upgraded to python3.12-jmespath).
v2.7: Extension Overwhelming
The Pigsty community is thrilled to announce Pigsty v2.7.0, which brings 255 unique extensions to the free PostgreSQL distribution and RDS alternative. We have also introduced new docker-compose templates for Odoo, Jupyter, PolarDB, and the GA version of Supabase.
About Pigsty
Pigsty is a Battery-included, local-first PostgreSQL Distribution as a Free RDS alternative.
Links: Website | GitHub | Demo | Blog | Install | Feature
Images: Introduction | Extensions | Architecture | Dashboards
Get started with the latest v2.7.0 release: curl -L https://get.pigsty.cc/install | bash
Pigsty v2.7: Extension Overwhelming
I wrote a popular article last month - Postgres is eating the database world, explaining why extensions matter to the PostgreSQL ecosystem.
Based on this idea, we’ve packaged 20 brand-new extensions in v2.7. With these extensions added, Pigsty offers 157 non-contrib extensions for EL Distros and 116 for Debian/Ubuntu Distros. Combined with 73 built-in Contrib extensions, Pigsty now has a total of 255 unique extensions available, which takes PostgreSQL’s versatility to a whole new level!
Complete list of available extensions: https://pigsty.io/docs/reference/extension/
v2.7.0 Release Note
Highlight
Adding numerous new extensions written in rust & pgrx:
- pg_search v0.7.0 : Full text search over SQL tables using the BM25 algorithm
- pg_lakehouse v0.7.0 : Query engine over object stores like S3 and table formats like Delta Lake
- pg_analytics v0.6.1 : Accelerates analytical query processing inside Postgres
- pg_graphql v1.5.4 : GraphQL support to your PostgreSQL database.
- pg_jsonschema v0.3.1 : PostgreSQL extension providing JSON Schema validation
- wrappers v0.3.1 : Postgres Foreign Data Wrappers Collections by Supabase
- pgmq v1.5.2 : A lightweight message queue. Like AWS SQS and RSMQ but on Postgres.
- pg_tier v0.0.3 : Postgres Extension written in Rust, to enable data tiering to AWS S3
- pg_vectorize v0.15.0 : The simplest way to orchestrate vector search on Postgres
- pg_later v0.1.0 : Execute SQL now and get the results later.
- pg_idkit v0.2.3 : Generating many popular types of identifiers
- plprql v0.1.0 : Use PRQL in PostgreSQL
- pgsmcrypto v0.1.0 : PostgreSQL SM Algorithm Extension
- pg_tiktoken v0.0.1 : OpenAI tiktoken tokenizer for PostgreSQL
- pgdd v0.5.2 : Access Data Dictionary metadata with pure SQL
And some new extensions in plain C & C++:
- parquet_s3_fdw 1.1.0 : ParquetS3 Foreign Data Wrapper for PostgreSQL
- plv8 3.2.2 : V8 Engine Javascript Procedural Language add-on for PostgreSQL
- md5hash 1.0.1 : Custom data type for storing MD5 hashes rather than text
- pg_tde 1.0 alpha: Experimental encrypted access method for PostgreSQL
- pg_dirtyread 2.6 : Read dead but unvacuumed tuples from a PostgreSQL relation
- New deb PGDG extensions: pg_roaringbitmap, pgfaceting, mobilitydb, pgsql-http, pg_hint_plan, pg_statviz, pg_rrule
- New rpm PGDG extensions: pg_profile, pg_show_plans; use PGDG’s pgsql_http, pgsql_gzip, pg_net, pg_bigm instead of Pigsty RPM.
New Features
- Prepare arm64 packages for infra & pgsql packages for el & deb distros.
- New installation script to download from Cloudflare, and more hints.
- New monitoring dashboard PGSQL PITR to assist the PITR procedure.
- Make preparations for running pigsty inside docker VM containers
- Add a fool-proof design for running pgsql.yml on a node that is not managed by Pigsty
- Add separated template for each OS distro: el7, el8, el9, debian11, debian12, ubuntu20, ubuntu22
New Docker Application
- Odoo: launch open-source ERP over PostgreSQL.
- Jupyter: Run Jupyter notebook containers and expose the HTTP service.
- PolarDB: run the demo playground for the shared-storage version of OSS PG.
- supabase: bump to the latest GA version.
- bytebase: use the latest tag instead of the ad hoc version.
- pg_exporter: update docker image example
Software Upgrade
- PostgreSQL 16.3, 15.7, 14.12, 13.15, 12.19
- Patroni 3.3.0
- pgBackRest 2.51
- vip-manager v2.5.0
- Haproxy 2.9.7
- Grafana 10.4.2
- Prometheus 2.51
- Loki & Promtail: 3.0.0 (breaking changes!)
- Alertmanager 0.27.0
- BlackBox Exporter 0.25.0
- Node Exporter 1.8.0
- pgBackrest Exporter 0.17.0
- duckdb 0.10.2
- etcd 3.5.13
- minio-20240510014138 / mcli-20240509170424
- pev2 v1.8.0 → v1.11.0
- pgvector 0.6.1 → 0.7.0
- pg_tle: v1.3.4 → v1.4.0
- hydra: v1.1.1 → v1.1.2
- duckdb_fdw: v1.1.0 recompile with libduckdb 0.10.2
- pg_bm25 0.5.6 → pg_search 0.7.0
- pg_analytics: 0.5.6 → 0.6.1
- pg_graphql: 1.5.0 → 1.5.4
- pg_net 0.8.0 → 0.9.1
- pg_sparse (deprecated due to pgvector 0.7)
Fixed Issues
- Fix role pg_exporters white space in variable templates
- Fix minio_cluster not commented in global variables
- Fix the non-existent postgis34 package name in the el7 config template
- Fix EL8 python3.11-cryptography deps to python3-cryptography according to upstream
- Fix /pg/bin/pg-role can not get OS user name from environ in non-interact mode
- Fix /pg/bin/pg-pitr can not hint -X -P flag properly
API Change
- New parameter node_write_etc_hosts to control whether to write /etc/hosts file on target nodes.
- Relocatable prometheus target directory with new parameter prometheus_sd_dir.
- Add -x|--proxy flag to enable and use the value of global proxy env by @waitingsong in https://github.com/Vonng/pigsty/pull/405
- No longer parse infra nginx log details since it brings too many labels to the log.
- Use alertmanager API Version v2 instead of v1 in prometheus config.
- Use /pg/cert/ca.crt instead of /etc/pki/ca.crt in role pgsql.
Acknowledgment
A huge thank you to all our users who contributed patches, reported bugs, and proposed new features.
Pigsty thrives on community contributions. We warmly welcome your ideas, feature requests, or patches. Please share your contributions on our GitHub page. We look forward to your feedback on Pigsty 2.7 and your continued support in making Pigsty even better.
Best regards,
Ruohang Feng (@vonng), The Pigsty Community
Battery-Included PostgreSQL Distro as a Free RDS Alternative, with:
- Extensible Postgres with 255 extensions available: PostGIS, Timescale, Citus, PGVector, AGE, PGML, ParadeDB, Hydra, DuckFDW, GraphQL, ……
- Reliable Infras: Create self-healing HA PostgreSQL clusters with pre-configured PITR, built-in ACL, & SSL, and secure your infra with local CA & best practices.
- Observable Graphics: Unparalleled monitoring best practices built upon the modern Prometheus & Grafana stack. Reuse them to monitor existing DBs & cloud RDS. Check our Gallery & Demo.
- High-Available Service: Deliver auto-routed, high-performance, pooled, reliable, flexible database service access via Pgbouncer, DNSMasq, Keepalived, vip-manager, and HAProxy.
- Maintainable Toolbox: Infra as Code, Declarative API & Idempotent Playbooks, Vagrant sandbox & Terraform IaaS provisioning specs. Local repo, offline package, delivered without Internet access.
- Composable Modules: Modular design, flexible arch with many bonus features. Redis, MinIO, ETCD, FerretDB, DuckDB, Supabase, and Docker compose templates for software that uses Postgres.
- Painless Experience: Easy to use: Download, Install, and Configure in one command. Built-in configuration templates for different scenarios, auto-tuned params, admin SOP, and zero-downtime blue-green migration plans.
- Compatible Distros: Run on base OS without containerization support: EL 7, 8, 9 and Rocky, Alma, CentOS, OracleLinux,… and Ubuntu 20.04 / 22.04 and Debian 11 / 12 Support.
- Open-Source RDS: Free software open-sourced under the AGPLv3 license, a free RDS for PostgreSQL alternative.
v2.7.0
Highlight
Extension Overwhelming, adding numerous new extensions written in rust & pgrx:
- pg_search v0.7.0 : Full text search over SQL tables using the BM25 algorithm
- pg_lakehouse v0.7.0 : Query engine over object stores like S3 and table formats like Delta Lake
- pg_analytics v0.6.1 : Accelerates analytical query processing inside Postgres
- pg_graphql v1.5.4 : GraphQL support to your PostgreSQL database.
- pg_jsonschema v0.3.1 : PostgreSQL extension providing JSON Schema validation
- wrappers v0.3.1 : Postgres Foreign Data Wrappers Collections by Supabase
- pgmq v1.5.2 : A lightweight message queue. Like AWS SQS and RSMQ but on Postgres.
- pg_tier v0.0.3 : Postgres Extension written in Rust, to enable data tiering to AWS S3
- pg_vectorize v0.15.0 : The simplest way to orchestrate vector search on Postgres
- pg_later v0.1.0 : Execute SQL now and get the results later.
- pg_idkit v0.2.3 : Generating many popular types of identifiers
- plprql v0.1.0 : Use PRQL in PostgreSQL
- pgsmcrypto v0.1.0 : PostgreSQL SM Algorithm Extension
- pg_tiktoken v0.0.1 : OpenAI tiktoken tokenizer for postgres
- pgdd v0.5.2 : Access Data Dictionary metadata with pure SQL
And some new extensions in plain C & C++
- parquet_s3_fdw 1.1.0 : ParquetS3 Foreign Data Wrapper for PostgreSQL
- plv8 3.2.2 : V8 Engine Javascript Procedural Language add-on for PostgreSQL
- md5hash 1.0.1 : Custom data type for storing MD5 hashes rather than text
- pg_tde 1.0 alpha: Experimental encrypted access method for PostgreSQL
- pg_dirtyread 2.6 : Read dead but unvacuumed tuples from a PostgreSQL relation
- New deb PGDG extensions: pg_roaringbitmap, pgfaceting, mobilitydb, pgsql-http, pg_hint_plan, pg_statviz, pg_rrule
- New rpm PGDG extensions: pg_profile, pg_show_plans; use PGDG’s pgsql_http, pgsql_gzip, pg_net, pg_bigm instead of Pigsty RPM.
New Features
- running on certain docker containers.
- prepare arm64 packages for infra & pgsql packages for el & deb distros.
- new installation script to download from cloudflare, and more hints.
- new monitoring dashboard for PGSQL PITR to assist the PITR procedure.
- make preparations for running pigsty inside docker VM containers
- add a fool-proof design for running pgsql.yml on a node that is not managed by pigsty
- add config template for each major OS version: el7, el8, el9, debian11, debian12, ubuntu20, ubuntu22
Software Upgrade
- PostgreSQL 16.3
- Patroni 3.3.0
- pgBackRest 2.51
- vip-manager v2.5.0
- Haproxy 2.9.7
- Grafana 10.4.2
- Prometheus 2.51
- Loki & Promtail: 3.0.0 (breaking changes!)
- Alertmanager 0.27.0
- BlackBox Exporter 0.25.0
- Node Exporter 1.8.0
- pgBackrest Exporter 0.17.0
- duckdb 0.10.2
- etcd 3.5.13
- minio-20240510014138 / mcli-20240509170424
- pev2 v1.8.0 -> v1.11.0
- pgvector 0.6.1 -> 0.7.0
- pg_tle: v1.3.4 -> v1.4.0
- hydra: v1.1.1 -> v1.1.2
- duckdb_fdw: v1.1.0 recompile with libduckdb 0.10.2
- pg_bm25 0.5.6 -> pg_search 0.7.0
- pg_analytics: 0.5.6 -> 0.6.1
- pg_graphql: 1.5.0 -> 1.5.4
- pg_net 0.8.0 -> 0.9.1
- pg_sparse (deprecated)
Docker Application
- Odoo: launch open source ERP and plugins
- Jupyter: run jupyter notebook container
- PolarDB: run the demo PG RAC playground.
- supabase: bump to the latest GA version.
- bytebase: use the latest tag instead of ad hoc version.
- pg_exporter: update docker image example
Bug Fix
- Fix role pg_exporters white space in variable templates
- Fix minio_cluster not commented in global variables
- Fix the non-existent postgis34 in el7 config template
- Fix EL8 python3.11-cryptography deps to python3-cryptography according to upstream
- Fix /pg/bin/pg-role can not get OS user name from environ in non-interact mode
- Fix /pg/bin/pg-pitr can not hint -X -P flag properly
API Change
- New parameter node_write_etc_hosts to control whether to write /etc/hosts file on target nodes.
- Relocatable prometheus target directory with new parameter prometheus_sd_dir.
- Add -x|--proxy flag to enable and use value of global proxy env by @waitingsong in https://github.com/Vonng/pigsty/pull/405
- No longer parse infra nginx log details since it brings too many labels to the log.
- Use alertmanager API Version v2 instead of v1 in prometheus config.
- Use /pg/cert/ca.crt instead of /etc/pki/ca.crt in pgsql roles.
New Contributors
- @NeroSong made their first contribution in https://github.com/Vonng/pigsty/pull/373
- @waitingsong made their first contribution in https://github.com/Vonng/pigsty/pull/405
Package Checksums
ec271a1d34b2b1360f78bfa635986c3a pigsty-pkg-v2.7.0.el8.x86_64.tgz
f3304bfd896b7e3234d81d8ff4b83577 pigsty-pkg-v2.7.0.debian12.x86_64.tgz
5b071c2a651e8d1e68fc02e7e922f2b3 pigsty-pkg-v2.7.0.ubuntu22.x86_64.tgz
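The checksums above can be used to confirm that an offline package was downloaded intact. The following is a minimal sketch, not an official Pigsty tool: the helper names (`md5_of`, `parse_checksums`, `verify`) and the `/tmp` download directory are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text: str) -> dict[str, str]:
    """Parse lines of the form '<md5> <filename>' into {filename: md5}."""
    expected = {}
    for line in text.strip().splitlines():
        checksum, filename = line.split()
        expected[filename] = checksum.lower()
    return expected


def verify(directory: Path, expected: dict[str, str]) -> dict[str, bool]:
    """Check each listed file found in `directory`; missing files are skipped."""
    results = {}
    for filename, checksum in expected.items():
        target = directory / filename
        if target.is_file():
            results[filename] = md5_of(target) == checksum
    return results


# Checksum list copied from the release note above.
CHECKSUMS = """
ec271a1d34b2b1360f78bfa635986c3a pigsty-pkg-v2.7.0.el8.x86_64.tgz
f3304bfd896b7e3234d81d8ff4b83577 pigsty-pkg-v2.7.0.debian12.x86_64.tgz
5b071c2a651e8d1e68fc02e7e922f2b3 pigsty-pkg-v2.7.0.ubuntu22.x86_64.tgz
"""

if __name__ == "__main__":
    # Assumption: the tarballs were downloaded to /tmp; adjust as needed.
    for name, ok in verify(Path("/tmp"), parse_checksums(CHECKSUMS)).items():
        print(f"{name}: {'OK' if ok else 'MISMATCH'}")
```

MD5 is adequate for spotting a truncated or corrupted download, but it is not a defense against deliberate tampering.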
v2.6: the OLAP New Challenger
v2.6.0
Highlight
- Use PostgreSQL 16 as the default major version (16.2)
- Introduce ParadeDB extensions: pg_analytics, pg_bm25, and pg_sparse
- Introduce DuckDB and corresponding foreign data wrapper: duckdb_fdw
- Cloudflare CDN https://repo.pigsty.io and QCloud CDN https://repo.pigsty.cc
Configuration
- Disable Grafana Unified Alert to work around the “Database Locked” error.
- add node_repo_modules to add upstream repos (including local one) to node
- remove node_local_repo_urls, replaced by node_repo_modules & repo_upstream.
- remove node_repo_method, replaced by node_repo_modules.
- add the new local repo into repo_upstream instead of node_local_repo_urls
- add chrony into node_default_packages
- remove redis, minio, postgresql client from infra packages
- replace repo_upstream.baseurl $releasever for pgdg el8/el9 with major.minor instead of major version
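When upgrading an older configuration, it can help to scan for the two parameters this release removes. The sketch below is only an illustration: it works on an already-parsed config (a plain Python dict), and the `find_removed_params` helper, the sample inventory layout, and the sample URL are all hypothetical.

```python
# Parameters removed in v2.6 and their replacements, as listed above.
REMOVED_IN_V26 = {
    "node_local_repo_urls": ["node_repo_modules", "repo_upstream"],
    "node_repo_method": ["node_repo_modules"],
}


def find_removed_params(config: dict) -> list[str]:
    """Return one warning per removed parameter found at any nesting level."""
    warnings = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                here = f"{path}.{key}" if path else str(key)
                if key in REMOVED_IN_V26:
                    replacements = " & ".join(REMOVED_IN_V26[key])
                    warnings.append(f"{here}: removed, use {replacements}")
                walk(value, here)
        elif isinstance(node, list):
            for index, item in enumerate(node):
                walk(item, f"{path}[{index}]")

    walk(config, "")
    return warnings


# Hypothetical, already-parsed inventory fragment (structure is illustrative).
example = {
    "all": {
        "vars": {
            "node_repo_method": "local",
            "node_local_repo_urls": ["http://10.10.10.10/pigsty.repo"],
            "node_default_packages": ["chrony"],
        }
    }
}

for warning in find_removed_params(example):
    print(warning)
```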
Software Upgrade
- Grafana 10.3.3
- Prometheus 2.47
- node_exporter 1.7.0
- HAProxy 2.9.5
- Loki / Promtail 2.9.4
- minio-20240216110548 / mcli-20240217011557
- etcd 3.5.11
- Redis 7.2.4
- Bytebase 2.13.2
- DuckDB 0.10.0
- FerretDB 1.19
- Metabase: new docker compose app template added
PostgreSQL x Pigsty Extensions
- PostgreSQL Minor Version Upgrade 16.2, 15.6, 14.11, 13.14, 12.18
- PostgreSQL 16 is now used as the default major version
- pg_exporter 0.6.1, security fix
- Patroni 3.2.2
- pgBadger 12.4
- pgBouncer 1.22
- pgBackRest 2.50
- vip-manager 2.3.0
- PostGIS 3.4.2
- PGVector 0.6.0
- TimescaleDB 2.14.1
- New Extension duckdb_fdw v1.1
- New Extension pgsql-gzip v1.0.0
- New Extension pg_sparse from ParadeDB: v0.5.6
- New Extension pg_bm25 from ParadeDB: v0.5.6
- New Extension pg_analytics from ParadeDB: v0.5.6
- Bump AI/ML Extension pgml to v2.8.1 with pg16 support
- Bump Columnar Extension hydra to v1.1.1 with pg16 support
- Bump Graph Extension age to v1.5.0 with pg16 support
- Bump Packaging Extension pg_tle to v1.3.4 with pg16 support
- Bump GraphQL Extension pg_graphql to v1.5.0 to support supabase
330e9bc16a2f65d57264965bf98174ff pigsty-v2.6.0.tgz
81abcd0ced798e1198740ab13317c29a pigsty-pkg-v2.6.0.debian11.x86_64.tgz
7304f4458c9abd3a14245eaf72f4eeb4 pigsty-pkg-v2.6.0.debian12.x86_64.tgz
f914fbb12f90dffc4e29f183753736bb pigsty-pkg-v2.6.0.el7.x86_64.tgz
fc23d122d0743d1c1cb871ca686449c0 pigsty-pkg-v2.6.0.el8.x86_64.tgz
9d258dbcecefd232f3a18bcce512b75e pigsty-pkg-v2.6.0.el9.x86_64.tgz
901ee668621682f99799de8932fb716c pigsty-pkg-v2.6.0.ubuntu20.x86_64.tgz
39872cf774c1fe22697c428be2fc2c22 pigsty-pkg-v2.6.0.ubuntu22.x86_64.tgz
v2.5: Debian / Ubuntu / PG16
v2.5.0
curl https://get.pigsty.cc/latest | bash
Highlights
- Dedicate yum/apt repo on repo.pigsty.cc and mirror on packagecloud.io
- Anolis OS Support (EL 8.8 Compatible)
- PG Major Candidate: Use PostgreSQL 16 instead of PostgreSQL 14.
- New Dashboard PGSQL Exporter, PGSQL Patroni, rework on PGSQL Query
- Extensions Update:
  - Bump PostGIS version to v3.4 on el8, el9, ubuntu22, keep postgis 33 on EL7
  - Remove extension pg_embedding because it is no longer maintained, use pgvector instead.
  - New extension on EL: pointcloud with LIDAR data type support.
  - New extensions on EL: imgsmlr, pg_similarity, pg_bigm.
  - Include columnar extension hydra and remove citus from default installed extension list.
  - Recompile pg_filedump as PG major version independent package.
- Software Version Upgrade:
  - Grafana to v10.1.5
  - Prometheus to v2.47
  - Promtail/Loki to v2.9.1
  - Node Exporter to v1.6.1
  - Bytebase to v2.10.0
  - patroni to v3.1.2
  - pgbouncer to v1.21.0
  - pg_exporter to v0.6.0
  - pgbackrest to v2.48.0
  - pgbadger to v12.2
  - pg_graphql to v1.4.0
  - pg_net to v0.7.3
  - ferretdb to v0.12.1
  - sealos to 4.3.5
  - Supabase support to 20231013070755
Ubuntu Support
Pigsty supports two Ubuntu LTS releases, 22.04 (jammy) and 20.04 (focal), and ships corresponding offline packages for them.
Some parameters need to be specified explicitly when deploying on Ubuntu. Please refer to ubuntu.yml:
- repo_upstream: Adjust according to ubuntu / debian repo.
- repo_packages: Adjust according to ubuntu / debian naming convention
- node_repo_local_urls: use the default value: ['deb [trusted=yes] http://${admin_ip}/pigsty ./']
- node_default_packages:
  - zlib -> zlib1g, readline -> libreadline-dev
  - vim-minimal -> vim-tiny, bind-utils -> dnsutils, perf -> linux-tools-generic
  - new packages acl to ensure ansible tmp file privileges are set correctly
- infra_packages: replace all _ with - in names, and replace postgresql16 with postgresql-client-16
- pg_packages: replace all _ with - in names, patroni-etcd not needed on ubuntu
- pg_extensions: different naming convention, no passwordcheck_cracklib on ubuntu.
- pg_dbsu_uid: You have to manually specify pg_dbsu_uid on ubuntu, because PGDG deb package does not specify pg dbsu uid.
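The renaming rules above are mechanical enough to express as code. Here is a rough sketch of how an EL package list might be translated for Ubuntu; the function names and sample lists are hypothetical, only the rename rules themselves come from the notes above, and the authoritative values remain those in ubuntu.yml.

```python
# Explicit renames listed above for node_default_packages.
NODE_RENAMES = {
    "zlib": "zlib1g",
    "readline": "libreadline-dev",
    "vim-minimal": "vim-tiny",
    "bind-utils": "dnsutils",
    "perf": "linux-tools-generic",
}


def ubuntu_node_packages(el_packages: list[str]) -> list[str]:
    """Apply the explicit renames and make sure the extra acl package is present."""
    packages = [NODE_RENAMES.get(name, name) for name in el_packages]
    if "acl" not in packages:
        packages.append("acl")
    return packages


def ubuntu_infra_packages(el_packages: list[str]) -> list[str]:
    """Replace '_' with '-' and swap postgresql16 for postgresql-client-16."""
    return [
        "postgresql-client-16" if name == "postgresql16" else name.replace("_", "-")
        for name in el_packages
    ]


def ubuntu_pg_packages(el_packages: list[str]) -> list[str]:
    """Replace '_' with '-' and drop patroni-etcd, which is not needed on Ubuntu."""
    return [
        name.replace("_", "-") for name in el_packages if name != "patroni-etcd"
    ]


# Illustrative inputs, not the full default lists.
print(ubuntu_node_packages(["zlib", "readline", "vim-minimal", "perf", "jq"]))
print(ubuntu_infra_packages(["node_exporter", "pg_exporter", "postgresql16", "nginx"]))
print(ubuntu_pg_packages(["patroni", "patroni-etcd", "pg_activity"]))
```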
API Changes
Default values of the following parameters have changed:
- repo_modules: infra,node,pgsql,redis,minio
- repo_upstream: Now add Pigsty Infra/MinIO/Redis/PGSQL modular upstream repo.
- repo_packages: remove unused karma,mtail,dellhw_exporter and pg 14 extra extensions, adding pg 16 extra extensions.
- node_default_packages: now add python3-pip as default packages.
- pg_libs: timescaledb is removed from shared_preload_libraries by default.
- pg_extensions: citus is no longer installed by default, and passwordcheck_cracklib is installed by default
  - pg_repack_${pg_version}* wal2json_${pg_version}* passwordcheck_cracklib_${pg_version}*
  - postgis34_${pg_version}* timescaledb-2-postgresql-${pg_version}* pgvector_${pg_version}*
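The `${pg_version}` placeholder in the pg_extensions entries stands for the PostgreSQL major version in use. As a small illustration of what those entries resolve to, the sketch below substitutes the placeholder using Python's standard `string.Template`; the `expand` helper is hypothetical and is not how Pigsty itself performs the substitution.

```python
from string import Template

# Default pg_extensions entries as listed above.
PG_EXTENSIONS = [
    "pg_repack_${pg_version}* wal2json_${pg_version}* passwordcheck_cracklib_${pg_version}*",
    "postgis34_${pg_version}* timescaledb-2-postgresql-${pg_version}* pgvector_${pg_version}*",
]


def expand(entries: list[str], pg_version: int) -> list[str]:
    """Substitute ${pg_version} and split each entry into package patterns."""
    packages = []
    for entry in entries:
        rendered = Template(entry).substitute(pg_version=pg_version)
        packages.extend(rendered.split())
    return packages


for pattern in expand(PG_EXTENSIONS, 16):
    print(pattern)
```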
87e0be2edc35b18709d7722976e305b0 pigsty-pkg-v2.5.0.el7.x86_64.tgz
e71304d6f53ea6c0f8e2231f238e8204 pigsty-pkg-v2.5.0.el8.x86_64.tgz
39728496c134e4352436d69b02226ee8 pigsty-pkg-v2.5.0.el9.x86_64.tgz
e3f548a6c7961af6107ffeee3eabc9a7 pigsty-pkg-v2.5.0.debian11.x86_64.tgz
1e469cc86a19702e48d7c1a37e2f14f9 pigsty-pkg-v2.5.0.debian12.x86_64.tgz
cc3af3b7c12f98969d3c6962f7c4bd8f pigsty-pkg-v2.5.0.ubuntu20.x86_64.tgz
c5b2b1a4867eee624e57aed58ac65a80 pigsty-pkg-v2.5.0.ubuntu22.x86_64.tgz
v2.5.1
Routine update with PostgreSQL v16.1, v15.5, 14.10, 13.13, 12.17, 11.22
Now PostgreSQL 16 has all the core extensions available (pg_repack & timescaledb added)
- Software Version Upgrade:
  - PostgreSQL to v16.1, v15.5, 14.10, 13.13, 12.17, 11.22
  - Patroni v3.2.0
  - PgBackrest v2.49
  - Citus 12.1
  - TimescaleDB 2.13.0 (with PG 16 support)
  - Grafana v10.2.2
  - FerretDB 1.15
  - SealOS 4.3.7
  - Bytebase 2.11.1
- Remove monitor schema prefix from PGCAT dashboard queries
- New template wool.yml for Aliyun free ECS singleton
- Add python3-jmespath in addition to python3.11-jmespath for el9
31ee48df1007151009c060e0edbd74de pigsty-pkg-v2.5.1.el7.x86_64.tgz
a40f1b864ae8a19d9431bcd8e74fa116 pigsty-pkg-v2.5.1.el8.x86_64.tgz
c976cd4431fc70367124fda4e2eac0a7 pigsty-pkg-v2.5.1.el9.x86_64.tgz
7fc1b5bdd3afa267a5fc1d7cb1f3c9a7 pigsty-pkg-v2.5.1.debian11.x86_64.tgz
add0731dc7ed37f134d3cb5b6646624e pigsty-pkg-v2.5.1.debian12.x86_64.tgz
99048d09fa75ccb8db8e22e2a3b41f28 pigsty-pkg-v2.5.1.ubuntu20.x86_64.tgz
431668425f8ce19388d38e5bfa3a948c pigsty-pkg-v2.5.1.ubuntu22.x86_64.tgz
v2.4: Monitoring Cloud RDS
v2.4.0
Get started with bash -c "$(curl -fsSL https://get.pigsty.cc/latest)".
Highlights
- PostgreSQL 16 support
- The first LTS version with business support and consulting service
- Monitoring existing PostgreSQL, RDS for PostgreSQL / PolarDB with PGRDS Dashboards
- New extension: Apache AGE, openCypher graph query engine on PostgreSQL
- New extension: zhparser, full text search for Chinese language
- New extension: pg_roaringbitmap, roaring bitmap for PostgreSQL
- New extension: pg_embedding, hnsw alternative to pgvector
- New extension: pg_tle, admin / manage stored procedure extensions
- New extension: pgsql-http, issue http request with SQL interface
- Add extensions: pg_auth_mon pg_checksums pg_failover_slots pg_readonly postgresql-unit pg_store_plans pg_uuidv7 set_user
- Redis enhancement: add monitoring panels for redis sentinel, and auto HA configuration for redis ms cluster.
API Change
- New Parameter: REDIS.redis_sentinel_monitor: specify the masters monitored by the redis sentinel cluster
Bug Fix
- Fix Grafana 10.1 registered datasource will use random uid rather than ins.datname
MD5 (pigsty-pkg-v2.4.0.el7.x86_64.tgz) = 257443e3c171439914cbfad8e9f72b17
MD5 (pigsty-pkg-v2.4.0.el8.x86_64.tgz) = 41ad8007ffbfe7d5e8ba5c4b51ff2adc
MD5 (pigsty-pkg-v2.4.0.el9.x86_64.tgz) = 9a950aed77a6df90b0265a6fa6029250
v2.3: Ecosystem Applications
v2.3.0
PGSQL/REDIS Update, NODE VIP, Mongo/FerretDB, MYSQL Stub
Get started with bash -c "$(curl -fsSL https://get.pigsty.cc/latest)"
Highlight
- INFRA: NODE/PGSQL VIP monitoring support
- NODE: Allow bind node_vip to node cluster with keepalived
- REPO: Dedicate yum repo, enable https for get.pigsty.cc and demo.pigsty.cc
- PGSQL: Fix CVE-2023-39417 with PostgreSQL 15.4, 14.9, 13.12, 12.16, bump patroni version to v3.1.0
- APP: Bump app/bytebase to v2.6.0, app/ferretdb version to v1.8, new application nocodb
- REDIS: bump to v7.2 and rework on dashboards
- MONGO: basic deploy & monitor support with FerretDB 1.8
- MYSQL: add prometheus/grafana/ca stub for future implementation.
API Change
Add 1 new section NODE.NODE_VIP with 8 new parameters:
- NODE.VIP.vip_enabled: enable vip on this node cluster?
- NODE.VIP.vip_address: node vip address in ipv4 format, required if vip is enabled
- NODE.VIP.vip_vrid: required, integer, 1-255 should be unique among same VLAN
- NODE.VIP.vip_role: master/backup, backup by default, use as init role
- NODE.VIP.vip_preempt: optional, true/false, false by default, enable vip preemption
- NODE.VIP.vip_interface: node vip network interface to listen, eth0 by default
- NODE.VIP.vip_dns_suffix: node vip dns name suffix, .vip by default
- NODE.VIP.vip_exporter_port: keepalived exporter listen port, 9650 by default
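The constraints stated for these parameters (an IPv4 address, a VRID in 1-255, a role of master or backup) can be checked before a deployment is attempted. The following is a hedged sketch of such a pre-flight check; `validate_vip` is a hypothetical helper, and treating a missing vip_enabled as false is an assumption rather than something the notes above state.

```python
import ipaddress


def validate_vip(params: dict) -> list[str]:
    """Return a list of problems found in a node VIP parameter set."""
    errors = []
    # Assumption: an absent vip_enabled means the VIP is disabled.
    if not params.get("vip_enabled", False):
        return errors

    address = params.get("vip_address")
    if not isinstance(address, str):
        errors.append("vip_address is required when vip_enabled is true")
    else:
        try:
            ipaddress.IPv4Address(address)
        except ValueError:
            errors.append(f"vip_address {address!r} is not a valid IPv4 address")

    vrid = params.get("vip_vrid")
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(vrid, bool) or not isinstance(vrid, int) or not 1 <= vrid <= 255:
        errors.append("vip_vrid must be an integer in 1-255")

    # backup is the documented default role.
    if params.get("vip_role", "backup") not in ("master", "backup"):
        errors.append("vip_role must be 'master' or 'backup'")
    return errors


# Illustrative values only.
print(validate_vip({"vip_enabled": True, "vip_address": "10.10.10.99", "vip_vrid": 128}))
print(validate_vip({"vip_enabled": True, "vip_address": "10.10.10.256", "vip_vrid": 0}))
```

The check does not cover the requirement that vip_vrid be unique within a VLAN, since that depends on every cluster sharing the network.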
MD5 (pigsty-pkg-v2.3.0.el7.x86_64.tgz) = 81db95f1c591008725175d280ad23615
MD5 (pigsty-pkg-v2.3.0.el8.x86_64.tgz) = 6f4d169b36f6ec4aa33bfd5901c9abbe
MD5 (pigsty-pkg-v2.3.0.el9.x86_64.tgz) = 4bc9ae920e7de6dd8988ca7ee681459d
v2.3.1
Get started with bash -c "$(curl -fsSL https://get.pigsty.cc/latest)".
Highlights
- PGVector 0.5 with HNSW index support
- PostgreSQL 16 RC1 for el8/el9
- Adding SealOS for kubernetes support
Bug Fix
- Fix infra.repo.repo_pkg task when downloading rpm with * in their names in repo_packages.
  - if /www/pigsty already have package name match that pattern, some rpm will be skipped.
- Change default value of vip_dns_suffix to '' empty string rather than .vip
- Grant sudo privilege for postgres dbsu when pg_dbsu_sudo = limit and patroni_watchdog_mode = required
  - /usr/bin/sudo /sbin/modprobe softdog: enable watchdog module before launching patroni
  - /usr/bin/sudo /bin/chown {{ pg_dbsu }} /dev/watchdog: chown watchdog before launching patroni
Documentation Update
- Add details to English documentation
- Add Chinese/zh-cn documentation
Software Upgrade
- PostgreSQL 16 RC1 on el8/el9
- PGVector 0.5.0 with hnsw index
- TimescaleDB 2.11.2
- grafana 10.1.0
- loki & promtail 2.8.4
- mcli-20230829225506 / minio-20230829230735
- ferretdb 1.9
- sealos 4.3.3
- pgbadger 1.12.2
ce69791eb622fa87c543096cdf11f970 pigsty-pkg-v2.3.1.el7.x86_64.tgz
495aba9d6d18ce1ebed6271e6c96b63a pigsty-pkg-v2.3.1.el8.x86_64.tgz
38b45582cbc337ff363144980d0d7b64 pigsty-pkg-v2.3.1.el9.x86_64.tgz
v2.2: Observability Overhaul
v2.2.0
https://github.com/Vonng/pigsty/releases/tag/v2.2.0
Get started with bash -c "$(curl -fsSL https://get.pigsty.cc/latest)"
Release Note: https://doc.pigsty.cc/#/RELEASENOTE?id=v220
Highlight
- Monitoring Dashboards Overhaul: https://demo.pigsty.cc
- Vagrant Sandbox Overhaul: libvirt support and new templates
- Pigsty EL Yum Repo: Building simplified
- OS Compatibility: UOS-v20-1050e support
- New config template: prod simulation with 42 nodes
- Use official pgdg citus distribution for el7
Software Upgrade
- PostgreSQL 16 beta2
- Citus 12 / PostGIS 3.3.3 / TimescaleDB 2.11.1 / PGVector 0.4.4
- patroni 3.0.4 / pgbackrest 2.47 / pgbouncer 1.20
- grafana 10.0.3 / loki/promtail/logcli 2.8.3
- etcd 3.5.9 / haproxy v2.8.1 / redis v7.0.12
- minio 20230711212934 / mcli 20230711233044
Bug Fix
- Fix docker group ownership issue 29434bd (https://github.com/Vonng/pigsty/commit/29434bdd39548d95d80a236de9099874ed564f9b)
- Append infra os group rather than set it as primary group
- Fix redis sentinel systemd enable status 5c96feb
- Loosen bootstrap & configure checks if /etc/redhat-release does not exist
- Fix grafana 9.x CVE-2023-1410 with 10.0.2
- Add PG 14 - 16 new command tags and error codes for pglog schema
API Change
Add 1 new parameter:
- INFRA.NGINX.nginx_exporter_enabled: now you can disable nginx_exporter with this parameter
Default value changes:
repo_modules: node,pgsql,infra (redis is removed from it)
repo_upstream:
- add pigsty-el: distribution independent rpms: such as grafana, minio, pg_exporter, etc…
- add pigsty-misc: distribution aware rpms: such as redis, prometheus stack binaries, etc…
- remove citus repo since pgdg now have full official citus support (on el7)
- remove remi, since redis is now included in pigsty-misc
- remove grafana in build config for acceleration
repo_packages:
- ansible python3 python3-pip python3-requests python3.11-jmespath dnf-utils modulemd-tools # el7: python36-requests python36-idna yum-utils
- grafana loki logcli promtail prometheus2 alertmanager karma pushgateway node_exporter blackbox_exporter nginx_exporter redis_exporter
- redis etcd minio mcli haproxy vip-manager pg_exporter nginx createrepo_c sshpass chrony dnsmasq docker-ce docker-compose-plugin flamegraph
- lz4 unzip bzip2 zlib yum pv jq git ncdu make patch bash lsof wget uuid tuned perf nvme-cli numactl grubby sysstat iotop htop rsync tcpdump
- netcat socat ftp lrzsz net-tools ipvsadm bind-utils telnet audit ca-certificates openssl openssh-clients readline vim-minimal
- postgresql13* wal2json_13* pg_repack_13* passwordcheck_cracklib_13* postgresql12* wal2json_12* pg_repack_12* passwordcheck_cracklib_12* postgresql16* timescaledb-tools
- postgresql15 postgresql15* citus_15* pglogical_15* wal2json_15* pg_repack_15* pgvector_15* timescaledb-2-postgresql-15* postgis33_15* passwordcheck_cracklib_15* pg_cron_15*
- postgresql14 postgresql14* citus_14* pglogical_14* wal2json_14* pg_repack_14* pgvector_14* timescaledb-2-postgresql-14* postgis33_14* passwordcheck_cracklib_14* pg_cron_14*
- patroni patroni-etcd pgbouncer pgbadger pgbackrest pgloader pg_activity pg_partman_15 pg_permissions_15 pgaudit17_15 pgexportdoc_15 pgimportdoc_15 pg_statement_rollback_15*
- orafce_15* mysqlcompat_15 mongo_fdw_15* tds_fdw_15* mysql_fdw_15 hdfs_fdw_15 sqlite_fdw_15 pgbouncer_fdw_15 multicorn2_15* powa_15* pg_stat_kcache_15* pg_stat_monitor_15* pg_qualstats_15 pg_track_settings_15 pg_wait_sampling_15 system_stats_15
- plprofiler_15* plproxy_15 plsh_15* pldebugger_15 plpgsql_check_15* pgtt_15 pgq_15* pgsql_tweaks_15 count_distinct_15 hypopg_15 timestamp9_15* semver_15* prefix_15* rum_15 geoip_15 periods_15 ip4r_15 tdigest_15 hll_15 pgmp_15 extra_window_functions_15 topn_15
- pg_background_15 e-maj_15 pg_catcheck_15 pg_prioritize_15 pgcopydb_15 pg_filedump_15 pgcryptokey_15 logerrors_15 pg_top_15 pg_comparator_15 pg_ivm_15* pgsodium_15* pgfincore_15* ddlx_15 credcheck_15 safeupdate_15 pg_squeeze_15* pg_fkpart_15 pg_jobmon_15
repo_url_packages:
node_default_packages:
- lz4,unzip,bzip2,zlib,yum,pv,jq,git,ncdu,make,patch,bash,lsof,wget,uuid,tuned,nvme-cli,numactl,grubby,sysstat,iotop,htop,rsync,tcpdump
- netcat,socat,ftp,lrzsz,net-tools,ipvsadm,bind-utils,telnet,audit,ca-certificates,openssl,readline,vim-minimal,node_exporter,etcd,haproxy,python3,python3-pip
infra_packages:
- grafana,loki,logcli,promtail,prometheus2,alertmanager,karma,pushgateway
- node_exporter,blackbox_exporter,nginx_exporter,redis_exporter,pg_exporter
- nginx,dnsmasq,ansible,postgresql15,redis,mcli,python3-requests
PGSERVICE in .pigsty is removed, replaced with PGDATABASE=postgres.
FHS Changes:
- bin/dns and bin/ssh now moved to vagrant/
MD5 (pigsty-pkg-v2.2.0.el7.x86_64.tgz) = 5fb6a449a234e36c0d895a35c76add3c
MD5 (pigsty-pkg-v2.2.0.el8.x86_64.tgz) = c7211730998d3b32671234e91f529fd0
MD5 (pigsty-pkg-v2.2.0.el9.x86_64.tgz) = 385432fe86ee0f8cbccbbc9454472fdd
v2.1: Vector Embedding & RAG
v2.1.0
PostgreSQL 12 ~ 16 support and pgvector for AI embedding.
https://github.com/Vonng/pigsty/releases/tag/v2.1.0
Highlight
- PostgreSQL 16 beta support, and 12 ~ 15 support.
- Add PGVector for AI Embedding for 12 - 15
- Add 6 extra panel & datasource plugins for grafana
- Add bin/profile to profile remote process and generate flamegraph
- Add bin/validate to validate pigsty.yml configuration file
- Add bin/repo-add to add upstream repo files to /etc/yum.repos.d
- PostgreSQL 16 observability: pg_stat_io and corresponding dashboards
Software Upgrade
- PostgreSQL 15.3 , 14.8, 13.11, 12.15, 11.20, and 16 beta1
- pgBackRest 2.46
- pgbouncer 1.19
- Redis 7.0.11
- Grafana v9.5.3
- Loki / Promtail / Logcli 2.8.2
- Prometheus 2.44
- TimescaleDB 2.11.0
- minio-20230518000536 / mcli-20230518165900
- Bytebase v2.2.0
Enhancement
- Now use all id*.pub when installing local user’s public key
v2.0: Free RDS PG Alternative
v2.0.0
“PIGSTY” is now the abbr of “PostgreSQL in Great STYle”
or “PostgreSQL & Infrastructure & Governance System allTogether for You”.
Get pigsty v2.0.0 release via the following command:
curl -fsSL http://download.pigsty.cc/get | bash
Download directly from GitHub Release
bash -c "$(curl -fsSL https://raw.githubusercontent.com/Vonng/pigsty/master/bin/get)"
# or download tarball directly with curl (EL9)
curl -L https://github.com/Vonng/pigsty/releases/download/v2.0.0/pigsty-v2.0.0.tgz -o ~/pigsty.tgz
curl -L https://github.com/Vonng/pigsty/releases/download/v2.0.0/pigsty-pkg-v2.0.0.el9.x86_64.tgz -o /tmp/pkg.tgz
# EL7: https://github.com/Vonng/pigsty/releases/download/v2.0.0/pigsty-pkg-v2.0.0.el7.x86_64.tgz
# EL8: https://github.com/Vonng/pigsty/releases/download/v2.0.0/pigsty-pkg-v2.0.0.el8.x86_64.tgz
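The asset URLs above share one naming pattern: release tag, EL major version, and CPU architecture. A minimal sketch of that pattern as a shell helper follows. The function name `pigsty_pkg_url` is hypothetical and used only for illustration, and the default architecture `x86_64` is an assumption based on the URLs listed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Pigsty): build the offline package URL
# from the naming pattern seen in the v2.0.0 release assets.
pigsty_pkg_url() {
    local version="$1"          # release tag, e.g. v2.0.0
    local el="$2"               # EL major version: 7, 8, or 9
    local arch="${3:-x86_64}"   # assumed default; the only arch listed above
    local base="https://github.com/Vonng/pigsty/releases/download"
    echo "${base}/${version}/pigsty-pkg-${version}.el${el}.${arch}.tgz"
}
```

For example, `curl -L "$(pigsty_pkg_url v2.0.0 8)" -o /tmp/pkg.tgz` would fetch the EL8 package instead of the EL9 one.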
Highlights
- PostgreSQL 15.2, PostGIS 3.3, Citus 11.2, and TimescaleDB 2.10 now work together as one.
- Now works on EL 7, 8, 9 for RHEL, CentOS, Rocky, AlmaLinux, and other EL-compatible distributions.
- Security enhancement with self-signed CA, full SSL support, scram-sha-256 password encryption, and more.
- Patroni 3.0 with native HA citus cluster support and DCS failsafe mode to prevent global DCS failures.
- Auto-configured, battery-included PITR for PostgreSQL powered by pgbackrest, local or S3/minio.
- Dedicated module ETCD, which can be easily deployed and scaled in/out. Used as DCS instead of Consul.
- Dedicated module MINIO, a local S3 alternative for the optional central backup repo for PGSQL PITR.
- Better config templates with adaptive tuning for Node & PG according to your hardware spec.
- Use AGPL v3.0 license instead of Apache 2.0 license due to Grafana & MinIO reference.
Compatibility
- Pigsty now works on EL7, EL8, EL9, and offers corresponding pre-packed offline packages.
- Pigsty now works on EL compatible distributions: RHEL, CentOS, Rocky, AlmaLinux, OracleLinux,…
- Pigsty now uses RockyLinux 9 as the default development & testing environment instead of CentOS 7
- EL version, CPU arch, and pigsty version string are part of source & offline package names.
- PGSQL: PostgreSQL 15.2 / PostGIS 3.3 / TimescaleDB 2.10 / Citus 11.2 now work together.
- PGSQL: Patroni 3.0 is used as default HA solution for PGSQL, and etcd is used as default DCS.
- Patroni 3.0 with DCS failsafe mode to prevent global DCS failures (demoting all primary)
- Patroni 3.0 with native HA citus cluster support, with entirely open sourced v11 citus.
- vip-manager 2.x uses the ETCDv3 API; the ETCDv2 API is deprecated, and the same applies to patroni.
- PGSQL: pgBackRest v2.44 is introduced to provide battery-included PITR for PGSQL.
- it will use local backup FS on primary by default for a two-day retention policy
- it will use S3/minio as an alternative central backup repo for a two-week retention policy
- ETCD is used as default DCS instead of Consul, And V3 API is used instead of V2 API.
- NODE module now consists of node itself, haproxy, docker, node_exporter, and promtail
- chronyd is used as default NTP client instead of ntpd
- HAPROXY now attaches to NODE instead of PGSQL, which can be used for exposing services
- You can register PG Service to dedicated haproxy clusters rather than local cluster nodes.
- You can expose ad hoc service in a NodePort manner with haproxy, not limited to pg services.
- INFRA now consists of dnsmasq, nginx, prometheus, grafana, loki
- DNSMASQ is enabled on all infra nodes, and added to all nodes as the default resolver.
- Add blackbox_exporter for ICMP probe, add pushgateway for batch job metrics.
- Switch to official loki & promtail rpm packages. Use official Grafana Echarts Panel.
- Add infra dashboards for self-monitoring, add patroni & pg15 metrics to monitoring system
- Software Upgrade
- PostgreSQL 15.2 / PostGIS 3.3 / TimescaleDB 2.10 / Citus 11.2
- Patroni 3.0 / Pgbouncer 1.18 / pgBackRest 2.44 / vip-manager 2.1
- HAProxy 2.7 / Etcd 3.5 / MinIO 20230222182345 / mcli 20230216192011
- Prometheus 2.42 / Grafana 9.3 / Loki & Promtail 2.7 / Node Exporter 1.5
Security
- A full-featured self-signed CA enabled by default
- Redact password in postgres logs.
- SSL for Nginx (you have to trust the self-signed CA or use thisisunsafe to dismiss warning)
- SSL for etcd peer/client traffic by @alemacci
- SSL for postgres/pgbouncer/patroni by @alemacci
- scram-sha-256 auth for postgres password encryption by @alemacci
- Pgbouncer Auth Query by @alemacci
- Use AES-256-CBC for pgbackrest encryption by @alemacci
- Add a security enhancement config template which enforces global SSL
- Now all hba rules are defined in config inventory, no default rules.
Maintainability
- Adaptive tuning template for PostgreSQL & Patroni by @Vonng, @alemacci
- configurable log dir for Patroni & Postgres & Pgbouncer & Pgbackrest by @alemacci
- Replace fixed ip placeholder 10.10.10.10 with ${admin_ip} that can be referenced
- Adaptive upstream repo definition that can be switched according to EL version, region & arch.
- Terraform Templates for AWS CN & Aliyun, which can be used for sandbox IaaS provisioning
- Vagrant Templates: meta, full, el7, el8, el9, build, minio, citus, etc…
- New playbook pgsql-monitor.yml for monitoring existing pg instance or RDS PG.
- New playbook pgsql-migration.yml for migrating existing pg instance to pigsty managed pg.
- New shell utils under bin/ to simplify the daily administration tasks.
- Optimize ansible role implementation, which can be used without default parameter values.
- Now you can define pgbouncer parameters on database & user level
API Changes
69 parameters added, 16 parameters removed, 14 parameters renamed.
- INFRA.META.admin_ip: primary meta node ip address
- INFRA.META.region: upstream mirror region: default|china|europe
- INFRA.META.os_version: enterprise linux release version: 7,8,9
- INFRA.CA.ca_cn: ca common name, pigsty-ca by default
- INFRA.CA.cert_validity: cert validity, 20 years by default
- INFRA.REPO.repo_enabled: build a local yum repo on infra node?
- INFRA.REPO.repo_upstream: list of upstream yum repo definition
- INFRA.REPO.repo_home: home dir of local yum repo, usually same as nginx_home ‘/www’
- INFRA.NGINX.nginx_ssl_port: https listen port
- INFRA.NGINX.nginx_ssl_enabled: nginx https enabled?
- INFRA.PROMETHEUS.alertmanager_endpoint: alertmanager endpoint in (ip|domain):port format
- NODE.NODE_TUNE.node_hugepage_count: number of 2MB hugepage, take precedence over node_hugepage_ratio
- NODE.NODE_TUNE.node_hugepage_ratio: mem hugepage ratio, 0 disable it by default
- NODE.NODE_TUNE.node_overcommit_ratio: node mem overcommit ratio, 0 disable it by default
- NODE.HAPROXY.haproxy_service: list of haproxy service to be exposed
- PGSQL.PG_ID.pg_mode: pgsql cluster mode: pgsql,citus,gpsql
- PGSQL.PG_BUSINESS.pg_dbsu_password: dbsu password, empty string means no dbsu password by default
- PGSQL.PG_INSTALL.pg_log_dir: postgres log dir, /pg/data/log by default
- PGSQL.PG_BOOTSTRAP.pg_storage_type: SSD|HDD, SSD by default
- PGSQL.PG_BOOTSTRAP.patroni_log_dir: patroni log dir, /pg/log by default
- PGSQL.PG_BOOTSTRAP.patroni_ssl_enabled: secure patroni RestAPI communications with SSL?
- PGSQL.PG_BOOTSTRAP.patroni_username: patroni rest api username
- PGSQL.PG_BOOTSTRAP.patroni_password: patroni rest api password (IMPORTANT: CHANGE THIS)
- PGSQL.PG_BOOTSTRAP.patroni_citus_db: citus database managed by patroni, postgres by default
- PGSQL.PG_BOOTSTRAP.pg_max_conn: postgres max connections, auto will use recommended value
- PGSQL.PG_BOOTSTRAP.pg_shared_buffer_ratio: postgres shared buffer memory ratio, 0.25 by default, 0.1~0.4
- PGSQL.PG_BOOTSTRAP.pg_rto: recovery time objective, ttl to failover, 30s by default
- PGSQL.PG_BOOTSTRAP.pg_rpo: recovery point objective, 1MB data loss at most by default
- PGSQL.PG_BOOTSTRAP.pg_pwd_enc: algorithm for encrypting passwords: md5|scram-sha-256
- PGSQL.PG_BOOTSTRAP.pgbouncer_log_dir: pgbouncer log dir, /var/log/pgbouncer by default
- PGSQL.PG_BOOTSTRAP.pgbouncer_auth_query: if enabled, query pg_authid table to retrieve biz users instead of populating userlist
- PGSQL.PG_BOOTSTRAP.pgbouncer_sslmode: SSL for pgbouncer client: disable|allow|prefer|require|verify-ca|verify-full
- PGSQL.PG_BACKUP.pgbackrest_enabled: pgbackrest enabled?
- PGSQL.PG_BACKUP.pgbackrest_clean: remove pgbackrest data during init?
- PGSQL.PG_BACKUP.pgbackrest_log_dir: pgbackrest log dir, /pg/log by default
- PGSQL.PG_BACKUP.pgbackrest_method: pgbackrest backup repo method, local or minio
- PGSQL.PG_BACKUP.pgbackrest_repo: pgbackrest backup repo config
- PGSQL.PG_SERVICE.pg_service_provider: dedicate haproxy node group name, or empty string for local nodes by default
- PGSQL.PG_SERVICE.pg_default_service_dest: default service destination if svc.dest=‘default’
- PGSQL.PG_SERVICE.pg_vip_enabled: enable a l2 vip for pgsql primary? false by default
- PGSQL.PG_SERVICE.pg_vip_address: vip address in <ipv4>/<mask> format, require if vip is enabled
- PGSQL.PG_SERVICE.pg_vip_interface: vip network interface to listen, eth0 by default
- PGSQL.PG_SERVICE.pg_dns_suffix: pgsql cluster dns name suffix, ’’ by default
- PGSQL.PG_SERVICE.pg_dns_target: auto, primary, vip, none, or ad hoc ip
- ETCD.etcd_seq: etcd instance identifier, REQUIRED
- ETCD.etcd_cluster: etcd cluster & group name, etcd by default
- ETCD.etcd_safeguard: prevent purging running etcd instance?
- ETCD.etcd_clean: purging existing etcd during initialization?
- ETCD.etcd_data: etcd data directory, /data/etcd by default
- ETCD.etcd_port: etcd client port, 2379 by default
- ETCD.etcd_peer_port: etcd peer port, 2380 by default
- ETCD.etcd_init: etcd initial cluster state, new or existing
- ETCD.etcd_election_timeout: etcd election timeout, 1000ms by default
- ETCD.etcd_heartbeat_interval: etcd heartbeat interval, 100ms by default
- MINIO.minio_seq: minio instance identifier, REQUIRED
- MINIO.minio_cluster: minio cluster name, minio by default
- MINIO.minio_clean: cleanup minio during init?, false by default
- MINIO.minio_user: minio os user, minio by default
- MINIO.minio_node: minio node name pattern
- MINIO.minio_data: minio data dir(s), use {x…y} to specify multi drivers
- MINIO.minio_domain: minio external domain name, sss.pigsty by default
- MINIO.minio_port: minio service port, 9000 by default
- MINIO.minio_admin_port: minio console port, 9001 by default
- MINIO.minio_access_key: root access key, minioadmin by default
- MINIO.minio_secret_key: root secret key, minioadmin by default
- MINIO.minio_extra_vars: extra environment variables for minio server
- MINIO.minio_alias: alias name for local minio deployment
- MINIO.minio_buckets: list of minio bucket to be created
- MINIO.minio_users: list of minio user to be created
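Among the new parameters, node_hugepage_count is documented as taking precedence over node_hugepage_ratio. The sketch below illustrates that precedence only. The helper name `hugepage_pages` is hypothetical, and the ratio formula (node memory times ratio, divided by the 2MB page size) is an assumption made for illustration, not a statement of how the playbook computes it.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the documented precedence:
# an explicit page count wins; otherwise pages are derived from the ratio.
# The ratio formula below is an assumption for illustration.
hugepage_pages() {
    local mem_mb="$1"           # total node memory in MB
    local count="${2:-0}"       # node_hugepage_count, 0 = not set
    local ratio="${3:-0}"       # node_hugepage_ratio, 0 = disabled
    if [ "$count" -gt 0 ]; then
        echo "$count"
    else
        # each hugepage is 2 MB, so pages = floor(mem_mb * ratio / 2)
        awk -v m="$mem_mb" -v r="$ratio" 'BEGIN { printf "%d\n", m * r / 2 }'
    fi
}
```

Under these assumptions, a 64 GB node with a ratio of 0.25 and no explicit count would reserve 8192 pages, while any positive count overrides the ratio entirely.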
Removed Parameters
- INFRA.CA.ca_homedir: ca home dir, now fixed as /etc/pki/
- INFRA.CA.ca_cert: ca cert filename, now fixed as ca.crt
- INFRA.CA.ca_key: ca key filename, now fixed as ca.key
- INFRA.REPO.repo_upstreams: replaced by repo_upstream
- PGSQL.PG_INSTALL.pgdg_repo: now taken care by node playbooks
- PGSQL.PG_INSTALL.pg_add_repo: now taken care by node playbooks
- PGSQL.PG_IDENTITY.pg_backup: not used and conflict with section name
- PGSQL.PG_IDENTITY.pg_preflight_skip: not used anymore, replace by pg_id
- DCS.dcs_name: removed due to using etcd
- DCS.dcs_servers: replaced by using ad hoc group etcd
- DCS.dcs_registry: removed due to using etcd
- DCS.dcs_safeguard: replaced by etcd_safeguard
- DCS.dcs_clean: replaced by etcd_clean
- PGSQL.PG_VIP.vip_mode: replaced by pg_vip_enabled
- PGSQL.PG_VIP.vip_address: replaced by pg_vip_address
- PGSQL.PG_VIP.vip_interface: replaced by pg_vip_interface
Renamed Parameters
- nginx_upstream -> infra_portal
- repo_address -> repo_endpoint
- pg_hostname -> node_id_from_pg
- pg_sindex -> pg_group
- pg_services -> pg_default_services
- pg_services_extra -> pg_services
- pg_hba_rules_extra -> pg_hba_rules
- pg_hba_rules -> pg_default_hba_rules
- pgbouncer_hba_rules_extra -> pgb_hba_rules
- pgbouncer_hba_rules -> pgb_default_hba_rules
- node_packages_default -> node_default_packages
- node_packages_meta -> infra_packages
- node_packages_meta_pip -> infra_packages_pip
- node_data_dir -> node_data
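Some of these renames chain: pg_services becomes pg_default_services while pg_services_extra takes over the name pg_services, and the hba rules follow the same pattern. A naive search-and-replace can therefore rename a key twice. The following is a minimal sketch of an order-safe rename, assuming the parameters appear as YAML keys at the start of a line (after indentation). The function name `rename_v2_params` is hypothetical. The sketch covers only the renamed parameters, not the added or removed ones.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename v1.x parameter keys to their v2.0 names.
# Reads a config on stdin and writes the result to stdout. Only keys at the
# start of a line (after indentation) and followed by ':' are touched.
# Order matters: base names are renamed before their *_extra variants,
# so a freshly renamed pg_services is not renamed a second time.
rename_v2_params() {
    sed -E \
        -e 's/^([[:space:]]*)nginx_upstream:/\1infra_portal:/' \
        -e 's/^([[:space:]]*)repo_address:/\1repo_endpoint:/' \
        -e 's/^([[:space:]]*)pg_hostname:/\1node_id_from_pg:/' \
        -e 's/^([[:space:]]*)pg_sindex:/\1pg_group:/' \
        -e 's/^([[:space:]]*)pg_services:/\1pg_default_services:/' \
        -e 's/^([[:space:]]*)pg_services_extra:/\1pg_services:/' \
        -e 's/^([[:space:]]*)pg_hba_rules:/\1pg_default_hba_rules:/' \
        -e 's/^([[:space:]]*)pg_hba_rules_extra:/\1pg_hba_rules:/' \
        -e 's/^([[:space:]]*)pgbouncer_hba_rules:/\1pgb_default_hba_rules:/' \
        -e 's/^([[:space:]]*)pgbouncer_hba_rules_extra:/\1pgb_hba_rules:/' \
        -e 's/^([[:space:]]*)node_packages_default:/\1node_default_packages:/' \
        -e 's/^([[:space:]]*)node_packages_meta:/\1infra_packages:/' \
        -e 's/^([[:space:]]*)node_packages_meta_pip:/\1infra_packages_pip:/' \
        -e 's/^([[:space:]]*)node_data_dir:/\1node_data:/'
}
```

A possible invocation is `rename_v2_params < pigsty.yml > pigsty.v2.yml`, followed by a manual diff, since parameter values and structures may also have changed between versions.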
Checksums
MD5 (pigsty-pkg-v2.0.0.el7.x86_64.tgz) = 9ff3c973fa5915f65622b91419817c9b
MD5 (pigsty-pkg-v2.0.0.el8.x86_64.tgz) = bd108a6c8f026cb79ee62c3b68b72176
MD5 (pigsty-pkg-v2.0.0.el9.x86_64.tgz) = e24288770f240af0511b0c38fa2f4774
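The checksums above can be compared against a downloaded package before it is extracted. A minimal sketch, assuming GNU coreutils `md5sum` is available. The helper name `verify_md5` is hypothetical and not part of Pigsty.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's MD5 digest with a published value.
# Assumes GNU coreutils `md5sum` is installed.
verify_md5() {
    local file="$1" expected="$2" actual
    # md5sum prints "<digest>  <filename>"; fail early if the file is unreadable
    actual="$(md5sum -- "$file")" || return 2
    actual="${actual%% *}"      # keep only the digest field
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (expected $expected, got $actual)" >&2
        return 1
    fi
}
```

For the EL9 package downloaded to /tmp/pkg.tgz, the call would be `verify_md5 /tmp/pkg.tgz e24288770f240af0511b0c38fa2f4774`.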
Special thanks to @alemacci for his great contribution!
v2.0.1
Bug fix for v2.0.0 and security improvement.
Enhancement
- Replace the pig shape logo for compliance with the PostgreSQL trademark policy.
- Bump grafana version to v9.4 with better UI and bugfix.
- Bump patroni version to v3.0.1 with some bugfix.
- Change: rollback grafana systemd service file to rpm default.
- Use slow copy instead of rsync to copy grafana dashboards.
- Enhancement: add back default repo files after bootstrap
- Add asciinema video for various administration tasks.
- Security Enhance Mode: restrict monitor user privilege.
- New config template: dual.yml for two-node deployment.
- Enable log_connections and log_disconnections in crit.yml template.
- Enable $lib/passwordcheck in pg_libs in crit.yml template.
- Explicitly grant monitor view permission to pg_monitor role.
- Remove default dbrole_readonly from dbuser_monitor to limit monitor user privilege
- Now patroni listen on {{ inventory_hostname }} instead of 0.0.0.0
- Now you can control postgres/pgbouncer listen to address with pg_listen
- Now you can use placeholder ${ip}, ${lo}, ${vip} in pg_listen
- Bump Aliyun terraform image to rocky Linux 9 instead of centos 7.9
- Bump bytebase to v1.14.0
Bug Fixes
- Add missing advertise address for alertmanager
- Fix missing pg_mode error when adding postgres user with bin/pgsql-user
- Add -a password to redis-join task @redis.yml
- Fix missing default value in infra-rm.yml.remove infra data
- Fix prometheus targets file ownership to prometheus
- Use admin user rather than root to delete metadata in DCS
- Fix Meta datasource missing database name due to grafana 9.4 bug.
Caveats
Official EL8 pgdg upstream is broken now, DO use it with caution!
Affected packages: postgis33_15, pgloader, postgresql_anonymizer_15*, postgresql_faker_15
How to Upgrade
cd ~/pigsty; tar -zcf /tmp/files.tgz files; rm -rf ~/pigsty # backup files dir and remove
cd ~; bash -c "$(curl -fsSL https://get.pigsty.cc/latest)" # get latest pigsty source
cd ~/pigsty; rm -rf files; tar -xf /tmp/files.tgz -C ~/pigsty # restore files dir
Checksums
MD5 (pigsty-pkg-v2.0.1.el7.x86_64.tgz) = 5cfbe98fd9706b9e0f15c1065971b3f6
MD5 (pigsty-pkg-v2.0.1.el8.x86_64.tgz) = c34aa460925ae7548866bf51b8b8759c
MD5 (pigsty-pkg-v2.0.1.el9.x86_64.tgz) = 055057cebd93c473a67fb63bcde22d33
Special thanks to @cocoonkid for his feedback.
v2.0.2
Highlight
Store OpenAI embedding and search similar vectors with pgvector
- New extension pgvector
- MinIO CVE-2023-28432 fix, and upgrade to 20230324 with new policy API.
Changes
- New extension pgvector for storing OpenAI embedding and searching similar vectors.
- MinIO CVE-2023-28432 fix, and upgrade to 20230324 with new policy API.
- Add reload functionality to DNSMASQ systemd services
- Bump pev to v1.8
- Bump grafana to v9.4.7
- Bump MinIO and MCLI version to 20230324
- Bump bytebase version to v1.15.0
- Upgrade monitoring dashboards and fix dead links
- Upgrade aliyun terraform template image to rockylinux 9
- Adopt grafana provisioning API change since v9.4
- Add asciinema videos for various administration tasks
- Fix broken EL8 pgsql deps: remove anonymizer_15 faker_15 and pgloader
MD5 (pigsty-pkg-v2.0.2.el7.x86_64.tgz) = d46440a115d741386d29d6de646acfe2
MD5 (pigsty-pkg-v2.0.2.el8.x86_64.tgz) = 5fa268b5545ac96b40c444210157e1e1
MD5 (pigsty-pkg-v2.0.2.el9.x86_64.tgz) = c8b113d57c769ee86a22579fc98e8345
v1.5.0 Release Note
v1.5.0
Highlights
- Complete Docker support, enabled on meta nodes by default with lots of software templates.
- bytebase pgadmin4 pgweb postgrest kong minio,…
- Infra Self Monitoring: Nginx, ETCD, Consul, Grafana, Prometheus, Loki, etc…
- New CMDB design compatible with redis & greenplum, visualize with CMDB Overview
- Service Discovery : Consul SD now works again for prometheus targets management
- Redis playbook now works on single instance with redis_port option.
- Better cold backup support: crontab for backup, delayed standby with pg_delay
- Use ETCD as DCS, alternative to Consul
Monitoring
Dashboards
- CMDB Overview: Visualize CMDB Inventory
- DCS Overview: Show consul & etcd metrics
- Nginx Overview: Visualize nginx metrics & access/error logs
- Grafana Overview: Grafana self Monitoring
- Prometheus Overview:Prometheus self Monitoring
- INFRA Dashboard & Home Dashboard Reforge
Architecture
- Infra monitoring targets now have a separated target dir targets/infra
- Consul SD is available for prometheus
- etcd, consul, patroni, docker metrics
- Now infra targets are managed by role infra_register
- Upgrade pg_exporter to v0.5.0 with scale and default support
- pg_bgwriter, pg_wal, pg_query, pg_db, pgbouncer_stat now use seconds instead of ms and µs
- pg_table counters now have default value 0 instead of NaN
- pg_class is replaced by pg_table and pg_index
- pg_table_size is now enabled with 300s ttl
Provisioning
- New optional package docker.tgz contains: Pgadmin, Pgweb, Postgrest, ByteBase, Kong, Minio, etc.
- New Role etcd to deploy & monitor etcd dcs service
- Specify which type of DCS to use with pg_dcs_type (etcd now available)
- Add pg_checksum option to enable data checksum
- Add pg_delay option to setup delayed standby leaders
- Add node_crontab and node_crontab_overwrite to create routine jobs such as cold backup
- Add a series of *_enable options to control components
- Loki and Promtail are now installed using the RPM package made by frpm.
Software Updates
- Upgrade PostgreSQL to 14.3
- Upgrade Redis to 6.2.7
- Upgrade PG Exporter to 0.5.0
- Upgrade Consul to 1.12.0
- Upgrade vip-manager to v1.0.2
- Upgrade Grafana to v8.5.2
- Upgrade HAproxy to 2.5.7 without rsyslog dependency
- Upgrade Loki & Promtail to v2.5.0 with RPM packages
- New packages: pg_probackup
New software / application based on docker:
- bytebase : DDL Schema Migrator
- pgadmin4 : Web Admin UI for PostgreSQL
- pgweb : Web Console for PostgreSQL
- postgrest : Auto generated REST API for PostgreSQL
- kong : API Gateway which use PostgreSQL as backend storage
- swagger openapi : API Specification Generator
- Minio : S3-compatible object storage
Bug Fix
- Fix loki & promtail /etc/default config file name issue
- Now node_data_dir (/data) is created before consul init if not exists
- Fix haproxy silence /var/log/messages with inappropriate rsyslog dependency
API Change
New Variable
- node_data_dir: major data mount path, will be created if not exist.
- node_crontab_overwrite: overwrite /etc/crontab instead of append
- node_crontab: node crontab to be appended or overwritten
- nameserver_enabled: enable nameserver on this meta node?
- prometheus_enabled: enable prometheus on this meta node?
- grafana_enabled: enable grafana on this meta node?
- loki_enabled: enable loki on this meta node?
- docker_enable: enable docker on this node?
- consul_enable: enable consul server/agent?
- etcd_enable: enable etcd server/clients?
- pg_checksum: enable pg cluster data-checksum?
- pg_delay: recovery min apply delay for standby leader
Reforge
Now *_clean are boolean flags to clean up existing instance during init.
And *_safeguard are boolean flags to avoid purging running instance when executing any playbook.
- pg_exists_action -> pg_clean
- pg_disable_purge -> pg_safeguard
- dcs_exists_action -> dcs_clean
- dcs_disable_purge -> dcs_safeguard
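The interplay of the two flags can be sketched as a small decision function. This is an illustration of the semantics described above, not the playbook's actual logic. The name `init_action` is hypothetical, and the behavior when an instance exists and neither flag is set (abort and leave it untouched) is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical decision sketch for the *_clean / *_safeguard flags.
# Arguments are the strings "true" or "false".
init_action() {
    local exists="$1" clean="$2" safeguard="$3"
    if [ "$exists" != "true" ]; then
        echo "init"                 # nothing to protect or clean
    elif [ "$safeguard" = "true" ]; then
        echo "abort"                # safeguard: never purge a running instance
    elif [ "$clean" = "true" ]; then
        echo "clean-then-init"      # clean: wipe the existing instance first
    else
        echo "abort"                # assumed: leave the existing instance untouched
    fi
}
```

In this reading, the safeguard always wins over the clean flag, which is why it is checked first.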
Rename
- node_ntp_config -> node_ntp_enabled
- node_admin_setup -> node_admin_enabled
- node_admin_pks -> node_admin_pk_list
- node_dns_hosts -> node_etc_hosts_default
- node_dns_hosts_extra -> node_etc_hosts
- node_dns_server -> node_dns_method
- node_local_repo_url -> node_repo_local_urls
- node_packages -> node_packages_default
- node_extra_packages -> node_packages
- node_packages_meta -> node_packages_meta
- node_meta_pip_install -> node_packages_meta_pip
- node_sysctl_params -> node_tune_params
- app_list -> nginx_indexes
- grafana_plugin -> grafana_plugin_method
- grafana_cache -> grafana_plugin_cache
- grafana_plugins -> grafana_plugin_list
- grafana_git_plugin_git -> grafana_plugin_git
- haproxy_admin_auth_enabled -> haproxy_auth_enabled
- pg_shared_libraries -> pg_libs
- dcs_type -> pg_dcs_type
v1.5.1
Highlights
WARNING: CREATE INDEX | REINDEX CONCURRENTLY on PostgreSQL 14.0 - 14.3 may lead to index data corruption!
Please upgrade postgres to 14.4 ASAP.
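To find out whether a given instance falls into the affected range, the reported server version can be checked against 14.0 through 14.3. A minimal sketch follows. The helper name `is_affected_pg14` is hypothetical, and it assumes the version string looks like "14.3", optionally followed by a distribution suffix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the version is in the 14.0 - 14.3 range.
is_affected_pg14() {
    local ver="${1%% *}"    # drop a trailing suffix such as " (Debian 14.3-1)"
    case "$ver" in
        14.0|14.1|14.2|14.3) return 0 ;;
        *) return 1 ;;
    esac
}
```

One way to use it: `is_affected_pg14 "$(psql -Atc 'SHOW server_version')" && echo "upgrade to 14.4"`.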
Software Upgrade
- upgrade postgres to 14.4
- Upgrade haproxy to 2.6.0
- Upgrade grafana to 9.0.0
- Upgrade prometheus 2.36.0
- Upgrade patroni to 2.1.4
Bug fix:
- Fix typo in pgsql-migration.yml
- remove pid file in haproxy config
- remove i686 packages when using repotrack under el7
- Fix redis service systemctl enabled issue
- Fix patroni systemctl service enabled=no by default issue
API Changes
- Mark grafana_database and grafana_pgurl as obsolete
New Apps
- wiki.js : Local wiki with Postgres
v1.4.0 Release Note
v1.4.0
Architecture
- Decouple system into 4 major categories: INFRA, NODES, PGSQL, REDIS, which makes pigsty clearer and more extensible.
- Single Node Deployment = INFRA + NODES + PGSQL
- Deploy pgsql clusters = NODES + PGSQL
- Deploy redis clusters = NODES + REDIS
- Deploy other databases = NODES + xxx (e.g. MONGO, KAFKA, … TBD)
Accessibility
- CDN for mainland China.
- Get the latest source with bash -c "$(curl -fsSL http://download.pigsty.cc/get)"
- Download & Extract packages with new download script.
Monitor Enhancement
- Split monitoring system into 5 major categories: INFRA, NODES, REDIS, PGSQL, APP
- Logging enabled by default
- now loki and promtail are enabled by default, with prebuilt loki-rpm
- Models & Labels
- A hidden ds prometheus datasource variable is added for all dashboards, so you can switch datasources by selecting a new one rather than modifying Grafana Datasources & Dashboards
- An ip label is added for all metrics, and will be used as join key between database metrics & nodes metrics
- INFRA Monitoring
- Home dashboard for infra: INFRA Overview
- Add logging Dashboards : Logs Instance
- PGLOG Analysis & PGLOG Session now treated as an example Pigsty APP.
- NODES Monitoring Application
- If you don’t care about databases at all, Pigsty can now be used as host monitoring software alone!
- Consists of 4 core dashboards: Nodes Overview & Nodes Cluster & Nodes Instance & Nodes Alert
- Introduce new identity variables for nodes: node_cluster and nodename
- Variable pg_hostname now means set hostname same as postgres instance name to keep backward-compatible
- Variable nodename_overwrite controls whether to overwrite node’s hostname with nodename
- Variable nodename_exchange will write nodename to each other’s /etc/hosts
- All nodes metrics references are overhauled, joined by ip
- Nodes monitoring targets are managed alone under /etc/prometheus/targets/nodes
- PGSQL Monitoring Enhancement
- Complete new PGSQL Cluster which simplify and focus on important stuff among cluster.
- New Dashboard PGSQL Databases which is cluster level object monitoring. Such as tables & queries among the entire cluster rather than single instance.
- PGSQL Alert dashboard now only focus on pgsql alerts.
- PGSQL Shard are added to PGSQL
- Redis Monitoring Enhancement
- Add nodes monitoring for all redis dashboards.
MatrixDB Support
- MatrixDB (Greenplum 7) can be deployed via pigsty-matrix.yml playbook
- MatrixDB Monitor Dashboards : PGSQL MatrixDB
- Example configuration added: pigsty-mxdb.yml
Provisioning Enhancement
The pigsty workflow now looks like this:
infra.yml ---> install pigsty on single meta node
| then add more nodes under pigsty's management
|
nodes.yml ---> prepare nodes for pigsty (node setup, dcs, node_exporter, promtail)
| then choose one playbook to deploy database clusters on those nodes
|
^--> pgsql.yml install postgres on prepared nodes
^--> redis.yml install redis on prepared nodes
infra-demo.yml =
infra.yml -l meta +
nodes.yml -l pg-test +
pgsql.yml -l pg-test +
infra-loki.yml + infra-jupyter.yml + infra-pgweb.yml
- nodes.yml to setup & prepare nodes for pigsty
- setup node, node_exporter, consul agent on nodes
- node-remove.yml is used for node de-registration
- pgsql.yml now only works on prepared nodes
- pgsql-remove is now only responsible for postgres itself. (dcs and node monitor are taken by node.yml)
- Add a series of new options to reuse postgres role in greenplum/matrixdb
- redis.yml now works on prepared nodes
- and redis-remove.yml now removes redis from nodes.
- pgsql-matrix.yml now installs matrixdb (Greenplum 7) on prepared nodes.
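The staged workflow described above (infra first, then node preparation, then the database playbook) can be sketched as a small wrapper. This is an illustration only: the names `run_step` and `deploy_sandbox` are hypothetical, the group names `meta` and `pg-test` are taken from the infra-demo.yml breakdown above, and the sketch assumes the playbooks are executable from the pigsty directory. It defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the three playbooks in order, stopping on failure.
# DRY_RUN=1 (the default here) prints each command instead of executing it.
run_step() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

deploy_sandbox() {
    run_step ./infra.yml -l meta &&        # install pigsty on the meta node
    run_step ./nodes.yml -l pg-test &&     # prepare nodes for pigsty
    run_step ./pgsql.yml -l pg-test        # install postgres on prepared nodes
}
```

Running `DRY_RUN=0 deploy_sandbox` from the pigsty directory would execute the steps for real. The `&&` chaining keeps a later playbook from running on nodes that were not prepared successfully.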
Software Upgrade
- PostgreSQL 14.2
- PostGIS 3.2
- TimescaleDB 2.6
- Patroni 2.1.3 (Prometheus Metrics + Failover Slots)
- HAProxy 2.5.5 (Fix stats error, more metrics)
- PG Exporter 0.4.1 (Timeout Parameters, and)
- Grafana 8.4.4
- Prometheus 2.33.4
- Greenplum 6.19.4 / MatrixDB 4.4.0
- Loki are now shipped as rpm packages instead of zip archives
Bug Fix
- Remove consul dependency for patroni, which makes it much easier to migrate to a new consul cluster
- Fix prometheus bin/new scripts default data dir path: /export/prometheus to /data/prometheus
- Fix typos and tasks
- Add restart seconds to vip-manager systemd service
API Changes
New Variable
- node_cluster: Identity variable for node cluster
- nodename_overwrite: If set, nodename will be set to node’s hostname
- nodename_exchange: exchange node hostname (in /etc/hosts) among play hosts
- node_dns_hosts_extra: extra static dns records which can be easily overwritten by single instance/cluster
- patroni_enabled: if disabled, postgres & patroni bootstrap will not be performed during role postgres
- pgbouncer_enabled: if disabled, pgbouncer will not be launched during role postgres
- pg_exporter_params: extra url parameters for pg_exporter when generating monitor target url.
- pg_provision: bool var to indicate whether perform provision part of role postgres (template, db, user)
- no_cmdb: cli args for infra.yml and infra-demo.yml playbook which will not create cmdb on meta node.
MD5 (app.tgz) = f887313767982b31a2b094e5589a75ea
MD5 (matrix.tgz) = 3d063437c482d94bd7e35df1a08bbc84
MD5 (pigsty.tgz) = e143b88ebea1474f9ebaffddc6072c49
MD5 (pkg.tgz) = 73e8f5ce995b1f1760cb63c1904fb91b
v1.4.1
Routine bug fix / Docker Support / English Docs
Now docker is enabled on meta node by default. You can launch tons of SaaS with it.
English documentation is available now.
- add docker to default packages
- add docker-compose to default package list
- disable nameserver by default & enable docker role by default
Bug Fix
- fix promtail & loki config var issue
- Fix grafana legacy alerts.
- Disable nameserver by default
- Rename pg-alias.sh for patroni shortcuts
- disable exemplars queries for all dashboards
- fix loki data dir issue https://github.com/Vonng/pigsty/issues/100
- change autovacuum_freeze_max_age from 100000000 to 1000000000
v1.3.0 Release Note
1.3.0
- [ENHANCEMENT] Redis Deployment (cluster,sentinel,standalone)
- [ENHANCEMENT] Redis Monitor
- Redis Overview Dashboard
- Redis Cluster Dashboard
- Redis Instance Dashboard
- [ENHANCEMENT] monitor: PGCAT Overhaul
- New Dashboard: PGCAT Instance
- New Dashboard: PGCAT Database Dashboard
- Remake Dashboard: PGCAT Table
- [ENHANCEMENT] monitor: PGSQL Enhancement
- New Panels: PGSQL Cluster, add 10 key metrics panel (toggled by default)
- New Panels: PGSQL Instance, add 10 key metrics panel (toggled by default)
- Simplify & Redesign: PGSQL Service
- Add cross-references between PGCAT & PGSQL dashboards
- [ENHANCEMENT] monitor deploy
- Now grafana datasource is automatically registered during monly deployment
- [ENHANCEMENT] software upgrade
- add PostgreSQL 13 to default package list
- upgrade to PostgreSQL 14.1 by default
- add greenplum rpm and dependencies
- add redis rpm & source packages
- add perf as default packages
v1.3.1
[Monitor]
- PGSQL & PGCAT Dashboard polish
- optimize layout for pgcat instance & pgcat database
- add key metrics panels to pgsql instance dashboard, keep consist with pgsql cluster
- add table/index bloat panels to pgcat database, remove pgcat bloat dashboard.
- add index information in pgcat database dashboard
- fix broken panels in grafana 8.3
- add redis index in nginx homepage
[Deploy]
- New infra-demo.yml playbook for one-pass bootstrap
- Use infra-jupyter.yml playbook to deploy optional jupyter lab server
- Use infra-pgweb.yml playbook to deploy optional pgweb server
- New pg alias on meta node, can initiate postgres cluster from admin user (in addition to postgres)
- Adjust max_locks_per_transaction in all patroni conf templates according to timescaledb-tune’s advice
- Add citus.node_conninfo: 'sslmode=prefer' to conf templates in order to use citus without SSL
- Add all extensions (except for pgrouting) in pgdg14 in package list
- Upgrade node_exporter to v1.3.1
- Add PostgREST v9.0.0 to package list. Generate API from postgres schema.
[BugFix]
- Grafana’s security breach (upgrade to v8.3.1 issue)
- fix pg_instance & pg_service in register role when start from middle of playbook
- Fix nginx homepage render issue when host without pg_cluster variable exists
variable exists - Fix style issue when upgrading to grafana 8.3.1
v1.2.0 Release Note
v1.2.0
- [ENHANCEMENT] Use PostgreSQL 14 as default version
- [ENHANCEMENT] Use TimescaleDB 2.5 as default extension
- now timescaledb & postgis are enabled in cmdb by default
- [ENHANCEMENT] new monitor-only mode:
- you can use pigsty to monitor existing pg instances with a connectable url only
- pg_exporter will be deployed on meta node locally
- new dashboard PGSQL Cluster Monly for remote clusters
- [ENHANCEMENT] Software upgrade
- grafana to 8.2.2
- pev2 to v0.11.9
- promscale to 0.6.2
- pgweb to 0.11.9
- Add new extensions: pglogical pg_stat_monitor orafce
- [ENHANCEMENT] Automatically detect machine spec and use proper node_tune and pg_conf templates
- [ENHANCEMENT] Rework bloat related views, now more information is exposed
- [ENHANCEMENT] Remove timescale & citus internal monitoring
- [ENHANCEMENT] New playbook pgsql-audit.yml to create audit report.
- [BUG FIX] now pgbouncer_exporter resource owner is {{ pg_dbsu }} instead of postgres
- [BUG FIX] fix pg_exporter duplicate metrics on pg_table pg_index while executing REINDEX TABLE CONCURRENTLY
- [CHANGE] now all config templates are reduced to two: auto & demo. (removed: pub4, pg14, demo4, tiny, oltp)
- pigsty-demo is configured if vagrant is the default user, otherwise pigsty-auto is used.
How to upgrade from v1.1.1
There’s no API change in 1.2.0. You can still use old pigsty.yml configuration files (PG13).
For the infrastructure part, re-executing repo will do most of the work.
As for the database, you can still use the existing PG13 instances. In-place upgrade is quite tricky, especially when it involves extensions such as PostGIS & Timescale. I would highly recommend performing a database migration with logical replication.
The new playbook pgsql-migration.yml will make this a lot easier. It will create a series of scripts which will help you to migrate your cluster with near-zero downtime.
v1.1.0 Release Note
v1.1.0
- [ENHANCEMENT] add pg_dummy_filesize to create fs space placeholder
- [ENHANCEMENT] home page overhaul
- [ENHANCEMENT] add jupyter lab integration
- [ENHANCEMENT] add pgweb console integration
- [ENHANCEMENT] add pgbadger support
- [ENHANCEMENT] add pev2 support, explain visualizer
- [ENHANCEMENT] add pglog utils
- [ENHANCEMENT] update default pkg.tgz software version:
- upgrade postgres to v13.4 (with official pg14 support)
- upgrade pgbouncer to v1.16 (metrics definition updates)
- upgrade grafana to v8.1.4
- upgrade prometheus to v2.2.29
- upgrade node_exporter to v1.2.2
- upgrade haproxy to v2.1.1
- upgrade consul to v1.10.2
- upgrade vip-manager to v1.0.1
API Changes
- nginx_upstream now holds different structures. (incompatible)
- new config entry: app_list, rendered into home page’s nav entries
- new config entry: docs_enabled, set up local docs on default server.
- new config entry: pev2_enabled, set up local pev2 utils.
- new config entry: pgbadger_enabled, create log summary/report dir
- new config entry: jupyter_enabled, enable jupyter lab server on meta node
- new config entry: jupyter_username, specify which user to run jupyter lab
- new config entry: jupyter_password, specify jupyter lab default password
- new config entry: pgweb_enabled, enable pgweb server on meta node
- new config entry: pgweb_username, specify which user to run pgweb
- rename internal flag repo_exist into repo_exists
- now default value for repo_address is pigsty instead of yum.pigsty
- now haproxy access point is http://pigsty instead of http://h.pigsty
v1.1.1
- [ENHANCEMENT] replace timescaledb apache version with timescale version
- [ENHANCEMENT] upgrade prometheus to 2.30
- [BUG FIX] now pg_exporter config dir’s owner is {{ pg_dbsu }} instead of prometheus
How to upgrade from v1.1.0
The major change in this release is timescaledb, which replaces the old apache license version with the timescale license version.
stop/pause postgres instance with timescaledb
yum remove -y timescaledb_13
[timescale_timescaledb]
name=timescale_timescaledb
baseurl=https://packagecloud.io/timescale/timescaledb/el/7/$basearch
repo_gpgcheck=0
gpgcheck=0
enabled=1
yum install timescaledb-2-postgresql13
v1.0.0 Release Note
v1.0.0
Highlights
- Monitoring System Overhaul
  - New Dashboards on Grafana 8.0
  - New metrics definition, with extra PG14 support
  - Simplified labeling system: static label set: (job, cls, ins)
  - New Alerting Rules & Derived Metrics
  - Monitoring multiple databases at one time
  - Realtime log search & csvlog analysis
  - Link-Rich Dashboards, click graphic elements to drill-down|roll-up
- Architecture Changes
  - Add citus & timescaledb as part of default installation
  - Add PostgreSQL 14beta2 support
  - Simplify haproxy admin page index
  - Decouple infra & pgsql by adding a new role register
  - Add new roles loki and promtail for logging
  - Add new role environ for setting up environment for admin user on admin node
  - Use static service-discovery for prometheus by default (instead of consul)
  - Add new role remove to gracefully remove cluster & instance
  - Upgrade prometheus & grafana provisioning logic.
  - Upgrade to vip-manager 1.0, node_exporter 1.2, pg_exporter 0.4, grafana 8.0
  - Now every database on every instance can be auto-registered as grafana datasource
  - Move consul register tasks to role register, change consul service tags
  - Add cmdb.sql as pg-meta baseline definition (CMDB & PGLOG)
- Application Framework
  - Extensible framework for new functionalities
  - core app: PostgreSQL Monitor System: pgsql
  - core app: PostgreSQL Catalog explorer: pgcat
  - core app: PostgreSQL Csvlog Analyzer: pglog
  - add example app covid for visualizing covid-19 data.
  - add example app isd for visualizing isd data.
- Misc
  - Add jupyterlab which brings entire python environment for data science
  - Add vonng-echarts-panel to bring Echarts support back.
  - Add wrap scripts createpg, createdb, createuser
  - Add cmdb dynamic inventory scripts: load_conf.py, inventory_cmdb, inventory_conf
  - Remove obsolete playbooks: pgsql-monitor, pgsql-service, node-remove, etc.
API Change
- new var: node_meta_pip_install
- rename var: grafana_url to grafana_endpoint
- new var: grafana_admin_username
- new var: grafana_database
- new var: grafana_pgurl
- new var: pg_shared_libraries
- new var: pg_exporter_auto_discovery
- new var: pg_exporter_exclude_database
- new var: pg_exporter_include_database
Bug Fix
- Fix default timezone Asia/Shanghai (CST) issue
- Fix nofile limit for pgbouncer & patroni
- Pgbouncer userlist & database list will be generated when executing tag pgbouncer
v1.0.1
2021-09-14
- Documentation Update
  - Chinese document now available
  - Machine-translated English document now available
- Bug Fix: pgsql-remove does not remove primary instance.
- Bug Fix: replace pg_instance with pg_cluster + pg_seq
  - Start-At-Task may fail due to pg_instance undefined
- Bug Fix: remove citus from default shared preload library
  - citus will force max_prepared_transactions to non-zero value
- Bug Fix: ssh sudo checking in configure:
  - now ssh -t sudo -n ls is used for privilege checking
- Typo Fix: pg-backup script typo
- Alert Adjust: Remove ntp sanity check alert (dupe with ClockSkew)
- Exporter Adjust: remove collector.systemd to reduce overhead
v0.9.0 Release Note
Pigsty v0.9.0
Features
- One-Line Installation
  Run this on meta node:
  /bin/bash -c "$(curl -fsSL https://pigsty.cc/install)"
- MetaDB provisioning
  Now you can use the pgsql database on the meta node as inventory instead of a static yaml file after bootstrap.
- Add Loki & Promtail as optional logging collector
  Now you can view, query, search postgres|pgbouncer|patroni logs with Grafana UI (PG Instance Log)
- Pigsty CLI/GUI (beta)
  Manage your pigsty deployment with a much more human-friendly command line interface.
Bug Fix
- Log related issues
  - fix connection reset by peer entries in postgres log caused by haproxy health check.
  - fix Connect Reset Exception in patroni logs caused by haproxy health check
  - fix patroni log time format (remove milliseconds, add timezone)
  - set log_min_duration_statement=1s for dbuser_monitor to get rid of monitor logs.
- Fix pgbouncer-create-user does not handle md5 password properly
- Fix obsolete Makefile entries
- Fix node dns nameserver lost when abort during resolv.conf rewrite
- Fix db/user template and entry not null check
API Change
- Set default value of node_disable_swap to false
- Remove example entries of node_sysctl_params.
- grafana_plugin: default install will now download from CDN if plugins do not exist
- repo_url_packages: now download rpm via pigsty CDN to accelerate.
- proxy_env.no_proxy: now add pigsty CDN to noproxy sites.
- grafana_customize: set to false by default, enabling it means installing pigsty pro UI.
- node_admin_pk_current: add current user’s ~/.ssh/id_rsa.pub to admin pks
- loki_clean: whether to cleanup existing loki data during init
- loki_data_dir: set default data dir for loki logging service
- promtail_enabled: enable promtail logging agent service?
- promtail_clean: remove existing promtail status during init?
- promtail_port: default port used by promtail, 9080 by default
- promtail_status_file: location of promtail status file
- promtail_send_url: endpoint of loki service which receives log data
v0.8.0 Release Note
Pigsty v0.8.0
Pigsty now is in RC status with guaranteed API stability.
New Features
- Service provision.
- full locale support.
API Changes
Roles vip and haproxy are merged into service.
#------------------------------------------------------------------------------
# SERVICE PROVISION
#------------------------------------------------------------------------------
pg_weight: 100              # default load balance weight (instance level)

# - service - #
pg_services:                # how to expose postgres service in cluster?
  # primary service will route {ip|name}:5433 to primary pgbouncer (5433->6432 rw)
  - name: primary           # service name {{ pg_cluster }}_primary
    src_ip: "*"
    src_port: 5433
    dst_port: pgbouncer     # 5433 route to pgbouncer
    check_url: /primary     # primary health check, success when instance is primary
    selector: "[]"          # select all instance as primary service candidate

  # replica service will route {ip|name}:5434 to replica pgbouncer (5434->6432 ro)
  - name: replica           # service name {{ pg_cluster }}_replica
    src_ip: "*"
    src_port: 5434
    dst_port: pgbouncer
    check_url: /read-only   # read-only health check. (including primary)
    selector: "[]"          # select all instance as replica service candidate
    selector_backup: "[? pg_role == `primary`]"   # primary are used as backup server in replica service

  # default service will route {ip|name}:5436 to primary postgres (5436->5432 primary)
  - name: default           # service's actual name is {{ pg_cluster }}-{{ service.name }}
    src_ip: "*"             # service bind ip address, * for all, vip for cluster virtual ip address
    src_port: 5436          # bind port, mandatory
    dst_port: postgres      # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
    check_method: http      # health check method: only http is available for now
    check_port: patroni     # health check port: patroni|pg_exporter|port_number , patroni by default
    check_url: /primary     # health check url path, / as default
    check_code: 200         # health check http code, 200 as default
    selector: "[]"          # instance selector
    haproxy:                # haproxy specific fields
      maxconn: 3000         # default front-end connection
      balance: roundrobin   # load balance algorithm (roundrobin by default)
      default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'

  # offline service will route {ip|name}:5438 to offline postgres (5438->5432 offline)
  - name: offline           # service name {{ pg_cluster }}_offline
    src_ip: "*"
    src_port: 5438
    dst_port: postgres
    check_url: /replica     # offline MUST be a replica
    selector: "[? pg_role == `offline` || pg_offline_query ]"         # instances with pg_role == 'offline' or instance marked with 'pg_offline_query == true'
    selector_backup: "[? pg_role == `replica` && !pg_offline_query]"  # replica are used as backup server in offline service

pg_services_extra: []       # extra services to be added

# - haproxy - #
haproxy_enabled: true                # enable haproxy among every cluster members
haproxy_reload: true                 # reload haproxy after config
haproxy_policy: roundrobin           # roundrobin, leastconn
haproxy_admin_auth_enabled: false    # enable authentication for haproxy admin?
haproxy_admin_username: admin        # default haproxy admin username
haproxy_admin_password: admin        # default haproxy admin password
haproxy_exporter_port: 9101          # default admin/exporter port
haproxy_client_timeout: 3h           # client side connection timeout
haproxy_server_timeout: 3h           # server side connection timeout

# - vip - #
vip_mode: none                       # none | l2 | l4
vip_reload: true                     # whether reload service after config
# vip_address: 127.0.0.1             # virtual ip address ip (l2 or l4)
# vip_cidrmask: 24                   # virtual ip address cidr mask (l2 only)
# vip_interface: eth0                # virtual ip network interface (l2 only)
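The selector and selector_backup fields in the service definitions above read like JMESPath filter expressions over the cluster's instances. As a rough illustration of what the offline service's two expressions select, here is a minimal hand-written Python sketch. It is not Pigsty's implementation; the function name offline_members and the sample instance names are hypothetical.

```python
def offline_members(instances):
    """Split instances into (selected, backup) pools for the offline service.

    Hand-written equivalent of the two expressions in the config:
      selector:        pg_role == 'offline' or pg_offline_query
      selector_backup: pg_role == 'replica' and not pg_offline_query
    """
    selected = [i for i in instances
                if i["pg_role"] == "offline" or i.get("pg_offline_query", False)]
    backup = [i for i in instances
              if i["pg_role"] == "replica" and not i.get("pg_offline_query", False)]
    return selected, backup


# Hypothetical four-node cluster used only for illustration.
cluster = [
    {"name": "pg-test-1", "pg_role": "primary"},
    {"name": "pg-test-2", "pg_role": "replica"},
    {"name": "pg-test-3", "pg_role": "replica", "pg_offline_query": True},
    {"name": "pg-test-4", "pg_role": "offline"},
]

selected, backup = offline_members(cluster)
print([i["name"] for i in selected])
print([i["name"] for i in backup])
```

In this sample the dedicated offline instance and the replica flagged with pg_offline_query serve offline traffic, while the plain replica is only a backup.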
New Options
# - localization - #
pg_encoding: UTF8 # default to UTF8
pg_locale: C # default to C
pg_lc_collate: C # default to C
pg_lc_ctype: en_US.UTF8 # default to en_US.UTF8
pg_reload: true # reload postgres after hba changes
vip_mode: none # none | l2 | l4
vip_reload: true # whether reload service after config
Remove Options
haproxy_check_port # covered by service options
haproxy_primary_port
haproxy_replica_port
haproxy_backend_port
haproxy_weight
haproxy_weight_fallback
vip_enabled # replace by vip_mode
Service
pg_services and pg_services_extra define the services in a cluster.
A service has some mandatory fields:
- name: service’s name
- src_port: which port to listen on and expose the service?
- selector: which instances belong to this service?
# default service will route {ip|name}:5436 to primary postgres (5436->5432 primary)
- name: default           # service's actual name is {{ pg_cluster }}-{{ service.name }}
  src_ip: "*"             # service bind ip address, * for all, vip for cluster virtual ip address
  src_port: 5436          # bind port, mandatory
  dst_port: postgres      # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
  check_method: http      # health check method: only http is available for now
  check_port: patroni     # health check port: patroni|pg_exporter|port_number , patroni by default
  check_url: /primary     # health check url path, / as default
  check_code: 200         # health check http code, 200 as default
  selector: "[]"          # instance selector
  haproxy:                # haproxy specific fields
    maxconn: 3000         # default front-end connection
    balance: roundrobin   # load balance algorithm (roundrobin by default)
    default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
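To make the dst_port convention concrete (a named port or a literal port number, with pgbouncer as the default), the following Python sketch resolves a service entry to a numeric backend port. The port numbers 5432 and 6432 come from the comments in the definition; the helper names and the fallback of src_ip to "*" are assumptions made for illustration, not Pigsty code.

```python
# Named ports taken from the comments in the service definition.
NAMED_PORTS = {"postgres": 5432, "pgbouncer": 6432}


def resolve_dst_port(service):
    """Return the numeric backend port for one service definition (a dict)."""
    dst = service.get("dst_port", "pgbouncer")  # pgbouncer is the stated default
    if isinstance(dst, int):
        return dst
    if str(dst).isdigit():
        return int(dst)
    return NAMED_PORTS[dst]  # raises KeyError for unknown names


def describe(cluster, service):
    """One-line summary: <cluster>-<service>: <bind ip>:<src_port> -> <dst port>."""
    src_ip = service.get("src_ip", "*")  # assumption: bind all addresses if unset
    return f"{cluster}-{service['name']}: {src_ip}:{service['src_port']} -> {resolve_dst_port(service)}"


default_service = {"name": "default", "src_ip": "*", "src_port": 5436, "dst_port": "postgres"}
print(describe("pg-test", default_service))
```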
Database
Add additional locale support: lc_ctype and lc_collate.
It’s mainly because of pg_trgm’s weird behavior on i18n characters.
pg_databases:
  - name: meta                      # name is the only required field for a database
    # owner: postgres               # optional, database owner
    # template: template1           # optional, template1 by default
    # encoding: UTF8                # optional, UTF8 by default , must same as template database, leave blank to set to db default
    # locale: C                     # optional, C by default , must same as template database, leave blank to set to db default
    # lc_collate: C                 # optional, C by default , must same as template database, leave blank to set to db default
    # lc_ctype: C                   # optional, C by default , must same as template database, leave blank to set to db default
    allowconn: true                 # optional, true by default, false disable connect at all
    revokeconn: false               # optional, false by default, true revoke connect from public # (only default user and owner have connect privilege on database)
    # tablespace: pg_default        # optional, 'pg_default' is the default tablespace
    connlimit: -1                   # optional, connection limit, -1 or none disable limit (default)
    extensions:                     # optional, extension name and where to create
      - {name: postgis, schema: public}
    parameters:                     # optional, extra parameters with ALTER DATABASE
      enable_partitionwise_join: true
    pgbouncer: true                 # optional, add this database to pgbouncer list? true by default
    comment: pigsty meta database   # optional, comment string for database
v0.7.0 Release Note
v0.7.0
Overview
- Monitor Only Deployment
  - Now you can monitor existing postgres clusters without Pigsty provisioning solution.
  - Integration with other provisioning solutions is available and under further test.
- Database/User Management
  - Update user/database definition schema to cover more use cases.
  - Add pgsql-createdb.yml and pgsql-createuser.yml to manage user/db on running clusters.
Features
- Monitor Only Deployment Support #25
- Split monolith static monitor target file into per-cluster conf #36
- Add create user playbook #29
- Add create database playbook #28
- Database provisioning interface enhancement #33
- User provisioning interface enhancement #34
Bug Fix
API Changes
New Options
prometheus_sd_target: batch # batch|single
exporter_install: none # none|yum|binary
exporter_repo_url: '' # add to yum repo if set
node_exporter_options: '--no-collector.softnet --collector.systemd --collector.ntp --collector.tcpstat --collector.processes' # default opts for node_exporter
pg_exporter_url: '' # optional, overwrite default pg_exporter target
pgbouncer_exporter_url: '' # optional, overwrite default pgbouncer_exporter target
Remove Options
exporter_binary_install: false # covered by exporter_install
Structure Changes
pg_default_roles # refer to pg_users
pg_users # refer to pg_users
pg_databases # refer to pg_databases
Rename Options
pg_default_privilegs -> pg_default_privileges # fix typo
Enhancement
Monitoring Provisioning Enhancement
- Decouple consul #13
- Binary install mode for node_exporter and pg_exporter #14
- Prometheus static targets mode support #11
Haproxy Enhancement
- Adjust relative traffic weight with configuration #10
- HAProxy admin page access via nginx #12
- Readonly traffic fallback on primary if all replicas down #8
Security Enhancement
Software Update
- Prometheus 2.25 / Grafana 7.4 / Consul 1.9.3 / Node Exporter 1.1 / PG Exporter 0.3.2
API Change
New Config Entries
service_registry: consul # none | consul | etcd | both
prometheus_options: '--storage.tsdb.retention=30d' # prometheus cli opts
prometheus_sd_method: consul # Prometheus service discovery method:static|consul
prometheus_sd_interval: 2s # Prometheus service discovery refresh interval
pg_offline_query: false # set to true to allow offline queries on this instance
node_exporter_enabled: true # enabling Node Exporter
pg_exporter_enabled: true # enabling PG Exporter
pgbouncer_exporter_enabled: true # enabling Pgbouncer Exporter
exporter_binary_install: false # install Node/PG Exporter via copy binary
dcs_disable_purge: false # force dcs_exists_action = abort to avoid dcs purge
pg_disable_purge: false # force pg_exists_action = abort to avoid pg purge
haproxy_weight: 100 # relative lb weight for backend instance
haproxy_weight_fallback: 1 # primary server weight in replica service group
Obsolete Config Entries
prometheus_metrics_path # duplicate with exporter_metrics_path
prometheus_retention # covered by `prometheus_options`
Database Definition
Database provisioning interface enhancement #33
Old Schema
pg_databases:                       # create a business database 'meta'
  - name: meta
    schemas: [meta]                 # create extra schema named 'meta'
    extensions: [{name: postgis}]   # create extra extension postgis
    parameters:                     # overwrite database meta's default search_path
      search_path: public, monitor
New Schema
pg_databases:
  - name: meta                      # name is the only required field for a database
    owner: postgres                 # optional, database owner
    template: template1             # optional, template1 by default
    encoding: UTF8                  # optional, UTF8 by default
    locale: C                       # optional, C by default
    allowconn: true                 # optional, true by default, false disable connect at all
    revokeconn: false               # optional, false by default, true revoke connect from public # (only default user and owner have connect privilege on database)
    tablespace: pg_default          # optional, 'pg_default' is the default tablespace
    connlimit: -1                   # optional, connection limit, -1 or none disable limit (default)
    extensions:                     # optional, extension name and where to create
      - {name: postgis, schema: public}
    parameters:                     # optional, extra parameters with ALTER DATABASE
      enable_partitionwise_join: true
    pgbouncer: true                 # optional, add this database to pgbouncer list? true by default
    comment: pigsty meta database   # optional, comment string for database
Changes
- Add new options: template, encoding, locale, allowconn, tablespace, connlimit
- Add new option revokeconn, which revokes connect privileges from public for this database
- Add comment field for database
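To show how the new database fields line up with standard PostgreSQL statements, here is a small Python sketch that renders one pg_databases entry into SQL. It is an illustration under assumptions, not the SQL Pigsty actually generates: the function name database_sql is hypothetical, extensions and parameters are left out, and no identifier or string escaping is attempted.

```python
def database_sql(db):
    """Render illustrative SQL statements for one pg_databases entry.

    Assumption: values are trusted and need no quoting or escaping.
    """
    name = db["name"]
    clauses = []
    if "owner" in db:
        clauses.append('OWNER "%s"' % db["owner"])
    if "template" in db:
        clauses.append('TEMPLATE "%s"' % db["template"])
    if "encoding" in db:
        clauses.append("ENCODING '%s'" % db["encoding"])
    if "locale" in db:
        clauses.append("LOCALE '%s'" % db["locale"])
    if "tablespace" in db:
        clauses.append('TABLESPACE "%s"' % db["tablespace"])
    if "allowconn" in db:
        clauses.append("ALLOW_CONNECTIONS %s" % str(db["allowconn"]).lower())
    if "connlimit" in db:
        clauses.append("CONNECTION LIMIT %d" % db["connlimit"])

    statements = [" ".join(['CREATE DATABASE "%s"' % name] + clauses) + ";"]
    if db.get("revokeconn", False):
        # revokeconn: only explicitly granted roles may connect afterwards
        statements.append('REVOKE CONNECT ON DATABASE "%s" FROM PUBLIC;' % name)
    if "comment" in db:
        statements.append("COMMENT ON DATABASE \"%s\" IS '%s';" % (name, db["comment"]))
    return statements


for stmt in database_sql({"name": "meta", "owner": "postgres",
                          "connlimit": -1, "revokeconn": True,
                          "comment": "pigsty meta database"}):
    print(stmt)
```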
Apply Changes
You can create a new database on running postgres clusters with the pgsql-createdb.yml playbook.
- Define your new database in config files
- Pass the new database name with option pg_database to the playbook.
./pgsql-createdb.yml -e pg_database=<your_new_database_name>
User Definition
User provisioning interface enhancement #34
Old Schema
pg_users:
  - username: test                  # example production user have read-write access
    password: test                  # example user's password
    options: LOGIN                  # extra options
    groups: [ dbrole_readwrite ]    # dbrole_admin|dbrole_readwrite|dbrole_readonly
    comment: default test user for production usage
    pgbouncer: true                 # add to pgbouncer
New Schema
pg_users:
  # complete example of user/role definition for production user
  - name: dbuser_meta               # example production user have read-write access
    password: DBUser.Meta           # example user's password, can be encrypted
    login: true                     # can login, true by default (should be false for role)
    superuser: false                # is superuser? false by default
    createdb: false                 # can create database? false by default
    createrole: false               # can create role? false by default
    inherit: true                   # can this role use inherited privileges?
    replication: false              # can this role do replication? false by default
    bypassrls: false                # can this role bypass row level security? false by default
    connlimit: -1                   # connection limit, -1 disable limit
    expire_at: '2030-12-31'         # 'timestamp' when this role is expired
    expire_in: 365                  # now + n days when this role is expired (OVERWRITE expire_at)
    roles: [dbrole_readwrite]       # dbrole_admin|dbrole_readwrite|dbrole_readonly
    pgbouncer: true                 # add this user to pgbouncer? false by default (true for production user)
    parameters:                     # user's default search path
      search_path: public
    comment: test user
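The definition above sets both expire_at and expire_in, and the comment states that expire_in overwrites expire_at. A minimal Python sketch of that precedence rule follows; the function name effective_expiry is hypothetical, and passing the current date explicitly is a choice made here so the example is reproducible.

```python
from datetime import date, timedelta


def effective_expiry(user, today):
    """Return the date on which the role expires, or None if no expiry is set."""
    if "expire_in" in user:
        # expire_in is relative (now + n days) and takes precedence over expire_at
        return today + timedelta(days=user["expire_in"])
    if "expire_at" in user:
        return date.fromisoformat(user["expire_at"])
    return None


user = {"name": "dbuser_meta", "expire_at": "2030-12-31", "expire_in": 365}
print(effective_expiry(user, date(2021, 1, 1)))
```

With both fields present the relative value wins, so the absolute date only matters once expire_in is removed from the definition.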
Changes
- username field renamed to name
- groups field renamed to roles
- options is now split into separate configuration entries: login, superuser, createdb, createrole, inherit, replication, bypassrls, connlimit
- new options: expire_at and expire_in
- pgbouncer option for user is now false by default
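The schema changes listed above can be applied mechanically to an existing definition. The Python sketch below converts one old-style user entry to the new layout. It is a hypothetical helper, not something shipped with Pigsty, and it only understands the option keywords that appear in these release notes (LOGIN, SUPERUSER, CREATEDB, CREATEROLE, INHERIT, REPLICATION, BYPASSRLS, their NO-prefixed forms, and CONNECTION LIMIT n).

```python
FLAGS = {"LOGIN", "SUPERUSER", "CREATEDB", "CREATEROLE",
         "INHERIT", "REPLICATION", "BYPASSRLS"}


def migrate_user(old):
    """Convert an old-schema pg_users entry (dict) into the new schema."""
    new = {"name": old["username"]}              # username -> name
    for key in ("password", "comment", "pgbouncer"):
        if key in old:
            new[key] = old[key]
    if "groups" in old:
        new["roles"] = list(old["groups"])       # groups -> roles

    tokens = old.get("options", "").split()
    i = 0
    while i < len(tokens):
        tok = tokens[i].upper()
        if tok == "CONNECTION" and i + 2 < len(tokens) and tokens[i + 1].upper() == "LIMIT":
            new["connlimit"] = int(tokens[i + 2])
            i += 3
            continue
        negated = tok.startswith("NO") and tok[2:] in FLAGS
        flag = tok[2:] if negated else tok
        if flag not in FLAGS:
            raise ValueError(f"unsupported option: {tokens[i]}")
        new[flag.lower()] = not negated          # e.g. NOLOGIN -> login: False
        i += 1
    return new


old_user = {
    "username": "test", "password": "test", "options": "LOGIN",
    "groups": ["dbrole_readwrite"],
    "comment": "default test user for production usage", "pgbouncer": True,
}
print(migrate_user(old_user))
```

Because pgbouncer is now false by default, entries that relied on an implicit value should set it explicitly after conversion.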
Apply Changes
You can create new users on running postgres clusters with the pgsql-createuser.yml playbook.
- Define your new users in config files (pg_users)
- Pass the new user name with option pg_user to the playbook.
./pgsql-createuser.yml -e pg_user=<your_new_user_name>
v0.6.0 Release Note
v0.6.0
Bug Fix
- Merge Fix name of dashboard #1, Fix PG Overview Dashboard typo
- Fix default primary instance to pg-test-1 of cluster pg-test in sandbox environment
- Fix obsolete comments
Enhancement
Monitoring Provisioning Enhancement
- Decouple consul #13
- Binary install mode for node_exporter and pg_exporter #14
- Prometheus static targets mode support #11
Haproxy Enhancement
- Adjust relative traffic weight with configuration #10
- HAProxy admin page access via nginx #12
- Readonly traffic fallback on primary if all replicas down #8
Security Enhancement
Software Update
- Prometheus 2.25 / Grafana 7.4 / Consul 1.9.3 / Node Exporter 1.1 / PG Exporter 0.3.2
API Change
New Config Entries
service_registry: consul # none | consul | etcd | both
prometheus_options: '--storage.tsdb.retention=30d' # prometheus cli opts
prometheus_sd_method: consul # Prometheus service discovery method:static|consul
prometheus_sd_interval: 2s # Prometheus service discovery refresh interval
pg_offline_query: false # set to true to allow offline queries on this instance
node_exporter_enabled: true # enabling Node Exporter
pg_exporter_enabled: true # enabling PG Exporter
pgbouncer_exporter_enabled: true # enabling Pgbouncer Exporter
exporter_binary_install: false # install Node/PG Exporter via copy binary
dcs_disable_purge: false # force dcs_exists_action = abort to avoid dcs purge
pg_disable_purge: false # force pg_exists_action = abort to avoid pg purge
haproxy_weight: 100 # relative lb weight for backend instance
haproxy_weight_fallback: 1 # primary server weight in replica service group
Obsolete Config Entries
prometheus_metrics_path # duplicate with exporter_metrics_path
prometheus_retention # covered by `prometheus_options`
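The two weight options above are relative: with the defaults, the primary sits in the replica service group with weight 1 next to replicas with weight 100. As a back-of-the-envelope illustration, the Python sketch below estimates the resulting traffic split. It assumes every backend is healthy and that the load balancer distributes requests in proportion to weight; the function name is hypothetical.

```python
def replica_service_shares(n_replicas, haproxy_weight=100, haproxy_weight_fallback=1):
    """Approximate share of read-only traffic per backend in the replica service.

    Assumes all replicas are up and traffic is split proportionally to weight.
    """
    total = n_replicas * haproxy_weight + haproxy_weight_fallback
    return {
        "each_replica": haproxy_weight / total,
        "primary": haproxy_weight_fallback / total,
    }


shares = replica_service_shares(2)
print(f"each replica: {shares['each_replica']:.4f}, primary: {shares['primary']:.4f}")
```

With two replicas the primary would receive about 1/201 of read-only requests, and it takes all of them only when no replica remains.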
v0.5.0 Release Note
v0.5.0
Pigsty now has an Official Site 🎉 !
New Features
- Add Database Provision Template
- Add Init Template
- Add Business Init Template
- Refactor HBA Rules variables
- Fix dashboards bugs.
- Move pg-cluster-replication to default dashboards
- Use ZJU PostgreSQL mirror as default to accelerate repo build phase.
- Move documentation to official site: https://pigsty.cc
- Download newly created offline installation packages: pkg.tgz (v0.5)
Database Provision Template
Now you can customize your database content with pigsty !
pg_users:
  - username: test
    password: test
    comment: default test user
    groups: [ dbrole_readwrite ]    # dbrole_admin|dbrole_readwrite|dbrole_readonly
pg_databases:                       # create a business database 'test'
  - name: test
    extensions: [{name: postgis}]   # create extra extension postgis
    parameters:                     # overwrite database meta's default search_path
      search_path: public,monitor
pg-init-template.sql will be used as the default template1 database init script.
pg-init-business.sql will be used as the default business database init script.
You can now customize the default role system, schemas, extensions, and privileges with variables:
# - system roles - #
pg_replication_username: replicator         # system replication user
pg_replication_password: DBUser.Replicator  # system replication password
pg_monitor_username: dbuser_monitor         # system monitor user
pg_monitor_password: DBUser.Monitor         # system monitor password
pg_admin_username: dbuser_admin             # system admin user
pg_admin_password: DBUser.Admin             # system admin password

# - default roles - #
pg_default_roles:
  - username: dbrole_readonly               # sample user:
    options: NOLOGIN                        # role can not login
    comment: role for readonly access       # comment string

  - username: dbrole_readwrite              # sample user: one object for each user
    options: NOLOGIN
    comment: role for read-write access
    groups: [ dbrole_readonly ]             # read-write includes read-only access

  - username: dbrole_admin                  # sample user: one object for each user
    options: NOLOGIN BYPASSRLS              # admin can bypass row level security
    comment: role for object creation
    groups: [dbrole_readwrite,pg_monitor,pg_signal_backend]

  # NOTE: replicator, monitor, admin password are overwritten by separated config entry
  - username: postgres                      # reset dbsu password to NULL (if dbsu is not postgres)
    options: SUPERUSER LOGIN
    comment: system superuser

  - username: replicator
    options: REPLICATION LOGIN
    groups: [pg_monitor, dbrole_readonly]
    comment: system replicator

  - username: dbuser_monitor
    options: LOGIN CONNECTION LIMIT 10
    comment: system monitor user
    groups: [pg_monitor, dbrole_readonly]

  - username: dbuser_admin
    options: LOGIN BYPASSRLS
    comment: system admin user
    groups: [dbrole_admin]

  - username: dbuser_stats
    password: DBUser.Stats
    options: LOGIN
    comment: business read-only user for statistics
    groups: [dbrole_readonly]

# object created by dbsu and admin will have their privileges properly set
pg_default_privilegs:
  - GRANT USAGE ON SCHEMAS TO dbrole_readonly
  - GRANT SELECT ON TABLES TO dbrole_readonly
  - GRANT SELECT ON SEQUENCES TO dbrole_readonly
  - GRANT EXECUTE ON FUNCTIONS TO dbrole_readonly
  - GRANT INSERT, UPDATE, DELETE ON TABLES TO dbrole_readwrite
  - GRANT USAGE, UPDATE ON SEQUENCES TO dbrole_readwrite
  - GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES TO dbrole_admin
  - GRANT CREATE ON SCHEMAS TO dbrole_admin
  - GRANT USAGE ON TYPES TO dbrole_admin

# schemas
pg_default_schemas: [monitor]

# extension
pg_default_extensions:
  - { name: 'pg_stat_statements', schema: 'monitor' }
  - { name: 'pgstattuple', schema: 'monitor' }
  - { name: 'pg_qualstats', schema: 'monitor' }
  - { name: 'pg_buffercache', schema: 'monitor' }
  - { name: 'pageinspect', schema: 'monitor' }
  - { name: 'pg_prewarm', schema: 'monitor' }
  - { name: 'pg_visibility', schema: 'monitor' }
  - { name: 'pg_freespacemap', schema: 'monitor' }
  - { name: 'pg_repack', schema: 'monitor' }
  - name: postgres_fdw
  - name: file_fdw
  - name: btree_gist
  - name: btree_gin
  - name: pg_trgm
  - name: intagg
  - name: intarray

# postgres host-based authentication rules
pg_hba_rules:
  - title: allow meta node password access
    role: common
    rules:
      - host all all 10.10.10.10/32 md5

  - title: allow intranet admin password access
    role: common
    rules:
      - host all +dbrole_admin 10.0.0.0/8 md5
      - host all +dbrole_admin 172.16.0.0/12 md5
      - host all +dbrole_admin 192.168.0.0/16 md5

  - title: allow intranet password access
    role: common
    rules:
      - host all all 10.0.0.0/8 md5
      - host all all 172.16.0.0/12 md5
      - host all all 192.168.0.0/16 md5

  - title: allow local read-write access (local production user via pgbouncer)
    role: common
    rules:
      - local all +dbrole_readwrite md5
      - host all +dbrole_readwrite 127.0.0.1/32 md5

  - title: allow read-only user (stats, personal) password directly access
    role: replica
    rules:
      - local all +dbrole_readonly md5
      - host all +dbrole_readonly 127.0.0.1/32 md5

pg_hba_rules_extra: []

# pgbouncer host-based authentication rules
pgbouncer_hba_rules:
  - title: local password access
    role: common
    rules:
      - local all all md5
      - host all all 127.0.0.1/32 md5

  - title: intranet password access
    role: common
    rules:
      - host all all 10.0.0.0/8 md5
      - host all all 172.16.0.0/12 md5
      - host all all 192.168.0.0/16 md5

pgbouncer_hba_rules_extra: []
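Each HBA rule group above carries a title, a role, and a list of rule lines. One plausible reading, assumed here and not confirmed by these notes, is that groups with role common apply to every instance while other groups apply only to instances with the matching role. The Python sketch below renders rule groups into pg_hba.conf-style lines under that assumption; the function name render_hba is hypothetical.

```python
def render_hba(rule_groups, instance_role):
    """Render HBA rule groups into config text for one instance.

    Assumption: 'common' groups apply everywhere, any other role value
    applies only to instances whose role matches it.
    """
    lines = []
    for group in rule_groups:
        if group["role"] not in ("common", instance_role):
            continue
        lines.append(f"# {group['title']}")
        lines.extend(group["rules"])
    return "\n".join(lines)


rules = [
    {"title": "allow meta node password access", "role": "common",
     "rules": ["host all all 10.10.10.10/32 md5"]},
    {"title": "allow read-only user (stats, personal) password directly access",
     "role": "replica",
     "rules": ["local all +dbrole_readonly md5",
               "host all +dbrole_readonly 127.0.0.1/32 md5"]},
]

print(render_hba(rules, "primary"))
print("---")
print(render_hba(rules, "replica"))
```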
v0.4.0 Release Note
v0.4.0
The second public beta (v0.4.0) of pigsty is available now ! 🎉
Monitoring System
Skim version of monitoring system consists of 10 essential dashboards:
- PG Overview
- PG Cluster
- PG Service
- PG Instance
- PG Database
- PG Query
- PG Table
- PG Table Catalog
- PG Table Detail
- Node
Software upgrade
- Upgrade to PostgreSQL 13.1, Patroni 2.0.1-4, add citus to repo.
- Upgrade to pg_exporter 0.3.1
- Upgrade to Grafana 7.3, tons of compatibility work
- Upgrade to prometheus 2.23, with new UI as default
- Upgrade to consul 1.9
Misc
- Update prometheus alert rules
- Fix alertmanager info links
- Fix bugs and typos.
- add a simple backup script
Offline Installation
- pkg.tgz is the latest offline install package (1GB rpm packages, made under CentOS 7.8)
v0.3.0 Release Note
v0.3.0
The first public beta (v0.3.0) of pigsty is available now ! 🎉
Monitoring System
Skim version of monitoring system consists of 8 essential dashboards:
- PG Overview
- PG Cluster
- PG Service
- PG Instance
- PG Database
- PG Table Overview
- PG Table Catalog
- Node
Database Cluster Provision
- All config files are merged into one file: conf/all.yml by default
- Use infra.yml to provision meta node(s) and infrastructure
- Use initdb.yml to provision database clusters
- Use ins-add.yml to add new instance to database cluster
- Use ins-del.yml to remove instance from database cluster
Offline Installation
- pkg.tgz is the latest offline install package (1GB rpm packages, made under CentOS 7.8)