Pigsty: The Production-Ready PostgreSQL Distribution
What is Pigsty?
Pigsty is a production-ready, batteries-included PostgreSQL distribution.
A distribution, in this context, refers to a complete database solution comprised of the database kernel and a curated set of software packages. Just as Linux is an operating system kernel while RedHat, Debian, and SUSE are operating system distributions built upon it, PostgreSQL is a database kernel, and Pigsty, along with BigSQL, Percona, various cloud RDS offerings, and other database variants are database distributions built on top of it.
Pigsty distinguishes itself from other database distributions through five core features:
- A comprehensive and professional monitoring system
- A stable and reliable deployment solution
- A simple and hassle-free user interface
- A flexible and open extension mechanism
- A free and friendly open-source license
These five features make Pigsty a truly batteries-included PostgreSQL distribution.
Who Should Be Interested?
Pigsty caters to a diverse audience including DBAs, architects, ops engineers, software vendors, cloud providers, application developers, kernel developers, data engineers; those interested in data analysis and visualization; students, junior programmers, and anyone curious about databases.
For DBAs, architects, and other professional users, Pigsty offers a unique professional-grade PostgreSQL monitoring system that provides irreplaceable value for database management. Additionally, Pigsty comes with a stable and reliable, battle-tested production-grade PostgreSQL deployment solution that can automatically deploy database clusters with monitoring, alerting, log collection, service discovery, connection pooling, load balancing, VIP, and high availability in production environments.
For developers (application, kernel, or data), students, junior programmers, and database enthusiasts, Pigsty provides a low-barrier, one-click launch, one-click install local sandbox. This sandbox environment, identical to production except for machine specifications, includes a complete feature set: ready-to-use database instances and monitoring systems. It’s perfect for learning, development, testing, and data analysis scenarios.
Furthermore, Pigsty introduces a flexible extension mechanism called “Datalet”. Those interested in data analysis and visualization might be surprised to find that Pigsty can serve as an integrated development environment for data analysis and visualization. Pigsty integrates PostgreSQL with common data analysis plugins and comes with Grafana and embedded Echarts support, allowing users to write, test, and distribute data mini-applications (Datelets). Examples include “Additional monitoring dashboard packs for Pigsty”, “Redis monitoring system”, “PG log analysis system”, “Application monitoring”, “Data directory browser”, and more.
Finally, Pigsty adopts the free and friendly Apache License 2.0, making it free for commercial use. Cloud providers and software vendors are welcome to integrate and customize it for commercial use, as long as they comply with the Apache 2 License’s attribution requirements.
Comprehensive Professional Monitoring System
You can’t manage what you don’t measure.
— Peter F.Drucker
Pigsty provides a professional-grade monitoring system that delivers irreplaceable value to professional users.
To draw a medical analogy, basic monitoring systems are like heart rate monitors or pulse oximeters - tools that anyone can use without training. They show core vital signs: at least users know if the patient is about to die, but they’re not much help for diagnosis and treatment. Most monitoring systems provided by cloud vendors and software companies fall into this category: a dozen core metrics that tell you if the database is alive, giving you a rough idea and nothing more.
A professional-grade monitoring system, on the other hand, is more like a CT or MRI machine, capable of examining every detail inside the subject. Professional physicians can quickly identify diseases and potential issues from CT/MRI reports: treating what’s broken and maintaining what’s healthy. Pigsty can scrutinize every table, every index, every query in each database, providing comprehensive metrics (1,155 types) and transforming them into insights through thousands of dashboards: killing problems in their infancy and providing real-time feedback for performance optimization.
Pigsty’s monitoring system is built on industry best practices, using Prometheus and Grafana as monitoring infrastructure. It’s open-source, customizable, reusable, portable, and free from vendor lock-in. It can integrate with various existing database instances.
Stable and Reliable Deployment Solution
A complex system that works is invariably found to have evolved from a simple system that works.
—John Gall, Systemantics (1975)
If databases are software for managing data, then control systems are software for managing databases.
Pigsty includes a database control solution centered around Ansible, wrapped with command-line tools and a graphical interface. It integrates core database management functions: creating, destroying, and scaling database clusters; creating users, databases, and services. Pigsty adopts an “Infrastructure as Code” design philosophy using declarative configuration, describing and customizing databases and runtime environments through numerous optional configuration parameters, and automatically creating required database clusters through idempotent preset playbooks, providing a private cloud-like experience.
Pigsty creates distributed, highly available database clusters. Built on DCS, Patroni, and HAProxy, Pigsty’s database clusters achieve high availability. Each database instance in the cluster is idempotent in usage - any instance can provide complete read-write services through built-in load balancing components, delivering a distributed database experience. Database clusters can automatically detect failures and perform primary-replica failover, with common failures self-healing within seconds to tens of seconds, during which read-only traffic remains unaffected. During failures, as long as any instance survives, the cluster can provide complete service.
Pigsty’s architecture has been carefully designed and evaluated, focusing on achieving required functionality with minimal complexity. This solution has been validated through long-term, large-scale production environment use across internet/B/G/M/F industries.
Simple and Hassle-Free User Interface
Pigsty aims to lower PostgreSQL’s barrier to entry and has invested heavily in usability.
Installation and Deployment
Someone told me that each equation I included in the book would halve the sales.
— Stephen Hawking
Pigsty’s deployment consists of three steps: download source code, configure environment, and execute installation - all achievable through single commands. Following classic software installation patterns and providing a configuration wizard, all you need is a CentOS 7.8 machine with root access. For managing new nodes, Pigsty uses Ansible over SSH, requiring no agent installation, making deployment easy even for beginners.
Pigsty can manage hundreds or thousands of high-spec production nodes in production environments, or run independently on a local 1-core 1GB virtual machine as an out-of-the-box database instance. For local computer use, Pigsty provides a sandbox based on Vagrant and VirtualBox. It can spin up a database environment identical to production with one click, perfect for learning, development, testing, data analysis, and data visualization scenarios.
User Interface
Clearly, we must break away from the sequential and not limit the computers. We must state definitions and provide for priorities and descriptions of data. We must state relationships, not procedures.
—Grace Murray Hopper, Management and the Computer of the Future (1962)
Pigsty incorporates the essence of Kubernetes architecture design, using declarative configuration and idempotent operation playbooks. Users only need to describe “what kind of database they want” without worrying about how Pigsty creates or modifies it. Based on the user’s configuration manifest, Pigsty will create the required database cluster from bare metal nodes in minutes.
For management and usage, Pigsty provides different levels of user interfaces to meet various user needs. Novice users can use the one-click local sandbox and graphical user interface, while developers might prefer the pigsty-cli
command-line tool and configuration files for management. Experienced DBAs, ops engineers, and architects can directly control task execution through Ansible primitives for fine-grained control.
Flexible and Open Extension Mechanism
PostgreSQL’s extensibility has always been praised, with various extension plugins making it the most advanced open-source relational database. Pigsty respects this value and provides an extension mechanism called “Datalet”, allowing users and developers to further customize Pigsty and use it in “unexpected” ways, such as data analysis and visualization.
When we have a monitoring system and control solution, we also have an out-of-the-box visualization platform Grafana and a powerful database PostgreSQL. This combination packs a punch - especially for data-intensive applications. Users can perform data analysis and visualization, create data application prototypes with rich interactions, or even full applications without writing frontend or backend code.
Pigsty integrates Echarts and common map tiles, making it convenient to implement advanced visualization requirements. Compared to traditional scientific computing languages/plotting libraries like Julia, Matlab, and R, the PG + Grafana + Echarts combination allows you to create shareable, deliverable, standardized data applications or visualization works at minimal cost.
Pigsty’s monitoring system itself is a prime example of Datalet: all Pigsty advanced monitoring dashboards are published as Datelets. Pigsty also comes with some interesting Datalet examples: Redis monitoring system, COVID-19 data analysis, China’s seventh census population data analysis, PG log mining, etc. More out-of-the-box Datelets will be added in the future, continuously expanding Pigsty’s functionality and application scenarios.
Free and Friendly Open Source License
Once open source gets good enough, competing with it would be insane.
Larry Ellison —— Oracle CEO
In the software industry, open source is a major trend. The history of the internet is the history of open source software. One key reason why the IT industry is so prosperous today and people can enjoy so many free information services is open source software. Open source is a truly successful form of communism (better translated as communitarianism) composed of developers: software, the core means of production in the IT industry, becomes collectively owned by developers worldwide - all for one and one for all.
When an open source programmer works, their labor potentially embodies the wisdom of tens of thousands of top developers. Through open source, all community developers form a joint force, greatly reducing the waste of reinventing wheels. This has pushed the industry’s technical level forward at an incredible pace. The momentum of open source is like a snowball, unstoppable today. Except for some special scenarios and path dependencies, developing software behind closed doors and striving for self-reliance has become a joke.
Relying on open source and giving back to open source, Pigsty adopts the friendly Apache License 2.0, free for commercial use. Cloud providers and software vendors are welcome to integrate and customize it for commercial use, as long as they comply with the Apache 2 License’s attribution requirements.
About Pigsty
A system cannot be successful if it is too strongly influenced by a single person. Once the initial design is complete and fairly robust, the real test begins as people with many different viewpoints undertake their own experiments. — Donald Knuth
Pigsty is built around the open-source database PostgreSQL, the most advanced open-source relational database in the world, and Pigsty’s goal is to be the most user-friendly open-source PostgreSQL distribution.
Initially, Pigsty didn’t have such grand ambitions. Unable to find any monitoring system that met my needs in the market, I had to roll up my sleeves and build one myself. Surprisingly, it turned out better than expected, and many external PostgreSQL users wanted to use it. Then, deploying and delivering the monitoring system became an issue, so the database deployment and control parts were added; after production deployment, developers wanted local sandbox environments for testing, so the local sandbox was created; users found Ansible difficult to use, so the pigsty-cli
command-line tool was developed; users wanted to edit configuration files through UI, so Pigsty GUI was born. This way, as demands grew, features became richer, and Pigsty became more refined through long-term polishing, far exceeding initial expectations.
This project itself is a challenge - creating a distribution is somewhat like creating a RedHat, a SUSE, or an “RDS product”. Usually, only professional companies and teams of certain scale would attempt this. But I wanted to try: could one person do it? Actually, except for being slower, there’s nothing impossible about it. It’s an interesting experience switching between product manager, developer, and end-user roles, and the biggest advantage of “eating your own dog food” is that you’re both developer and user - you understand what you need and won’t cut corners on your own requirements.
However, as Donald Knuth said, “A system cannot be successful if it is too strongly influenced by a single person.” To make Pigsty a project with vigorous vitality, it must be open source, letting more people use it. “Once the initial design is complete and fairly robust, the real test begins as people with many different viewpoints undertake their own experiments.”
Pigsty has well solved my own problems and needs, and now I hope it can help more people and make the PostgreSQL ecosystem more prosperous and colorful.