Fighting PostgreSQL write amplification with HOT updates
Our database setup at Adyen is unique for a few reasons. We currently process 5,000+ PostgreSQL transactions per second across multiple clusters. In addition, as a payments service provider dealing with sensitive financial data, redundancy is even more critical than normal, and downtime is unacceptable.
On top of that, we’re scaling our infrastructure at a rapid rate. In 2015, our database was under 10 terabytes. But our latest upgrade was 50 terabytes, and still growing fast. With global digital payments volumes projected to reach over 700 billion annual transactions by 2020, we need to think in terms of much bigger volumes than we currently have even now.
In this post I’ll look at Adyen’s evolving approach to updating our PostgreSQL database, against the requirements and challenges posed by redundancy, uptime, and scalability.
My colleague Michiel has written previously on how we have made a conscious decision to be “ruthless” when choosing a solution. We make careful decisions that give significant weight to the reliability of a solution and the maturity of the ecosystem (support, community, documentation and so on) that surrounds it, as well as the functionality. We built our database stack with PostgreSQL, as it provides consistency, isolation, and reliability we need. In addition, it is open source, and has an active ecosystem including multiple support and consultancy companies. Finally, as we work with sensitive financial data, having a transactional database was key as it ensures we don’t lose track of records.
Our PostgreSQL database clusters consist of one primary and at least three replica servers, spread over multiple data centers. The database servers are dual socket machines with 768 GB of RAM. Each server connects to its own shared storage device with fiber channel or ISCSI, and has a raw capacity of 150+ terabytes of SSDs and average compression ratio of 1:6.
One other detail to note is that we built our software architecture in such a way that we can stop traffic to our PostgreSQL databases, queue the transactions, and run a PostgreSQL update without affecting payments acceptance.
As recently as 2015, our database needs were much simpler. Our approach to upgrading to new PostgreSQL versions followed these steps:
At 10 terabytes it took a few days before we got the new server up and running, and this process was becoming progressively more time-consuming. Furthermore, we could foresee a potential impact on redundancy in the long term, as replicas were sacrificed at certain points in the process. With a 50-terabyte database on the horizon, the length of time required and the potential risks to redundancy started to become unacceptable.
By the time 9.6 was to be released, we needed a smarter, more scalable solution. In reality, our options were limited. Continuing with logical replication was impossible for the reasons I outlined above. The only other option from a PostgreSQL perspective was to run a pg_upgrade on the primary. However, this would have meant that you would have only one primary, which is also an unacceptable situation from a redundancy perspective.
As PostgreSQL options were not suitable for the next upgrade, in parallel we considered other possibilities. Our storage devices were able to make instant snapshots and also make them available on remote storage devices over the network, within a much smaller timeframe. We started combining the options, and ultimately, our solution involved the following steps:
This process then needs to be repeated across all clusters. Altogether, the process takes around a day including preparation work.
This approach has a number of advantages:
The beauty of our new approach is that it can continue to scale indefinitely –horizontally, by adding more clusters, and vertically, by using storage devices with increased capacity.
In fact, we have doubled the number of live clusters in the past year, and our biggest cluster has already grown to 74 terabytes. And later this year, we are looking forward to using this approach to update to PostgreSQL 10.3+, without any of the headaches associated with our previous approach.
By submitting this form, you acknowledge that you have reviewed the terms of our Privacy Statement and consent to the use of data in accordance therewith.