[CASE STUDY]
[ INDUSTRY ]
Fintech
[ CHALLENGES ]
Scale, performance, availability challenges with legacy systems.
[ SOLUTION ]
A globally available, strictly consistent, and always-on platform built on CockroachDB.
1B+ transactions per year
~10K payments/minute at peak traffic
Zero data loss
SumUp’s original infrastructure was built on AWS RDS for PostgreSQL. As the business grew, they started to face several challenges associated with scale, performance, availability, and maintenance.
First, they had scaled PostgreSQL vertically to the point where there was nowhere else to go. Manual sharding is a well-known workaround, but it adds a lot of operational overhead. Additionally, RDS for PostgreSQL has a single-primary architecture with one writer and many readers. This was becoming a performance bottleneck and a single point of failure, which put them at risk of outages.
Not only were they concerned about unplanned downtime, but they were also regularly taking the database offline for routine maintenance, such as DB version upgrades. This type of planned downtime was no longer acceptable to SumUp, which operates across multiple markets, continents, and time zones. There is no good time for scheduled downtime.
Overall, operational complexity was becoming a problem at scale and they wanted a distributed system that would help solve some of these challenges out of the box. At a more granular level, these were their requirements for a new solution:
Ability to horizontally scale out to multiple regions
Native change data capture (CDC) with first-class support for Debezium
Online schema changes
Multi-node write support to scale traffic and maintain resilience
ACID guarantees for correctness
Standard SQL support for developer efficiency
PCI-DSS compliance
They evaluated several database options, including CockroachDB, Yugabyte, ScyllaDB, and MongoDB. In a side-by-side feature comparison, CockroachDB stood out in a few particular areas: strict serializability, truly global multi-region writes, full support for online schema changes, and native CDC.
We were looking for a strictly serializable consistent database that was PCI-DSS compliant and would allow us to scale horizontally across multiple regions. We also wanted to bring the latest engineering practices to our team. CockroachDB was the best database solution that met all of our requirements.
Anton Antonov
Engineering Manager, SumUp
After selecting CockroachDB as their new database, SumUp organized a plan to move workloads from their legacy PostgreSQL setup to CockroachDB while keeping the system online and the data consistent. What follows is a high-level overview of the migration; SumUp’s Engineering Manager, Anton Antonov, has told the full story separately.
During the process, all requests went through a “migration proxy” that routed each one to either the old database (PostgreSQL) or the new database (CockroachDB) based on the version of its universally unique identifier (UUID). Debezium read changes from PostgreSQL and fed them into Kafka, ensuring that post-processing analytics stayed complete and consistent during the migration. Both the old database and CockroachDB were fed the same Kafka stream to guarantee there was no breakage for customers downstream in areas like analytics, reporting, and back office tools.
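The routing decision described above can be sketched in a few lines of Python. This is an illustrative sketch only, not SumUp’s proxy: the source does not say which UUID versions mapped to which database, so the mapping below (version 7 IDs go to CockroachDB, everything else stays on PostgreSQL) and the names `route_backend`, `NEW_BACKEND_VERSIONS`, `LEGACY_BACKEND`, and `NEW_BACKEND` are assumptions made for the example.

```python
import uuid

# Assumption: the source does not state which UUID versions mapped to which
# database. Here version 7 stands in for "new" IDs; all others are "legacy".
LEGACY_BACKEND = "postgresql"
NEW_BACKEND = "cockroachdb"
NEW_BACKEND_VERSIONS = {7}


def route_backend(payment_id: str) -> str:
    """Pick a backend name for a request based on its UUID version."""
    parsed = uuid.UUID(payment_id)  # raises ValueError on malformed IDs
    if parsed.version in NEW_BACKEND_VERSIONS:
        return NEW_BACKEND
    return LEGACY_BACKEND


if __name__ == "__main__":
    legacy_id = str(uuid.uuid4())  # random version 4 UUID
    new_id = "018f4e7a-3c2b-7def-8abc-0123456789ab"  # hand-written version 7 UUID
    print(route_backend(legacy_id))
    print(route_backend(new_id))
```

Keying the decision on a property of the identifier itself means the proxy needs no lookup table to know where a record lives, which is one plausible reason to route this way during a gradual cutover.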
The dual-write setup allowed them to validate and backfill while gradually shifting the load. Ultimately, the SumUp team completed the migration with zero downtime, zero data loss, and no rollback horror stories. The diagram below illustrates what this process looked like behind the scenes.
Following the migration, all online payment processing routes to CockroachDB (see diagram below). The SumUp team turned the migration components into the Payments API, an API gateway and aggregator. This will enable them to break the payments gateway down into smaller microservices when the time comes.
Kafka continues to serve as the backbone for decoupled communications, and CockroachDB’s built-in change data capture (CDC) feeds the long-term storage for future replays and audits. With CockroachDB, SumUp was able to build a globally available, strictly consistent, observable, and always-compliant platform designed for long-term resilience and growth.
While the before and after diagrams may seem similar, the SumUp team gained many benefits from CockroachDB. Since the migration, there has been no data loss, and they have been performing zero-downtime upgrades. They now have first-class support for Golang clients, an area that had previously caused operational challenges. And SumUp now has a self-healing architecture that automatically rebalances data to handle increases in workload.
Today, they are utilizing CockroachDB for payment processing which is the main component of their core payment platform. They are also leveraging it for payments reporting, identity management, payments ledger, point of sale (POS), and other areas of the business that benefit from its capabilities.
SumUp runs CockroachDB in a single AWS region, but they plan to scale out to multiple regions in the future. Since the company operates in Europe, there are several regulations, such as the Digital Operational Resilience Act (DORA), that they are required to adhere to. Fortunately, CockroachDB is cloud-native and will allow them to run their deployment across multiple clouds should the time come.
When switching to a new technology, there are always lessons to be learned along the way. Anton remarked that they needed to think differently about how a distributed database would work. He recommended embracing the Cockroach Labs documentation and making sure you follow the best practices.
When it comes to their new setup, Anton said it’s smart to “treat CockroachDB as a first class citizen” and to integrate it into your observability stack. This will help you debug issues faster and always have insight into what is going on behind the scenes. He also recommends automating as much as possible to spare your team unnecessary operational work.
In 2024, SumUp reported that they processed over 1B transactions in a year and had around 10K payments per minute during peak periods. Over the next few years, the company plans to expand its presence around the globe.
While tackling global expansion, the engineering team will also continue to work on improving the observability, scalability and resiliency of their systems. Anton is excited to share his knowledge of CockroachDB with other engineering teams so they can take advantage of its distributed capabilities that are well suited to support fintech use cases.
To learn more about SumUp’s offerings, visit their website: https://www.sumup.com/