How Devsisters built a world-class game with over 10 million downloads on CockroachDB
90 nodes in a single region
2x traffic during midnight releases
60K transactions per second
10 million downloads since January
360K concurrent users
2 years of development
Who wouldn’t want to build and design an entire kingdom out of cookies and protect it from other dessert monsters? Since its public release in January 2021, Cookie Run: Kingdom has been downloaded over 10 million times by gamers and cookie enthusiasts across the globe.
Devsisters is a South Korean-based video game development company founded in 2007. The name originates from their mission of forging a relationship with developers all around the world and creating a successful network of creators. They pride themselves on delivering the best possible player experience with excellent content, service, and technology.
Over the years, they’ve created several successful games in the OvenBreak and Cookie Run series with their most recent (and perhaps most successful) release to date being Cookie Run: Kingdom. If you are familiar with the Asia Pacific gaming industry, you might already know popular games (i.e. Pokemon, Dragon Quest, Mario) have the capability to turn into media franchises. Devsisters’ cookies characters are famous, available in local arcades, with merchandise and books for sale.
Anyone, anywhere in the world can download Cookie Run: Kingdom via Google Play, Apple App Store, and other mobile platforms. Prior to its launch, the creators were not sure how successful their latest game would be. Luckily, it exceeded their expectations and fortunately, they built Cookie Run: Kingdom on CockroachDB.
Looking back, it would have been impossible to scale this game on MySQL or Aurora. We experienced more than 6x the workload size we anticipated, and CockroachDB was able to scale with us throughout this journey.” - Sungyoon Jeong, DevOps Team
While there are over 100 people working on Cookie Run: Kingdom, the game server team at Devsisters has 6 full time developers and 2 infrastructure engineers. Prior to using CockroachDB, they used Couchbase and Aurora MySQL to build other products.
Couchbase is a key value store and over the years Devsisters’ document size grew larger and larger and started to create latency issues. The team remarked that Couchbase is fast, but has a lot of operational cost when it comes to tasks like rebalancing (which is not automated) and makes it tricky to scale horizontally. Additionally, Couchbase has limited support for transactions (document size limitations) which forced Devsisters to implement transactions at the application layer and affected performance.
When choosing a new database solution, the game server team had the following requirements:
“When starting this project, the ability to do distributed transactions was extremely important to us. Little did we know how frequently we would be scaling after the launch. Thankfully we choose CockroachDB to help support heavy traffic without causing latency for our users.” - Pierre Ricadat, Software Architect
Prior to selecting CockroachDB as the database to build Cookie Run: Kingdom, the team benchmarked Aurora and DynamoDB as well. However, CockroachDB provided better performance at the most affordable price. They also thought it was the right solution to scale with their success. In retrospect, the team remarked that Aurora MySQL simply wouldn’t have worked for Cookie Run: Kingdom because it couldn’t accommodate the scale they experienced.
The development for this game spanned over 2 years before it was released into production. The team started building their game servers on CockroachDB Core in 2019 before upgrading to Enterprise at the end of 2020 prior to their launch.
Here’s how it works: they deploy a CockroachDB cluster in a Kubernetes cluster on AWS EKS, and use the Helm chart for CockroachDB in Kubernetes. Their CockroachDB cluster consists of 3 localities for each available zone, and each locality statefulset corresponds to one Helm release. The CockroachDB Helm chart is deployed in an automated CD pipeline with GitOps. There’s a total of 90 nodes split across a single region, multi-availability zone deployment.
CockroachDB is used primarily as the event and snapshot store. They are using an event sourcing architecture, which means everything a player does in the game is saved in the form of events (+ regular snapshots). For this reason, their usage is write-heavy which CockroachDB is built to accommodate. Additionally, they built “projection” tables, which give them fast read access to some of the user data without having to load whole snapshots and replay events.
To keep gamers engaged, Devsisters unlocks new content at midnight (KST), and fans tune in. During these periods they often see 2x their normal traffic — up to 60,000 transactions per second at peak time — and they can support 360K concurrent users at the same time.
“From an operations perspective, we really like how it survives quietly without any human effort. We’ve never experienced this type of scale before. It has a lot of nodes compared to other products, so the hardware failure happens frequently, but it survives these failures very smoothly. If we built this game on Couchbase and a node would die, we would be screwed.” - Sungyoon Jeong, DevOps Team
The game servers team remarked that they had some key takeaways from working with CockroachDB and came to the realization that distributed databases need to be treated differently. They have different capabilities than NoSQL and legacy relational databases, with one of them being locality.
CockroachDB spreads the replicas of each piece of data across a diverse set of localities with the order determining the priority. The --locality flag accepts arbitrary key-value pairs that describe the location of the node. Locality might include region, country, availability zone, etc. The key-value pairs should be ordered into locality tiers from most inclusive to least inclusive (e.g., region before availability zone as in region=eu,zone=paris), and the keys and order of key-value pairs must be the same on all nodes.
Locality can also be used to influence the location of data replicas in various ways using replication zones. When there is high latency between nodes (e.g., cross-availability zone deployments), CockroachDB uses locality to move range leases closer to the current workload, reducing network round trips and improving read performance, also known as "follow-the-workload". It’s important to set up locality correctly from the beginning, otherwise you may face some performance issues. And given the nature of how locality works, it helps with resiliency as well.
Another piece of advice is to understand the application's usage patterns and prepare accordingly. Since this was a new game at the time, the team did not know what to anticipate and at one point CPU got dangerously high (over 70-80%). They gradually added new nodes which improved performance. Over time the usage patterns became more predictable, but for an extremely high volume of workloads, it’s beneficial to pre-split their data in advance during peak performance periods such as releases.
A last bit of guidance from the team: If you are preparing for a big launch, consider having a Customer Success team available before, during, and after. CockroachDB offers varying levels of support models and is available 24/7.