Bitski reduces read latencies to 0.5 milliseconds using a single database spanning multiple data centers, regions, and clouds
0.5 millisecond read latency
1 database distributed across multiple data centers
1 FTE time saved
“CockroachDB has handled everything we’ve thrown at it.”
- Patrick Tescher, CTO
Virtual world video games are not just games. They are places where people hang out, socialize, and spend real money. And the digital goods that are bought or earned in virtual worlds are valuable assets in the real world. They can be transferred, traded or sold on digital platforms. In order to provide a unique virtual world economy, and a currency that people can trust, some gaming platforms have pushed their economic model onto the blockchain.
Bitski uses CockroachDB and blockchain technology to provide the infrastructure for users who need a secure digital wallet, and for companies that need to mint assets and process transactions. In this article you’ll learn what technical challenges caused Bitski to migrate from Postgres to CockroachDB; how Bitski uses CockroachDB for their blockchain transactions, authentication, user data, application data, and analytics; and why a multi-region/hybrid-cloud deployment helps Bitski secure data better than their competitors.
In 2018 Bitski engineers began building their infrastructure with Postgres. They knew of the limitations but they wanted to start building an MVP.
While developing their platform Bitski’s identity started to take shape: they are building infrastructure to support currency and, for that reason, they function much the same as a bank. If banks can’t have downtime then neither can Bitski. And any asset stored in Bitski should feel as secure as it does in a bank. To perform like a bank, Bitski needed always-on availability, and they wanted multi-region capabilities for performance reasons. Both were difficult to accomplish with Postgres.
In the early days of MVP development on Postgres there were a few occasions in which a server died and it took a while to get the database back up. The downtime was only around five minutes. But five minutes can be a devastatingly long period of time in the wrong circumstances.
For example, some of Bitski’s current customers run limited-release sales (called “Drops”): very brief sales in which a lot of transactions occur in a short period of time. If unexpected downtime occurred during one of these sales it would mean missed revenue. At a small startup, downtime outside a maintenance window usually means that you’re on your own to solve the problem: “What should I do? Should I reboot the server? Will that make it worse? Is there some corruption issue with the disks? Do I need to recover from a backup?”
There were times in the early days when Bitski needed to schedule downtime to run an upgrade, and the engineers recall wondering whether the database was going to come back up. Even small upgrades in Postgres require downtime: if Kubernetes needs an upgrade, that’s downtime; if the server needs an upgrade, that’s downtime; if the server needs to be physically moved, that’s downtime. During downtime users cannot log in and access their digital wallets, which means Bitski’s partners would lose revenue opportunities. This was not an acceptable risk to take. Bitski started looking for a high-availability database partner.
“When we found CockroachDB we were really happy that it could do multi-master automatic failover and that we’d never need to schedule downtime.”
- Patrick Tescher, CTO
When the Bitski engineers decided that they needed to find an alternative to Postgres they created the following requirements:
- Multi-master automatic failover
- Postgres compatibility
- Private server storage
- Kubernetes compatibility
- Excellent monitoring
- Performance requirements: reads have to be under 10ms
CockroachDB serves as the general purpose database for Bitski. One of the most important workloads that it handles is the login experience. Logins should happen in under one second in order not to lose users before they even get onto the platform.
There are a lot of queries happening in the login flow, so low write latency from the database is important: the entire login experience needs to happen in 100ms or less. 100ms is the cutoff for something to feel ‘instantaneous.’ This is often called the 100ms rule.
Bitski uses Hydra for authentication, which supports CockroachDB natively. Hydra gives Bitski an access token, which gets queried a lot. When Bitski was setting up their second data center, read queries were too slow. They filed a support ticket with the Cockroach Labs Support Team and got help setting up a secondary index, which lowered read query latency from 70 milliseconds to 0.5 milliseconds.
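To illustrate why a secondary index helps a lookup like this, here is a minimal sketch. It uses Python's built-in SQLite module as a stand-in, not CockroachDB, and the table name `access_tokens`, its columns, and the index name are hypothetical; Hydra's actual schema and the index Bitski created are not described in this article. The sketch compares the query plan for a token lookup before and after an index is added on the column being searched.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical token table: looked up by signature, not by primary key.
    conn.execute(
        "CREATE TABLE access_tokens ("
        " id INTEGER PRIMARY KEY,"
        " signature TEXT NOT NULL,"
        " subject TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO access_tokens (signature, subject) VALUES (?, ?)",
        [(f"sig-{i}", f"user-{i % 100}") for i in range(10_000)],
    )

    lookup = "SELECT subject FROM access_tokens WHERE signature = ?"

    # Without an index on `signature`, every row has to be scanned.
    before = plan(conn, lookup, ("sig-42",))

    # A secondary index lets the planner jump straight to matching rows.
    conn.execute(
        "CREATE INDEX access_tokens_signature_idx ON access_tokens (signature)"
    )
    after = plan(conn, lookup, ("sig-42",))
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after: ", after)
```

The plan changes from a full table scan to a search that uses the index. The same principle applies in any SQL database, although the exact latency gain depends on table size, data distribution, and where the data lives.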
The banking industry standard for securing private keys is to store them in encrypted databases. But there have been a few leaks of private keys recently which is particularly concerning because it would be hard to know if someone had your key. And if they did, they could impersonate you indefinitely.
Bitski wanted to choose the most secure set-up imaginable for private keys, which is how they landed on hardware security modules (HSMs). In an HSM, the keys never leave the hardware. With an HSM, Bitski and its customers can avoid problems such as people getting hacked, seed phrases getting compromised, and hardware keys or laptops (with crypto assets on them) getting lost. With an HSM you don’t even have to remember your password. So it’s safer, and easier for the end user.
The barrier to entry for hardware security modules is high because they are difficult to set up. People can boot something up on Amazon, but Amazon limits the available feature set and quickly becomes cost prohibitive. The advantage of hosting their own HSMs is that it allows Bitski to generate billions of keys per HSM, while the Amazon HSMs only allow for a few hundred.
Setting HSMs up properly requires using a database that gives you hybrid-cloud capabilities. The fact that Bitski can deploy CockroachDB in multiple clouds and deploy it on their own hosted hardware makes it possible to use HSMs. This is because Bitski applications need to have access to the HSMs in their datacenters, but also need to be able to run workloads in AWS.
Other companies have to make compromises to make up for their lack of security. This usually means asking users to set up their own encryption password so that the company doesn’t have to store it. But if the user loses that password then they lose access to their data. This will never happen to any of Bitski’s customers, which gives Bitski a competitive advantage.
“A leaked private key is basically ‘game over’ for a startup. There’s no recovering from that.”
- Patrick Tescher, CTO
Bitski has their own tracing system that they use to trace requests through their whole stack and to optimize performance. CockroachDB is able to export its traces to that system (using Jaeger).
In one system Bitski can see every little piece of software in their entire stack and how long each piece takes. They look at where they wrote to the disk and where they read from memory. “Being able to see all that detail in one system really helps us fine tune our performance and helps us focus our efforts. No other database that we looked at can do this,” says Tescher.
Authentication and permissions are set up with Kubernetes. And because security is an important focus, it’s useful to have one stack that can handle permissions and to have CockroachDB run seamlessly inside that stack. Bitski’s use of Kubernetes is, at this stage, fairly simple: they run it with host access to make it easy to get Docker images running. They just pin Docker images to one server, which works well with CockroachDB and does not work well with Postgres because if a server dies you have to unpin it and figure it out.
One issue with blockchain, and particularly Ethereum, is that every transaction that a user does needs an auto-incrementing lock. With global applications it’s hard to get a global, auto-incrementing number because most systems that generate these locks either run client-side (which sometimes works) or server-side, which is limited to one server. Because of CockroachDB’s distributed architecture Bitski can use CockroachDB to generate the auto-incrementing locks, and it works really well. “We were thinking that we would have to use etcd and set up global replicated etcd, and we were like, ‘No, we can just use the database we already have,’” says CTO Patrick Tescher.
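The idea of letting the database hand out auto-incrementing numbers can be sketched as a counter row that is read and incremented inside a single transaction. This is an illustrative sketch only: it uses Python's built-in SQLite module as a stand-in for CockroachDB, and the `counters` table and the `open_counter_store` and `next_value` helpers are hypothetical names, not Bitski's actual schema or code. The point is that when the read and the increment happen in one transaction, two callers can never receive the same number.

```python
import sqlite3


def open_counter_store(path=":memory:"):
    """Open a connection and create the (hypothetical) counters table."""
    # isolation_level=None puts the connection in autocommit mode so that
    # transactions can be started and ended explicitly below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counters ("
        " account TEXT PRIMARY KEY,"
        " next_value INTEGER NOT NULL)"
    )
    return conn


def next_value(conn, account):
    """Atomically reserve and return the next number for `account`."""
    # BEGIN IMMEDIATE takes the write lock up front, so the read and the
    # increment below cannot interleave with another writer.
    conn.execute("BEGIN IMMEDIATE")
    try:
        # Create the counter at 0 the first time an account is seen.
        conn.execute(
            "INSERT OR IGNORE INTO counters (account, next_value) VALUES (?, 0)",
            (account,),
        )
        (value,) = conn.execute(
            "SELECT next_value FROM counters WHERE account = ?", (account,)
        ).fetchone()
        conn.execute(
            "UPDATE counters SET next_value = next_value + 1 WHERE account = ?",
            (account,),
        )
        conn.execute("COMMIT")
        return value
    except Exception:
        conn.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    conn = open_counter_store()
    print([next_value(conn, "wallet-a") for _ in range(3)])
    print(next_value(conn, "wallet-b"))
```

Each account gets its own independent sequence starting at 0. In a distributed SQL database the same pattern would rely on the database's transaction isolation rather than a local file lock, and callers would typically need to retry a transaction that is aborted due to contention.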
Storing JSON columns in Cockroach means that most of Bitski’s systems are compatible with CockroachDB. Bitski has not come across any situation where they needed a different datastore because CockroachDB can handle all data types. And all the tools Bitski used with Postgres are also compatible with CockroachDB.
Bitski currently has CockroachDB deployed in data centers in the east, west, and central regions of the United States.
“Keeping data close to users drops latency a lot. Anything we can do to reduce latency is important. Some of our services make tons of back and forth requests so even a small improvement makes a big difference for performance and user experience. And being able to pin data to a region and really control where data is, is huge, not just for performance reasons, but also for security reasons and compliance reasons.”
- Patrick Tescher, CTO
Soon, Bitski will add data centers in Asia to reduce latency for the users located in that area, and to comply with data storage regulations. With Postgres, adding a new datacenter is operationally complex. Bitski would have had to put read replicas in each region, and some requests would have gone to those replicas, which would have been out of date. Bitski would therefore have experienced inconsistencies, as well as higher latencies for requests that had to go to the main region. The process would also have meant potential downtime during maintenance windows.
“CockroachDB allows us to just say, ‘Okay let’s launch a new datacenter and launch some nodes and connect them,’ and everything just works. And we don’t have maintenance windows with CockroachDB. Even if we need to upgrade a whole region we just route traffic to another region, do the upgrade and then bring it back.”
- Patrick Tescher, CTO
In the Primary-Secondary architecture of traditional databases like Postgres a whole project has to be scheduled around a failover from one region to another. The engineers would have to get signoff and then they’d need to execute in a sandbox environment.
With Cockroach, when Bitski needs to do an upgrade, they just do it. “A really small change with CockroachDB would have been a really big change with a traditional Postgres setup. For a small startup, devoting a week of planning to something for an upgrade is a week that we can’t build other features,” says CTO Patrick Tescher. “Between CockroachDB support, and the ease of using the database, we’ve basically eliminated an entire engineering role.”
With CockroachDB and blockchain Bitski has built the most secure and easy-to-use digital wallet in their industry. At the same time, the engineers have built an architecture that requires no downtime and no maintenance windows while enhancing scalability, availability, and insight into application performance. No other database could have allowed Bitski to accomplish both at the same time.
The primary challenge that Bitski faces now is overcoming the incumbent way of thinking about virtual world currencies, and the stigma that blockchain is an overly complicated tech fad.
“Most people don’t know how Netflix’s streaming services work. But that doesn’t stop them from watching. They just enjoy the outcomes and experiences that the blockchain enables. Our products provide the connection layer to the blockchain to enable these new experiences, and will negate the lack of familiarity people have about how blockchain works.”
- Naveen Molloy, COO
Soon Bitski will add data centers in Southeast Asia and Europe. They’ll use CockroachDB’s geo-partitioning feature to pin data to these locations, which will reduce latency in those regions and deliver the quality experiences that will help them grow.