CockroachDB is 10x more scalable than Amazon Aurora for OLTP workloads

The three design principles of CockroachDB are correctness, stability, and performance. Having achieved our correctness and stability goals with CockroachDB 1.0 and 1.1, we focused heavily on performance with CockroachDB 2.0. For more information on which benchmarks matter, or to see a comparison between CockroachDB 1.1 and 2.0, you can read CockroachDB 2.0 Makes Significant Strides.

Today we are releasing a comprehensive whitepaper that demonstrates how CockroachDB achieves high OLTP performance of over 128,000 tpmC on a TPC-C dataset over 2 terabytes in size. This OLTP performance is over 10x more TPC-C throughput than Amazon Aurora, in a 3x replicated deployment with single-digit seconds recovery time and zero-downtime migrations and upgrades. This far surpasses a typical active-passive database deployment with manual failure recovery. CockroachDB achieves this in serializable isolation, unlike competing databases that sacrifice isolation for performance.

We’re very excited to talk about performance, and have taken care to make this more than just an announcement with vague numbers. In keeping with our open source philosophy, our whitepaper contains a step-by-step reproduction of instructions to verify all our performance claims, as well as context on our benchmarking philosophy and practices. We don’t just want you to take our word for our performance numbers, we’d like to arm you with all the tools you need to check it out for yourself! In this post, we’d like to cover some brief highlights, but do check out the whitepaper for more details and a fully reproducible test script.

Benchmarking Scenario

We’ve decided to start with an unofficial TPC-C benchmark as our first benchmark. The TPC-C is a specification for a set of load that simulates the backend data processing requirements of a large retailer. Unlike NoSQL benchmarks such as YCSB, which involve a single logical table of key-values, no transactions, and only two queries, TPC-C exercises a richer set of SQL semantics with foreign keys, multiple indexes, transaction rollbacks, and joins. It also stresses the underlying storage capabilities of the database.

DISCLAIMER: this benchmark script was not validated and certified by the Transaction Processing Council. The results obtained are not official TPC-C results, and the results are not comparable with any official TPC-C results. Instead, we only compare our results to other unofficial TPC-C results by competing vendors such as Amazon Aurora. We have also made one modification to the TPC-C specification: we run every transaction in serializable mode as opposed to degrading the isolation guarantees selectively. As we show in this benchmark, one can achieve high performance without compromising safety, maintainability, and correctness.

Benchmarking Setup

We benchmarked two scenarios: a 3 node cluster on the TPC-C 80GB (1,000 warehouse) dataset, and a 30 node cluster on the TPC-C 800GB (10,000 warehouse) dataset.

We compare our unofficial TPC-C results to Amazon Aurora RDS unofficial TPC-C results from AWS re:Invent 2017. We also used Aurora’s SIGMOD 2017 paper for additional information as to their test setup and load generator.

Small clusters

We first set up a 3 node CockroachDB cluster on Google Compute Engine. Three nodes is the minimum for proper fault tolerance in CockroachDB, and a good initial setup. We use 3 n1-highcpu-16 VMs and loaded it with 1,000 warehouses of data. This translates to 80GB of data replicated 3 ways, which results in 200GB of data stored across the cluster after compression.

Large Clusters

We then set up a 30 node CockroachDB cluster on Google Compute Engine, using the same n1-highcpu-16 VMs, loaded 10,000 warehouses of data (a 2TB replicated dataset).

Database CRDB 2.0 Amazon Aurora RDS MySQL
Throughput (1,000 warehouses) 12,819 tpmC 12,582 tpm
Median latency (1,000) 88.1ms Not reported
95th percentile latency (1,000) 151.0ms Not reported
Hardware (1,000) 3x Google Compute Engine n1-highcpu-16 with attached Local SSDs 2x r3.8XL (One read replica with automatic failover)
Database CRDB 2.0 Amazon Aurora RDS MySQL
Throughput (10,000 warehouses) 128,587 tpmC 9406 tpmC
Median latency (10,000) 88ms Not reported
95th percentile latency (10,000) 176ms Not reported
Hardware (10,000) 30x Google Compute Engine n1-highcpu-16 with attached Local SSDs r3.8XL (MySQL RDS)
Database CRDB 2.0 Amazon Aurora RDS MySQL
Availability Can tolerate failure of any node. Can tolerate failure of a master and automatically promote read replicas to master.
RPO Zero (data is replicated 3x across the cluster) Zero (underlying elastic block storage is 6x replicated)
Default Transaction Isolation Level Serializability Repeatable read

Takeaways

10x more throughput than Amazon Aurora: Our highest reported tpmC of 128,587 is 10x that of Aurora’s 12,582 tpmC.

Linear Scalability: CockroachDB scales linearly in performance: our 10,000 warehouse uses 10x the nodes as our 1,000 warehouse cluster, providing 10x the throughput. Combined with our zero-downtime cluster migrations and node additions, your operational costs will scale with your business.

Performance under strong isolation We achieve this performance in serializable mode, the strongest isolation mode in the SQL standard. Unlike Aurora and other databases that selectively degrade isolation for performance, we show that this is not a choice you have to make. CockroachDB gives you both correctness and performance, without sacrificing one or the other.

Don’t just take our word for it, reproduce these numbers yourself! We aren’t just publishing these numbers today. Complete step-by-step reproduction instructions are in our whitepaper. Our database and all our tooling is open-source, so you can run these benchmarks yourself, and then scrutinize the code to ensure we haven’t missed anything. And if we have, we’d greatly appreciate your bug report or your pull request!

Illustration by Dalbert B. Vilarino

Benchmarking CockroachDB 2.0: Read the Performance Report now!

Get the report