Learn how we built a serverless SQL database.Read Blog
*Note: This blog was originally published in 2017. Everything is still true today. It is being updated to include additional capabilities in CockroachDB. Namely, bounded staleness reads.
The CAP theorem is a fundamental part of the theory of distributed systems. It states that in the presence of partitions (i.e. network failures), a system cannot be both consistent and available, and must choose one of the two.
CockroachDB chooses consistency, and is therefore a CP system in the terminology of the CAP theorem (as opposed to an AP system, which favors availability over consistency). Both consistency and availability are crucial to any business, and you might wonder how you are expected to choose between such important goals.
In this post I’ll explain how CAP-Consistent systems can still be highly available, how the CAP theorem applies to CockroachDB, and why consistency is the better choice for most databases.
The CAP theorem defines availability in strict binary terms: a system is either CAP-Available or it is not. However, in the language of high availability and service level agreements, the term “availability” is also used, but it describes a continuum instead of a binary condition.
A system might guarantee that it is available 99.99% of the time (“four nines”, allowing for less than an hour of downtime per year); 100% availability is generally regarded as unrealistic; engineering for high availability requires estimating the likelihood and severity of different kinds of outages, and balancing that against the cost of mitigating them.
A CAP-Consistent system will sometimes be unavailable due to network partitions. But all systems, even CAP-Available ones, will sometimes be unavailable for all kinds of reasons. In a well-run network there is no reason to believe that the kind of partitions where CAP tradeoffs are relevant will be any more common than other kinds of outages.
Dr. Brewer has gone so far as to say that partitions in Google’s network are so rare that the Spanner database is “technically CP” but “effectively CA”. This claim muddies the waters around this already-confusing subject, but it does illustrate that the loss of availability implied by the CAP theorem matters only within a narrow margin.
Distributed systems engineering is full of tradeoffs, with tensions between a variety of concerns including consistency, availability, performance, and flexibility. The CAP theorem focuses on a single narrow tradeoff between consistency and availability, but this doesn’t cover all the causes of unavailability or solutions to unavailability.
Outages can be caused by a variety of factors that the CAP theorem doesn’t consider, such as single-node hardware failure, application bugs, or operator error. And when the system is considered as a whole, even network partitions can be handled in ways that increase availability without sacrificing consistency. Choosing CAP-Availability buys very little in effective availability, but the loss of consistency pushes a significant amount of complexity into the application code and has a high price in engineering effort.
For example, consider an application deployed across three datacenters, with client traffic load balanced across all three. If one of those datacenters is knocked offline by a network failure, client traffic directed to the offline datacenter would experience an outage whether the underlying database is CAP-Consistent or CAP-Available. When the load balancer is updated to direct traffic to the live datacenters (which can be based on automated health checks), service is restored, no matter how the underlying database handles the partition.
The only time that a CAP-Available system would be available when a CAP-Consistent one would not is when one of the datacenters can’t talk to the other replicas, but can talk to clients, and the load balancer keeps sending it traffic. By considering the deployment as a whole, high availability can be achieved without the CAP theorem’s requirement of responses from a single partitioned node.
If the increase in availability in a CAP-Available system is small, then why choose one over a CAP-Consistent one? One reason is write latency: consistent systems must coordinate between different nodes during writes to provide that consistency (and depending on the system, consistent reads may also incur higher coordination costs). Since inconsistent systems allow for the possibility of missing data, they can return responses more quickly. This may be an appropriate choice for applications where speed is more important than robustness.
CockroachDB is a CAP-Consistent (CP) system: each piece of data lives on at least three replicas, and writes require that a majority of those replicas are able to communicate with each other. For reads, one of those replicas is granted a lease, or temporary ownership of a range of data, that allows it to serve reads without communicating with the others for a few seconds. In the event that the leaseholder is partitioned away from the other replicas, it will be allowed to continue to serve reads (but not writes) until its lease expires (leases currently last 9 seconds by default), and then one of the other two replicas will get a new lease (after waiting for the first replica’s lease to expire). This ensures that the system recovers quickly from outages, maximizing availability even though it does not satisfy the CAP theorem’s all-or-nothing definition of availability.
I think it’s worth noting here that we recently introduced bounded-staleness reads which let you opt in to a higher-availability, lower-consistency mode on a case-by-case basis. Here are a couple examples of when you should consider using bounded staleness:
When you need minimally stale reads from the nearest replica without blocking on conflicting transactions. This is possible because the historical timestamp is chosen dynamically and the least stale timestamp that can be served locally without blocking is used.
You can confine the read to a single statement that meets the bounded staleness limitations.
You need higher availability than is provided by exact staleness reads. Specifically, what we mean by availability in this context is:
CockroachDB’s foundation of strong consistency is what makes it possible to offer a distributed database with the expected guarantees of a traditional non-distributed SQL database while still being a highly available system. Without consistency, application developers would have to work around surprising behavior like secondary indexes that don’t have all the data (or that have pointers to records that haven’t been replicated yet), or deal with the potential loss of data when a node fails.
For most applications, a CAP-Consistent database like CockroachDB is often the better choice, despite potentially longer latencies, because it offers a simple contract to the application developer:
We recently put together a series of livestreams discussing each component of CAP Theorem. I’ll embed them here for ease of viewing: