How to talk about consistency and isolation in distributed DBs

Last edited on February 11, 2022

0 minute read

    The SQL standard defines four isolation levels (READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE). SNAPSHOT is a somewhat popular but non-standard addition to this list. People often reach for these words when they want to talk about behavior in CockroachDB in comparison to other databases, even when the behavior in question is not related to isolation.

    For example, the fact that reads in CockroachDB block on writes is often attributed to our use of the SERIALIZABLE isolation level, and offering another isolation level (READ COMMITTED or SNAPSHOT) is proposed as a solution. This blocking is actually caused by our consistency rather than our isolation, but consistency in a distributed database is a complex topic and the vocabulary for it is poorly defined.

    In this post I will define a taxonomy for consistency levels, building primarily on a post by Daniel Abadi. While Abadi’s terminology isn’t completely ideal, I’ll be sticking with his terms as much as possible to avoid introducing yet more new words here.

    Consistency and isolation: One of eachCopy Icon

    My main goal in this post is to introduce the idea of consistency levels as a first-class concept, mostly orthogonal to isolation levels. Jepsen’s consistency models chart comes close to this: entries on the left are isolation levels (which Jepsen calls “transactional” and “multi-object” properties if you click through one of them), and entries on the right are consistency levels (which Jepsen calls “single-object” properties).

    The Jepsen chart singles out one combination – serializable isolation and linearizable consistency – and gives it a name, “strict serializability”. This obscures the fact that every database must have both an isolation and a consistency level, whether the combination is named or not.

    If serializable plus linearizable equals “strict serializable”, what about other combinations? Are there “strict” versions of repeatable read, read committed, etc? There sort of are, but we don’t have names for them, because in non-distributed databases, linearizable consistency is the unspoken norm. It wasn’t until the advent of distributed databases that it became potentially beneficial to use consistency levels weaker than linearizable.

    It is here that the terminology took a confusing evolutionary step: compared to the SQL standard, which only defines serializability and doesn’t mention consistency, linearizability is an added consideration, so the combination is “stricter” than serializability alone.

    However, in practice “strict serializability” isn’t a new thing; existing monolithic databases that implemented serializability did so “strictly”. The real novel development is the introduction of non-strict serializability (as seen in CockroachDB), but it didn’t get its own new term and we’re left calling it simply “serializable”.

    Composite consistency levelsCopy Icon

    So if CockroachDB’s consistency level is not linearizable, what is it? When you look at the worst-case scenario, we provide sequential consistency. But that doesn’t tell the whole story – we make a point of providing linearizability for a subset of transactions (specifically for transactions that touch any overlapping keys), a property we usually call single-key linearizability. This is very important – it allows us to summarize the user-visible consistency behavior of CRDB as “no stale reads”, and it’s also where much of the consistency-related costs come from.

    Daniel Abadi calls this model STRONG PARTITION SERIALIZABLE, which, like “strict serializable” is a combination of an isolation level and a consistency level. But this consistency level is itself complex: linearizable for some interactions and sequential for others. This suggests generalization in several dimensions: STRONG PARTITION variants of other isolation levels (Early versions of CockroachDB implemented SNAPSHOT isolation, although it was later removed. This would have been considered “strong partition snapshot” rather than “strict snapshot” in this scheme), and PARTITION consistency levels with different combinations (e.g. with the out-of-partition consistency reduced from sequential to causal). To be explicit, we could call CockroachDB’s current model “isolation level serializable, consistency level partition (linearizable, sequential)”.

    Remember when I said that monolithic databases are linearizable by default? With asynchronous replication, that’s no longer true for the system as a whole. If read-only transactions can be directed to follower replicas, the system has a composite consistency level: linearizable for read/write transactions, and sequential for read-only. Abadi calls this STRONG WRITE SERIALIZABILITY (when combined with the serializable isolation level). In contrast to the PARTITION consistency levels, WRITE consistency involves a choice under the developer’s control on a transaction-by-transaction basis. (Read-only queries can be sent to the primary replica for linearizability, or to a secondary replica for weaker consistency but better performance. In PARTITION consistency levels, the data layout is generally determined in advance so the developer cannot choose at query time whether a distributed or local transaction will be used).

    Reevaluating the comparison matrixCopy Icon

    Abadi’s post ends with a comparison matrix that compares serializability combined with various consistency levels with other isolation levels with unspecified consistency. Without pinning down a consistency level, the “time travel anomalies” are indeed possible in these isolation levels.

    However, there must be some consistency level involved, and in general the lower isolation levels may be (and often are) combined with a consistency level that prohibits these anomalies. Unfortunately the space of isolation and consistency levels is too large for a legible comparison matrix, but we can talk about some individual cases.

    In a non-distributed SQL database, READ COMMITTED will generally be implemented with linearizable consistency – there is no reason to read anything older than the latest committed value, and no way in which any not-yet-committed transaction could be construed as coming before the read. This therefore prevents the stale read anomaly. And while I’m having a hard time coming up with any formal basis for this claim, I’ve never seen a non-distributed DB that permitted the immortal write anomaly. Causal reverse is trickier – I think this anomaly is possible in all sub-serializable isolation levels, regardless of consistency level. Adding asynchronous replication to the mix changes the consistency level to “STRONG WRITE”, and permits the stale read while continuing to prevent immortal writes.

    A new comparison matrixCopy Icon

    Here’s an attempt at a comparison matrix that focuses on isolation levels implemented in PostgreSQL and CockroachDB. PostgreSQL is provided as an example of a monolithic database with asynchronous replication.

    In the charts below, 😊 indicates that the anomaly is not possible (so you don’t need to worry about it), while ❗ indicates that the anomaly is possible.

    PostgreSQLCopy Icon

    Replicated PostgreSQLCopy Icon

    CockroachDBCopy Icon

    distributed SQL