Almost a year ago, we wrote about our use of Jepsen in testing CockroachDB. As we prepare for CockroachDB 1.0, we wanted to get independent verification of our findings, so last fall we hired Kyle Kingsbury, the author of Jepsen, to review our tests and add more of his own. Last week, Kyle published his results.
Kyle’s testing found two new bugs: one in CockroachDB’s timestamp cache which could allow inconsistencies whenever two transactions are assigned the same timestamp, and one in which certain transactions could be applied twice due to internal retries. Both of these issues were fixed (the first in September and the second in October), and the expanded Jepsen test suite is now a part of our nightly test runs. Now we can say with more confidence that CockroachDB meets its guarantee of serializability (the highest ANSI SQL isolation level) for all SQL transactions.
CockroachDB’s original Jepsen test suite included four tests: register and bank, which were adapted from Jepsen tests for other databases (the bank test had previously been used to find serializability failures in MariaDB with Galera and Percona XtraDB Cluster), and monotonic and sets, which were developed specifically for CockroachDB.
To these, Kyle added three more: G2, sequential, and comments. The G2 test looks for (but did not find) a specific hypothesized anomaly called “anti-dependency cycles”. The sequential test shows that with an additional constraint (clients are “sticky” to a particular node), we provide sequential consistency (which is a stronger claim than our normal guarantee of serializability).
The comments test is different from the rest, in that it was expected to fail (it requires a database that is linearizable instead of merely serializable). This helps verify that the testing methodology is able to detect the subtle differences between consistency levels. Kyle also added several new nemesis modes to inject adverse events into the test, such as clock changes or network failures.
The register, bank, G2, and sequential tests passed under all tested configurations, verifying CockroachDB’s strong serializability claims, and the comments test failed as expected.
The monotonic test (described in detail in section 2.6 of Kyle’s report) found a serializability violation when two transactions were assigned the same timestamp. This may seem unlikely since CockroachDB uses timestamps with nanosecond precision, but our use of hybrid logical clocks for timekeeping can cause multiple servers to use exactly the same timestamps, especially when the system clocks are jumping around. The cause of this violation was a bug in the timestamp cache, which was fixed in September.
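To make the timestamp collision concrete, here is a minimal sketch of a textbook hybrid logical clock. It is an illustration only, not CockroachDB’s implementation: the `Timestamp` and `HybridLogicalClock` classes, the `demo_collision` helper, and the clock readings of 1000 ns and 900 ns are all hypothetical. It shows how a node that adopts a peer’s wall time can end up issuing exactly the same timestamp as that peer.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Timestamp:
    wall: int     # physical component (nanoseconds)
    logical: int  # counter that orders events sharing one wall time


class HybridLogicalClock:
    """Textbook HLC; an illustration, not CockroachDB's implementation."""

    def __init__(self, physical_clock):
        self._physical = physical_clock  # callable returning nanoseconds
        self._last = Timestamp(0, 0)

    def now(self):
        """Timestamp a local or send event."""
        pt = self._physical()
        if pt > self._last.wall:
            self._last = Timestamp(pt, 0)
        else:
            # Physical clock did not advance: bump the logical counter.
            self._last = Timestamp(self._last.wall, self._last.logical + 1)
        return self._last

    def update(self, remote):
        """Timestamp a receive event, merging in the sender's timestamp."""
        pt = self._physical()
        last = self._last
        if pt > last.wall and pt > remote.wall:
            self._last = Timestamp(pt, 0)
        elif remote.wall > last.wall:
            self._last = Timestamp(remote.wall, remote.logical + 1)
        elif last.wall > remote.wall:
            self._last = Timestamp(last.wall, last.logical + 1)
        else:
            self._last = Timestamp(
                last.wall, max(last.logical, remote.logical) + 1
            )
        return self._last


def demo_collision():
    # Hypothetical scenario: node A's system clock reads 1000 ns and then
    # stalls, while node B's clock lags behind at 900 ns.
    node_a = HybridLogicalClock(lambda: 1000)
    node_b = HybridLogicalClock(lambda: 900)

    sent = node_a.now()         # A stamps a message: (1000, 0)
    ts_b = node_b.update(sent)  # B adopts A's wall time: (1000, 1)
    ts_a = node_a.now()         # A's clock has not advanced: (1000, 1)
    return ts_a, ts_b
```

In this scenario both nodes hand out the timestamp (1000, 1), even though the wall-clock component has nanosecond precision. Any logic that assumes timestamps are unique across nodes has to handle this case.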
The sets test (described in section 2.7) revealed a bug when it was refactored to use a single auto-committed INSERT statement instead of an explicit multi-statement transaction ending in COMMIT (using a table with no explicit primary key). Single statements can be retried by the server after certain errors (multi-statement transactions generally cannot, because the results of previous statements have already been returned to the client). In this case, the statement was attempted once, timed out due to a network failure, then retried, and both attempts eventually succeeded (this was possible because the table had an auto-generated primary key, so the two insertion attempts were not using the same PK values).
To fix this, since beta-20161027 we recognize certain types of errors as “ambiguous” and avoid retries when they might result in a statement executing twice. It is unfortunate that we must sometimes return this “result is ambiguous” error to the client, but ambiguity is inevitable in any protocol that does not include transaction ids or other mechanisms to query the outcome of a previous operation.
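The difference between a blind retry and surfacing the ambiguity can be sketched with a small in-memory model. Everything here is hypothetical and only stands in for the behavior described above: `FakeTable` plays the role of a table with an auto-generated primary key, `drop_response` simulates a network failure after the write has been applied, and `AmbiguousResultError` stands in for the “result is ambiguous” error.

```python
import itertools


class AmbiguousResultError(Exception):
    """The statement may or may not have been applied."""


class FakeTable:
    """In-memory stand-in for a table with an auto-generated primary key."""

    def __init__(self):
        self.rows = {}
        self._next_rowid = itertools.count(1)

    def insert(self, value, drop_response=False):
        # The write is applied first...
        self.rows[next(self._next_rowid)] = value
        # ...and only then is the response lost on the way back.
        if drop_response:
            raise TimeoutError("response lost")


def insert_with_blind_retry(table, value, lose_first_response):
    """Retry on timeout without knowing whether the first attempt applied."""
    try:
        table.insert(value, drop_response=lose_first_response)
    except TimeoutError:
        # The retry gets a fresh row ID, so nothing rejects the duplicate.
        table.insert(value)


def insert_surfacing_ambiguity(table, value, lose_first_response):
    """Report the ambiguity to the caller instead of retrying."""
    try:
        table.insert(value, drop_response=lose_first_response)
    except TimeoutError as err:
        raise AmbiguousResultError("result is ambiguous") from err


def demo():
    retried = FakeTable()
    insert_with_blind_retry(retried, "x", lose_first_response=True)

    surfaced = FakeTable()
    try:
        insert_surfacing_ambiguity(surfaced, "x", lose_first_response=True)
        saw_ambiguous = False
    except AmbiguousResultError:
        saw_ambiguous = True

    return len(retried.rows), len(surfaced.rows), saw_ambiguous
```

With a blind retry the value is stored twice under two different generated keys. When the ambiguity is surfaced, the value is stored once and the caller is told that the outcome is unknown, so it can decide how to check before trying again.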
This testing also confirmed that loose clock synchronization is necessary for CockroachDB to guarantee serializability. CockroachDB servers monitor the clock offset between them and will attempt to abort rather than return results that may be inconsistent, although this monitoring is imperfect and may not be able to react quickly enough to large clock jumps.
All CockroachDB deployments should use NTP to synchronize their system clocks. The amount of clock offset that can be tolerated is configurable. We’ve found the default of 500ms (increased since Kyle’s testing, when it was 250ms) to be reasonable in most environments, including virtualized cloud platforms. Environments with very reliable clocks may be able to reduce this setting to improve performance in some cases.
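As a rough illustration of what a configurable offset tolerance means, the sketch below checks measured peer offsets against a maximum. This is a simplification and not CockroachDB’s actual monitoring logic: the function names, the dictionary of peer offsets, and the “abort if any peer is out of tolerance” rule are assumptions. Only the 500ms default comes from the text above.

```python
MAX_OFFSET_NS = 500_000_000  # the 500ms default mentioned above


def peers_outside_tolerance(peer_offsets_ns, max_offset_ns=MAX_OFFSET_NS):
    """Return the peers whose measured clock offset exceeds the tolerance."""
    return sorted(
        peer
        for peer, offset in peer_offsets_ns.items()
        if abs(offset) > max_offset_ns
    )


def should_abort(peer_offsets_ns, max_offset_ns=MAX_OFFSET_NS):
    # Assumption: treat any out-of-tolerance peer as a reason to stop
    # serving. A real system may apply a more nuanced rule.
    return bool(peers_outside_tolerance(peer_offsets_ns, max_offset_ns))
```

For example, with measured offsets of 120ms to one peer and -650ms to another, only the second peer falls outside the default tolerance. A check like this can only react to offsets it has already measured, which is why a large, sudden clock jump may go unnoticed for a short time.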
One of the most eye-catching lines in Kyle’s report concerns performance: “For instance, on a cluster of five m3.large nodes, an even mixture of processes performing single-row inserts and selects over a few hundred rows pushed ~40 inserts and ~20 reads per second (steady-state).”
These, of course, are pitiful numbers for a database. However, it’s important to note that these numbers are not representative of what real-world applications can expect to see from CockroachDB.
First, this is a test in which the Jepsen nemesis is continuously tweaking the network configuration to cause (and heal) partitions between different nodes. You don’t commonly see behavior like this in a real-world deployment.
Second, tests like these deliberately try to reach high levels of contention, in which many transactions are trying to operate on the same keys. In real-world situations contention tends to be much lower as each user is operating mainly on their own data. The performance of all transactional systems suffers under high contention (although CockroachDB’s optimistic concurrency model fares worse than most), and applications that anticipate high contention on part of their data (such as incrementing the “like” counter on a post that’s gone viral) should consider this in their schema and application design.
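One common way to design around a hot counter is to shard it, so that concurrent increments usually touch different rows. The sketch below is an in-memory model of that idea under stated assumptions: `ShardedCounter`, its shard count of 8, and the random shard choice are hypothetical, and each list slot stands in for a separate row in a table. It is one possible design, not a recommendation taken from Kyle’s report.

```python
import random


class ShardedCounter:
    """Spread increments over several slots so writers rarely collide."""

    def __init__(self, num_shards=8, rng=None):
        # Each slot models one row; a real schema might key rows by
        # (post_id, shard).
        self.shards = [0] * num_shards
        self._rng = rng or random.Random()

    def increment(self):
        # Pick a shard at random: two concurrent writers hit the same
        # row only about 1/num_shards of the time.
        shard = self._rng.randrange(len(self.shards))
        self.shards[shard] += 1

    def total(self):
        # Reads pay the cost instead: they must sum every shard.
        return sum(self.shards)
```

The trade-off is that reads become more expensive, since the total is the sum over all shards, in exchange for spreading writes across more keys.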
The development philosophy of Cockroach Labs has always been correctness and stability first, then performance. To that end, we’re focusing our attention now on performance, including both the kinds of high-contention transactions seen here and more typical low-contention scenarios, ahead of our 1.0 launch later this spring. We’ll have much more to say about performance in an upcoming blog post.