What to do when a transaction fails in CockroachDB

If you’re working with CockroachDB, chances are that you care about transactional consistency. CockroachDB offers ACID transactional guarantees, including serializable isolation to ensure that no matter the volume of transactions or how many transactions are being processed in parallel, each transaction is committed to the database sequentially.

These guarantees ensure that your database maintains ironclad consistency immediately, which is important for many transactional applications. (Every application has a range of business use cases that determine how consistent its database needs to be. For transactional workloads, an eventually consistent database is often not the right persistence tool).

However, CockroachDB’s strong ACID guarantees do mean that occasionally transactions will fail and will need to be retried. Let’s take a closer look at why that happens, and how retries can be accomplished.

What is a deadlock?

In a relational database, a deadlock occurs when two transactions are in contention for the same resource in the database at the same time – for example, they might be intending to update the same row at the same time.

When a deadlock occurs, one of the transactions will need to be canceled.

Deadlocks in CockroachDB

When CockroachDB encounters a deadlock, it automatically cancels one of the deadlocked transactions, allowing the other to process. The failed transaction returns a specific error code: 40001. This code indicates that the transaction was not committed but can be retried.

For example, if CockroachDB encounters a deadlock between two transactions, Transaction A and Transaction B, it might process Transaction A while canceling Transaction B and returning a 40001 error. In this scenario, Transaction B is retryable because once Transaction A has been processed, there should be no more contention and thus no deadlock.

When a transaction returns a 40001 error, retrying it is not required. However, if the transaction is not retried, it won’t be processed and committed to the database.

We recommend ensuring that transactions returning 40001 are retried, so let’s take a look at how to do that.

Automatic transaction retries in CockroachDB

In some cases, CockroachDB will retry the transaction automatically. This occurs when the operation is executed as a single SQL statement, which CockroachDB treats as an implicit transaction.

For example, if executing the statement DELETE FROM customers WHERE id = 1; failed, CockroachDB would retry it automatically.

CockroachDB will also automatically retry failed transactions that are part of batched statements, i.e. statements that are sent to the database as part of a single batch, with no requirement to return results between each statement in the batch. These statements must be executed in a single query by whatever database driver you’re using.

However, in other circumstances, transaction retries will have to be handled in the application layer. This is because there may be additional application-side processing that the database doesn’t have access to. For example, there may be business-related errors thrown by the application that require correction before a database retry. Therefore, we’ll need to include retry logic in the application.

Retrying transactions using application logic

When implementing transaction retries in your application, your first step should be to check the documentation for the ORM or database driver you are using. Many provide methods for retrying transactions, and if your ORM or driver provides such a method, you should use it. If you don’t see anything in the documentation about transaction retries, check for documentation on handling deadlocks, as it is sometimes documented that way.

If your ORM or driver of choice does not already have a method of handling transaction retries, you’ll need to write one from scratch. Precisely how you approach this will depend on what language you’re using, of course, but it will likely follow a framework similar to this one written in pseudocode:

// Pseudocode try execute_transaction catch(err) rollback if(err.code != 40001 || tries > max_tries) throw err else backoff retry

In essence, what you need your code to do is attempt to execute the transaction. If an error is returned that doesn’t have the code 40001, or if the number of transaction retries has exceeded the maximum allowable number of retries, an error should be returned. Otherwise, the system should simply wait a moment (backoff) and try the transaction again, at which point it will likely be processed and committed successfully.

Implementing transaction retries using node.js and node-postgres

To see a real-world example of this kind of implementation in action, here’s transaction retry logic being created for a node.js app using the node-postgres driver. This video comes from our Getting Started with node-postgres and CockroachDB course, which is freely available on Cockroach University.