Message queuing and the database: Solving the dual write problem

Developing a modern application means developing for the cloud, with uptime, scalability, geographic distribution, and low latency at the forefront of concerns. This has led to the widespread adoption of application architectures based on event-driven microservices. Breaking the elements of an application down into microservices allows us to (for example) scale different services independently. It is simply the most efficient way to architect applications for the cloud.

However, embracing event-driven microservices also presents some challenges. With so many different services in motion at the same time, communication between them can become a challenge.

For example, what if one microservice needs to send some data or a request to another service, but the other service is busy? If these two services must wait on each other, we lose some of the efficiency of microservices architecture.

One solution to this problem is message queuing.

What is a message queue?

A message queue is essentially an intermediary storage queue that allows microservices to communicate with each other asynchronously. Message queuing allows a service to send a “message” to another service, even if the other service is not ready to receive it.

For example, in the diagram below, service 1 may have messages that it needs to send to service 2. Using a message queue, it can send them as needed and continue operating, regardless of whether service 2 is ready to receive them. These messages are then stored in the queue until service 2 retrieves them.

This makes the overall system more efficient and easier to scale. By decoupling services 1 and 2, we can enable them each to operate without having to wait on the other while still allowing them to communicate asynchronously via the intermediary message queue.

This also helps us reduce the risk of cascading failures. If our services must communicate synchronously, and one service fails, other services attempting to communicate with that service may also fail, and the services communicating with those services will then fail, and so on.

However, the use of messaging queues can lead to some sneaky problems when we want to store the same data in two places, such as the message queue and the database.

What is the dual write problem?

Sometimes, we need a service to send the same piece of data to two storage locations while ensuring consistency between them. For example, when a particular event occurs, we might want to update both the database and a message queue (or a message queuing system such as Apache Kafka) with the same information. This is called a dual write – we’re writing the same data to two different places.

But what happens if one of these two updates succeeds and the other fails? This is the dual write problem; if we’re trying to update two separate storage solutions in a distributed system without some additional measure that ensures consistency between them, eventually we will end up with an inconsistent state.

It is easy for the dual write problem to fly underneath our radar, because as long as both our database and message queue (for example) are functioning normally, no inconsistencies will arise. And when inconsistencies do arise, we may not always notice them.

In the long run, however, ignoring the dual write problem is not sustainable. Eventually, a failure, error, or outage is going to lead to an inconsistency that negatively impacts your application – and quite possibly your business.

We also asked Twitter how to define the dual write problem concisely, and got some great answers, including these:

The transactional outbox pattern

One approach to solving this problem is a design approach called the transactional outbox pattern. This approach requires a transactional database such as CockroachDB.

Here’s how it works: instead of sending the data to two separate locations, we send a single transaction that will store two separate copies of the data on the database. One copy is stored in the relevant database table, and the other copy is stored in an outbox table from which we will subsequently update the other storage location. For example, we might connect the outbox to Kafka, or to some other message queuing system.

In other words, it’s a two step process: First, we update two parts of the database (the relevant table and our transactional outbox) using a single transaction, which allows us to guarantee that either both updates commit or neither of them commits.

Then, we push the update from the transactional outbox to the message queue. If a message that’s in the outbox fails to make it to the message queue, no data or consistency is lost. Because the data is already safely stored in the database, we can simply retry.

However, this approach does require an additional service or job: we need to move the events from the database outbox to the message queue. So how do we do that?

We’ve just added a new course to Cockroach University that covers everything you need to know. It’s called Event-Driven Architecture for Java Developers and it’s completely free.

A better way?

The course covers how to build the transactional outbox pattern into your own application, but it also covers how to achieve this result even more easily using CockroachDB’s built-in Change Data Capture feature.

Don’t wait! Join Cockroach University today (it’s free) and start learning about event-driven architectures (or many other database and application development topics you may be interested in, too).