Distributed BACKUP and RESTORE added to free CockroachDB Core

We are constantly looking for ways to help small teams make a big impact with CockroachDB. We heard from our community that DUMP, the free, single-machine Disaster Recovery feature we provided in our open source option, CockroachDB Core, didn’t quite go far enough to support the types of data-intensive apps that startups were building in 2020.

So in CockroachDB v20.2, basic distributed BACKUPs, along with the entire suite of RESTORE functionality, are now a part of our free, open source option CockroachDB Core, to provide reliable, valuable, usable Disaster Recovery for all CockroachDB customers and applications.

The Evolution of BACKUP and RESTORE in CockroachDB

When we detailed the decision to move to an Open Core model with a commercial license in this 2017 blog post, our founders Peter, Ben and Spencer said:

“Features necessary for a startup to succeed will be … part of the open core [CockroachDB Core]; a feature which is primarily useful only to an already successful company will be CCL, and part of the [CockroachDB] enterprise product”

[Editor’s note: We’ve since moved from the APL to BSL]

They also detailed the decision for making Backup and Restore a part of the Enterprise offering:

“The first [of the Enterprise features] is a fully-distributed, incremental capability for quickly and consistently backing up and restoring large databases using configurable storage sinks (e.g. S3 or GCS). The same functionality, but non-distributed, will be available for free to all users.”

That was written three years ago, and more recently, we began to hear from CockroachDB Core users that DUMP, the free, single-machine Disaster Recovery feature we provided in CockroachDB Core, didn’t quite go far enough to support the types of data-intensive apps that startups were building in 2020.

“But Michael, CockroachDB is designed to deliver bulletproof resilience. Why do people still need backup and recovery features?”

BACKUP and RESTORE: Because Even the Safest Boats Carry Life Jackets

CockroachDB is, in fact, designed to deliver bulletproof resilience. Because it is built with fault tolerance in mind, isolated issues like small-scale node failures don’t require manual intervention or complex scripting. But even the world’s safest boat needs to carry life jackets, and the same principle applies to your data. This is why CockroachDB includes built-in distributed backup and recovery functionality, which helps customers:

  • Recover from mistakes (such as a forgotten `WHERE` clause changing everyone’s name to “test”)
  • Stay in regulatory compliance through backup archival
  • Add an extra layer of protection for their customers’ data

Prior to v20.2, CockroachDB offered two different methods for backing up data: BACKUP for CockroachDB Self-hosted and Cockroach Cloud users and DUMP for CockroachDB Core users. BACKUP is distributed, can write to a number of different storage options, and captures native binary data with very high reliability and reproducibility. BACKUP is paired with RESTORE, which is also distributed and restores data from files created by BACKUP.
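As a rough sketch of the pair in action (the storage URL and path here are placeholders for illustration, not taken from this post), a v20.2-era full-cluster backup and restore look like:

```sql
-- Back up the whole cluster to cloud storage (placeholder bucket).
-- AS OF SYSTEM TIME backs up a slightly stale but consistent snapshot,
-- reducing contention with in-flight transactions.
BACKUP TO 's3://my-backup-bucket/2020-11-19?AUTH=implicit'
    AS OF SYSTEM TIME '-10s';

-- Later, on a new (empty) cluster, restore from those files.
RESTORE FROM 's3://my-backup-bucket/2020-11-19?AUTH=implicit';
```

Because the backup files are native binary data written directly to the storage sink by the nodes themselves, the work is distributed across the cluster rather than funneled through a single client.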

DUMP, like the “dump” command in other databases, is a standalone client program that reads all the data from all the tables and prints it out as SQL statements that can be replayed in another CockroachDB cluster. This text-based detour leaves room for the restoring cluster to interpret the text differently, and converting to and from text makes both sides slower and more expensive. But the applications of 2020 are more data-intensive than the apps of yesteryear, and community members told us through our public GitHub repo, our Community Slack, and our Forum feedback channels that DUMP and its trade-offs just weren’t cutting it anymore.

They were right.

We confirmed these frustrations through some very light testing and found that running DUMP from a laptop against a 10GiB table on a 3-node cluster of c5d.4xlarge AWS machines took around three hours. On that same cluster, under load, using BACKUP to back up the whole cluster (100GiB) to S3 took around three minutes.

(Note: These are not apples-to-apples comparisons, nor are they meant to be benchmarks. For example, the dump had to transfer data over the internet, whereas the backup was sent to S3 in the same region. They’re provided as examples of the types of results we encountered in making this product decision. We advise customers to run their own benchmarks, as backup speeds will vary based on setup.)

--

Our customers told us the product wasn’t fitting their needs. We tested the performance ourselves, which only strengthened their case. As a result, we decided to make a change.

In our v20.2 release, basic distributed BACKUPs, along with the entire suite of RESTORE functionality, are now a part of CockroachDB Core to provide a reliable, valuable, usable Disaster Recovery option for applications and companies running on Core. With this change, CockroachDB Core users now have access to:

  • Distributed, scalable, performant binary backups
  • Corruption checks to ensure restored data is the same as what was backed up
  • Backup scheduling with the same resilience as the underlying cluster, via built-in scheduling functionality (new in v20.2: no more cron jobs for backups!)
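The built-in scheduling can be sketched as follows (the label, storage URL, and cadences below are placeholder values for illustration, not taken from this post):

```sql
-- Schedule recurring backups managed by the cluster itself
-- (v20.2 syntax; incremental backups daily, full backups weekly).
CREATE SCHEDULE core_backup
  FOR BACKUP INTO 's3://my-backup-bucket/scheduled?AUTH=implicit'
    RECURRING '@daily'
    FULL BACKUP '@weekly'
    WITH SCHEDULE OPTIONS first_run = 'now';

-- Inspect defined schedules and their next run times.
SHOW SCHEDULES;
```

Because the schedule is stored and executed inside the cluster, it survives node failures just like any other cluster state, which is what makes it as resilient as the cluster it protects.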

Other BACKUP functionality will remain a part of our CockroachDB Self-hosted and CockroachDB Dedicated offerings.

In addition, we deprecated DUMP in v20.2, and it will be removed in a future version. For data portability use cases, where DUMP’s text-based encoding was actually desirable, we are working to ensure that schemas can be easily exported, and we have made EXPORT, our distributed CSV export feature, available in CockroachDB Core as well.
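For those portability cases, a distributed CSV export looks roughly like this (bucket and table names are placeholders, not from this post):

```sql
-- Export one table's rows to CSV files in cloud storage.
-- The work is distributed: each node writes its own ranges directly.
EXPORT INTO CSV 's3://my-export-bucket/customers?AUTH=implicit'
  WITH delimiter = ','
  FROM TABLE defaultdb.customers;
```

Unlike DUMP, the export is parallelized across the cluster, while still producing plain text that other systems can ingest.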

As we mentioned before, Cockroach Labs is constantly looking for ways to help small teams make a big impact with CockroachDB. This is the first of many such changes to come. As technologies emerge, best practices evolve, and data volumes grow, we’ll continue to evolve as well.

(Thanks to our customers and community members for providing feedback that’s making our product better. Your input helps us keep tabs on the market shifts and evolving needs. If you’re not already part of our community Slack, join the conversation here!)
