BACKUP

Note:

Core users can only take full backups. To use the other backup features, you need an Enterprise license.

CockroachDB's BACKUP statement allows you to create full or incremental backups of your cluster's schema and data that are consistent as of a given timestamp.

You can backup a full cluster, which includes:

You can also back up:

Because CockroachDB is designed with high fault tolerance, these backups are designed primarily for disaster recovery (i.e., if your cluster loses a majority of its nodes) through RESTORE. Isolated issues (such as small-scale node outages) do not require any intervention.

Note:

BACKUP only backs up entire tables; it does not support backing up subsets of a table.

Note:

BACKUP is a blocking statement. To run a backup job asynchronously, use the DETACHED option. See the options below.

Tip:

To view the contents of an Enterprise backup created with the BACKUP statement, use SHOW BACKUP.

Required privileges

  • Full cluster backups can only be run by members of the admin role. By default, the root user belongs to the admin role.
  • For all other backups, the user must have read access on all objects being backed up. Database and table backups require SELECT privileges. Backups of user-defined schemas, or backups containing user-defined types, require USAGE privileges.
  • BACKUP requires full read and write (including delete and overwrite) permissions to its target destination.

Destination privileges

The destination file URL does not require the ADMIN role in the following scenarios:

  • S3 and GS using SPECIFIED (and not IMPLICIT) credentials. Azure is always SPECIFIED by default.
  • Userfile

The destination file URL does require the ADMIN role in the following scenarios:

We recommend using cloud storage for bulk operations.

Synopsis

BACKUP TABLE table_pattern , DATABASE database_name , TO destination AS OF SYSTEM TIME timestamp INCREMENTAL FROM full_backup_location , incremental_backup_location WITH backup_options_list OPTIONS ( backup_options_list )

Parameters

Parameter Description
table_pattern The table(s) or view(s) you want to back up.
database_name The name of the database(s) you want to back up (i.e., create backups of all tables and views in the database).
destination The URL where you want to store the backup.

For information about this URL structure, see Backup File URLs.
timestamp Back up data as it existed as of timestamp. The timestamp must be more recent than your cluster's last garbage collection (which defaults to occur every 25 hours, but is configurable per table).
full_backup_location Create an incremental backup using the backup stored at the URL full_backup_location as its base. For information about this URL structure, see Backup File URLs.

Note: After a full backup for an explicit list of tables and/or databases, it is not possible to create an incremental backup if one or more tables were created, dropped, or truncated. In these cases, you must create a new full backup before more incremental backups can be created. To avoid this, backup the cluster instead of the explicit list of tables/databases.
incremental_backup_location Create an incremental backup that includes all backups listed at the provided URLs.

Lists of incremental backups must be sorted from oldest to newest. The newest incremental backup's timestamp must be within the table's garbage collection period.

For information about this URL structure, see Backup File URLs.

For more information about garbage collection, see Configure Replication Zones.
kv_option_list Control the backup behavior with a comma-separated list of these options.

Options

Option Value Description
revision_history N/A Create a backup with full revision history, which records every change made to the cluster within the garbage collection period leading up to and including the given timestamp.
encryption_passphrase STRING The passphrase used to encrypt the files (BACKUP manifest and data files) that the BACKUP statement generates. This same passphrase is needed to decrypt the file when it is used to restore and to list the contents of the backup when using SHOW BACKUP. There is no practical limit on the length of the passphrase.
DETACHED N/A New in v20.2: When a backup runs in DETACHED mode, it will execute asynchronously and the job ID will be returned immediately without waiting for the job to finish. Note that with DETACHED specified, further job information and the job completion status will not be returned. For more on the differences between the returned job data, see the example below. To check on the job status, use the SHOW JOBS statement.

To run a backup within a transaction, use the DETACHED option.
kms STRING New in v20.2: The key management service (KMS) URI (or a comma-separated list of URIs) used to encrypt the files (BACKUP manifest and data files) that the BACKUP statement generates. This same KMS URI is needed to decrypt the file when it is used to restore and to list the contents of the backup when using SHOW BACKUP.

Currently, only AWS KMS is supported.

Backup file URLs

CockroachDB uses the URL provided to construct a secure API call to the service you specify. The URL structure depends on the type of file storage you are using. For more information, see the following:

Functional details

Object dependencies

Dependent objects must be backed up at the same time as the objects they depend on.

Object Depends On
Table with foreign key constraints The table it REFERENCES; however, this dependency can be removed during the restore.
Table with a sequence The sequence it uses; however, this dependency can be removed during the restore.
Views The tables used in the view's SELECT statement.
Interleaved tables The parent table in the interleaved hierarchy.

Users and privileges

The system.users table stores your users and their passwords. To restore your users and privilege grants, do a cluster backup and restore the cluster to a fresh cluster with no user data. You can also backup the system.users table, and then use this procedure.

Backup types

CockroachDB offers two types of backups: full and incremental.

Full backups

Full backups contain an un-replicated copy of your data and can always be used to restore your cluster. These files are roughly the size of your data and require greater resources to produce than incremental backups. You can take full backups as of a given timestamp and (optionally) include the available revision history.

Incremental backups

Incremental backups are smaller and faster to produce than full backups because they contain only the data that has changed since a base set of backups you specify (which must include one full backup, and can include many incremental backups). You can take incremental backups either as of a given timestamp or with full revision history.

Warning:

Incremental backups can only be created within the garbage collection period of the base backup's most recent timestamp. This is because incremental backups are created by finding which data has been created or modified since the most recent timestamp in the base backup––that timestamp data, though, is deleted by the garbage collection process. Incremental backups that fail because of a recent garbage collection will throw an error.

You can configure garbage collection periods using the ttlseconds replication zone setting.

For an example of an incremental backup, see the Create incremental backups example below.

Performance

The BACKUP process minimizes its impact to the cluster's performance by distributing work to all nodes. Each node backs up only a specific subset of the data it stores (those for which it serves writes; more details about this architectural concept forthcoming), with no two nodes backing up the same data.

BACKUP, like any read, cannot export a range if the range contains an unresolved intent. While you typically will want bulk, background jobs like BACKUP to have as little impact on your foreground traffic as possible, it's more important for backups to actually complete (which maintains your recovery point objective (RPO)). Unlike a normal read transaction that will block until any uncommitted writes it encounters are resolved, BACKUP will block only for a configurable duration before invoking priority to ensure it can complete on-time.

We recommend always starting backups with a specific timestamp at least 10 seconds in the past. For example:

> BACKUP...AS OF SYSTEM TIME '-10s';

This improves performance by decreasing the likelihood that the BACKUP will be retried because it contends with other statements/transactions. However, because AS OF SYSTEM TIME returns historical data, your reads might be stale. Taking backups with AS OF SYSTEM TIME '-10s' is a good best practice to reduce the number of still-running transactions you may encounter, since the backup will take priority and will force still-running transactions to restart after the backup is finished.

New in v20.2: BACKUP will initially ask individual ranges to backup but to skip if they encounter an intent. Any range that is skipped is placed at the end of the queue. When BACKUP has completed its initial pass and is revisiting ranges, it will ask any range that did not resolve within the given time limit (default 1 minute) to attempt to resolve any intents that it encounters and to not skip. Additionally, the backup's transaction priority is then set to high, which causes other transactions to abort until the intents are resolved and the backup is finished.

Viewing and controlling backups jobs

After CockroachDB successfully initiates a backup, it registers the backup as a job, and you can do the following:

Action SQL Statement
View the backup status SHOW JOBS
Pause the backup PAUSE JOB
Resume the backup RESUME JOB
Cancel the backup CANCEL JOB

You can also visit the Jobs page of the DB Console to view job details. The BACKUP statement will return when the backup is finished or if it encounters an error.

Note:

The presence of a BACKUP-CHECKPOINT file in the backup destination usually means the backup is not complete. This file is created when a backup is initiated, and is replaced with a BACKUP file once the backup is finished.

Examples

Per our guidance in the Performance section, we recommend starting backups from a time at least 10 seconds in the past using AS OF SYSTEM TIME. Each example below follows this guidance.

Backup a cluster

To backup a full cluster:

icon/buttons/copy
> BACKUP TO \
'gs://acme-co-backup/test-cluster' \
AS OF SYSTEM TIME '-10s';

Backup a database

To backup a single database:

icon/buttons/copy
> BACKUP DATABASE bank \
TO 'gs://acme-co-backup/database-bank-2017-03-27-weekly' \
AS OF SYSTEM TIME '-10s';

To backup multiple databases:

icon/buttons/copy
> BACKUP DATABASE bank, employees \
TO 'gs://acme-co-backup/database-bank-2017-03-27-weekly' \
AS OF SYSTEM TIME '-10s';

Backup a table or view

To backup a single table or view:

icon/buttons/copy
> BACKUP bank.customers \
TO 'gs://acme-co-backup/bank-customers-2017-03-27-weekly' \
AS OF SYSTEM TIME '-10s';

To backup multiple tables:

icon/buttons/copy
> BACKUP bank.customers, bank.accounts \
TO 'gs://acme-co-backup/database-bank-2017-03-27-weekly' \
AS OF SYSTEM TIME '-10s';

Create incremental backups

If you backup to a destination already containing a full backup, an incremental backup will be produced in a subdirectory with a date-based name (e.g., destination/day/time_1, destination/day/time_2):

icon/buttons/copy
> BACKUP TO \
'gs://acme-co-backup/test-cluster' \
AS OF SYSTEM TIME '-10s';
Note:

This incremental backup syntax does not work for backups using HTTP storage; you must explicitly control where your incremental backups go by using the INCREMENTAL FROM syntax.

Run a backup asynchronously

New in v20.2: Use the DETACHED option to execute the backup job asynchronously:

icon/buttons/copy
> BACKUP TO \
'gs://acme-co-backup/test-cluster' \
AS OF SYSTEM TIME '-10s'
WITH DETACHED;

The job ID is returned immediately without waiting for the job to finish:

        job_id
----------------------
  592786066399264769
(1 row)

Without the DETACHED option, BACKUP will block the SQL connection until the job completes. Once finished, the job status and more detailed job data is returned:

job_id             |  status   | fraction_completed | rows | index_entries | bytes
---------------------+-----------+--------------------+------+---------------+--------
652471804772712449 | succeeded |                  1 |   50 |             0 |  4911
(1 row)

Advanced examples

For examples of advanced BACKUP and RESTORE use cases, see:

Known limitations

Slow (or hung) backups and queries due to write intent buildup

Due to known bugs, transactions do not always clean up their write intents (newly written values) on commit or rollback. Garbage collection is also rather slow to react to them. This can cause the amount of unresolved write intents to build up over time. While this isn't necessarily a problem in itself, some operations do not handle large amounts of intents well. In particular, backups and queries that touch large numbers of values may become very slow and appear to hang.

To verify that intents may be causing an issue, open the Custom Chart debug page in the DB Console, and create a chart for the intentcount metric. This will show the number of intents present over time. The following query can also be used to get intent counts by range:

icon/buttons/copy
> SELECT * FROM (SELECT start_pretty, end_pretty, range_id, crdb_internal.range_stats(start_key)->'intent_count' AS intent_count FROM crdb_internal.ranges_no_leases) WHERE intent_count != '0';

To force cleanup of intents, either of the following methods can be used:

  • Do a high-priority scan of the table, which will resolve intents as it runs. Note that this may abort any conflicting transactions that are currently running. If the table has indexes, these can be cleaned by changing <table> into <table>@<index>. Numeric table and/or index identifiers (e.g., as output by the intent query above) can be used instead of names by placing them in brackets: [<table-id>] or [<table-id>]@[<index-id>].

    icon/buttons/copy
    > BEGIN PRIORITY HIGH; SELECT COUNT(*) FROM <table>; COMMIT;
    
  • Manually enqueue the range for garbage collection. In the DB Console, open the Advanced Debug page, scroll down to Tracing and Profiling Endpoints, and click Run a range through an internal queue. Then select Queue: gc, enter the range ID as output by the intent query above, check SkipShouldQueue, and click Submit. The operation will succeed on the leaseholder node and error on the others; this is expected.

The progress and effect of the cleanup can be monitored via the intent count statistics described above.

See also

YesYes NoNo