BACKUP

On this page Carat arrow pointing down
Warning:
CockroachDB v19.2 is no longer supported. For more details, see the Release Support Policy.
Warning:

The BACKUP feature is only available to enterprise users. For non-enterprise backups, see cockroach dump.

CockroachDB's BACKUP statement allows you to create full or incremental backups of your cluster's schema and data that are consistent as of a given timestamp. Backups can be with or without revision history.

Because CockroachDB is designed with high fault tolerance, these backups are designed primarily for disaster recovery (i.e., if your cluster loses a majority of its nodes) through RESTORE. Isolated issues (such as small-scale node outages) do not require any intervention.

Functional details

Backup targets

You can backup entire tables (which automatically includes their indexes) or views. Backing up a database simply backs up all of its tables and views.

Note:

BACKUP only offers table-level granularity; it does not support backing up subsets of a table.

Object dependencies

Dependent objects must be backed up at the same time as the objects they depend on.

Object Depends On
Table with foreign key constraints The table it REFERENCES; however, this dependency can be removed during the restore.
Table with a sequence The sequence it uses; however, this dependency can be removed during the restore.
Views The tables used in the view's SELECT statement.
Interleaved tables The parent table in the interleaved hierarchy.

Users and privileges

The system.users table stores your users and their passwords. To restore your users, you must first backup the system.users table, and then use this procedure.

Restored tables inherit privilege grants from the target database; they do not preserve privilege grants from the backed up table because the restoring cluster may have different users.

Table-level privileges must be granted to users after the restore is complete.

Backup types

CockroachDB offers two types of backups: full and incremental.

Full backups

Full backups contain an unreplicated copy of your data and can always be used to restore your cluster. These files are roughly the size of your data and require greater resources to produce than incremental backups. You can take full backups as of a given timestamp and (optionally) include the available revision history.

Incremental backups

Incremental backups are smaller and faster to produce than full backups because they contain only the data that has changed since a base set of backups you specify (which must include one full backup, and can include many incremental backups). You can take incremental backups either as of a given timestamp or with full revision history.

Note the following restriction: Incremental backups can only be created within the garbage collection period of the base backup's most recent timestamp. This is because incremental backups are created by finding which data has been created or modified since the most recent timestamp in the base backup––that timestamp data, though, is deleted by the garbage collection process.

You can configure garbage collection periods using the ttlseconds replication zone setting.

Backups with revision history

Warning:

This is a beta feature. It is currently undergoing continued testing. Please file a Github issue with us if you identify a bug.

You can create full or incremental backups with revision history:

  • Taking full backups with revision history allows you to back up every change made within the garbage collection period leading up to and including the given timestamp.
  • Taking incremental backups with revision history allows you to back up every change made since the last backup and within the garbage collection period leading up to and including the given timestamp. You can take incremental backups with revision history even when your previous full or incremental backups were taken without revision history.

You can configure garbage collection periods using the ttlseconds replication zone setting. Taking backups with revision history allows for point-in-time restores within the revision history.

Performance

The BACKUP process minimizes its impact to the cluster's performance by distributing work to all nodes. Each node backs up only a specific subset of the data it stores (those for which it serves writes; more details about this architectural concept forthcoming), with no two nodes backing up the same data.

For best performance, we also recommend always starting backups with a specific timestamp at least 10 seconds in the past. For example:

> BACKUP...AS OF SYSTEM TIME '-10s';

This improves performance by decreasing the likelihood that the BACKUP will be retried because it contends with other statements/transactions. However, because AS OF SYSTEM TIME returns historical data, your reads might be stale.

Automating backups

We recommend automating daily backups of your cluster.

To automate backups, you must have a client send the BACKUP statement to the cluster.

Once the backup is complete, your client will receive a BACKUP response.

Viewing and controlling backups jobs

After CockroachDB successfully initiates a backup, it registers the backup as a job, which you can view with SHOW JOBS.

After the backup has been initiated, you can control it with PAUSE JOB, RESUME JOB, and CANCEL JOB.

Note:

If initiated correctly, the statement returns when the backup is finished or if it encounters an error. In some cases, the backup can continue after an error has been returned (the error message will tell you that the backup has resumed in background).

Synopsis

BACKUP TABLE table_pattern , DATABASE name , TO string_or_placeholder AS OF SYSTEM TIME timestamp INCREMENTAL FROM full_backup_location , incremental_backup_location WITH kv_option_list
Note:

The BACKUP statement cannot be used within a transaction.

Required privileges

Only members of the admin role can run BACKUP. By default, the root user belongs to the admin role.

Parameters

Parameter Description
table_pattern The table or view you want to back up.
name The name of the database you want to back up (i.e., create backups of all tables and views in the database).
destination The URL where you want to store the backup.

For information about this URL structure, see Backup File URLs.
AS OF SYSTEM TIME timestamp Back up data as it existed as of timestamp. The timestamp must be more recent than your cluster's last garbage collection (which defaults to occur every 25 hours, but is configurable per table).
WITH revision_history Create a backup with full revision history that records every change made to the cluster within the garbage collection period leading up to and including the given timestamp.
INCREMENTAL FROM full_backup_location Create an incremental backup using the full backup stored at the URL full_backup_location as its base. For information about this URL structure, see Backup File URLs.

Note: It is not possible to create an incremental backup if one or more tables were created, dropped, or truncated after the full backup. In this case, you must create a new full backup.
incremental_backup_location Create an incremental backup that includes all backups listed at the provided URLs.

Lists of incremental backups must be sorted from oldest to newest. The newest incremental backup's timestamp must be within the table's garbage collection period.

For information about this URL structure, see Backup File URLs.

For more information about garbage collection, see Configure Replication Zones.

Backup file URLs

We will use the URL provided to construct a secure API call to the service you specify. The path to each backup must be unique, and the URL for your backup's destination/locations must use the following format:

[scheme]://[host]/[path]?[parameters]
Location Scheme Host Parameters
Amazon s3 Bucket name AUTH 1 (optional; can be implicit or specified), AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN
Azure azure N/A (see Example file URLs AZURE_ACCOUNT_KEY, AZURE_ACCOUNT_NAME
Google Cloud 2 gs Bucket name AUTH (optional; can be default, implicit, or specified), CREDENTIALS
HTTP 3 http Remote host N/A
NFS/Local 4 nodelocal Empty or nodeID 5 (see Example file URLs) N/A
S3-compatible services 6 s3 Bucket name AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_REGION 7 (optional), AWS_ENDPOINT
Warning:

If you write to nodelocal storage in a multi-node cluster, individual data files will be written to the extern directories of arbitrary nodes and will likely not work as intended. To work correctly, each node must have the --external-io-dir flag point to the same NFS mount or other network-backed, shared storage.

Note:

The location parameters often contain special characters that need to be URI-encoded. Use Javascript's encodeURIComponent function or Go language's url.QueryEscape function to URI-encode the parameters. Other languages provide similar functions to URI-encode special characters.

Note:

If your environment requires an HTTP or HTTPS proxy server for outgoing connections, you can set the standard HTTP_PROXY and HTTPS_PROXY environment variables when starting CockroachDB.

  • 1 If the AUTH parameter is not provided, AWS connections default to specified and the access keys must be provided in the URI parameters. If the AUTH parameter is implicit, the access keys can be omitted and the credentials will be loaded from the environment.

  • 2 If the AUTH parameter is not specified, the cloudstorage.gs.default.key cluster setting will be used if it is non-empty, otherwise the implicit behavior is used. If the AUTH parameter is implicit, all GCS connections use Google's default authentication strategy. If the AUTH parameter is default, the cloudstorage.gs.default.key cluster setting must be set to the contents of a service account file which will be used during authentication. If the AUTH parameter is specified, GCS connections are authenticated on a per-statement basis, which allows the JSON key object to be sent in the CREDENTIALS parameter. The JSON key object should be base64-encoded (using the standard encoding in RFC 4648).

  • 3 You can create your own HTTP server with Caddy or nginx. A custom root CA can be appended to the system's default CAs by setting the cloudstorage.http.custom_ca cluster setting, which will be used when verifying certificates from HTTPS URLs.

  • 4 The file system backup location on the NFS drive is relative to the path specified by the --external-io-dir flag set while starting the node. If the flag is set to disabled, then imports from local directories and NFS drives are disabled.

  • 5 The host component of NFS/Local can either be empty or the nodeID. If the nodeID is specified, it is currently ignored (i.e., any node can be sent work and it will look in its local input/output directory); however, the nodeID will likely be required in the future.

  • 6 A custom root CA can be appended to the system's default CAs by setting the cloudstorage.http.custom_ca cluster setting, which will be used when verifying certificates from an S3-compatible service.

  • 7 The AWS_REGION parameter is optional since it is not a required parameter for most S3-compatible services. Specify the parameter only if your S3-compatible service requires it.

Example file URLs

Location Example
Amazon S3 s3://acme-co/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456
Azure azure://employees.sql?AZURE_ACCOUNT_KEY=123&AZURE_ACCOUNT_NAME=acme-co
Google Cloud gs://acme-co/employees.sql
HTTP http://localhost:8080/employees.sql
NFS/Local nodelocal:///path/employees, nodelocal://2/path/employees

Note: If you write to nodelocal storage in a multi-node cluster, individual data files will be written to the extern directories of arbitrary nodes and will likely not work as intended. To work correctly, each node must have the --external-io-dir flag point to the same NFS mount or other network-backed, shared storage.

Examples

Per our guidance in the Performance section, we recommend starting backups from a time at least 10 seconds in the past using AS OF SYSTEM TIME.

Backup a single table or view

icon/buttons/copy
> BACKUP bank.customers \
TO 'gs://acme-co-backup/database-bank-2017-03-27-weekly' \
AS OF SYSTEM TIME '-10s';

Backup multiple tables

icon/buttons/copy
> BACKUP bank.customers, bank.accounts \
TO 'gs://acme-co-backup/database-bank-2017-03-27-weekly' \
AS OF SYSTEM TIME '-10s';

Backup an entire database

icon/buttons/copy
> BACKUP DATABASE bank \
TO 'gs://acme-co-backup/database-bank-2017-03-27-weekly' \
AS OF SYSTEM TIME '-10s';

Backup with revision history

icon/buttons/copy
> BACKUP DATABASE bank \
TO 'gs://acme-co-backup/database-bank-2017-03-27-weekly' \
AS OF SYSTEM TIME '-10s' WITH revision_history;

Create incremental backups

Incremental backups must be based off of full backups you've already created.

icon/buttons/copy
> BACKUP DATABASE bank \
TO 'gs://acme-co-backup/db/bank/2017-03-29-nightly' \
AS OF SYSTEM TIME '-10s' \
INCREMENTAL FROM 'gs://acme-co-backup/database-bank-2017-03-27-weekly', 'gs://acme-co-backup/database-bank-2017-03-28-nightly';

Create incremental backups with revision history

icon/buttons/copy
> BACKUP DATABASE bank \
TO 'gs://acme-co-backup/database-bank-2017-03-29-nightly' \
AS OF SYSTEM TIME '-10s' \
INCREMENTAL FROM 'gs://acme-co-backup/database-bank-2017-03-27-weekly', 'gs://acme-co-backup/database-bank-2017-03-28-nightly' WITH revision_history;

Create locality-aware backups

New in v19.2: You can create locality-aware backups such that each node writes files only to the backup destination that matches the node locality configured at node startup.

A locality-aware backup is specified by a list of URIs, each of which has a COCKROACH_LOCALITY URL parameter whose single value is either default or a single locality key-value pair such as region=us-east. At least one COCKROACH_LOCALITY must be the default.

Backup file placement is determined by leaseholder placement, as each node is responsible for backing up the ranges for which it is the leaseholder. Nodes write files to the backup storage location whose locality matches their own node localities, with a preference for more specific values in the locality hierarchy. If there is no match, the default locality is used.

Note:

Note that the locality query string parameters must be URL-encoded as shown below.

Example - Create a locality-aware backup

For example, to create a locality-aware backup where nodes with the locality region=us-west write backup files to s3://us-west-bucket, and all other nodes write to s3://us-east-bucket by default, run:

icon/buttons/copy
BACKUP DATABASE bank TO ('s3://us-east-bucket?COCKROACH_LOCALITY=default', 's3://us-west-bucket?COCKROACH_LOCALITY=region%3Dus-west');

The backup created above can be restored by running:

icon/buttons/copy
RESTORE DATABASE bank FROM ('s3://us-east-bucket', 's3://us-west-bucket');

Example - Create an incremental locality-aware backup

To make an incremental locality-aware backup from a full locality-aware backup, the syntax is just like for regular incremental backups:

icon/buttons/copy
BACKUP DATABASE foo TO (${uri_1}, ${uri_2}, ...) INCREMENTAL FROM ${full_backup_uri} ...;

For example, to create an incremental locality-aware backup from a previous full locality-aware backup where nodes with the locality region=us-west write backup files to s3://us-west-bucket, and all other nodes write to s3://us-east-bucket by default, run:

icon/buttons/copy
BACKUP DATABASE bank TO
('s3://us-east-bucket/database-bank-2019-10-08-nightly?COCKROACH_LOCALITY=default', 's3://us-west-bucket/database-bank-2019-10-08-nightly?COCKROACH_LOCALITY=region%3Dus-west')
INCREMENTAL FROM 's3://us-east-bucket/database-bank-2019-10-07-weekly';
Note:

Note that only the backup URIs you set as the default when you created the previous backup(s) are needed in the INCREMENTAL FROM clause of your incremental BACKUP statement (as shown in the example). This is because the default destination for a locality-aware backup contains a manifest file that contains all the metadata required to create additional incremental backups based on it.

Example - Create an incremental locality-aware backup from a previous locality-aware backup

To make an incremental locality-aware backup from another locality-aware backup, the syntax is as follows:

icon/buttons/copy
BACKUP DATABASE foo TO ({uri_1}, {uri_2}, ...) INCREMENTAL FROM {full_backup}, {incr_backup_1}, {incr_backup_2}, ...;

For example, let's say you normally run a full backup every Monday, followed by incremental backups on the remaining days of the week.

By default, all nodes send their backups to your s3://us-east-bucket, except for nodes in region=us-west, which will send their backups to s3://us-west-bucket.

If today is Thursday, October 10th, 2019, your BACKUP statement will list the following backup URIs:

  • The full locality-aware backup URI from Monday, e.g.,
    • s3://us-east-bucket/database-bank-2019-10-07-weekly
  • The incremental backup URIs from Tuesday and Wednesday, e.g.,
    • s3://us-east-bucket/database-bank-2019-10-08-nightly
    • s3://us-east-bucket/database-bank-2019-10-09-nightly

Given the above, to take the incremental locality-aware backup scheduled for today (Thursday), you will run:

icon/buttons/copy
BACKUP DATABASE bank TO
    ('s3://us-east-bucket/database-bank-2019-10-10-nightly?COCKROACH_LOCALITY=default', 's3://us-west-bucket/database-bank-2019-10-10-nightly?COCKROACH_LOCALITY=region%3Dus-west')
INCREMENTAL FROM
    's3://us-east-bucket/database-bank-2019-10-07-weekly',
    's3://us-east-bucket/database-bank-2019-10-08-nightly',
    's3://us-east-bucket/database-bank-2019-10-09-nightly';
Note:

Note that only the backup URIs you set as the default when you created the previous backup(s) are needed in the INCREMENTAL FROM clause of your incremental BACKUP statement (as shown in the example). This is because the default destination for a locality-aware backup contains a manifest file that contains all the metadata required to create additional incremental backups based on it.

See also


Yes No
On this page

Yes No