Run Bulk Operations from Your Cluster

CockroachDB Serverless (beta) and CockroachDB Dedicated offer different levels of support for the following bulk operations. This page describes the availability of these operations in each type of CockroachDB Cloud cluster and provides examples.

Note:

For CockroachDB Serverless (beta) clusters, you must have billing information on file for your organization to have access to cloud storage. If you don't have billing set up, userfile is your only available storage option for bulk operations. CockroachDB Dedicated users can run bulk operations with userfile or cloud storage.

For information on userfile commands, see the cockroach userfile command documentation.

The cloud storage examples on this page use Amazon S3 for demonstration purposes. For guidance on connecting to other storage options or using other authentication parameters, read Use Cloud Storage for Bulk Operations.

Examples

Before you begin, connect to your cluster. For guidance on connecting to your CockroachDB Cloud cluster, visit Connect to a CockroachDB Serverless (beta) Cluster or Connect to Your CockroachDB Dedicated Cluster.

Back up and restore data

We recommend starting backups from a time at least 10 seconds in the past using AS OF SYSTEM TIME. Read our guidance in the Performance section on the BACKUP page.

Note:

Only database and table-level backups are possible when using userfile as storage. Restoring cluster-level backups will not work because userfile data is stored in the defaultdb database, and you cannot restore a cluster with existing table data.

Database and table

When working on the same cluster, userfile storage allows for database and table-level backups.

First, run the following statement to back up a database to a directory in the default userfile space:

BACKUP DATABASE bank INTO 'userfile://defaultdb.public.userfiles_$user/bank-backup' AS OF SYSTEM TIME '-10s';

This directory holds the files that make up a backup, including the manifest file and data files.

Note:

When backing up from one cluster and restoring a database or table stored in your userfile space to a different cluster, you can run cockroach userfile get to download the backup files to a local machine, then cockroach userfile upload --url {CONNECTION STRING} to upload them to the userfile space of the destination cluster.
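
A minimal sketch of that workflow, using the backup directory from the example above; the glob pattern, file placeholder, and connection strings are illustrative:

# Download the backup files from the source cluster's userfile space:
cockroach userfile get 'bank-backup/*' --url '{SOURCE CLUSTER CONNECTION STRING}'
# Upload each downloaded file to the destination cluster's userfile space
# (each invocation uploads a single file):
cockroach userfile upload {DOWNLOADED FILE} bank-backup/{DOWNLOADED FILE} --url '{DESTINATION CLUSTER CONNECTION STRING}'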

BACKUP ... INTO adds a backup to a collection within the backup destination. The path to the backup is created using a date-based naming scheme by default, unless an explicit subdirectory is passed with the BACKUP statement. To view the backup paths in a given destination, use SHOW BACKUPS:

> SHOW BACKUPS IN 'userfile://defaultdb.public.userfiles_$user/bank-backup';
       path
------------------------
2021/03/23-213101.37
2021/03/24-172553.85
2021/03/24-210532.53
(3 rows)

When your database needs to be restored, run the following:

RESTORE DATABASE bank FROM '2021/03/24-210532.53' IN 'userfile://defaultdb.public.userfiles_$user/bank-backup';

You can also specify the destination as userfile:///bank-backup, since userfile:/// refers to the default path userfile://defaultdb.public.userfiles_$user/.
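
For example, the restore above could equivalently use the shorthand path:

RESTORE DATABASE bank FROM '2021/03/24-210532.53' IN 'userfile:///bank-backup';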

Once the backup data is no longer needed, delete it from userfile storage:

cockroach userfile delete bank-backup --url {CONNECTION STRING}

If you use cockroach userfile delete {file}, the data will not be removed from disk until the garbage collection window has passed.

To resolve database or table naming conflicts during a restore, see Troubleshooting naming conflicts.

Cockroach Labs runs full backups daily and incremental backups hourly for every CockroachDB Cloud cluster. The full backups are retained for 30 days, while incremental backups are retained for 7 days. For more information, read Restore Data From a Backup.

The following examples show how to run manual backups and restores:

Back up a cluster

To take a full backup of a cluster:

> BACKUP INTO \
's3://{BUCKET NAME}/{PATH}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}' \
AS OF SYSTEM TIME '-10s';

Back up a database

To take a full backup of a single database:

> BACKUP DATABASE bank \
INTO 's3://{BUCKET NAME}/{PATH}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}' \
AS OF SYSTEM TIME '-10s';

To take a full backup of multiple databases:

> BACKUP DATABASE bank, employees \
INTO 's3://{BUCKET NAME}/{PATH}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}' \
AS OF SYSTEM TIME '-10s';

Back up a table or view

To take a full backup of a single table or view:

> BACKUP bank.customers \
INTO 's3://{BUCKET NAME}/{PATH}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}' \
AS OF SYSTEM TIME '-10s';

To resolve database or table naming conflicts during a restore, see Troubleshooting naming conflicts.

View the backup subdirectories

BACKUP ... INTO adds a backup to a collection within the backup destination. The path to the backup is created using a date-based naming scheme by default, unless an explicit subdirectory is passed with the BACKUP statement. To view the backup paths in a given destination, use SHOW BACKUPS:

> SHOW BACKUPS IN 's3://{BUCKET NAME}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}';
       path
-------------------------
/2021/12/14-190909.83
/2021/12/20-155249.37
/2021/12/21-142943.73
(3 rows)

When you restore a backup, add the backup's subdirectory path (e.g. /2021/12/21-142943.73) to the RESTORE statement.

Incremental backups are appended to the full backup with BACKUP ... INTO LATEST IN {destination}. Your storage location will contain each incremental backup as a date-based subdirectory within the full backup.
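
For example, a minimal sketch of appending an incremental backup to the most recent full backup in the collection (the bucket, path, and credentials are placeholders):

BACKUP INTO LATEST IN \
's3://{BUCKET NAME}/{PATH}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}' \
AS OF SYSTEM TIME '-10s';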

In the following example, /2021/12/21-142943.73 contains the full backup. The incremental backups (144748.08/ and 144639.97/) are appended as subdirectories to the full backup:

2021
|—— 12
   |—— 21-142943.73/
       |—— 20211221/
           |—— 144748.08/
           |—— 144639.97/

To output more detail about the backups contained within a directory, see View a list of the full and incremental backups in a specific full backup subdirectory.
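
As a rough sketch, assuming the subdirectory shown above, you can inspect a single full backup subdirectory with SHOW BACKUP ... IN:

SHOW BACKUP '/2021/12/21-142943.73' IN 's3://{BUCKET NAME}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}';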

See Incremental backups with explicitly specified destinations to control where your backups go.

Restore a cluster

To restore a full cluster:

RESTORE FROM '2021/03/23-213101.37' IN 's3://{bucket_name}/{path/to/backup}?AWS_ACCESS_KEY_ID={key_id}&AWS_SECRET_ACCESS_KEY={access_key}';

To view the available subdirectories, use SHOW BACKUPS.

Restore a database

To restore a database:

RESTORE DATABASE bank FROM '2021/03/23-213101.37' IN 's3://{bucket_name}/{path/to/backup}?AWS_ACCESS_KEY_ID={key_id}&AWS_SECRET_ACCESS_KEY={access_key}';

To view the available subdirectories, use SHOW BACKUPS.

Note:

RESTORE DATABASE can only be used if the entire database was backed up.

Restore a table

To restore a single table:

> RESTORE TABLE bank.customers FROM '2021/03/23-213101.37' IN 's3://{bucket_name}/{path/to/backup}?AWS_ACCESS_KEY_ID={key_id}&AWS_SECRET_ACCESS_KEY={access_key}';

To restore multiple tables:

> RESTORE TABLE bank.customers, bank.accounts FROM '2021/03/23-213101.37' IN 's3://{bucket_name}/{path/to/backup}?AWS_ACCESS_KEY_ID={key_id}&AWS_SECRET_ACCESS_KEY={access_key}';

To view the available subdirectories, use SHOW BACKUPS.

For more information on taking backups and restoring to your cluster, read the BACKUP and RESTORE pages.

Import data into your CockroachDB Cloud cluster

To import a table from userfile, use the following command:

IMPORT TABLE customers (
        id UUID PRIMARY KEY,
        name TEXT,
        INDEX name_idx (name)
)
   CSV DATA ('userfile:///test-data.csv');

userfile:/// references the default path (userfile://defaultdb.public.userfiles_$user/).

        job_id       |  status   | fraction_completed |  rows  | index_entries |  bytes
---------------------+-----------+--------------------+--------+---------------+-----------
  599865027685613569 | succeeded |                  1 | 300024 |             0 | 13389972
(1 row)
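
If test-data.csv is not already in your userfile space, a minimal sketch of uploading it from your local machine before running the IMPORT above (the connection string is a placeholder):

cockroach userfile upload test-data.csv test-data.csv --url '{CONNECTION STRING}'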

For more import options, see IMPORT.

To import a table from cloud storage into your cluster:

> IMPORT TABLE customers (
        id UUID PRIMARY KEY,
        name TEXT,
        INDEX name_idx (name)
)
CSV DATA ('s3://{BUCKET NAME}/{customer-data}?AWS_ACCESS_KEY_ID={ACCESS KEY}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}')
;

Read the IMPORT page for more examples and guidance.

Export data out of CockroachDB Cloud

Note:

Using EXPORT with userfile is not recommended. If you need to export data from a Serverless cluster, you can either set up billing for your organization to access cloud storage or export data to a local CSV file.
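
A rough sketch of the local CSV approach, using the built-in SQL client and shell redirection (the query and output file name are examples only):

cockroach sql --url '{CONNECTION STRING}' --format=csv \
  -e "SELECT * FROM bank.customers" > customers.csv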

The following example exports the customers table from the bank database into a cloud storage bucket in CSV format:

EXPORT INTO CSV
  's3://{BUCKET NAME}/{customer-export-data}?AWS_ACCESS_KEY_ID={ACCESS KEY}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}'
  WITH delimiter = '|' FROM TABLE bank.customers;

Read the EXPORT page for more examples and guidance.

Stream data out of your CockroachDB Cloud cluster

Core changefeeds stream row-level changes to a client until the underlying SQL connection is closed.

Note:

Only core changefeeds are available on CockroachDB Serverless (beta). To create a changefeed into a configurable sink, like cloud storage or Kafka, use CockroachDB Dedicated, which has this feature enabled by default.

In the following example, you'll set up a core changefeed on your CockroachDB Serverless (beta) cluster.

  1. As the root user, open the built-in SQL client:

    cockroach sql --url {CONNECTION STRING} --format=csv
    
    Note:

    Because core changefeeds return results differently than other SQL statements, they require a dedicated database connection with specific settings around result buffering. In normal operation, CockroachDB improves performance by buffering results server-side before returning them to a client; however, result buffering is automatically turned off for core changefeeds. Core changefeeds also have different cancellation behavior than other queries: they can only be canceled by closing the underlying connection or issuing a CANCEL QUERY statement on a separate connection. Combined, these attributes of changefeeds mean that applications should explicitly create dedicated connections to consume changefeed data, instead of using a connection pool as most client drivers do by default.

    Note:

    To determine how wide the columns need to be, the default table display format in cockroach sql buffers the results it receives from the server before printing them to the console. When consuming core changefeed data using cockroach sql, it's important to use a display format like csv that does not buffer its results. To set the display format, use the --format=csv flag when starting the built-in SQL client, or set the \set display_format=csv option once the SQL client is open.

  2. Enable the kv.rangefeed.enabled cluster setting:

    > SET CLUSTER SETTING kv.rangefeed.enabled = true;
    
  3. Create table foo:

    > CREATE TABLE foo (a INT PRIMARY KEY);
    
  4. Insert a row into the table:

    > INSERT INTO foo VALUES (0);
    
  5. Start the core changefeed:

    > EXPERIMENTAL CHANGEFEED FOR foo;
    
    table,key,value
    foo,[0],"{""after"": {""a"": 0}}"
    
  6. In a new terminal, add another row:

    cockroach sql --url {CONNECTION STRING} -e "INSERT INTO foo VALUES (1)"
    
  7. Back in the terminal where the core changefeed is streaming, you will see the following output:

    foo,[1],"{""after"": {""a"": 1}}"
    

    Note that records may take a couple of seconds to display in the core changefeed.

  8. To stop streaming the changefeed, enter CTRL+C into the terminal where the changefeed is running.

For further information on changefeeds, read Stream Data Out of CockroachDB and EXPERIMENTAL CHANGEFEED FOR.

Change data capture (CDC) provides efficient, distributed, row-level changefeeds into a configurable sink for downstream processing such as reporting, caching, or full-text indexing.

A changefeed targets an allowlist of tables, called "watched rows". Each change to a watched row is emitted as a record to a configurable sink, such as Kafka or a cloud storage sink. In this version of CockroachDB Cloud, you can create, pause, resume, and cancel changefeeds.
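
For example, once a changefeed is created, you can manage it by its job ID; a minimal sketch (the job ID is a placeholder):

PAUSE JOB 360645287206223873;
RESUME JOB 360645287206223873;
CANCEL JOB 360645287206223873;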

Create a changefeed connected to Kafka

> CREATE CHANGEFEED FOR TABLE name, name2, name3
  INTO 'kafka://host:port'
  WITH updated, resolved;
+--------------------+
|       job_id       |
+--------------------+
| 360645287206223873 |
+--------------------+
(1 row)

Note:

Currently, changefeeds connected to Kafka versions < v1.0 are not supported in CockroachDB v21.1.

For more information on how to create a changefeed connected to Kafka, see Stream Data Out of CockroachDB Using Changefeeds and CREATE CHANGEFEED.

Create a changefeed connected to a cloud storage sink

Warning:

This is an experimental feature. The interface and output are subject to change.

> CREATE CHANGEFEED FOR TABLE name, name2, name3
  INTO 'experimental-s3://host?parameters'
  WITH updated, resolved;
+--------------------+
|       job_id       |
+--------------------+
| 360645287206223873 |
+--------------------+
(1 row)

For more information on how to create a changefeed connected to a cloud storage sink, see Stream Data Out of CockroachDB Using Changefeeds and CREATE CHANGEFEED.
