Use Cloud Storage for Bulk Operations

Warning:

CockroachDB v20.2 is no longer supported. For more details, see the Release Support Policy.

CockroachDB uses the URL provided in a BACKUP, RESTORE, IMPORT, or EXPORT statement to construct a secure API call to the service you specify. The URL structure depends on the type of file storage you are using.

Tip:

We strongly recommend using cloud/remote storage (Amazon S3, Google Cloud Platform, etc.).

URLs for the files you want to back up, restore, import, or export must use the format shown below. For examples, see Example file URLs.

```
[scheme]://[host]/[path]?[parameters]
```

| Location | Scheme | Host | Parameters |
|---|---|---|---|
| Amazon | `s3` | Bucket name | `AUTH` (optional; can be `implicit` or `specified`), `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`. For more information, see Authentication - Amazon S3. |
| Azure | `azure` | Container name (see Example file URLs) | `AZURE_ACCOUNT_KEY`, `AZURE_ACCOUNT_NAME` |
| Google Cloud | `gs` | Bucket name | `AUTH` (optional; can be `default`, `implicit`, or `specified`), `CREDENTIALS`. For more information, see Authentication - Google Cloud Storage. |
| HTTP | `http` | Remote host | N/A. For more information, see Authentication - HTTP. |
| NFS/Local ¹ | `nodelocal` | `nodeID` or `self` ² (see Example file URLs) | N/A |
| S3-compatible services | `s3` | Bucket name | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`, `AWS_REGION` ³ (optional), `AWS_ENDPOINT`. For more information, see Authentication - S3-compatible services. |
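To make the four parts of the URL format concrete, the following Go sketch splits example URLs from this page with the standard library's `net/url` package. The `storageURLParts` type and the `splitStorageURL` helper are hypothetical names used only for illustration; the sketch does not claim to mirror how CockroachDB parses these URLs internally.

```go
package main

import (
	"fmt"
	"net/url"
)

// storageURLParts mirrors the [scheme]://[host]/[path]?[parameters] layout.
// Hypothetical type, for illustration only.
type storageURLParts struct {
	Scheme string     // for example s3, gs, azure, http, or nodelocal
	Host   string     // bucket or container name, remote host, or node ID
	Path   string     // path within the storage location
	Params url.Values // decoded query parameters such as AUTH
}

// splitStorageURL is a hypothetical helper that parses a storage URL
// with Go's standard net/url parser.
func splitStorageURL(raw string) (storageURLParts, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return storageURLParts{}, err
	}
	return storageURLParts{
		Scheme: u.Scheme,
		Host:   u.Host,
		Path:   u.Path,
		Params: u.Query(),
	}, nil
}

func main() {
	p, err := splitStorageURL("s3://acme-co/employees?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Scheme, p.Host, p.Path, p.Params.Get("AWS_ACCESS_KEY_ID"))
	// prints: s3 acme-co /employees 123
}
```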

Tip:

The location parameters often contain special characters that need to be URI-encoded. Use JavaScript's encodeURIComponent function or Go's url.QueryEscape function to URI-encode the parameters. Other languages provide similar functions.

¹ The file system backup location on the NFS drive is relative to the path specified by the --external-io-dir flag set when starting the node. If the flag is set to disabled, imports from local directories and NFS drives are disabled.
² A host is required: either a nodeID or self. With a nodeID, the data files are in the extern directory of the specified node. In most cases (including single-node clusters), nodelocal://1/<path> is sufficient. Use self if you do not want to specify a nodeID; the individual data files will then be in the extern directories of arbitrary nodes, so each node's --external-io-dir flag must point to the same NFS mount or other network-backed, shared storage for this to work correctly.
³ The AWS_REGION parameter is optional because most S3-compatible services do not require it. Specify it only if your S3-compatible service does.

Example file URLs

Example URLs for BACKUP, RESTORE, EXPORT, or changefeeds given a bucket or container name of acme-co and an employees subdirectory:

| Location | Example |
|---|---|
| Amazon S3 | `s3://acme-co/employees?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456` |
| Azure | `azure://acme-co/employees?AZURE_ACCOUNT_NAME=acme-co&AZURE_ACCOUNT_KEY=url-encoded-123` |
| Google Cloud | `gs://acme-co/employees?AUTH=specified&CREDENTIALS=encoded-123` |
| NFS/Local | `nodelocal://1/path/employees`, `nodelocal://self/nfsmount/backups/employees` ² |

Note:

For changefeeds, prefix the URL scheme with experimental- (for example, experimental-s3:// instead of s3://).

Note:

Currently, cloud storage sinks (for changefeeds) only work with JSON and emit newline-delimited JSON files.

Example URLs for IMPORT given a bucket or container name of acme-co and a filename of employees:

| Location | Example |
|---|---|
| Amazon S3 | `s3://acme-co/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456` |
| Azure | `azure://acme-co/employees.sql?AZURE_ACCOUNT_KEY=123&AZURE_ACCOUNT_NAME=acme-co` |
| Google Cloud | `gs://acme-co/employees.sql` |
| HTTP | `http://localhost:8080/employees.sql` |
| NFS/Local | `nodelocal://1/path/employees`, `nodelocal://self/nfsmount/backups/employees` ² |

Note:

HTTP storage can only be used for IMPORT.

Encryption

Transport Layer Security (TLS) is used for encryption in transit when transmitting data to or from Amazon S3, Google Cloud Storage, and Azure.

For encryption at rest, if your cloud provider offers transparent data encryption, you can use that to ensure that your backups are not stored on disk in cleartext.

CockroachDB also provides client-side encryption of backup data. For more information, see Take and Restore Encrypted Backups.

Authentication

Authentication behavior differs by cloud provider:

Amazon S3

If the AUTH parameter is not provided, AWS connections default to specified and the access keys must be provided in the URI parameters.

As an example:

```
BACKUP DATABASE <database> INTO 's3://{bucket name}/{path in bucket}/?AWS_ACCESS_KEY_ID={access key ID}&AWS_SECRET_ACCESS_KEY={secret access key}';
```
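When the keys are passed in the URI, they must be URI-encoded. The following Go sketch assembles an `s3://` URL for the default `specified` mode using `url.Values`, which encodes each value. The helper name `s3BackupURL` and the credential values are hypothetical placeholders.

```go
package main

import (
	"fmt"
	"net/url"
)

// s3BackupURL is a hypothetical helper that builds an s3:// URL for
// AUTH=specified (the default), with the access keys as query parameters.
// url.Values.Encode URI-encodes each value and sorts parameters by key.
func s3BackupURL(bucket, path, keyID, secretKey string) string {
	params := url.Values{}
	params.Set("AWS_ACCESS_KEY_ID", keyID)
	params.Set("AWS_SECRET_ACCESS_KEY", secretKey)
	u := url.URL{
		Scheme:   "s3",
		Host:     bucket,
		Path:     path,
		RawQuery: params.Encode(),
	}
	return u.String()
}

func main() {
	// Placeholder credentials; do not hard-code real keys.
	fmt.Println(s3BackupURL("acme-co", "/employees", "123", "456"))
	// s3://acme-co/employees?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456
}
```

The resulting string can then be placed between the single quotes of a BACKUP statement like the one above.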

If the AUTH parameter is implicit, the access keys can be omitted and the credentials are loaded from the environment, that is, from the machines running the backup.
```
BACKUP DATABASE <database> INTO 's3://{bucket name}/{path}?AUTH=implicit';
```
You can associate an EC2 instance with an IAM role to provide implicit access to the S3 storage allowed by the IAM role's policy. In the following command, the {instance example} EC2 instance is associated with the {example profile} instance profile, which gives the instance implicit access to the S3 buckets that the profile's role permits.
```
aws ec2 associate-iam-instance-profile --iam-instance-profile Name={example profile} --region=us-east-2 --instance-id {instance example}
```

Google Cloud Storage

If the AUTH parameter is set to:

default: GCS connections only use the key provided in the cloudstorage.gs.default.key cluster setting, and will error if not present.
specified: Pass the JSON object for authentication to the CREDENTIALS parameter. The JSON key object needs to be base64-encoded (using the standard encoding in RFC 4648).

To access the storage bucket with specified credentials, it's necessary to create a service account and add the service account address to the permissions on the specific storage bucket. The JSON object for authentication can be downloaded, encoded, and then passed to the CREDENTIALS parameter.
```
BACKUP DATABASE <database> INTO 'gs://{bucket name}/{path}?AUTH=specified&CREDENTIALS={encoded key}';
```
implicit: CockroachDB uses environment data, such as the credentials of the service account associated with the instance, to access resources. This provides implicit access to the storage bucket.
```
BACKUP DATABASE <database> INTO 'gs://{bucket name}/{path}?AUTH=implicit';
```
If AUTH is not provided, CockroachDB uses the key in the cloudstorage.gs.default.key cluster setting if it is set; otherwise, it uses environment data.
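For AUTH=specified, the downloaded JSON key must be base64-encoded with the standard RFC 4648 alphabet before it is passed to CREDENTIALS. Because that alphabet can produce `+`, `/`, and `=`, the sketch below also URI-encodes the result, following this page's general advice on special characters. The helper name `gcsCredentialsParam` and the sample key are hypothetical.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/url"
)

// gcsCredentialsParam is a hypothetical helper that prepares a service
// account JSON key for the CREDENTIALS parameter: standard base64
// (RFC 4648), then URI-encoded so '+', '/', and '=' survive URL parsing.
func gcsCredentialsParam(keyJSON []byte) string {
	b64 := base64.StdEncoding.EncodeToString(keyJSON)
	return url.QueryEscape(b64)
}

func main() {
	// Placeholder key; a real downloaded key file has many more fields
	// and would normally be read from disk.
	key := []byte(`{"type": "service_account"}`)
	fmt.Printf("gs://acme-co/employees?AUTH=specified&CREDENTIALS=%s\n", gcsCredentialsParam(key))
}
```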

Note:

Deprecation notice: Currently, GCS connections default to the cloudstorage.gs.default.key cluster setting. This default behavior will no longer be supported in v21.2. If you are relying on this default behavior, we recommend adjusting your queries and scripts to now specify the AUTH parameter you want to use. Similarly, if you are using the cloudstorage.gs.default.key cluster setting to authorize your GCS connection, we recommend switching to use AUTH=specified or AUTH=implicit. AUTH=specified will be the default behavior in v21.2 and beyond.

Azure Storage

To access Azure storage containers, you may need to URL-encode the account key: it is base64-encoded and can contain +, /, and = characters. For example:

```
BACKUP DATABASE <database> INTO 'azure://{container name}/{path}?AZURE_ACCOUNT_NAME={account name}&AZURE_ACCOUNT_KEY={url-encoded key}';
```
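A minimal Go sketch of building such a URL, assuming the container name and path contain only URL-safe characters. It uses `fmt.Sprintf` rather than `url.Values` so that the parameters stay in the same order as the statement above. The helper name `azureBackupURL` and the account key are hypothetical placeholders.

```go
package main

import (
	"fmt"
	"net/url"
)

// azureBackupURL is a hypothetical helper that builds an azure:// URL with
// the container name as the host. The base64 account key may contain '+',
// '/', or '=', so it is URI-encoded with url.QueryEscape. The container and
// path are assumed to be URL-safe and are inserted as-is.
func azureBackupURL(container, path, accountName, accountKey string) string {
	return fmt.Sprintf("azure://%s/%s?AZURE_ACCOUNT_NAME=%s&AZURE_ACCOUNT_KEY=%s",
		container, path, url.QueryEscape(accountName), url.QueryEscape(accountKey))
}

func main() {
	// Placeholder account key, not a real credential.
	fmt.Println(azureBackupURL("acme-co", "employees", "acme-co", "ab+c/d=="))
	// azure://acme-co/employees?AZURE_ACCOUNT_NAME=acme-co&AZURE_ACCOUNT_KEY=ab%2Bc%2Fd%3D%3D
}
```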

HTTP

If your environment requires an HTTP or HTTPS proxy server for outgoing connections, you can set the standard HTTP_PROXY and HTTPS_PROXY environment variables when starting CockroachDB. You can create your own HTTP server with NGINX. A custom root CA can be appended to the system's default CAs by setting the cloudstorage.http.custom_ca cluster setting, which will be used when verifying certificates from HTTPS URLs.

If you cannot run a full proxy, you can disable external HTTP(S) access (as well as custom HTTP(S) endpoints) when importing by using the --external-io-disable-http flag.

S3-compatible services

A custom root CA can be appended to the system's default CAs by setting the cloudstorage.http.custom_ca cluster setting, which will be used when verifying certificates from an S3-compatible service.
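For S3-compatible services, the AWS_ENDPOINT parameter listed in the table at the top of this page is itself a URL, so its `:` and `/` characters must be URI-encoded, and AWS_REGION is only needed when the service requires it. The Go sketch below shows one way to assemble such a URL; the helper name `s3CompatibleURL`, the endpoint `https://storage.example.com:9000`, and the credentials are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// s3CompatibleURL is a hypothetical helper for S3-compatible services.
// url.Values.Encode URI-encodes every value (including the endpoint URL)
// and sorts parameters by key. AWS_REGION is added only when non-empty.
func s3CompatibleURL(bucket, path, keyID, secretKey, endpoint, region string) string {
	params := url.Values{}
	params.Set("AWS_ACCESS_KEY_ID", keyID)
	params.Set("AWS_SECRET_ACCESS_KEY", secretKey)
	params.Set("AWS_ENDPOINT", endpoint)
	if region != "" {
		params.Set("AWS_REGION", region)
	}
	u := url.URL{Scheme: "s3", Host: bucket, Path: path, RawQuery: params.Encode()}
	return u.String()
}

func main() {
	// Placeholder endpoint and credentials for a hypothetical service.
	fmt.Println(s3CompatibleURL("acme-co", "/employees", "123", "456",
		"https://storage.example.com:9000", ""))
	// s3://acme-co/employees?AWS_ACCESS_KEY_ID=123&AWS_ENDPOINT=https%3A%2F%2Fstorage.example.com%3A9000&AWS_SECRET_ACCESS_KEY=456
}
```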
