Use Cloud Storage for Bulk Operations


CockroachDB constructs a secure API call to the cloud storage specified in a URL passed to one of the following statements: BACKUP, RESTORE, IMPORT, EXPORT, and CREATE CHANGEFEED.

Tip:

We strongly recommend using cloud/remote storage.

URL format

URLs for the files you want to import must use the format shown below. For examples, see Example file URLs.

[scheme]://[host]/[path]?[parameters]

New in v22.2: You can create an external connection to represent an external storage or sink URI. This allows you to specify the external connection's name in statements rather than the provider-specific URI. For detail on using external connections, see the CREATE EXTERNAL CONNECTION page.
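For example, an external connection might be created and then referenced in a backup as follows (a sketch; the connection name backup_bucket and the movr database are placeholders):

CREATE EXTERNAL CONNECTION backup_bucket AS 's3://{bucket name}?AUTH=implicit';

BACKUP DATABASE movr INTO 'external://backup_bucket' AS OF SYSTEM TIME '-10s';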

The following lists the supported storage schemes, along with the host and parameters each accepts:

• Amazon — scheme: s3; host: bucket name. Parameters:
  • AUTH: implicit or specified (default: specified). When using specified, pass the user's AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
  • ASSUME_ROLE (optional): Pass the ARN of the role to assume. Use in combination with AUTH=implicit or specified.
  • AWS_SESSION_TOKEN (optional): For more information, see Amazon's guide on temporary credentials.
  • S3_STORAGE_CLASS (optional): Specify the Amazon S3 storage class for created objects. Note that Glacier Flexible Retrieval and Glacier Deep Archive are not compatible with incremental backups. Default: STANDARD.
• Azure — scheme: azure; host: storage container. Parameters: AZURE_ACCOUNT_NAME, AZURE_ACCOUNT_KEY. You must URL-encode your Azure account key before authenticating to Azure Storage. For more information, see Authentication - Azure Storage.
• Google Cloud — scheme: gs; host: bucket name. Parameters: AUTH (implicit or specified; default: specified); CREDENTIALS. For more information, see Authentication - Google Cloud Storage.
• HTTP — scheme: http; host: remote host. Parameters: N/A. For more information, see Authentication - HTTP.
• NFS/Local (see note 1 below) — scheme: nodelocal; host: nodeID or self (see note 2 and Example file URLs). Parameters: N/A.
• S3-compatible services — scheme: s3; host: bucket name. Parameters: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_REGION (optional; see note 3), AWS_ENDPOINT. For more information, see Authentication - S3-compatible services.

Warning:

While Cockroach Labs actively tests Amazon S3, Google Cloud Storage, and Azure Storage, we do not test S3-compatible services (e.g., MinIO, Red Hat Ceph).
Tip:

The location parameters often contain special characters that need to be URI-encoded. Use JavaScript's encodeURIComponent function or Go's url.QueryEscape function to URI-encode the parameters. Other languages provide similar functions to URI-encode special characters.

Note:

You can disable the use of implicit credentials when accessing external cloud storage services for various bulk operations by using the --external-io-disable-implicit-credentials flag.

Note 1: The file system backup location on the NFS drive is relative to the path specified by the --external-io-dir flag set while starting the node. If the flag is set to disabled, then imports from local directories and NFS drives are disabled.

Note 2: Using a nodeID is required and the data files will be in the extern directory of the specified node. In most cases (including single-node clusters), using nodelocal://1/<path> is sufficient. Use self if you do not want to specify a nodeID, and the individual data files will be in the extern directories of arbitrary nodes; however, to work correctly, each node must have the --external-io-dir flag point to the same NFS mount or other network-backed, shared storage.

Note 3: The AWS_REGION parameter is optional since it is not a required parameter for most S3-compatible services. Specify the parameter only if your S3-compatible service requires it.

Example file URLs

Example URLs for BACKUP, RESTORE, changefeeds, or EXPORT given a bucket or container name of acme-co and an employees subdirectory:

Location Example
Amazon S3 s3://acme-co/employees?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456
Azure azure://acme-co/employees?AZURE_ACCOUNT_NAME=acme-co&AZURE_ACCOUNT_KEY=url-encoded-123
Google Cloud gs://acme-co/employees?AUTH=specified&CREDENTIALS=encoded-123
NFS/Local nodelocal://1/path/employees, nodelocal://self/nfsmount/backups/employees (see note 2)
Note:

Cloud storage sinks (for changefeeds) only work with JSON and emit newline-delimited JSON files.
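
For illustration, an Enterprise changefeed emitting to the Amazon S3 sink shown above might be created like this (a sketch; the employees table is a placeholder):

CREATE CHANGEFEED FOR TABLE employees INTO 's3://acme-co/employees?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456';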

Example URLs for IMPORT given a bucket or container name of acme-co and a filename of employees:

Location Example
Amazon S3 s3://acme-co/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456
Azure azure://acme-co/employees.sql?AZURE_ACCOUNT_NAME=acme-co&AZURE_ACCOUNT_KEY=url-encoded-123
Google Cloud gs://acme-co/employees.sql?AUTH=specified&CREDENTIALS=encoded-123
HTTP http://localhost:8080/employees.sql
NFS/Local nodelocal://1/path/employees, nodelocal://self/nfsmount/backups/employees (see note 2)
Note:

HTTP storage can only be used for IMPORT and CREATE CHANGEFEED.
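
For example, an import that reads a file over HTTP might look like the following (a sketch; the employees table, its columns, and the CSV file name are placeholders):

IMPORT INTO employees (id, name) CSV DATA ('http://localhost:8080/employees.csv');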

Encryption

Transport Layer Security (TLS) is used for encryption in transit when transmitting data to or from Amazon S3, Google Cloud Storage, and Azure.

For encryption at rest, if your cloud provider offers transparent data encryption, you can use that to ensure that your backups are not stored on disk in cleartext.

CockroachDB also provides client-side encryption of backup data. For more information, see Take and Restore Encrypted Backups.
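
For example, a client-side encrypted backup using a passphrase might look like the following (a sketch; the bucket, path, and passphrase are placeholders):

BACKUP DATABASE <database> INTO 's3://{bucket name}/{path}?AUTH=implicit' WITH encryption_passphrase = '{passphrase}';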

Authentication

When running bulk operations to and from a storage bucket, authentication setup can vary depending on the cloud provider. This section details the necessary steps to authenticate to each cloud provider.

Note:

implicit authentication cannot be used to run bulk operations from CockroachDB Cloud clusters—instead, use AUTH=specified.

You can authenticate to Amazon S3 with either specified or implicit authentication. To have users assume IAM roles to complete bulk operations on an S3 bucket, you can also configure assume role authentication in addition to specified or implicit.

Specified authentication

If the AUTH parameter is not provided, AWS connections default to specified and the access keys must be provided in the URI parameters.

As an example:

BACKUP DATABASE <database> INTO 's3://{bucket name}/{path in bucket}/?AWS_ACCESS_KEY_ID={access key ID}&AWS_SECRET_ACCESS_KEY={secret access key}';

Implicit authentication

If the AUTH parameter is implicit, the access keys can be omitted and the credentials will be loaded from the environment (i.e., the machines running the backup).

Note:

New in v22.2: You can grant a user the EXTERNALIOIMPLICITACCESS system privilege.

BACKUP DATABASE <database> INTO 's3://{bucket name}/{path}?AUTH=implicit';
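
To grant the EXTERNALIOIMPLICITACCESS system privilege mentioned in the note above, a statement like the following might be used (a sketch; maria is a placeholder user):

GRANT SYSTEM EXTERNALIOIMPLICITACCESS TO maria;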

You can associate an EC2 instance with an IAM role to provide implicit access to S3 storage within the IAM role's policy. In the following command, the {instance example} EC2 instance is associated with the {example profile} instance profile, giving the EC2 instance implicit access to any S3 buckets that the {example profile} role's policy allows.

aws ec2 associate-iam-instance-profile --iam-instance-profile Name={example profile} --region={us-east-2} --instance-id {instance example}

Assume role authentication

Note:

CockroachDB supports assume role authentication on clusters running v22.2. Authenticating to cloud storage with ASSUME_ROLE on clusters running versions v22.1 and earlier, or mixed versions, is not supported and will result in failed bulk operations.

New in v22.2: To limit access to your Amazon S3 buckets, you can create IAM roles for users to assume. IAM roles do not have an association to a particular user. The role contains permissions that define the operations a user (or Principal) can complete. An IAM user can then assume a role to undertake a CockroachDB backup, restore, import, etc. As a result, the IAM user only has access to the assigned role, rather than having unlimited access to an S3 bucket.

Tip:

Role assumption applies the principle of least privilege rather than directly providing privilege to a user. Creating IAM roles to manage access to AWS resources is Amazon's recommended approach compared to giving access straight to IAM users.

The following section demonstrates setting up assume role authentication between two users. You can also chain an arbitrary number of roles; see the Role chaining section for additional detail.

Set up AWS assume role authentication

For example, to configure a user to assume an IAM role that allows a bulk operation to an Amazon S3 bucket, take the following steps:

  1. Create a role that contains a policy to interact with the S3 buckets depending on the operation your user needs to complete. See the Storage permissions section for details on the minimum permissions each CockroachDB bulk operation requires. You can create an IAM role in Amazon's Management console, under the IAM and then Roles menu. Alternatively, you can use the AWS CLI.

  2. If you do not already have the user that needs to assume the role, create the user. Under IAM in the Amazon console, navigate to Users and Add users. You can then add the necessary permissions by clicking on the Permissions tab. Ensure that the IAM user has sts:AssumeRole permissions attached. The following policy will give the user assume role permissions:

    {
        "Version": "2012-10-17",
        "Statement": {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::{account ID}:role/{role name}"
        }
    }
    

    The Resource here is the Amazon Resource Name (ARN) of the role you created in step 1. You can copy this from the role's Summary page.

    The sts:AssumeRole permission allows the user to obtain a temporary set of security credentials that gives them access to an S3 bucket to which they would not have access with their user-based permissions.

    AWS user summary page showing the JSON policy in place

  3. Return to your IAM role's Summary page, and click on the Trust Relationships tab. Add a trust policy into the role, which will define the users that can assume the role.

    The following trust policy provides the user the privilege to assume the role:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::123456789123:user/{user}"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }
    

    When creating a trust policy consider the following:

    • In the trust policy you need to include the ARN of the user that you want to assume the role under Principal. You can also include the Condition attribute to further control access to the Amazon S3 bucket. For example, this could limit the operation to a specified date range, to users with multi-factor authentication enabled, or to specific IP addresses.
    • If you set the Principal ARN to root, this will allow any IAM user in the account with the AssumeRole permission to access the Amazon S3 bucket as per the defined IAM role permissions.
    • When the IAM user takes on the role to perform a bulk operation, they are temporarily granted the permissions contained in the role, rather than the permissions specified in their user profile.
  4. Run the bulk operation. If using specified authentication, pass in the S3 bucket's URL with the IAM user's AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. If using implicit authentication, specify AUTH=IMPLICIT instead. To assume the role, pass the assumed role's ARN, which you can copy from the IAM role's summary page:

    BACKUP DATABASE movr INTO 's3://{bucket name}?AWS_ACCESS_KEY_ID={user key}&AWS_SECRET_ACCESS_KEY={user secret key}&ASSUME_ROLE=arn:aws:iam::{account ID}:role/{role name}' AS OF SYSTEM TIME '-10s';
    

    CockroachDB also supports authentication for assuming roles when taking encrypted backups. To use with an encrypted backup, pass the ASSUME_ROLE parameter to the KMS URI as well as to the bucket's URI:

    BACKUP INTO 's3://{bucket name}?AWS_ACCESS_KEY_ID={user key}&AWS_SECRET_ACCESS_KEY={user secret key}&ASSUME_ROLE={ARN}' WITH kms = 'aws:///{key}?AWS_ACCESS_KEY_ID={user key}&AWS_SECRET_ACCESS_KEY={user secret key}&REGION={region}&ASSUME_ROLE={ARN}';
    

    For more information on AWS KMS URI formats, see Take and Restore Encrypted Backups.

AWS role chaining

Role chaining allows a user to assume a role through an intermediate role(s) instead of the user directly assuming a role. In this way, the role chain passes the request for access to the final role in the chain. Role chaining could be useful when a third-party organization needs access to your Amazon S3 bucket to complete a bulk operation. Or, your organization could grant roles based on limited-privilege levels.

Assuming the role follows the same approach outlined in the previous section. The additional required step to chain roles is to ensure that the ARN of role A, which is assuming role B, is present in role B's trust policy with the sts:AssumeRole action.

Role B's trust policy must contain:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{account-A-ID}:role/{role A name}"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

In a chain of three roles, role C's trust policy needs to include role B in the same way. For example, to chain three roles so that a user could assume role C, it is necessary to verify the following:

User → Role A → Role B → Role C

• User: has permission to assume role A (see step 2).
• Role A: has a trust policy that permits the user to assume role A (see step 3); needs permission to assume role B.
• Role B: has a trust policy that permits role A to assume role B; needs permission to assume role C.
• Role C: has a trust policy that permits role B to assume role C.

When passing a chained role into BACKUP, it will follow this pattern:

BACKUP DATABASE movr INTO 's3://{bucket name}?AWS_ACCESS_KEY_ID={user key}&AWS_SECRET_ACCESS_KEY={user secret key}&ASSUME_ROLE={role A ARN},{role B ARN},{role C ARN}' AS OF SYSTEM TIME '-10s';

Each chained role is listed separated by a , character. You can copy the ARN of the role from its summary page.

AWS workload identity

With a CockroachDB cluster deployed on Kubernetes, you can allow your pods to authenticate as an IAM role that you have associated to a Kubernetes service account. You can then use assume role authentication to allow that IAM role to assume another role that has permissions to perform bulk operations to an S3 bucket.

This means that a CockroachDB node will only be able to access credentials for the IAM role associated with the Kubernetes service account.

You can use workload identities with assume role authentication to run bulk operations.

To use assume role authentication, you will need at least two IAM roles:

  • An identity role: the IAM role you have associated with your Kubernetes service account.
  • An operation role: the IAM role to be assumed. This contains the permissions required to complete a CockroachDB operation.

Set up AWS workload identity

First, create an IAM role for your Kubernetes service account to assume, and then configure your CockroachDB pods to use the service account. We will refer to this IAM role as an "identity role". You can complete all of these steps with Amazon's guide on IAM roles for service accounts.

Once you have an identity role that your CockroachDB nodes can assume, you can configure the identity role to assume another IAM role that contains the permissions to perform a bulk operation.

  1. Copy the ARN of the identity role. In the Amazon management console, click on IAM, then Roles, and select the name of your identity role. From the Summary page, copy your ARN. You will need this later when configuring the Trust Policy for the IAM role to be assumed.

    Role summary page showing the ARN copied

  2. Create or open the operation role that your identity role will assume.

    Note:

    If you already have the role that contains permissions for the bulk operation, ensure that you add the identity role ARN to the role's Trust Relationships tab on the Summary page.

    a. To create a role, click Create Role under the Roles menu. Select Custom trust policy and then add the ARN of your identity role to the JSON by clicking Principal. This will open a dialog box. Select IAM Roles for Principal Type and paste the ARN. Click Add Principal and then Next.

    Dialog box to add principal with IAM roles selected

    b. On the Add Permissions page, search for the permission policies that the role will need to complete the bulk operation.

    Filter list to add permissions to IAM roles

    Or, use the Create Policy button to define the required permissions. You can use the visual editor to select the service, actions, and resources.

    Using the visual editor to define S3 service and S3 actions.

    Or, use the JSON tab to specify the policy. For the JSON editor, see Storage Permissions for an example and detail on the minimum permissions required for each operation to complete. Click Next.

    c. Finally, give the role a name on the Name, review, and create page. The following screenshot shows the selected trust policy and permissions:

    Final screen in the create role process to review permissions and name role

  3. To run the bulk operation, you can use implicit authentication for your identity role and pass the ASSUME_ROLE parameter for your operation role. For a backup to Amazon S3:

    BACKUP DATABASE {database} INTO 's3://{bucket name}/{path}?AUTH=implicit&ASSUME_ROLE=arn:aws:iam::{account ID}:role/{operation role name}' AS OF SYSTEM TIME '-10s';
    

    In this SQL statement, AUTH=implicit uses the identity role to authenticate to the S3 bucket. The identity role then assumes the operation role that has permission to write a backup to the S3 bucket.

To authenticate to Google Cloud Storage, the AUTH parameter passed to the file URL must be set to either specified or implicit. In v21.2+, the default behavior is specified. The following sections describe how to set up each authentication method.

Specified authentication

To access the storage bucket with specified credentials, it's necessary to create a service account and add the service account address to the permissions on the specific storage bucket.

The JSON credentials file for authentication can be downloaded from the Service Accounts page in the Google Cloud Console and then base64-encoded:

cat gcs_key.json | base64

Pass the encoded JSON object to the CREDENTIALS parameter:

BACKUP DATABASE <database> INTO 'gs://{bucket name}/{path}?AUTH=specified&CREDENTIALS={encoded key}';

Implicit authentication

For CockroachDB instances running within a Google Cloud environment, the environment's service account can be used to implicitly access resources within the storage bucket.

Note:

New in v22.2: You can grant a user the EXTERNALIOIMPLICITACCESS system privilege.

For CockroachDB clusters running in other environments, implicit authentication access can still be set up manually with the following steps:

  1. Create a service account and add the service account address to the permissions on the specific storage bucket.

  2. Download the JSON credentials file from the Service Accounts page in the Google Cloud Console to the machines that CockroachDB is running on. (Since this file will be passed as an environment variable, it does not need to be base64-encoded.) Ensure that the file is located in a path that CockroachDB can access.

  3. Create an environment variable instructing CockroachDB where the credentials file is located. The environment variable must be exported on each CockroachDB node:

    export GOOGLE_APPLICATION_CREDENTIALS="/{cockroach}/gcs_key.json"
    

    Alternatively, to pass the credentials using systemd, use systemctl edit cockroach.service to add the environment variable Environment="GOOGLE_APPLICATION_CREDENTIALS=gcs-key.json" under [Service] in the cockroach.service unit file. Then, run systemctl daemon-reload to reload the systemd process. Restart the cockroach process on each of the cluster's nodes with systemctl restart cockroach, which will reload the configuration files.

    To pass the credentials using code, see Google's Authentication documentation.

  4. Run a backup (or other bulk operation) to the storage bucket with the AUTH parameter set to implicit:

    BACKUP DATABASE <database> INTO 'gs://{bucket name}/{path}?AUTH=implicit';
    
Note:

If the use of implicit credentials is disabled with the --external-io-disable-implicit-credentials flag, an error will be returned when you access external cloud storage for bulk operations using AUTH=implicit.

Assume role authentication

Note:

CockroachDB supports assume role authentication on clusters running v22.2. Authenticating to cloud storage with ASSUME_ROLE on clusters running versions v22.1 and earlier, or mixed versions, is not supported and will result in failed bulk operations.

New in v22.2: To limit access to your Google Cloud Storage buckets, you can create service accounts for another service account to assume. Service accounts do not necessarily have an association to a particular user. The service account contains permissions that define the operations a user, who has access to the service account, can complete. A service account can then assume another service account to undertake a CockroachDB backup, restore, import, etc. As a result, a service account with limited privileges only has access to the roles of the assumed service account, rather than having unlimited access to a GCS bucket.

The access is also limited by the generated short-lived credentials. The service account/role that is assuming another role will issue the request for the short-lived credentials. If there are multiple roles in the chain, then each role defined in the chain will issue the request for credentials for the next role in the chain.

The following section demonstrates setting up assume role authentication between two service accounts, A and B. You can also chain an arbitrary number of roles; see the Role chaining section for additional detail.

Set up Google Cloud assume role authentication

In the following example, we will configure service account A to assume service account B. In this way, service account A will be able to assume the role of service account B to complete a bulk operation to a GCS bucket.

For this example, both service accounts have already been created. If you need to create your own service accounts, see Google Cloud's Creating and managing service accounts page.

  1. First, you'll create a role that contains a policy to interact with the Google Cloud Storage bucket depending on the bulk operation your user needs to complete. This role will be attached to service account B so that service account A can assume it.

    • In Google's Cloud console, click IAM & Admin, Roles, and then Create Role.
    • Add a title for the role and then click Add Permissions. Filter for the permissions required for the bulk operation. For example, if you want to enable service account B to run a changefeed, your role will include the storage.objects.create permission. See the Storage permissions section on this page for details on the minimum permissions each CockroachDB bulk operation requires.

    Adding permissions to a changefeed role when creating a role.

    Tip:

    Alternatively, you can use the gcloud CLI to create roles.

  2. The service account that will be assumed (B in this case) must be granted access to the storage bucket with the role assigned from step 1.

    • Go to the Cloud Storage menu and select the bucket. In the bucket's menu, click Grant Access.
    • Add the service account to the Add principals box and select the name of the role you created in step 1 under Assign roles.

    Adding service account with the created role to the bucket.

  3. Next, service account B needs the "Service Account Token Creator" role for service account A. This enables service account B to create short-lived tokens for A.

    • Go to the Service Accounts menu in the Google Cloud Console.
    • Select service account B from the list, then the Permissions tab, and click Grant Access under Principals with access to this service account.
    • Enter the name of service account A into the New principals box and select "Service Account Token Creator" under the Assign roles dropdown. Click Save to complete.

    Granting service account A access to service account B with the token creator role.

  4. Finally, you will run the bulk operation from your CockroachDB cluster. If you're using specified authentication, pass in the GCS bucket's URL with the IAM user's CREDENTIALS. If you're using implicit authentication, specify AUTH=IMPLICIT instead. To assume the role, pass the assumed role's service account name, which you can copy from the Service Accounts page:

    BACKUP DATABASE <database> INTO 'gs://{bucket name}/{path}?AUTH=implicit&ASSUME_ROLE={service account name}@{project name}.iam.gserviceaccount.com';
    

    CockroachDB also supports authentication for assuming roles when taking encrypted backups. To use with an encrypted backup, pass the ASSUME_ROLE parameter to the KMS URI as well as to the bucket's URI:

    BACKUP DATABASE <database> INTO 'gs://{bucket name}/{path}?AUTH=implicit&ASSUME_ROLE={service account name}@{project name}.iam.gserviceaccount.com' WITH kms = 'gs:///projects/{project name}/locations/us-east1/keyRings/{key ring name}/cryptoKeys/{key name}?AUTH=IMPLICIT&ASSUME_ROLE={service account name}@{project name}.iam.gserviceaccount.com';
    

    For more information on Google Cloud Storage KMS URI formats, see Take and Restore Encrypted Backups.

    Note:

    CockroachDB supports assume role authentication for changefeeds emitting to Google Cloud Pub/Sub sinks. The process to set up assume role for Pub/Sub works in a similar way, except that you will provide the final service account with the "Pub/Sub Editor" role at the project level. See the Changefeed Sinks page for more detail on the Pub/Sub sink.

Google Cloud role chaining

Role chaining allows a service account to assume a role through an intermediate service account(s) instead of the service account directly assuming a role. In this way, the role chain passes the request for access to the final role in the chain. Role chaining could be useful when a third-party organization needs access to your Google Cloud Storage bucket to complete a bulk operation. Or, your organization could grant roles based on limited-privilege levels.

Following from the previous setup section, if you want to add an intermediate account to the chain of roles, it is necessary to ensure each service account has granted the "Service Account Token Creator" role to the previous account in the chain. See step 3 in the previous section to add this role on a service account.

In a chain of three roles, A, B, C:

  • Service Account A: the credentials included with AUTH=implicit or specified.
  • Service Account B (intermediate account): grants access to A with the Service Account Token Creator role.
  • Service Account C (final account): grants access to B with the Service Account Token Creator role, and has access to the resource (e.g., the storage bucket).

The request for credentials flows through the chain as follows:

  • The initial account (A) requests permissions from account B.
  • The intermediate account (B) will delegate the request to account C.
  • The final service account (C) will request the credentials that account A requires.

When passing a chained role into BACKUP, it will follow this pattern with each chained role separated by a , character:

BACKUP DATABASE <database> INTO 'gs://{bucket name}/{path}?AUTH=implicit&ASSUME_ROLE={intermediate service account name}@{project name}.iam.gserviceaccount.com,{final service account name}@{project name}.iam.gserviceaccount.com' AS OF SYSTEM TIME '-10s';

Google Cloud workload identity

With a CockroachDB cluster deployed on Kubernetes, you can allow your pods to authenticate as an IAM service account that you have associated to a Kubernetes service account. You can then use assume role authentication to allow the IAM service account to assume another service account that has permissions to perform bulk operations to a Google Cloud Storage bucket.

This means that a CockroachDB node will only be able to access credentials for the IAM service account associated with the Kubernetes service account.

You can use workload identities with assume role authentication to run bulk operations.

Note:

Service accounts in Google and Kubernetes refer to different resources. See Google's documentation for definitions.

To use assume role authentication, you will need at least two IAM service accounts:

  • An identity service account: the IAM service account you have associated with your Kubernetes service account.
  • An operation service account: the IAM service account to be assumed. This contains the permissions required to complete a CockroachDB operation.

Set up Google Cloud workload identity

Before completing the steps to run a bulk operation with assume role, it is necessary to create an identity service account for your Kubernetes service account to assume. Then, you must configure your CockroachDB pods to use the Kubernetes service account. You can complete all of these steps with Google's guide Use Workload Identity.

Once you have an identity service account that your CockroachDB nodes can assume, you can configure the identity service account to assume another service account that contains the permissions to perform the bulk operation.

  1. Copy the service account name of the identity service account. In the Google Cloud Console, navigate to the IAM section, Service Accounts, and then the name of your identity service account. From the list view, copy the name of your identity service account. You will need to add this to the operation service account to be assumed.

  2. Create or open the operation service account that your identity service account will assume.

    a. To create a service account, click Create Service Account under the Service Accounts menu. Enter a name for the service account and click Create and Continue.

    b. In the Grant this service account access to project section, select the role you require for the bulk operation, e.g., "Storage Object Creator". See Storage Permissions for detail on the minimum permissions required for each operation to complete. Click Continue.

    Adding the workload identity role to the service account users role box

    c. In the Grant users access to this service account section, paste the name of the identity service account. Then, click Done.

    Adding the workload identity role to the service account users role box

    Note:

    If you already have the service account that contains permissions for the bulk operation, ensure that you give the identity service account access to this service account. Click on your service account and navigate to the Permissions tab. Then, use the process in step 2 to complete this.

  3. To run the bulk operation, you can use implicit authentication for your identity service account and pass the ASSUME_ROLE parameter for your operation service account. For a backup to your Google Cloud Storage bucket:

    BACKUP DATABASE {database} INTO 'gs://{bucket name}/{path}?AUTH=implicit&ASSUME_ROLE={operation service account}@{project name}.iam.gserviceaccount.com' AS OF SYSTEM TIME '-10s';
    

    In this SQL statement, AUTH=implicit uses the workload identity service account to authenticate to the bucket. The workload identity role then assumes the operation service account that has permission to write a backup to the bucket.

To access Azure storage containers, it is necessary to URL-encode the account key since it is base64-encoded and may contain +, /, and = characters. For example:

BACKUP DATABASE <database> INTO 'azure://{container name}/{path}?AZURE_ACCOUNT_NAME={account name}&AZURE_ACCOUNT_KEY={url-encoded key}';

If your environment requires an HTTP or HTTPS proxy server for outgoing connections, you can set the standard HTTP_PROXY and HTTPS_PROXY environment variables when starting CockroachDB. You can create your own HTTP server with NGINX. A custom root CA can be appended to the system's default CAs by setting the cloudstorage.http.custom_ca cluster setting, which will be used when verifying certificates from HTTPS URLs.
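
For example, appending a custom root CA might look like the following, assuming the cluster setting accepts the PEM-encoded certificate contents as a string (a sketch; the certificate contents are a placeholder):

SET CLUSTER SETTING cloudstorage.http.custom_ca = '-----BEGIN CERTIFICATE-----
{certificate contents}
-----END CERTIFICATE-----';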

If you cannot run a full proxy, you can disable external HTTP(S) access (as well as custom HTTP(S) endpoints) when importing by using the --external-io-disable-http flag.

Warning:

While Cockroach Labs actively tests Amazon S3, Google Cloud Storage, and Azure Storage, we do not test S3-compatible services (e.g., MinIO, Red Hat Ceph).

A custom root CA can be appended to the system's default CAs by setting the cloudstorage.http.custom_ca cluster setting, which will be used when verifying certificates from an S3-compatible service.

Storage permissions

This section describes the minimum permissions required to run CockroachDB bulk operations. While we provide the required permissions for Amazon S3 and Google Cloud Storage, the provider's documentation provides detail on the setup process and different options regarding access management.

Depending on the actions a bulk operation performs, it will require different access permissions to a cloud storage bucket.

The following outlines the actions that each operation performs against the storage bucket:

Backup:
• Write: Backups write the backup data to the bucket/container. During a backup job, a BACKUP CHECKPOINT file will be written that tracks the progress of the backup.
• Get: Backups need get access after a pause to read the checkpoint files on resume.
• List: Backups need list access to the files already in the bucket. For example, BACKUP uses list to find previously taken backups when executing an incremental backup and to find the latest checkpoint file.
• Delete (optional): To clean up BACKUP CHECKPOINT files that the backup job has written, you need to also include a delete permission in your bucket policy (e.g., s3:DeleteObject). However, delete is not necessary for backups to complete successfully in v22.1 and later.

Restore:
• Get: Restores need access to retrieve files from the backup. Restore also requires access to the LATEST file in order to read the latest available backup.
• List: Restores need list access to the files already in the bucket to find other backups in the backup collection. This contains metadata files that describe the backup, the LATEST file, and other versioned subdirectories and files.

Import:
• Get: Imports read the requested file(s) from the storage bucket.

Export:
• Write: Exports need write access to the storage bucket to create individual export file(s) from the exported data.

Enterprise changefeeds:
• Write: Changefeeds will write files to the storage bucket that contain row changes and resolved timestamps.

These actions are the minimum access permissions to be set in an Amazon S3 bucket policy:

Operation S3 permission
Backup s3:PutObject, s3:GetObject, s3:ListBucket
Restore s3:GetObject, s3:ListBucket
Import s3:GetObject
Export s3:PutObject
Enterprise Changefeeds s3:PutObject

See Policies and Permissions in Amazon S3 for detail on setting policies and permissions in Amazon S3.

An example S3 bucket policy for a backup:

{
    "Version": "2012-10-17",
    "Id": "Example_Policy",
    "Statement": [
        {
            "Sid": "ExampleStatement01",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::{ACCOUNT_ID}:user/{USER}"
            },
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{BUCKET_NAME}",
                "arn:aws:s3:::{BUCKET_NAME}/*"
            ]
        }
    ]
}

In Google Cloud Storage, you can grant users roles that define their access level to the storage bucket. For the purposes of running CockroachDB operations to your bucket, the following table lists the permissions that represent the minimum level required for each operation. GCS provides different levels of granularity for defining the roles in which these permissions reside. You can assign roles that already have these permissions configured, or make your own custom roles that include these permissions.

For more detail about Predefined, Basic, and Custom roles, see IAM roles for Cloud Storage.

Operation GCS Permission
Backup storage.objects.create, storage.objects.get, storage.objects.list
Restore storage.objects.get, storage.objects.list
Import storage.objects.get
Export storage.objects.create
Changefeeds storage.objects.create

For guidance on adding a user to a bucket's policy, see Add a principal to a bucket-level policy.

Additional cloud storage feature support

Object locking

Delete and overwrite permissions are not required. To complete a backup successfully, BACKUP requires read and write permissions to cloud storage buckets. As a result, you can write backups to cloud storage buckets with object locking enabled. This allows you to store backup data using a write-once-read-many (WORM) model, which refers to storage that prevents any kind of deletion or modification to the objects once written.

Note:

We recommend enabling object locking in cloud storage buckets to protect the validity of a backup for restores.

For specific cloud-storage provider documentation, see the following:

Amazon S3 storage classes

When storing objects in Amazon S3 buckets during backups, exports, and changefeeds, you can specify the S3_STORAGE_CLASS={class} parameter in the URI to configure a storage class type.

The following S3 connection URI uses the INTELLIGENT_TIERING storage class:

's3://{BUCKET NAME}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}&S3_STORAGE_CLASS=INTELLIGENT_TIERING'
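
Used in a backup statement, the parameter might look like the following (a sketch; the movr database is a placeholder):

BACKUP DATABASE movr INTO 's3://{BUCKET NAME}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}&S3_STORAGE_CLASS=INTELLIGENT_TIERING' AS OF SYSTEM TIME '-10s';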

While Cockroach Labs supports configuring an AWS storage class, we only test against S3 Standard. We recommend implementing your own testing with other storage classes.

Note:

Incremental backups are not compatible with the S3 Glacier Flexible Retrieval or Glacier Deep Archive storage classes. Incremental backups require ad-hoc reading of previous backups, which is not possible with the Glacier Flexible Retrieval or Glacier Deep Archive storage classes as they do not allow immediate access to S3 objects without first restoring the objects. See Amazon's documentation on Restoring an archived object for more detail.

This table lists the valid CockroachDB parameters that map to an S3 storage class:

CockroachDB parameter AWS S3 storage class
STANDARD S3 Standard
REDUCED_REDUNDANCY Reduced redundancy Note: Amazon recommends against using this storage class.
STANDARD_IA Standard Infrequent Access
ONEZONE_IA One Zone Infrequent Access
INTELLIGENT_TIERING Intelligent Tiering
GLACIER Glacier Flexible Retrieval
DEEP_ARCHIVE Glacier Deep Archive
OUTPOSTS Outpost
GLACIER_IR Glacier Instant Retrieval

You can view an object's storage class in the Amazon S3 Console from the object's Properties tab. Alternatively, use the AWS CLI to list objects in a bucket, which will also display the storage class:

aws s3api list-objects-v2 --bucket {bucket-name}
{
    "Key": "2022/05/02-180752.65/metadata.sst",
    "LastModified": "2022-05-02T18:07:54+00:00",
    "ETag": "\"c0f499f21d7886e4289d55ccface7527\"",
    "Size": 7865,
    "StorageClass": "STANDARD"
},
...
{
    "Key": "2022-05-06/202205061217256387084640000000000-1b4e610c63535061-1-2-00000000-users-7.ndjson",
    "LastModified": "2022-05-06T12:17:26+00:00",
    "ETag": "\"c60a013619439bf83c505cb6958b55e2\"",
    "Size": 94596,
    "StorageClass": "INTELLIGENT_TIERING"
},

For a specific operation, see the following examples:

See also

