Data Domiciling with CockroachDB

As you scale your usage of multi-region clusters, you may need to keep certain subsets of data in specific localities. Keeping specific data on servers in specific geographic locations is also known as data domiciling. CockroachDB has basic support for data domiciling in multi-region clusters using the ALTER DATABASE ... ADD SUPER REGION statement.

Using CockroachDB as part of your approach to data domiciling has several limitations. For more information, see Known limitations.

Overview

This page has instructions for data domiciling in multi-region clusters. At a high level, this process involves:

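Controlling the placement of specific row or table data using regional tables with the REGIONAL BY ROW and REGIONAL BY TABLE clauses.
Further restricting where the data in those regional tables is stored using the ALTER DATABASE ... ADD SUPER REGION statement, which creates a set of database regions such that regional tables and regional by row tables whose home regions are in the super region will have all of their replicas stored only in regions that are members of the super region. For more information, see the super regions documentation.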

An alternative to the ALTER DATABASE ... ADD SUPER REGION statement is the ALTER DATABASE ... PLACEMENT RESTRICTED statement. PLACEMENT RESTRICTED is not recommended, and is documented for backwards compatibility. Most users should use ADD SUPER REGION, which allows for region survival as well as providing data placement.

Before you begin

This page assumes you are already familiar with:

CockroachDB’s multi-region SQL abstractions. If you are not using them, the instructions on this page will not apply.
The fact that CockroachDB stores your data in ranges, each of which is replicated across nodes.

Example

In the following example, you will go through the process of configuring the MovR data set using multi-region SQL abstractions. Then, as part of implementing a data domiciling strategy, you will apply restricted replica settings in Step 4 using the ALTER DATABASE ... ADD SUPER REGION statement, which works with databases that have zone survival goals or region survival goals. If you need region survival goals, you must use super regions. Finally, you will verify that the resulting replica placements are as expected using the critical nodes status endpoint. For the purposes of this example, the data domiciling requirement is to configure a multi-region deployment of the MovR database such that data for EU-based users, vehicles, etc. is stored on CockroachDB nodes running in EU localities.

Step 1. Start a simulated multi-region cluster

Use the following command to start the cluster. This particular combination of flags results in a demo cluster of 9 nodes, with 3 nodes in each region. It sets the appropriate node localities and also simulates the network latency that would occur between nodes in these localities. For more information about each flag, see the cockroach demo documentation, especially for --global.

$ cockroach demo --global --nodes 9

When the cluster starts, you’ll see a message like the one shown below, followed by a SQL prompt. Note the URLs for:

Viewing the DB Console: http://127.0.0.1:8080/demologin?password=demo30570&username=demo.
Connecting to the database from a SQL shell or a programming language: postgresql://demo:demo30570@127.0.0.1:26257/movr?sslmode=require&sslrootcert=%2FUsers%2Frloveland%2F.cockroach-demo%2Fca.crt.

# Welcome to the CockroachDB demo database!
#
# You are connected to a temporary, in-memory CockroachDB cluster of 9 nodes.
# Communication between nodes will simulate real world latencies.
#
# WARNING: the use of --global is experimental. Some features may not work as expected.
#
# This demo session will send telemetry to Cockroach Labs in the background.
# To disable this behavior, set the environment variable
# COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING=true.
#
# Beginning initialization of the movr dataset, please wait...
#
# The cluster has been preloaded with the "movr" dataset
# (MovR is a fictional vehicle sharing company).
#
# Reminder: your changes to data stored in the demo session will not be saved!
#
# If you wish to access this demo cluster using another tool, you will need
# the following details:
#
#   - Connection parameters:
#      (webui)    http://127.0.0.1:8080/demologin?password=demo30570&username=demo
#      (cli)      cockroach sql --certs-dir=/Users/rloveland/.cockroach-demo -u demo -d movr
#      (sql)      postgresql://demo:demo30570@127.0.0.1:26257/movr?sslmode=require&sslrootcert=%2FUsers%2Frloveland%2F.cockroach-demo%2Fca.crt
#
#   To display connection parameters for other nodes, use \demo ls.
#   - Username: "demo", password: "demo30570"
#   - Directory with certificate files (for certain SQL drivers/tools): /Users/rloveland/.cockroach-demo
#
# You can enter \info to print these details again.
#
# Server version: CockroachDB CCL v23.1.2 (x86_64-apple-darwin19, built 2023/05/25 16:10:39, go1.19.4) (same version as client)
# Cluster ID: 21c6756f-7e7e-4990-863a-cbd99e6f737a
# Organization: Cockroach Demo
#
# Enter \? for a brief introduction.

You now have a cluster running across 9 nodes, with 3 nodes each in the following regions:

us-east1
us-west1
europe-west1

You can verify this using the SHOW REGIONS statement:

SHOW REGIONS;

     region    |  zones  | database_names | primary_region_of | secondary_region_of
---------------+---------+----------------+-------------------+----------------------
  europe-west1 | {b,c,d} | {}             | {}                | {}
  us-east1     | {b,c,d} | {}             | {}                | {}
  us-west1     | {a,b,c} | {}             | {}                | {}
(3 rows)

Step 2. Apply multi-region SQL abstractions

Execute the following statements to set the database regions. This information is necessary so that CockroachDB can later move data around to optimize access to particular data from particular regions.

ALTER DATABASE movr PRIMARY REGION "europe-west1";
ALTER DATABASE movr ADD REGION "us-east1";
ALTER DATABASE movr ADD REGION "us-west1";

Because the data in promo_codes is not updated frequently (that is, it is “read-mostly”) and needs to be available from any region, the right table locality is GLOBAL.

ALTER TABLE promo_codes SET LOCALITY GLOBAL;

Next, alter the user_promo_codes table to have a foreign key into the global promo_codes table. This will enable fast reads of the promo_codes.code column from any region in the cluster.

ALTER TABLE user_promo_codes
  ADD CONSTRAINT user_promo_codes_code_fk
    FOREIGN KEY (code)
    REFERENCES promo_codes (code)
    ON UPDATE CASCADE;

All of the tables except promo_codes contain rows that are partitioned by region and updated very frequently. For these tables, the right table locality for optimizing access to their data is REGIONAL BY ROW. Apply this table locality to the remaining tables. These statements use a CASE expression to put data for a given city in the right region, and can take around 1 minute to complete for each table.

rides

ALTER TABLE rides ADD COLUMN region crdb_internal_region AS (
  CASE WHEN city = 'amsterdam' THEN 'europe-west1'
       WHEN city = 'paris' THEN 'europe-west1'
       WHEN city = 'rome' THEN 'europe-west1'
       WHEN city = 'new york' THEN 'us-east1'
       WHEN city = 'boston' THEN 'us-east1'
       WHEN city = 'washington dc' THEN 'us-east1'
       WHEN city = 'san francisco' THEN 'us-west1'
       WHEN city = 'seattle' THEN 'us-west1'
       WHEN city = 'los angeles' THEN 'us-west1'
  END
) STORED;
ALTER TABLE rides ALTER COLUMN REGION SET NOT NULL;
ALTER TABLE rides SET LOCALITY REGIONAL BY ROW AS "region";

user_promo_codes

ALTER TABLE user_promo_codes ADD COLUMN region crdb_internal_region AS (
  CASE WHEN city = 'amsterdam' THEN 'europe-west1'
       WHEN city = 'paris' THEN 'europe-west1'
       WHEN city = 'rome' THEN 'europe-west1'
       WHEN city = 'new york' THEN 'us-east1'
       WHEN city = 'boston' THEN 'us-east1'
       WHEN city = 'washington dc' THEN 'us-east1'
       WHEN city = 'san francisco' THEN 'us-west1'
       WHEN city = 'seattle' THEN 'us-west1'
       WHEN city = 'los angeles' THEN 'us-west1'
  END
) STORED;
ALTER TABLE user_promo_codes ALTER COLUMN REGION SET NOT NULL;
ALTER TABLE user_promo_codes SET LOCALITY REGIONAL BY ROW AS "region";

users

ALTER TABLE users ADD COLUMN region crdb_internal_region AS (
  CASE WHEN city = 'amsterdam' THEN 'europe-west1'
       WHEN city = 'paris' THEN 'europe-west1'
       WHEN city = 'rome' THEN 'europe-west1'
       WHEN city = 'new york' THEN 'us-east1'
       WHEN city = 'boston' THEN 'us-east1'
       WHEN city = 'washington dc' THEN 'us-east1'
       WHEN city = 'san francisco' THEN 'us-west1'
       WHEN city = 'seattle' THEN 'us-west1'
       WHEN city = 'los angeles' THEN 'us-west1'
  END
) STORED;
ALTER TABLE users ALTER COLUMN REGION SET NOT NULL;
ALTER TABLE users SET LOCALITY REGIONAL BY ROW AS "region";

vehicle_location_histories

ALTER TABLE vehicle_location_histories ADD COLUMN region crdb_internal_region AS (
  CASE WHEN city = 'amsterdam' THEN 'europe-west1'
       WHEN city = 'paris' THEN 'europe-west1'
       WHEN city = 'rome' THEN 'europe-west1'
       WHEN city = 'new york' THEN 'us-east1'
       WHEN city = 'boston' THEN 'us-east1'
       WHEN city = 'washington dc' THEN 'us-east1'
       WHEN city = 'san francisco' THEN 'us-west1'
       WHEN city = 'seattle' THEN 'us-west1'
       WHEN city = 'los angeles' THEN 'us-west1'
  END
) STORED;
ALTER TABLE vehicle_location_histories ALTER COLUMN REGION SET NOT NULL;
ALTER TABLE vehicle_location_histories SET LOCALITY REGIONAL BY ROW AS "region";

vehicles

ALTER TABLE vehicles ADD COLUMN region crdb_internal_region AS (
  CASE WHEN city = 'amsterdam' THEN 'europe-west1'
       WHEN city = 'paris' THEN 'europe-west1'
       WHEN city = 'rome' THEN 'europe-west1'
       WHEN city = 'new york' THEN 'us-east1'
       WHEN city = 'boston' THEN 'us-east1'
       WHEN city = 'washington dc' THEN 'us-east1'
       WHEN city = 'san francisco' THEN 'us-west1'
       WHEN city = 'seattle' THEN 'us-west1'
       WHEN city = 'los angeles' THEN 'us-west1'
  END
) STORED;
ALTER TABLE vehicles ALTER COLUMN REGION SET NOT NULL;
ALTER TABLE vehicles SET LOCALITY REGIONAL BY ROW AS "region";
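
The five sets of statements above repeat the same city-to-region CASE expression. As an illustration only, the following Python sketch generates those statements from a single mapping, which may help keep the mapping consistent if cities are added later. The names CITY_REGIONS, TABLES, case_expression, and region_column_sql are hypothetical helpers, not part of CockroachDB; the generated SQL is meant to mirror the statements shown above.

```python
# Hypothetical helper: generate the Step 2 statements from one mapping.
# The cities and regions are copied from the CASE expressions above.
CITY_REGIONS = {
    "amsterdam": "europe-west1",
    "paris": "europe-west1",
    "rome": "europe-west1",
    "new york": "us-east1",
    "boston": "us-east1",
    "washington dc": "us-east1",
    "san francisco": "us-west1",
    "seattle": "us-west1",
    "los angeles": "us-west1",
}

TABLES = [
    "rides",
    "user_promo_codes",
    "users",
    "vehicle_location_histories",
    "vehicles",
]


def case_expression(city_regions):
    """Build the CASE expression that maps each city to its region.

    Values are interpolated without escaping, so this assumes the mapping
    holds trusted constants (no quotes in city or region names).
    """
    branches = [
        f"WHEN city = '{city}' THEN '{region}'"
        for city, region in city_regions.items()
    ]
    return "CASE " + "\n       ".join(branches) + "\n  END"


def region_column_sql(table, city_regions):
    """Return the three statements used above for a single table."""
    case = case_expression(city_regions)
    return [
        f"ALTER TABLE {table} ADD COLUMN region crdb_internal_region AS (\n  {case}\n) STORED;",
        f"ALTER TABLE {table} ALTER COLUMN region SET NOT NULL;",
        f'ALTER TABLE {table} SET LOCALITY REGIONAL BY ROW AS "region";',
    ]


# Print the statements for every table so they can be reviewed before running.
for table in TABLES:
    print("\n".join(region_column_sql(table, CITY_REGIONS)))
    print()
```

The script only prints SQL text; review the output and run it in the SQL shell yourself.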

Step 3. View noncompliant replicas

Next, check the critical nodes status endpoint to see which ranges are still not in compliance with your desired domiciling requirement: that data on EU-based entities (users, etc.) does not leave EU-based nodes. On a small demo cluster like this one, the data movement from the previous step should finish quickly; on larger clusters, the rebalancing process may take longer.

With the default settings, you should expect some replicas in the cluster to be violating this constraint. Those replicas will appear in the violatingConstraints field of the output. This is because non-voting replicas are enabled by default in multi-region clusters to enable stale reads of data in regional tables from outside those tables’ home regions. For many use cases, this is preferred, but it keeps you from meeting the domiciling requirements for this example.

To check the critical nodes status endpoint, you first need an authentication cookie. To get one, run the cockroach auth-session login command:

cockroach auth-session login demo --certs-dir=/Users/rloveland/.cockroach-demo

It should return output like the following:

  username |     session ID     |                       authentication cookie
-----------+--------------------+---------------------------------------------------------------------
  demo     | 893413786777878529 | session=CIGA9sfJ5Yy0DBIQ4mlvKAxivkm9bq0or4h3AQ==; Path=/; HttpOnly
(1 row)
#
# Example uses:
#
#     curl [-k] --cookie 'session=CIGA9sfJ5Yy0DBIQ4mlvKAxivkm9bq0or4h3AQ==; Path=/; HttpOnly' https://...
#
#     wget [--no-check-certificate] --header='Cookie: session=CIGA9sfJ5Yy0DBIQ4mlvKAxivkm9bq0or4h3AQ==; Path=/; HttpOnly' https://...
#

Using the output above, we can craft a curl invocation to call the critical nodes status endpoint:

curl -X POST --cookie 'session=CIGA9sfJ5Yy0DBIQ4mlvKAxivkm9bq0or4h3AQ==; Path=/; HttpOnly' http://localhost:8080/_status/critical_nodes

{
  "criticalNodes": [
  ],
  "report": {
    "underReplicated": [
    ],
    "overReplicated": [
    ],
    "violatingConstraints": [
      {
        "rangeDescriptor": {
          "rangeId": "93",
          "startKey": "840SwAAB",
          "endKey": "840SwAAC",
          "internalReplicas": [
            {
              "nodeId": 8,
              "storeId": 8,
              "replicaId": 1,
              "type": 0
            },
            {
              "nodeId": 7,
              "storeId": 7,
              "replicaId": 2,
              "type": 0
            },
            {
              "nodeId": 1,
              "storeId": 1,
              "replicaId": 6,
              "type": 0
            },
            {
              "nodeId": 3,
              "storeId": 3,
              "replicaId": 4,
              "type": 5
            },
            {
              "nodeId": 6,
              "storeId": 6,
              "replicaId": 5,
              "type": 5
            }
          ],
          "nextReplicaId": 7,
          "generation": "60",
          "stickyBit": {
            "wallTime": "0",
            "logical": 0,
            "synthetic": false
          }
        },
        "config": {
          "rangeMinBytes": "134217728",
          "rangeMaxBytes": "536870912",
          "gcPolicy": {
            "ttlSeconds": 14400,
            "protectionPolicies": [
            ],
            "ignoreStrictEnforcement": false
          },
          "globalReads": false,
          "numReplicas": 5,
          "numVoters": 3,
          "constraints": [
            {
              "numReplicas": 1,
              "constraints": [
                {
                  "type": 0,
                  "key": "region",
                  "value": "europe-west1"
                }
              ]
            },
            {
              "numReplicas": 1,
              "constraints": [
                {
                  "type": 0,
                  "key": "region",
                  "value": "us-east1"
                }
              ]
            },
            {
              "numReplicas": 1,
              "constraints": [
                {
                  "type": 0,
                  "key": "region",
                  "value": "us-west1"
                }
              ]
            }
          ],
          "voterConstraints": [
            {
              "numReplicas": 0,
              "constraints": [
                {
                  "type": 0,
                  "key": "region",
                  "value": "us-east1"
                }
              ]
            }
          ],
          "leasePreferences": [
            {
              "constraints": [
                {
                  "type": 0,
                  "key": "region",
                  "value": "us-east1"
                }
              ]
            }
          ],
          "rangefeedEnabled": false,
          "excludeDataFromBackup": false
        }
      },
    ...
    ],
    "unavailable": [
    ],
    "unavailableNodeIds": [
    ]
  }
}

Based on this output, you can see that several replicas are out of compliance for the reason described above: the presence of non-voting replicas in other regions to enable fast stale reads from those regions. To get more information about the ranges that are out of compliance, you can use a SQL statement like the one below.

SELECT * FROM [SHOW RANGES FROM DATABASE movr] WHERE range_id = 93;

       start_key      |            end_key            | range_id |  replicas   |                                                      replica_localities                                                      | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until
----------------------+-------------------------------+----------+-------------+------------------------------------------------------------------------------------------------------------------------------+-----------------+---------------------+------------------+-----------------------
  /Table/107/5/"\xc0" | /Table/107/5/"\xc0"/PrefixEnd |       93 | {1,3,6,7,8} | {"region=us-east1,az=b","region=us-east1,az=d","region=us-west1,az=c","region=europe-west1,az=b","region=europe-west1,az=c"} | {8,7,1}         | {3,6}               | {}               | NULL
(1 row)
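
To see at a glance where a range's replicas live, you can group the IDs in the replicas column by the region in replica_localities. The following Python sketch does this for the row shown above. It assumes the two columns list their entries in the same order, which matches this example output; the function names are hypothetical. Note that range 93 is used only as sample input: its zone configuration targets us-east1, so it is not itself EU data.

```python
from collections import defaultdict

# Values copied from the SHOW RANGES output above (range 93).
REPLICAS = [1, 3, 6, 7, 8]
LOCALITIES = [
    "region=us-east1,az=b",
    "region=us-east1,az=d",
    "region=us-west1,az=c",
    "region=europe-west1,az=b",
    "region=europe-west1,az=c",
]


def parse_locality(locality):
    """Turn 'region=us-east1,az=b' into {'region': 'us-east1', 'az': 'b'}."""
    return dict(tier.split("=", 1) for tier in locality.split(","))


def replicas_by_region(replicas, localities):
    """Group replica IDs by region.

    Assumes replicas[i] corresponds to localities[i].
    """
    grouped = defaultdict(list)
    for replica, locality in zip(replicas, localities):
        grouped[parse_locality(locality)["region"]].append(replica)
    return dict(grouped)


def regions_outside(grouped, allowed_regions):
    """Return the regions that hold replicas but are not in allowed_regions."""
    return sorted(region for region in grouped if region not in allowed_regions)


grouped = replicas_by_region(REPLICAS, LOCALITIES)
print(grouped)
# For a range that holds EU data, any region listed here would be a problem.
print(regions_outside(grouped, {"europe-west1"}))
```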

Step 4. Apply stricter replica placement settings

(Recommended) Use ADD SUPER REGION

To ensure that data on EU-based users, vehicles, etc. from REGIONAL BY ROW tables is stored only on EU-based nodes in the cluster, you can use super regions. This ensures that regional tables and regional by row tables whose home regions are in the super region will have all of their replicas stored only in regions that are members of the super region.

Use the ALTER DATABASE ... ADD SUPER REGION statement:

ALTER DATABASE movr ADD SUPER REGION "europe" VALUES "europe-west1";

You have now created a super region with only one region. The updated replica placement settings should start to apply immediately.

ADD SUPER REGION does not affect the replica placement for global tables, which are designed to provide fast, up-to-date reads from all database regions.

(Not recommended) Use PLACEMENT RESTRICTED

This method is not recommended, and is documented for backwards compatibility. Most users should use ALTER DATABASE ... ADD SUPER REGION, which allows for region survival as well as providing data placement.

To ensure that data on EU-based users, vehicles, etc. from REGIONAL BY ROW tables is stored only on EU-based nodes in the cluster, you must disable the use of non-voting replicas on all of the regional tables in this database. You can do this using the ALTER DATABASE ... PLACEMENT RESTRICTED statement.

To use this statement, you must set the enable_multiregion_placement_policy session setting or the sql.defaults.multiregion_placement_policy.enabled cluster setting:

SET enable_multiregion_placement_policy=on;

Next, use the ALTER DATABASE ... PLACEMENT RESTRICTED statement to disable non-voting replicas for regional tables:

ALTER DATABASE movr PLACEMENT RESTRICTED;

The restricted replica placement settings should start to apply immediately.

If you want to do data domiciling for databases with region survival goals using the higher-level multi-region abstractions, you must use super regions. Using PLACEMENT RESTRICTED will not work for databases that are set up with region survival goals.

PLACEMENT RESTRICTED does not affect the replica placement for global tables, which are designed to provide fast, up-to-date reads from all database regions.

Use ALTER ROLE ALL SET {sessionvar} = {val} instead of the sql.defaults.* cluster settings. This allows you to set a default value for all users for any session variable that applies during login, making the sql.defaults.* cluster settings redundant.

Step 5. Verify updated replica placement

Now that you have restricted the placement of non-voting replicas for all regional tables, you can check the critical nodes status endpoint to see the effects. In a few seconds, you should see that the violatingConstraints key in the JSON response shows that there are no longer any replicas violating their constraints:

curl -X POST --cookie 'session=CIGA9sfJ5Yy0DBIQ4mlvKAxivkm9bq0or4h3AQ==; Path=/; HttpOnly' http://localhost:8080/_status/critical_nodes

{
  "criticalNodes": [
  ],
  "report": {
    "underReplicated": [
    ],
    "overReplicated": [
    ],
    "violatingConstraints": [
    ],
    "unavailable": [
    ],
    "unavailableNodeIds": [
    ]
  }
}

The output above shows that there are no replicas that do not meet the data domiciling goal. As described above, neither ADD SUPER REGION nor PLACEMENT RESTRICTED affects the replica placement for global tables, so these replicas are considered to be in compliance. Now that you have verified that the system is configured to meet the domiciling requirement, it’s a good idea to check the critical nodes status endpoint on a regular basis (via automation of some kind) to ensure that the requirement continues to be met.
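
One possible shape for that automation is sketched below in Python, using only the standard library. It is a minimal sketch under stated assumptions, not an official tool: the function names are hypothetical, the URL is the demo cluster's address from this page, and the cookie string is assumed to be the session value returned by cockroach auth-session login. The script only inspects the violatingConstraints field shown in the JSON output above.

```python
import json
import urllib.request

# Demo cluster address used on this page; adjust for a real deployment.
STATUS_URL = "http://localhost:8080/_status/critical_nodes"


def fetch_report(url, cookie):
    """POST to the critical nodes endpoint and return the parsed JSON body.

    `cookie` is assumed to be the 'session=...' value from auth-session login.
    """
    request = urllib.request.Request(
        url, data=b"", method="POST", headers={"Cookie": cookie}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def violating_range_ids(body):
    """Return the range IDs listed under report.violatingConstraints."""
    entries = body.get("report", {}).get("violatingConstraints", [])
    return [entry["rangeDescriptor"]["rangeId"] for entry in entries]


def is_compliant(body):
    """True when no ranges are reported as violating their constraints."""
    return not violating_range_ids(body)


# Trimmed samples shaped like the Step 3 and Step 5 responses on this page.
STEP_3_SAMPLE = json.loads(
    '{"report": {"violatingConstraints": [{"rangeDescriptor": {"rangeId": "93"}}]}}'
)
STEP_5_SAMPLE = json.loads('{"report": {"violatingConstraints": []}}')

print(violating_range_ids(STEP_3_SAMPLE), is_compliant(STEP_3_SAMPLE))
print(violating_range_ids(STEP_5_SAMPLE), is_compliant(STEP_5_SAMPLE))

# Against a running cluster (not executed here):
# body = fetch_report(STATUS_URL, "session=<value from auth-session login>")
# if not is_compliant(body):
#     raise SystemExit(f"noncompliant ranges: {violating_range_ids(body)}")
```

A scheduler such as cron could run a script like this and alert on a nonzero exit status; how often to run it depends on your compliance requirements.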

The steps above are necessary but not sufficient to accomplish a data domiciling solution using CockroachDB. Be sure to review the limitations of CockroachDB for data domiciling and design your total solution with those limitations in mind.

Known limitations

Using CockroachDB as part of your approach to data domiciling has several limitations:

When using the infer_rbr_region_col_using_constraint option, inserting rows with DEFAULT for the region column uses the database’s primary region instead of inferring the region from the parent table via foreign-key constraint.
When columns are indexed, a subset of data from the indexed columns may appear in meta ranges or other system tables. CockroachDB synchronizes these system ranges and system tables across nodes. This synchronization does not respect any multi-region settings applied via either the multi-region SQL abstractions or the low-level zone configs mechanism.
Zone configs can be used for data placement, but these features were historically built for performance, not for domiciling. The replication system’s top priority is to prevent the loss of data, and it may override the zone configurations if necessary to ensure data durability. For more information, see the replication layer documentation.
If your log files are kept in the region where they were generated, there is some cross-region leakage (like the system tables described previously), but the majority of user data that makes it into the logs is going to be homed in that region. If that’s not strong enough, you can use the log redaction functionality to strip all raw data from the logs. You can also limit your log retention entirely.
If you start a node with a --locality flag that says the node is in region A, but the node is actually running in some region B, data domiciling based on the inferred node placement will not work. A CockroachDB node only knows its locality based on the text supplied to the --locality flag; it cannot ensure that it is actually running in that physical location.
Secondary regions are not compatible with databases containing REGIONAL BY ROW tables. CockroachDB does not prevent you from defining secondary regions on databases with regional by row tables, but the interaction of these features is not supported. Therefore, Cockroach Labs recommends that you avoid defining secondary regions on databases that use regional by row table configurations.
With enforce_home_region enabled, CockroachDB currently validates home-region access during plan build. This can falsely reject queries (e.g., lookup joins) that would only read local data at execution time, returning a Query has no home region error.
