Because of CockroachDB's multi-active availability design, you can perform a "rolling upgrade" of your CockroachDB cluster. This means that you can upgrade nodes one at a time without interrupting the cluster's overall health and operations.

Note:

This page shows you how to upgrade to the latest v2.0 release (v2.0.7) from v1.1.x, or from any patch release in the v2.0.x series. To upgrade within the v1.1.x series, see the v1.1 version of this page.

Step 1. Verify that you can upgrade

When upgrading, you can skip patch releases, but you cannot skip full releases. Therefore, if you are upgrading from v1.0.x to v2.0:

  1. First upgrade to v1.1. Be sure to complete all the steps, include the finalization step (i.e., SET CLUSTER SETTING version = '1.1';).

  2. Then return to this page and perform a second rolling upgrade to v2.0.

If you are upgrading from v1.1.x or from any v2.0.x patch release, you do not have to go through intermediate releases; continue to step 2.

Step 2. Prepare to upgrade

Before starting the upgrade, complete the following steps.

  1. Make sure your cluster is behind a load balancer, or your clients are configured to talk to multiple nodes. If your application communicates with a single node, stopping that node to upgrade its CockroachDB binary will cause your application to fail.

  2. Verify the cluster's overall health by running the cockroach node status command against any node in the cluster.

    In the response:

    • If any nodes that should be live are not listed, identify why the nodes are offline and restart them before begining your upgrade.
    • Make sure the build field shows the same version of CockroachDB for all nodes. If any nodes are behind, upgrade them to the cluster's current version first, and then start this process over.
    • Make sure ranges_unavailable and ranges_underreplicated show 0 for all nodes. If there are unavailable or underreplicated ranges in your cluster, performing a rolling upgrade increases the risk that ranges will lose a majority of their replicas and cause cluster unavailability. Therefore, it's important to identify and resolve the cause of range unavailability and underreplication before beginning your upgrade.
      Tip:
      Pass the --ranges or --all flag to include these range details in the response.
  3. Capture the cluster's current state by running the cockroach debug zip command against any node in the cluster. If the upgrade does not go according to plan, the captured details will help you and Cockroach Labs troubleshoot issues.

  4. Back up the cluster. If the upgrade does not go according to plan, you can use the data to restore your cluster to its previous state.

Step 3. Perform the rolling upgrade

For each node in your cluster, complete the following steps.

Tip:

We recommend creating scripts to perform these steps instead of performing them manually.

Warning:

Upgrade only one node at a time, and wait at least one minute after a node rejoins the cluster to upgrade the next node. Simultaneously upgrading more than one node increases the risk that ranges will lose a majority of their replicas and cause cluster unavailability.

  1. Connect to the node.

  2. Stop the cockroach process.

    Without a process manager like systemd, use this command:

    copy
    icon/buttons/copy
    $ pkill cockroach
    

    If you are using systemd as the process manager, use this command to stop a node without systemd restarting it:

    copy
    icon/buttons/copy
    $ systemctl stop <systemd config filename>
    

    Then verify that the process has stopped:

    copy
    icon/buttons/copy
    $ ps aux | grep cockroach
    

    Alternately, you can check the node's logs for the message server drained and shutdown completed.

  3. Download and install the CockroachDB binary you want to use:

    copy

    icon/buttons/copy

    # Get the CockroachDB tarball:
    $ curl -O https://binaries.cockroachdb.com/cockroach-v2.0.7.darwin-10.9-amd64.tgz
    
    copy
    icon/buttons/copy
    # Extract the binary:
    $ tar xfz cockroach-v2.0.7.darwin-10.9-amd64.tgz
    

    copy

    icon/buttons/copy

    # Get the CockroachDB tarball:
    $ wget https://binaries.cockroachdb.com/cockroach-v2.0.7.linux-amd64.tgz
    
    copy
    icon/buttons/copy
    # Extract the binary:
    $ tar xfz cockroach-v2.0.7.linux-amd64.tgz
    

  4. If you use cockroach in your $PATH, rename the outdated cockroach binary, and then move the new one into its place:

    copy

    icon/buttons/copy

    $ i="$(which cockroach)"; mv "$i" "$i"_old
    
    copy
    icon/buttons/copy
    $ cp -i cockroach-v2.0.7.darwin-10.9-amd64/cockroach /usr/local/bin/cockroach
    

    copy

    icon/buttons/copy

    $ i="$(which cockroach)"; mv "$i" "$i"_old
    
    copy
    icon/buttons/copy
    $ cp -i cockroach-v2.0.7.linux-amd64/cockroach /usr/local/bin/cockroach
    

  5. Start the node to have it rejoin the cluster.

    Without a process manager like systemd, use this command:

    copy
    icon/buttons/copy
    $ cockroach start --join=[IP address of any other node] [other flags]
    

    [other flags] includes any flags you use to a start node, such as it --host.

    If you are using systemd as the process manager, run this command to start the node:

    copy
    icon/buttons/copy
    $ systemctl start <systemd config filename>
    
  6. Verify the node has rejoined the cluster through its output to stdout or through the admin UI.

  7. If you use cockroach in your $PATH, you can remove the old binary:

    copy
    icon/buttons/copy
    $ rm /usr/local/bin/cockroach_old
    

    If you leave versioned binaries on your servers, you do not need to do anything.

  8. Wait at least one minute after the node has rejoined the cluster, and then repeat these steps for the next node.

Step 4. Monitor the upgraded cluster

After upgrading all nodes in the cluster, monitor the cluster's stability and performance for at least one day.

Warning:

During this phase, avoid using any new v2.0 features. Doing so may prevent you from being able to perform a rolling downgrade to v1.1, if necessary. Also, it is not recommended to run enterprise BACKUP and RESTORE jobs during this phase, as some features like detecting schema changes or ensuring correct target expansion may behave differently in mixed version clusters.

Step 5. Finalize or revert the upgrade

Once you have monitored the upgraded cluster for at least one day:

Finalize the upgrade

Note:

These final steps are required after upgrading from v1.1.x to v2.0. For upgrades within the v2.0.x series, you do not need to take any further action.

  1. Start the cockroach sql shell against any node in the cluster.

  2. Use the crdb_internal.node_executable_version() built-in function to check the CockroachDB version running on the node:

    copy
    icon/buttons/copy
    > SELECT crdb_internal.node_executable_version();
    

    Make sure the version matches your expectations. Since you upgraded each node, this version should be running on all other nodes as well.

  3. Use the same function to finalize the upgrade:

    copy
    icon/buttons/copy
    > SET CLUSTER SETTING version = crdb_internal.node_executable_version();
    

    This step enables certain performance improvements and bug fixes that were introduced in v2.0. Note, however, that after completing this step, it will no longer be possible to perform a rolling downgrade to v1.1. In the event of a catastrophic failure or corruption due to usage of new features requiring v2.0, the only option is to start a new cluster using the old binary and then restore from one of the backups created prior to finalizing the upgrade.

Revert the upgrade

  1. Run the cockroach debug zip command against any node in the cluster to capture your cluster's state.

  2. Reach out for support from Cockroach Labs, sharing your debug zip.

  3. If necessary, downgrade the cluster by repeating the rolling upgrade process, but this time switching each node back to the previous version.

See Also



Yes No