Technical Advisory 98779


Publication date: March 29, 2023

Description

A bug was introduced in CockroachDB v22.2.6 that can cause a restore job to skip some data files upon resumption of an in-progress RESTORE. As a result, the restored data can be missing rows even though the restore job reports success.

A resumption can occur through manual user action, such as pausing and resuming the job, or through an automatic internal retry, such as after a node restart. Because automatic retries require no operator action, this failure might not be noticed by an operator.

Statement

This is resolved in CockroachDB by #99066, which removes the incorrect optimization and restores the behavior of v22.2.5.

The fix is included in CockroachDB v22.2.7 and later maintenance releases. This problem can also be mitigated immediately by updating the bulkio.restore.use_simple_import_spans cluster setting. See the following Mitigation section for details.

This public issue is tracked by #98779.

Mitigation

Users of CockroachDB v22.2.6 are encouraged to upgrade to v22.2.7 or a later version.
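To check whether a node is running the affected release, a query like the following can help. This is a minimal sketch: the built-in version() function reports the build of the node serving the current SQL session, so in a cluster that is partway through an upgrade it may be worth running it against each node.

```sql
-- Reports the build string of the node serving this session.
-- A result containing "v22.2.6" indicates the affected release.
SELECT version();
```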

Until such an upgrade, users of v22.2.6 can mitigate immediately by running:

SET CLUSTER SETTING bulkio.restore.use_simple_import_spans = true;

This setting has already been automatically applied to all affected CockroachDB Cloud clusters.
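To confirm that the mitigation is in place, the current value of the setting can be read back. Once the mitigation has been applied, the statement should report true.

```sql
-- Reads back the current value of the mitigation setting.
SHOW CLUSTER SETTING bulkio.restore.use_simple_import_spans;
```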

Impact

For users of CockroachDB v22.2.6, a restore that is resumed for any reason may give incorrect results. The resume could be due to manual job control actions, automatic retries after node restarts, or some types of internal rebalancing events.

Enabling the cluster setting described in the Mitigation section, or upgrading to v22.2.7 or later, will prevent this.
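To review which restore jobs have run on a cluster, a query along these lines may be a useful starting point. It is a sketch, not a detection tool: it assumes the job_id, job_type, description, status, created, and finished columns of SHOW JOBS, the output does not directly state whether a job was resumed, and SHOW JOBS may omit jobs that finished some time ago.

```sql
-- Lists restore jobs known to the cluster, newest first.
-- Restores that ran while the cluster was on v22.2.6 without the
-- mitigation setting are the ones worth a closer look.
SELECT job_id, description, status, created, finished
FROM [SHOW JOBS]
WHERE job_type = 'RESTORE'
ORDER BY created DESC;
```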

Questions about any technical alert can be directed to our support team.

