Technical Advisory 82309

On this page Carat arrow pointing down

Publication date: June 3, 2022

Description

During or after an upgrade from CockroachDB v21.2.x to v22.1.0, existing changefeeds will stop emitting data.

Changefeeds created in a v22.1.x release or later are unaffected.

Statement

Cockroach Labs has discovered a bug that results in changefeeds failing to emit data during or after an upgrade to v22.1.0 from v21.2.x. This failure may be relatively silent, in that no errors will appear in the logs.

This bug has been fixed in CockroachDB v22.1.1.

If you had changefeeds running in v21.2.x, and have already upgraded to v22.1.0, your changefeeds are affected. You can confirm this with the following steps:

  • You observe that your downstream sink (such as Kafka) is not receiving data.
  • After the upgrade begins, metrics related to the number of messages (changefeed.emitted_messages) and message sizes (changefeed.emitted_bytes) will be close to 0. This metric will begin dropping immediately, as soon as nodes start upgrading.
  • The decay in the number of emitted messages may be gradual (depending on the upgrade speed), but once all nodes have upgraded, 0 messages will be emitted.

If the changefeeds run with the resolved option, your sink will continue receiving RESOLVED messages but will not receive any data messages. Although changefeeds will appear to be up to date, these RESOLVED messages are incorrect and should be disregarded.

This is resolved in CockroachDB by PR #82312.

The fix has been applied to maintenance version v22.1.1 of CockroachDB.

This public issue is tracked by #82309.

Mitigation

If you have not yet upgraded to a v22.1 major version, upgrade to the v22.1.1 patch release or later.

Avoid creation of changefeeds during the upgrade process, i.e., while in a mixed-version state. If you have created changefeeds before or during the upgrade process, you should expect them to experience the above-described issue.

If you have already upgraded to v22.1.0, and observed a failure of changefeeds, you will need to cancel and recreate your changefeed(s).

You can make a choice as to how much “catch up” data you wish to emit from the newly-created changefeeds:

  • In the default case, creating a changefeed will emit (scan) the entire table to your sink, and then begin emitting all new events. This does not require any special options, as it is the default.

    icon/buttons/copy
    CREATE CHANGEFEED FOR TABLE name, name2, name3 
    INTO '{sink URI}'
    WITH '{your existing options}';
    
  • Alternatively, you may wish to be more precise in restarting (catching up) from the point in time where the failure began. This is likely preferable for large tables where you do not wish to incur the full initial scan.

    • To do this, you will need to choose a point in time. We recommend choosing a few minutes before your upgrade began.

      Note:

      The usual method of retrieving the high-water timestamp cursor is invalid in the presence of this bug.

      icon/buttons/copy
      CREATE CHANGEFEED FOR TABLE name, name2, name3 
      INTO '{sink URI}'
      WITH '{your existing options}', cursor='{see below}';
      
    • To get a cursor, note the time your upgrade began in UTC. You can use either:

      • An absolute point in time (e.g., 2022-06-02 13:59:00+00 in this example):

        icon/buttons/copy
        SELECT 1e9*('2022-06-02 13:59:00+00'::timestamptz)::int;
        
      • A relative time interval since the upgrade began (e.g., 3h10m in this example):

        icon/buttons/copy
        SELECT 1e9*(now()- interval '3h10m')::int;
        

Alternatively, you may wish to restart changefeeds to pick up new changes starting now, skipping the initial scan, and skipping change events since the upgrade began. You can do this with the following:

icon/buttons/copy
CREATE CHANGEFEED FOR TABLE name, name2, name3 
INTO '{sink URI}'
WITH '{your existing options}', no_initial_scan;

Impact

During and after the upgrade process from v21.2.x to v22.1.0, changefeeds will stop emitting data. This failure may go undetected, depending on your monitoring.

Please reach out to the support team if you need further information or assistance.


Yes No
On this page

Yes No