Protect Changefeed Data from Garbage Collection


By default, protected timestamps will protect changefeed data from garbage collection up to the time of the checkpoint.

Protected timestamps will protect changefeed data from garbage collection in the following scenarios:

  • The downstream changefeed sink is unavailable. Protected timestamps will protect changes until you either cancel the changefeed or the sink becomes available once again.
  • (deprecated) You pause a changefeed that has the protect_data_from_gc_on_pause option enabled, or a changefeed with protect_data_from_gc_on_pause pauses because of a retryable error. Protected timestamps will protect changes until you resume the changefeed.

However, if the changefeed lags too far behind, the protected changes could lead to an accumulation of garbage. This could result in increased disk usage and degraded performance for some workloads.

Prevent garbage accumulation

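To prevent an accumulation of protected changes that could impact performance, consider defining an expiration duration with either the changefeed.protect_timestamp.max_age cluster setting or the gc_protect_expires_after changefeed option.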

In general, a few hours to a few days are appropriate values for these settings. A lower protected timestamp expiration should not have adverse effects on your changefeed as long as the changefeed is running. However, if the changefeed pauses, you will need to resume it before the defined expiration time. The value of either changefeed.protect_timestamp.max_age or gc_protect_expires_after should reflect how much time the changefeed may remain paused before it is canceled.

changefeed.protect_timestamp.max_age

By default, the changefeed.protect_timestamp.max_age cluster setting sets the maximum time that changefeeds making no forward progress will hold protected timestamp records. Once the changefeed.protect_timestamp.max_age duration is reached, the changefeed will fail with a permanent error. As a result, it is critical to monitor for changefeed failures because changefeeds will eventually fail with an unrecoverable error if they cannot progress before the duration is reached.

This cluster setting defaults to 4 days. To disable expiration of protected timestamp records, set changefeed.protect_timestamp.max_age to 0; however, Cockroach Labs recommends implementing an expiration.

changefeed.protect_timestamp.max_age is a cluster-wide setting affecting all changefeeds.

SET CLUSTER SETTING changefeed.protect_timestamp.max_age = '120h';
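
To confirm the value currently in effect, or to return to the default, the standard cluster setting statements can be used. This is a minimal sketch that assumes a SQL session with the privileges required to view and modify cluster settings:

```sql
-- Show the current maximum age for protected timestamp records held by changefeeds.
SHOW CLUSTER SETTING changefeed.protect_timestamp.max_age;

-- Return the setting to its default value.
RESET CLUSTER SETTING changefeed.protect_timestamp.max_age;
```
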
Note:

changefeed.protect_timestamp.max_age applies only to newly created changefeeds in v23.2.

If you are upgrading to v23.2, we recommend setting protect_data_from_gc_on_pause on any existing changefeeds to ensure that they do not enter a situation of infinite retries, which could prevent garbage collection. You can use the ALTER CHANGEFEED statement to add protect_data_from_gc_on_pause to existing changefeeds.
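
As a sketch of that ALTER CHANGEFEED step, the statements below add the option to one existing changefeed. The job ID 1234567890 is a hypothetical placeholder, and the pause and resume steps assume that the changefeed has to be paused before it can be altered:

```sql
-- Find the job ID of the changefeed to modify.
SHOW CHANGEFEED JOBS;

-- 1234567890 is a placeholder job ID; substitute your own.
-- Wait until the job status is 'paused' before running ALTER CHANGEFEED.
PAUSE JOB 1234567890;
ALTER CHANGEFEED 1234567890 SET protect_data_from_gc_on_pause;
RESUME JOB 1234567890;
```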

gc_protect_expires_after

The gc_protect_expires_after option automatically expires protected timestamp records that are older than the defined duration and cancels the changefeed job.

For example:

CREATE CHANGEFEED FOR TABLE db.table INTO 'external://sink' WITH on_error='pause', gc_protect_expires_after='24h';

If this changefeed runs into a retryable error, protected timestamps will protect changes for up to 24 hours. If the changefeed has still made no progress after those 24 hours, the protected timestamp records will expire and the changefeed job will be canceled to prevent accumulation of garbage.

gc_protect_expires_after is an option applied to a single changefeed. To enable an expiration for protected timestamp records across changefeeds on the cluster, use the changefeed.protect_timestamp.max_age cluster setting.
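
For a changefeed that already exists, the same option can likely be added without recreating the job. The sketch below assumes that gc_protect_expires_after can be set through ALTER CHANGEFEED like other changefeed options, and that the changefeed has to be paused first. The job ID 1234567890 is a hypothetical placeholder:

```sql
-- 1234567890 is a placeholder job ID; take the real one from SHOW CHANGEFEED JOBS.
-- Wait until the job status is 'paused' before running ALTER CHANGEFEED.
PAUSE JOB 1234567890;

-- Expire protected timestamp records after 24 hours without progress.
ALTER CHANGEFEED 1234567890 SET gc_protect_expires_after = '24h';

RESUME JOB 1234567890;
```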

Tip:

You can track changefeed metrics to monitor how changefeeds are using protected timestamps. Refer to Protected timestamp and garbage collection monitoring.

Release protected timestamp records

To release the protected timestamps manually and allow garbage collection to resume, you can:

  • Cancel the changefeed job.
  • Resume a paused changefeed job.
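
Both actions map to job control statements. A minimal sketch, using a hypothetical job ID (1234567890) that you would replace with a value from SHOW CHANGEFEED JOBS:

```sql
-- Option 1: cancel the changefeed. This is permanent; a canceled job cannot be resumed.
CANCEL JOB 1234567890;

-- Option 2: resume a paused changefeed so that it can make progress again.
RESUME JOB 1234567890;
```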

We recommend monitoring storage and the number of running changefeeds. If a changefeed is not advancing and keeps retrying, it will continue to accumulate garbage while it retries, bounded only by the expiration settings outlined in Prevent garbage accumulation.
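
One way to spot a changefeed that is not advancing is to check its high-water timestamp periodically. A minimal sketch, assuming the job_id, status, high_water_timestamp, and error columns are returned by SHOW CHANGEFEED JOBS in your version:

```sql
-- List changefeeds that are still holding on to their position.
SELECT job_id, status, high_water_timestamp, error
FROM [SHOW CHANGEFEED JOBS]
WHERE status IN ('running', 'paused');
```

A changefeed whose high_water_timestamp does not change across repeated checks is a candidate for investigation, and for cancellation if it cannot be recovered.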

The only ways for changefeeds to not protect data are:

  • You cancel the changefeed.
  • The changefeed fails without on_error=pause set.
