Protected timestamps will protect changefeed data from garbage collection in the following scenarios:
- The downstream changefeed sink is unavailable. Protected timestamps will protect changes until you either cancel the changefeed or the sink becomes available once again.
- (deprecated) You pause a changefeed with the protect_data_from_gc_on_pause option enabled. Or, a changefeed with protect_data_from_gc_on_pause pauses from a retryable error. Protected timestamps will protect changes until you resume the changefeed.
However, if the changefeed lags too far behind, the protected changes could lead to an accumulation of garbage. This could result in increased disk usage and degraded performance for some workloads.
Prevent garbage accumulation
To prevent an accumulation of protected changes that could impact performance, consider defining an expiration duration:
- changefeed.protect_timestamp.max_age: a cluster setting to define a protected timestamp expiration for all changefeeds on a cluster.
- gc_protect_expires_after: a changefeed option to define a protected timestamp expiration for a changefeed.
In general, a few hours to a few days are appropriate values for these settings. A lower protected timestamp expiration should not have adverse effects on your changefeed as long as the changefeed is running. However, if the changefeed pauses, you will need to resume it before the defined expiration time. The value of either changefeed.protect_timestamp.max_age or gc_protect_expires_after should reflect how much time the changefeed may remain paused before it is canceled.
New in v23.2:
By default, the changefeed.protect_timestamp.max_age cluster setting sets the maximum time that changefeeds making no forward progress will hold protected timestamp records. Once the changefeed.protect_timestamp.max_age duration is reached, the changefeed will fail with a permanent error. As a result, it is critical to monitor for changefeed failures because changefeeds will eventually fail with an unrecoverable error if they cannot progress before the duration is reached.
This cluster setting defaults to 4 days. To disable expiration of protected timestamp records, you can set changefeed.protect_timestamp.max_age to 0; however, Cockroach Labs recommends implementing an expiration.
changefeed.protect_timestamp.max_age is a cluster-wide setting affecting all changefeeds.
SET CLUSTER SETTING changefeed.protect_timestamp.max_age = '120h';
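To confirm the value currently in effect, or to return to the default, statements along these lines may help. This is a minimal sketch that assumes a SQL session with the privileges needed to view and modify cluster settings:

```sql
-- Inspect the current protected timestamp expiration for changefeeds.
SHOW CLUSTER SETTING changefeed.protect_timestamp.max_age;

-- Revert the setting to its default value.
RESET CLUSTER SETTING changefeed.protect_timestamp.max_age;
```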
changefeed.protect_timestamp.max_age applies only to newly created changefeeds in v23.2.
If you are upgrading to v23.2, we recommend setting protect_data_from_gc_on_pause on any existing changefeeds to ensure that they do not enter a situation of infinite retries, which could prevent garbage collection. You can use the ALTER CHANGEFEED statement to add protect_data_from_gc_on_pause to existing changefeeds.
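As a rough sketch of that workflow, the statements below pause a changefeed, add the option, and resume it. The job ID 1234567890 is a hypothetical placeholder, and the sketch assumes that a changefeed has to be paused before it can be altered:

```sql
-- List changefeed jobs to find the ID of the one to modify.
SHOW CHANGEFEED JOBS;

-- 1234567890 is a placeholder job ID; substitute your own.
PAUSE JOB 1234567890;
ALTER CHANGEFEED 1234567890 SET protect_data_from_gc_on_pause;
RESUME JOB 1234567890;
```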
The gc_protect_expires_after option automatically expires protected timestamp records that are older than the defined duration and cancels the changefeed job.
CREATE CHANGEFEED FOR TABLE db.table INTO 'external://sink' WITH on_error='pause', gc_protect_expires_after='24h';
If this changefeed runs into a retryable error, protected timestamps will protect changes for up to 24 hours. After this point, if the changefeed has not made any progress in the past 24 hours, the protected timestamp records will expire and the changefeed job will be canceled to prevent accumulation of garbage.
gc_protect_expires_after is an option applied to a single changefeed. To enable an expiration for protected timestamp records across changefeeds on the cluster, use the changefeed.protect_timestamp.max_age cluster setting.
You can track changefeed metrics to monitor how changefeeds are using protected timestamps. Refer to Protected timestamp and garbage collection monitoring.
Release protected timestamp records
To release the protected timestamps manually and allow garbage collection to resume, you can:
We recommend monitoring storage and the number of running changefeeds. If a changefeed is not advancing and keeps retrying, it will continue to accumulate garbage while it retries, limited only by the expiration settings outlined in Prevent garbage accumulation.
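One way to spot changefeeds that are retrying without advancing is to check their status and high-water timestamp. The query below is a sketch: it assumes that the column names returned by SHOW CHANGEFEED JOBS in your version match those used here, and that the bracketed subquery form is available.

```sql
-- List active or paused changefeeds with their progress and any error.
SELECT job_id, status, running_status, high_water_timestamp, error
FROM [SHOW CHANGEFEED JOBS]
WHERE status IN ('running', 'paused');
```

A high_water_timestamp that stays unchanged across repeated checks suggests that the changefeed is not making progress and may still be holding protected timestamp records.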
The only ways for changefeeds to not protect data are:
- You cancel the changefeed.
- The changefeed fails without