Advanced Changefeed Configuration

The configurations and settings explained on this page will have a significant impact on a changefeed’s behavior and could potentially affect a cluster’s performance. Thoroughly test before deploying any changes to production.

The following sections describe performance, settings, configurations, and details to tune changefeeds:

Changefeed performance
High durability delivery
High throughput

Some options for the kafka_sink_config and webhook_sink_config parameters are discussed on this page. For more information on specific tuning for Kafka and webhook sinks, refer to the documentation for each sink.

Changefeed performance

By default, changefeeds are integrated with elastic CPU, which helps to prevent changefeeds from affecting foreground traffic, because some changefeed work can be CPU-intensive. This integration results in a cluster prioritizing SQL traffic over changefeeds. Since this may affect changefeed latency, monitor changefeed latency alongside your cluster’s admission control activity. The integration is controlled by the following cluster settings, which are enabled by default:

changefeed.cpu.per_event_elastic_control.enabled
kvadmission.rangefeed_catchup_scan_elastic_control.enabled
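
Before changing either setting, it can help to confirm its current value. The following is a minimal sketch using the standard SHOW CLUSTER SETTING statement; on a cluster at default configuration, both settings should be enabled:

```sql
-- Check whether elastic CPU control applies to per-event changefeed work.
SHOW CLUSTER SETTING changefeed.cpu.per_event_elastic_control.enabled;

-- Check whether elastic CPU control applies to rangefeed catch-up scans.
SHOW CLUSTER SETTING kvadmission.rangefeed_catchup_scan_elastic_control.enabled;
```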

For a more technical explanation of elastic CPU, refer to the Rubbing control theory on the Go scheduler blog post.

Mux rangefeeds

MuxRangefeed is a subsystem that improves the performance of rangefeeds at scale. It significantly reduces the overhead of running rangefeeds. Without MuxRangefeed enabled, the number of RPC streams is proportional to the number of ranges in a table. For example, a large table could have tens of thousands of ranges. With MuxRangefeed enabled, the number of RPC streams is instead proportional to the number of nodes in the cluster. We recommend enabling it with the changefeed.mux_rangefeed.enabled cluster setting. In v24.1 and later versions, MuxRangefeed is enabled by default. Use the following workflow to enable MuxRangefeed:

Enable the cluster setting:

SET CLUSTER SETTING changefeed.mux_rangefeed.enabled = true;

After enabling the setting, pause the changefeed:
PAUSE JOB {job ID};
You can find the job ID in the cluster's list of changefeed jobs.
Resume the changefeed for the cluster setting to take effect:
RESUME JOB {job ID};
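
Put together, the workflow might look like the following sketch. It assumes a hypothetical job ID of 1234567890 and uses SHOW CHANGEFEED JOBS to list changefeed jobs; substitute the job ID of your own changefeed:

```sql
-- List changefeed jobs to find the job ID (the ID used below is hypothetical).
SELECT job_id, status FROM [SHOW CHANGEFEED JOBS];

-- Enable MuxRangefeed for the cluster.
SET CLUSTER SETTING changefeed.mux_rangefeed.enabled = true;

-- Pause the changefeed, wait until its status shows as paused,
-- and then resume it so that it picks up the new setting.
PAUSE JOB 1234567890;
RESUME JOB 1234567890;
```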

Latency in changefeeds

When you are running large workloads, changefeeds can encounter or cause latency in a cluster in the following ways:

Changefeeds can have an impact on SQL latency in the cluster generally.
Changefeeds can encounter latency when emitting events. This latency is the total time CockroachDB takes to:
- Commit writes to the database.
- Encode changefeed messages.
- Deliver the message to the sink.

The following cluster settings reduce bursts of work so that updates are paced steadily over time.

We do not recommend adjusting these settings unless you are running a large workload, or are working with the Cockroach Labs support team. Thoroughly test different cluster setting configurations before deploying to production.

`kv.closed_timestamp.target_duration`

Default: 3s

Adjusting kv.closed_timestamp.target_duration could have a detrimental impact on follower reads. If you are using follower reads, refer to the kv.rangefeed.closed_timestamp_refresh_interval cluster setting instead to ease changefeed impact on foreground SQL latency.

kv.closed_timestamp.target_duration controls the target lag duration, which determines how far behind the current time CockroachDB will attempt to maintain the closed timestamp. For example, with the default value of 3s, if the current time is 12:30:00 then CockroachDB will attempt to keep the closed timestamp at 12:29:57 by possibly retrying or aborting ongoing writes that are below this time. A changefeed aggregates checkpoints across all ranges, and once the timestamp on all the ranges advances, the changefeed can then checkpoint. In the context of changefeeds, kv.closed_timestamp.target_duration affects how old the checkpoints will be, which determines the latency before changefeeds can consider the history of an event complete.

`kv.rangefeed.closed_timestamp_refresh_interval`

Default: 3s

This setting controls the interval at which closed timestamp updates are delivered to rangefeeds and in turn emitted as a changefeed checkpoint. Increasing the interval lengthens the delay between checkpoints, which increases the latency of changefeed checkpoints but reduces the impact on SQL latency caused by load on the cluster. This happens because every range with a rangefeed has to emit a checkpoint event at this interval. As an example, at the default of 3s, 1 million ranges would result in roughly 330,000 events per second, which consumes additional CPU resources. If you are running changefeeds at a large scale and notice foreground SQL latency, we recommend increasing this setting.

As a result, adjusting kv.rangefeed.closed_timestamp_refresh_interval can affect both the latency that changefeeds encounter and the foreground SQL latency that changefeeds cause. In clusters running large-scale workloads, it may be helpful to:

Decrease the value for a lower changefeed emission latency — that is, how often a client can confirm that all relevant events up to a certain timestamp have been emitted.
Increase the value to reduce the potential impact of changefeeds on SQL latency. This will lower the resource cost of changefeeds, which can be especially important for workloads with tables in the TB range of data.
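
To make the trade-off concrete, the following sketch reproduces the arithmetic from the 1 million range example and shows how doubling the interval halves the checkpoint event rate. The 6s value is an illustrative assumption, not a recommendation:

```sql
-- Checkpoint events per second = number of ranges / refresh interval in seconds.
SELECT round(1000000 / 3.0) AS events_per_sec_at_3s,  -- about 333,333
       round(1000000 / 6.0) AS events_per_sec_at_6s;  -- about 166,667

-- Hypothetical adjustment: lengthen the interval to reduce checkpoint load.
SET CLUSTER SETTING kv.rangefeed.closed_timestamp_refresh_interval = '6s';
```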

It is important to note that a changefeed at default configuration does not checkpoint more often than once every 30 seconds. When you create a changefeed, you can adjust this with the min_checkpoint_frequency option.

`kv.closed_timestamp.side_transport_interval`

Default: 200ms

The kv.closed_timestamp.side_transport_interval cluster setting controls how often the closed timestamp is updated. Although the closed timestamp is updated every 200ms, CockroachDB will only emit an event across the rangefeed containing the closed timestamp value every 3s, as per the kv.rangefeed.closed_timestamp_refresh_interval value.

kv.closed_timestamp.side_transport_interval is helpful when ranges are inactive. The closed timestamp subsystem usually propagates closed timestamps through Raft commands. However, an idle range that does not see any writes does not receive any Raft commands, so its closed timestamp would stall. This setting is an efficient mechanism to broadcast closed timestamp updates for all idle ranges between nodes.

Adjusting kv.closed_timestamp.side_transport_interval will affect both follower reads and changefeeds. While you can use kv.closed_timestamp.side_transport_interval to tune the checkpointing interval, we recommend kv.rangefeed.closed_timestamp_refresh_interval if you are using follower reads.

`kv.rangefeed.closed_timestamp_smear_interval`

Default: 1ms

This setting provides a mechanism to pace the closed timestamp notifications to follower replicas. At the default, the closed timestamp smear interval makes rangefeed closed timestamp delivery less spiky, which can reduce its impact on foreground SQL query latency.

For example, suppose you have a large table, and one of the nodes in the cluster is hosting 6000 ranges from this table. Normally, the rangefeed system wakes up every kv.rangefeed.closed_timestamp_refresh_interval (default 3s) and publishes checkpoints for all 6000 ranges at once. In this scenario, the kv.rangefeed.closed_timestamp_smear_interval setting divides the 3s interval into 1ms chunks. Instead of publishing checkpoints for all 6000 ranges at once, it publishes checkpoints for 2 ranges every 1ms. This produces a more predictable and level load, rather than large, spiky bursts of work.
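
The arithmetic behind the 6000-range example can be checked with a short sketch. The values are the ones used in the example above, expressed in milliseconds:

```sql
-- Number of 1ms smear chunks in one 3s (3000ms) refresh interval.
SELECT 3000 / 1 AS chunks_per_refresh;         -- 3000 chunks

-- Ranges checkpointed per chunk for a node hosting 6000 ranges.
SELECT 6000 / (3000 / 1) AS ranges_per_chunk;  -- 2 ranges every 1ms
```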

Lagging ranges

New in v23.2: Use the changefeed.lagging_ranges metric to track the number of ranges that are behind in a changefeed. This is calculated based on the following changefeed options:

lagging_ranges_threshold sets a duration from the present. A range that is behind by more than this duration is considered to be lagging and is counted in the metric. Note that ranges undergoing an initial scan for longer than the threshold duration are considered to be lagging. Starting a changefeed with an initial scan on a large table will likely increment the metric for each range in the table. As ranges complete the initial scan, the number of ranges lagging behind will decrease.
- Default: 3m
lagging_ranges_polling_interval sets the interval rate for when lagging ranges are checked and the lagging_ranges metric is updated. Polling adds latency to the lagging_ranges metric being updated. For example, if a range falls behind by 3 minutes, the metric may not update until an additional minute afterward.
- Default: 1m
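
As an illustration, the following sketch sets both options when creating a changefeed. The table name, sink URI, and option values are hypothetical:

```sql
-- Treat a range as lagging once it is more than 5 minutes behind,
-- and refresh the lagging_ranges metric every 30 seconds.
CREATE CHANGEFEED FOR TABLE orders
  INTO 'kafka://localhost:9092'
  WITH lagging_ranges_threshold = '5m',
       lagging_ranges_polling_interval = '30s';
```

A shorter polling interval makes the metric more responsive at the cost of checking ranges more often.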

New in v23.2.13: Use the changefeed.total_ranges metric to monitor the number of ranges that are watched by aggregator processors participating in the changefeed job. If you’re experiencing lagging ranges, changefeed.total_ranges may indicate that the number of ranges watched by aggregator processors in the job is unbalanced. You may want to try pausing the changefeed and then resuming it, so that the changefeed replans the work in the cluster. changefeed.total_ranges shares the same polling interval as the changefeed.lagging_ranges metric, which is controlled by the lagging_ranges_polling_interval option.

You can also track the lagging_ranges and total_ranges metrics per changefeed by labeling the changefeed's metrics.

Tuning for high durability delivery

When designing a system that relies on high durability message delivery—that is, not missing any message acknowledgement at the downstream sink—consider the following settings and configuration in this section:

Pausing changefeeds and garbage collection
Defining Kafka message acknowledgment
Choosing changefeed sinks
Defining schema change behavior

Before tuning these settings, we recommend reading the details of our changefeed delivery guarantees.

Pausing changefeeds and garbage collection

By default, protected timestamps will protect changefeed data from garbage collection up to the time of the checkpoint. Protected timestamps will protect changefeed data from garbage collection if the downstream sink is unavailable, until you either cancel the changefeed or the sink becomes available once again.

However, if the changefeed lags too far behind, the protected changes could lead to an accumulation of garbage. This could result in increased disk usage and degraded performance for some workloads. For more detail, refer to the documentation on changefeeds and protected timestamps.

To balance protecting change data against the over-accumulation of garbage, Cockroach Labs recommends creating a changefeed with options to define your protection duration and monitoring your changefeed for protected timestamp record collection.

Protecting change data on pause

Create changefeeds with the following options so that your changefeed protects data when it is paused:

An option to protect changes while the changefeed is paused until you resume the changefeed.
An option to pause the changefeed when it encounters an error. By default, changefeeds treat errors as retryable apart from some exceptions.
An option to automatically expire the protected timestamp records that are older than your defined duration and cancel the changefeed job.
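
The option names are not spelled out in the list above. As a sketch, and assuming they correspond to the CREATE CHANGEFEED options protect_data_from_gc_on_pause, on_error, and gc_protect_expires_after, a changefeed configured this way might look like the following. The table name, sink URI, and 24h expiry are hypothetical:

```sql
CREATE CHANGEFEED FOR TABLE orders
  INTO 'kafka://localhost:9092'
  WITH protect_data_from_gc_on_pause,     -- keep protecting data while paused
       on_error = 'pause',                -- pause instead of failing on an error
       gc_protect_expires_after = '24h';  -- expire old records and cancel the job
```

Choose the expiry so that it is long enough to cover a realistic sink outage, but short enough to bound garbage accumulation.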

Monitoring protected timestamp records

You can monitor changefeed jobs for protected timestamp usage. We recommend setting up monitoring for the following metrics:

jobs.changefeed.protected_age_sec: Tracks the age of the oldest protected timestamp record held by changefeed jobs. We recommend monitoring whether protected_age_sec exceeds your configured garbage collection window. As protected_age_sec increases, garbage accumulation increases. Garbage collection will not progress on a table, database, or cluster if the protected timestamp record is present.
jobs.changefeed.currently_paused: Tracks the number of changefeed jobs currently considered paused. Since paused changefeed jobs can accumulate garbage, it is important to monitor the number of paused changefeeds.
jobs.changefeed.expired_pts_records: Tracks the number of expired protected timestamp records owned by changefeed jobs. You can monitor this metric in conjunction with the option that expires protected timestamp records.
jobs.changefeed.protected_record_count: Tracks the number of protected timestamp records held by changefeed jobs.

Defining Kafka message acknowledgment

To define what counts as a successful write to Kafka, configure the 'RequiredAcks' field of the kafka_sink_config option. CockroachDB guarantees at-least-once delivery of messages, and the 'RequiredAcks' value defines when a message counts as delivered. For high durability delivery, Cockroach Labs recommends setting:

kafka_sink_config='{"RequiredAcks": "ALL"}'

ALL provides the highest consistency level. A quorum of Kafka brokers that have committed the message must be reached before the leader can acknowledge the write. You must also set acks to ALL in your server-side Kafka configuration for this to provide high durability delivery.
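
In a full statement, the option might be applied as in the following sketch. The table name and broker address are hypothetical, and the JSON value uses double quotes inside a single-quoted SQL string:

```sql
-- Require acknowledgment from all in-sync Kafka replicas before a write
-- is considered successful.
CREATE CHANGEFEED FOR TABLE orders
  INTO 'kafka://localhost:9092'
  WITH kafka_sink_config = '{"RequiredAcks": "ALL"}';
```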

Choosing changefeed sinks

Use Kafka or cloud storage sinks when tuning for high durability delivery in changefeeds. Both Kafka and cloud storage sinks offer built-in advanced protocols, whereas the webhook sink, while flexible, requires an understanding of how messages are acknowledged and committed by the particular system used for the webhook in order to ensure the durability of message delivery.

Defining schema change behavior

Ensure that data is ingested downstream in its new format after a schema change by using the schema_change_events and schema_change_policy options. For example, setting schema_change_events=column_changes and schema_change_policy=stop will log an error to the cockroach.log file on a schema change and cause the changefeed to fail.
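
The combination described above could be set at creation time as in this sketch. The table name and sink URI are hypothetical:

```sql
-- Stop the changefeed on column changes so that downstream consumers
-- can be updated before ingestion continues.
CREATE CHANGEFEED FOR TABLE orders
  INTO 'kafka://localhost:9092'
  WITH schema_change_events = 'column_changes',
       schema_change_policy = 'stop';
```

Because the changefeed fails on a schema change under this policy, you would need to create a new changefeed once downstream systems are ready for the new format.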

Tuning for high throughput

When designing a system that needs to emit a lot of changefeed messages, whether it be steady traffic or a burst in traffic, consider the following settings and configuration in this section:

Setting the resolved option
Batching and buffering messages
Configuring file and message format
Configuring for tables with many ranges
Adjusting concurrent changefeed work

Setting the `resolved` option

When a changefeed emits a resolved message, it force flushes all outstanding messages that have buffered, which will diminish your changefeed’s throughput while the flush completes. Therefore, if you are aiming for higher throughput, we suggest setting the resolved duration higher (e.g., 10 minutes), or not using the resolved option.

If you are setting the resolved option when you are aiming for high throughput, you must also consider the min_checkpoint_frequency option, which defaults to 30s. This option controls how often nodes flush their progress to the coordinating changefeed node. As a result, resolved messages will not be emitted more frequently than the configured min_checkpoint_frequency. Set this option to at least as long as your resolved option duration.
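
As a sketch of this guidance, the following changefeed emits resolved messages every 10 minutes and sets min_checkpoint_frequency to the same duration. The table name and sink URI are hypothetical:

```sql
-- Emit resolved timestamps infrequently to limit forced flushes, and keep
-- min_checkpoint_frequency at least as long as the resolved duration.
CREATE CHANGEFEED FOR TABLE orders
  INTO 'kafka://localhost:9092'
  WITH resolved = '10m',
       min_checkpoint_frequency = '10m';
```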

Batching and buffering messages

Batch messages to your sink:
- For a Kafka sink, refer to the batching parameters for the kafka_sink_config option.
- For a cloud storage sink, use the file_size parameter to flush a file when it exceeds the specified size.
- For a webhook sink, refer to the batching parameters for the webhook_sink_config option.
Set the changefeed.memory.per_changefeed_limit cluster setting to a higher limit to give more memory for buffering changefeed data. This setting influences how often the changefeed will flush buffered messages. This is useful during heavy traffic.
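
For a Kafka sink, batching might be configured as in the following sketch. It assumes that the batching parameter meant here is the Flush field of kafka_sink_config, with its Messages and Frequency sub-fields; the table name, broker address, and values are hypothetical:

```sql
-- Flush to Kafka once 1000 messages have accumulated or 1 second has
-- elapsed, whichever comes first.
CREATE CHANGEFEED FOR TABLE orders
  INTO 'kafka://localhost:9092'
  WITH kafka_sink_config = '{"Flush": {"Messages": 1000, "Frequency": "1s"}}';
```

Larger batches generally trade a little extra emission latency for higher throughput.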

Configuring file and message format

Use avro as the emitted message format with Kafka sinks; JSON encoding can potentially create a slowdown.

Compression

Use the compression option when you create a changefeed emitting data files to a cloud storage sink. For larger files, set compression to the zstd format.
Use the snappy compression format to emit messages to a Kafka sink. If you’re intending to do large batching for Kafka, use the lz4 compression format.
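
For a Kafka sink, the format and compression recommendations could be combined as in this sketch. It assumes that Kafka compression is set through the Compression field of kafka_sink_config and that Avro output uses a schema registry supplied through confluent_schema_registry; the table name and both addresses are hypothetical:

```sql
-- Emit Avro-encoded messages to Kafka with snappy compression.
CREATE CHANGEFEED FOR TABLE orders
  INTO 'kafka://localhost:9092'
  WITH format = avro,
       confluent_schema_registry = 'http://localhost:8081',
       kafka_sink_config = '{"Compression": "SNAPPY"}';
```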

File size

To configure changefeeds emitting to cloud storage sinks for high throughput, you should consider:

Increasing the file_size parameter to control the size of the files that the changefeed sends to the sink. The default is 16MB. To configure for high throughput, we recommend 32MB to 128MB. Note that this is not a hard limit, and a changefeed will flush the file when it reaches the specified size.
When you compress a file, it will contain many more events.
File size is also dependent on what kind of data the changefeed job is writing. For example, large JSON blobs will quickly fill up the file_size value compared to small rows.
When you change or increase file_size, ensure that you adjust the changefeed.memory.per_changefeed_limit cluster setting, which has a default of 512MiB. Buffering messages can quickly reach this limit if you have increased the file size.
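
The following sketch raises the per-changefeed memory limit and then creates a cloud storage changefeed with a larger file size and zstd compression. The bucket name, authentication parameter, and specific values are hypothetical choices within the ranges discussed above:

```sql
-- Give each changefeed more buffer memory before increasing file_size.
SET CLUSTER SETTING changefeed.memory.per_changefeed_limit = '1GiB';

-- Emit larger, compressed files to cloud storage.
CREATE CHANGEFEED FOR TABLE orders
  INTO 's3://example-bucket/changefeeds?AUTH=implicit'
  WITH file_size = '64MB',
       compression = 'zstd';
```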

Configuring for tables with many ranges

If you have a table with 10,000 or more ranges, you should consider increasing the following two cluster settings. We strongly recommend increasing these settings slowly. That is, increase the setting and then monitor its impact before adjusting further:

kv.rangefeed.catchup_scan_concurrency: The number of catchups a rangefeed can execute concurrently. The default is 8.
kv.rangefeed.concurrent_catchup_iterators: The number of catchup iterators a store will allow concurrently before queuing. The default is 16.
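
An incremental adjustment might look like the following sketch. The new values are hypothetical small steps above the defaults, not recommendations:

```sql
-- Record the current values before changing anything.
SHOW CLUSTER SETTING kv.rangefeed.catchup_scan_concurrency;
SHOW CLUSTER SETTING kv.rangefeed.concurrent_catchup_iterators;

-- Raise one setting by a modest amount, then monitor the cluster.
SET CLUSTER SETTING kv.rangefeed.catchup_scan_concurrency = 12;

-- Only after observing the impact, consider raising the second setting.
SET CLUSTER SETTING kv.rangefeed.concurrent_catchup_iterators = 24;
```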

Adjusting concurrent changefeed work

Increase the cluster setting that controls the number of concurrent scan requests per node issued during a backfill. The default behavior, when this setting is at 0, is that the number of scan requests will be 3 times the number of nodes in the cluster (to a maximum of 100). While increasing this number will allow for higher throughput, it will increase the cluster load overall, including CPU and IO usage.
The catch-up scan iterator optimization is on by default. This causes rangefeeds to use time-bound iterators for catch-up scans when possible. Catch-up scans are run for each rangefeed request. This setting improves the performance of changefeeds in some situations.
