Change data capture: Fine-tuning changefeeds for performance and durability

NOTE: This blog requires a fairly in-depth understanding of your application and changefeeds. If you want to learn more about change data capture at Cockroach Labs, start with our intro blogs (What is change data capture? and When and why to use change data capture) and our docs for streaming data out of CockroachDB with CDC.

Disclaimer: While this blog covers the impact of configuration changes at a high level, you should always test on your workload to assess the impact of configuration changes.

Whether you are streaming to an analytics platform for business intelligence or building event-driven services, CockroachDB’s change data capture (CDC) capabilities are powerful and adaptable to your application needs. But how can you leverage your changefeed setup to get the performance that best fits your application?

Most of the changefeed options and cluster settings outlined here come with a tradeoff. Going in, you should understand what you are targeting and which compromises you are willing to make. We recommend testing your configuration under your own workloads.

To get started, there are a few questions to ask yourself:

Do you care more about end-to-end latency or resiliency in the face of difficult situations?
What about latency or resiliency in comparison to throughput of messages?
What will your changefeed traffic look like?
How will it change over time?
Are there particular performance requirements for your system?

Does change data capture affect performance?

Using change data capture can impact the performance of a CockroachDB cluster. For example, enabling rangefeeds will incur a 5-10% performance cost, on average.

However, changefeeds are highly configurable, which lets you trade additional performance costs (CPU, disk usage, SQL latency, etc.) against the changefeed behavior you want. This blog outlines the tradeoffs you can make.

How does a changefeed work?

Changefeeds are CockroachDB’s distributed change data capture mechanism. At a high level, a changefeed works like this:

The changefeed runs on every node. When you start a changefeed, the node you start the changefeed on becomes the aggregator node, taking on administrative overhead for the changefeed.
As rows are updated/added/deleted in the watched table(s), the changefeed sends messages from a node to a sink endpoint (Kafka, in the diagram example below).
Each node sends back checkpointing information to the aggregator node, which updates the high watermark timestamp. The high watermark acts as a checkpoint for the changefeed and guarantees that all changes before (or at) the timestamp have been emitted. If restarted, the changefeed will re-send messages from the interval between the high watermark and the current time, so downstream consumers may see duplicates.
If you are using the resolved setting, the aggregator node sends resolved messages to each endpoint in the sink.
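To make the checkpointing step concrete, here is a minimal sketch in Python. It assumes the high watermark is simply the minimum of the checkpoints reported by each node, which is a simplification for illustration and not CockroachDB's actual implementation. The function names, node names, and timestamps are hypothetical.

```python
def high_watermark(node_checkpoints):
    """Latest timestamp that every node has emitted changes up to.

    Assumption: the slowest node bounds the whole changefeed, so we take
    the minimum of the per-node checkpoints.
    """
    return min(node_checkpoints.values())


def replay_window(now, node_checkpoints):
    """Span of time whose messages may be re-sent if the changefeed restarts now."""
    return now - high_watermark(node_checkpoints)


# Hypothetical checkpoints reported by three nodes, in seconds.
checkpoints = {"n1": 105, "n2": 98, "n3": 110}

print(high_watermark(checkpoints))      # the slowest node (n2) sets the watermark
print(replay_window(120, checkpoints))  # seconds of changes that could be duplicated
```

In this toy model, a single slow node holds back the high watermark for the whole changefeed, which is why a larger gap between the current time and the high watermark means more potential duplicates after a restart.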

Tuning CDC for durability in the face of disaster

Murphy’s law happens. Whether it be network hiccups, sinks going down, or sudden spikes in traffic, you should be thinking about how you want your changefeed to behave when imperfect conditions arise, because they inevitably will.

If your priority is durability — meaning that during these situations you want your changefeed to stay active, and/or your application cannot tolerate any missed messages — consider these options:

RequiredAcks (Kafka only): Kafka allows us to configure how many of its brokers need to have committed a message before we consider it committed, and thus safely delivered and acknowledged by the sink. By default, only one broker needs to have committed the message. For a higher level of durability, consider setting this to ALL brokers.
*Tradeoff: An increase in round trip latency. Since every broker must replicate and commit the message to its logs, the total round trip latency of changefeed messages will increase.*
gc.ttl: This setting controls how long data lives before it is garbage collected. For changefeeds, this is the “window” we have to recover during a situation like a sink going offline. After the gc.ttl window has passed, you are in danger of missing messages, and by default the changefeed will go into a failure state if we are unable to advance it past the gc.ttl window. Additionally, if the changefeed job fails for some reason, a new changefeed can be started from where the failed changefeed left off, as long as that point is still within this window of time.
*Tradeoff: The impact of the gc.ttl is highly dependent on your workload. If you have an append-only table, there is very little garbage. However, more garbage increases disk usage, since CockroachDB cannot reclaim the space used by old versions until they fall outside of the gc.ttl window. More versions also mean more data to search through, which can impact SQL latency. We recommend exercising caution when adjusting this setting.*
Protect on Pause and Pause on Error: We give you the ability to circumvent the garbage collector with the protect_on_pause option. We will protect table data from the garbage collector while the changefeed is paused. If combined with the pause_on_error option, the changefeed will automatically pause in situations where it would normally enter into a terminal failure state. This one-two punch works best when combined with the right monitoring and alerting for your changefeed.
*Tradeoff: This configuration has the same tradeoffs as increasing the gc.ttl, but with considerably less impact since you only accumulate protected data when changefeeds are paused. Alerting for paused changefeeds is HIGHLY recommended when using these options, as protected data can grow unbounded as long as the changefeed is paused.*
Monitoring and alerting: Tuning for durability necessitates setting up changefeed monitoring. We recommend alerting on: changefeed pauses or failures, changefeeds falling behind, and changefeed retries. For more information, check out our docs.
*Tradeoff: No tradeoffs here. This is something you absolutely should do!*
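As a rough illustration of alerting on changefeeds falling behind, the sketch below estimates how much of the gc.ttl window is left before changes at the high watermark may be garbage collected. It is plain Python arithmetic, not a CockroachDB API: the function names, the 24-hour gc.ttl, and the 50% alert threshold are all hypothetical values chosen for the example, and the model ignores protected timestamps.

```python
from datetime import datetime, timedelta, timezone


def gc_headroom(high_watermark, gc_ttl, now):
    """Time left before changes at the high watermark may be garbage collected."""
    return gc_ttl - (now - high_watermark)


def should_alert(high_watermark, gc_ttl, now, min_fraction=0.5):
    """True once less than min_fraction of the gc.ttl window remains."""
    return gc_headroom(high_watermark, gc_ttl, now) < gc_ttl * min_fraction


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
gc_ttl = timedelta(hours=24)            # hypothetical gc.ttl for the watched table

stalled = now - timedelta(hours=18)     # sink has been down for 18 hours
healthy = now - timedelta(minutes=5)    # changefeed is keeping up

print(gc_headroom(stalled, gc_ttl, now))   # time left to fix the sink
print(should_alert(stalled, gc_ttl, now))
print(should_alert(healthy, gc_ttl, now))
```

A check like this only helps if it fires well before the headroom reaches zero, so pick a threshold that leaves enough time for a person to react.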

In addition to these four main options, there are some additional durability considerations for setting up a changefeed:

Webhook/Cloud Storage HTTP sinks: Make sure your ingestion protocol is robust. These endpoints are flexible, but they do not have built-in RequiredAcks settings like Kafka.
Schema change policy: If your downstream application cannot handle ingesting a new schema, consider setting: schema_change_policy=stop to prevent a schema change from causing a breaking change downstream. This can also be paired with pause_on_error.
Ingestion limits: Make sure the downstream sink is configured to properly ingest your workload. Kafka, for example, is often configured with per-message size limits.
Tune for scale: See the next section, and be sure to configure your changefeed for your traffic!
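For the ingestion limits point, one simple pre-flight check is to measure how large your biggest rows become once encoded and compare that against the sink's per-message limit. The sketch below is a hypothetical helper, not part of CockroachDB or Kafka. It assumes plain JSON encoding of the row, whereas real changefeed messages carry additional envelope fields, and the 1024-byte limit is an arbitrary example value.

```python
import json


def oversized(messages, max_message_bytes):
    """Return the messages whose JSON encoding exceeds the sink's size limit."""
    return [
        m for m in messages
        if len(json.dumps(m).encode("utf-8")) > max_message_bytes
    ]


small = {"id": 1, "note": "ok"}
large = {"id": 2, "note": "x" * 2000}   # a row with one very wide column

# Hypothetical per-message limit of 1024 bytes on the sink.
too_big = oversized([small, large], max_message_bytes=1024)
print([m["id"] for m in too_big])
```

Running a check like this against a sample of your widest rows before going to production can surface a mismatch between the table and the sink configuration early.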

Tuning CDC for scale and performance

As the data flowing through your changefeed scales up, there are many ways to keep your changefeed pipeline healthy and performant. Changefeed health mainly comes down to whether our egress of messages can keep pace with the ingress of messages into the changefeed pipeline.

Know thyself by asking these questions at the start:

How many writes per second do you expect on your table?
Will it be steady or with large spikes in traffic?
How do you expect that traffic to grow over time?

Ideally, you will already have an idea of the answers to these questions when creating your changefeed.

You should be thinking about these questions from day one, but a general rule of thumb is that once your underlying table reaches 1TB of data or 10K writes per second, you should consider tuning these configuration options:

Batching: Batching allows us to increase our throughput of messages to the sink (cloud storage, Kafka, or webhook), and thus increase our egress of messages out of CockroachDB. We recommend batching for high-throughput workloads.
*Tradeoff: For spiky traffic, batching can increase average per-message latency.*
Resolved timestamps: Resolved timestamps periodically send the high watermark of the changefeed to downstream sinks. If you are using resolved timestamps, they are by default sent every time we advance the changefeed high watermark. Every high watermark advance necessitates flushing to the sinks, which negatively impacts throughput. If you’re configuring for high throughput, consider not using resolved or setting resolved to a longer interval.
*Tradeoff: If you are using resolved messages for deduplication or reordering downstream, longer resolved intervals will increase the amount of time messages need to be buffered.*
Memory budget (changefeed.memory.per_changefeed_limit): This cluster setting gives our internal memory buffers more room to work with. This can make the changefeed more durable during times of traffic spikes or when the sink is unreachable.
*Tradeoff: This could potentially lead to more changefeed memory usage. Additionally, we globally monitor all changefeed memory usage to prevent unbounded memory usage by changefeeds. More memory usage per changefeed could exhaust the overall budget, which will cause the changefeed to move into a failure state.*
Memory pushback: In CockroachDB versions 21.2+, we introduced memory pushback, which prevents our memory buffers from running out of space and makes the memory budget setting obsolete. In 21.1, partial pushback is implemented, and we recommend turning it on with the changefeed.mem.pushback_enabled setting.
Message format: Avro is a more compact and efficient message format than JSON. Consider using the Avro format if it is supported for your sink (Avro is not supported on webhook sinks).
*Tradeoff: This may increase the complexity of your application, as you may need to use a schema registry for deserialization.*
Compression: If you are using JSON as your message format, gzip compression may lower the impact on the network, especially when batching messages.
Scan request parallelism (changefeed.backfill.concurrent_scan_requests): Scanning the table can be an expensive operation, whether it is the initial scan of the table or a catch-up scan after a period of sink instability. With a large amount of underlying data, it is important to catch up to the current time quickly. Increasing scan request parallelism increases the efficiency of scans.
*Tradeoff: Higher CPU usage.*
Time based iterator (kv.rangefeed.catchup_scan_iterator_optimization.enabled): Using this optimization will minimize the impact to SQL statement latency when starting a new changefeed.
*Tradeoff: No tradeoff here.*
experimental_poll_interval: Under normal operation, as soon as a new value is committed, the maximum time until the change is output is bounded by the cluster setting changefeed.experimental_poll_interval plus a few milliseconds for bookkeeping. This is set to 1s by default. Increasing this value will make changefeed operations more efficient at scale.
*Tradeoff: Increasing this value will increase the average per-message latency of changefeed messages. On the other hand, we do not recommend setting it below 100ms. Setting this value too low (currently) increases the frequency of “descriptor lookups”, which may also have a negative impact on per-message latency. We plan on addressing this limitation in the future.*
Consequently, emission latency in the expected (best) case would look like:
emission latency = time to write the operation without CDC (quorum write time + follower write time if it is not part of the quorum write) + changefeed.experimental_poll_interval
Note: This is an experimental feature. The interface may be subject to change, and there may be bugs.
Number of changefeeds: Consider using a changefeed per table instead of grouping tables under one changefeed if you have some tables with very heavy traffic.
*Tradeoff: Each additional changefeed incurs overhead costs (CPU usage).*
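The best-case emission latency formula given above for changefeed.experimental_poll_interval can be worked through with numbers. The sketch below is only arithmetic over that formula: the function and parameter names are hypothetical, the millisecond values are made up for illustration, and the few milliseconds of bookkeeping mentioned above are left out.

```python
def expected_emission_latency_ms(quorum_write_ms, follower_write_ms,
                                 poll_interval_ms, follower_in_quorum):
    """Best-case time from commit to emission, following the formula above."""
    write_ms = quorum_write_ms
    if not follower_in_quorum:
        # The follower write only adds time when it was not already
        # covered by the quorum write.
        write_ms += follower_write_ms
    return write_ms + poll_interval_ms


# Hypothetical timings: 5 ms quorum write, 3 ms follower write.
print(expected_emission_latency_ms(5, 3, 1000, follower_in_quorum=False))  # default 1s poll interval
print(expected_emission_latency_ms(5, 3, 100, follower_in_quorum=False))   # 100 ms poll interval
```

With numbers like these, the poll interval dominates the total, which is why it is the main lever for per-message latency and why lowering it trades efficiency for latency.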

As you can see, there are many different ways to assess and balance the tradeoffs required to fine-tune both durability and performance for your applications. Knowing the costs and benefits of various priorities for your application — must never go down vs. can tolerate some missed messages, for example — helps you go in with the understanding of what you are targeting and compromises you are willing to make. Being informed before fine-tuning change data capture for an exact fit with your application’s needs is the way to keep your changefeed pipeline healthy and performant.