Monitor and Debug Changefeeds

Changefeeds run as jobs in CockroachDB, so you can monitor and debug them through the DB Console's Jobs page and with SHOW JOBS SQL statements using the changefeed's job ID.
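Because changefeeds are jobs, the usual job-control statements apply once you have the job ID. A minimal sketch (the job ID below is a placeholder):

```sql
-- List changefeed jobs and their IDs:
SELECT job_id, status FROM [SHOW JOBS] WHERE job_type = 'CHANGEFEED';

-- Pause, resume, or cancel a changefeed by its job ID:
PAUSE JOB 383870400694353921;
RESUME JOB 383870400694353921;
CANCEL JOB 383870400694353921;
```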

Monitor a changefeed

Note:

Monitoring is only available for Enterprise changefeeds.

Changefeed progress is exposed as a high-water timestamp that advances as the changefeed progresses: all changes before or at that timestamp are guaranteed to have been emitted. You can monitor a changefeed:

  • On the Changefeed Dashboard of the DB Console.
  • On the Jobs page of the DB Console. Hover over the high-water timestamp to view the system time.
  • Using SHOW CHANGEFEED JOB <job_id>:

    SHOW CHANGEFEED JOB 383870400694353921;
    
            job_id       |  job_type  |                              description                              | ... |      high_water_timestamp      | ... |
    +--------------------+------------+-----------------------------------------------------------------------+ ... +--------------------------------+ ... +
      383870400694353921 | CHANGEFEED | CREATE CHANGEFEED FOR TABLE office_dogs INTO 'kafka://localhost:9092' | ... | 1537279405671006870.0000000000 | ... |
    (1 row)
    
  • Setting up an alert on the changefeed.max_behind_nanos metric to track when a changefeed's high-water mark timestamp is at risk of falling behind the cluster's garbage collection window. For more information, see Monitoring and Alerting.
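The same progress information is also queryable in SQL: the crdb_internal.jobs table exposes each job's high_water_timestamp as a decimal nanosecond timestamp. As a sketch:

```sql
SELECT job_id, status, high_water_timestamp
FROM crdb_internal.jobs
WHERE job_type = 'CHANGEFEED';
```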

Note:

You can use the high-water timestamp to start a new changefeed where another ended.
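For example, a new changefeed can pick up where a previous one ended by passing that changefeed's high-water timestamp as the cursor option (the table, sink URI, and timestamp below are illustrative):

```sql
CREATE CHANGEFEED FOR TABLE office_dogs
  INTO 'kafka://localhost:9092'
  WITH cursor = '1537279405671006870.0000000000';
```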

Using changefeed metrics labels

Warning:

This is an experimental feature. The interface and output are subject to change.

Note:

An Enterprise license is required to use metrics labels in changefeeds.

To measure metrics per changefeed, assign a "metrics label" to one or more changefeeds; each changefeed then increments its metrics under that label. Metrics label information is sent with the time-series metrics to http://{host}:{http-port}/_status/vars, viewable via the Prometheus endpoint. An aggregated metric across all changefeeds is also measured.

Consider the following when applying metrics labels to changefeeds:

  • Metrics labels are not available in CockroachDB Cloud.
  • The COCKROACH_EXPERIMENTAL_ENABLE_PER_CHANGEFEED_METRICS environment variable must be specified to use this feature.
  • The server.child_metrics.enabled cluster setting must be set to true before using the metrics_label option.
  • Metrics label information is sent to the _status/vars endpoint, but will not show up in debug.zip or the DB Console.
  • Introducing labels to isolate a changefeed's metrics can significantly increase cardinality: each labeled changefeed produces its own metric series, and that per-label series data accumulates over time, which can impact performance. To prevent a cardinality explosion, there is a limit of 1024 unique labels.
  • The maximum length of a metrics label is 128 bytes.

To start a changefeed with a metrics label, set the following cluster setting to true:

SET CLUSTER SETTING server.child_metrics.enabled=true;

Create the changefeed, passing the metrics_label option with the label name as its value:

CREATE CHANGEFEED FOR TABLE movr.rides INTO 'kafka://host:port' WITH metrics_label=rides;
CREATE CHANGEFEED FOR TABLE movr.vehicles INTO 'kafka://host:port' WITH metrics_label=vehicles;

Multiple changefeeds can be added to a label:

CREATE CHANGEFEED FOR TABLE movr.vehicle_location_histories INTO 'kafka://host:port' WITH metrics_label=vehicles;

http://{host}:{http-port}/_status/vars shows metrics for each labeled changefeed as well as the aggregated metric across all changefeeds. The output also includes the default scope, which covers changefeeds started without a metrics label:

changefeed_running 4
changefeed_running{scope="default"} 1
changefeed_running{scope="rides"} 1
changefeed_running{scope="vehicles"} 2
changefeed_emitted_messages 4144
changefeed_emitted_messages{scope="default"} 0
changefeed_emitted_messages{scope="rides"} 2772
changefeed_emitted_messages{scope="vehicles"} 1372
changefeed_emitted_bytes 781591
changefeed_emitted_bytes{scope="default"} 0
changefeed_emitted_bytes{scope="rides"} 598034
changefeed_emitted_bytes{scope="vehicles"} 183557

Metrics

Metric Description Unit
changefeed_running Number of currently running changefeeds, including sinkless changefeeds. Changefeeds
emitted_messages Number of messages emitted, which increments when messages are flushed. Messages
emitted_bytes Number of bytes emitted, which increments as messages are flushed. Bytes
flushed_bytes Bytes emitted by all changefeeds. This may differ from emitted_bytes when compression is enabled. Bytes
changefeed_flushes Total number of flushes for a changefeed. Flushes
emit_latency Difference between the event's MVCC timestamp and the time the event was emitted by CockroachDB. Nanoseconds
admit_latency Difference between the event's MVCC timestamp and the time the event is put into the memory buffer. Nanoseconds
commit_latency Difference between the event's MVCC timestamp and the time it is acknowledged by the downstream sink. If the sink is batching events, then the difference is between the oldest event and when the acknowledgment is recorded. Nanoseconds
backfill_count Number of changefeeds currently executing a backfill (schema change or initial scan). Changefeeds
sink_batch_hist_nanos Time messages spend batched in the sink buffer before being flushed and acknowledged. Nanoseconds
flush_hist_nanos Time spent flushing messages across all changefeeds. Nanoseconds
checkpoint_hist_nanos Time spent checkpointing changefeed progress. Nanoseconds
error_retries Total retryable errors encountered by changefeeds. Errors
backfill_pending_ranges Number of ranges in an ongoing backfill that are yet to be fully emitted. Ranges
message_size_hist Distribution of the size of emitted messages. Bytes

Debug a changefeed

Using logs

For Enterprise changefeeds, use log information to debug connection issues (e.g., kafka: client has run out of available brokers to talk to (Is your cluster reachable?)). Look for log lines that contain [kafka-producer]:

I190312 18:56:53.535646 585 vendor/github.com/Shopify/sarama/client.go:123  [kafka-producer] Initializing new client
I190312 18:56:53.535714 585 vendor/github.com/Shopify/sarama/client.go:724  [kafka-producer] client/metadata fetching metadata for all topics from broker localhost:9092
I190312 18:56:53.536730 569 vendor/github.com/Shopify/sarama/broker.go:148  [kafka-producer] Connected to broker at localhost:9092 (unregistered)
I190312 18:56:53.537661 585 vendor/github.com/Shopify/sarama/client.go:500  [kafka-producer] client/brokers registered new broker #0 at 172.16.94.87:9092
I190312 18:56:53.537686 585 vendor/github.com/Shopify/sarama/client.go:170  [kafka-producer] Successfully initialized new client

Using SHOW CHANGEFEED JOBS

New in v21.2: For Enterprise changefeeds, use SHOW CHANGEFEED JOBS to check the status of your changefeed jobs:

> SHOW CHANGEFEED JOBS;
job_id               |                                                                                   description                                                                  | user_name | status  |              running_status              |          created           |          started           | finished |          modified          |      high_water_timestamp      | error |         sink_uri       |      full_table_names      | format
---------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------+------------------------------------------+----------------------------+----------------------------+----------+----------------------------+--------------------------------+-------+------------------------+----------------------------+---------
685724608744325121   | CREATE CHANGEFEED FOR TABLE mytable INTO 'kafka://localhost:9092' WITH confluent_schema_registry = 'http://localhost:8081', format = 'avro', resolved, updated | root      | running | running: resolved=1629336943.183631090,0 | 2021-08-19 01:35:43.19592  | 2021-08-19 01:35:43.225445 | NULL     | 2021-08-19 01:35:43.252318 | 1629336943183631090.0000000000 |       | kafka://localhost:9092 | {defaultdb.public.mytable} | avro
685723987509116929   | CREATE CHANGEFEED FOR TABLE mytable INTO 'kafka://localhost:9092' WITH confluent_schema_registry = 'http://localhost:8081', format = 'avro', resolved, updated | root      | paused  | NULL                                     | 2021-08-19 01:32:33.609989 | 2021-08-19 01:32:33.64293  | NULL     | 2021-08-19 01:35:44.224961 | NULL                           |       | kafka://localhost:9092 | {defaultdb.public.mytable} | avro
(2 rows)

For more information, see SHOW JOBS.
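SHOW CHANGEFEED JOBS can also be used as a data source in a SELECT to filter its output; as a sketch, to list only paused changefeeds:

```sql
SELECT job_id, description, error
FROM [SHOW CHANGEFEED JOBS]
WHERE status = 'paused';
```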

Using the DB Console

On the Custom Chart debug page of the DB Console:

  1. To add a chart, click Add Chart.
  2. Select changefeed.error_retries from the Metric Name dropdown menu.

    A graph of changefeed restarts due to retryable errors is displayed.
