Operational FAQs

Cockroach Labs recommends against using the --background flag when starting a cluster. In production, operators usually use a process manager like systemd to start and manage the cockroach process on each node. When testing locally, starting nodes in the foreground is recommended so you can monitor the runtime closely.

If you do use --background, you should also set --pid-file. To stop or restart a cluster, send a SIGTERM or SIGHUP signal to the process ID in the PID file.

Check whether you have previously run a multi-node cluster using the same data directory. If you have previously started and stopped a multi-node cluster, and are now trying to bring it back up, note the following:

The --background flag of cockroach start causes the start command to wait until the node has fully initialized and is able to start serving queries. In addition, to keep your data consistent, CockroachDB waits until a majority of nodes are running. This means that if only one node of a three-node cluster is running, that one node will not be operational. As a result, starting nodes with the --background flag will cause cockroach start to hang until a majority of nodes are fully initialized.

To restart your cluster, you should either:

Use multiple terminal windows to start multiple nodes in the foreground.
Start each node in the background using your shell’s functionality (e.g., cockroach start &) instead of using the --background flag.
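
The majority requirement described above can be illustrated with a small sketch. The function names are hypothetical and not part of CockroachDB, and the model is a simplification that only counts running nodes, as in the three-node example.

```python
def majority(total_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if total_nodes < 1:
        raise ValueError("total_nodes must be at least 1")
    return total_nodes // 2 + 1


def has_majority(running_nodes: int, total_nodes: int) -> bool:
    """True when enough nodes are running for a majority (simplified model)."""
    return running_nodes >= majority(total_nodes)


if __name__ == "__main__":
    # One of three nodes running: no majority yet, so a node started with
    # --background would keep waiting in this model.
    print(has_majority(1, 3))
    # Two of three nodes running: a majority exists.
    print(has_majority(2, 3))
```

In this model, a three-node cluster needs two running nodes before any single start command can return.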

Like most databases, CockroachDB caches the most recently accessed data in memory so that it can provide faster reads, and its periodic writes of time-series data cause that cache size to increase until it hits its configured limit. The cache size can also be controlled manually.

By default, CockroachDB stores time-series cluster metrics within the cluster. By default, data is retained at 10-second granularity for 10 days, and at 30-minute granularity for 90 days. An automatic job periodically runs and prunes historical data. For the first several days of your cluster’s life, the cluster’s time-series data grows continually.

CockroachDB writes about 15 KiB per second per node to the time-series database. About half of that is optimized away by the storage engine, leaving roughly 8 KiB per second per node. Therefore an estimated calculation of how much data will be stored in the time-series database is:

8 KiB/second * 3600 seconds/hour * 24 hours/day * number of days

For the first 10 days of your cluster’s life, you can expect storage per node to increase by about the following amount:

8 * 3600 * 24 * 10 = 6912000 KiB, or about 6.6 GiB

With on-disk compression, the actual disk usage is likely to be about 4 GiB. However, depending on your usage of time-series charts in the DB Console, you may prefer to reduce the amount of disk used by time-series data. To reduce the amount of time-series data stored, refer to Can I reduce the storage of time-series data?

There are several reasons why disk usage may not decrease right after deleting data:

The data could be preserved for MVCC history.
The data could be in the process of being compacted.

For instructions on how to free up disk space as quickly as possible after dropping a table, see the procedure later in this FAQ.

The data could be preserved for MVCC history

CockroachDB implements multi-version concurrency control (MVCC), which means that it maintains a history of all mutations to a row. This history is used for a wide range of functionality, including historical queries. The requirement to preserve history means that CockroachDB “soft deletes” data: the data is marked as deleted by a tombstone record so that CockroachDB will no longer surface the deleted rows to queries, but the old data is still present on disk.

The length of history preserved by MVCC is determined by two things: the gc.ttlseconds setting of the zone that contains the data, and whether any protected timestamps exist. You can check the range’s statistics to observe the key_bytes, value_bytes, and live_bytes. The live_bytes metric reflects data that’s not garbage. The value of (key_bytes + value_bytes) - live_bytes will tell you how much MVCC garbage is resident within a range. This information can be accessed in the following ways:

Using the SHOW RANGES SQL statement, which lists the above values under the names live_bytes, key_bytes, and val_bytes.
In the DB Console, on the Advanced Debug page, click the Range Status link, which takes you to a page where the values are displayed in a tabular format like the following: MVCC Live Bytes/Count | 2.5 KiB / 62 count.
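
The garbage formula above can be sketched in a few lines. The RangeStats class, the function names, and the sample numbers are hypothetical and only illustrate the arithmetic; they are not a CockroachDB API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeStats:
    """Per-range byte counts, as read from the range's statistics."""
    key_bytes: int
    value_bytes: int  # the text notes this appears as val_bytes in SQL output
    live_bytes: int


def mvcc_garbage_bytes(stats: RangeStats) -> int:
    """(key_bytes + value_bytes) - live_bytes, as described in the text."""
    return (stats.key_bytes + stats.value_bytes) - stats.live_bytes


def garbage_fraction(stats: RangeStats) -> float:
    """Share of the range's key and value bytes that is MVCC garbage."""
    total = stats.key_bytes + stats.value_bytes
    return 0.0 if total == 0 else mvcc_garbage_bytes(stats) / total


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    stats = RangeStats(key_bytes=1000, value_bytes=4000, live_bytes=2560)
    print(mvcc_garbage_bytes(stats))
    print(garbage_fraction(stats))
```

A high garbage fraction suggests that much of the range's on-disk footprint is history waiting for garbage collection rather than live data.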

When data has been deleted for at least the duration specified by gc.ttlseconds, CockroachDB will consider it eligible for “garbage collection”. Asynchronously, CockroachDB will perform garbage collection of ranges that contain significant quantities of garbage. Note that if there are backups or other processes that haven’t completed yet but require the data, these processes may prevent the garbage collection of that data by setting a protected timestamp until they have completed.
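
As a rough sketch of the eligibility rule above, the following function treats data as eligible once it has been deleted for at least the TTL and no protected timestamp covers it. The function name and the boolean `protected` parameter are hypothetical simplifications; the real check is internal to CockroachDB, and eligibility does not mean the data is removed immediately.

```python
from datetime import datetime, timedelta, timezone


def gc_eligible(deleted_at: datetime, now: datetime,
                gc_ttl_seconds: int, protected: bool = False) -> bool:
    """Return True if deleted data may be garbage collected in this simplified model."""
    if protected:
        # A protected timestamp (for example, from an unfinished backup) blocks GC.
        return False
    return now - deleted_at >= timedelta(seconds=gc_ttl_seconds)


if __name__ == "__main__":
    deleted = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    ttl = 600  # seconds; the 10-minute value used elsewhere in this FAQ
    print(gc_eligible(deleted, deleted + timedelta(minutes=5), ttl))
    print(gc_eligible(deleted, deleted + timedelta(minutes=10), ttl))
    print(gc_eligible(deleted, deleted + timedelta(hours=1), ttl, protected=True))
```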

The data could be in the process of being compacted

When MVCC garbage is deleted by garbage collection, the data is still not yet physically removed from the filesystem by the storage engine. Removing data from the filesystem requires rewriting the files containing the data using a process known as compaction, which can be expensive. The storage engine has heuristics to compact data and remove deleted rows when enough garbage has accumulated to warrant a compaction. It strives to always restrict the overhead of obsolete data (called the space amplification) to at most 10%. If a lot of data was just deleted, it may take the storage engine some time to compact the files and restore this property.

For instructions on how to free up disk space as quickly as possible after dropping a table, see the procedure that follows.

If you’ve noticed that your disk space is not freeing up quickly enough after dropping a table, you can take the following steps to free up disk space more quickly the next time you drop a table. This example assumes a table t exists.

The procedure shown here only works if you get the range IDs from the table before running DROP TABLE. If you are in an emergency situation due to running out of disk, see What happens when a node runs out of disk space?

Lower the gc.ttlseconds setting to 10 minutes.

ALTER TABLE t CONFIGURE ZONE USING gc.ttlseconds = 600;

Find the IDs of the ranges storing the table data using SHOW RANGES:

SELECT range_id FROM [SHOW RANGES FROM TABLE t];

  range_id
------------
        68
        69
        70
        ...

Drop the table using DROP TABLE:

DROP TABLE t;

Visit the Advanced Debug page in the DB Console and click the link Run a range through an internal queue to visit the Manually enqueue range in a replica queue page. On this page, select mvccGC from the Queue dropdown and enter each range ID from the previous step. Check the SkipShouldQueue checkbox to speed up the MVCC GC process.
Monitor GC progress in the DB Console by watching the MVCC GC queue and the overall disk space used.

When a query is executed, a process records query execution statistics on system tables. The CockroachDB internal-delete-old-sql-stats process cleans up query execution statistics collected on system tables, including system.statement_statistics and system.transaction_statistics. These system tables have a default row limit of 1 million, set by the sql.stats.persisted_rows.max cluster setting. When this limit is exceeded, an hourly cleanup job deletes the data that surpasses the row limit, starting with the oldest data. For more information about the cleanup job, use the following query:

> SELECT * FROM crdb_internal.jobs WHERE job_type='AUTO SQL STATS COMPACTION';

In general, the internal-delete-old-sql-stats process is not expected to impact cluster performance. There are a few cases where there has been a spike in CPU due to a very large amount of data being processed; however, those cases were resolved through general improvements over time.

Can I reduce the storage of time-series data?

Yes, you can reduce the interval for time-series storage. After reducing time-series storage, it can take up to 24 hours for time-series data to be deleted and for the change to be reflected in DB Console metrics.

Reduce the interval for time-series storage

To reduce the interval for storage of time-series data:

For data stored at 10-second resolution, reduce the timeseries.storage.resolution_10s.ttl cluster setting to a value less than 240h0m0s (10 days).

For example, to change the storage interval for time-series data at 10s resolution to 5 days, run the following command:

  > SET CLUSTER SETTING timeseries.storage.resolution_10s.ttl = '120h0m0s';

  > SHOW CLUSTER SETTING timeseries.storage.resolution_10s.ttl;

    timeseries.storage.resolution_10s.ttl
  +---------------------------------------+
    120:00:00
  (1 row)

This setting has no effect on time-series data aggregated at 30-minute resolution, which is stored for 90 days by default.

For data stored at 30-minute resolution, reduce the timeseries.storage.resolution_30m.ttl cluster setting to a value less than 2160h0m0s (90 days).

Cockroach Labs recommends that you avoid increasing the period of time that DB Console retains time-series metrics. If you need to retain this data for a longer period, consider using a third-party tool such as Prometheus to collect and store metrics.
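
To get a rough feel for what a shorter 10-second TTL might save, the per-node estimate from earlier in this FAQ (about 8 KiB per second per node after storage-engine optimization) can be sketched as follows. The function names are hypothetical, and the result is an uncompressed approximation for 10-second-resolution data only, not a measurement.

```python
KIB_PER_SECOND_PER_NODE = 8   # ~15 KiB/s written, about half optimized away
SECONDS_PER_DAY = 24 * 3600


def timeseries_kib_per_node(retention_days: int) -> int:
    """Estimated uncompressed KiB of time-series data per node."""
    return KIB_PER_SECOND_PER_NODE * SECONDS_PER_DAY * retention_days


def kib_to_gib(kib: float) -> float:
    """Convert KiB to GiB (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024 * 1024)


def ttl_setting(days: int) -> str:
    """Format a retention period in days as an hours-based duration string."""
    return f"{days * 24}h0m0s"


if __name__ == "__main__":
    for days in (10, 5):
        kib = timeseries_kib_per_node(days)
        print(ttl_setting(days), kib, round(kib_to_gib(kib), 1))
```

Under these assumptions, halving the 10-second TTL from 10 days to 5 days roughly halves the estimated per-node storage for that resolution.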

Disable time-series storage

Disabling time-series storage is recommended only if you exclusively use a third-party tool such as Prometheus for time-series monitoring. Prometheus and other such tools do not rely on CockroachDB-stored time-series data; instead, they ingest metrics exported by CockroachDB from memory and then store the data themselves.

When storage of time-series metrics is disabled, the Metrics dashboards in the DB Console are still available, but their visualizations are blank. This is because the dashboards rely on data that is no longer available.

To disable the storage of time-series data, run the following command:

> SET CLUSTER SETTING timeseries.storage.enabled = false;

> SHOW CLUSTER SETTING timeseries.storage.enabled;

  timeseries.storage.enabled
+----------------------------+
            false
(1 row)

This setting only prevents the collection of new time-series data. To also delete all existing time-series data, change both the timeseries.storage.resolution_10s.ttl and timeseries.storage.resolution_30m.ttl cluster settings:

> SET CLUSTER SETTING timeseries.storage.resolution_10s.ttl = '0s';

> SET CLUSTER SETTING timeseries.storage.resolution_30m.ttl = '0s';

Historical data is not deleted immediately, but is eventually removed by a background job within 24 hours.

What happens when a node runs out of disk space?

When a node runs out of disk space, it shuts down and cannot be restarted until space is freed up. To prepare for this case, CockroachDB creates a ballast file in each node’s storage directory that can be deleted to free up enough space to be able to restart the node.

In addition to using ballast files, it is important to actively monitor remaining disk space.

For instructions on how to free up disk space as quickly as possible after dropping a table, see the procedure earlier in this FAQ.

If queries operate on different data, then increasing the number of nodes should improve the overall throughput (transactions/second or QPS). However, if your queries operate on the same data, you may be observing transaction contention.

Cockroach Labs collects information about CockroachDB’s real-world usage to help prioritize the development of product features. We choose our default as “opt-in” to strengthen the information collected, and are careful to send only anonymous, aggregate usage statistics. It is possible to opt out of this collection.

CockroachDB requires moderate levels of clock synchronization to preserve data consistency. For this reason, when a node detects that its clock is out of sync with at least half of the other nodes in the cluster by 80% of the maximum offset allowed, it spontaneously shuts down. This offset defaults to 500ms but can be changed via the --max-offset flag when starting each node.

While serializable consistency is maintained regardless of clock skew, skew outside the configured clock offset bounds can result in violations of single-key linearizability between causally dependent transactions. It’s therefore important to prevent clocks from drifting too far by running NTP or other clock synchronization software on each node.

In very rare cases, CockroachDB can momentarily run with a stale clock. This can happen when using vMotion, which can suspend a VM running CockroachDB, migrate it to different hardware, and resume it. This will cause CockroachDB to be out of sync for a short period before it jumps to the correct time. During this window, it would be possible for a client to read stale data and write data derived from stale reads. By enabling the server.clock.forward_jump_check_enabled cluster setting, you can be alerted when the CockroachDB clock jumps forward, indicating it had been running with a stale clock.

To protect against this on vMotion, however, use the --clock-device flag to specify a PTP hardware clock for CockroachDB to use when querying the current time. When doing so, you should not enable server.clock.forward_jump_check_enabled because forward jumps will be expected and harmless. For more information on how --clock-device interacts with vMotion, see this blog post.

In CockroachDB versions prior to v22.2.13, and in v23.1 versions prior to v23.1.9, the --clock-device flag had a bug that could cause it to generate timestamps in the far future. This could cause nodes to crash due to incorrect timestamps, or in the worst case irreversibly advance the cluster’s HLC clock into the far future. This bug is fixed in CockroachDB v23.2.

Considerations

When setting up clock synchronization:

All nodes in the cluster must be synced to the same time source, or to different sources that implement leap second smearing in the same way. For example, Google and Amazon have time sources that are compatible with each other (they implement leap second smearing in the same way), but are incompatible with the default NTP pool (which does not implement leap second smearing).
For nodes running in AWS, we recommend Amazon Time Sync Service. For nodes running in GCP, we recommend Google’s internal NTP service. For nodes running elsewhere, we recommend Google Public NTP. Note that the Google and Amazon time services can be mixed with each other, but they cannot be mixed with other time services (unless you have verified leap second behavior). Either all of your nodes should use the Google and Amazon services, or none of them should.
If you do not want to use the Google or Amazon time sources, you can use chrony and enable client-side leap smearing, unless the time source you’re using already does server-side smearing. In most cases, we recommend the Google Public NTP time source because it handles smearing the leap second. If you use a different NTP time source that doesn’t smear the leap second, you must configure client-side smearing manually and do so in the same way on each machine.
Do not run more than one clock sync service on VMs where cockroach is running.
For new clusters using the multi-region SQL abstractions, Cockroach Labs recommends lowering the --max-offset setting to 250ms. This setting is especially helpful for lowering the write latency of global tables. Nodes can run with different values for --max-offset, but only for the purpose of updating the setting across the cluster using a rolling upgrade.

Tutorials

For guidance on synchronizing clocks, see the tutorial for your deployment environment:

Environment	Featured Approach
	Use NTP with Google’s external NTP service.
AWS	Use the Amazon Time Sync Service.
Azure	Disable Hyper-V time synchronization and use NTP with Google’s external NTP service.
	Use NTP with Google’s external NTP service.
GCP	Use NTP with Google’s internal NTP service.

Each CockroachDB node exports a wide variety of metrics at http://<host>:<http-port>/_status/vars in the format used by the popular Prometheus time-series database. Two of these metrics export how close each node’s clock is to the clock of all other nodes:

Metric	Definition
clock_offset_meannanos	The mean difference between the node’s clock and other nodes’ clocks in nanoseconds
clock_offset_stddevnanos	The standard deviation of the difference between the node’s clock and other nodes’ clocks in nanoseconds
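
As a sketch of how the mean-offset metric might be checked against the 80% shutdown threshold, the snippet below parses Prometheus-style text and compares the value to the default 500ms maximum offset. The sample text, its label, the function names, and the 50% alert fraction are all assumptions made for illustration; real output from /_status/vars may be formatted differently.

```python
from typing import Optional

MAX_OFFSET_NANOS = 500_000_000  # default maximum offset of 500ms
SHUTDOWN_FRACTION = 0.8         # node shuts down at 80% of the maximum offset

# Hypothetical sample of exported metrics text.
SAMPLE_VARS = """\
# TYPE clock_offset_meannanos gauge
clock_offset_meannanos{node_id="1"} 2.6e+08
# TYPE clock_offset_stddevnanos gauge
clock_offset_stddevnanos{node_id="1"} 1.5e+06
"""


def parse_metric(vars_text: str, name: str) -> Optional[float]:
    """Return the first value of a metric in Prometheus text, or None."""
    for line in vars_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        metric, _, value = line.rpartition(" ")
        if metric.split("{", 1)[0] == name:  # drop any {label="..."} suffix
            return float(value)
    return None


def offset_ratio(mean_offset_nanos: float,
                 max_offset_nanos: int = MAX_OFFSET_NANOS) -> float:
    """Mean offset as a fraction of the configured maximum offset."""
    return abs(mean_offset_nanos) / max_offset_nanos


def should_alert(mean_offset_nanos: float, alert_fraction: float = 0.5) -> bool:
    """Alert before the 80% threshold; 0.5 is an arbitrary example value."""
    return offset_ratio(mean_offset_nanos) >= alert_fraction


if __name__ == "__main__":
    mean = parse_metric(SAMPLE_VARS, "clock_offset_meannanos")
    if mean is not None:
        print(offset_ratio(mean), should_alert(mean),
              offset_ratio(mean) >= SHUTDOWN_FRACTION)
```

Alerting at a fraction below 0.8 leaves time to fix clock synchronization before the node reaches the point where it would shut itself down.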

As described in the above answer, a node will shut down if the mean offset of its clock from the other nodes’ clocks exceeds 80% of the maximum offset allowed. It’s recommended to monitor the clock_offset_meannanos metric and alert if it’s approaching the 80% threshold of your cluster’s configured max offset. You can also see these metrics in the DB Console.

Perform a node shutdown to temporarily stop a node that you plan to restart.
