Why is my process hanging when I try to start nodes with the --background flag?
Check whether you have previously run a multi-node cluster using the same data directory. If you have not, refer to Troubleshoot Cluster Setup.
If you have previously started and stopped a multi-node cluster, and are now trying to bring it back up, note the following:
The --background flag of cockroach start causes the start command to wait until the node has fully initialized and is able to start serving queries. In addition, to keep your data consistent, CockroachDB waits until a majority of nodes are running. This means that if only one node of a three-node cluster is running, that one node will not be operational.
As a result, starting nodes with the --background flag will cause cockroach start to hang until a majority of nodes are fully initialized.
To restart your cluster, you should either:
- Use multiple terminals to start multiple nodes at once.
- Start each node in the background using your shell's functionality (e.g., cockroach start &) instead of using the --background flag.
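As a minimal sketch of the second option, the Python snippet below launches every node of a previously initialized local three-node cluster without waiting on any of them, which is what & does in a shell. The insecure mode, the store directory names (node1, node2, ...), and the ports are illustrative assumptions based on a typical local test setup; substitute the flags you originally used to start your nodes. Because the cluster was already initialized, no cockroach init step is included.

```python
import subprocess
from typing import List, Sequence


def node_command(index: int, total: int = 3) -> List[str]:
    """Build a `cockroach start` command for node `index` (1-based).

    Assumption: an insecure local cluster with SQL ports 26257, 26258, ...
    and HTTP ports 8080, 8081, ...; these values are illustrative only.
    """
    sql_port = 26256 + index
    http_port = 8079 + index
    join = ",".join(f"localhost:{26256 + i}" for i in range(1, total + 1))
    return [
        "cockroach", "start",
        "--insecure",
        f"--store=node{index}",
        f"--listen-addr=localhost:{sql_port}",
        f"--http-addr=localhost:{http_port}",
        f"--join={join}",
    ]


def start_all(commands: Sequence[Sequence[str]]) -> List[subprocess.Popen]:
    """Start every command immediately without waiting on any of them (like `cmd &`)."""
    return [subprocess.Popen(list(cmd)) for cmd in commands]
```

For example, start_all([node_command(i) for i in (1, 2, 3)]) starts all three nodes at once, so a majority can come up together instead of the first node blocking while it waits for the others.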
Why is memory usage increasing despite lack of traffic?
Like most databases, CockroachDB caches the most recently accessed data in memory so that it can provide faster reads, and its periodic writes of time-series data cause that cache size to increase until it hits its configured limit. For information about manually controlling the cache size, see Recommended Production Settings.
Why is disk usage increasing despite lack of writes?
The time-series data used in the DB Console is stored within the cluster and accumulates for 10 days before it starts to be truncated. As a result, for the first 10 days or so of a cluster's life, you will see a steady increase in disk usage and the number of ranges even if you are not writing data to the cluster.
Can I reduce or disable the storage of time-series data?
Yes, you can either reduce the interval for time-series storage or disable time-series storage entirely.
After reducing or disabling time-series storage, it can take up to 24 hours for time-series data to be deleted and for the change to be reflected in DB Console metrics.
Reduce the interval for time-series storage
By default, CockroachDB stores time-series data at 10s resolution for 10 days. This data is aggregated into time-series data at 30m resolution, which is stored for 90 days.
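The arithmetic below, a small sketch using only the defaults stated above, shows how many data points a single time series keeps at each resolution. It explains why the 10s data dominates; total storage also depends on how many series the cluster records, which this sketch does not model.

```python
from datetime import timedelta


def samples_retained(ttl: timedelta, resolution: timedelta) -> int:
    """Number of data points one time series keeps for a given TTL and resolution."""
    return int(ttl / resolution)


# Defaults described above: 10s resolution kept for 10 days, 30m resolution kept for 90 days.
fine = samples_retained(timedelta(days=10), timedelta(seconds=10))
coarse = samples_retained(timedelta(days=90), timedelta(minutes=30))
print(fine, coarse)  # 86400 4320
```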
To reduce the interval for storage of time-series data:
- For data stored at 10s resolution, change the timeseries.storage.resolution_10s.ttl cluster setting to an INTERVAL value less than 240h0m0s (10 days).
For example, to change the storage interval for time-series data at 10s resolution to 5 days, run the following SET CLUSTER SETTING command:
> SET CLUSTER SETTING timeseries.storage.resolution_10s.ttl = '120h0m0s';
> SHOW CLUSTER SETTING timeseries.storage.resolution_10s.ttl;
  timeseries.storage.resolution_10s.ttl
+---------------------------------------+
  120:00:00
(1 row)
Note that this data is still aggregated into data at 30m resolution, which is stored for 90 days by default.
- For data stored at 30m resolution, change the timeseries.storage.resolution_30m.ttl cluster setting to an INTERVAL value less than 2160h0m0s (90 days).
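If you script these changes, a helper like the hypothetical one below can turn a duration into the hours/minutes/seconds form used in the example above and refuse values that would not actually reduce the TTL. It is a sketch that assumes the defaults have not been changed on your cluster; it only builds the statement text and does not connect to CockroachDB. Setting a TTL to '0s' is covered separately under disabling time-series storage.

```python
from datetime import timedelta

# Default TTLs as described above; this assumes they have not been changed.
DEFAULT_TTL = {
    "timeseries.storage.resolution_10s.ttl": timedelta(days=10),
    "timeseries.storage.resolution_30m.ttl": timedelta(days=90),
}


def go_duration(d: timedelta) -> str:
    """Format whole seconds as hours/minutes/seconds, e.g. 5 days -> '120h0m0s'."""
    total = int(d.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}h{minutes}m{seconds}s"


def reduce_ttl_statement(setting: str, ttl: timedelta) -> str:
    """Build a SET CLUSTER SETTING statement, rejecting TTLs that are not a reduction."""
    default = DEFAULT_TTL[setting]
    if not timedelta(0) < ttl < default:
        raise ValueError(f"{setting} must be greater than 0s and less than {go_duration(default)}")
    return f"SET CLUSTER SETTING {setting} = '{go_duration(ttl)}';"


print(reduce_ttl_statement("timeseries.storage.resolution_30m.ttl", timedelta(days=30)))
# SET CLUSTER SETTING timeseries.storage.resolution_30m.ttl = '720h0m0s';
```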
Disable time-series storage entirely
Disabling time-series storage is recommended only if you exclusively use a third-party tool such as Prometheus for time-series monitoring. Prometheus and other such tools do not rely on CockroachDB-stored time-series data; instead, they ingest metrics exported by CockroachDB from memory and then store the data themselves.
To disable the storage of time-series data entirely, run the following command:
> SET CLUSTER SETTING timeseries.storage.enabled = false;
> SHOW CLUSTER SETTING timeseries.storage.enabled;
  timeseries.storage.enabled
+----------------------------+
  false
(1 row)
If you want all existing time-series data to be deleted, also change both the timeseries.storage.resolution_10s.ttl and timeseries.storage.resolution_30m.ttl cluster settings:
> SET CLUSTER SETTING timeseries.storage.resolution_10s.ttl = '0s';
> SET CLUSTER SETTING timeseries.storage.resolution_30m.ttl = '0s';
What happens when a node runs out of disk space?
When a node runs out of disk space, it shuts down and cannot be restarted until space is freed up. To prepare for this case, place a ballast file in each node's storage directory that can be deleted to free up enough space to be able to restart the node. If you did not create a ballast file, look for other files that can be deleted, such as log files.
In addition to using ballast files, it is important to actively monitor remaining disk space.
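A minimal sketch of both ideas in Python follows. The ballast function writes real zero bytes in chunks rather than seeking past the end of the file, so the result is not a sparse file and genuinely occupies disk blocks that deleting it will free (filesystems with transparent compression may still shrink zero-filled files). The file path, ballast size, and the 10% free-space alert level are hypothetical choices, not CockroachDB defaults.

```python
import os
import shutil


def create_ballast(path: str, size_bytes: int, chunk: int = 1 << 20) -> None:
    """Write `size_bytes` of zeros to `path`, in chunks of at most `chunk` bytes."""
    zeros = bytes(chunk)
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the blocks are actually allocated on disk


def free_fraction(path: str) -> float:
    """Fraction of the filesystem containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def low_on_disk(path: str, threshold: float = 0.10) -> bool:
    """True when free space falls below `threshold` (a hypothetical alert level)."""
    return free_fraction(path) < threshold
```

For example, create_ballast("/path/to/store/ballast", 1 << 30) would reserve 1 GiB in a node's storage directory, and a periodic job could call low_on_disk on the same directory to raise an alert well before the node runs out of space.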
Why would increasing the number of nodes not result in more operations per second?
If queries operate on different data, then increasing the number of nodes should improve the overall throughput (transactions/second or QPS).
However, if your queries operate on the same data, you may be observing transaction contention. For details, see SQL Performance Best Practices.
Why does CockroachDB collect anonymized cluster usage details by default?
Cockroach Labs collects information about CockroachDB's real-world usage to help prioritize the development of product features. We make reporting the default (clusters are opted in unless you opt out) to strengthen the information collected, and we are careful to send only anonymous, aggregate usage statistics. For details on what information is collected and how to opt out, see Diagnostics Reporting.
What happens when node clocks are not properly synchronized?
CockroachDB requires moderate levels of clock synchronization to preserve data consistency. For this reason, when a node detects that its clock is out of sync with at least half of the other nodes in the cluster by 80% of the maximum offset allowed, it spontaneously shuts down. This offset defaults to 500ms but can be changed via the --max-offset flag when starting each node.
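The rule above can be modeled in a few lines. This is a simplified illustration of the stated condition, not CockroachDB's actual implementation (which works from measured offsets and their uncertainty); the inputs are hypothetical offsets, in milliseconds, between this node's clock and each other node's clock.

```python
from typing import Sequence


def shutdown_threshold_ms(max_offset_ms: float = 500.0) -> float:
    """80% of the configured maximum clock offset."""
    return max_offset_ms * 4 / 5


def would_shut_down(peer_offsets_ms: Sequence[float], max_offset_ms: float = 500.0) -> bool:
    """Simplified model: True if this node's clock differs by more than the threshold
    from at least half of the other nodes."""
    if not peer_offsets_ms:
        return False
    threshold = shutdown_threshold_ms(max_offset_ms)
    out_of_sync = sum(1 for off in peer_offsets_ms if abs(off) > threshold)
    return 2 * out_of_sync >= len(peer_offsets_ms)
```

With the default 500ms maximum offset the threshold is 400ms, so offsets of 450ms, 420ms, and 10ms against three peers would trigger a shutdown (two of three peers exceed it), while 450ms, 10ms, and 20ms would not. Lowering --max-offset to 250ms lowers the threshold to 200ms.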
While serializable consistency is maintained regardless of clock skew, skew outside the configured clock offset bounds can result in violations of single-key linearizability between causally dependent transactions. It's therefore important to prevent clocks from drifting too far by running NTP or other clock synchronization software on each node.
In very rare cases, CockroachDB can momentarily run with a stale clock. This can happen when using vMotion, which can suspend a VM running CockroachDB, migrate it to different hardware, and resume it. This will cause CockroachDB to be out of sync for a short period before it jumps to the correct time. During this window, it would be possible for a client to read stale data and write data derived from stale reads. By enabling the server.clock.forward_jump_check_enabled cluster setting, you can be alerted when the CockroachDB clock jumps forward, indicating it had been running with a stale clock. To protect against this on vMotion, however, use the --clock-device flag to specify a PTP hardware clock for CockroachDB to use when querying the current time. When doing so, you should not enable server.clock.forward_jump_check_enabled because forward jumps will be expected and harmless. For more information on how --clock-device interacts with vMotion, see this blog post.
When setting up clock synchronization:
- All nodes in the cluster must be synced to the same time source, or to different sources that implement leap second smearing in the same way. For example, Google and Amazon have time sources that are compatible with each other (they implement leap second smearing in the same way), but are incompatible with the default NTP pool (which does not implement leap second smearing).
- For nodes running in AWS, we recommend Amazon Time Sync Service. For nodes running in GCP, we recommend Google's internal NTP service. For nodes running elsewhere, we recommend Google Public NTP. Note that the Google and Amazon time services can be mixed with each other, but they cannot be mixed with other time services (unless you have verified leap second behavior). Either all of your nodes should use the Google and Amazon services, or none of them should.
- If you do not want to use the Google or Amazon time sources, you can use chrony and enable client-side leap smearing, unless the time source you're using already does server-side smearing. In most cases, we recommend the Google Public NTP time source because it handles smearing the leap second. If you use a different NTP time source that doesn't smear the leap second, you must configure client-side smearing manually and do so in the same way on each machine.
- Do not run more than one clock sync service on VMs where cockroach is running.
- For new clusters using the multi-region SQL abstractions, we recommend lowering the --max-offset setting to 250ms. This is especially helpful for lowering the write latency of global tables. Note that this will require restarting all of the nodes in your cluster at the same time; it cannot be done with a rolling restart.
For guidance on synchronizing clocks, see the tutorial for your deployment environment:
|Environment|Recommendation|
|---|---|
|On-Premises|Use NTP with Google's external NTP service.|
|AWS|Use the Amazon Time Sync Service.|
|Azure|Disable Hyper-V time synchronization and use NTP with Google's external NTP service.|
|Digital Ocean|Use NTP with Google's external NTP service.|
|GCE|Use NTP with Google's internal NTP service.|
How can I tell how well node clocks are synchronized?
As explained in more detail in our monitoring documentation, each CockroachDB node exports a wide variety of metrics at http://<host>:<http-port>/_status/vars in the format used by the popular Prometheus time-series database. Two of these metrics export how close each node's clock is to the clocks of all other nodes:
|Metric|Definition|
|---|---|
|clock_offset_meannanos|The mean difference between the node's clock and other nodes' clocks in nanoseconds|
|clock_offset_stddevnanos|The standard deviation of the difference between the node's clock and other nodes' clocks in nanoseconds|
As described in the above answer, a node will shut down if the mean offset of its clock from the other nodes' clocks exceeds 80% of the maximum offset allowed. It's recommended to monitor the clock_offset_meannanos metric and alert if it's approaching the 80% threshold of your cluster's configured max offset.
You can also see these metrics in the Clock Offset graph on the DB Console.
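A minimal Python sketch of such an alert follows: it reads the Prometheus text served at /_status/vars, extracts clock_offset_meannanos, and compares it with the 80% threshold. The base URL, the 500ms max offset, and the choice to warn at half of the shutdown threshold (warn_fraction) are assumptions to adapt; if a metric appears more than once with different labels, this simple parser keeps only the last value. In practice, Prometheus alerting rules are the more usual place for this check.

```python
import re
import urllib.request
from typing import Dict

# One sample line: metric name, optional {label set}, then the value.
# A timestamp after the value, if present, is ignored.
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def fetch_vars(base_url: str) -> str:
    """Download the metrics page, e.g. fetch_vars("http://localhost:8080")."""
    with urllib.request.urlopen(f"{base_url}/_status/vars", timeout=5) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str) -> Dict[str, float]:
    """Map metric name -> value; if a name repeats with different labels, the last wins."""
    values: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and # HELP / # TYPE lines
            continue
        m = _SAMPLE.match(line)
        if m:
            values[m.group(1)] = float(m.group(3))
    return values


def clock_offset_alert(text: str, max_offset_ms: float = 500.0, warn_fraction: float = 0.5) -> bool:
    """True once |clock_offset_meannanos| reaches warn_fraction of the 80% shutdown threshold."""
    mean_ns = abs(parse_metrics(text).get("clock_offset_meannanos", 0.0))
    shutdown_threshold_ns = 0.8 * max_offset_ms * 1_000_000  # ms -> ns
    return mean_ns >= warn_fraction * shutdown_threshold_ns
```

For example, clock_offset_alert(fetch_vars("http://localhost:8080")) returns True once a node's mean offset reaches 200ms under the default 500ms max offset, leaving headroom before the 400ms shutdown point.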
How do I prepare for planned node maintenance?
By default, if a node stays offline for more than 5 minutes, the cluster will consider it dead and will rebalance its data to other nodes. Before temporarily stopping nodes for planned maintenance (e.g., upgrading system software), if you expect any nodes to be offline for longer than 5 minutes, you can prevent the cluster from unnecessarily rebalancing data off the nodes by increasing the server.time_until_store_dead cluster setting to match the estimated maintenance window.
For example, let's say you want to maintain a group of servers, and the nodes running on the servers may be offline for up to 15 minutes as a result. Before shutting down the nodes, you would change the server.time_until_store_dead cluster setting as follows:
> SET CLUSTER SETTING server.time_until_store_dead = '15m0s';
After completing the maintenance work and restarting the nodes, you would then change the setting back to its default:
> RESET CLUSTER SETTING server.time_until_store_dead;
It's also important to ensure that load balancers do not send client traffic to a node about to be shut down, even if it will only be down for a few seconds. If you find that your load balancer's health check is not always recognizing a node as unready before the node shuts down, you can increase the server.shutdown.drain_wait cluster setting, which tells the node to wait in an unready state for the specified duration. For example:
> SET CLUSTER SETTING server.shutdown.drain_wait = '10s';
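To check whether drain_wait gives your load balancer enough time, you can poll a node's readiness the way a health check would while the node shuts down. The Python sketch below assumes the node's HTTP readiness endpoint is /health?ready=1 and that it answers with a non-200 status (we assume 503) while the node reports itself unready; the base URL and the 2-second timeout are hypothetical. The opener parameter exists only so the check can be exercised without a running cluster.

```python
import urllib.error
import urllib.request
from typing import Callable


def is_ready(base_url: str, opener: Callable = urllib.request.urlopen) -> bool:
    """Return True only if GET {base_url}/health?ready=1 answers HTTP 200.

    urlopen raises HTTPError for error statuses such as 503 and URLError (an
    OSError) if the node is unreachable; both count as "not ready" here.
    """
    try:
        with opener(f"{base_url}/health?ready=1", timeout=2) as resp:
            return resp.status == 200
    except OSError:  # HTTPError and URLError are both subclasses of OSError
        return False
```

Polling is_ready("http://localhost:8080") once per second while stopping that node shows how long it reports unready before it exits; if that window is shorter than your load balancer's health-check interval, increase server.shutdown.drain_wait.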