Publication date: February 12, 2020
Description
Running SHOW JOBS or viewing the Jobs page in the Admin UI can incur high memory usage on a node, to the point that the node could crash.
Statement
SHOW JOBS and the Jobs page in the Admin UI internally load all job descriptions from the cluster into RAM before displaying them.
Under typical production settings, a single backup job payload can exceed 5MB in size. With an hourly backup schedule and the default jobs.retention_time cluster setting of 336h, the jobs table retains roughly 336 backup entries, so a single SHOW JOBS statement or a single user viewing the Jobs page in the Admin UI can incur approximately 1.7GB (336 × 5MB) of memory utilization. This allocation is then multiplied by the number of concurrent accesses to the jobs table.
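To gauge the exposure on a particular cluster, the total payload size that SHOW JOBS would load can be estimated directly from system.jobs. The following query is an illustrative sketch, not part of the original advisory; it assumes the v19.2 system.jobs schema, in which payload is a BYTES column:
SELECT count(*) AS jobs, sum(length(payload)) AS payload_bytes FROM system.jobs;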
Starting in CockroachDB v19.2.3, new job payloads are reduced in size. A later version will also avoid loading old job entries into memory when viewing recent jobs.
This public issue is tracked as #44166.
Mitigation
The overall number of job entries can be reduced by setting the jobs.retention_time cluster setting to a value closer to 48h or 24h. For example:
SET CLUSTER SETTING jobs.retention_time = '48:00:00';
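The change can then be verified with SHOW CLUSTER SETTING, shown here as an illustrative follow-up rather than part of the original advisory:
SHOW CLUSTER SETTING jobs.retention_time;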
Additionally, if the nodes are observed to crash due to excessive memory usage, it may be necessary to truncate the job history. This can be achieved, for example, with:
DELETE FROM system.jobs
WHERE status = 'succeeded'
AND created < (now() - '2 days'::interval);
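Before running the DELETE, the number of rows it would remove can be previewed with an equivalent SELECT over the same predicate. This query is an illustrative sketch, not part of the original advisory:
SELECT count(*) FROM system.jobs
WHERE status = 'succeeded'
AND created < (now() - '2 days'::interval);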
Impact
All deployments running CockroachDB v19.2.0 to v19.2.2 are affected.
Questions about any technical alert can be directed to our support team.