Publication date: February 12, 2020.

Description

Running SHOW JOBS or viewing the Jobs page in the Admin UI can drive memory usage on a node high enough to crash it.

Statement

SHOW JOBS and the Jobs page in the Admin UI internally load all job descriptions from the cluster into RAM before displaying them.

Under reasonable production settings, a single backup job payload may exceed 5MB in size. With an hourly backup and the default jobs.retention_time of 336h, up to 336 backup job entries are retained, so a single use of SHOW JOBS or a single view of the Jobs page in the Admin UI can incur ~1.7GB of memory utilization. This allocation is then multiplied by the number of concurrent accesses to the jobs table.
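The ~1.7GB figure can be reproduced with simple arithmetic. The sketch below assumes exactly one backup job per hour and 5MB per payload; these are the illustrative figures from the paragraph above, not measured values, and the column alias is made up for this example.

```sql
-- 336 retained hourly backup jobs (336h retention, one job per hour)
-- times an assumed 5 MB payload per job.
SELECT 336 * 5 AS approx_mb_per_listing;  -- 1680 MB, i.e. roughly 1.7 GB
```

Each concurrent SHOW JOBS or Jobs page request would add roughly the same amount again.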

Starting in CockroachDB v19.2.3, new job payloads are reduced in size. A later version will also avoid loading old job entries into memory when viewing recent jobs.

This public issue is tracked as #44166.

Mitigation

The overall number of job entries can be reduced by lowering the jobs.retention_time cluster setting to a value closer to 48h or 24h.

For example:

SET CLUSTER SETTING jobs.retention_time = '48:00:00';
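To confirm the value before or after changing it, the setting can be read back. This is a minimal check, assuming a SQL session with sufficient privileges to view cluster settings:

```sql
-- Display the current retention period for job records.
SHOW CLUSTER SETTING jobs.retention_time;
```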

Additionally, if the nodes are observed to crash due to excessive memory usage, it may be necessary to truncate the job history. This can be achieved, for example, with:

DELETE FROM system.jobs
WHERE status = 'succeeded'
  AND created < (now() - '2 days'::interval);
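Before deleting, it may be worth checking how many rows the statement would remove. The following sketch reuses the same filter as the DELETE above and selects only a count, so it should not pull the job payloads themselves into the result:

```sql
-- Count the succeeded jobs older than two days that the DELETE would remove.
SELECT count(*)
FROM system.jobs
WHERE status = 'succeeded'
  AND created < (now() - '2 days'::interval);
```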

Impact

All deployments running CockroachDB v19.2.0 to v19.2.2 are affected.

Questions about any technical alert can be directed to our support team.


