SQL Performance Best Practices

This page provides best practices for optimizing query performance in CockroachDB.

DML best practices

Use multi-row statements instead of multiple single-row statements

For INSERT, UPSERT, and DELETE statements, a single multi-row statement is faster than multiple single-row statements. Whenever possible, use multi-row statements for DML queries instead of multiple single-row statements. For more information, see:

How to improve IoT application performance with multi-row DML

Use `UPSERT` instead of `INSERT ON CONFLICT` on tables with no secondary indexes

When inserting or updating columns on a table that does not have , Cockroach Labs recommends using an UPSERT statement instead of INSERT ON CONFLICT DO UPDATE. Whereas INSERT ON CONFLICT always performs a read, the UPSERT statement writes without reading, making it faster. This may be useful if you are using a simple SQL table of two columns to . If the table has a secondary index, there is no performance difference between UPSERT and INSERT ON CONFLICT. However, INSERT without an ON CONFLICT clause may not scan the table for existing values. This can provide a performance improvement over UPSERT.

Bulk-insert best practices

Use multi-row `INSERT` statements for bulk-inserts into existing tables

To bulk-insert data into an existing table, batch multiple rows in one multi-row INSERT statement. Experimentally determine the optimal batch size for your application by monitoring the performance for different batch sizes (10 rows, 100 rows, 1000 rows). Do not include bulk INSERT statements within an explicit transaction.

You can also use the statement to bulk-insert CSV data into an existing table.

For more information, see .

Large multi-row INSERT queries can lead to long-running transactions that result in . If a multi-row INSERT query results in an error code , we recommend breaking up the query up into smaller batches of rows.

Bulk-delete best practices

Use `TRUNCATE` instead of `DELETE` to delete all rows in a table

The statement removes all rows from a table by dropping the table and recreating a new table with the same name. This performs better than using DELETE, which performs multiple transactions to delete all rows.

Use batch deletes to delete a large number of rows

To delete a large number of rows, we recommend iteratively deleting batches of rows until all of the unwanted rows are deleted. For an example, see .

Batch delete “expired” data

CockroachDB has support for Time to Live (“TTL”) expiration on table rows, also known as Row-Level TTL. Row-Level TTL is a mechanism whereby rows from a table are considered “expired” and can be automatically deleted once those rows have been stored longer than a specified expiration time. For more information, see .

Assign column families

A column family is a group of columns in a table that is stored as a single key-value pair in the underlying key-value store. When a table is created, all columns are stored as a single column family. This default approach ensures efficient key-value storage and performance in most cases. However, when frequently updated columns are grouped with seldom updated columns, the seldom updated columns are nonetheless rewritten on every update. Especially when the seldom updated columns are large, it’s therefore more performant to .

Unique ID best practices

The best practices for generating unique IDs in a distributed database like CockroachDB are very different than for a single-node database. To create unique and non-sequential IDs, we recommend the following approaches (listed in order from best to worst performance):

Approach	Pros	Cons
1. Use multi-column primary keys	Potentially fastest, if done right	Complex, requires up-front design and testing to ensure performance
2. Use functions to generate unique IDs	Good performance; spreads load well; easy choice	May leave some performance on the table; requires other columns to be useful in queries
3. Use `INSERT` with the `RETURNING` clause	Easy to query against; familiar design	Slower performance than the other options; higher chance of transaction contention

Traditional approaches using monotonically increasing or data types will create hotspots for both reads and writes in a distributed database like CockroachDB. We . If a table must be indexed on sequential keys, use . Hash-sharded indexes distribute sequential traffic uniformly across ranges, eliminating single-range and improving write performance on sequentially-keyed indexes at a small cost to read performance.

Use multi-column primary keys

A well-designed multi-column primary key can yield even better performance than a UUID primary key, but it requires more up-front schema design work. To get the best performance, ensure that any monotonically increasing field is located after the first column of the primary key. When done right, such a composite primary key should result in:

Enough randomness in your primary key to spread the table data / query load relatively evenly across the cluster, which will avoid hotspots. By “enough randomness” we mean that the prefix of the primary key should be relatively uniformly distributed over its domain. Its domain should have at least as many elements as you have nodes.
A monotonically increasing column that is part of the primary key (and thus indexed) which is also useful in your queries.

For example, consider a social media website. Social media posts are written by users, and on login the user’s last 10 posts are displayed. A good choice for a primary key might be (username, post_timestamp). For example:

> CREATE TABLE posts (
    username STRING,
    post_timestamp TIMESTAMP,
    post_id INT,
    post_content STRING,
    CONSTRAINT posts_pk PRIMARY KEY(username, post_timestamp)
);

This would make the following query efficient.

> SELECT * FROM posts
          WHERE username = 'alyssa'
       ORDER BY post_timestamp DESC
          LIMIT 10;

  username |      post_timestamp       | post_id | post_content
+----------+---------------------------+---------+--------------+
  alyssa   | 2019-07-31 18:01:00+00:00 |    ...  | ...
  alyssa   | 2019-07-30 10:22:00+00:00 |    ...  | ...
  alyssa   | 2019-07-30 09:12:00+00:00 |    ...  | ...
  alyssa   | 2019-07-29 13:48:00+00:00 |    ...  | ...
  alyssa   | 2019-07-29 13:47:00+00:00 |    ...  | ...
  alyssa   | 2019-07-29 13:46:00+00:00 |    ...  | ...
  alyssa   | 2019-07-29 13:43:00+00:00 |    ...  | ...
  ...

Time: 924µs

To see why, let’s look at the output. It shows that the query is fast because it does a point lookup on the indexed column username (as shown by the line spans | /"alyssa"-...). Furthermore, the column post_timestamp is already in an index, and sorted (since it’s a monotonically increasing part of the primary key).

> EXPLAIN (VERBOSE)
    SELECT * FROM posts
            WHERE username = 'alyssa'
         ORDER BY post_timestamp DESC
            LIMIT 10;

                              info
----------------------------------------------------------------
  distribution: local
  vectorized: true

  • revscan
    columns: (username, post_timestamp, post_id, post_content)
    ordering: -post_timestamp
    estimated row count: 10 (missing stats)
    table: posts@posts_pk
    spans: /"alyssa"-/"alyssa"/PrefixEnd
    limit: 10
(10 rows)

Time: 1ms total (execution 1ms / network 0ms)

Note that the above query also follows the of indexing all columns in the WHERE clause.

Use functions to generate unique IDs

To auto-generate unique row identifiers, you can use the following :

Use gen_random_uuid(): Generates a UUIDv4 with UUID data type.
Use uuid_v4(): Generates a UUIDv4 with BYTES data type.
Use unique_rowid(): Generates a globally unique INT data type

For performance reasons, if you are going to use UUIDs, Cockroach Labs strongly recommends using UUIDv4 as defined by RFC 4122. This is the format generated by the . Other types of UUID are largely untested with CockroachDB and will require performance testing to avoid hotspots.

Use `gen_random_uuid()`

To use the column with the gen_random_uuid() as the :

CREATE TABLE users (
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    city STRING NOT NULL,
    name STRING NULL,
    address STRING NULL,
    credit_card STRING NULL,
    CONSTRAINT "primary" PRIMARY KEY (city ASC, id ASC),
    FAMILY "primary" (id, city, name, address, credit_card)
);

INSERT INTO users (name, city) VALUES ('Petee', 'new york'), ('Eric', 'seattle'), ('Dan', 'seattle');

SELECT * FROM users;

                   id                  |   city   | name  | address | credit_card
+--------------------------------------+----------+-------+---------+-------------+
  cf8ee4e2-cd74-449a-b6e6-a0fb2017baa4 | new york | Petee | NULL    | NULL
  2382564e-702f-42d9-a139-b6df535ae00a | seattle  | Eric  | NULL    | NULL
  7d27e40b-263a-4891-b29b-d59135e55650 | seattle  | Dan   | NULL    | NULL
(3 rows)

For performance, CockroachDB defaults to skipping uniqueness checks for gen_random_uuid() due to the near-zero probability of UUID collisions. UUID uniqueness checks can be enabled by setting the cluster setting to true.

Use `uuid_v4()`

Alternatively, you can use the column with the uuid_v4() function as the default value:

CREATE TABLE users2 (
    id BYTES DEFAULT uuid_v4(),
    city STRING NOT NULL,
    name STRING NULL,
    address STRING NULL,
    credit_card STRING NULL,
    CONSTRAINT "primary" PRIMARY KEY (city ASC, id ASC),
    FAMILY "primary" (id, city, name, address, credit_card)
);

INSERT INTO users2 (name, city) VALUES ('Anna', 'new york'), ('Jonah', 'seattle'), ('Terry', 'chicago');

SELECT * FROM users;

                        id                       |   city   | name  | address | credit_card
+------------------------------------------------+----------+-------+---------+-------------+
  4\244\277\323/\261M\007\213\275*\0060\346\025z | chicago  | Terry | NULL    | NULL
  \273*t=u.F\010\274f/}\313\332\373a             | new york | Anna  | NULL    | NULL
  \004\\\364nP\024L)\252\364\222r$\274O0         | seattle  | Jonah | NULL    | NULL
(3 rows)

In either case, generated IDs will be 128-bit, sufficiently large to generate unique values. Once the table grows beyond a single key-value range’s , new IDs will be scattered across all of the table’s ranges and, therefore, likely across different nodes. This means that multiple nodes will share in the load. This approach has the disadvantage of creating a primary key that may not be useful in a query directly, which can require a join with another table or a secondary index.

Use `unique_rowid()`

If it is important for generated IDs to be stored in the same key-value range, you can use an with the unique_rowid() as the default value, either explicitly or via the :

CREATE TABLE users3 (
    id INT DEFAULT unique_rowid(),
    city STRING NOT NULL,
    name STRING NULL,
    address STRING NULL,
    credit_card STRING NULL,
    CONSTRAINT "primary" PRIMARY KEY (city ASC, id ASC),
    FAMILY "primary" (id, city, name, address, credit_card)
);

INSERT INTO users3 (name, city) VALUES ('Blake', 'chicago'), ('Hannah', 'seattle'), ('Bobby', 'seattle');

SELECT * FROM users3;

          id         |  city   |  name  | address | credit_card
+--------------------+---------+--------+---------+-------------+
  469048192112197633 | chicago | Blake  | NULL    | NULL
  469048192112263169 | seattle | Hannah | NULL    | NULL
  469048192112295937 | seattle | Bobby  | NULL    | NULL
(3 rows)

Upon insert or upsert, the unique_rowid() function generates a default value from the timestamp and ID of the node executing the insert. Such time-ordered values are likely to be globally unique except in cases where a very large number of IDs (100,000+) are generated per node per second. Also, there can be gaps and the order is not completely guaranteed. In multi-region deployments with tables, uniqueness checks can add latency due to cross-partition validation. You can improve performance by setting the skip_unique_checks index storage parameter to true, but this is only recommended if the application can guarantee uniqueness. For more information, refer to . To understand the differences between the UUID and unique_rowid() options, see the . For further background on UUIDs, see What is a UUID, and Why Should You Care?.

Use `INSERT` with the `RETURNING` clause to generate unique IDs

If something prevents you from using multi-column primary keys or UUIDs to generate unique IDs, you might resort to using INSERTs with SELECTs to return IDs. Instead, as shown below for improved performance.

Generate monotonically-increasing unique IDs

Suppose the table schema is as follows:

> CREATE TABLE X (
	ID1 INT,
	ID2 INT,
	ID3 INT DEFAULT 1,
	PRIMARY KEY (ID1,ID2)
  );

The common approach would be to use a transaction with an INSERT followed by a SELECT:

> BEGIN;

> INSERT INTO X VALUES (1,1,1)
  	ON CONFLICT (ID1,ID2)
  	DO UPDATE SET ID3=X.ID3+1;

> SELECT * FROM X WHERE ID1=1 AND ID2=1;

> COMMIT;

However, the performance best practice is to use a RETURNING clause with INSERT instead of the transaction:

> INSERT INTO X VALUES (1,1,1),(2,2,2),(3,3,3)
	ON CONFLICT (ID1,ID2)
	DO UPDATE SET ID3=X.ID3 + 1
	RETURNING ID1,ID2,ID3;

Generate random unique IDs

Suppose the table schema is as follows:

> CREATE TABLE X (
	ID1 INT,
	ID2 INT,
	ID3 INT DEFAULT unique_rowid(),
	PRIMARY KEY (ID1,ID2)
  );

The common approach to generate random Unique IDs is a transaction using a SELECT statement:

> BEGIN;

> INSERT INTO X VALUES (1,1);

> SELECT * FROM X WHERE ID1=1 AND ID2=1;

> COMMIT;

However, the performance best practice is to use a RETURNING clause with INSERT instead of the transaction:

> INSERT INTO X VALUES (1,1),(2,2),(3,3)
	RETURNING ID1,ID2,ID3;

Secondary index best practices

See .

Join best practices

See .

Subquery best practices

See .

Authorization best practices

See .

Table scan best practices

Avoid `SELECT *` for large tables

For large tables, avoid table scans (that is, reading the entire table data) whenever possible. Instead, define the required fields in a SELECT statement. For example, suppose the table schema is as follows:

> CREATE TABLE accounts (
	id INT,
	customer STRING,
	address STRING,
	balance INT
	nominee STRING
	);

Now if we want to find the account balances of all customers, an inefficient table scan would be:

> SELECT * FROM ACCOUNTS;

This query retrieves all data stored in the table. A more efficient query would be:

 > SELECT CUSTOMER, BALANCE FROM ACCOUNTS;

This query returns the account balances of the customers.

Avoid `SELECT DISTINCT` for large tables

SELECT DISTINCT allows you to obtain unique entries from a query by removing duplicate entries. However, SELECT DISTINCT is computationally expensive. As a performance best practice, use instead.

Use secondary indexes to optimize queries

See .

Use `AS OF SYSTEM TIME` to decrease conflicts with long-running queries

If you have long-running queries (such as analytics queries that perform full table scans) that can tolerate slightly out-of-date reads, consider using the . Using this, your query returns data as it appeared at a distinct point in the past and will not cause with other concurrent transactions, which can increase your application’s performance. However, because AS OF SYSTEM TIME returns historical data, your reads might be stale.

Prevent the optimizer from planning full scans

To avoid overloading production clusters, there are several ways to prevent the from generating query plans with .

Use index hints to prevent full scans on tables

To prevent the optimizer from planning a full scan for a specific table, specify the NO_FULL_SCAN index hint. For example:
SELECT * FROM table_name@{NO_FULL_SCAN};
To prevent a full scan of a for a specific table, you must specify NO_FULL_SCAN in combination with the index name using . For example:
SELECT * FROM table_name@{FORCE_INDEX=index_name,NO_FULL_SCAN} WHERE b > 0;
This forces a constrained scan of the partial index. If a constrained scan of the partial index is not possible, an error is returned.

Disallow query plans that use full scans

When the disallow_full_table_scans is enabled, the optimizer will not plan full table or index scans on “large” tables (i.e., those with more rows than ).

At the cluster level, set disallow_full_table_scans for some or . For example:
ALTER ROLE ALL SET disallow_full_table_scans = true;
At the application level, add disallow_full_table_scans to the connection string using the .

If you disable full scans, you can set the to specify the maximum table size allowed for a full scan. If no alternative plan is possible, the optimizer will return an error. If you disable full scans, and you provide an , the optimizer will try to avoid a full scan while also respecting the index hint. If this is not possible, the optimizer will return an error. If you do not provide an index hint and it is not possible to avoid a full scan, the optimizer will return an error, the full scan will be logged, and the sql.guardrails.full_scan_rejected.count will be updated.

Disallow query plans that scan more than a number of rows

When the transaction_rows_read_err is enabled, the will not create query plans with scans that exceed the specified row limit. See Disallow transactions from reading or writing many rows.

Disallow transactions from reading or writing many rows

When the transaction_rows_read_err is enabled, transactions that read more than the specified number of rows will fail. In addition, the will not create query plans with scans that exceed the specified row limit. For example, to set a default value for at the cluster level:
ALTER ROLE ALL SET transaction_rows_read_err = 1000;
When the transaction_rows_written_err is enabled, transactions that write more than the specified number of rows will fail. For example, to set a default value for all users at the cluster level:
ALTER ROLE ALL SET transaction_rows_written_err = 1000;

To assess the impact of configuring these session settings, use the corresponding session settings and to log transactions that read or write the specified number of rows. Transactions are logged to the channel.

Transaction contention

Transaction contention occurs when the following three conditions are met:

There are multiple concurrent transactions or statements (sent by multiple clients connected simultaneously to a single CockroachDB cluster).
They operate on table rows with the same index key values (either on or secondary ).
At least one of the transactions holds a or exclusive on the data.

Writing transactions to prevent interactions with concurrent transactions. Locking reads issued with perform a similar function by placing an on rows, which can cause contention. By default under isolation, transactions that operate on the same index key values (specifically, that operate on the same for a given index key) are strictly serialized. To maintain this isolation, SERIALIZABLE transactions at commit time to verify that the values they read were not subsequently updated by other, concurrent transactions. If read refreshing is unsuccessful, then the transaction must be retried. , you may observe:

. This occurs when multiple transactions are trying to write to the same “locked” data at the same time, making a transaction unable to complete. This is also known as lock contention.
performed automatically by CockroachDB. This occurs if a transaction cannot be placed into a serializable ordering among all of the currently-executing transactions. This is also called a serialization conflict.
, which are emitted to your client when an automatic retry is not possible or fails. Under SERIALIZABLE isolation, your application must address transaction retry errors with .
Cluster hotspots.

To mitigate these effects, and . For further background on transaction contention, see What is Database Contention, and Why Should You Care?.

Reduce transaction contention

You can reduce the causes of transaction contention:

Limit the number of affected rows by following (e.g., avoiding full scans, creating secondary indexes, etc.). Not only will transactions run faster, lock fewer rows, and hold locks for a shorter duration, but the chances of when the transaction’s , due to a conflicting write, are decreased because of a smaller read set (i.e., a smaller number of rows read).
Break down larger transactions (e.g., ) into smaller ones to have transactions hold locks for a shorter duration. For example, use to group multiple clauses together in a single SQL statement. This will also decrease the likelihood of . For instance, as the size of writes (number of rows written) decreases, the chances of the transaction’s timestamp getting bumped by concurrent reads decreases.
Use to aggressively lock rows that will later be updated in the transaction. Updates must operate on the most recent version of a row, so a concurrent write to the row will cause a retry error (). Locking early in the transaction forces concurrent writers to block until the transaction is finished, which prevents the retry error. Note that this locks the rows for the duration of the transaction; whether this is tenable will depend on your workload. For more information, see When and why to use SELECT FOR UPDATE in CockroachDB.
Use historical reads (), preferably or when possible to reduce conflicts with other writes. This reduces the likelihood of errors as fewer writes will happen at the historical timestamp. More specifically, writes’ timestamps are less likely to be pushed by historical reads as they would . Note that if the AS OF SYSTEM TIME value is below the closed timestamp, the read cannot be invalidated.
When replacing values in a row, use and specify values for all columns in the inserted rows. This will usually have the best performance under contention, compared to combinations of , , and .
If applicable to your workload, assign and separate columns that are frequently read and written into separate columns. Transactions will operate on disjoint column families and reduce the likelihood of conflicts.
For workloads where large or transactions run concurrently over similar key ranges, watch for anchor hotspots (for example, many concurrent transactions with on the same ). In these cases, consider enabling the cluster setting to randomize the location of transaction anchor keys. This can spread transaction records across ranges and reduce hotspotting. Only use this setting after confirming anchor hotspots via contention and range-level observability.
As a last resort, consider adjusting the using the kv.closed_timestamp.target_duration to reduce the likelihood of long-running write transactions having their . This setting should be carefully adjusted if no other mitigations are available because there can be downstream implications (e.g., historical reads, change data capture feeds, statistics collection, handling zone configurations, etc.). For example, a transaction A is forced to refresh (i.e., change its timestamp) due to hitting the maximum interval (closed timestamps enable and ). This can happen when transaction A is a long-running transaction, and there is a write by another transaction to data that A has already read.

If you increase the kv.closed_timestamp.target_duration setting, it means that you are increasing the amount of time by which the data available in and lags behind the current state of the cluster. In other words, there is a trade-off here: if you absolutely must execute long-running transactions that execute concurrently with other transactions that are writing to the same data, you may have to settle for longer delays on Follower Reads and/or CDC to avoid frequent serialization errors. The anomaly that would be exhibited if these transactions were not retried is called write skew.

Improve transaction performance by sizing and configuring the cluster

To maximize transaction performance, you’ll need to maximize the performance of a single . To achieve this, you can apply multiple strategies:

Minimize the network distance between the , possibly using and , or the newer .
Use the fastest available.
If the contending transactions operate on different keys within the same range, add . However, if the transactions all operate on the same key, this may not provide an improvement.

Hotspots

A hotspot is any location on the cluster receiving significantly more requests than another. Hotspots are a symptom of resource contention and can create problems as requests increase, including excessive transaction contention. For a detailed explanation of hotspot causes and mitigation strategies, refer to .

​DML best practices

​Use multi-row statements instead of multiple single-row statements

​Use UPSERT instead of INSERT ON CONFLICT on tables with no secondary indexes

​Bulk-insert best practices

​Use multi-row INSERT statements for bulk-inserts into existing tables

​Bulk-delete best practices

​Use TRUNCATE instead of DELETE to delete all rows in a table

​Use batch deletes to delete a large number of rows

​Batch delete “expired” data

​Assign column families

​Unique ID best practices

​Use multi-column primary keys

​Use functions to generate unique IDs

​Use gen_random_uuid()

​Use uuid_v4()

​Use unique_rowid()

​Use INSERT with the RETURNING clause to generate unique IDs

​Generate monotonically-increasing unique IDs

​Generate random unique IDs

​Secondary index best practices

​Join best practices

​Subquery best practices

​Authorization best practices

​Table scan best practices

​Avoid SELECT * for large tables

​Avoid SELECT DISTINCT for large tables

​Use secondary indexes to optimize queries

​Use AS OF SYSTEM TIME to decrease conflicts with long-running queries

​Prevent the optimizer from planning full scans

​Use index hints to prevent full scans on tables

​Disallow query plans that use full scans

​Disallow query plans that scan more than a number of rows

​Disallow transactions from reading or writing many rows

​Transaction contention

​Reduce transaction contention

​Improve transaction performance by sizing and configuring the cluster

​Hotspots

​See also

DML best practices

Use multi-row statements instead of multiple single-row statements

Use `UPSERT` instead of `INSERT ON CONFLICT` on tables with no secondary indexes

Bulk-insert best practices

Use multi-row `INSERT` statements for bulk-inserts into existing tables

Bulk-delete best practices

Use `TRUNCATE` instead of `DELETE` to delete all rows in a table

Use batch deletes to delete a large number of rows

Batch delete “expired” data

Assign column families

Unique ID best practices

Use multi-column primary keys

Use functions to generate unique IDs

Use `gen_random_uuid()`

Use `uuid_v4()`

Use `unique_rowid()`

Use `INSERT` with the `RETURNING` clause to generate unique IDs

Generate monotonically-increasing unique IDs

Generate random unique IDs

Secondary index best practices

Join best practices

Subquery best practices

Authorization best practices

Table scan best practices

Avoid `SELECT *` for large tables

Avoid `SELECT DISTINCT` for large tables

Use secondary indexes to optimize queries

Use `AS OF SYSTEM TIME` to decrease conflicts with long-running queries

Prevent the optimizer from planning full scans

Use index hints to prevent full scans on tables

Disallow query plans that use full scans

Disallow query plans that scan more than a number of rows

Disallow transactions from reading or writing many rows

Transaction contention

Reduce transaction contention

Improve transaction performance by sizing and configuring the cluster

Hotspots

See also