Troubleshoot Common Problems

On this page

Warning:

As of June 5, 2024, CockroachDB v22.2 is no longer supported. For more details, refer to the Release Support Policy.

This page has instructions for handling errors and troubleshooting problems that may arise during application development.

Troubleshoot query problems

If you are not satisfied with your SQL query performance, follow the instructions in Optimize Statement Performance Overview to be sure you are avoiding common performance problems like full table scans, missing indexes, etc.

If you have already optimized your SQL queries as described in Optimize Statement Performance Overview and are still having issues such as:

Hanging or "stuck" queries, usually due to contention with long-running transactions
Queries that are slow some of the time (but not always)
Low throughput of queries

Take a look at Troubleshoot SQL Behavior.

Note:

If you aren't sure whether SQL query performance needs to be improved on your cluster, see Identify slow statements.

Transaction retry errors

Messages with the PostgreSQL error code 40001 and the string restart transaction indicate that a transaction failed because it conflicted with another concurrent or recent transaction accessing the same data. The transaction needs to be retried by the client.

If your language's client driver or ORM implements transaction retry logic internally (e.g., if you are using Python and SQLAlchemy with the CockroachDB dialect), then you do not need to handle this logic from your application.

If your driver or ORM does not implement this logic, then you will need to implement a retry loop in your application.

The Python-like pseudocode below shows how to implement an application-level retry loop; it does not require your driver or ORM to implement advanced retry handling logic, so it can be used from any programming language or environment. In particular, your retry loop must:

Raise an error if the max_retries limit is reached
Retry on 40001 error codes
COMMIT at the end of the try block
Implement exponential backoff logic as shown below for best performance

while true:
    n++
    if n == max_retries:
        throw Error("did not succeed within N retries")
    try:
        # add logic here to run all your statements
        conn.exec('COMMIT')
        break
    catch error:
        if error.code != "40001":
            throw error
        else:
            # This is a retry error, so we roll back the current transaction
            # and sleep for a bit before retrying. The sleep time increases
            # for each failed transaction.  Adapted from
            # https://colintemple.com/2017/03/java-exponential-backoff/
            conn.exec('ROLLBACK');
            sleep_ms = int(((2**n) * 100) + rand( 100 - 1 ) + 1)
            sleep(sleep_ms) # Assumes your sleep() takes milliseconds

Note:

If a consistently high percentage of your transactions are resulting in transaction retry errors with the error code 40001 and the string restart transaction, then you may need to evaluate your schema design and data access patterns to find and remove sources of contention. For more information about contention, see Transaction Contention.

For more information about what is causing a specific transaction retry error code, see the Transaction Retry Error Reference.

For more information about transaction retry errors, see Transaction retries.

Unsupported SQL features

CockroachDB has support for most SQL features.

Additionally, CockroachDB supports the PostgreSQL wire protocol and the majority of its syntax. This means that existing applications can often be migrated to CockroachDB without changing application code.

However, you may encounter features of SQL or the PostgreSQL dialect that are not supported by CockroachDB. For example, the following PostgreSQL features are not supported:

Stored procedures
- New in v22.2: CockroachDB has support for user-defined functions.
Triggers.
Events.
FULLTEXT functions and indexes.
- Depending on your use case, you may be able to get by using trigram indexes to do fuzzy string matching and pattern matching.
Drop primary key.

Note:

Each table must have a primary key associated with it. You can drop and add a primary key constraint within a single transaction.
XML functions.
Column-level privileges.
XA syntax.
Creating a database from a template.
Dropping a single partition from a table.
Foreign data wrappers.
Advisory Lock Functions (although some functions are defined with no-op implementations).

For more information about the differences between CockroachDB and PostgreSQL feature support, see PostgreSQL Compatibility.

For more information about the SQL standard features supported by CockroachDB, see SQL Feature Support.

Troubleshoot cluster problems

As a developer, you will mostly be working with the CockroachDB SQL API.

However, you may need to access the underlying cluster to troubleshoot issues where the root cause is not your SQL, but something happening at the cluster level. Symptoms of cluster-level issues can include:

Cannot join a node to an existing cluster
Networking, client connection, or authentication issues
Clock sync, replication, or node liveness issues
Capacity planning, storage, or memory issues
Node decommissioning failures

For more information about how to troubleshoot cluster-level issues, see Troubleshoot Cluster Setup.

Troubleshoot SQL client application problems

High client CPU load, connection pool exhaustion, or increased connection latency when SCRAM Password-based Authentication is enabled

Overview
Mitigation steps while keeping SCRAM enabled
Downgrade from SCRAM authentication

Overview

When SASL/SCRAM-SHA-256 Secure Password-based Authentication (SCRAM Authentication) is enabled on a cluster, some additional CPU load is incurred on client applications, which are responsible for handling SCRAM hashing. It's important to plan for this additional CPU load to avoid performance degradation, CPU starvation, and connection pool exhaustion on the client. For example, the following set of circumstances can exhaust the client application's resources:

SCRAM Authentication is enabled on the cluster (the server.user_login.password_encryption cluster setting is set to scram-sha-256).
The client driver's connection pool has no defined maximum number of connections, or is configured to close idle connections eagerly.
The client application issues transactions concurrently.

In this situation, each new connection uses more CPU on the client application server than connecting to a cluster without SCRAM Authentication enabled. Because of this additional CPU load, each concurrent transaction is slower, and a larger quantity of concurrent transactions can accumulate, in conjunction with a larger number of concurrent connections. In this situation, it can be difficult for the client application server to recover.

Some applications may also see increased connection latency. This can happen because SCRAM incurs additional round trips during authentication which can add latency to the initial connection.

For more information about how SCRAM works, see SASL/SCRAM-SHA-256 Secure Password-based Authentication.

Mitigation steps while keeping SCRAM enabled

To mitigate against this situation while keeping SCRAM authentication enabled, Cockroach Labs recommends that you:

Test and adjust your workloads in batches when migrating to SCRAM authentication.
Start by enabling SCRAM authentication in a testing environment, and test the performance of your client application against the types of workloads you expect it to handle in production before rolling the changes out to production.
Limit the maximum number of connections in the client driver's connection pool.
Limit the maximum number of concurrent transactions the client application can issue.

If the above steps don't work, you can try lowering the default hashing cost and reapplying the password as described below.

Lower default hashing cost and reapply the password

To decrease the CPU usage of SCRAM password hashing while keeping SCRAM enabled:

Set the server.user_login.password_hashes.default_cost.scram_sha_256 cluster setting to 4096:
```
SET CLUSTER SETTING server.user_login.password_hashes.default_cost.scram_sha_256 = 4096;
```
Make sure the server.user_login.rehash_scram_stored_passwords_on_cost_change.enabled cluster setting is set to true (the default).

Tip:

When lowering the default hashing cost, we recommend that you use strong, complex passwords for SQL users.

If you are still seeing higher connection latencies than before, you can downgrade from SCRAM authentication.

Downgrade from SCRAM authentication

As an alternative to the mitigation steps listed above, you can downgrade from SCRAM authentication to bcrypt as follows:

Set the server.user_login.password_encryption cluster setting to crdb-bcrypt:
```
SET CLUSTER SETTING server.user_login.password_encryption = 'crdb-bcrypt';
```
Ensure the server.user_login.downgrade_scram_stored_passwords_to_bcrypt.enabled cluster setting is set to true:
```
SET CLUSTER SETTING server.user_login.downgrade_scram_stored_passwords_to_bcrypt.enabled = true;
```

Note:

The server.user_login.upgrade_bcrypt_stored_passwords_to_scram.enabled cluster setting can be left at its default value of true.

Cockroach
University

Docs Hub

Troubleshoot Common Problems

Troubleshoot query problems

Transaction retry errors

Unsupported SQL features

Troubleshoot cluster problems

Troubleshoot SQL client application problems

High client CPU load, connection pool exhaustion, or increased connection latency when SCRAM Password-based Authentication is enabled

Overview

Mitigation steps while keeping SCRAM enabled

Lower default hashing cost and reapply the password

Downgrade from SCRAM authentication

See also

Tasks

Reference

Cockroach University

Docs Hub

Cockroach University

Docs Hub

Troubleshoot Common Problems

Troubleshoot query problems

Transaction retry errors

Unsupported SQL features

Troubleshoot cluster problems

Troubleshoot SQL client application problems

High client CPU load, connection pool exhaustion, or increased connection latency when SCRAM Password-based Authentication is enabled

Overview

Mitigation steps while keeping SCRAM enabled

Lower default hashing cost and reapply the password

Downgrade from SCRAM authentication

See also

Tasks

Reference

Cockroach
University

Cockroach
University