Cost-Based Optimizer

On this page

The cost-based optimizer seeks the lowest cost for a query, usually related to time.

How is cost calculated?

A given SQL query can have thousands of equivalent query plans with vastly different execution times. The cost-based optimizer enumerates these plans and chooses the lowest cost plan.

Cost is roughly calculated by:

Estimating how much time each node in the query plan will use to process all results
Modeling how data flows through the query plan

The most important factor in determining the quality of a plan is cardinality (i.e., the number of rows); the fewer rows each SQL operator needs to process, the faster the query will run.

Table statistics

The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and refreshed periodically for existing tables.

The optimizer can use three types of statistics to plan queries:

Full statistics
Partial statistics
Forecasted statistics

For best query performance, most users should leave automatic statistics enabled with the default settings. Advanced users can follow the steps provided in the following sections for performance tuning and troubleshooting.

Full statistics

By default, CockroachDB automatically generates full statistics when tables are created and after schema changes. Full statistics for a table are automatically refreshed when approximately 20% of its rows are updated.

A background job automatically determines which columns to get statistics on. Specifically, the optimizer chooses:

Columns that are part of the primary key or an index (in other words, all indexed columns).
Up to 100 non-indexed columns.

By default, CockroachDB also automatically collects multi-column statistics on columns that prefix an index.

To control automatic collection of full statistics, use the following settings:

sql.stats.automatic_full_collection.enabled: Cluster setting that enables automatic collection of full table statistics across all tables in the cluster.
sql_stats_automatic_full_collection_enabled: Table storage parameter that overrides the cluster setting when applied to a specific table.

Control statistics refresh rate

Full statistics are refreshed in the following cases:

When there are no statistics.
When it has been a long time since the last refresh, where "long time" is based on a moving average of the time across the last several refreshes.
After a successful IMPORT INTO or RESTORE into the table.
After any schema change affecting the table.

After each mutation operation (INSERT, UPDATE, or DELETE), the probability of a refresh is calculated using a formula that takes the cluster settings shown in the following table as inputs. These settings define the target number of rows in a table that must be stale before statistics on that table are refreshed. Increasing either setting will reduce the frequency of refreshes. In particular, min_stale_rows impacts the frequency of refreshes for small tables, while fraction_stale_rows has more of an impact on larger tables.

Setting	Default Value	Details
`sql.stats.automatic_collection.fraction_stale_rows`	0.2	Target fraction of stale rows per table that will trigger a statistics refresh.
`sql.stats.automatic_collection.min_stale_rows`	500	Target minimum number of stale rows per table that will trigger a statistics refresh.

Note:

Because the formula for statistics refreshes is probabilistic, you will not see statistics update immediately after changing these settings, or immediately after exactly 500 rows have been updated.

The "stale row" cluster settings also have the table setting counterparts sql_stats_automatic_collection_fraction_stale_rows and sql_stats_automatic_collection_min_stale_rows. For example:

CREATE TABLE accounts (
    id INT PRIMARY KEY,
    balance DECIMAL)
WITH (sql_stats_automatic_collection_enabled = true,
sql_stats_automatic_collection_min_stale_rows = 1000000,
sql_stats_automatic_collection_fraction_stale_rows= 0.05
);

ALTER TABLE accounts
SET (sql_stats_automatic_collection_fraction_stale_rows = 0.1,
sql_stats_automatic_collection_min_stale_rows = 2000);

Automatic statistics rules are checked once per minute. While altered automatic statistics table settings take immediate effect for any subsequent DML statements on a table, running row mutations that started prior to modifying the table settings may still trigger statistics collection based on the settings that existed before you ran the ALTER TABLE ... SET statement.

Small versus large table examples

Suppose the cluster settings sql.stats.automatic_collection.fraction_stale_rows and sql.stats.automatic_collection.min_stale_rows have the default values .2 and 500 as shown in the preceding table.

If a table has 100 rows and 20 became stale, a re-collection would not be triggered because, even though 20% of the rows are stale, they do not meet the 500-row minimum.

On the other hand, if a table has 1,500,000,000 rows, then 20% of that, or 300,000,000 rows, would have to become stale before automatic statistics collection was triggered. With a table this large, you would have to lower sql.stats.automatic_collection.fraction_stale_rows significantly to allow for regular statistics collections. Doing so can cause smaller tables to have statistics collected much more frequently, because it is a global setting that affects automatic statistics collection for all tables.

In such cases, we recommend that you use the sql_stats_automatic_collection_enabled storage parameter, which lets you configure automatic statistics collection on a per-table basis.

Partial statistics

Partial statistics are collected on a subset of table data without scanning the full table. Partial statistics can improve query performance in large tables where only a portion of rows are regularly updated or queried.

Whereas full statistics refresh infrequently and can allow stale rows to accumulate, partial statistics automatically refresh at a lower threshold of stale rows. Partial statistics automatically collect on extreme index values, which is particularly valuable for timestamp indexes where workloads commonly access the most recent data. They can also be collected manually.

Partial statistics have the following constraints:

Partial statistics can only be collected if full statistics already exist for the table.
Partial statistics collected automatically, or with USING EXTREMES and no ON clause, are collected on all single-column prefixes of non-inverted indexes. Indexes that are partial, hash-sharded, or implicitly partitioned (such as in REGIONAL BY ROW tables) are excluded.
For manual collection with specific columns, each specified column must be the first key column of a non-inverted index. When using the WHERE clause, the predicate must also filter on the index column. If no matching index exists or if full statistics were not previously collected on the specified column, the statement returns an error.

By default, the optimizer uses partial statistics for query planning. It merges partial statistics with existing full statistics to produce more accurate cardinality estimates.

Automatically collect partial statistics

Partial statistics are automatically collected on the highest and lowest index values when:

Automatic collection is enabled.
The number of stale rows in a table reaches a specified threshold.

This is particularly beneficial for large tables where only a portion is regularly updated or queried, such as tables with timestamp columns where recent data is frequently accessed.

To control automatic collection of partial statistics, use the following cluster settings to configure behavior across all tables in the cluster:

Cluster setting	Description
`sql.stats.automatic_partial_collection.enabled`	Enable automatic collection of partial table statistics.
`sql.stats.automatic_partial_collection.min_stale_rows`	Minimum number of stale rows that triggers partial statistics collection.
`sql.stats.automatic_partial_collection.fraction_stale_rows`	Target fraction of stale rows that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics.

Override cluster settings for specific tables using the following table storage parameters:

Table storage parameter	Description
`sql_stats_automatic_partial_collection_enabled`	Enable automatic collection of partial statistics on the table.
`sql_stats_automatic_partial_collection_min_stale_rows`	Minimum number of stale rows on the table that triggers partial statistics collection.
`sql_stats_automatic_partial_collection_fraction_stale_rows`	Target fraction of stale rows on the table that triggers partial statistics collection. If lower than the `0.2` threshold for full statistics, partial statistics refresh more frequently than full statistics.

Manually collect partial statistics

You can manually create partial statistics on:

The highest and lowest index values, when enable_create_stats_using_extremes session variable is enabled, using the USING EXTREMES clause: CREATE STATISTICS stats FROM table USING EXTREMES
Specific columns and values, using the WHERE clause: CREATE STATISTICS stats ON column FROM table WHERE condition

Toggle automatic statistics collection

Enable and disable automatic statistics collection for clusters

Automatic statistics collection is enabled by default. To disable automatic full and partial statistics collection, follow these steps:

Set the sql.stats.automatic_collection.enabled cluster setting to false:
```
> SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false;
```
Use the SHOW STATISTICS statement to view automatically generated statistics.
Delete the automatically generated statistics:
```
> DELETE FROM system.table_statistics WHERE true;
```
Restart the nodes in your cluster to clear the statistics caches.

To learn how to manually generate statistics, see the CREATE STATISTICS examples.

Enable and disable automatic statistics collection for tables

Automatic statistics collection can be expensive for large tables, and you may prefer to defer collection until after data is finished loading or during off-peak hours. Tables that are frequently updated, including small tables, may trigger statistics collection more often, which can lead to unnecessary overhead and unpredictable query plan changes.

You can enable and disable automatic full and partial statistics collection for individual tables using the sql_stats_automatic_collection_enabled storage parameter. This table setting takes precedence over the sql.stats.automatic_collection.enabled cluster setting described in Enable and disable automatic statistics collection for clusters.

You can either configure this setting during table creation:

CREATE TABLE accounts (
    id INT PRIMARY KEY,
    balance DECIMAL)
WITH (sql_stats_automatic_collection_enabled = false);

Or by using ALTER TABLE ... SET:

CREATE TABLE accounts (
    id INT PRIMARY KEY,
    balance DECIMAL);

ALTER TABLE accounts
SET (sql_stats_automatic_collection_enabled = false);

The current table settings are shown in the WITH clause output of SHOW CREATE TABLE:

  table_name |                    create_statement
-------------+---------------------------------------------------------
  accounts   | CREATE TABLE public.accounts (
             |     id INT8 NOT NULL,
             |     balance DECIMAL NULL,
             |     CONSTRAINT accounts_pkey PRIMARY KEY (id ASC)
             | ) WITH (sql_stats_automatic_collection_enabled = false)
(1 row)

ALTER TABLE accounts RESET (sql_stats_automatic_collection_enabled) removes the table setting, in which case the sql.stats.automatic_collection.enabled cluster setting is in effect for the table.

Configure non-default statistics retention

By default, when CockroachDB refreshes statistics for a column, it deletes the previous statistics for the column (while leaving the most recent 4-5 historical statistics). When CockroachDB refreshes statistics, it also deletes the statistics for any "non-default" column sets, or columns for which statistics are not collected by default.

Historical statistics on non-default column sets should not be retained indefinitely, because they will not be refreshed automatically and could cause the optimizer to choose a suboptimal plan if they become stale. Such non-default historical statistics may exist because columns were deleted or removed from an index, and are therefore no longer part of a multi-column statistic.

CockroachDB deletes statistics on non-default columns according to the sql.stats.non_default_columns.min_retention_period cluster setting, which defaults to a 24-hour retention period.

Forecasted statistics

Forecasted statistics use a simple regression model that predicts how the statistics have changed since they were last collected. CockroachDB generates forecasted statistics when the following conditions are met:

There have been at least 3 historical statistics collections.
The historical statistics closely fit a linear pattern.

You can enable and disable forecasted statistics collection for individual tables using the sql_stats_forecasts_enabled table parameter. This table setting takes precedence over the sql.stats.forecasts.enabled cluster setting.

You can either configure this setting during table creation:

CREATE TABLE accounts (
    id INT PRIMARY KEY,
    balance DECIMAL)
WITH (sql_stats_forecasts_enabled = false);

Or by using ALTER TABLE ... SET:

CREATE TABLE accounts (
    id INT PRIMARY KEY,
    balance DECIMAL);

ALTER TABLE accounts
SET (sql_stats_forecasts_enabled = false);

The current table settings are shown in the WITH clause output of SHOW CREATE TABLE:

  table_name |                 create_statement
-------------+----------------------------------------------------
  accounts   | CREATE TABLE public.accounts (
             |     id INT8 NOT NULL,
             |     balance DECIMAL NULL,
             |     CONSTRAINT accounts_pkey PRIMARY KEY (id ASC)
             | ) WITH (sql_stats_forecasts_enabled = false)
(1 row)

ALTER TABLE accounts RESET (sql_stats_forecasts_enabled) removes the table setting, in which case the sql.stats.forecasts.enabled cluster setting is in effect for the table.

Control histogram collection

By default, the optimizer collects histograms for all index columns (specifically the first column in each index) during automatic statistics collection. If a single column statistic is explicitly requested using manual invocation of CREATE STATISTICS, a histogram will be collected, regardless of whether or not the column is part of an index.

Note:

CockroachDB does not support:

Histograms on ARRAY-typed columns. As a result, statistics created on ARRAY-typed columns do not include histograms.
Multi-column histograms.

If you are an advanced user and need to disable histogram collection for troubleshooting or performance tuning reasons, change the sql.stats.histogram_collection.enabled cluster setting by running SET CLUSTER SETTING as follows:

SET CLUSTER SETTING sql.stats.histogram_collection.enabled = false;

When sql.stats.histogram_collection.enabled is set to false, histograms are never collected, either as part of automatic statistics collection or by manually invoking CREATE STATISTICS.

Control whether the `avg_size` statistic is used to cost scans

The avg_size table statistic represents the average size of a table column. If a table does not have an average size statistic available for a column, it uses the default value of 4 bytes.

The optimizer uses avg_size to cost scans and relevant joins. Costing scans per row regardless of the size of the columns comprising the row doesn't account for time to read or transport a large number of bytes over the network. This can lead to undesirable plans when there are multiple options for scans or joins that read directly from tables.

We recommend that you allow the optimizer to consider column size when costing plans. If you are an advanced user and need to disable using avg_size for troubleshooting or performance tuning reasons, set the cost_scans_with_default_col_size session variable to true with SET cost_scans_with_default_col_size=true.

Control whether the optimizer creates a plan with a full scan

Even if you have secondary indexes, the optimizer may determine that a full table scan will be faster. For example, if you add a secondary index to a table with a large number of rows and find that a statement plan is not using the secondary index, then it is likely that performing a full table scan using the primary key is faster than doing a secondary index scan plus an index join.

You can disable statement plans that perform full table scans with the disallow_full_table_scans session variable.

At the cluster level, set disallow_full_table_scans for some or all users and roles. For example:
```
ALTER ROLE ALL SET disallow_full_table_scans = true;
```
At the application level, add disallow_full_table_scans to the connection string using the options parameter.

If you disable full scans, you can set the large_full_scan_rows session variable to specify the maximum table size allowed for a full scan. If no alternative plan is possible, the optimizer will return an error.

If you disable full scans, and you provide an index hint, the optimizer will try to avoid a full scan while also respecting the index hint. If this is not possible, the optimizer will return an error. If you do not provide an index hint and it is not possible to avoid a full scan, the optimizer will return an error, the full scan will be logged, and the sql.guardrails.full_scan_rejected.count metric will be updated.

Control whether the optimizer uses an index

You can specify whether an index is visible to the cost-based optimizer. By default, indexes are visible. If not visible, the index will not be used in queries unless it is specifically selected with an index hint. This allows you to create an index and check for query plan changes without affecting production queries. For an example, see Set an index to be not visible.

You can also set an index as partially visible within a range of 0.0 to 1.0, where 0.0 means not visible and 1.0 means visible. Any value between 0.0 and 1.0 means that an index is visible to the specified fraction of queries. For the purposes of index recommendations, partially visible indexes are treated as not visible. If a partially visible index can be used to improve a query plan, the optimizer will recommend making it fully visible. For an example, refer to Set an index as partially visible.

Note:

Indexes that are not visible are still used to enforce UNIQUE and FOREIGN KEY constraints. For more considerations, see Index visibility considerations.

You can instruct the optimizer to use indexes marked as not visible with the optimizer_use_not_visible_indexes session variable. By default, the variable is set to off.

Locality optimized search in multi-region clusters

In multi-region deployments with regional by row tables, the optimizer, in concert with the SQL engine, may perform a locality optimized search to attempt to avoid high-latency, cross-region communication between nodes. If there is a possibility that the results of a query all live in local rows, the database will first search for rows in the gateway node's region. The search only continues in remote regions if rows in the local region did not satisfy the query. Examples of queries that can use locality optimized search include unique key lookups and queries with LIMIT clauses.

Even if a value cannot be read locally, CockroachDB takes advantage of the fact that some of the other regions are much closer than others and thus can be queried with lower latency. Unless queries are limited to a single region, CockroachDB performs all lookups against the remote regions in parallel and returns the result once it is retrieved, without having to wait for each lookup to come back. This can lead to increased performance in multi-region deployments, since it means that results can be returned from wherever they are first found without waiting for all of the other lookups to return.

Note:

The asynchronous parallel lookup behavior does not occur if you disable vectorized execution.

Locality optimized search is supported for scans that are guaranteed to return 100,000 keys or fewer. This optimization allows the execution engine to avoid visiting remote regions if all requested keys are found in the local region, thus reducing the latency of the query.

Known limitations

Locality optimized search works only for queries selecting a limited number of records (up to 100,000 unique keys).
Locality optimized search does not work for queries that use partitioned unique indexes on virtual computed columns. A workaround for computed columns is to make the virtual computed column a stored computed column. Locality optimized search does not work for queries that use partitioned unique expression indexes. #68129

Control whether queries are limited to a single region

Although the optimizer prefers to read from rows in local regions when possible, by default, it does not guarantee that any query will not visit a remote region. This can occur if a query has no home region (for example, if it reads from different home regions in a regional by row table) or a query's home region differs from the gateway region.

For some latency-sensitive applications, cross-region latency may not be acceptable. In these cases, set the enforce_home_region session variable to on. This configures the optimizer to return one of the following error types, and in some cases a suggested resolution, if a query cannot be run entirely in a single region:

Query has no home region. The optimizer provides a hint on how to run the query in a single region.
Query is not running in its home region. The optimizer provides a hint containing the home region of the query. The application should disconnect and then reconnect with a connection string specifying a node in the query's home region.

Only tables with ZONE survivability can be scanned without error when this setting is enabled.

Query plan cache

CockroachDB caches some of the query plans generated by the optimizer. Caching query plans leads to faster query execution: rather than generating a new plan each time a query is executed, CockroachDB reuses a query plan that was previously generated.

The plan cache is used to cache the following types of statements across sessions:

Prepared statements.
Non-prepared statements using identical constant values.

The following statements can use the plan cache: SELECT, INSERT, UPDATE, UPSERT, and DELETE.

The plan cache is enabled by default. To disable it, execute the following statement:

SET CLUSTER SETTING sql.query_cache.enabled = false;

Query plan type

Two types of plans can be cached:

Custom query plans are generated for a given query structure and optimized for specific placeholder values, and are re-optimized on subsequent executions. By default, the optimizer uses custom plans. Custom plans are included in the plan cache.
Generic query plans are generated and optimized once without considering specific placeholder values, and are not regenerated on subsequent executions, unless the plan becomes stale due to schema changes or new table statistics and must be re-optimized. This approach eliminates most of the query latency attributed to planning. For example:
```
PREPARE p AS SELECT ...;
SET plan_cache_mode = force_generic_plan;
EXECUTE p; -- The query plan is generated and optimized.
EXECUTE p; -- The query plan is reused without re-optimizing.
EXECUTE p; -- The query plan is reused without re-optimizing.
```
Generic plans are not included in the plan cache, but are cached per session. This means that they must still be re-optimized each time a session prepares a statement using a generic plan. To reuse generic query plans for maximum performance, a prepared statement should be executed multiple times instead of prepared and executed once.

Tip:

Generic query plans will only benefit workloads that use prepared statements, which are issued via explicit PREPARE statements or by client libraries using the PostgreSQL extended wire protocol. Generic query plans are most beneficial for queries with high planning times, such as queries with many joins. For more information on reducing planning time for such queries, refer to Reduce planning time for queries with many joins.

To change the type of plan that is cached, use the plan_cache_mode session setting. This setting applies when a statement is executed, not when it is prepared. Statements are therefore not associated with a specific query plan type when they are prepared.

The following modes can be set:

auto (default): Automatically determine whether to use custom or generic query plans for prepared statements. Custom plans are used for the first five statement executions. Subsequent executions use a generic plan if its estimated cost is not significantly higher than the average cost of the preceding custom plans.
force_custom_plan: Force the use of custom plans.
force_generic_plan: Force the use of generic plans.

Note:

Generic plans are always used for non-prepared statements that do not contain placeholders or stable functions, regardless of the plan_cache_mode setting.

In some cases, generic query plans are less efficient than custom plans. For this reason, Cockroach Labs recommends setting plan_cache_mode to auto (the default mode) instead of force_generic_plan. Under the auto setting, the optimizer avoids bad generic plans by falling back to custom plans. For example:

Set plan_cache_mode to auto at the session level:

SET plan_cache_mode = auto

At the database level:

ALTER DATABASE db SET plan_cache_mode = auto;

At the role level:

ALTER ROLE db_user SET plan_cache_mode = auto;

To verify the plan type used by a query, check the EXPLAIN ANALYZE output for the query.

If a generic query plan is optimized for the current execution, the plan type in the output is generic, re-optimized.
If a generic query plan is reused for the current execution without performing optimization, the plan type in the output is generic, reused.
If a custom query plan is used for the current execution, the plan type in the output is custom.

Join reordering

For a query involving multiple joins, the cost-based optimizer will explore additional join orderings in an attempt to find the lowest-cost execution plan, which can lead to significantly better performance in some cases.

Because this process leads to an exponential increase in the number of possible execution plans for such queries, it's only used to reorder subtrees containing 8 or fewer joins by default.

To change this setting, which is controlled by the reorder_joins_limit session variable, run the following statement:

SET reorder_joins_limit = 0;

To disable this feature, set the variable to 0. You can configure the default reorder_joins_limit session setting with the cluster setting sql.defaults.reorder_joins_limit, which has a default value of 8.

Warning:

To avoid performance degradation, Cockroach Labs strongly recommends setting this value to a maximum of 8. If set too high, the cost of generating and costing execution plans can end up dominating the total execution time of the query.

For more information about selecting an optimal join ordering, see our blog post An Introduction to Join Ordering.

Note:

Use ALTER ROLE ALL SET {sessionvar} = {val} instead of the sql.defaults.* cluster settings. This allows you to set a default value for all users for any session variable that applies during login, making the sql.defaults.* cluster settings redundant.

Reduce planning time for queries with many joins

The cost-based optimizer explores multiple join orderings to find the lowest-cost plan. If there are many joins or join subtrees in the query, this can increase the number of execution plans the optimizer explores, and therefore the exploration and planning time. If the planning phase of a query takes a long time (on the order of multiple seconds or minutes) to plan, or the query plan involves many joins, consider the following alternatives to reduce the planning time:

To limit the size of the subtree that can be reordered, set the reorder_joins_limit session variable to a lower value, for example:
```
SET reorder_joins_limit = 2;
```
If the join ordering inherent in the query is acceptable, for the shortest planning time, you can set reorder_joins_limit to 0. This disables exploration of join orderings entirely.

By reducing reorder_joins_limit CockroachDB reduces the number of plans explored, so a less efficient plan may be chosen by the optimizer.

If one query has a slow planning time, you can avoid interfering with other query plans by setting reorder_joins_limit to the desired lower value before executing that query and resetting the session variable to the default after executing the query.
If setting and resetting the session variable is cumbersome or if there are multiple independent joins in the query where some may benefit from join reordering, you can use a join hint. If the join has a hint specifying the type of join to something other than the default INNER (i.e., INNER LOOKUP, MERGE, HASH, etc.), join reordering will be disabled and the plan will respect the join order inherent in the way the query is written. This works at the expression level and doesn't affect the entire query (for instance, if you have a union of two joins they are independent join expressions).

Join hints

To force the use of a specific join algorithm even if the optimizer determines that a different plan would have a lower cost, you can use a join hint. You specify a join hint as <join type> <join algorithm> JOIN. For example:

INNER HASH JOIN
OUTER MERGE JOIN
LEFT LOOKUP JOIN
CROSS HASH JOIN
INNER INVERTED JOIN
LEFT INVERTED JOIN
INNER STRAIGHT JOIN
LEFT STRAIGHT JOIN
RIGHT STRAIGHT JOIN

Note:

Due to SQL's implicit AS syntax, you cannot specify a join hint with only the join algorithm keyword (e.g., MERGE). For example, a MERGE JOIN b will be interpreted as having an implicit AS and be executed as a AS MERGE JOIN b, which is equivalent to a JOIN b. Because the resulting query might execute without returning any hint-related error (because it is valid SQL), it will seem like the join hint "worked", but actually it didn't affect which join algorithm was used. The correct syntax is a INNER MERGE JOIN b.

For a join hint example, see Use the right join type.

Tip:

You can rewrite inline hints to apply join hints without modifying the original query text.

Supported join algorithms

HASH: Forces a hash join; in other words, it disables merge and lookup joins. A hash join is always possible, even if there are no equality columns: CockroachDB treats a nested loop join without an index as a special case of a hash join, where the hash table effectively has one bucket.
MERGE: Forces a merge join, even if it requires re-sorting both sides of the join.
LOOKUP: Forces a lookup join into the right side; the right side must be a table with a suitable index. LOOKUP can be used only with INNER and LEFT joins.
INVERTED: Forces an inverted join into the right side; the right side must be a table with a suitable GIN index. INVERTED can be used only with INNER and LEFT joins.

Note:

You cannot use inverted joins on partial GIN indexes.
STRAIGHT: Forces a straight join in the order specified in the query, without hinting a join algorithm. This can potentially override a more efficient query plan. A straight join that turns into another join type behaves as follows: hash or cross joins build and probe the right side; lookup joins probe an index on the right side; inverted joins probe an inverted index on the right side; and merge joins behave as standard merge joins. The join type is independent of whether a INNER, LEFT, or RIGHT straight join is specified.

If it is not possible to use the algorithm specified in the hint, an error is signaled.

Note:

To make the optimizer prefer lookup joins to merge joins when performing foreign key checks, set the prefer_lookup_joins_for_fks session variable to on.

Additional considerations

This syntax is consistent with the SQL Server syntax for join hints, except that:
- SQL Server uses LOOP instead of LOOKUP.
- CockroachDB does not support LOOP and instead supports LOOKUP for the specific case of nested loop joins with an index.
When you specify a join hint, the two tables will not be reordered by the optimizer. The reordering behavior has the following characteristics, which can be affected by hints:
- Given a JOIN b, CockroachDB will not try to commute to b JOIN a. This means that you will need to pay attention to this ordering, which is especially important for lookup joins. Without a hint, a JOIN b might be executed as b INNER LOOKUP JOIN a using an index into a, whereas a INNER LOOKUP JOIN b requires an index into b.
- (a JOIN b) JOIN c might be changed to a JOIN (b JOIN c), but this does not happen if a JOIN b uses a hint; the hint forces that particular join to happen as written in the query.
You should reconsider hint usage with each new release of CockroachDB. Due to improvements in the optimizer, hints specified to work with an older version may cause decreased performance in a newer version.

Statement hints

Note:

This feature is in preview and subject to change. To share feedback and/or issues, contact Support.

Unlike inline hints, which apply hints for a specific statement execution, statement hints allow you to apply the same hints for every execution of a given statement fingerprint. You can add and maintain statement hints with built-in functions.

Statement hints are useful when you want to apply a hint without modifying the original statement, when you cannot modify application code, when you need to optimize queries from ORMs or third-party applications, or when you want to test different hints without changing queries in production.

There are two types of statement hints:

REWRITE INLINE HINTS: These hints allow you to apply inline hints (such as index hints and join hints) to a statement fingerprint, so that you can apply the hint for every matching query without modifying the statement.
SET VARIABLE: These hints allow you to override session variable values for every execution of a statement fingerprint without modifying application code.

You can create multiple statement hints for a single statement fingerprint. Statement hints can be scoped to a particular database, such that the hint is only applied when executing on that database. Statement hints can also be disabled (but are enabled by default upon creation).

Note the known limitations for statement hints.

Rewrite inline hints

To rewrite inline hints for a statement fingerprint, invoke the information_schema.crdb_rewrite_inline_hints() built-in function. You must have the REPAIRCLUSTER privilege.

Provide two SQL statement fingerprints to the function: a fingerprint identifying which statements to optimize, and a donor fingerprint with inline hints that must be applied to those statements. This function supports all inline hint types, and automatically applies them to statements that match the given fingerprint:

SELECT information_schema.crdb_rewrite_inline_hints(
  '{fingerprint}',
  '{donor fingerprint}'
);

Optionally, you can scope the hint to a specific database by providing a third argument:

SELECT information_schema.crdb_rewrite_inline_hints(
  '{fingerprint}',
  '{donor fingerprint}',
  '{database_name}'
);

When a database is specified, the hint only applies to executions of the matching statement on that database. If the database parameter is omitted, the hint applies to all databases.

Both fingerprints in the function invocation must have the same syntactic structure and use underscores (_) as placeholders for constants (for example, SELECT * FROM users WHERE city = _). If you use a regular SQL statement in either argument, it is automatically replaced by the matching fingerprint.

Tip:

To find the fingerprint for a statement, use the Statements page of the DB Console or EXPLAIN (FINGERPRINT) {statement}.

The information_schema.crdb_rewrite_inline_hints() function does the following:

Stores the statement hint in the system.statement_hints table.
Matches SQL statements with the fingerprint in the first argument.
Removes any existing inline hints from the matching SQL statements.
Rewrites the matching SQL statements with the inline hints from the donor fingerprint.

For example, the following call applies the users_city_idx index hint (provided an index was created on the city column) to SQL statements matching SELECT * FROM users WHERE city = _:

SELECT information_schema.crdb_rewrite_inline_hints(
  'SELECT * FROM users WHERE city = _',
  'SELECT * FROM users@users_city_idx WHERE city = _'
);

A unique ID is returned:

  information_schema.crdb_rewrite_inline_hints
------------------------------------------------
                           1143727380739620865

Afterward, any executed statement that matches the SELECT * FROM users WHERE city = _ fingerprint will first be rewritten with the users_city_idx index hint, regardless of the city value.

The following call applies a hint for the MERGE join algorithm:

SELECT information_schema.crdb_rewrite_inline_hints(
  'SELECT * FROM users AS u INNER JOIN rides AS r ON u.id = r.rider_id',
  'SELECT * FROM users AS u INNER MERGE JOIN rides AS r ON u.id = r.rider_id'
);

Fingerprints distinguish between queries with different inline hints. For example, the fingerprint SELECT * FROM users WHERE city = _ does not match the statement SELECT * FROM users@users_pkey WHERE city = 'new york' (which matches the fingerprint SELECT * FROM users@users_pkey WHERE city = _). As a result, you can use information_schema.crdb_rewrite_inline_hints() to remove inline hints:

SELECT information_schema.crdb_rewrite_inline_hints(
  'SELECT * FROM users@users_pkey WHERE city = _',
  'SELECT * FROM users WHERE city = _'
);

Tip:

If multiple REWRITE INLINE HINTS are enabled for the same statement fingerprint, only the most recently created hint is applied. To replace an existing hint, you can create a new one for the same fingerprint, which will take precedence.

Override session variables

You can override session variable values for every execution of a statement fingerprint without modifying application code. The override applies only to statements matching a specific fingerprint and does not apply to other session activity.

While there are no restrictions on what session variables can be hinted, some session variable hints will not be meaningful at the statement level. For example, changing the default_transaction_isolation session variable at the statement level will have no effect, since the transaction has already started and the variable will be reverted after execution finishes.

To override a session variable for a statement fingerprint, invoke the information_schema.crdb_set_session_variable_hint() built-in function. You must have the REPAIRCLUSTER privilege.

Provide the required three arguments:

SELECT information_schema.crdb_set_session_variable_hint(
  '{fingerprint}',
  '{variable_name}',
  '{variable_value}'
);

Optionally, you can scope the hint to a specific database by providing a fourth argument:

SELECT information_schema.crdb_set_session_variable_hint(
  '{fingerprint}',
  '{variable_name}',
  '{variable_value}',
  '{database_name}'
);

When a database is specified, the hint only applies to executions of the matching statement on that database. If the database parameter is omitted, the hint applies to all databases.

Transaction control statements (BEGIN, COMMIT, ROLLBACK, SAVEPOINT, etc.) cannot be targets for session variable hints. Attempting to create a hint for such statements will return an error.

Tip:

To find the fingerprint for a statement, use the Statements page of the DB Console or EXPLAIN (FINGERPRINT) {statement}.

The information_schema.crdb_set_session_variable_hint() function does the following:

Stores the hint in the system.statement_hints table.
Matches SQL statements with the specified fingerprint.
Applies the session variable override for the duration of the matching statement's execution.

For example, to limit join reordering for a specific query without modifying the application:

SELECT information_schema.crdb_set_session_variable_hint(
  'SELECT * FROM users AS u INNER JOIN rides AS r ON u.id = r.rider_id WHERE r.city = _',
  'reorder_joins_limit',
  '4'
);

A unique ID is returned:

  information_schema.crdb_set_session_variable_hint
-----------------------------------------------------
                           1143850492847104001

Afterward, any statement matching the fingerprint will execute with reorder_joins_limit set to 4, regardless of the session's or role's default value.

You can override multiple session variables for the same statement by calling the function multiple times with the same fingerprint:

SELECT information_schema.crdb_set_session_variable_hint(
  'SELECT * FROM large_table WHERE status = _',
  'disallow_full_table_scans',
  'false'
);

SELECT information_schema.crdb_set_session_variable_hint(
  'SELECT * FROM large_table WHERE status = _',
  'vectorize',
  'on'
);

Tip:

Multiple SET VARIABLE hints can be applied to the same statement fingerprint, as long as they override different session variables. If multiple SET VARIABLE hints are enabled for the same statement fingerprint and session variable, only the most recently created hint is applied. To replace an existing hint, you can create a new one for the same fingerprint and session variable, which will take precedence.

View statement hints

Use the SHOW STATEMENT HINTS statement to view statement hints for a specific fingerprint. You must have the VIEWCLUSTERMETADATA privilege.

SHOW STATEMENT HINTS FOR $$ SELECT * FROM users WHERE city = _ $$ WITH DETAILS;

        row_id        |            fingerprint             |      hint_type       | database | enabled |          created_at           |                              details
----------------------+------------------------------------+----------------------+----------+---------+-------------------------------+--------------------------------------------------------------------
  1143727380739620865 | SELECT * FROM users WHERE city = _ | REWRITE INLINE HINTS | NULL     | t       | 2026-01-22 18:58:16.952689+00 | {"donorSql": "SELECT * FROM users@users_city_idx WHERE city = _"}

You can also query the system.statement_hints table directly to view all fingerprints with statement hints:

SELECT row_id, fingerprint, hint_type, database, enabled, created_at FROM system.statement_hints;

        row_id        |                             fingerprint                             |      hint_type       | database | enabled |          created_at
----------------------+---------------------------------------------------------------------+----------------------+----------+---------+--------------------------------
  1143727380739620865 | SELECT * FROM users WHERE city = _                                  | REWRITE INLINE HINTS | NULL     | t       | 2026-01-22 18:58:16.952689+00
  1143727427097886721 | SELECT * FROM users AS u INNER JOIN rides AS r ON u.id = r.rider_id | REWRITE INLINE HINTS | NULL     | t       | 2026-01-22 18:58:31.100944+00
  1143727443278921729 | SELECT * FROM users@users_pkey WHERE city = _                       | REWRITE INLINE HINTS | NULL     | t       | 2026-01-22 18:58:36.039125+00
(3 rows)

To verify that these hints are being applied to your queries, use EXPLAIN or EXPLAIN ANALYZE. When hints from system.statement_hints are applied, the output includes a statement hints count field showing the number of hints applied. For example:

EXPLAIN SELECT * FROM users WHERE city = 'new york';

                                       info
----------------------------------------------------------------------------------
  distribution: local
  statement hints count: 1

  • index join
  │ estimated row count: 6
  │ table: users@users_pkey
  │
  └── • scan
        estimated row count: 6 (12% of the table; stats collected 4 minutes ago)
        table: users@users_city_idx
        spans: [/'new york' - /'new york']

Note:

The statement hints count field shows the number of system.statement_hints rows that were applied. A single row in system.statement_hints corresponds to a single REWRITE INLINE HINTS or SET VARIABLE hint.

Enable and disable statement hints

Statement hints are enabled by default when created. You can temporarily disable hints without deleting them using the information_schema.crdb_enable_statement_hints() built-in function. Disabled hints remain in the system.statement_hints table but are not applied during query execution. To enable or disable statement hints, you must have the REPAIRCLUSTER privilege.

The status of a hint is stored in the boolean enabled field of system.statement_hints.

Enable or disable a hint by its `row_id`

To enable or disable a specific hint, pass its row_id into information_schema.crdb_enable_statement_hints():

SELECT information_schema.crdb_enable_statement_hints({enabled}, {row_id});

For example:

SELECT information_schema.crdb_enable_statement_hints(false, 1143727380739620865);

Enable or disable all hints for a statement fingerprint

To enable or disable all hints associated with a specific statement fingerprint, pass the fingerprint into information_schema.crdb_enable_statement_hints().

SELECT information_schema.crdb_enable_statement_hints({enabled}, '{statement_fingerprint}');

For example:

SELECT information_schema.crdb_enable_statement_hints(true, 'SELECT * FROM users WHERE city = _');

When using a statement fingerprint to enable or disable statement hints, you can optionally scope the operation to hints on a specific database:

SELECT information_schema.crdb_enable_statement_hints({enabled}, '{statement_fingerprint}', '{database}');

For example:

SELECT information_schema.crdb_enable_statement_hints(false, 'SELECT * FROM users WHERE city = _', 'mydb');

Note:

When you omit the database parameter, the function affects all hints for the given fingerprint, regardless of their database value (including hints with database = NULL). When you specify a database, the function only affects hints with that exact database value and will not affect hints with database = NULL.

Delete statement hints

To remove hints, you can use the information_schema.crdb_delete_statement_hints() built-in function. This removes the hints from the system.statement_hints table. To delete statement hints, you must have the REPAIRCLUSTER privilege.

Delete a hint by its `row_id`

To delete a specific hint, pass its row_id into information_schema.crdb_delete_statement_hints():

SELECT information_schema.crdb_delete_statement_hints({row_id});

For example:

SELECT information_schema.crdb_delete_statement_hints(1143727380739620865);

Delete all hints for a statement fingerprint

To delete all hints associated with a specific statement fingerprint, pass the fingerprint into information_schema.crdb_delete_statement_hints().

SELECT information_schema.crdb_delete_statement_hints('{statement_fingerprint}');

For example:

SELECT information_schema.crdb_delete_statement_hints('SELECT * FROM users WHERE city = _');

When using a statement fingerprint to delete statement hints, you can optionally scope the operation to hints on a specific database:

SELECT information_schema.crdb_delete_statement_hints('{statement_fingerprint}', '{database}');

For example:

SELECT information_schema.crdb_delete_statement_hints('SELECT * FROM users WHERE city = _', 'mydb');

Note:

Zigzag joins

The optimizer may plan a zigzag join when there are at least two secondary indexes on the same table and the table is filtered in a query with at least two filters constraining different attributes to a constant. A zigzag join works by "zigzagging" back and forth between two indexes and returning only rows with matching primary keys within a specified range. For example:

CREATE TABLE abc (
  a INT,
  b INT,
  INDEX (a),
  INDEX (b)
);

EXPLAIN SELECT * FROM abc WHERE a = 10 AND b = 20;

               info
----------------------------------
  distribution: local
  vectorized: true

  • zigzag join
    pred: (a = 10) AND (b = 20)
    left table: abc@abc_a_idx
    left columns: (a)
    left fixed values: 1 column
    right table: abc@abc_b_idx
    right columns: (b)
    right fixed values: 1 column
(11 rows)

Prevent or force a zigzag join

The optimizer supports index hints to prevent or force a zigzag join. Apply the hints in the same way as other existing index hints.

To prevent the optimizer from planning a zigzag join for the specified table, use the hint NO_ZIGZAG_JOIN. For example:

SELECT * FROM abc@{NO_ZIGZAG_JOIN};

To force the optimizer to plan a zigzag join for the specified table, use the hint FORCE_ZIGZAG. For example:

SELECT * FROM abc@{FORCE_ZIGZAG};

Warning:

If you have an index named FORCE_ZIGZAG and use the hint table@{FORCE_ZIGZAG} it will no longer have the same behavior.

Inverted join examples

To run these examples, initialize a demo cluster with the MovR workload.

Start the MovR database on a 3-node CockroachDB demo cluster with a larger data set.

cockroach demo movr --num-histories 250000 --num-promo-codes 250000 --num-rides 125000 --num-users 12500 --num-vehicles 3750 --nodes 3

Create a GIN index on the vehicles table's ext column.

CREATE INVERTED INDEX idx_vehicle_details ON vehicles(ext);

Check the statement plan for a SELECT statement that uses an inner inverted join.

EXPLAIN SELECT * FROM vehicles@vehicles_pkey AS v2 INNER INVERTED JOIN vehicles@idx_vehicle_details AS v1 ON v1.ext @> v2.ext;

                    info
---------------------------------------------
  distribution: full
  vectorized: true

  • lookup join
  │ table: vehicles@vehicles_pkey
  │ equality: (city, id) = (city,id)
  │ equality cols are key
  │ pred: ext @> ext
  │
  └── • inverted join
      │ table: vehicles@idx_vehicle_details
      │
      └── • scan
            estimated row count: 3,750 (100% of the table; stats collected 1 hour ago)
            table: vehicles@vehicles_pkey
            spans: FULL SCAN
(16 rows)

You can omit the INNER INVERTED JOIN statement by putting v1.ext on the left side of a @> join condition in a WHERE clause and using an index hint for the GIN index.

EXPLAIN SELECT * FROM vehicles@idx_vehicle_details AS v1, vehicles AS v2 WHERE v1.ext @> v2.ext;

                                            info
--------------------------------------------------------------------------------------------
  distribution: full
  vectorized: true

  • lookup join
  │ table: vehicles@vehicles_pkey
  │ equality: (city, id) = (city,id)
  │ equality cols are key
  │ pred: ext @> ext
  │
  └── • inverted join
      │ table: vehicles@idx_vehicle_details
      │
      └── • scan
            estimated row count: 3,750 (100% of the table; stats collected 1 hour ago)
            table: vehicles@vehicles_pkey
            spans: FULL SCAN
(16 rows)

Use the LEFT INVERTED JOIN hint to perform a left inverted join.

EXPLAIN SELECT * FROM vehicles AS v2 LEFT INVERTED JOIN vehicles AS v1 ON v1.ext @> v2.ext;

                                           info
------------------------------------------------------------------------------------------
  distribution: full
  vectorized: true

  • lookup join (left outer)
  │ table: vehicles@vehicles_pkey
  │ equality: (city, id) = (city,id)
  │ equality cols are key
  │ pred: ext @> ext
  │
  └── • inverted join (left outer)
      │ table: vehicles@idx_vehicle_details
      │
      └── • scan
            estimated row count: 3,750 (100% of the table; stats collected 1 hour ago)
            table: vehicles@vehicles_pkey
            spans: FULL SCAN
(16 rows)

Known limitations

The automatic statistics refresher automatically checks whether it needs to refresh statistics for every table in the database upon startup of each node in the cluster. If statistics for a table have not been refreshed in a while, this will trigger collection of statistics for that table. If statistics have been refreshed recently, it will not force a refresh. As a result, the automatic statistics refresher does not necessarily perform a refresh of statistics after an upgrade. This could cause a problem, for example, if the upgrade moves from a version without histograms to a version with histograms. To refresh statistics manually, use CREATE STATISTICS. #54816
The following cluster settings do not immediately take effect, and instead only take effect when new statistics are collected for a table.
Although SHOW STATISTICS WITH FORECAST shows the settings taking effect immediately, they do not actually take effect until new statistics are collected (as can be verified with EXPLAIN).

As a workaround, disable and enable forecasting at the cluster or table level. This will invalidate the statistics cache and cause these settings to take effect immediately. #123852
Statement hints do not apply to statements within views. The workaround for REWRITE INLINE HINTS is to modify the inline hints directly in the body by replacing the view. There is no workaround for SET VARIABLE hints. #166782
Statement hints do not apply to statements within routines. The workaround for REWRITE INLINE HINTS is to modify the inline hints directly in the body by replacing the routine. There is no workaround for SET VARIABLE hints. #162627

Cost-Based Optimizer

How is cost calculated?

Table statistics

Full statistics

Control statistics refresh rate

Small versus large table examples

Partial statistics

Automatically collect partial statistics

Manually collect partial statistics

Toggle automatic statistics collection

Enable and disable automatic statistics collection for clusters

Enable and disable automatic statistics collection for tables

Configure non-default statistics retention

Forecasted statistics

Control histogram collection

Control whether the avg_size statistic is used to cost scans

Control whether the optimizer creates a plan with a full scan

Control whether the optimizer uses an index

Locality optimized search in multi-region clusters

Known limitations

Control whether queries are limited to a single region

Query plan cache

Query plan type

Join reordering

Reduce planning time for queries with many joins

Join hints

Supported join algorithms

Additional considerations

Statement hints

Rewrite inline hints

Override session variables

View statement hints

Enable and disable statement hints

Enable or disable a hint by its row_id

Enable or disable all hints for a statement fingerprint

Delete statement hints

Delete a hint by its row_id

Delete all hints for a statement fingerprint

Zigzag joins

Prevent or force a zigzag join

Inverted join examples

Known limitations

See also

Tell us about your experience

Select the problem area

Thank you for your feedback!

Explore More Documentation:

Control whether the `avg_size` statistic is used to cost scans

Enable or disable a hint by its `row_id`

Delete a hint by its `row_id`