New in v19.2: CockroachDB supports column-oriented ("vectorized") query execution.
Many SQL databases execute query plans one row of table data at a time. Row-oriented execution models can offer good performance for online transaction processing (OLTP) queries, but suboptimal performance for online analytical processing (OLAP) queries. The CockroachDB vectorized execution engine dramatically improves performance over row-oriented execution by processing each component of a query plan on type-specific batches of column data.
Configuring vectorized execution
By default, vectorized execution is enabled in CockroachDB for all operations that are guaranteed to execute in memory, on tables with supported data types.
You can turn vectorized execution on or off for all operations in the current session with the
vectorize session variable. The following options are supported:
||Instructs CockroachDB to use the vectorized execution engine on operations that always execute in memory, without the need to spill intermediate results to disk.
||Turns on vectorized execution for all operations. We do not recommend using this option in production environments, as it can lead to memory issues.
See Known Limitations for more information.
||Turns off vectorized execution for all operations.|
For information about setting session variables, see
SET <session variable>.
To see if CockroachDB will use the vectorized execution engine for a query, run a simple
EXPLAIN statement on the query. If
true, the query will be executed with the vectorized engine. If it is
false, the row-oriented execution engine is used instead.
Setting the row threshold for vectorized execution
The efficiency of vectorized execution increases with the number of rows processed. If you are querying a table with a small number of rows, it is more efficient to use row-oriented execution.
By default, vectorized execution is enabled for queries on tables of 1000 rows or more. If the number of rows in a table falls below 1000, CockroachDB uses the row-oriented execution engine instead.
For performance tuning, you can change the minimum number of rows required to use the vectorized engine to execute a query plan in the current session with the
vectorize_row_count_threshold session variable. This variable is ignored if
How vectorized execution works
When you issue a query, the gateway node (i.e., the node from which you issue the query) parses the query and creates a physical plan for execution on each node that receives the plan. If vectorized execution is enabled, the physical plan is sent to each node to be executed by the vectorized execution engine.
To see a detailed view of the vectorized execution plan for a query, run the
EXPLAIN(VEC) statement on the query.
For information about vectorized execution in the context of the CockroachDB architecture, see Query Execution.
Vectorized execution is not as extensively tested as CockroachDB's existing row-oriented execution engine. In addition, some data types are not supported, and support for some operations is experimental.
Supported data types
Vectorized execution is supported for the following data types and their aliases:
CockroachDB uses the vectorized engine to execute queries on columns with supported data types, even if a column's parent table includes unused columns with unsupported data types.
Queries with constant
The vectorized execution engine does not support queries that contain constant
NULL arguments, with the exception of the
IS projection operators
IS NULL and
IS NOT NULL.
SELECT x IS NOT NULL FROM t is supported, but
SELECT x + NULL FROM t returns an
unable to vectorize execution plan error.
For more information, see the tracking issue.
Support for vectorized execution is experimental for the following memory-intensive operations:
- Global sorts.
- Window functions.
- Unordered aggregations.
- Hash joins.
- Merge joins on non-unique columns. Merge joins on columns that are guaranteed to have one row per value, also known as "key columns", can execute entirely in-memory.
These operations require memory buffering during execution, and are therefore prone to spilling intermediate execution results to disk. We do not recommend using vectorized execution in production environments for operations that could spill to disk, as these operations can buffer an unlimited number of rows before they can start producing output, causing memory issues.
You can configure a node's budget for in-memory query processing at node startup with the
--max-sql-memory flag. If a SQL query exceeds the memory budget, the node spills intermediate execution results to disk. The
--max-disk-temp-storage flag sets the maximum on-disk storage capacity.