The SQL layer of CockroachDB's architecture exposes a SQL API to developers and converts high-level SQL statements into low-level read and write requests to the underlying key-value store, which are passed to the transaction Layer.
It consists of the following sublayers:
- SQL API, which forms the user interface.
- Parser, which converts SQL text into an abstract syntax tree (AST).
- Cost-based optimizer, which converts the AST into an optimized logical query plan.
- Physical planner, which converts the logical query plan into a physical query plan for execution by one or more nodes in the cluster.
- SQL execution engine, which executes the physical plan by making read and write requests to the underlying key-value store.
If you haven't already, we recommend reading the Architecture Overview.
Because each node in a CockroachDB cluster behaves symmetrically, developers can send requests to any node (which means CockroachDB works well with load balancers). Whichever node receives the request acts as the "gateway node," which processes the request and responds to the client.
Requests to the cluster arrive as SQL statements, but data is ultimately written to and read from the storage layer as key-value (KV) pairs. To handle this, the SQL layer converts SQL statements into a plan of KV operations, which it then passes along to the transaction layer.
Interactions with other layers
In relationship to other layers in CockroachDB, the SQL layer:
- Receives requests from the outside world via its SQL API.
- Converts SQL statements into low-level KV operations, which it sends as requests to the transaction layer.
Developers experience data stored in CockroachDB as a relational structure comprised of rows and columns. Sets of rows and columns are further organized into tables. Collections of tables are then organized into databases. A CockroachDB cluster can contain many databases.
CockroachDB provides typical relational features like constraints (e.g., foreign keys). These features mean that application developers can trust that the database will ensure consistent structuring of the application's data; data validation doesn't need to be built into the application logic separately.
CockroachDB implements most of the ANSI SQL standard to manifest its relational structure. For a complete list of the SQL features CockroachDB supports, see SQL Feature Support.
PostgreSQL wire protocol
SQL queries reach your cluster through the PostgreSQL wire protocol. This makes connecting your application to the cluster simple by supporting many PostgreSQL-compatible drivers and ORMs.
SQL parser, planner, executor
When a node in a CockroachDB cluster receives a SQL request from a client, it parses the statement and creates an optimized logical query plan that is further translated into a physical query plan. Finally, it executes the physical plan.
SQL queries are parsed against our
yacc file (which describes our supported syntax), and the SQL version of each query is converted into an abstract syntax tree (AST).
During the logical planning phase, the AST is transformed into a query plan in the following steps:
The AST is transformed into a high-level logical query plan. During this transformation, CockroachDB also performs semantic analysis, which includes operations like:
- Checking whether the query is a valid statement in the SQL language.
- Resolving names, such as the names of tables or variables to their values.
- Eliminating unneeded intermediate computations, e.g., by replacing
0.6 + 0.4with
1.0. This is also known as constant folding.
- Finalizing which data types to use for intermediate results, e.g., when a query contains one or more subqueries.
The logical plan is simplified using a series of transformations that are always valid. For example,
a BETWEEN b AND cmay be converted to
a >= b AND a <= c.
The logical plan is optimized using a search algorithm that evaluates many possible ways to execute a query and selects an execution plan with the least costs.
The physical planning phase decides which nodes will participate in the execution of the query, based on range locality information. This is where CockroachDB decides to distribute a query to perform some computations close to where the data is stored.
More concretely, the physical planning phase transforms the optimized logical plan generated during logical planning into a directed acyclic graph (DAG) of physical SQL operators. These operators can be viewed by running the
Because the distribution layer presents the abstraction of a single key space, the SQL layer can perform read and write operations for any range on any node. This allows the SQL operators to behave identically whether planned in gateway or distributed mode.
The decision about whether to distribute a query across multiple nodes is made by a heuristic that estimates the quantity of data that would need to be sent over the network. Queries that only need a small number of rows are executed on the gateway node. Other queries are distributed across multiple nodes.
For example, when a query is distributed, the physical planning phase splits the scan operations from the logical plan into multiple physical TableReader operators, one for each node containing a range read by the scan. The remaining logical operations (which may perform filters, joins, and aggregations) are then scheduled on the same nodes as the TableReaders. This results in computations being performed as close to the physical data as possible.
Components of the physical plan are sent to one or more nodes for execution. On each node, CockroachDB spawns a logical processor to compute a part of the query. Logical processors inside or across nodes communicate with each other over a logical flow of data. The combined results of the query are sent back to the first node where the query was received, to be sent further to the SQL client.
Each processor uses an encoded form for the scalar values manipulated by the query. This is a binary form which is different from that used in SQL. So the values listed in the SQL query must be encoded, and the data communicated between logical processors, and read from disk, must be decoded before it is sent back to the SQL client.
Vectorized query execution
If vectorized execution is enabled, the physical plan is sent to nodes to be processed by the vectorized execution engine.
Upon receiving the physical plan, the vectorized engine reads batches of table data from disk and converts the data from row format to columnar format. These batches of column data are stored in memory so the engine can access them quickly during execution.
The vectorized engine uses specialized, precompiled functions that quickly iterate over the type-specific arrays of column data. The columnar output from the functions is stored in memory as the engine processes each column of data.
After processing all columns of data in the input buffer, the engine converts the columnar output back to row format, and then returns the processed rows to the SQL interface. After a batch of table data has been fully processed, the engine reads the following batch of table data for processing, until the query has been executed.
Though SQL queries are written in parsable strings, lower layers of CockroachDB deal primarily in bytes. This means at the SQL layer, in query execution, CockroachDB must convert row data from their SQL representation as strings into bytes, and convert bytes returned from lower layers into SQL data that can be passed back to the client.
It's also important––for indexed columns––that this byte encoding preserve the same sort order as the data type it represents. This is because of the way CockroachDB ultimately stores data in a sorted key-value map; storing bytes in the same order as the data it represents lets us efficiently scan KV data.
However, for non-indexed columns (e.g., non-
PRIMARY KEY columns), CockroachDB instead uses an encoding (known as "value encoding") which consumes less space but does not preserve ordering.
You can find more exhaustive detail in the Encoding Tech Note.
Because CockroachDB is a distributed database, we've developed a Distributed SQL (DistSQL) optimization tool for some queries, which can dramatically speed up queries that involve many ranges. Though DistSQL's architecture is worthy of its own documentation, this cursory explanation can provide some insight into how it works.
In non-distributed queries, the coordinating node receives all of the rows that match its query, and then performs any computations on the entire data set.
However, for DistSQL-compatible queries, each node does computations on the rows it contains, and then sends the results (instead of the entire rows) to the coordinating node. The coordinating node then aggregates the results from each node, and finally returns a single response to the client.
This dramatically reduces the amount of data brought to the coordinating node, and leverages the well-proven concept of parallel computing, ultimately reducing the time it takes for complex queries to complete. In addition, this processes data on the node that already stores it, which lets CockroachDB handle row-sets that are larger than an individual node's storage.
To run SQL statements in a distributed fashion, we introduce a couple of concepts:
- Logical plan: Similar to the AST/
planNodetree described above, it represents the abstract (non-distributed) data flow through computation stages.
- Physical plan: A physical plan is conceptually a mapping of the logical plan nodes to physical machines running
cockroach. Logical plan nodes are replicated and specialized depending on the cluster topology. Like
planNodesabove, these components of the physical plan are scheduled and run on the cluster.
You can find much greater detail in the DistSQL RFC.
CockroachDB performs schema changes, such as the addition of columns or secondary indexes, using a protocol that allows tables to remain online (i.e., able to serve reads and writes) during the schema change. This protocol allows different nodes in the cluster to asynchronously transition to a new table schema at different times.
The schema change protocol decomposes each schema change into a sequence of incremental changes that will achieve the desired effect.
For example, the addition of a secondary index requires two intermediate schema versions between the start and end versions to ensure that the index is being updated on writes across the entire cluster before it becomes available for reads. To ensure that the database will remain in a consistent state throughout the schema change, we enforce the invariant that there are at most two successive versions of this schema used in the cluster at all times.
This approach is based on the paper Online, Asynchronous Schema Change in F1.
For more information, including examples and limitations, see Online Schema Changes.
Technical interactions with other layers
SQL and transaction layer
KV operations from executed
planNodes are sent to the transaction layer.
Learn how CockroachDB handles concurrent requests in the transaction layer.