In 2018, CockroachDB implemented a cost-based query optimizer from scratch, which has been steadily improved in each release. The query optimizer is the part of the system that understands the semantics of SQL queries and decides how to execute them; execution plans can vary wildly in terms of execution time, so choosing a good plan is important. In this post we go over some of the optimizer-related improvements in CockroachDB v20.1.
A large chunk of the changes we’ve made are around adding transformation rules to improve various types of queries and would be too extensive to discuss in detail. Throughout this release, we’ve added 15 new transformation rules (for a total of 224). There are however a few significant areas of effort that are worth going over: optimizer-driven foreign key checks; propagation of limit “hints”; WITH RECURSIVE clauses; and a new way to extract information about query planning and execution.
Optimizer-driven foreign key checks
CockroachDB has supported foreign keys since the first version, with foreign key actions added in v2.0. Much has changed in the SQL layer since then - most notably, we now have an optimizer with important capabilities, like locality awareness. However, the foreign key infrastructure remained largely unchanged and still used the original design from a pre-optimizer world. We have been working on reimplementing foreign keys so that they leverage the optimizer and its ever-increasing intelligence, and in v20.1 foreign key checks are planned by the optimizer by default. In this section we go over some of the details of how they worked before and how they work now.
A foreign key describes a relationship between two tables; in the context of a particular relationship, we call one the "child" table and the other the "parent" table. A foreign key restricts the possible values on a certain column in the child table to the set of values that exist in the parent table (again, on a certain column). A simple example is to consider a parent table of customers and a child table of orders. The orders table has a "customer" column; by declaring a foreign key constraint we are saying that orders can only refer to valid customers.
It is the responsibility of the database to enforce the constraint. This means that if we insert an order, we must check that the customer is valid (exists in the customer table). Conversely, when we delete a customer, we must make sure that there are no remaining orders for that customer which would become "orphaned".
This brings up the problem of how exactly do we perform these FK (foreign key) checks? Our original implementation solved this by requiring both the child and the parent table to have indexes on the foreign key column. Then the relationship is effectively a mapping between these two indexes, and the indexes would always be used to perform the FK checks.
The problem with this approach is that it is restrictive in terms of schema. The extra indexes entail overhead (in terms of storage, and extra work during insertions) and in some cases they don't help speed up any important aspect of the workload. In a multi-region deployment, the fact that a specific index is always used prevents effective use of the duplicate indexes pattern where each region has its own locally accessible index.
The new implementation removes the shackles and divorces the foreign key relationship from indexes. Now the optimizer builds the FK checks as generic "subqueries" and plans them together with relevant mutations. This allows the optimizer to use its usual logic in determining the best way to execute these checks (including locality-sensitive index selection).
Unfortunately, there is a catch. Foreign key actions can be used for more complex behaviors, like automatically cascading a deletion in the parent table into a deletion in the child table. We haven't yet reimplemented cascades, so we must fall back to the existing code implementing these actions (which still relies on the dual-index scheme). Because the existing code needs to continue to function, we weren't able to remove the dual-index requirement for v20.1. We are however working on this as a high priority item for v20.2. Despite this caveat, this work does present tangible benefits in v20.1 - most importantly locality-sensitive index selection for the foreign key checks.
The optimizer had various rules for the
Limit operator since its inception. In particular, we try to “push down” limits as much as possible (“push down” is a fancy way of saying that we try to apply limits as early as we can during execution). But the situations which allow pushing down the
Limit operator are pretty restrictive. This is because the operator is a “hard” limit, meaning that we can only use it when we absolutely know for sure that a certain number of rows is sufficient to get a correct query result.
There are many situations where we can guess (estimate) a limit, without hard guarantees. For example, consider a simple query like
SELECT * FROM t WHERE status=’done’ LIMIT 10. The execution plan for this query (assuming we don’t have an index on
status) is to perform a full table scan and retain only the first 10 rows that pass the filter. In the worst case we may need to read the entire table; but in practice, depending on how many rows pass the condition, we will probably need to scan many fewer rows. During execution of this plan, once 10 rows are produced, execution will stop early so we won’t actually scan the entire table. However, the optimizer didn't take this into consideration when estimating the cost of the plan, which means it might have chosen another plan that it mistakenly thinks is lower cost. In addition to helping choose the right plan, an estimate of how many rows we will actually need to scan can help make things more efficient: we can configure the internal “batch” size of the scan (we retrieve 10,000 keys by default!).
In v20.1 we added infrastructure for “soft” limits. A soft limit is a property inside the optimizer that is treated as a hint, defined roughly as “no promises, but execution will likely complete early, after this many rows are produced by this operator”. In the example above, we use table statistics to estimate the selectivity of the
status='done' condition and calculate the soft limit for the scan (which sizes its initial batch of keys according to the soft limit). If the statistics show that (for example) about half of the rows have
status='done', the soft limit would be 20.
An important aspect of this work is that the optimizer is now able to estimate the cost of expressions more accurately by taking into account early execution completion. The corrected costing can make a dramatic difference in more complicated cases, in particular when lookup joins are involved (an example is nicely described here).
Acknowledgement: Thanks to Celine O’Neil who did the soft limit work during her internship.
A new feature we implemented in this version is support for the
WITH RECURSIVE syntax which defines a recursive common-table expression. This improves our coverage of SQL in general and compatibility with PostgreSQL in particular.
It’s interesting to point out that the addition of recursive CTEs makes SQL a Turing-complete language: you can in principle use SQL to perform any computation (like a full-fledged programming language); see this presentation. In practice, this means that you can do cool stuff, like fractals.
Sierpinski Triangle generated in SQL using recursive CTEs. Source: Michael Malis
Statement diagnostics bundle
The 20.1 release also added a statement diagnostics bundle to the optimizer. CockroachDB provides a few ways to get insight about the plan for a query and its execution:
- the AdminUI shows logical query plans on the statements page;
- flavors of the EXPLAIN statement show query plans with varying levels of detail, as well as execution statistics;
- tracing the execution of queries provides a lot of low-level detail (but is not easily consumable by most users).
Unfortunately, there wasn’t a single action a user could take to extract all the relevant information when reporting an issue to us. In many cases this leads to a time-consuming back and forth between the user and our engineers when investigating issues.
The problem of unifying these pieces of information into something easily consumable by users is complex and we are working on attacking it. In the short term, we wanted to at least simplify the collection of information. The solution was, of course, to introduce yet another variant of EXPLAIN: in v20.1 EXPLAIN ANALYZE (DEBUG) is similar to
EXPLAIN ANALYZE but creates a statement diagnostics bundle: a .zip file that contains all the information from many “flavors” of
EXPLAIN, as well as complete table statistics and query trace data. The bundle can be downloaded from the AdminUI.
We also introduce a new mechanism for triggering the collection of this information: from the AdminUI statements page, users can select a statement fingerprint and activate diagnostics. The next time a query with this fingerprint runs, a diagnostics bundle will be collected automatically.
Building a cost-based SQL optimizer from scratch is no small endeavor, and we’ll continue to add more improvements and functionality in future releases. To get started with the new features in this release, download CockroachDB v20.1.
And as always, if building a SQL optimizer is your cup of tea, we’re hiring.