As a developer, you are often faced with the question: why is my application slow? You may have already identified the database as the bottleneck (even that can take some troubleshooting), and you then find yourself asking: which queries are slow, and why?
The query troubleshooting journey starts by isolating the offending query issued by your application.
It is best practice to tag your application with a label or name when connecting to your database, enabling you to track the source of incoming queries. Most databases support this capability, and you can use this to your advantage to quickly filter and isolate the application-specific workloads containing problematic queries.
In CockroachDB, you can follow these instructions to set the application name when connecting to your database. You can then use the DB console to filter by this application name and retrieve the set of queries that may be experiencing performance issues and impacting your application. Starting with CockroachDB 21.1, you can also filter by statement type to pinpoint specific operations within your application.
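For example, you can set the application name per session or in the connection string (the name `myapp` here is illustrative):

```sql
-- Set the application name for the current session (name is illustrative)
SET application_name = 'myapp';

-- Or include it in the connection string:
-- postgresql://user@host:26257/defaultdb?application_name=myapp
```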
The next step is to identify the specific slow queries issued by your application. Databases typically offer system views into running and previously run queries, including runtime statistics, to help you find troublesome queries.
CockroachDB provides an out-of-the-box DB Console for monitoring and optimizing cluster performance. On the Statements page, you can filter and sort slow queries along with their runtime statistics.
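As a sketch, you can also inspect active queries directly from SQL and combine this with the application name set earlier (`myapp` is illustrative):

```sql
-- List currently executing queries across the cluster
SHOW CLUSTER STATEMENTS;

-- Filter active queries issued by a specific application
SELECT query, start, application_name
FROM [SHOW CLUSTER STATEMENTS]
WHERE application_name = 'myapp';
```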
As a shortcut, CockroachDB supports a slow query log: queries exceeding a configurable duration threshold are automatically captured in a dedicated log.
SET CLUSTER SETTING sql.log.slow_query.latency_threshold = '100ms';
Slow Query Log:

{
  "Timestamp": 1626455175848119000,
  "EventType": "slow_query",
  "Statement": "‹SELECT l_returnflag, l_linestatus, sum(l_quantity) AS sum_qty, sum(l_extendedprice) AS sum_base_price, sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price, sum((l_extendedprice * (1 - l_discount)) * (1 + l_tax)) AS sum_charge, avg(l_quantity) AS avg_qty, avg(l_extendedprice) AS avg_price, avg(l_discount) AS avg_disc, count(*) AS count_order FROM \"\".\"\".lineitem WHERE l_shipdate <= ('1998-12-01' - '95 days') GROUP BY l_returnflag, l_linestatus ORDER BY l_returnflag, l_linestatus›",
  "Tag": "SELECT",
  "User": "root",
  "ApplicationName": "$ cockroach sql",
  "ExecMode": "exec",
  "NumRows": 4,
  "Age": 7681.835,
  "FullTableScan": true,
  "TxnCounter": 8
}
One of the most common query plan issues is the optimizer choosing a plan that scans more data than required, leading to longer query times. In CockroachDB, this issue shows up as a FULL SCAN in the query plan.
You can use a cluster setting to disallow queries with FULL scans, making them fail preemptively so you are alerted to the issue - this is particularly helpful when getting started on your application with an initial, basic database schema design. You can also filter for queries with FULL scans in the DB Console in conjunction with your application name. These capabilities allow you to locate queries with suboptimal plans and begin the tuning process.
SET CLUSTER SETTING sql.defaults.disallow_full_table_scans.enabled = 'true';
SELECT * FROM region;
ERROR: query `SELECT * FROM region` contains a full table/index scan which is explicitly disallowed
After identifying a problematic query, EXPLAIN is one of the most critical commands in your tooling arsenal for introspecting and tuning it. Running EXPLAIN returns the logical query plan that the optimizer has selected. In this example, we’ve run EXPLAIN on a troublesome query identified in CockroachDB:
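Here, EXPLAIN is run on the slow query captured in the log above (a TPC-H-style aggregation over lineitem):

```sql
EXPLAIN SELECT l_returnflag, l_linestatus,
       sum(l_quantity) AS sum_qty,
       sum(l_extendedprice) AS sum_base_price,
       sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
       sum((l_extendedprice * (1 - l_discount)) * (1 + l_tax)) AS sum_charge,
       avg(l_quantity) AS avg_qty,
       avg(l_extendedprice) AS avg_price,
       avg(l_discount) AS avg_disc,
       count(*) AS count_order
FROM lineitem
WHERE l_shipdate <= ('1998-12-01' - '95 days')
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;
```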
                                                info
--------------------------------------------------------------------------------------------------------
  distribution: full
  vectorized: true

  • sort
  │ estimated row count: 6
  │ order: +l_returnflag,+l_linestatus
  │
  └── • group
      │ estimated row count: 6
      │ group by: l_returnflag, l_linestatus
      │
      └── • render
          │ estimated row count: 5,917,362
          │
          └── • filter
              │ estimated row count: 5,917,362
              │ filter: l_shipdate <= '1998-08-28'
              │
              └── • scan
                    estimated row count: 6,002,293 (100% of the table; stats collected 39 minutes ago)
                    table: lineitem@primary
                    spans: FULL SCAN
(22 rows)

Time: 2ms total (execution 2ms / network 0ms)
In the logical query plan above, notice where it says spans: FULL SCAN. As previously mentioned, a FULL SCAN in the query plan is suboptimal, especially when scanning a relatively large number of rows. This plan tree is a typical indicator that the query would benefit from a secondary index. A secondary index copies and sorts rows based on the indexed columns, allowing you to access rows in the database efficiently. Secondary indexes should be created carefully, as you pay an additional write penalty to maintain each index in exchange for better read performance. You can learn more about secondary indexes specifically for CockroachDB in our documentation.
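As a sketch, an index on the filtered column could let the optimizer constrain the scan instead of reading the whole table (the index name and column choice here are illustrative for this query):

```sql
-- Index lineitem on the filter column so the scan can be constrained
-- to l_shipdate <= '1998-08-28' rather than spanning the full table.
CREATE INDEX l_shipdate_idx ON lineitem (l_shipdate);
```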
If you’re using CockroachDB, the DB console also surfaces query plans graphically to help you quickly detect FULL scans:
To further understand how your query will be executed, most databases support running EXPLAIN with query execution. This executes the query and captures the actual physical plan, including runtime statistics. You can then use those statistics to understand where the bottleneck lies, which is especially helpful for distributed SQL databases.
This is done in CockroachDB using EXPLAIN ANALYZE, and in 21.1, CockroachDB released several enhancements to it. You can now see the physical plan directly in the SQL CLI to understand how query execution is distributed across your cluster, with runtime statistics such as contention and duration on each node.
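For example, a physical plan like the one below could be produced by running EXPLAIN ANALYZE on an UPDATE; the statement here is reconstructed from the output and the value is illustrative:

```sql
EXPLAIN ANALYZE UPDATE users
SET name = 'new name'  -- value is illustrative
WHERE id = '32a962b7-8440-4b31-97cd-a7d7757d6eac';
```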
                                             info
--------------------------------------------------------------------------------------------
  planning time: 353µs
  execution time: 3ms
  distribution: local
  vectorized: true
  rows read from KV: 52 (5.8 KiB)
  cumulative time spent in KV: 2ms
  maximum memory usage: 60 KiB
  network usage: 0 B (0 messages)
  cluster regions: us-east1

  • update
  │ cluster nodes: n1
  │ cluster regions: us-east1
  │ actual row count: 1
  │ table: users
  │ set: name
  │ auto commit
  │
  └── • render
      │ cluster nodes: n1
      │ cluster regions: us-east1
      │ actual row count: 1
      │ estimated row count: 0
      │
      └── • filter
          │ cluster nodes: n1
          │ cluster regions: us-east1
          │ actual row count: 1
          │ estimated row count: 0
          │ filter: id = '32a962b7-8440-4b31-97cd-a7d7757d6eac'
          │
          └── • scan
                cluster nodes: n1
                cluster regions: us-east1
                actual row count: 52
                KV time: 2ms
                KV contention time: 0µs
                KV rows read: 52
                KV bytes read: 5.8 KiB
                estimated row count: 50 (100% of the table; stats collected 3 minutes ago)
                table: users@primary
                spans: FULL SCAN
(42 rows)

Time: 4ms total (execution 4ms / network 0ms)
In the physical query plan above, you can see that table statistics are up to date. Information such as contention time is surfaced to help you troubleshoot slow performance caused by transaction contention. We also see a FULL SCAN during execution that read 52 records, which could potentially be avoided with a secondary index.
Inspecting query plans through `EXPLAIN` can yield many other insights into designing and updating your schema. CockroachDB provides a comprehensive guide to using `EXPLAIN` when designing your database schema.
For general information on query performance with CockroachDB, sign up for our Cockroach University Course on Query Performance for Developers. If you have questions or feedback, we encourage you to join the #cockroach-university channel in our CockroachDB Community Slack.