How to use indexes for better workload performance

How to use indexes for better workload performance

Indexes are a crucial part of your database schema. They improve workload performance by helping the database locate data without having to scan every row of a table. Although it might be tempting to create an index for every column that your workload uses to filter data, it’s important to consider the performance tradeoffs of indexes. While indexes greatly improve the performance of read queries that use filters, they do come with a cost to write performance: data has to be written for all indexes present on the table.

CockroachDB has a group of features that can help you analyze the usage of your indexes, and provide recommendations for whether you should create new indexes, or drop or update existing ones. In this blog post we will show you how to use these features.

Use the Statements Page to avoid full scans

A query causes a full table scan when there is no way for the database to avoid reading all of the data for a table to satisfy the query. A full table scan causes the database to read every single byte on disk for a table, which is very expensive for large tables. For example, a query like SELECT * FROM t WHERE id = 10 will cause a full table scan if there’s no index on the “id” column of the table.

Full scans are often optimization opportunities and signals for a better indexing strategy.  Queries that filter for a subset of rows but are forced to scan the full table would benefit from a secondary index to allow better pruning of scanned rows, reducing IO, and increasing overall performance. An easy way to identify which statements are being executed with full table scans is clicking on the Statements tab, under the SQL Activity page and selecting the filter “Only show statements with full table scans”. 

Since CockroachDB aggregates the values on the Statement Overview page on the selected period, you might be seeing a statement fingerprint (which is a representation of the SQL statement made by replacing literal values, such as numbers and strings, with underscores) on the table whose latest execution might not have used a full scan, so the next step is to go to its Details page, by clicking the Statement in question. 

From there select the Explain Plans tab. There you will be able to see all plans used by that statement fingerprint ID during the selected period. One of the columns on the plans table is Last Execution Time and another is Full Scan, which will allow you to identify if the latest plan execution was with a full scan. 

How to use Indexes | Statements Page

You can also click on each plan to see its full query plan.

How to use indexes | Statement Fingerprint

Besides the statements with full scans, you can also look into statements that have a higher execution time.

Now that you have a few statements that you might want to create or improve the existing indexes, let’s find recommendations for them.

To find index recommendations, you can use the powerful EXPLAIN command, which returns information about the statement plan that the optimizer has selected for a query. You can also use EXPLAIN ANALYZE which will actually execute the SQL query and generate a statement plan that’s annotated with execution statistics.

How to use indexes | EXPLAIN command

Notice the information at the bottom of the EXPLAIN output: recommendations for creating and also replacing indexes. These recommendations are the database’s best guess at how you could improve an individual statement with different indexes.

A common mistake you can make is to create a secondary index without a STORING clause, which can result in an index join. Index joins can be slow since they require performing a lookup in the primary index for every row scanned in the secondary index. In this case, you might get the recommendation to replace the index with one containing the STORING clause. 

After you create or modify indexes, you can go back to the same details page for the selected statement and check the statistics again, where you will be able to observe updated performance information now that the new indexes are in place.

If the amount of data in a table changes significantly, new columns are introduced to the table, or new access patterns are added, it’s good practice to monitor workload performance with the features mentioned above, so you can make sure the workload is always performing optimally.

Use Index Statistics to find indexes with low or no usage

On the Database page, click on any database to see the list of tables it contains. That view will also display the number of indexes on each particular table. Clicking on the table name will bring you to the Table Details page, where you can see the list of all indexes and the stats for each one, including the name, total reads and last time it was used.

On the index stats table you can see if there are indexes with a low count of total reads, never used or only used a long time ago. Those indexes could be degrading your write performance, without significant positive impact on read performance, so they’re good candidates to consider dropping.

How to use indexes | index stats

If you made a recent change to indexes, you can click on Reset all index stats, which will clear all the data from the cluster, allowing you to focus your analysis on the latest metrics only.

You can also reset index stats with CLI with SELECT crdb_internal.reset_index_usage_stats().

Stay tuned for some updates coming this fall in v22.2 that will make it even easier to see index recommendations!

If there is a new feature that you want or you’re having a performance issue that you need our assistance, use one of our support resources to reach out to us, we’re happy to help!

To learn more about this topic, here are a few other resources:

Indexes

Create secondary index

Table statistics

Statement tuning with EXPLAIN

Best practices for secondary indexes

DB Console Query Tuning

About the author

Marylia Gutierrez

Marylia Gutierrez joined Cockroach Labs at the beginning of 2021, working as the Engineer Manager and developer for the SQL Observability team. She’s worked on bringing new features to help users better understand their workload, with information such as contention, indexes usage, explain plans, among many others. She obtained her Master’s in Human-Computer Interaction from Escola Politécnica da Universidade de São Paulo in Brazil. When she is not observing SQL at work, she is observing tv shows, games and her plants in her personal life.

github link linkedin link

Keep Reading

SQL Cheat Sheet – Quick and Easy SQL Command Reference Examples

This SQL cheat sheet is designed to make it quick and easy to find that SQL command that’s right on the tip of …

Read More
SQL Query Tuning with DB Console

Observability is how you understand the current state of your database: how it is behaving, plus any potentially …

Read More
What is a Foreign Key? (With SQL Examples)

Foreign keys put the “relational” in “relational database” – they help define the relationship between tables. They …

Read More
x
Developer Resources