3 basic rules for choosing indexes

There are a few basic rules to keep in mind when choosing indexes for a database. A good index should have these three properties:

Usefulness: Speed up the execution of some queries (or enforce a constraint)
Clustering: Keep records that are likely to be accessed together near each other
Scattering: Keep records that are unlikely to be accessed together far apart

Usefulness

The first rule is simply a reminder that indexes aren’t free, and if it’s not helping the application somehow you’re better off without it. Each additional index makes all writes to the table slower, but they can make some reads much faster.

The best case for write performance is a table with a primary key but no secondary indexes. The first secondary index has a high cost because it means any insert to the table becomes a distributed transaction, but the cost of each additional secondary index is smaller. Do not let this scare you away from secondary indexes, however. They’re not free, but they have such a transformative impact on read performance that it’s nearly always worthwhile to ensure that every query you run has a suitable index.

Note that in CockroachDB it’s never helpful to remove the primary key. The database will create a hidden primary key for any table that doesn’t have one, and this hidden key cannot be used by any queries, so it’s always better to have a real primary key that you choose.

Clustering

The second rule, clustering, is a little more subtle. When an application needs to load multiple records at once ( for example, due to a JOIN or the use of the IN operator), it’s best for performance if those records are near each other.

Originally, this advice came about to minimize the number of seeks that must be performed on spinning HDDs. In a distributed database like CockroachDB, the same guideline serves to minimize the number of network operations to access data remotely. For instance, in a social network news feed, most page views only need data from the current day. Organizing the data by time may provide the best data clustering and cache efficiency (or maybe not, as we’ll see below).

CockroachDB offers a few SQL extensions that can further improve data clustering, including storing indexes.

Scattering

The third rule, scattering, is in some sense the converse of the second: when similar records are near each other, different records naturally have to go somewhere else.

However, it’s not always the case that improving clustering also increases scattering. In the social network news feed example, organizing records by time maximizes clustering, but it also creates a hotspot because all the posts happening right now are trying to write to the same place. This is a severe limitation on the application’s ability to scale – if a hotspot like this exists, it’s not necessarily possible to serve more users by adding more nodes to a CockroachDB cluster. In practice, clustering and scattering are more often in tension with each other than they are mutually reinforcing.

Options for selecting unique IDs

If a table doesn’t have a natural primary key, you’ll probably want to synthesize some sort of unique identifier for each record. For this, you have a few options:

Sequences

The simplest (but often the least performant) way to generate unique IDs is to start with 1 and count up. This is what you get from the SEQUENCE feature in CockroachDB and PostgreSQL, or the AUTO_INCREMENT keyword in MySQL. This produces IDs that roughly correspond to insertion order, and many users like the fact that the IDs produced are small integers (on the other hand, sequential IDs can give away the details of how many users/photos/etc your application has).

Unfortunately, sequential IDs are not ideal for distributed databases. The sequence becomes a bottleneck that all insertions must wait for, so throughput is limited by the nodes responsible for the sequence counter, and adding more nodes to the cluster won’t necessarily improve performance.

Note: Sequences are non-transactional

Why do IDs only roughly correspond to insertion order? (This is true in most databases, not just CockroachDB) Because when a transaction inserts a record, it gets an ID when it reaches the INSERT statement, but that record doesn’t become visible to other readers until the transaction COMMITs. That means that it’s possible to see records appear to be out of order, like this:

Transaction A inserts a record which is assigned ID 1
Transaction B inserts a record which is assigned ID 2
Transaction B commits
Transaction C reads from the table and sees record 2
Transaction A commits
Transaction C reads from the table and sees records 1 and 2

If your application requires IDs to strictly correspond to insertion order, you can do something like INSERT INTO tbl (id, …) VALUES ((SELECT max(id)+1 FROM tbl), …). However, this has a very high performance cost since it makes all insert transactions wait for their turn to insert the next ID, so only do this if your application really requires strict ID ordering.

Using change data capture (CDC) can help avoid the requirement for strict ID ordering in many applications, letting you use higher-performance ID strategies.

Timestamps

Timestamps are roughly ordered and nearly unique (and collisions can be handled by adding random bits or using UNIQUE constraints in the database). They’re more scalable than sequences since they don’t require a single-key bottleneck to maintain the sequence counter, and therefore they’re a good fit for distributed databases.

CockroachDB uses timestamps plus random bits as the default for the SERIAL column type and the unique_rowid() function. When insertion-order clustering is more important than scattering for your application, we recommend timestamp-based IDs.

Randomness

The third major option for ID generation is to use large random numbers, usually via the 128-bit UUID type. As you might expect, random IDs maximize scattering but don’t give any clustering. Random IDs usually give the best raw INSERT performance in CockroachDB because they allow all the nodes in the cluster to be fully utilized, although the lack of clustering can hurt the performance of some queries.

An alternative approach: Hash-sharded index keys

Even though timestamps avoid the worst bottlenecks of sequential IDs, they still tend to create a bottleneck because all insertions are happening at around the current time, so only a small number of nodes are able to participate in handling these writes. If you need more write throughput than timestamp IDs offer but more clustering than random UUIDs, CockroachDB has a unique feature called hash-sharded indexes that’s designed for precisely those requirements.

How hash-sharded indexes work

CockroachDB automatically splits ranges of data in the key-value store based on the size of the range, and on the load streaming to the range. To split a range based on load, the system looks for a point in the range that evenly divides incoming traffic. If the range is indexed on a column of data that is sequential in nature (e.g., an ordered sequence, or a series of increasing, non-repeating TIMESTAMPs), then all incoming writes to the range will be the last (or first) item in the index and appended to the end of the range. As a result, the system cannot find a point in the range that evenly divides the traffic, and the range cannot benefit from load-based splitting, creating a hotspot at the single range.

Hash-sharded indexes distribute sequential traffic uniformly across ranges, eliminating single-range hot spots and improving write performance on sequentially-keyed indexes at a small cost to read performance. More details about how they work are available here.

How to create a hash-sharded index

To create a hash-sharded index, set the experimental_enable_hash_sharded_indexes session variable to on. Then, add the optional USING HASH WITH BUCKET_COUNT = n_buckets clause to a CREATE INDEX statement, to an INDEX definition in a CREATE TABLE statement, or to an ALTER PRIMARY KEY statement. When this clause is used, CockroachDB creates n_buckets computed columns, shards the index into n_buckets shards, and then stores each index shard in the underlying key-value store with one of the computed column’s hash as its prefix.

To change the bucket size of an existing hash-sharded primary key index, use an ALTER PRIMARY KEY statement with a USING HASH WITH BUCKET_COUNT = n_buckets clause that specifies the new bucket size and the existing primary key columns.

For example:

CREATE TABLE events (
    product_id INT8,
    owner UUID,
    serial_number VARCHAR,
    event_id UUID,
    ts TIMESTAMP,
    data JSONB,
    PRIMARY KEY (product_id, owner, serial_number, ts, event_id)
);

SET experimental_enable_hash_sharded_indexes=on;

CREATE INDEX ON events(ts) USING HASH WITH BUCKET_COUNT=8;

To confirm it worked:

SHOW INDEX FROM events;

Learn more about hash-sharded indexes in CockroachDB.