If you are working with a table that must be indexed on sequential keys, you should use hash-sharded indexes. Hash-sharded indexes distribute sequential traffic uniformly across ranges, eliminating single-range hot spots and improving write performance on sequentially-keyed indexes at a small cost to read performance.
Note: This is an experimental feature. The interface and output are subject to change.
How hash-sharded indexes work
CockroachDB automatically splits ranges of data in the key-value store based on the size of the range and on the load streaming to the range. To split a range based on load, the system looks for a point in the range that evenly divides incoming traffic. If the range is indexed on a column of data that is sequential in nature (e.g., an ordered sequence, or a series of increasing, non-repeating TIMESTAMPs), then all incoming writes to the range will be the last (or first) item in the index and appended to the end of the range. As a result, the system cannot find a point in the range that evenly divides the traffic, the range cannot benefit from load-based splitting, and a hot spot forms on the single range.
Hash-sharded indexes solve this problem by distributing sequential data across multiple nodes within your cluster, eliminating hot spots. The trade-off, however, is a small performance impact when reading sequential data or ranges of data, because sequentially close values are not guaranteed to be on the same node.
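As an illustration, a table like the following (the table and column names are hypothetical) exhibits the hot-spot pattern described above: every insert gets a timestamp later than all existing keys, so all write traffic lands on the range holding the maximal end of the ts index.

```sql
-- Hypothetical events table: each new row's ts sorts after every
-- existing key, so one range absorbs all writes to the ts index.
CREATE TABLE events (
    id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    ts TIMESTAMP DEFAULT now(),
    payload JSONB,
    INDEX (ts)  -- sequentially keyed: a single-range hot spot under heavy writes
);
```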
Hash-sharded indexes create a physical STORED computed column, known as a shard column. CockroachDB uses this shard column, rather than the sequential column in the index, to control the distribution of values across the index. The shard column is hidden by default but can be seen with SHOW COLUMNS.
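For example (assuming a hypothetical events table that already has a hash-sharded index on a ts column), the hidden shard column appears in SHOW COLUMNS output, typically under a generated name such as crdb_internal_ts_shard_8:

```sql
SHOW COLUMNS FROM events;
-- The shard column (e.g. crdb_internal_ts_shard_8) is listed as a
-- hidden, STORED computed column alongside the user-defined columns.
```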
When a hash-sharded index is dropped, its shard column is also dropped, which requires a rewrite of the table.
For details about the mechanics and performance improvements of hash-sharded indexes in CockroachDB, see our Hash Sharded Indexes Unlock Linear Scaling for Sequential Workloads blog post.
Create a hash-sharded index
To create a hash-sharded index, set the experimental_enable_hash_sharded_indexes session variable to on. Then, add the optional USING HASH WITH BUCKET_COUNT = n_buckets clause to a CREATE INDEX statement, to an INDEX definition in a CREATE TABLE statement, or to an ALTER PRIMARY KEY statement. When this clause is used, CockroachDB creates n_buckets computed columns, shards the index into n_buckets shards, and then stores each index shard in the underlying key-value store with one of the computed columns' hashes as its prefix.
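A minimal sketch of the full sequence, using a hypothetical events table (names are illustrative), showing the clause in each of the three places it can appear:

```sql
-- Enable the feature for the current session.
SET experimental_enable_hash_sharded_indexes = on;

-- Hash-sharded secondary index in an INDEX definition of CREATE TABLE.
CREATE TABLE events (
    id INT8 DEFAULT unique_rowid() PRIMARY KEY,  -- sequential values
    ts TIMESTAMP DEFAULT now(),
    INDEX (ts) USING HASH WITH BUCKET_COUNT = 8
);

-- Equivalent standalone CREATE INDEX against an existing table.
CREATE INDEX ON events (ts) USING HASH WITH BUCKET_COUNT = 8;

-- Hash-sharded primary key via ALTER PRIMARY KEY.
ALTER TABLE events ALTER PRIMARY KEY USING COLUMNS (id)
    USING HASH WITH BUCKET_COUNT = 8;
```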
To change the bucket count of an existing hash-sharded primary key, use an ALTER PRIMARY KEY statement with a USING HASH WITH BUCKET_COUNT = n_buckets clause that specifies the new bucket count along with the existing primary key columns.
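For instance (hypothetical table, assuming its primary key is currently hash-sharded on the id column), re-sharding the primary key into 16 buckets looks like:

```sql
-- Keep the same key columns; only the bucket count changes.
ALTER TABLE events ALTER PRIMARY KEY USING COLUMNS (id)
    USING HASH WITH BUCKET_COUNT = 16;
```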
For an example of a hash-sharded index, see Create a hash-sharded secondary index.