Expression Indexes

New in v21.2

An expression index is an index created by applying an expression to a column. For example, to facilitate fast, case insensitive lookups of user names you could create an index by applying the function lower to the name column: CREATE INDEX users_name_idx ON users (lower(name)). The value of the expression is stored only in the expression index, not in the primary family index.

Both standard indexes and inverted indexes support expressions. You can use expressions in unique indexes and partial indexes.

You can reference multiple columns in an expression index.

Create an expression index

To create an expression index, use the syntax:

icon/buttons/copy
CREATE INDEX index_name ON table_name (expression(column_name));

View index expression

To view the expression used to generate the index, run SHOW CREATE TABLE:

icon/buttons/copy
> SHOW CREATE TABLE users;
 table_name |                                  create_statement
-------------+--------------------------------------------------------------------------------------
  users      | CREATE TABLE public.users (
...
             |     INDEX users_name_idx (lower(name:::STRING) ASC),
...
             | )
(1 row)

Examples

Simple examples

Suppose you have a table with the following columns:

icon/buttons/copy

CREATE TABLE t (i INT, b BOOL, s STRING, j JSON);

The following examples illustrate how to create various types of expression indexes.

A partial, multi-column index, where one column is defined with an expression:

icon/buttons/copy

CREATE INDEX ON t (lower(s), b) WHERE i > 0;

A unique, partial, multi-column index, where one column is defined with an expression:

icon/buttons/copy

CREATE UNIQUE INDEX ON t (lower(s), b) WHERE i > 0;

An inverted, partial, multi-column index, where one column is defined with an expression:

icon/buttons/copy

CREATE INVERTED INDEX ON t (lower(s), i, j) WHERE b;

Use an expression to index a field in a JSONB column

You can use an expression in an index definition to index a field in a JSON column. You can also use an expression to create an inverted index on a subset of the JSON column.

Normally an index is used only if the cost of using the index is less than the cost of a full table scan. To disable that optimization, turn off statistics collection:

> SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false;

Create a table of three users with a JSON object in the user_profile column:

icon/buttons/copy
> CREATE TABLE users (
  profile_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  last_updated TIMESTAMP DEFAULT now(),
  user_profile JSONB
);

> INSERT INTO users (user_profile) VALUES
  ('{"id": "d78236", "firstName": "Arthur", "lastName": "Read", "birthdate": "2010-01-25", "school": "PVPHS", "credits": 120, "sports": ["none"], "clubs": ["Robotics"]}'),
  ('{"id": "f98112", "firstName": "Buster", "lastName": "Bunny", "birthdate": "2011-11-07",  "school": "THS", "credits": 67, "sports": ["Gymnastics"], "clubs": ["Theater"]}'),
  ('{"id": "t63512", "firstName": "Jane", "lastName": "Narayan", "birthdate": "2012-12-12", "school" : "Brooklyn Tech", "credits": 98, "sports": ["Track and Field"], "clubs": ["Chess"]}');

When you perform a query that filters on the user_profile->'birthdate' column:

icon/buttons/copy
> EXPLAIN SELECT jsonb_pretty(user_profile) FROM users WHERE user_profile->>'birthdate' = '2011-11-07';

You can see that a full scan is performed:

                                           info
-------------------------------------------------------------------------------------------
  distribution: full
  vectorized: true

  • render
  │ estimated row count: 0
  │
  └── • filter
      │ estimated row count: 0
      │ filter: (user_profile->'birthdate') = '2011-11-07'
      │
      └── • index join
          │ estimated row count: 3
          │ table: users@primary
          │
          └── • scan
                missing stats
                table: users@primary
                spans: FULL SCAN

To limit the number of rows scanned, create an expression index on the birthdate field:

icon/buttons/copy
> CREATE INDEX timestamp_idx ON users (parse_timestamp(user_profile->>'birthdate'));

When you filter on the expression parse_timestamp(user_profile->'birthdate'), only the row matching the filter is scanned:

icon/buttons/copy
> EXPLAIN SELECT jsonb_pretty(user_profile) FROM users WHERE parse_timestamp(user_profile->>'birthdate') = '2011-11-07';
                                        info
-------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • render
  │ estimated row count: 1
  │
  └── • index join
      │ estimated row count: 1
      │ table: users@primary
      │
      └── • scan
            missing stats
            table: users@timestamp_idx
            spans: [/'2011-11-07 00:00:00' - /'2011-11-07 00:00:00']

As shown in this example, for an expression index to be used to service a query, the query must constrain the same exact expression in its filter.

Known limitations

Expression indexes have the following limitations:

  • The expression cannot reference columns outside the index's table.
  • Functional expression output must be determined by the input arguments. For example, you can't use the function now() to create an index because its output depends on more than just the function arguments.
  • CockroachDB does not allow expression indexes to reference computed columns.

    Tracking GitHub Issue

  • CockroachDB does not support expressions as ON CONFLICT targets. This means that unique expression indexes cannot be selected as arbiters for INSERT .. ON CONFLICT statements. For example:

icon/buttons/copy
CREATE TABLE t (a INT, b INT, UNIQUE INDEX ((a + b)));
CREATE TABLE
icon/buttons/copy
INSERT INTO t VALUES (1, 2) ON CONFLICT ((a + b)) DO NOTHING;
invalid syntax: statement ignored: at or near "(": syntax error
SQLSTATE: 42601
DETAIL: source SQL:
INSERT INTO t VALUES (1, 2) ON CONFLICT ((a + b)) DO NOTHING
                                    ^
HINT: try \h INSERT
icon/buttons/copy
INSERT INTO t VALUES (1, 2) ON CONFLICT ((a + b)) DO UPDATE SET a = 10;
invalid syntax: statement ignored: at or near "(": syntax error
SQLSTATE: 42601
DETAIL: source SQL:
INSERT INTO t VALUES (1, 2) ON CONFLICT ((a + b)) DO UPDATE SET a = 10
                                    ^
HINT: try \h INSERT

Tracking GitHub Issue

See also

YesYes NoNo