Generalized Inverted Indexes

Warning:
This version of CockroachDB is no longer supported. For more details, see the Release Support Policy.

Generalized inverted indexes, or GIN indexes, store mappings from values within a container column (such as a JSONB document) to the row that holds that value. They are used to speed up containment searches, e.g., "show me all of the rows from this table which have a JSON column that contains the key-value pair {"location":"NYC"}". GIN indexes are commonly used in document retrieval systems.

CockroachDB stores the contents of the following data types in GIN indexes:

Tip:
For a hands-on demonstration of using a GIN index to improve query performance on a JSONB column, see the JSON tutorial.

How do GIN indexes work?

Standard indexes work well for searches based on prefixes of sorted data. However, data types like JSONB or arrays cannot be queried without a full table scan, since they do not adhere to ordinary value prefix comparison operators. JSONB in particular needs to be indexed in a more detailed way than what a standard index provides. This is where GIN indexes prove useful.

GIN indexes filter on components of tokenizable data. The JSONB data type is built on two structures that can be tokenized:

  • Objects - Collections of key-value pairs where each key-value pair is a token.
  • Arrays - Ordered lists of values where every value in the array is a token.

For example, take the following JSONB value in column person:

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "state": "NY",
    "postalCode": "10021"
  },
  "cars": [
    "Subaru",
    "Honda"
  ]
}

A GIN index for this object would have an entry per component, mapping it back to the original object:

"firstName": "John"
"lastName": "Smith"
"age": 25
"address": "state": "NY"
"address": "postalCode": "10021"
"cars" : "Subaru"
"cars" : "Honda"

This lets you search based on subcomponents.

Creation

You can use GIN indexes to improve the performance of queries using JSONB or ARRAY columns. You can create them:

  • At the same time as the table with the INVERTED INDEX clause of CREATE TABLE.
  • For existing tables with CREATE INVERTED INDEX.
  • Using the following PostgreSQL-compatible syntax:

    > CREATE INDEX <optional name> ON <table> USING GIN (<column>);
    

Selection

If a query contains a filter against an indexed JSONB or ARRAY column that uses any of the supported operators, the GIN index is added to the set of index candidates.

Because each query can use only a single index, CockroachDB selects the index it calculates will scan the fewest rows (i.e., the fastest). For more detail, check out our blog post Index Selection in CockroachDB.

To override CockroachDB's index selection, you can also force queries to use a specific index (also known as "index hinting").

Storage

CockroachDB stores indexes directly in your key-value store. You can find more information in our blog post Mapping Table Data to Key-Value Storage.

Locking

Tables are not locked during index creation thanks to CockroachDB's schema change procedure.

Performance

Indexes create a trade-off: they greatly improve the speed of queries, but slightly slow down writes (because new values have to be copied and sorted). The first index you create has the largest impact, but additional indexes only introduce marginal overhead.

Comparisons

JSONB

GIN indexes on JSONB columns support the following comparison operators:

  • "is contained by": <@
  • "contains": @>
  • "equals": =, but only when you've reached into the JSON document with the -> operator. For example:

    icon/buttons/copy
    > SELECT * FROM a WHERE j ->'foo' = '"1"';
    

    This is equivalent to using @>:

    icon/buttons/copy
    > SELECT * FROM a WHERE j @> '{"foo": "1"}';
    

If you require comparisons using <, <=, etc., you can create an index on a computed column using your JSON payload, and then create a regular index on that. So if you wanted to write a query where the value of "foo" is greater than three, you would:

  1. Create your table with a computed column:

    icon/buttons/copy
    > CREATE TABLE test (
        id INT,
        data JSONB,
        foo INT AS ((data->>'foo')::INT) STORED
        );
    
  2. Create an index on your computed column:

    icon/buttons/copy
    > CREATE INDEX test_idx ON test (foo);
    
  3. Execute your query with your comparison:

    icon/buttons/copy
    > SELECT * FROM test where foo > 3;
    

Arrays

GIN indexes on ARRAY columns support the following comparison operators:

  • "is contained by": <@
  • "contains": @>

Known limitations

CockroachDB does not support partitioning GIN indexes. For details, see tracking issue.

Example

Create a table with GIN index on a JSONB column

In this example, let's create a table with a JSONB column and a GIN index:

icon/buttons/copy
> CREATE TABLE users (
    profile_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    last_updated TIMESTAMP DEFAULT now(),
    user_profile JSONB,
    INVERTED INDEX user_details (user_profile)
  );

Then, insert a few rows of data:

icon/buttons/copy
> INSERT INTO users (user_profile) VALUES
    ('{"first_name": "Lola", "last_name": "Dog", "location": "NYC", "online" : true, "friends" : 547}'),
    ('{"first_name": "Ernie", "status": "Looking for treats", "location" : "Brooklyn"}'),
    ('{"first_name": "Carl", "last_name": "Kimball", "location": "NYC", "breed": "Boston Terrier"}'
  );
icon/buttons/copy
> SELECT *, jsonb_pretty(user_profile) FROM users;
+--------------------------------------+----------------------------------+--------------------------------------------------------------------------+------------------------------------+
|              profile_id              |           last_updated           |                               user_profile                               |            jsonb_pretty            |
+--------------------------------------+----------------------------------+--------------------------------------------------------------------------+------------------------------------+
| 81330a51-80b2-44aa-b793-1b8d84ba69c9 | 2018-03-13 18:26:24.521541+00:00 | {"breed": "Boston Terrier", "first_name": "Carl", "last_name":           | {                                  |
|                                      |                                  | "Kimball", "location": "NYC"}                                            |                                    |
|                                      |                                  |                                                                          |     "breed": "Boston Terrier",     |
|                                      |                                  |                                                                          |     "first_name": "Carl",          |
|                                      |                                  |                                                                          |     "last_name": "Kimball",        |
|                                      |                                  |                                                                          |     "location": "NYC"              |
|                                      |                                  |                                                                          | }                                  |
| 81c87adc-a49c-4bed-a59c-3ac417756d09 | 2018-03-13 18:26:24.521541+00:00 | {"first_name": "Ernie", "location": "Brooklyn", "status": "Looking for   | {                                  |
|                                      |                                  | treats"}                                                                 |                                    |
|                                      |                                  |                                                                          |     "first_name": "Ernie",         |
|                                      |                                  |                                                                          |     "location": "Brooklyn",        |
|                                      |                                  |                                                                          |     "status": "Looking for treats" |
|                                      |                                  |                                                                          | }                                  |
| ec0a4942-b0aa-4a04-80ae-591b3f57721e | 2018-03-13 18:26:24.521541+00:00 | {"first_name": "Lola", "friends": 547, "last_name": "Dog", "location":   | {                                  |
|                                      |                                  | "NYC", "online": true}                                                   |                                    |
|                                      |                                  |                                                                          |     "first_name": "Lola",          |
|                                      |                                  |                                                                          |     "friends": 547,                |
|                                      |                                  |                                                                          |     "last_name": "Dog",            |
|                                      |                                  |                                                                          |     "location": "NYC",             |
|                                      |                                  |                                                                          |     "online": true                 |
|                                      |                                  |                                                                          | }                                  |
+--------------------------------------+----------------------------------+--------------------------------------------------------------------------+------------------------------------+
(3 rows)

Now, run a query that filters on the JSONB column:

icon/buttons/copy
> SELECT * FROM users where user_profile @> '{"location":"NYC"}';
+--------------------------------------+----------------------------------+--------------------------------------------------------------------------+
|              profile_id              |           last_updated           |                               user_profile                               |
+--------------------------------------+----------------------------------+--------------------------------------------------------------------------+
| 81330a51-80b2-44aa-b793-1b8d84ba69c9 | 2018-03-13 18:26:24.521541+00:00 | {"breed": "Boston Terrier", "first_name": "Carl", "last_name":           |
|                                      |                                  | "Kimball", "location": "NYC"}                                            |
| ec0a4942-b0aa-4a04-80ae-591b3f57721e | 2018-03-13 18:26:24.521541+00:00 | {"first_name": "Lola", "friends": 547, "last_name": "Dog", "location":   |
|                                      |                                  | "NYC", "online": true}                                                   |
+--------------------------------------+----------------------------------+--------------------------------------------------------------------------+
(2 rows)

Add a GIN index to a table with an array column

In this example, let's create a table with an ARRAY column first, and add the GIN index later:

icon/buttons/copy
> CREATE TABLE students (
    student_id  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    marks INT ARRAY
  );

Insert a few rows of data:

icon/buttons/copy
> INSERT INTO students (marks) VALUES
    (ARRAY[10,20,50]),
    (ARRAY[20,40,100]),
    (ARRAY[100,20,70]
  );
icon/buttons/copy
> SELECT * FROM students;
+--------------------------------------+--------------+
|              student_id              |    marks     |
+--------------------------------------+--------------+
| 11cdc77c-2f12-48d4-8bb4-ddee7c705e00 | {10,20,50}   |
|                                      |              |
| 2526c746-0b32-4f6b-a2b4-7ce6d411c1c2 | {20,40,100}  |
|                                      |              |
| eefdc32e-4485-45ca-9df1-80c0f42d73c0 | {100,20,70}  |
|                                      |              |
+--------------------------------------+--------------+
(3 rows)

Now, let’s add a GIN index to the table and run a query that filters on the ARRAY:

icon/buttons/copy
> CREATE INVERTED INDEX student_marks ON students (marks);
icon/buttons/copy
> SELECT * FROM students where marks @> ARRAY[100];
+--------------------------------------+--------------+
|              student_id              |    marks     |
+--------------------------------------+--------------+
| 2526c746-0b32-4f6b-a2b4-7ce6d411c1c2 | {20,40,100}  |
|                                      |              |
| eefdc32e-4485-45ca-9df1-80c0f42d73c0 | {100,20,70}  |
|                                      |              |
+--------------------------------------+--------------+
(2 rows)

See also

YesYes NoNo