Full-Text Search

On this page Carat arrow pointing down

New in v23.1: A full-text search is used to perform natural-language searches on documents such as articles, websites, or other written formats.

This page describes how to perform full-text searches using the provided built-in functions.

Note:

Some PostgreSQL syntax and features are unsupported. For details, see Unsupported features.

How does full-text search work?

In the PostgreSQL terminology, a document is a natural-language text converted to a data type that is searchable using specially formatted queries. A document is typically stored within a single database row or concatenated from multiple fields.

A full-text search has the following advantages over pattern matching with LIKE and ILIKE:

  • A full-text search can specify a text search configuration that enables language-specific searches.
  • The results of a full-text search can be ranked.
  • A full-text search can be accelerated using a full-text index.
  • LIKE and ILIKE are only fast for prefix searches or when indexed with a trigram index.
Tip:

Depending on your use case, you may prefer to use trigram indexes to do fuzzy string matching and pattern matching. For more information about use cases for trigram indexes that could make having full-text search unnecessary, see the 2022 blog post Use cases for trigram indexes (when not to use Full Text Search).

Process a document

To make a document searchable, convert it to the TSVECTOR data type. A TSVECTOR value consists of individual lexemes, which are normalized strings used for text matching. Each lexeme also includes a list of integer positions that indicate where the lexeme existed in the original document.

The to_tsvector() built-in function converts a string input into a TSVECTOR value:

icon/buttons/copy
SELECT to_tsvector('How do trees get on the internet?');
           to_tsvector
---------------------------------
  'get':4 'internet':7 'tree':3

This TSVECTOR consists of the lexemes get, internet, and tree. Normalization removes the following from the input:

  • Derivatives of words, which are reduced using a stemming algorithm. In this example, "trees" is normalized to tree.
  • Stop words. These are words that are considered not useful for indexing and searching, based on the text search configuration. This example does not specify a configuration, and english is used by default. "How", "do", "on", and "the" are identified as stop words.
  • Punctuation and capitalization.

In the preceding output, the integers indicate that get is in the fourth position, internet is in the seventh position, and tree is in the third position in the input.

Form a query

A full-text search attempts to match a query to a document. A full-text search query has the TSQUERY data type. Like TSVECTOR, a TSQUERY value consists of individual lexemes, which are normalized strings used for text matching. Lexemes in a TSQUERY are separated with any combination of & (AND), | (OR), <-> (FOLLOWED BY), or ! (NOT) operators.

  • The to_tsquery() built-in function normalizes a TSQUERY input. The input must also be formatted as a TSQUERY, or the statement will error.

    icon/buttons/copy
    SELECT to_tsquery('How & do & trees & get & on & the & internet?');
    
              to_tsquery
    -------------------------------
      'tree' & 'get' & 'internet'
    
  • The plainto_tsquery() built-in function converts a string input into a TSQUERY value, and separates the lexemes with & (AND):

    icon/buttons/copy
    SELECT plainto_tsquery('How do trees get on the internet?');
    
            plainto_tsquery
    -------------------------------
      'tree' & 'get' & 'internet'
    
  • The phraseto_tsquery() built-in function converts a string input into a TSQUERY value, and separates the lexemes with <-> (FOLLOWED BY):

    SELECT phraseto_tsquery('How do trees get on the internet?');
    
             phraseto_tsquery
    -----------------------------------
      'tree' <-> 'get' <3> 'internet'
    

    In the preceding output, <-> (equivalent to <1>) indicates that get must follow tree in a matching TSVECTOR. <3> further indicates that get and internet must be separated by two lexemes in a matching TSVECTOR. This resulted from converting the stop words "on" and "the" in the input.

    To match this query, a document must therefore contain phrases such as "get tree" and "get {word} {word} internet".

Queries and documents are matched using the @@ comparison operator. For usage examples, see Match queries to documents.

Rank search results

You can rank the results of a full-text search.

The ts_rank() built-in function outputs a search rank based on the frequency of matching lexemes. In the following example, two lexemes match:

SELECT ts_rank(to_tsvector('How do trees get on the internet?'), plainto_tsquery('how to get internet'));
icon/buttons/copy
   ts_rank
--------------
  0.09735848

In this example, three lexemes match, resulting in a higher rank:

SELECT ts_rank(to_tsvector('How do trees get on the internet?'), plainto_tsquery('wow, do trees get internet?'));
icon/buttons/copy
   ts_rank
--------------
  0.26426345
Note:

Because a rank must be calculated for each matching document, ranking a full-text search can incur a performance overhead if there are many matching documents.

For more information about using ts_rank(), see the PostgreSQL documentation.

Comparisons

Full-text searches support the following comparison operator:

  • matching: @@. This operator is set between a TSQUERY and TSVECTOR, and returns true if the lexemes match. The TSQUERY and TSVECTOR can be specified in any order.

For usage examples, see Match queries to documents.

Full-text indexes

Note:

You can perform full-text searches without a full-text index. However, an index will drastically improve search performance when searching a large number of documents.

To create a full-text index, use the CREATE INDEX syntax that defines an inverted index, specifying a TSVECTOR column.

  • Using the PostgreSQL-compatible syntax:

    CREATE INDEX {optional name} ON {table} USING GIN ({column});
    
    Note:

    GIN and GiST indexes are implemented identically on CockroachDB. GIN and GIST are therefore synonymous when defining a full-text index.

  • Using CREATE INVERTED INDEX:

    CREATE INVERTED INDEX {optional name} ON {table} ({column});
    

For more ways to define full-text indexes, see Create a full-text index with an expression and Create a full-text index with a stored computed column.

Text search configuration

A text search configuration determines how inputs are parsed into TSVECTOR and TSQUERY values. This includes a dictionary that is used to identify derivatives of words, as well as stop words to exclude when normalizing documents and queries.

The supported dictionaries are English, Danish, Dutch, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Russian, Spanish, Swedish, and Turkish. An additional simple dictionary does not perform stemming or stopwording when normalizing documents or queries.

You can specify a text search configuration as the first parameter when calling any of the built-in functions to process a document or form a query. For example:

icon/buttons/copy
SELECT to_tsvector('swedish', 'Hur får träd tillgång till internet?');
                 to_tsvector
----------------------------------------------
  'får':2 'internet':6 'tillgång':4 'träd':3

If you do not specify a configuration when calling the function, the value of the default_text_search_config session variable is used. This defaults to english and can be changed as follows:

icon/buttons/copy
SET default_text_search_config = swedish;

For more information about text search configurations, see the PostgreSQL documentation.

Note:

At this time, only the dictionary can be specified in a text search configuration. See Unsupported features.

Examples

Match queries to documents

Use the @@ operator to match a query (TSQUERY) to a searchable document (TSVECTOR). In the following example, because the TSQUERY comprises the lexemes get and internet, which are both contained in the TSVECTOR, the output will be true:

icon/buttons/copy
SELECT to_tsvector('How do trees get on the internet?') @@ to_tsquery('How & to & get & internet?');
  ?column?
------------
     t

Use the plainto_tsquery() built-in function to match text to a searchable document. This search is equivalent to the preceding example:

icon/buttons/copy
SELECT to_tsvector('How do trees get on the internet?') @@ plainto_tsquery('How to get internet?');
  ?column?
------------
     t

Use the phraseto_tsquery() built-in function to match text phrases to a searchable document. Because phraseto_tsquery() separates the lexemes get and internet with the <-> (FOLLOWED BY) operator, and the document does not contain a phrase like "get internet", the output will be false:

icon/buttons/copy
SELECT to_tsvector('How do trees get on the internet?') @@ phraseto_tsquery('How to get internet?');
  ?column?
------------
     f

For an example of how text matching is used on a table, see Perform a full-text search with ranked results.

Create a full-text index with an expression

You can create an expression index on a STRING column, using to_tsvector() to convert the value to TSVECTOR.

Given the table:

icon/buttons/copy
CREATE TABLE t (a STRING);

Create an expression index that converts column a to TSVECTOR:

icon/buttons/copy
CREATE INDEX ON t USING GIN (to_tsvector('english', a));
Note:

When using a full-text search function in an expression index, you must specify a text search configuration. In the preceding example, the english configuration is specified.

Create a full-text index with a stored computed column

You can create a full-text index on a stored computed column that has a TSVECTOR data type.

Given the table:

icon/buttons/copy
CREATE TABLE t (a STRING);

Add a new TSVECTOR column that is computed from a using to_tsvector():

icon/buttons/copy
ALTER TABLE t ADD COLUMN b TSVECTOR 
  AS (to_tsvector('english', a)) STORED;
Note:

When using a full-text search function in a stored generated column, you must specify a text search configuration. In the preceding example, the english configuration is specified.

View the columns on the table:

icon/buttons/copy
SHOW COLUMNS FROM t;
  column_name | data_type | is_nullable | column_default |   generation_expression   |       indices       | is_hidden
--------------+-----------+-------------+----------------+---------------------------+---------------------+------------
  a           | STRING    |      t      | NULL           |                           | {t_pkey}            |     f
  rowid       | INT8      |      f      | unique_rowid() |                           | {t_expr_idx,t_pkey} |     t
  b           | TSVECTOR  |      t      | NULL           | to_tsvector('english', a) | {t_pkey}            |     f
(3 rows)

Create an inverted index on the TSVECTOR column:

CREATE INDEX ON t USING GIN (b);

Perform a full-text search with ranked results

  1. Create a table with STRING columns:

    icon/buttons/copy
    CREATE TABLE dadjokes (opener STRING, response STRING);
    
  2. Populate the table with sample values. These are the documents that you will search:

    icon/buttons/copy
    INSERT INTO dadjokes (opener, response) VALUES
      ('How do trees get on the internet?', 'They log on.'),
      ('What do you call a pony with a sore throat?', 'A little horse.'),
      ('What would a bathroom for fancy cats be called?', 'The glitter box.'),
      ('Why did the scarecrow win an award?', 'It was outstanding in its field.'),
      ('What kind of tree fits in your hand?', 'A palm tree.'),
      ('What was a better invention than the first telephone?', 'The second one.'),
      ('Where do you learn to make banana splits?', 'At sundae school.'),
      ('How did the hipster burn the roof of his mouth?', 'He ate the pizza before it was cool.'),
      ('What did one wall say to the other wall?', 'Meet you at the corner.'),
      ('When does a joke become a dad joke?', 'When it becomes apparent.');
    
  3. You can use LIKE or ILIKE to search for text. However, the results will be unranked:

    icon/buttons/copy
    SELECT opener, response
    FROM dadjokes
    WHERE opener LIKE '%tree%' OR response LIKE '%tree%';
    
                     opener                |   response
    ---------------------------------------+---------------
      How do trees get on the internet?    | They log on.
      What kind of tree fits in your hand? | A palm tree.
    (2 rows)
    
  4. Create a full-text index on the concatenation of both table columns, specifying a text search configuration (in this case, english), as is mandatory when defining an expression index:

    CREATE INDEX ON dadjokes USING GIN (to_tsvector('english', opener || response));
    
    Note:

    Because inverted joins on TSVECTOR values are not yet supported, this index won't be used to accelerate the SQL queries in this example. See Unsupported features.

  5. Search the table for a query (in this case, tree), and rank the results.

    In the following statement, to_tsvector() makes the table values searchable, to_tsquery() forms the query, and ts_rank() calculates the search rankings:

    SELECT opener, response, ts_rank(joke, query) AS rank
    FROM dadjokes, to_tsvector(opener || response) joke, to_tsquery('tree') query
    WHERE query @@ joke
    ORDER BY rank DESC
    LIMIT 10;
    
                     opener                |   response   |    rank
    ---------------------------------------+--------------+--------------
      What kind of tree fits in your hand? | A palm tree. | 0.075990885
      How do trees get on the internet?    | They log on. |  0.06079271
    (2 rows)
    

    The frequency of the tree lexeme in each row determines the difference in the rankings.

  6. Search the table for the query calling, and rank the results:

    SELECT opener, response, ts_rank(joke, query) AS rank
    FROM dadjokes, to_tsvector(opener || response) joke, to_tsquery('calling') query
    WHERE query @@ joke
    ORDER BY rank DESC
    LIMIT 10;
    
                          opener                      |     response     |    rank
    --------------------------------------------------+------------------+-------------
      What would a bathroom for fancy cats be called? | The glitter box. | 0.06079271
      What do you call a pony with a sore throat?     | A little horse.  | 0.06079271
    (2 rows)
    

    Unlike pattern matching with LIKE and ILIKE, a full-text search for calling produced matches. This is because to_tsvector() and to_tsquery() each normalized derivatives of the word "call" in their respective inputs to the lexeme call, using the default english text search configuration.

  7. Use plainto_tsquery() to convert text input to a search query:

    SELECT opener, response, ts_rank(joke, query) AS rank
    FROM dadjokes, to_tsvector(opener || response) joke, plainto_tsquery('no more joking, dad') query
    WHERE query @@ joke
    ORDER BY rank DESC
    LIMIT 10;
    
                    opener                |         response          |    rank
    --------------------------------------+---------------------------+-------------
      When does a joke become a dad joke? | When it becomes apparent. | 0.18681315
    (1 row)
    
  8. Alternatively, use phraseto_tsquery() to search for matching text phrases (in this example, "joke dad"):

    SELECT opener, response, ts_rank(joke, query) AS rank
    FROM dadjokes, to_tsvector(opener || response) joke, phraseto_tsquery('no more joking, dad') query
    WHERE query @@ joke
    ORDER BY rank DESC
    LIMIT 10;
    
      opener | response | rank
    ---------+----------+-------
    (0 rows)
    

Unsupported features

Some PostgreSQL syntax and features are unsupported. These include, but are not limited to:

  • Aspects of text search configurations other than the specified dictionary.
  • websearch_to_tsquery() built-in function.
  • tsquery_phrase() built-in function.
  • ts_rank_cd() built-in function.
  • setweight() built-in function.
  • Inverted joins on TSVECTOR values.
  • tsvector || tsvector comparisons.
  • tsquery || tsquery comparisons.
  • tsquery && tsquery comparisons.
  • tsquery <-> tsquery comparisons.
  • !! tsquery comparisons.
  • tsquery @> tsquery and tsquery <@ tsquery comparisons.

For full details, see the tracking issue.

See also


Yes No
On this page

Yes No