12.2. Tables and Indexes

The previous section described how to perform full text searches using constant strings. This section shows how to search table data, optionally using indexes.

12.2.1. Searching a Table

It is possible to do a full text search of a table without an index. A simple query to print the title of each row that contains the word friend in its body field is:

SELECT title
FROM pgweb
WHERE to_tsvector('english', body) @@ to_tsquery('friend');

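The query above uses the english configuration to parse and normalize body. Because to_tsquery is called without a configuration name, the query string is processed with the configuration set by default_text_search_config. A more complex query is to select the ten most recent documents that contain create and table in the title or body: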

SELECT title
FROM pgweb
WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('create & table')
ORDER BY dlm DESC LIMIT 10;

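dlm is the last-modified date, so we used ORDER BY dlm DESC LIMIT 10 to get the ten most recent matches. For clarity we omitted the coalesce function, which would be needed to find rows that contain NULL in one of the two fields: concatenating NULL with any string yields NULL, so such rows would never match.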
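A NULL-safe form of the same query might look like the following sketch. It assumes, as above, that title and body are text columns of pgweb. The explicit ' ' separator keeps the last word of the title from running into the first word of the body.

```sql
-- Sketch: coalesce() turns a NULL title or body into an empty string,
-- so the concatenation is never NULL and the row can still match.
SELECT title
FROM pgweb
WHERE to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
      @@ to_tsquery('english', 'create & table')
ORDER BY dlm DESC
LIMIT 10;
```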

12.2.2. Creating Indexes

We can create a GIN (Section 12.5) index to speed up the search:

CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector('english', body));

Notice that the two-argument version of to_tsvector is used. Only text search functions that specify a configuration name can be used in expression indexes (Section 11.7). This is because the index contents must be unaffected by default_text_search_config. If they were affected, the index contents might be inconsistent because different entries could contain tsvectors that were created with different text search configurations, and there would be no way to guess which was which. It would be impossible to dump and restore such an index correctly.

Because the two-argument version of to_tsvector was used in the index above, only a query reference that uses the two-argument version of to_tsvector with the same configuration name will use that index. That is, WHERE 'a & b' @@ to_tsvector('english', body) will use the index, but WHERE 'a & b' @@ to_tsvector(body) and WHERE 'a & b' @@ body::tsvector will not. This guarantees that an index will be used only with the same configuration used to create the index rows.
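One way to check which form a query must take is to look at its plan with EXPLAIN. The sketch below assumes the pgweb_idx index on to_tsvector('english', body) created above. Plan output is omitted because it depends on table size and statistics, and the planner may still prefer a sequential scan on a small table.

```sql
-- Matches the indexed expression exactly, so the planner can consider pgweb_idx.
EXPLAIN
SELECT title
FROM pgweb
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'create & table');

-- Relies on default_text_search_config, so it does not match the indexed
-- expression and cannot use pgweb_idx.
EXPLAIN
SELECT title
FROM pgweb
WHERE to_tsvector(body) @@ to_tsquery('english', 'create & table');
```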

It is possible to set up more complex expression indexes where the configuration name is specified by another column, e.g.:

CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector(config_name, body));

where config_name is a column in the pgweb table. This allows mixed configurations in the same index while recording which configuration was used for each index row.
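As an illustration, a query can use such an index by repeating the same expression. This sketch assumes config_name is declared with type regconfig, which the text above does not state explicitly.

```sql
-- Assumption: pgweb.config_name has type regconfig and names the
-- configuration appropriate to each row's body.
-- The WHERE clause repeats the indexed expression to_tsvector(config_name, body).
SELECT title
FROM pgweb
WHERE to_tsvector(config_name, body) @@ to_tsquery('english', 'create & table');
```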

Indexes can even concatenate columns:

CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector('english', title || ' ' || body));
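Because a query must repeat the indexed expression exactly, a NULL-safe variant needs coalesce in both the index definition and the query. A minimal sketch, with the index name pgweb_title_body_idx chosen here purely for illustration:

```sql
-- Hypothetical index name; the expression guards against NULL in either column.
CREATE INDEX pgweb_title_body_idx ON pgweb
    USING gin(to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, '')));

-- The WHERE clause uses the identical expression so the index is applicable.
SELECT title
FROM pgweb
WHERE to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
      @@ to_tsquery('english', 'create & table');
```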

A more complex case is to create a separate tsvector column to hold the output of to_tsvector. This example stores the concatenation of title and body together with ranking information. We assign a different weight label to each (A for title, D for body) to record where each word came from:

ALTER TABLE pgweb ADD COLUMN textsearch_index tsvector;
UPDATE pgweb SET textsearch_index =
     setweight(to_tsvector('english', coalesce(title,'')), 'A') ||
     setweight(to_tsvector('english', coalesce(body,'')), 'D');

Then we create a GIN index to speed up the search:

CREATE INDEX textsearch_idx ON pgweb USING gin(textsearch_index);

After vacuuming, we are ready to perform a fast full text search:

SELECT ts_rank_cd(textsearch_index, q) AS rank, title
FROM pgweb, to_tsquery('create & table') q
WHERE q @@ textsearch_index
ORDER BY rank DESC LIMIT 10;

It is necessary to create a trigger to keep the new tsvector column current whenever title or body changes. Keep in mind that, just as with expression indexes, it is important to specify the configuration name when creating tsvector values inside triggers, so that the column's contents are not affected by changes to default_text_search_config.
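A minimal sketch of such a trigger follows. The function name pgweb_textsearch_update and the trigger name pgweb_textsearch_trg are hypothetical. The weights mirror the UPDATE statement shown earlier.

```sql
-- Hypothetical trigger function: recomputes the tsvector column for each
-- inserted or updated row.
CREATE FUNCTION pgweb_textsearch_update() RETURNS trigger AS $$
BEGIN
    -- Name the configuration explicitly so the stored tsvector does not
    -- depend on default_text_search_config.
    NEW.textsearch_index :=
        setweight(to_tsvector('pg_catalog.english', coalesce(NEW.title, '')), 'A') ||
        setweight(to_tsvector('pg_catalog.english', coalesce(NEW.body, '')), 'D');
    RETURN NEW;
END
$$ LANGUAGE plpgsql;

-- Hypothetical trigger name; fires before the row is written so the
-- recomputed value is what gets stored.
CREATE TRIGGER pgweb_textsearch_trg
    BEFORE INSERT OR UPDATE ON pgweb
    FOR EACH ROW EXECUTE PROCEDURE pgweb_textsearch_update();
```

When no weighting is needed, the built-in tsvector_update_trigger function may be enough and avoids writing a custom function.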