12.8. Debugging

The function ts_debug allows easy testing of a text search configuration.

   ts_debug([config_name], document TEXT) returns SETOF ts_debug
  

ts_debug displays information about every token of document as produced by the parser and processed by the dictionaries of the configuration specified by config_name. If config_name is omitted, the default text search configuration is used.
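The parsing stage can be pictured with a toy model. The Python sketch below is an illustration under stated assumptions, not PostgreSQL's parser. The hypothetical `parse` function recognizes only runs of ASCII letters (reported as `lword`, "Latin word") and runs of whitespace (`blank`, "Space symbols"). The real default parser distinguishes many more token types.

```python
import re
from typing import List, Tuple

# Hypothetical toy parser: NOT PostgreSQL's default parser.
# It knows only two token types, named after the aliases seen in ts_debug output.
# Characters matching neither pattern (e.g. punctuation) are silently skipped.
TOKEN_PATTERN = re.compile(r"(?P<lword>[A-Za-z]+)|(?P<blank>\s+)")

DESCRIPTIONS = {
    "lword": "Latin word",
    "blank": "Space symbols",
}


def parse(document: str) -> List[Tuple[str, str, str]]:
    """Split a document into (alias, description, token) triples."""
    rows = []
    for match in TOKEN_PATTERN.finditer(document):
        alias = match.lastgroup  # name of the alternative that matched
        rows.append((alias, DESCRIPTIONS[alias], match.group()))
    return rows


if __name__ == "__main__":
    for alias, description, token in parse("The Brightest supernovaes"):
        print(f"{alias} | {description} | {token!r}")
```

Applied to 'The Brightest supernovaes', the sketch yields five tokens (three lword, two blank), the same sequence of aliases that ts_debug reports for that document in the example of this section.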

ts_debug's result type is defined as:

CREATE TYPE ts_debug AS (
    "Alias" text,
    "Description" text,
    "Token" text,
    "Dictionaries" regdictionary[],
    "Lexized token" text
);

To demonstrate how ts_debug works, we first create a public.english configuration and an Ispell dictionary for the English language. You can skip this setup step and experiment with the standard english configuration instead.

CREATE TEXT SEARCH CONFIGURATION public.english ( COPY = pg_catalog.english );

CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = ispell,
    DictFile = english,
    AffFile = english,
    StopWords = english
);

ALTER TEXT SEARCH CONFIGURATION public.english
   ALTER MAPPING FOR lword WITH english_ispell, english_stem;
SELECT * FROM ts_debug('public.english','The Brightest supernovaes');
 Alias |  Description  |    Token    |                  Dictionaries                   |            Lexized token
-------+---------------+-------------+-------------------------------------------------+--------------------------------------
 lword | Latin word    | The         | {public.english_ispell,pg_catalog.english_stem} | public.english_ispell: {}
 blank | Space symbols |             |                                                 |
 lword | Latin word    | Brightest   | {public.english_ispell,pg_catalog.english_stem} | public.english_ispell: {bright}
 blank | Space symbols |             |                                                 |
 lword | Latin word    | supernovaes | {public.english_ispell,pg_catalog.english_stem} | pg_catalog.english_stem: {supernova}
(5 rows)

In this example, the word Brightest was recognized by the parser as a Latin word (alias lword) and was passed to the dictionaries public.english_ispell and pg_catalog.english_stem. It was recognized by public.english_ispell, which reduced it to its base form bright. The word supernovaes is unknown to the public.english_ispell dictionary, so it was passed to the next dictionary and, fortunately, was recognized there (in fact, pg_catalog.english_stem is a stemming dictionary and recognizes everything; that is why it was placed at the end of the dictionary list).

The word The was recognized by the public.english_ispell dictionary as a stop word (Section 12.4.1), which is shown as the empty array {}, and will not be indexed.
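The fall-through behavior described above can be modeled in a few lines of Python. This is a hedged sketch of the idea, not PostgreSQL's implementation. The hypothetical `toy_ispell` dictionary is a tiny hard-coded lookup table, and `toy_stem` is crude suffix stripping rather than the Snowball stemmer behind english_stem. Each dictionary returns `None` when it does not recognize a token, an empty list for a stop word, or a list of lexemes otherwise; the first dictionary in the list that returns something other than `None` wins.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A dictionary maps a token to None (unknown), [] (stop word) or a list of lexemes.
Dictionary = Callable[[str], Optional[List[str]]]

# Hypothetical Ispell-like dictionary: a tiny hard-coded table with one stop word.
_ISPELL_TABLE: Dict[str, List[str]] = {"the": [], "brightest": ["bright"]}


def toy_ispell(token: str) -> Optional[List[str]]:
    return _ISPELL_TABLE.get(token.lower())  # None when the word is unknown


def toy_stem(token: str) -> Optional[List[str]]:
    # Crude suffix stripping (assumption, not Snowball). Like a real stemmer it
    # accepts every token, so it must come last in the dictionary list.
    word = token.lower()
    for suffix in ("es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return [word[: -len(suffix)]]
    return [word]


# Ordered dictionary list per token type, analogous to
# ALTER MAPPING FOR lword WITH english_ispell, english_stem.
MAPPING: Dict[str, List[Tuple[str, Dictionary]]] = {
    "lword": [
        ("public.english_ispell", toy_ispell),
        ("pg_catalog.english_stem", toy_stem),
    ],
}


def lexize(alias: str, token: str) -> Optional[Tuple[str, List[str]]]:
    """Return (dictionary name, lexemes) from the first dictionary that recognizes the token."""
    for name, dictionary in MAPPING.get(alias, []):
        lexemes = dictionary(token)
        if lexemes is not None:
            return name, lexemes
    return None  # no mapping for this token type, or no dictionary recognized it


def format_lexized(result: Optional[Tuple[str, List[str]]]) -> str:
    """Render a result in the 'dictionary: {lexemes}' style of the "Lexized token" column."""
    if result is None:
        return ""
    name, lexemes = result
    return f"{name}: {{{','.join(lexemes)}}}"


if __name__ == "__main__":
    for word in ("The", "Brightest", "supernovaes"):
        print(word, "->", format_lexized(lexize("lword", word)))
```

Swapping the two entries in MAPPING so that toy_stem comes first would make toy_ispell unreachable (even The would be stemmed instead of discarded as a stop word), which is why a catch-all stemmer belongs at the end of the list.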

You can also select only the columns you are interested in; because the column names are mixed-case, they must be double-quoted:

SELECT "Alias", "Token", "Lexized token"
FROM ts_debug('public.english','The Brightest supernovaes');
 Alias |    Token    |            Lexized token
-------+-------------+--------------------------------------
 lword | The         | public.english_ispell: {}
 blank |             |
 lword | Brightest   | public.english_ispell: {bright}
 blank |             |
 lword | supernovaes | pg_catalog.english_stem: {supernova}
(5 rows)