PostgreSQL 8.3beta1 Documentation
Chapter 12. Full Text Search
The ts_debug function allows easy testing of your full text search configuration.
ts_debug([config_name], document TEXT) returns SETOF ts_debug
ts_debug displays information about every token of document as produced by the parser and processed by the configured dictionaries, using the configuration specified by config_name.
ts_debug's result type is defined as:
CREATE TYPE ts_debug AS (
    "Alias" text,
    "Description" text,
    "Token" text,
    "Dictionaries" regdictionary[],
    "Lexized token" text
);
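The synopsis marks config_name as optional, so the function can also be called with the document alone. The following is a minimal sketch, assuming that the one-argument form falls back to the configuration named by the default_text_search_config setting; check the setting first so you know which configuration is being tested.

```sql
-- Show which configuration the session treats as its default.
SHOW default_text_search_config;

-- Assumption: with config_name omitted, ts_debug uses that default
-- configuration to parse and process the document.
SELECT * FROM ts_debug('The Brightest supernovaes');
```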
To demonstrate how ts_debug works, we first create a public.english configuration and an ispell dictionary for the English language. You can skip this step and experiment with the standard english configuration instead.
CREATE TEXT SEARCH CONFIGURATION public.english ( COPY = pg_catalog.english );

CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = ispell,
    DictFile = english,
    AffFile = english,
    StopWords = english
);

ALTER TEXT SEARCH CONFIGURATION public.english
    ALTER MAPPING FOR lword WITH english_ispell, english_stem;
SELECT * FROM ts_debug('public.english','The Brightest supernovaes');
 Alias | Description   | Token       | Dictionaries                                    | Lexized token
-------+---------------+-------------+-------------------------------------------------+--------------------------------------
 lword | Latin word    | The         | {public.english_ispell,pg_catalog.english_stem} | public.english_ispell: {}
 blank | Space symbols |             |                                                 |
 lword | Latin word    | Brightest   | {public.english_ispell,pg_catalog.english_stem} | public.english_ispell: {bright}
 blank | Space symbols |             |                                                 |
 lword | Latin word    | supernovaes | {public.english_ispell,pg_catalog.english_stem} | pg_catalog.english_stem: {supernova}
(5 rows)
In this example, the word Brightest was recognized by the parser as a Latin word (alias lword) and passed through the dictionaries public.english_ispell and pg_catalog.english_stem. It was recognized by public.english_ispell, which reduced it to its base form bright. The word supernovaes is unknown to the public.english_ispell dictionary, so it was passed to the next dictionary, pg_catalog.english_stem, which recognized it and reduced it to supernova. In fact, pg_catalog.english_stem is a stemming dictionary and recognizes everything, which is why it is placed at the end of the dictionary stack.
The word The was recognized by the public.english_ispell dictionary as a stop word (Section 12.4.1), as the empty lexeme list {} shows, and will not be indexed.
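One way to confirm what actually gets indexed is to run the same text through to_tsvector with the same configuration. This is a sketch, assuming the public.english configuration created above is in place; the stop word The should be absent from the resulting vector, while the lexemes reported for Brightest and supernovaes should appear with their word positions.

```sql
-- Build the tsvector that would be stored or indexed for this document.
-- Stop words produce no lexeme, so they do not appear in the result.
SELECT to_tsvector('public.english', 'The Brightest supernovaes');
```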
You can always explicitly specify which columns you want to see:
SELECT "Alias", "Token", "Lexized token"
FROM ts_debug('public.english','The Brightest supernovaes');
 Alias | Token       | Lexized token
-------+-------------+--------------------------------------
 lword | The         | public.english_ispell: {}
 blank |             |
 lword | Brightest   | public.english_ispell: {bright}
 blank |             |
 lword | supernovaes | pg_catalog.english_stem: {supernova}
(5 rows)
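Since ts_debug returns an ordinary set of rows, the output can be filtered as well as projected. As a small sketch using only the column names from the result type above, the whitespace tokens can be dropped with a WHERE clause, which for this document leaves only the lword rows:

```sql
-- Column names are mixed case, so they must stay double-quoted.
SELECT "Token", "Lexized token"
FROM ts_debug('public.english', 'The Brightest supernovaes')
WHERE "Alias" <> 'blank';  -- skip the space tokens between words
```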