This section describes functions and operators for examining and manipulating string values. Strings in this context include values of all the types CHARACTER, CHARACTER VARYING, and TEXT. Unless otherwise noted, all of the functions listed below work on all of these types, but be wary of potential effects of the automatic padding when using the CHARACTER type. Generally, the functions described here also work on data of non-string types by converting that data to a string representation first. Some functions also exist natively for bit-string types.
SQL defines some string functions with a special syntax where certain keywords rather than commas are used to separate the arguments. Details are in Table 6-6. These functions are also implemented using the regular syntax for function invocation. (See Table 6-8.)
Table 6-6. SQL String Functions and Operators
Function | Return Type | Description | Example | Result |
---|---|---|---|---|
string || string | text | string concatenation | 'Postgre' || 'SQL' | PostgreSQL |
bit_length(string) | integer | number of bits in string | bit_length('jose') | 32 |
char_length(string) or character_length(string) | integer | number of characters in string | char_length('jose') | 4 |
convert(string using conversion_name) | text | Change encoding using specified conversion name. Conversions can be defined by CREATE CONVERSION. Also there are some pre-defined conversion names. See Table 6-7 for available conversion names. | convert('PostgreSQL' using iso8859_1_to_utf8) | 'PostgreSQL' in UNICODE(UTF-8) encoding |
lower(string) | text | Convert string to lower case. | lower('TOM') | tom |
octet_length(string) | integer | number of bytes in string | octet_length('jose') | 4 |
overlay(string placing string from integer [for integer]) | text | insert substring | overlay('Txxxxas' placing 'hom' from 2 for 4) | Thomas |
position(substring in string) | integer | location of specified substring | position('om' in 'Thomas') | 3 |
substring(string [from integer] [for integer]) | text | extract substring | substring('Thomas' from 2 for 3) | hom |
substring(string [from pattern] [for escape]) | text | extract regular expression | substring('Thomas' from 'mas$' for [escape '\\']) | mas |
trim([leading | trailing | both] [characters] from string) | text | Removes the longest string containing only the characters (a space by default) from the beginning/end/both ends of the string. | trim(both 'x' from 'xTomxx') | Tom |
upper(string) | text | Convert string to upper case. | upper('tom') | TOM |
Table 6-7. Available conversion names
conversion | source encoding | destination encoding |
---|---|---|
ascii_to_mic | SQL_ASCII | MULE_INTERNAL |
ascii_to_utf_8 | SQL_ASCII | UNICODE |
big5_to_euc_tw | BIG5 | EUC_TW |
big5_to_mic | BIG5 | MULE_INTERNAL |
big5_to_utf_8 | BIG5 | UNICODE |
euc_cn_to_mic | EUC_CN | MULE_INTERNAL |
euc_cn_to_utf_8 | EUC_CN | UNICODE |
euc_jp_to_mic | EUC_JP | MULE_INTERNAL |
euc_jp_to_sjis | EUC_JP | SJIS |
euc_jp_to_utf_8 | EUC_JP | UNICODE |
euc_kr_to_mic | EUC_KR | MULE_INTERNAL |
euc_kr_to_utf_8 | EUC_KR | UNICODE |
euc_tw_to_big5 | EUC_TW | BIG5 |
euc_tw_to_mic | EUC_TW | MULE_INTERNAL |
euc_tw_to_utf_8 | EUC_TW | UNICODE |
gb18030_to_utf_8 | GB18030 | UNICODE |
gbk_to_utf_8 | GBK | UNICODE |
iso_8859_10_to_utf_8 | LATIN6 | UNICODE |
iso_8859_13_to_utf_8 | LATIN7 | UNICODE |
iso_8859_14_to_utf_8 | LATIN8 | UNICODE |
iso_8859_15_to_utf_8 | LATIN9 | UNICODE |
iso_8859_16_to_utf_8 | LATIN10 | UNICODE |
iso_8859_1_to_mic | LATIN1 | MULE_INTERNAL |
iso_8859_1_to_utf_8 | LATIN1 | UNICODE |
iso_8859_2_to_mic | LATIN2 | MULE_INTERNAL |
iso_8859_2_to_utf_8 | LATIN2 | UNICODE |
iso_8859_2_to_win1250 | LATIN2 | WIN1250 |
iso_8859_3_to_mic | LATIN3 | MULE_INTERNAL |
iso_8859_3_to_utf_8 | LATIN3 | UNICODE |
iso_8859_4_to_mic | LATIN4 | MULE_INTERNAL |
iso_8859_4_to_utf_8 | LATIN4 | UNICODE |
iso_8859_5_to_koi8r | ISO_8859_5 | KOI8 |
iso_8859_5_to_mic | ISO_8859_5 | MULE_INTERNAL |
iso_8859_5_to_utf_8 | ISO_8859_5 | UNICODE |
iso_8859_5_to_win1251 | ISO_8859_5 | WIN |
iso_8859_5_to_win866 | ISO_8859_5 | ALT |
iso_8859_6_to_utf_8 | ISO_8859_6 | UNICODE |
iso_8859_7_to_utf_8 | ISO_8859_7 | UNICODE |
iso_8859_8_to_utf_8 | ISO_8859_8 | UNICODE |
iso_8859_9_to_utf_8 | LATIN5 | UNICODE |
johab_to_utf_8 | JOHAB | UNICODE |
koi8r_to_iso_8859_5 | KOI8 | ISO_8859_5 |
koi8r_to_mic | KOI8 | MULE_INTERNAL |
koi8r_to_utf_8 | KOI8 | UNICODE |
koi8r_to_win1251 | KOI8 | WIN |
koi8r_to_win866 | KOI8 | ALT |
mic_to_ascii | MULE_INTERNAL | SQL_ASCII |
mic_to_big5 | MULE_INTERNAL | BIG5 |
mic_to_euc_cn | MULE_INTERNAL | EUC_CN |
mic_to_euc_jp | MULE_INTERNAL | EUC_JP |
mic_to_euc_kr | MULE_INTERNAL | EUC_KR |
mic_to_euc_tw | MULE_INTERNAL | EUC_TW |
mic_to_iso_8859_1 | MULE_INTERNAL | LATIN1 |
mic_to_iso_8859_2 | MULE_INTERNAL | LATIN2 |
mic_to_iso_8859_3 | MULE_INTERNAL | LATIN3 |
mic_to_iso_8859_4 | MULE_INTERNAL | LATIN4 |
mic_to_iso_8859_5 | MULE_INTERNAL | ISO_8859_5 |
mic_to_koi8r | MULE_INTERNAL | KOI8 |
mic_to_sjis | MULE_INTERNAL | SJIS |
mic_to_win1250 | MULE_INTERNAL | WIN1250 |
mic_to_win1251 | MULE_INTERNAL | WIN |
mic_to_win866 | MULE_INTERNAL | ALT |
sjis_to_euc_jp | SJIS | EUC_JP |
sjis_to_mic | SJIS | MULE_INTERNAL |
sjis_to_utf_8 | SJIS | UNICODE |
tcvn_to_utf_8 | TCVN | UNICODE |
uhc_to_utf_8 | UHC | UNICODE |
utf_8_to_ascii | UNICODE | SQL_ASCII |
utf_8_to_big5 | UNICODE | BIG5 |
utf_8_to_euc_cn | UNICODE | EUC_CN |
utf_8_to_euc_jp | UNICODE | EUC_JP |
utf_8_to_euc_kr | UNICODE | EUC_KR |
utf_8_to_euc_tw | UNICODE | EUC_TW |
utf_8_to_gb18030 | UNICODE | GB18030 |
utf_8_to_gbk | UNICODE | GBK |
utf_8_to_iso_8859_1 | UNICODE | LATIN1 |
utf_8_to_iso_8859_10 | UNICODE | LATIN6 |
utf_8_to_iso_8859_13 | UNICODE | LATIN7 |
utf_8_to_iso_8859_14 | UNICODE | LATIN8 |
utf_8_to_iso_8859_15 | UNICODE | LATIN9 |
utf_8_to_iso_8859_16 | UNICODE | LATIN10 |
utf_8_to_iso_8859_2 | UNICODE | LATIN2 |
utf_8_to_iso_8859_3 | UNICODE | LATIN3 |
utf_8_to_iso_8859_4 | UNICODE | LATIN4 |
utf_8_to_iso_8859_5 | UNICODE | ISO_8859_5 |
utf_8_to_iso_8859_6 | UNICODE | ISO_8859_6 |
utf_8_to_iso_8859_7 | UNICODE | ISO_8859_7 |
utf_8_to_iso_8859_8 | UNICODE | ISO_8859_8 |
utf_8_to_iso_8859_9 | UNICODE | LATIN5 |
utf_8_to_johab | UNICODE | JOHAB |
utf_8_to_koi8r | UNICODE | KOI8 |
utf_8_to_sjis | UNICODE | SJIS |
utf_8_to_tcvn | UNICODE | TCVN |
utf_8_to_uhc | UNICODE | UHC |
utf_8_to_win1250 | UNICODE | WIN1250 |
utf_8_to_win1251 | UNICODE | WIN |
utf_8_to_win1256 | UNICODE | WIN1256 |
utf_8_to_win866 | UNICODE | ALT |
utf_8_to_win874 | UNICODE | WIN874 |
win1250_to_iso_8859_2 | WIN1250 | LATIN2 |
win1250_to_mic | WIN1250 | MULE_INTERNAL |
win1250_to_utf_8 | WIN1250 | UNICODE |
win1251_to_iso_8859_5 | WIN | ISO_8859_5 |
win1251_to_koi8r | WIN | KOI8 |
win1251_to_mic | WIN | MULE_INTERNAL |
win1251_to_utf_8 | WIN | UNICODE |
win1251_to_win866 | WIN | ALT |
win1256_to_utf_8 | WIN1256 | UNICODE |
win866_to_iso_8859_5 | ALT | ISO_8859_5 |
win866_to_koi8r | ALT | KOI8 |
win866_to_mic | ALT | MULE_INTERNAL |
win866_to_utf_8 | ALT | UNICODE |
win866_to_win1251 | ALT | WIN |
win874_to_utf_8 | WIN874 | UNICODE |
Additional string manipulation functions are available and are listed below. Some of them are used internally to implement the SQL-standard string functions listed above.
Table 6-8. Other String Functions
The to_ascii function supports conversion from LATIN1, LATIN2, WIN1250 (CP1250) only.