Function Description
ascii(str) Returns the numeric value of the first character of `str`.
base64(bin) Converts the argument from a binary `bin` to a base 64 string.
bit_length(expr) Returns the bit length of string data or number of bits of binary data.
btrim(str) Removes the leading and trailing space characters from `str`.
btrim(str, trimStr) Remove the leading and trailing `trimStr` characters from `str`.
char(expr) Returns the ASCII character having the binary equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n % 256)
char_length(expr) Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.
character_length(expr) Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.
chr(expr) Returns the ASCII character having the binary equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n % 256)
collate(expr, collationName) Marks a given expression with the specified collation.
collation(expr) Returns the collation name of a given expression.
concat_ws(sep[, str | array(str)]+) Returns the concatenation of the strings separated by `sep`, skipping null values.
contains(left, right) Returns a boolean. The value is True if right is found inside left. Returns NULL if either input expression is NULL. Otherwise, returns False. Both left or right must be of STRING or BINARY type.
decode(bin, charset) Decodes the first argument using the second argument character set. If either argument is null, the result will also be null.
decode(expr, search, result [, search, result ] ... [, default]) Compares expr to each search value in order. If expr is equal to a search value, decode returns the corresponding result. If no match is found, then it returns default. If default is omitted, it returns null.
elt(n, input1, input2, ...) Returns the `n`-th input, e.g., returns `input2` when `n` is 2. The function returns NULL if the index exceeds the length of the array and `spark.sql.ansi.enabled` is set to false. If `spark.sql.ansi.enabled` is set to true, it throws ArrayIndexOutOfBoundsException for invalid indices.
encode(str, charset) Encodes the first argument using the second argument character set. If either argument is null, the result will also be null.
endswith(left, right) Returns a boolean. The value is True if left ends with right. Returns NULL if either input expression is NULL. Otherwise, returns False. Both left or right must be of STRING or BINARY type.
find_in_set(str, str_array) Returns the index (1-based) of the given string (`str`) in the comma-delimited list (`str_array`). Returns 0, if the string was not found or if the given string (`str`) contains a comma.
format_number(expr1, expr2) Formats the number `expr1` like '#,###,###.##', rounded to `expr2` decimal places. If `expr2` is 0, the result has no decimal point or fractional part. `expr2` also accept a user specified format. This is supposed to function like MySQL's FORMAT.
format_string(strfmt, obj, ...) Returns a formatted string from printf-style format strings.
initcap(str) Returns `str` with the first letter of each word in uppercase. All other letters are in lowercase. Words are delimited by white space.
instr(str, substr) Returns the (1-based) index of the first occurrence of `substr` in `str`.
is_valid_utf8(str) Returns true if `str` is a valid UTF-8 string, otherwise returns false.
lcase(str) Returns `str` with all characters changed to lowercase.
left(str, len) Returns the leftmost `len`(`len` can be string type) characters from the string `str`,if `len` is less or equal than 0 the result is an empty string.
len(expr) Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.
length(expr) Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.
levenshtein(str1, str2[, threshold]) Returns the Levenshtein distance between the two given strings. If threshold is set and distance more than it, return -1.
locate(substr, str[, pos]) Returns the position of the first occurrence of `substr` in `str` after position `pos`. The given `pos` and return value are 1-based.
lower(str) Returns `str` with all characters changed to lowercase.
lpad(str, len[, pad]) Returns `str`, left-padded with `pad` to a length of `len`. If `str` is longer than `len`, the return value is shortened to `len` characters or bytes. If `pad` is not specified, `str` will be padded to the left with space characters if it is a character string, and with zeros if it is a byte sequence.
ltrim(str) Removes the leading space characters from `str`.
luhn_check(str ) Checks that a string of digits is valid according to the Luhn algorithm. This checksum function is widely applied on credit card numbers and government identification numbers to distinguish valid numbers from mistyped, incorrect numbers.
make_valid_utf8(str) Returns the original string if `str` is a valid UTF-8 string, otherwise returns a new string whose invalid UTF8 byte sequences are replaced using the UNICODE replacement character U+FFFD.
mask(input[, upperChar, lowerChar, digitChar, otherChar]) masks the given string value. The function replaces characters with 'X' or 'x', and numbers with 'n'. This can be useful for creating copies of tables with sensitive information removed.
octet_length(expr) Returns the byte length of string data or number of bytes of binary data.
overlay(input, replace, pos[, len]) Replace `input` with `replace` that starts at `pos` and is of length `len`.
position(substr, str[, pos]) Returns the position of the first occurrence of `substr` in `str` after position `pos`. The given `pos` and return value are 1-based.
printf(strfmt, obj, ...) Returns a formatted string from printf-style format strings.
regexp_count(str, regexp) Returns a count of the number of times that the regular expression pattern `regexp` is matched in the string `str`.
regexp_extract(str, regexp[, idx]) Extract the first string in the `str` that match the `regexp` expression and corresponding to the regex group index.
regexp_extract_all(str, regexp[, idx]) Extract all strings in the `str` that match the `regexp` expression and corresponding to the regex group index.
regexp_instr(str, regexp) Searches a string for a regular expression and returns an integer that indicates the beginning position of the matched substring. Positions are 1-based, not 0-based. If no match is found, returns 0.
regexp_replace(str, regexp, rep[, position]) Replaces all substrings of `str` that match `regexp` with `rep`.
regexp_substr(str, regexp) Returns the substring that matches the regular expression `regexp` within the string `str`. If the regular expression is not found, the result is null.
repeat(str, n) Returns the string which repeats the given string value n times.
replace(str, search[, replace]) Replaces all occurrences of `search` with `replace`.
right(str, len) Returns the rightmost `len`(`len` can be string type) characters from the string `str`,if `len` is less or equal than 0 the result is an empty string.
rpad(str, len[, pad]) Returns `str`, right-padded with `pad` to a length of `len`. If `str` is longer than `len`, the return value is shortened to `len` characters. If `pad` is not specified, `str` will be padded to the right with space characters if it is a character string, and with zeros if it is a binary string.
rtrim(str) Removes the trailing space characters from `str`.
sentences(str[, lang[, country]]) Splits `str` into an array of array of words.
soundex(str) Returns Soundex code of the string.
space(n) Returns a string consisting of `n` spaces.
split(str, regex, limit) Splits `str` around occurrences that match `regex` and returns an array with a length of at most `limit`
split_part(str, delimiter, partNum) Splits `str` by delimiter and return requested part of the split (1-based). If any input is null, returns null. if `partNum` is out of range of split parts, returns empty string. If `partNum` is 0, throws an error. If `partNum` is negative, the parts are counted backward from the end of the string. If the `delimiter` is an empty string, the `str` is not split.
startswith(left, right) Returns a boolean. The value is True if left starts with right. Returns NULL if either input expression is NULL. Otherwise, returns False. Both left or right must be of STRING or BINARY type.
substr(str, pos[, len]) Returns the substring of `str` that starts at `pos` and is of length `len`, or the slice of byte array that starts at `pos` and is of length `len`.
substr(str FROM pos[ FOR len]]) Returns the substring of `str` that starts at `pos` and is of length `len`, or the slice of byte array that starts at `pos` and is of length `len`.
substring(str, pos[, len]) Returns the substring of `str` that starts at `pos` and is of length `len`, or the slice of byte array that starts at `pos` and is of length `len`.
substring(str FROM pos[ FOR len]]) Returns the substring of `str` that starts at `pos` and is of length `len`, or the slice of byte array that starts at `pos` and is of length `len`.
substring_index(str, delim, count) Returns the substring from `str` before `count` occurrences of the delimiter `delim`. If `count` is positive, everything to the left of the final delimiter (counting from the left) is returned. If `count` is negative, everything to the right of the final delimiter (counting from the right) is returned. The function substring_index performs a case-sensitive match when searching for `delim`.
to_binary(str[, fmt]) Converts the input `str` to a binary value based on the supplied `fmt`. `fmt` can be a case-insensitive string literal of "hex", "utf-8", "utf8", or "base64". By default, the binary format for conversion is "hex" if `fmt` is omitted. The function returns NULL if at least one of the input parameters is NULL.
to_char(expr, format) Convert `expr` to a string based on the `format`. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive: '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input value, generating a result string of the same length as the corresponding sequence in the format string. The result string is left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of the decimal value, starts with 0, and is before the decimal point. Otherwise, it is padded with spaces. '.' or 'D': Specifies the position of the decimal point (optional, only allowed once). ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator. '$': Specifies the location of the $ currency sign. This character may only be specified once. 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' prints '+' for positive values but 'MI' prints a space. 'PR': Only allowed at the end of the format string; specifies that the result string will be wrapped by angle brackets if the input value is negative. ('<1>'). If `expr` is a datetime, `format` shall be a valid datetime pattern, see Datetime Patterns. If `expr` is a binary, it is converted to a string in one of the formats: 'base64': a base 64 string. 'hex': a string in the hexadecimal format. 'utf-8': the input binary is decoded to UTF-8 string.
to_number(expr, fmt) Convert string 'expr' to a number based on the string format 'fmt'. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive: '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input string. If the 0/9 sequence starts with 0 and is before the decimal point, it can only match a digit sequence of the same size. Otherwise, if the sequence starts with 9 or is after the decimal point, it can match a digit sequence that has the same or smaller size. '.' or 'D': Specifies the position of the decimal point (optional, only allowed once). ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator. 'expr' must match the grouping separator relevant for the size of the number. '$': Specifies the location of the $ currency sign. This character may only be specified once. 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' allows '-' but 'MI' does not. 'PR': Only allowed at the end of the format string; specifies that 'expr' indicates a negative number with wrapping angled brackets. ('<1>').
to_varchar(expr, format) Convert `expr` to a string based on the `format`. Throws an exception if the conversion fails. The format can consist of the following characters, case insensitive: '0' or '9': Specifies an expected digit between 0 and 9. A sequence of 0 or 9 in the format string matches a sequence of digits in the input value, generating a result string of the same length as the corresponding sequence in the format string. The result string is left-padded with zeros if the 0/9 sequence comprises more digits than the matching part of the decimal value, starts with 0, and is before the decimal point. Otherwise, it is padded with spaces. '.' or 'D': Specifies the position of the decimal point (optional, only allowed once). ',' or 'G': Specifies the position of the grouping (thousands) separator (,). There must be a 0 or 9 to the left and right of each grouping separator. '$': Specifies the location of the $ currency sign. This character may only be specified once. 'S' or 'MI': Specifies the position of a '-' or '+' sign (optional, only allowed once at the beginning or end of the format string). Note that 'S' prints '+' for positive values but 'MI' prints a space. 'PR': Only allowed at the end of the format string; specifies that the result string will be wrapped by angle brackets if the input value is negative. ('<1>'). If `expr` is a datetime, `format` shall be a valid datetime pattern, see Datetime Patterns. If `expr` is a binary, it is converted to a string in one of the formats: 'base64': a base 64 string. 'hex': a string in the hexadecimal format. 'utf-8': the input binary is decoded to UTF-8 string.
translate(input, from, to) Translates the `input` string by replacing the characters present in the `from` string with the corresponding characters in the `to` string.
trim(str) Removes the leading and trailing space characters from `str`.
trim(BOTH FROM str) Removes the leading and trailing space characters from `str`.
trim(LEADING FROM str) Removes the leading space characters from `str`.
trim(TRAILING FROM str) Removes the trailing space characters from `str`.
trim(trimStr FROM str) Remove the leading and trailing `trimStr` characters from `str`.
trim(BOTH trimStr FROM str) Remove the leading and trailing `trimStr` characters from `str`.
trim(LEADING trimStr FROM str) Remove the leading `trimStr` characters from `str`.
trim(TRAILING trimStr FROM str) Remove the trailing `trimStr` characters from `str`.
try_to_binary(str[, fmt]) This is a special version of `to_binary` that performs the same operation, but returns a NULL value instead of raising an error if the conversion cannot be performed.
try_to_number(expr, fmt) Convert string 'expr' to a number based on the string format `fmt`. Returns NULL if the string 'expr' does not match the expected format. The format follows the same semantics as the to_number function.
try_validate_utf8(str) Returns the original string if `str` is a valid UTF-8 string, otherwise returns NULL.
ucase(str) Returns `str` with all characters changed to uppercase.
unbase64(str) Converts the argument from a base 64 string `str` to a binary.
upper(str) Returns `str` with all characters changed to uppercase.
validate_utf8(str) Returns the original string if `str` is a valid UTF-8 string, otherwise throws an exception.