presto-docs/src/main/sphinx/functions/string.rst
The || operator performs concatenation.
For plugin-loaded string functions, see :ref:functions/plugin-loaded-functions:string functions.
.. note::
These functions assume that the input strings contain valid UTF-8 encoded
Unicode code points. There are no explicit checks for valid UTF-8 and
the functions may return incorrect results on invalid UTF-8.
Invalid UTF-8 data can be corrected with :func:`!from_utf8`.
Additionally, the functions operate on Unicode code points and not user
visible *characters* (or *grapheme clusters*). Some languages combine
multiple code points into a single user-perceived *character*, the basic
unit of a writing system for a language, but the functions will treat each
code point as a separate unit.
The :func:`!lower` and :func:`!upper` functions do not perform
locale-sensitive, context-sensitive, or one-to-many mappings required for
some languages. Specifically, this will return incorrect results for
Lithuanian, Turkish and Azeri.
.. function:: bit_length(string) -> boolean
Returns the count of bits for the given ``string``.
.. function:: chr(n) -> varchar
Returns the Unicode code point ``n`` as a single character string.
.. function:: codepoint(string) -> integer
Returns the Unicode code point of the only character of ``string``.
.. function:: concat(string1, ..., stringN) -> varchar
Returns the concatenation of ``string1``, ``string2``, ``...``, ``stringN``.
This function provides the same functionality as the
SQL-standard concatenation operator (``||``).
.. function:: ends_with(string, substring) -> boolean
Returns whether ``string`` ends_with with ``substring``.
.. function:: hamming_distance(string1, string2) -> bigint
Returns the Hamming distance of ``string1`` and ``string2``,
i.e. the number of positions at which the corresponding characters are different.
Note that the two strings must have the same length.
.. function:: jarowinkler_similarity(string1, string2) -> double
Returns the Jaro-Winkler similarity of ``string1`` and ``string2``.
.. function:: length(string) -> bigint
Returns the length of ``string`` in characters.
.. function:: levenshtein_distance(string1, string2) -> bigint
Returns the Levenshtein edit distance of ``string1`` and ``string2``,
i.e. the minimum number of single-character edits (insertions,
deletions or substitutions) needed to change ``string1`` into ``string2``.
.. function:: longest_common_prefix(string1, string2) -> varchar
Returns the longest common prefix between ``string1`` and ``string2``
.. function:: lower(string) -> varchar
Converts ``string`` to lowercase.
.. function:: lpad(string, size, padstring) -> varchar
Left pads ``string`` to ``size`` characters with ``padstring``.
If ``size`` is less than the length of ``string``, the result is
truncated to ``size`` characters. ``size`` must not be negative
and ``padstring`` must be non-empty.
.. function:: ltrim(string) -> varchar
Removes leading whitespace from ``string``. See :func:`!trim` for the set of
recognized whitespace characters.
.. function:: ltrim(string, chars) -> varchar :noindex:
Removes the longest substring containing only characters in ``chars`` from the beginning of the ``string``. ::
SELECT ltrim('test', 't'); -- est
SELECT ltrim('tetris', 'te'); -- ris
.. function:: replace(string, search) -> varchar
Removes all instances of ``search`` from ``string``.
.. function:: replace(string, search, replace) -> varchar
Replaces all instances of ``search`` with ``replace`` in ``string``.
If ``search`` is an empty string, inserts ``replace`` in front of every
character and at the end of the ``string``.
.. function:: reverse(string) -> varchar
Returns ``string`` with the characters in reverse order.
.. function:: rpad(string, size, padstring) -> varchar
Right pads ``string`` to ``size`` characters with ``padstring``.
If ``size`` is less than the length of ``string``, the result is
truncated to ``size`` characters. ``size`` must not be negative
and ``padstring`` must be non-empty.
.. function:: rtrim(string) -> varchar
Removes trailing whitespace from ``string``. See :func:`!trim` for the set of
recognized whitespace characters.
.. function:: rtrim(string, chars) -> varchar :noindex:
Removes the longest substring containing only characters in ``chars`` from the end of the ``string``. ::
SELECT rtrim('test', 't'); -- tes
SELECT rtrim('test...', '.'); -- test
.. function:: split(string, delimiter) -> array(varchar)
Splits ``string`` on ``delimiter`` and returns an array.
.. function:: split(string, delimiter, limit) -> array(varchar)
Splits ``string`` on ``delimiter`` and returns an array of size at most
``limit``. The last element in the array always contain everything
left in the ``string``. ``limit`` must be a positive number.
.. function:: split_part(string, delimiter, index) -> varchar
Splits ``string`` on ``delimiter`` and returns the field ``index``.
Field indexes start with ``1``. If the index is larger than than
the number of fields, then null is returned.
.. function:: split_to_map(string, entryDelimiter, keyValueDelimiter) -> map<varchar, varchar>
Splits ``string`` by ``entryDelimiter`` and ``keyValueDelimiter`` and returns a map.
``entryDelimiter`` splits ``string`` into key-value pairs. ``keyValueDelimiter`` splits
each pair into key and value. Note that ``entryDelimiter`` and ``keyValueDelimiter`` are
interpreted literally, i.e., as full string matches.
.. function:: split_to_map(string, entryDelimiter, keyValueDelimiter, function(K,V1,V2,R)) -> map<varchar, varchar>
Splits ``string`` by ``entryDelimiter`` and ``keyValueDelimiter`` and returns a map.
``entryDelimiter`` splits ``string`` into key-value pairs. ``keyValueDelimiter`` splits
each pair into key and value. Note that ``entryDelimiter`` and ``keyValueDelimiter`` are
interpreted literally, i.e., as full string matches. ``function(K,V1,V2,R)``
is invoked in cases of duplicate keys to resolve the value that should be in the map. ::
SELECT(split_to_map('a:1;b:2;a:3', ';', ':', (k, v1, v2) -> v1)); -- {"a": "1", "b": "2"}
SELECT(split_to_map('a:1;b:2;a:3', ';', ':', (k, v1, v2) -> CONCAT(v1, v2))); -- {"a": "13", "b": "2"}
.. function:: split_to_multimap(string, entryDelimiter, keyValueDelimiter) -> map(varchar, array(varchar))
Splits ``string`` by ``entryDelimiter`` and ``keyValueDelimiter`` and returns a map
containing an array of values for each unique key. ``entryDelimiter`` splits ``string``
into key-value pairs. ``keyValueDelimiter`` splits each pair into key and value. The
values for each key will be in the same order as they appeared in ``string``.
Note that ``entryDelimiter`` and ``keyValueDelimiter`` are interpreted literally,
i.e., as full string matches.
.. function:: strpos(string, substring) -> bigint
Returns the starting position of the first instance of ``substring`` in
``string``. Positions start with ``1``. If not found, ``0`` is returned.
.. function:: starts_with(string, substring) -> boolean
Returns whether ``string`` starts with ``substring``.
.. function:: strpos(string, substring, instance) -> bigint
Returns the position of the N-th ``instance`` of ``substring`` in ``string``.
``instance`` must be a positive number.
Positions start with ``1``. If not found, ``0`` is returned.
.. function:: strrpos(string, substring) -> bigint
Returns the starting position of the last instance of ``substring`` in ``string``.
Positions start with ``1``. If not found, ``0`` is returned.
.. function:: strrpos(string, substring, instance) -> bigint
Returns the position of the N-th ``instance`` of ``substring`` in ``string`` starting from the end of the string.
``instance`` must be a positive number.
Positions start with ``1``. If not found, ``0`` is returned.
.. function:: position(substring IN string) -> bigint
Returns the starting position of the first instance of ``substring`` in
``string``. Positions start with ``1``. If not found, ``0`` is returned.
.. function:: substr(string, start) -> varchar
Returns the rest of ``string`` from the starting position ``start``.
Positions start with ``1``. A negative starting position is interpreted
as being relative to the end of the string.
.. function:: substr(string, start, length) -> varchar
Returns a substring from ``string`` of length ``length`` from the starting
position ``start``. Positions start with ``1``. A negative starting
position is interpreted as being relative to the end of the string.
.. function:: trim(string) -> varchar
Removes leading and trailing whitespace from ``string``.
Recognized whitespace characters:
====== =========================== ====== ===========================
Code Description Code Description
====== =========================== ====== ===========================
9 TAB (horizontal tab) U+1680 Ogham Space Mark
10 LF (NL line feed, new line) U+2000 En Quad
11 VT (vertical tab) U+2001 Em Quad
12 FF (NP form feed, new page) U+2002 En Space
13 CR (carriage return) U+2003 Em Space
28 FS (file separator) U+2004 Three-Per-Em Space
29 GS (group separator) U+2005 Four-Per-Em Space
30 RS (record separator) U+2006 Four-Per-Em Space
31 US (unit separator) U+2008 Punctuation Space
32 Space U+2009 Thin Space
_ _ U+200a Hair Space
_ _ U+200a Hair Space
_ _ U+2028 Line Separator
_ _ U+2029 Paragraph Separator
_ _ U+205f Medium Mathematical Space
_ _ U+3000 Ideographic Space
====== =========================== ====== ===========================
.. function:: trim(string, chars) -> varchar :noindex:
Removes the longest substring containing only characters in ``chars`` from the beginning and end of the ``string``. ::
SELECT trim('test', 't'); -- es
SELECT trim('.t.e.s.t.', '.t'); -- e.s
.. function:: upper(string) -> varchar
Converts ``string`` to uppercase.
.. function:: word_stem(word) -> varchar
Returns the stem of ``word`` in the English language.
.. function:: word_stem(word, lang) -> varchar
Returns the stem of ``word`` in the ``lang`` language.
.. function:: normalize(string) -> varchar
Transforms ``string`` with NFC normalization form.
.. function:: normalize(string, form) -> varchar
Transforms ``string`` with the specified normalization form.
``form`` must be be one of the following keywords:
======== ===========
Form Description
======== ===========
``NFD`` Canonical Decomposition
``NFC`` Canonical Decomposition, followed by Canonical Composition
``NFKD`` Compatibility Decomposition
``NFKC`` Compatibility Decomposition, followed by Canonical Composition
======== ===========
.. note::
This SQL-standard function has special syntax and requires
specifying ``form`` as a keyword, not as a string.
.. function:: to_utf8(string) -> varbinary
Encodes ``string`` into a UTF-8 varbinary representation.
.. function:: from_utf8(binary) -> varchar
Decodes a UTF-8 encoded string from ``binary``. Invalid UTF-8 sequences
are replaced with the Unicode replacement character ``U+FFFD``.
.. function:: from_utf8(binary, replace) -> varchar
Decodes a UTF-8 encoded string from ``binary``. Invalid UTF-8 sequences
are replaced with `replace`. The replacement string `replace` must either
be a single character or empty (in which case invalid characters are
removed).