Back to Polars

String

py-polars/docs/source/reference/sql/functions/string.rst

latest30.0 KB
Original Source

String

.. list-table:: :header-rows: 1 :widths: 20 60

    • Function
    • Description
    • :ref:BIT_LENGTH <bit_length>
    • Returns the length of the input string in bits.
    • :ref:CONCAT <concat>
    • Returns all input expressions concatenated together as a string.
    • :ref:CONCAT_WS <concat_ws>
    • Returns all input expressions concatenated together (and interleaved with a separator) as a string.
    • :ref:DATE <date>
    • Converts a formatted date string to an actual Date value.
    • :ref:ENDS_WITH <ends_with>
    • Returns True if the value ends with the second argument.
    • :ref:INITCAP <initcap>
    • Converts the first letter of each word to uppercase, and the rest to lowercase.
    • :ref:LEFT <left>
    • Returns the first (leftmost) n characters.
    • :ref:LENGTH <length>
    • Returns the character length of the string.
    • :ref:LOWER <lower>
    • Returns a lowercased column.
    • :ref:LPAD <lpad>
    • Pads a string on the left to a specified length using an optional fill character (default is a space).
    • :ref:LTRIM <ltrim>
    • Strips whitespaces from the left.
    • :ref:NORMALIZE <normalize>
    • Convert string to the specified Unicode normalization form (one of NFC, NFD, NFKC, NFKD).
    • :ref:OCTET_LENGTH <octet_length>
    • Returns the length of a given string in bytes.
    • :ref:POSITION <position>
    • Returns the position of a substring within a string.
    • :ref:REGEXP_LIKE <regexp_like>
    • Returns True if pattern matches the value (optional: flags).
    • :ref:REPLACE <replace>
    • Replaces a given substring with another string.
    • :ref:REVERSE <reverse>
    • Returns the reversed string.
    • :ref:RIGHT <right>
    • Returns the last (rightmost) n characters.
    • :ref:RPAD <rpad>
    • Pads a string on the right to a specified length using an optional fill character (default is a space).
    • :ref:RTRIM <rtrim>
    • Strips whitespaces from the right.
    • :ref:SPLIT_PART <split_part>
    • Splits a string by another substring/delimiter, returning the n-th part; note that n is 1-indexed.
    • :ref:STARTS_WITH <starts_with>
    • Returns True if the value starts with the second argument.
    • :ref:STRING_TO_ARRAY <string_to_array>
    • Splits a string by another substring/delimiter, returning an array of strings.
    • :ref:STRPOS <strpos>
    • Returns the index of the given substring in the target string; note that the result is 1-indexed (returning 0 indicates that the given string was not found).
    • :ref:STRPTIME <strptime>
    • Converts a string to a Datetime using a strftime-compatible formatting string.
    • :ref:SUBSTR <substr>, :ref:SUBSTRING <substring>
    • Returns a slice of the string data (from a start index, with an optional length); note that start is 1-indexed.
    • :ref:TIMESTAMP <timestamp>
    • Converts a formatted timestamp/datetime string to an actual Datetime value.
    • :ref:TRIM <trim>
    • Strips characters from the left and/or right of a string.
    • :ref:UPPER <upper>
    • Returns an uppercased column.

.. _bit_length:

BIT_LENGTH

Returns the length of the input string in bits.

Example:

.. code-block:: python

df = pl.DataFrame({"foo": ["a", "bb", "ccc", "dddd"]})
df.sql("""
  SELECT foo, BIT_LENGTH(foo) AS n_bits FROM self
""")
# shape: (4, 2)
# ┌──────┬────────┐
# │ foo  ┆ n_bits │
# │ ---  ┆ ---    │
# │ str  ┆ u32    │
# ╞══════╪════════╡
# │ a    ┆ 8      │
# │ bb   ┆ 16     │
# │ ccc  ┆ 24     │
# │ dddd ┆ 32     │
# └──────┴────────┘

.. _concat:

CONCAT

Returns all input expressions concatenated together as a string.

Example:

.. code-block:: python

df = pl.DataFrame(
  {
    "foo": ["aa", "b", "c", "dd"],
    "bar": ["zz", "yy", "xx", "ww"],
  }
)
df.sql("""
  SELECT CONCAT(foo, bar) AS foobar FROM self
""")
# shape: (4, 1)
# ┌────────┐
# │ foobar │
# │ ---    │
# │ str    │
# ╞════════╡
# │ aazz   │
# │ byy    │
# │ cxx    │
# │ ddww   │
# └────────┘

.. _concat_ws:

CONCAT_WS

Returns all input expressions concatenated together (and interleaved with a separator) as a string.

Example:

.. code-block:: python

df = pl.DataFrame(
  {
    "foo": ["aa", "b", "c", "dd"],
    "bar": ["zz", "yy", "xx", "ww"],
  }
)
df.sql("""
  SELECT CONCAT_WS(':', foo, bar) AS foobar FROM self
""")
# shape: (4, 1)
# ┌────────┐
# │ foobar │
# │ ---    │
# │ str    │
# ╞════════╡
# │ aa:zz  │
# │ b:yy   │
# │ c:xx   │
# │ dd:ww  │
# └────────┘

.. _date:

DATE

Converts a formatted string date to an actual Date type; ISO-8601 format is assumed unless a strftime-compatible formatting string is provided as the second parameter.

.. tip::

DATE is also supported as a typed literal (this form does not allow a format string).

.. code-block:: sql

SELECT DATE '2021-01-01' AS dt

Example:

.. code-block:: python

df = pl.DataFrame(
  {
    "s_dt1": ["1969-10-30", "2024-07-05", "2077-02-28"],
    "s_dt2": ["10 February 1920", "5 July 2077", "28 April 2000"],
  }
)
df.sql("""
  SELECT
    DATE(s_dt1) AS dt1,
    DATE(s_dt2, '%d %B %Y') AS dt2
  FROM self
""")
# shape: (3, 2)
# ┌────────────┬────────────┐
# │ dt1        ┆ dt2        │
# │ ---        ┆ ---        │
# │ date       ┆ date       │
# ╞════════════╪════════════╡
# │ 1969-10-30 ┆ 1920-02-10 │
# │ 2024-07-05 ┆ 2077-07-05 │
# │ 2077-02-28 ┆ 2000-04-28 │
# └────────────┴────────────┘

.. _ends_with:

ENDS_WITH

Returns True if the value ends with the second argument.

Example:

.. code-block:: python

df = pl.DataFrame(
  {
    "foo": ["aa", "bb", "cc", "dd"],
    "bar": ["zz", "yy", "xx", "ww"],
  }
)
df.sql("""
  SELECT foo, ENDS_WITH(foo, 'a') AS ends_a FROM self
""")
# shape: (4, 2)
# ┌─────┬────────┐
# │ foo ┆ ends_a │
# │ --- ┆ ---    │
# │ str ┆ bool   │
# ╞═════╪════════╡
# │ aa  ┆ true   │
# │ bb  ┆ false  │
# │ cc  ┆ false  │
# │ dd  ┆ false  │
# └─────┴────────┘

.. _initcap:

INITCAP

Converts the first letter of each word to uppercase, and the rest to lowercase.

Example:

.. code-block:: python

df = pl.DataFrame({"bar": ["hello world", "HELLO", "wOrLd"]})
df.sql("""
  SELECT bar, INITCAP(bar) AS baz FROM self
""")
# shape: (3, 2)
# ┌─────────────┬─────────────┐
# │ bar         ┆ baz         │
# │ ---         ┆ ---         │
# │ str         ┆ str         │
# ╞═════════════╪═════════════╡
# │ hello world ┆ Hello World │
# │ HELLO       ┆ Hello       │
# │ wOrLd       ┆ World       │
# └─────────────┴─────────────┘

.. _left:

LEFT

Returns the first (leftmost) n characters.

Example:

.. code-block:: python

df = pl.DataFrame(
  {
    "foo": ["abcd", "efgh", "ijkl", "mnop"],
    "bar": ["zz", "yy", "xx", "ww"],
  }
)
df.sql("""
  SELECT
    LEFT(foo, 1) AS foo1,
    LEFT(bar, 2) AS bar2
  FROM self
""")

# shape: (4, 2)
# ┌──────┬──────┐
# │ foo1 ┆ bar2 │
# │ ---  ┆ ---  │
# │ str  ┆ str  │
# ╞══════╪══════╡
# │ a    ┆ zz   │
# │ e    ┆ yy   │
# │ i    ┆ xx   │
# │ m    ┆ ww   │
# └──────┴──────┘

.. _length:

LENGTH

Returns the character length of the string.

.. admonition:: Aliases

CHAR_LENGTH, CHARACTER_LENGTH

Example:

.. code-block:: python

df = pl.DataFrame(
  {
    "iso_lang":["de", "ru", "es"],
    "color": ["weiß", "синий", "amarillo"],
  }
)
df.sql("""
  SELECT
    iso_lang,
    color,
    LENGTH(color) AS n_chars,
    OCTET_LENGTH(color) AS n_bytes
  FROM self
""")

# shape: (3, 4)
# ┌──────────┬──────────┬─────────┬─────────┐
# │ iso_lang ┆ color    ┆ n_chars ┆ n_bytes │
# │ ---      ┆ ---      ┆ ---     ┆ ---     │
# │ str      ┆ str      ┆ u32     ┆ u32     │
# ╞══════════╪══════════╪═════════╪═════════╡
# │ de       ┆ weiß     ┆ 4       ┆ 5       │
# │ ru       ┆ синий    ┆ 5       ┆ 10      │
# │ es       ┆ amarillo ┆ 8       ┆ 8       │
# └──────────┴──────────┴─────────┴─────────┘

.. _lower:

LOWER

Returns a lowercased column.

Example:

.. code-block:: python

df = pl.DataFrame({"foo": ["AA", "BB", "CC", "DD"]})
df.sql("""
  SELECT foo, LOWER(foo) AS foo_lower FROM self
""")
# shape: (4, 2)
# ┌─────┬───────────┐
# │ foo ┆ foo_lower │
# │ --- ┆ ---       │
# │ str ┆ str       │
# ╞═════╪═══════════╡
# │ AA  ┆ aa        │
# │ BB  ┆ bb        │
# │ CC  ┆ cc        │
# │ DD  ┆ dd        │
# └─────┴───────────┘

.. _lpad:

LPAD

Pads a string on the left to a specified length using an optional fill character (default is a space). If the string is longer than the target length it is truncated.

Example:

.. code-block:: python

df = pl.DataFrame({"foo": ["hello", "hi", "a", None, "longstr"]})
df.sql("""
  SELECT
    foo,
    LPAD(foo, 7) AS lpad,
    LPAD(foo, 7, 'x') AS lpad_x,
    LPAD(foo, 4, '#') AS lpad_trunc
  FROM self
""")
# shape: (5, 4)
# ┌─────────┬─────────┬─────────┬────────────┐
# │ foo     ┆ lpad    ┆ lpad_x  ┆ lpad_trunc │
# │ ---     ┆ ---     ┆ ---     ┆ ---        │
# │ str     ┆ str     ┆ str     ┆ str        │
# ╞═════════╪═════════╪═════════╪════════════╡
# │ hello   ┆   hello ┆ xxhello ┆ hell       │
# │ hi      ┆      hi ┆ xxxxxhi ┆ ##hi       │
# │ a       ┆       a ┆ xxxxxxa ┆ ###a       │
# │ null    ┆ null    ┆ null    ┆ null       │
# │ longstr ┆ longstr ┆ longstr ┆ long       │
# └─────────┴─────────┴─────────┴────────────┘

.. _ltrim:

LTRIM

Strips whitespaces from the left.

Example:

.. code-block:: python

df = pl.DataFrame({"foo": ["   AA", " BB", "CC", "  DD"]})
df.sql("""
  SELECT foo, LTRIM(foo) AS trimmed FROM self
""")
# shape: (4, 2)
# ┌───────┬─────────┐
# │ foo   ┆ trimmed │
# │ ---   ┆ ---     │
# │ str   ┆ str     │
# ╞═══════╪═════════╡
# │    AA ┆ AA      │
# │  BB   ┆ BB      │
# │ CC    ┆ CC      │
# │   DD  ┆ DD      │
# └───────┴─────────┘

.. _normalize:

NORMALIZE

Convert string to the specified Unicode normalization form (one of NFC, NFD, NFKC, NFKD). If the normalization form is not provided, NFC is used by default.

Example:

.. code-block:: python

df = pl.DataFrame({
    "txt": [
        "Test",
        "Ⓣⓔⓢⓣ",
        "𝕿𝖊𝖘𝖙",
        "𝕋𝕖𝕤𝕥",
        "𝗧𝗲𝘀𝘁",
    ],
})
df.sql("""
  SELECT NORMALIZE(txt, NFKC) FROM self
""").to_series()
# shape: (5,)
# Series: 'txt' [str]
# [
#   "Test"
#   "Test"
#   "Test"
#   "Test"
#   "Test"
# ]

.. _octet_length:

OCTET_LENGTH

Returns the length of a given string in bytes.

Example:

.. code-block:: python

df = pl.DataFrame(
  {
    "iso_lang":["de", "ru", "es"],
    "color": ["weiß", "синий", "amarillo"],
  }
)
df.sql("""
  SELECT
    iso_lang,
    color,
    OCTET_LENGTH(color) AS n_bytes,
    LENGTH(color) AS n_chars
  FROM self
""")
# shape: (3, 4)
# ┌──────────┬──────────┬─────────┬─────────┐
# │ iso_lang ┆ color    ┆ n_bytes ┆ n_chars │
# │ ---      ┆ ---      ┆ ---     ┆ ---     │
# │ str      ┆ str      ┆ u32     ┆ u32     │
# ╞══════════╪══════════╪═════════╪═════════╡
# │ de       ┆ weiß     ┆ 5       ┆ 4       │
# │ ru       ┆ синий    ┆ 10      ┆ 5       │
# │ es       ┆ amarillo ┆ 8       ┆ 8       │
# └──────────┴──────────┴─────────┴─────────┘

.. _position:

POSITION

Returns the position of a substring within a string. The result is 1-indexed, returning 0 if the substring is not found.

.. seealso::

:ref:STRPOS <strpos> for the equivalent function-call syntax.

Example:

.. code-block:: python

df = pl.DataFrame({"foo": ["apple", "banana", "orange", "grape"]})
df.sql("""
  SELECT foo, POSITION('an' IN foo) AS pos FROM self
""")
# shape: (4, 2)
# ┌────────┬─────┐
# │ foo    ┆ pos │
# │ ---    ┆ --- │
# │ str    ┆ u32 │
# ╞════════╪═════╡
# │ apple  ┆ 0   │
# │ banana ┆ 2   │
# │ orange ┆ 3   │
# │ grape  ┆ 0   │
# └────────┴─────┘

.. _regexp_like:

REGEXP_LIKE

Returns True if pattern matches the value (optional: flags).

Example:

.. code-block:: python

df = pl.DataFrame({"foo": ["abc123", "4ab4a", "abc456", "321cba"]})
df.sql(r"""
  SELECT foo, REGEXP_LIKE(foo, '\d$') AS ends_in_digit FROM self
""")
# shape: (4, 2)
# ┌────────┬───────────────┐
# │ foo    ┆ ends_in_digit │
# │ ---    ┆ ---           │
# │ str    ┆ bool          │
# ╞════════╪═══════════════╡
# │ abc123 ┆ true          │
# │ 4ab4a  ┆ false         │
# │ abc456 ┆ true          │
# │ 321cba ┆ false         │
# └────────┴───────────────┘

.. _replace:

REPLACE

Replaces a given substring with another string.

Example:

.. code-block:: python

df = pl.DataFrame({"foo": ["abc123", "11aabb", "bcbc45"]})
df.sql("""
  SELECT foo, REPLACE(foo, 'b', '?') AS bar FROM self
""")
# shape: (3, 2)
# ┌────────┬────────┐
# │ foo    ┆ bar    │
# │ ---    ┆ ---    │
# │ str    ┆ str    │
# ╞════════╪════════╡
# │ abc123 ┆ a?c123 │
# │ 11aabb ┆ 11aa?? │
# │ bcbc45 ┆ ?c?c45 │
# └────────┴────────┘

.. _reverse:

REVERSE

Returns the reversed string.

Example:

.. code-block:: python

df = pl.DataFrame({"foo": ["apple", "banana", "orange", "grape"]})
df.sql("""
  SELECT foo, REVERSE(foo) AS oof FROM self
""")
# shape: (4, 2)
# ┌────────┬────────┐
# │ foo    ┆ oof    │
# │ ---    ┆ ---    │
# │ str    ┆ str    │
# ╞════════╪════════╡
# │ apple  ┆ elppa  │
# │ banana ┆ ananab │
# │ orange ┆ egnaro │
# │ grape  ┆ eparg  │
# └────────┴────────┘

.. _right:

Returns the last (rightmost) n characters.

Example:

.. code-block:: python

df = pl.DataFrame({"foo": ["ab", "cde", "fghi", "jklmn"]})
df.sql("""
  SELECT foo, RIGHT(foo, 2) AS bar FROM self
""")
# shape: (4, 2)
# ┌───────┬─────┐
# │ foo   ┆ bar │
# │ ---   ┆ --- │
# │ str   ┆ str │
# ╞═══════╪═════╡
# │ ab    ┆ ab  │
# │ cde   ┆ de  │
# │ fghi  ┆ hi  │
# │ jklmn ┆ mn  │
# └───────┴─────┘

.. _rpad:

RPAD

Pads a string on the right to a specified length using an optional fill character (default is a space). If the string is longer than the target length it is truncated.

Example:

.. code-block:: python

df = pl.DataFrame({"foo": ["hello", "hi", "a", None, "longstr"]})
df.sql("""
  SELECT
    foo,
    RPAD(foo, 7) AS rpad,
    RPAD(foo, 7, '-') AS rpad_dash,
    RPAD(foo, 4, '#') AS rpad_trunc
  FROM self
""")
# shape: (5, 4)
# ┌─────────┬─────────┬───────────┬────────────┐
# │ foo     ┆ rpad    ┆ rpad_dash ┆ rpad_trunc │
# │ ---     ┆ ---     ┆ ---       ┆ ---        │
# │ str     ┆ str     ┆ str       ┆ str        │
# ╞═════════╪═════════╪═══════════╪════════════╡
# │ hello   ┆ hello   ┆ hello--   ┆ hell       │
# │ hi      ┆ hi      ┆ hi-----   ┆ hi##       │
# │ a       ┆ a       ┆ a------   ┆ a###       │
# │ null    ┆ null    ┆ null      ┆ null       │
# │ longstr ┆ longstr ┆ longstr   ┆ long       │
# └─────────┴─────────┴───────────┴────────────┘

.. _rtrim:

RTRIM

Strips whitespaces from the right.

Example:

.. code-block:: python

df = pl.DataFrame({"bar": ["zz    ", "yy  ", "xx ", "ww   "]})
df.sql("""
  SELECT bar, RTRIM(bar) AS baz FROM self
""")
# shape: (4, 2)
# ┌────────┬─────┐
# │ bar    ┆ baz │
# │ ---    ┆ --- │
# │ str    ┆ str │
# ╞════════╪═════╡
# │ zz     ┆ zz  │
# │ yy     ┆ yy  │
# │ xx     ┆ xx  │
# │ ww     ┆ ww  │
# └────────┴─────┘

.. _split_part:

SPLIT_PART

Splits a string by another substring/delimiter, returning the n-th part; note that n is 1-indexed.

Example:

.. code-block:: python

df = pl.DataFrame({"s": ["xx,yy,zz", "abc,,xyz,???,hmm", None, ""]})
df.sql("""
  SELECT
    s,
    SPLIT_PART(s,',',1) AS "s+1",
    SPLIT_PART(s,',',3) AS "s+3",
    SPLIT_PART(s,',',-2) AS "s-2",
  FROM self
""")
# shape: (4, 4)
# ┌──────────────────┬──────┬──────┬──────┐
# │ s                ┆ s+1  ┆ s+3  ┆ s-2  │
# │ ---              ┆ ---  ┆ ---  ┆ ---  │
# │ str              ┆ str  ┆ str  ┆ str  │
# ╞══════════════════╪══════╪══════╪══════╡
# │ xx,yy,zz         ┆ xx   ┆ zz   ┆ yy   │
# │ abc,,xyz,???,hmm ┆ abc  ┆ xyz  ┆ ???  │
# │ null             ┆ null ┆ null ┆ null │
# │                  ┆      ┆      ┆      │
# └──────────────────┴──────┴──────┴──────┘

.. _starts_with:

STARTS_WITH

Returns True if the value starts with the second argument.

Example:

.. code-block:: python

df = pl.DataFrame({"foo": ["apple", "banana", "avocado", "grape"]})
df.sql("""
  SELECT foo, STARTS_WITH(foo, 'a') AS starts_a FROM self
""")
# shape: (4, 2)
# ┌─────────┬──────────┐
# │ foo     ┆ starts_a │
# │ ---     ┆ ---      │
# │ str     ┆ bool     │
# ╞═════════╪══════════╡
# │ apple   ┆ true     │
# │ banana  ┆ false    │
# │ avocado ┆ true     │
# │ grape   ┆ false    │
# └─────────┴──────────┘

.. _string_to_array:

STRING_TO_ARRAY

Splits a string by another substring/delimiter, returning an array of strings.

Example:

.. code-block:: python

df = pl.DataFrame({"foo": ["aa,bb,cc", "x,y"]})
df.sql("""
  SELECT foo, STRING_TO_ARRAY(foo, ',') AS arr FROM self
""")
# shape: (2, 2)
# ┌──────────┬────────────────────┐
# │ foo      ┆ arr                │
# │ ---      ┆ ---                │
# │ str      ┆ list[str]          │
# ╞══════════╪════════════════════╡
# │ aa,bb,cc ┆ ["aa", "bb", "cc"] │
# │ x,y      ┆ ["x", "y"]         │
# └──────────┴────────────────────┘

.. _strpos:

STRPOS

Returns the index of the given substring in the target string.

Example:

.. code-block:: python

df = pl.DataFrame({"foo": ["apple", "banana", "orange", "grape"]})
df.sql("""
  SELECT foo, STRPOS(foo, 'a') AS pos_a FROM self
""")
# shape: (4, 2)
# ┌────────┬───────┐
# │ foo    ┆ pos_a │
# │ ---    ┆ ---   │
# │ str    ┆ u32   │
# ╞════════╪═══════╡
# │ apple  ┆ 1     │
# │ banana ┆ 2     │
# │ orange ┆ 3     │
# │ grape  ┆ 3     │
# └────────┴───────┘

.. _strptime:

STRPTIME

Converts a string to a Datetime using a chrono strftime <https://docs.rs/chrono/latest/chrono/format/strftime/>_-compatible formatting string.

Example:

.. code-block:: python

df = pl.DataFrame(
  {
    "s_dt": ["1969 Oct 30", "2024 Jul 05", "2077 Feb 28"],
    "s_tm": ["00.30.55", "12.40.15", "10.45.00"],
  }
)
df.sql("""
  SELECT
    s_dt,
    s_tm,
    STRPTIME(s_dt || ' ' || s_tm, '%Y %b %d %H.%M.%S') AS dtm
  FROM self
""")
# shape: (3, 3)
# ┌─────────────┬──────────┬─────────────────────┐
# │ s_dt        ┆ s_tm     ┆ dtm                 │
# │ ---         ┆ ---      ┆ ---                 │
# │ str         ┆ str      ┆ datetime[μs]        │
# ╞═════════════╪══════════╪═════════════════════╡
# │ 1969 Oct 30 ┆ 00.30.55 ┆ 1969-10-30 00:30:55 │
# │ 2024 Jul 05 ┆ 12.40.15 ┆ 2024-07-05 12:40:15 │
# │ 2077 Feb 28 ┆ 10.45.00 ┆ 2077-02-28 10:45:00 │
# └─────────────┴──────────┴─────────────────────┘

.. _substr:

SUBSTR

Returns a slice of the string data (from a start index, with an optional length); note that start is 1-indexed.

.. seealso::

:ref:SUBSTRING <substring> supports the additional form (FROM ... FOR ...).

Example:

.. code-block:: python

df = pl.DataFrame({"foo": ["apple", "banana", "orange", "grape"]})
df.sql("""
  SELECT foo, SUBSTR(foo, 3, 4) AS foo_3_4 FROM self
""")
# shape: (4, 2)
# ┌────────┬─────────┐
# │ foo    ┆ foo_3_4 │
# │ ---    ┆ ---     │
# │ str    ┆ str     │
# ╞════════╪═════════╡
# │ apple  ┆ ple     │
# │ banana ┆ nana    │
# │ orange ┆ ange    │
# │ grape  ┆ ape     │
# └────────┴─────────┘

.. _substring:

SUBSTRING

Returns a slice of the string data (from a start index, with an optional length); note that start is 1-indexed.

.. seealso::

:ref:SUBSTR <substr> for the simpler function-call syntax.

Supports multiple forms:

  • SUBSTRING(str, start)
  • SUBSTRING(str, start, length)
  • SUBSTRING(str FROM start)
  • SUBSTRING(str FROM start FOR length)

Note that start is 1-indexed.

Example:

.. code-block:: python

df = pl.DataFrame({"foo": ["apple", "banana", "orange", "grape"]})
df.sql("""
  SELECT foo, SUBSTRING(foo FROM 2 FOR 3) AS foo_2_3 FROM self
""")
# shape: (4, 2)
# ┌────────┬─────────┐
# │ foo    ┆ foo_2_3 │
# │ ---    ┆ ---     │
# │ str    ┆ str     │
# ╞════════╪═════════╡
# │ apple  ┆ ppl     │
# │ banana ┆ ana     │
# │ orange ┆ ran     │
# │ grape  ┆ rap     │
# └────────┴─────────┘

.. _timestamp:

TIMESTAMP

Converts a formatted string date to an actual Datetime type; ISO-8601 format is assumed unless a strftime-compatible formatting string is provided as the second parameter.

.. admonition:: Aliases

DATETIME

.. tip::

TIMESTAMP is also supported as a typed literal (this form does not allow a format string).

.. code-block:: sql

SELECT TIMESTAMP '2077-12-10 22:30:45' AS ts

Example:

.. code-block:: python

df = pl.DataFrame(
  {
    "str_timestamp": [
      "1969 July 30, 00:30:55",
      "2030-October-08, 12:40:15",
      "2077 February 28, 10:45:00",
    ]
  }
)
df.sql("""
  SELECT str_timestamp, TIMESTAMP(str_date, '%Y.%m.%d') AS date FROM self
""")
# shape: (3, 2)
# ┌────────────┬────────────┐
# │ str_date   ┆ date       │
# │ ---        ┆ ---        │
# │ str        ┆ date       │
# ╞════════════╪════════════╡
# │ 1969.10.30 ┆ 1969-10-30 │
# │ 2024.07.05 ┆ 2024-07-05 │
# │ 2077.02.28 ┆ 2077-02-28 │
# └────────────┴────────────┘

.. _trim:

TRIM

Strips characters from the left and/or right of a string.

Syntax:

  • TRIM(expr) - strip whitespace from both sides.
  • TRIM(BOTH char FROM expr) - strip char from both sides.
  • TRIM(LEADING char FROM expr) - strip char from the left.
  • TRIM(TRAILING char FROM expr) - strip char from the right.

Example:

.. code-block:: python

df = pl.DataFrame({"foo": ["  hello  ", "  world  ", "  test  "]})
df.sql("""
  SELECT
    foo,
    TRIM(foo) AS trimmed,
    TRIM(LEADING ' ' FROM foo) AS ltrimmed,
    TRIM(TRAILING ' ' FROM foo) AS rtrimmed
  FROM self
""")
# shape: (3, 4)
# ┌───────────┬─────────┬──────────┬──────────┐
# │ foo       ┆ trimmed ┆ ltrimmed ┆ rtrimmed │
# │ ---       ┆ ---     ┆ ---      ┆ ---      │
# │ str       ┆ str     ┆ str      ┆ str      │
# ╞═══════════╪═════════╪══════════╪══════════╡
# │   hello   ┆ hello   ┆ hello    ┆   hello  │
# │   world   ┆ world   ┆ world    ┆   world  │
# │   test    ┆ test    ┆ test     ┆   test   │
# └───────────┴─────────┴──────────┴──────────┘

.. _upper:

UPPER

Returns an uppercased column.

Example:

.. code-block:: python

df = pl.DataFrame({"foo": ["apple", "banana", "orange", "grape"]})
df.sql("""
  SELECT foo, UPPER(foo) AS foo_upper FROM self
""")
# shape: (4, 2)
# ┌────────┬───────────┐
# │ foo    ┆ foo_upper │
# │ ---    ┆ ---       │
# │ str    ┆ str       │
# ╞════════╪═══════════╡
# │ apple  ┆ APPLE     │
# │ banana ┆ BANANA    │
# │ orange ┆ ORANGE    │
# │ grape  ┆ GRAPE     │
# └────────┴───────────┘