normalize function

normalize converts a string to a specified Unicode normalization form.

Signatures

{{% include-example file="examples/normalize" example="syntax" %}}

Parameter | Type | Description
----------|------|------------
str | string | The string to normalize.
form | keyword | The Unicode normalization form: NFC, NFD, NFKC, or NFKD (unquoted, case-insensitive keywords). Defaults to NFC.

Return value

normalize returns a string.

Details

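Unicode normalization converts text that can be written with different sequences of code points into a single standard form. For example, é can be stored either as the precomposed character U+00E9 or as the letter e followed by the combining acute accent U+0301. The two look identical but compare unequal until both are normalized to the same form. This makes normalization useful when comparing, deduplicating, or joining on strings that may have been composed differently.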
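
To make the comparison problem concrete, here is a minimal sketch outside the database. It uses Python's standard-library `unicodedata.normalize`, which implements the normalization forms defined in Unicode Standard Annex #15. This is an illustration only, not Materialize code, and results may differ for rare characters if the two use different Unicode versions.

```python
import unicodedata

# Two spellings of "café": one with the precomposed é (U+00E9), one with
# a plain "e" followed by the combining acute accent (U+0301).
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# The raw strings differ code point by code point, so they compare unequal.
print(composed == decomposed)          # False
print(len(composed), len(decomposed))  # 4 5

# After normalizing both to the same form (NFC here), they compare equal.
nfc_a = unicodedata.normalize("NFC", composed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)                  # True
```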

The four normalization forms are:

  • NFC (Normalization Form Canonical Composition): Canonical decomposition, followed by canonical composition. This is the default and most commonly used form.
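  • NFD (Normalization Form Canonical Decomposition): Canonical decomposition only. Precomposed characters are split into a base character followed by combining marks (for example, é becomes e plus U+0301) and are not recomposed.
  • NFKC (Normalization Form Compatibility Composition): Compatibility decomposition, followed by canonical composition. Compatibility decomposition also replaces compatibility variants, such as the ligature ﬁ or the superscript ², with their plain equivalents (fi, 2). Because of this, NFKC can change how text displays, not just how it is encoded.
  • NFKD (Normalization Form Compatibility Decomposition): Compatibility decomposition only, without recomposition.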
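
The differences between the four forms are easiest to see on one input. The sketch below uses Python's `unicodedata.normalize` as a stand-in for the SQL function. This is an assumption for illustration, not Materialize's implementation. It applies each form to a string containing a precomposed é, the ﬁ ligature, and a superscript ², and prints the resulting code points.

```python
import unicodedata

# Sample input mixing three kinds of characters:
#   U+00E9  é   precomposed; canonically equivalent to e + U+0301
#   U+FB01  ﬁ   ligature; compatibility variant of "fi"
#   U+00B2  ²   superscript two; compatibility variant of "2"
sample = "\u00e9 \ufb01 x\u00b2"

results = {
    form: unicodedata.normalize(form, sample)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

# Print each result as code points so that decomposition is visible.
for form, text in results.items():
    print(form, " ".join(f"U+{ord(ch):04X}" for ch in text))

# Expected output:
# NFC U+00E9 U+0020 U+FB01 U+0020 U+0078 U+00B2
# NFD U+0065 U+0301 U+0020 U+FB01 U+0020 U+0078 U+00B2
# NFKC U+00E9 U+0020 U+0066 U+0069 U+0020 U+0078 U+0032
# NFKD U+0065 U+0301 U+0020 U+0066 U+0069 U+0020 U+0078 U+0032
```

The canonical forms (NFC, NFD) only change how é is represented and leave the ligature and superscript alone. The compatibility forms (NFKC, NFKD) also fold ﬁ to fi and ² to 2.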

For more information, see:

Examples

{{% include-example file="examples/normalize" example="normalize-default" %}}

<hr/>

{{% include-example file="examples/normalize" example="normalize-nfc" %}}

<hr/>

{{% include-example file="examples/normalize" example="normalize-nfd" %}}

<hr/>

{{% include-example file="examples/normalize" example="normalize-nfkc-ligatures" %}}

<hr/>

{{% include-example file="examples/normalize" example="normalize-nfkc-superscript" %}}