Literal Normalized

<Note> For patch versions later than `0.20.8` in the `0.20` series, and later than `0.21.4` in the `0.21` series, fields using the [literal normalized](/documentation/tokenizers/available-tokenizers/literal-normalized) tokenizer are also columnar indexed, so they can be used in [aggregates](/documentation/aggregates/overview) and [Top K queries](/documentation/sorting/topk). Indexes created before these versions must be reindexed to use this feature. </Note>
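
As a sketch of the reindexing step mentioned in the note, an index created before those versions could be rebuilt with PostgreSQL's standard `REINDEX` command. The index name `search_idx` matches the example further down this page and is only a placeholder for your own index name:

```sql
-- Rebuild an existing BM25 index from scratch using the currently installed
-- version of the extension. `search_idx` is a placeholder index name.
REINDEX INDEX search_idx;
```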

The literal normalized tokenizer is similar to the literal tokenizer in that it does not split the source text. All text is treated as a single token, regardless of how many words it contains.

However, unlike the literal tokenizer, this tokenizer allows token filters to be applied. By default, the literal normalized tokenizer also lowercases the text.

```sql
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.literal_normalized))
WITH (key_field='id');
```
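
Because this tokenizer accepts token filters, the same index definition can also carry filter settings. The sketch below assumes ParadeDB's `'name=value'` filter argument syntax and an `ascii_folding` filter. Both come from the general token filter documentation rather than from this page, so verify them against your installed version:

```sql
-- Remove the index from the previous example so the name can be reused.
DROP INDEX IF EXISTS search_idx;

-- Assumption: filters are passed to the tokenizer as 'name=value' strings.
-- Assumption: ascii_folding replaces accented characters with ASCII equivalents.
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.literal_normalized('ascii_folding=true')))
WITH (key_field='id');
```

With this configuration the whole `description` value would still be indexed as one token. The filter only changes how that single token is normalized.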

To get a feel for this tokenizer, run the following command and replace the text with your own:

```sql
SELECT 'Tokenize me!'::pdb.literal_normalized::text[];
```

```ini
       text
------------------
 {"tokenize me!"}
(1 row)
```
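
To see the default lowercasing in isolation, the same cast can be run with lowercasing switched off. This is a sketch that assumes this tokenizer accepts a `lowercase=false` filter argument. If it does, the single token should keep its original casing:

```sql
-- Assumption: 'lowercase=false' disables the default lowercase filter.
-- The text should still come back as a single, unsplit token.
SELECT 'Tokenize me!'::pdb.literal_normalized('lowercase=false')::text[];
```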