Simple

The simple tokenizer splits text on any character that is not alphanumeric (e.g., whitespace, punctuation, and symbols). By default, all resulting tokens are lowercased.

```sql
CREATE INDEX search_idx ON mock_items
USING bm25 (id, (description::pdb.simple))
WITH (key_field='id');
```

To get a feel for this tokenizer, run the following command and replace the text with your own:

```sql
SELECT 'Tokenize me!'::pdb.simple::text[];
```

```ini
     text
---------------
 {tokenize,me}
(1 row)
```
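To build intuition for the splitting rule without a ParadeDB index, it can be roughly approximated with built-in PostgreSQL string functions. This is only a sketch: it assumes that PostgreSQL's `[[:alnum:]]` regex class and `lower()` match the tokenizer's notions of "alphanumeric" and lowercasing. That holds for simple ASCII input like the example above, but it is not how `pdb.simple` is implemented, and it is not a substitute for it.

```sql
-- Rough approximation of the simple tokenizer using only built-in
-- PostgreSQL functions (an illustration, not ParadeDB's implementation):
--   1. lower() lowercases the input text.
--   2. regexp_split_to_array() splits on runs of non-alphanumeric characters.
--   3. array_remove() drops the empty strings that a leading or trailing
--      delimiter produces (here, the trailing '!').
SELECT array_remove(
         regexp_split_to_array(lower('Tokenize me!'), '[^[:alnum:]]+'),
         ''
       ) AS approx_tokens;
```

For this input, the result should match the `{tokenize,me}` output of `pdb.simple` above. For non-ASCII text, the behavior of `[[:alnum:]]` and `lower()` depends on the database's locale and collation, so the approximation may diverge from the tokenizer.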