
# Pre-tokenizers

<tokenizerslangcontent>
<python>

## BertPreTokenizer

[[autodoc]] tokenizers.pre_tokenizers.BertPreTokenizer

## ByteLevel

[[autodoc]] tokenizers.pre_tokenizers.ByteLevel
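A minimal sketch of the byte-level pre-tokenizer in Python. `pre_tokenize_str` returns `(piece, (start, end))` tuples; with `add_prefix_space=True` (the default), a space is prepended to the first word, and spaces are mapped to the `Ġ` byte-level character:

```python
from tokenizers.pre_tokenizers import ByteLevel

pre = ByteLevel(add_prefix_space=True)
splits = pre.pre_tokenize_str("Hello world")
# keep only the pieces, dropping the character offsets
tokens = [piece for piece, offsets in splits]
# tokens == ["ĠHello", "Ġworld"]
```

This is the pre-tokenizer used by GPT-2-style BPE models; it guarantees every input byte is representable, so no `<unk>` token is needed.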

## CharDelimiterSplit

[[autodoc]] tokenizers.pre_tokenizers.CharDelimiterSplit

## Digits

[[autodoc]] tokenizers.pre_tokenizers.Digits
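A short sketch of the `Digits` pre-tokenizer, which splits numbers away from surrounding text. The `individual_digits` flag controls whether each digit becomes its own piece:

```python
from tokenizers.pre_tokenizers import Digits

# individual_digits=True: every digit is isolated
per_digit = [p for p, _ in Digits(individual_digits=True).pre_tokenize_str("Call 123")]
# per_digit == ["Call ", "1", "2", "3"]

# individual_digits=False: a run of digits stays together
grouped = [p for p, _ in Digits(individual_digits=False).pre_tokenize_str("Call 123")]
# grouped == ["Call ", "123"]
```

Note that `Digits` only splits at digit boundaries; the non-digit span (including its trailing space) is left as a single piece, so it is usually combined with a whitespace pre-tokenizer in a `Sequence`.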

## Metaspace

[[autodoc]] tokenizers.pre_tokenizers.Metaspace
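A minimal sketch of `Metaspace`, the SentencePiece-style pre-tokenizer: it replaces whitespace with the `▁` (U+2581) character and, by default, also prepends it to the first word:

```python
from tokenizers.pre_tokenizers import Metaspace

pre = Metaspace()  # default replacement character is "▁"
tokens = [piece for piece, offsets in pre.pre_tokenize_str("Hello world")]
# tokens == ["▁Hello", "▁world"]
```

Because the space marker is part of each piece, a decoder can reconstruct the original spacing exactly; this is the scheme used by SentencePiece-based models such as T5 and Llama.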

## PreTokenizer

[[autodoc]] tokenizers.pre_tokenizers.PreTokenizer

## Punctuation

[[autodoc]] tokenizers.pre_tokenizers.Punctuation

## Sequence

[[autodoc]] tokenizers.pre_tokenizers.Sequence
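A small sketch of composing pre-tokenizers with `Sequence`: each stage is applied in order to the pieces produced by the previous one. Here whitespace splitting runs first, then each remaining piece is split into individual digits:

```python
from tokenizers.pre_tokenizers import Sequence, Whitespace, Digits

pre = Sequence([Whitespace(), Digits(individual_digits=True)])
tokens = [piece for piece, offsets in pre.pre_tokenize_str("abc 123")]
# tokens == ["abc", "1", "2", "3"]
```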

## Split

[[autodoc]] tokenizers.pre_tokenizers.Split
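A brief sketch of the generic `Split` pre-tokenizer, which splits on an arbitrary pattern. The `behavior` argument controls what happens to the matched delimiter (`"removed"`, `"isolated"`, `"merged_with_previous"`, `"merged_with_next"`, or `"contiguous"`):

```python
from tokenizers.pre_tokenizers import Split

# split an ISO date on "-" and drop the delimiter
pre = Split(pattern="-", behavior="removed")
tokens = [piece for piece, offsets in pre.pre_tokenize_str("2024-01-01")]
# tokens == ["2024", "01", "01"]
```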

## UnicodeScripts

[[autodoc]] tokenizers.pre_tokenizers.UnicodeScripts

## Whitespace

[[autodoc]] tokenizers.pre_tokenizers.Whitespace
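A quick sketch contrasting the two whitespace pre-tokenizers: `Whitespace` splits on the pattern `\w+|[^\w\s]+` (isolating punctuation), while `WhitespaceSplit` splits on whitespace only and leaves punctuation attached:

```python
from tokenizers.pre_tokenizers import Whitespace, WhitespaceSplit

text = "Hello, world!"
word_level = [p for p, _ in Whitespace().pre_tokenize_str(text)]
# word_level == ["Hello", ",", "world", "!"]
raw_split = [p for p, _ in WhitespaceSplit().pre_tokenize_str(text)]
# raw_split == ["Hello,", "world!"]
```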

## WhitespaceSplit

[[autodoc]] tokenizers.pre_tokenizers.WhitespaceSplit

</python>
<rust>
The Rust API Reference is available directly on the Docs.rs website.
</rust>
<node>
The Node API has not been documented yet.
</node>
</tokenizerslangcontent>