Encode Inputs

<tokenizerslangcontent> <python> These types represent all the different kinds of input that a [`~tokenizers.Tokenizer`] accepts when using [`~tokenizers.Tokenizer.encode_batch`].

TextEncodeInput[[tokenizers.TextEncodeInput]]

<code>tokenizers.TextEncodeInput</code>

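Represents a textual input for encoding. Can be either:

- A single sequence: a `str`
- A pair of sequences, given as either:
  - A `Tuple` of two `str`
  - Or a `List` of `str` of size 2

alias of `Union[str, Tuple[str, str], List[str]]`.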
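The sketch below shows how each of these shapes is read. It uses only the standard library and re-declares the alias locally, so it runs without `tokenizers` installed. The `as_text_pair` helper is hypothetical and is not part of the library. It normalizes every accepted shape into a `(sequence, pair)` tuple, where `pair` is `None` for a single sequence, and it rejects lists whose size is not 2.

```python
from typing import List, Optional, Tuple, Union

# Local mirror of tokenizers.TextEncodeInput, so this sketch runs without the library.
TextEncodeInput = Union[str, Tuple[str, str], List[str]]


def as_text_pair(item: TextEncodeInput) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split a TextEncodeInput into (sequence, pair_or_None)."""
    if isinstance(item, str):
        # A single sequence.
        return item, None
    if (
        isinstance(item, (tuple, list))
        and len(item) == 2
        and all(isinstance(s, str) for s in item)
    ):
        # A pair of sequences, given as a 2-tuple or as a list of size 2.
        return item[0], item[1]
    raise TypeError(f"not a valid TextEncodeInput: {item!r}")


batch = [
    "Hello, y'all!",                        # single sequence
    ("How are you?", "I'm fine."),          # pair given as a tuple
    ["What is this?", "A sentence pair."],  # pair given as a list of size 2
]

for item in batch:
    print(as_text_pair(item))
```

A mixed batch like this one can be passed as-is to `tokenizer.encode_batch(batch)`. The helper only illustrates how each element is interpreted.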

PreTokenizedEncodeInput[[tokenizers.PreTokenizedEncodeInput]]

<code>tokenizers.PreTokenizedEncodeInput</code>

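Represents a pre-tokenized input for encoding. Can be either:

- A single pre-tokenized sequence: a `List[str]` or a `Tuple[str]`
- A pair of pre-tokenized sequences, given as either:
  - A `Tuple` of two pre-tokenized sequences
  - Or a `List` of pre-tokenized sequences of size 2

alias of `Union[List[str], Tuple[str], Tuple[Union[List[str], Tuple[str]], Union[List[str], Tuple[str]]], List[Union[List[str], Tuple[str]]]]`.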
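The sketch below shows the same idea for pre-tokenized inputs. It uses only the standard library and mirrors the aliases locally. The helpers `_is_word_sequence` and `as_pretokenized_pair` are hypothetical and are not part of `tokenizers`. In this sketch the single-sequence check runs first. A tuple of strings such as `("a", "b")` is therefore read as one two-word sequence, not as a pair.

```python
from typing import List, Optional, Tuple, Union

# Local mirrors of the documented aliases (copied as written above),
# so this sketch runs without the library.
PreTokenizedInputSequence = Union[List[str], Tuple[str]]
PreTokenizedEncodeInput = Union[
    PreTokenizedInputSequence,
    Tuple[PreTokenizedInputSequence, PreTokenizedInputSequence],
    List[PreTokenizedInputSequence],
]


def _is_word_sequence(obj: object) -> bool:
    """True for a list or tuple whose items are all str (one pre-tokenized sequence)."""
    return isinstance(obj, (list, tuple)) and all(isinstance(w, str) for w in obj)


def as_pretokenized_pair(item) -> Tuple[List[str], Optional[List[str]]]:
    """Hypothetical helper: split a PreTokenizedEncodeInput into (words, pair_words_or_None)."""
    if _is_word_sequence(item):
        # A single pre-tokenized sequence: its items are the words themselves.
        return list(item), None
    if (
        isinstance(item, (list, tuple))
        and len(item) == 2
        and all(_is_word_sequence(s) for s in item)
    ):
        # A pair: a 2-tuple, or a list of size 2, of pre-tokenized sequences.
        return list(item[0]), list(item[1])
    raise TypeError(f"not a valid PreTokenizedEncodeInput: {item!r}")


batch = [
    ["Hello", ",", "world"],                  # single sequence as a list
    ("Hello", ",", "world"),                  # single sequence as a tuple
    (["How", "are", "you"], ["Fine"]),        # pair given as a tuple
    [["What", "is", "this"], ("A", "pair")],  # pair given as a list of size 2
]

for item in batch:
    print(as_pretokenized_pair(item))
```

When a batch like this is sent to `tokenizer.encode_batch`, `is_pretokenized=True` must also be passed. Otherwise the elements are expected to be [`~tokenizers.TextEncodeInput`] values.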

EncodeInput[[tokenizers.EncodeInput]]

<code>tokenizers.EncodeInput</code>

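Represents all the possible types of input for encoding. Depending on the `is_pretokenized` argument of [`~tokenizers.Tokenizer.encode_batch`], can be:

- When `is_pretokenized=False`: a [`~tokenizers.TextEncodeInput`]
- When `is_pretokenized=True`: a [`~tokenizers.PreTokenizedEncodeInput`]

alias of `Union[str, Tuple[str, str], List[str], Tuple[str], Tuple[Union[List[str], Tuple[str]], Union[List[str], Tuple[str]]], List[Union[List[str], Tuple[str]]]]`. </python> <rust> The Rust API Reference is available directly on the Docs.rs website. </rust> <node> The node API has not been documented yet. </node> </tokenizerslangcontent>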