Input Sequences

docs/source-doc-builder/api/input-sequences.mdx


<tokenizerslangcontent>
<python>
These types represent all the different kinds of sequences that can be used as input to a Tokenizer. Overall, any sequence can be either a string or a list of strings, depending on the operating mode of the tokenizer: `raw text` vs `pre-tokenized`.

TextInputSequence[[tokenizers.TextInputSequence]]

<code>tokenizers.TextInputSequence</code>

A str that represents an input sequence
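As a minimal sketch of passing a `TextInputSequence` (a plain `str`), assuming the `tokenizers` package is installed; the toy `WordLevel` vocabulary below is built inline purely for illustration, where a real tokenizer would be trained or loaded from a file:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Toy vocabulary for the example only; a real tokenizer would be trained or loaded.
tokenizer = Tokenizer(WordLevel({"hello": 0, "world": 1, "[UNK]": 2}, unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# A TextInputSequence is just a str: the tokenizer splits it itself.
encoding = tokenizer.encode("hello world")
print(encoding.tokens)
```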

PreTokenizedInputSequence[[tokenizers.PreTokenizedInputSequence]]

<code>tokenizers.PreTokenizedInputSequence</code>

A pre-tokenized input sequence. Can be one of:

  • A List of str
  • A Tuple of str

Alias of `Union[List[str], Tuple[str]]`.
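A `PreTokenizedInputSequence` is used when the text has already been split into words. A sketch, again assuming the `tokenizers` package and using a toy `WordLevel` vocabulary built here purely for illustration:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel

# Toy vocabulary for the example only; a real tokenizer would be trained or loaded.
tokenizer = Tokenizer(WordLevel({"hello": 0, "world": 1, "[UNK]": 2}, unk_token="[UNK]"))

# A PreTokenizedInputSequence: the text is already split into words,
# so we pass a list (or tuple) of str and set is_pretokenized=True.
encoding = tokenizer.encode(["hello", "world"], is_pretokenized=True)
print(encoding.tokens)
```

A tuple of strings works the same way, matching the `Union[List[str], Tuple[str]]` alias.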

InputSequence[[tokenizers.InputSequence]]

<code>tokenizers.InputSequence</code>

Represents all the possible types of input sequences for encoding. Can be:

  • A TextInputSequence
  • A PreTokenizedInputSequence

Alias of `Union[str, List[str], Tuple[str]]`.
</python>
<rust>
The Rust API Reference is available directly on the Docs.rs website.
</rust>
<node>
The Node API has not been documented yet.
</node>
</tokenizerslangcontent>