docs/code/microsoft-ml-tokenizers-migration-guide.md
This guide provides general guidance on how to migrate from various tokenizer libraries to Microsoft.ML.Tokenizers for Tiktoken.
| Microsoft.DeepDev.TokenizerLib | Microsoft.ML.Tokenizers |
|---|---|
| TikTokenizer | Tokenizer |
| ITokenizer | Tokenizer |
| TokenizerBuilder | TiktokenTokenizer.CreateForModel <br> TiktokenTokenizer.CreateForModel(Async/Stream) (user-provided file stream) |

For the `TiktokenTokenizer.CreateForModel` function, the table lists the mapping of model names to the vocabulary file used with each model. It makes clear which vocabulary file belongs to each model, so users of a listed model do not need to carry or download vocabulary files themselves.

- Use the appropriate `TiktokenTokenizer.CreateForModel/Async` method to create the tokenizer from the model name or from a provided stream.
- Use `TiktokenTokenizer.CountTokens` to get the token count and `TiktokenTokenizer.EncodeToIds` to get the encoded IDs.
- Use `TiktokenTokenizer.GetIndexByTokenCount` or `GetIndexByTokenCountFromEnd` to find the index at which to truncate from the start or the end of a string, respectively.
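
The calls named above can be sketched together as follows. This is a minimal sketch, not a definitive implementation. It assumes a recent `Microsoft.ML.Tokenizers` release in which these method names and overloads are available, and that the data package containing the vocabulary for the chosen model is referenced by the project. The model name `"gpt-4"`, the sample text, and the budget of 5 tokens are illustrative choices only.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

// Assumption: the vocabulary data package for this model is referenced,
// so no vocabulary file has to be supplied by hand.
Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("gpt-4");

string text = "Hello, world! This is a short sample sentence.";

// Token count only, without materializing the ids.
int tokenCount = tokenizer.CountTokens(text);

// Encoded ids when the actual values are needed.
IReadOnlyList<int> ids = tokenizer.EncodeToIds(text);
Console.WriteLine($"Token count: {tokenCount}, ids returned: {ids.Count}");

// Illustrative token budget for truncation.
const int budget = 5;

// Keep at most `budget` tokens counted from the start of the string.
// The returned index is the position just past the last character to keep.
int endIndex = tokenizer.GetIndexByTokenCount(
    text, budget, out string? normalizedText, out int usedTokens);
// normalizedText is null when no normalization was applied.
string head = (normalizedText ?? text).Substring(0, endIndex);

// Keep at most `budget` tokens counted from the end of the string.
// The returned index is the position of the first character to keep.
int startIndex = tokenizer.GetIndexByTokenCountFromEnd(
    text, budget, out string? normalizedTextFromEnd, out int usedTokensFromEnd);
string tail = (normalizedTextFromEnd ?? text).Substring(startIndex);

Console.WriteLine($"Head ({usedTokens} tokens): {head}");
Console.WriteLine($"Tail ({usedTokensFromEnd} tokens): {tail}");
```

When the vocabulary comes from a user-provided file, the stream-based overloads `TiktokenTokenizer.CreateForModel(modelName, vocabStream)` or `TiktokenTokenizer.CreateForModelAsync(modelName, vocabStream)` can be used in place of the first call. The rest of the sketch stays the same.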