PACKAGE

src/Microsoft.ML.Tokenizers.Data.Gpt2/PACKAGE.md

About

The Microsoft.ML.Tokenizers.Data.Gpt2 package includes the Tiktoken tokenizer data file gpt2.tiktoken, which is used by models such as GPT-2.

Key Features

  • This package mainly contains gpt2.tiktoken, the data file the Tiktoken tokenizer uses for the GPT-2 model.

How to Use

Reference this package in your project to use the Tiktoken tokenizer with the specified model.

```csharp
using Microsoft.ML.Tokenizers;

// Create a tokenizer for the specified model
Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("Gpt-2");
```
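
Once created, the tokenizer can be used like any other Microsoft.ML.Tokenizers tokenizer. The following is a minimal sketch, not part of the original package notes. It assumes the project references both Microsoft.ML.Tokenizers and this data package, and the sample text is purely illustrative. It encodes a string to token IDs, counts the tokens, and decodes the IDs back to text.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

// Assumption: Microsoft.ML.Tokenizers and Microsoft.ML.Tokenizers.Data.Gpt2
// are both referenced, so the gpt2.tiktoken data can be resolved.
Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("Gpt-2");

// Illustrative input; any string works.
string text = "Hello, World!";

// Encode the text into GPT-2 token IDs.
IReadOnlyList<int> ids = tokenizer.EncodeToIds(text);
Console.WriteLine($"Token IDs: {string.Join(", ", ids)}");

// Count tokens without keeping the IDs, e.g. to check a length budget.
int count = tokenizer.CountTokens(text);
Console.WriteLine($"Token count: {count}");

// Decode the IDs back into text.
var decoded = tokenizer.Decode(ids);
Console.WriteLine($"Decoded: {decoded}");
```

None of the calls above touch types from this package directly; it only supplies the data that the model name lookup relies on.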

Main Types

Users shouldn't use any types exposed by this package directly. This package is intended to provide tokenizer data files.

Additional Documentation

<!-- The related packages associated with this package -->

Microsoft.ML.Tokenizers

Feedback & Contributing

Microsoft.ML.Tokenizers.Data.Gpt2 is released as open source under the MIT license. Bug reports and contributions are welcome at the GitHub repository.