Back to Tokenizers

Installation

docs/source-doc-builder/installation.mdx

0.23.11.7 KB
Original Source

Installation

<tokenizerslangcontent> <python> šŸ¤— Tokenizers is tested on Python 3.5+.

You should install šŸ¤— Tokenizers in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide. Create a virtual environment with the version of Python you're going to use and activate it.

Installation with pip

šŸ¤— Tokenizers can be installed using pip as follows:

bash
pip install tokenizers

Installation from sources

To use this method, you need to have the Rust language installed. You can follow the official guide for more information.

If you are using a unix based OS, the installation should be as simple as running:

bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Or you can easily update it with the following command:

bash
rustup update

Once rust is installed, we can start retrieving the sources for šŸ¤— Tokenizers:

bash
git clone https://github.com/huggingface/tokenizers

Then we go into the python bindings folder:

bash
cd tokenizers/bindings/python

At this point you should have your virtual environment already activated. In order to compile šŸ¤— Tokenizers, you need to:

bash
pip install -e .
</python> <rust> ## Crates.io

šŸ¤— Tokenizers is available on crates.io.

You just need to add it to your Cargo.toml:

bash
cargo add tokenizers
</rust> <node> ## Installation with npm

You can simply install šŸ¤— Tokenizers with npm using:

bash
npm install tokenizers
</node> </tokenizerslangcontent>