docs/source-doc-builder/installation.mdx
You should install 🤗 Tokenizers in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide. Create a virtual environment with the version of Python you're going to use and activate it.
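As a quick sanity check before installing anything, the following minimal sketch reports whether the current interpreter is running inside a virtual environment. The helper name `in_virtualenv` is hypothetical and chosen only for illustration. The check relies on the standard `sys.prefix` and `sys.base_prefix` attributes and assumes an environment created with `venv` or a recent `virtualenv`; other environment managers (conda, for example) may not be detected this way.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix points at the base Python installation.
    # Outside a venv the two values are equal.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

If this reports `False`, activate your environment first so that the packages installed below stay isolated from the system Python.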
🤗 Tokenizers can be installed using pip as follows:
pip install tokenizers
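To confirm that the package landed in the active environment, one option is to query the installed distribution metadata with the standard library. This is only a sketch: `installed_version` is a hypothetical helper name, and it assumes Python 3.8 or later, where `importlib.metadata` is available.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version("tokenizers")
    if found:
        print(f"tokenizers {found} is installed")
    else:
        print("tokenizers is not installed in this environment")
```

Querying metadata rather than importing the package keeps the check from failing with an `ImportError` when the install did not succeed.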
To install from source instead, you need to have the Rust language installed. You can follow the official guide for more information.
If you are using a Unix-based OS, installing Rust should be as simple as running:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Or, if Rust is already installed, you can update it with the following command:
rustup update
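Before building from source, it can help to confirm that the Rust toolchain is visible on your `PATH`. The sketch below uses the standard `shutil.which` lookup; `find_rust_tools` is a hypothetical helper name, and the list of tools checked is an assumption about what a typical rustup-based setup provides.

```python
import shutil
from typing import Dict, Optional


def find_rust_tools() -> Dict[str, Optional[str]]:
    """Map each Rust tool name to its resolved path, or None if not found."""
    # shutil.which searches the directories listed in PATH, the same way
    # the shell does when resolving a command name.
    return {tool: shutil.which(tool) for tool in ("rustc", "cargo", "rustup")}


if __name__ == "__main__":
    for tool, path in find_rust_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

If a tool shows as not found right after running the installer, opening a new terminal session usually picks up the updated `PATH`.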
Once Rust is installed, we can start retrieving the sources for 🤗 Tokenizers:
git clone https://github.com/huggingface/tokenizers
Then we go into the Python bindings folder:
cd tokenizers/bindings/python
At this point you should have your virtual environment already activated. To compile 🤗 Tokenizers and install it in editable mode, run:
pip install -e .
🤗 Tokenizers is available on crates.io.
To use it in a Rust project, add it to your Cargo.toml by running the following in the project directory:
cargo add tokenizers
You can simply install 🤗 Tokenizers with npm using:
npm install tokenizers