# llama-index-llms-huggingface
Install the required Python packages:

```bash
%pip install llama-index-llms-huggingface
%pip install llama-index-llms-huggingface-api
!pip install "transformers[torch]" "huggingface_hub[inference]"
!pip install llama-index
```
Set the Hugging Face API token as an environment variable:

```bash
export HUGGING_FACE_TOKEN=your_token_here
```
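Within Python, you can confirm the variable is visible before constructing the remote client; a minimal stdlib-only check (the warning message is illustrative):

```python
import os

# Read the token exported above; os.getenv returns None if the variable is unset.
hf_token = os.getenv("HUGGING_FACE_TOKEN")
if hf_token is None:
    print("HUGGING_FACE_TOKEN is not set; remote calls will run anonymously.")
```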
Import the required modules (the unused `List` import has been dropped):

```python
import os
from typing import Optional

from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
```
To run the model locally on your machine:

```python
locally_run = HuggingFaceLLM(model_name="HuggingFaceH4/zephyr-7b-alpha")
```
To run the model remotely using Hugging Face's Inference API:

```python
HF_TOKEN: Optional[str] = os.getenv("HUGGING_FACE_TOKEN")

remotely_run = HuggingFaceInferenceAPI(
    model_name="HuggingFaceH4/zephyr-7b-alpha", token=HF_TOKEN
)
```
You can also use the Inference API anonymously without providing a token:

```python
remotely_run_anon = HuggingFaceInferenceAPI(
    model_name="HuggingFaceH4/zephyr-7b-alpha"
)
```
If you do not provide a model name, Hugging Face's recommended model is used:

```python
remotely_run_recommended = HuggingFaceInferenceAPI(token=HF_TOKEN)
```
To generate a text completion using the remote model:

```python
completion_response = remotely_run_recommended.complete("To infinity, and")
print(completion_response)
```
If you modify the LLM, ensure you change the global tokenizer to match:

```python
from llama_index.core import set_global_tokenizer
from transformers import AutoTokenizer

set_global_tokenizer(
    AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha").encode
)
```
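To use one of these LLMs throughout a LlamaIndex pipeline (indexes, query engines) rather than calling it directly, you can set it as the global default; a configuration sketch, assuming the local model from above:

```python
from llama_index.core import Settings
from llama_index.llms.huggingface import HuggingFaceLLM

# Make this LLM the default for all LlamaIndex components;
# note that constructing HuggingFaceLLM downloads the model weights.
Settings.llm = HuggingFaceLLM(model_name="HuggingFaceH4/zephyr-7b-alpha")
```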
For more examples and details, see the documentation: https://docs.llamaindex.ai/en/stable/examples/llm/huggingface/