Back to Llama Index

HuggingFace LLM - Camel-5b

docs/examples/customization/llms/SimpleIndexDemo-Huggingface_camel.ipynb

0.14.213.0 KB
Original Source

<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/customization/llms/SimpleIndexDemo-Huggingface_camel.ipynb" target="_parent"></a>

HuggingFace LLM - Camel-5b

If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.

python
%pip install llama-index-llms-huggingface
python
!pip install llama-index
python
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core import Settings

Download Data

python
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Load documents, build the VectorStoreIndex

python
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
python
# setup prompts - specific to StableLM
from llama_index.core import PromptTemplate

# This will wrap the default prompts that are internal to llama-index
# taken from https://huggingface.co/Writer/camel-5b-hf
query_wrapper_prompt = PromptTemplate(
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{query_str}\n\n### Response:"
)
python
import torch

llm = HuggingFaceLLM(
    context_window=2048,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.25, "do_sample": False},
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="Writer/camel-5b-hf",
    model_name="Writer/camel-5b-hf",
    device_map="auto",
    tokenizer_kwargs={"max_length": 2048},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16}
)

Settings.chunk_size = 512
Settings.llm = llm
python
index = VectorStoreIndex.from_documents(documents)

Query Index

python
# set Logging to DEBUG for more detailed outputs
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
python
print(response)

Query Index - Streaming

python
query_engine = index.as_query_engine(streaming=True)
python
# set Logging to DEBUG for more detailed outputs
response_stream = query_engine.query("What did the author do growing up?")
python
# can be slower to start streaming since llama-index often involves many LLM calls
response_stream.print_response_stream()
python
# can also get a normal response object
response = response_stream.get_response()
print(response)
python
# can also iterate over the generator yourself
generated_text = ""
for text in response.response_gen:
    generated_text += text