
Local Embeddings with IPEX-LLM on Intel CPU

IPEX-LLM is a PyTorch library for running LLMs on Intel CPUs and GPUs (e.g., a local PC with an iGPU, or discrete GPUs such as Arc, Flex and Max) with very low latency.

This example shows how to use LlamaIndex to run embedding tasks with ipex-llm optimizations on an Intel CPU, which is useful in applications such as RAG and document QA.

Note

Refer to here for full examples of IpexLLMEmbedding. Note that to run on an Intel CPU, specify -d 'cpu' as a command-line argument when running the examples.

Install llama-index-embeddings-ipex-llm

This will also install ipex-llm and its dependencies.

python
%pip install llama-index-embeddings-ipex-llm

IpexLLMEmbedding

python
from llama_index.embeddings.ipex_llm import IpexLLMEmbedding

# Load the BGE model with ipex-llm low-latency optimizations applied.
embedding_model = IpexLLMEmbedding(model_name="BAAI/bge-large-en-v1.5")

Please note that IpexLLMEmbedding currently only provides optimization for Hugging Face BGE models.
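
If you want the rest of a LlamaIndex pipeline (indexes, retrievers, query engines) to use this model, one common pattern is to register it globally via Settings. A minimal sketch; Settings comes from llama-index core and is not part of the original example:

python
from llama_index.core import Settings
from llama_index.embeddings.ipex_llm import IpexLLMEmbedding

# Register the ipex-llm optimized embedding model as the global default,
# so components built afterwards pick it up automatically.
Settings.embed_model = IpexLLMEmbedding(model_name="BAAI/bge-large-en-v1.5")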

python
sentence = "IPEX-LLM is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low latency."
query = "What is IPEX-LLM?"

# Embed a single piece of text.
text_embedding = embedding_model.get_text_embedding(sentence)
print(f"embedding[:10]: {text_embedding[:10]}")

# Embed several texts in one batch.
text_embeddings = embedding_model.get_text_embedding_batch([sentence, query])
print(f"text_embeddings[0][:10]: {text_embeddings[0][:10]}")
print(f"text_embeddings[1][:10]: {text_embeddings[1][:10]}")

# Embed a query; for BGE models a query instruction is typically prepended,
# so the result can differ from get_text_embedding on the same string.
query_embedding = embedding_model.get_query_embedding(query)
print(f"query_embedding[:10]: {query_embedding[:10]}")
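
As a quick sanity check, you can compare the query embedding against the sentence embedding with cosine similarity; related texts should score noticeably higher than unrelated ones. A minimal sketch using only the standard library (the cosine_similarity helper below is illustrative, not part of llama-index-embeddings-ipex-llm):

python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# The query "What is IPEX-LLM?" should be semantically close to the
# sentence describing IPEX-LLM embedded above.
print(f"similarity: {cosine_similarity(query_embedding, text_embedding):.4f}")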