
<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/node_postprocessor/openvino_rerank.ipynb" target="_parent"></a>

OpenVINO Rerank

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. The OpenVINO™ Runtime supports various hardware devices including x86 and ARM CPUs, and Intel GPUs. It can help boost deep learning performance in Computer Vision, Automatic Speech Recognition, Natural Language Processing, and other common tasks.

Hugging Face rerank models can be run with OpenVINO through the `OpenVINORerank` class.

If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.

python
%pip install llama-index-postprocessor-openvino-rerank
%pip install llama-index-embeddings-openvino
python
!pip install llama-index

Download Data

python
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

Download the Embedding Model, Rerank Model, and LLM

python
from llama_index.embeddings.huggingface_openvino import OpenVINOEmbedding

# Export the embedding model to OpenVINO IR format and save it locally
OpenVINOEmbedding.create_and_save_openvino_model(
    "BAAI/bge-small-en-v1.5", "./embedding_ov"
)
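
As a quick sanity check, the exported model can be loaded back and used to embed a string (a minimal sketch; `bge-small-en-v1.5` produces 384-dimensional vectors):

python
# Sketch: load the exported OpenVINO embedding model and embed a sample string
embed_model = OpenVINOEmbedding(model_id_or_path="./embedding_ov")
vector = embed_model.get_text_embedding("OpenVINO rerank example")
print(len(vector))  # expect 384 for bge-small-en-v1.5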
python
from llama_index.postprocessor.openvino_rerank import OpenVINORerank

# Export the rerank model to OpenVINO IR format and save it locally
OpenVINORerank.create_and_save_openvino_model(
    "BAAI/bge-reranker-large", "./rerank_ov"
)
python
# Export the LLM to OpenVINO IR with int4 weight compression via Optimum
!optimum-cli export openvino --model HuggingFaceH4/zephyr-7b-beta --weight-format int4 llm_ov
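
The same export can also be done from Python through optimum-intel; the following is a hedged sketch assuming `optimum-intel` with the OpenVINO extras is installed (the CLI above also exports the tokenizer files alongside the model):

python
# Sketch: equivalent LLM export from Python with optimum-intel
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

ov_llm = OVModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta",
    export=True,  # convert the Hugging Face checkpoint to OpenVINO IR
    quantization_config=OVWeightQuantizationConfig(bits=4),  # int4 weights
)
ov_llm.save_pretrained("llm_ov")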

Retrieve top 10 most relevant nodes, then filter with OpenVINO Rerank

python
from llama_index.postprocessor.openvino_rerank import OpenVINORerank
from llama_index.llms.openvino import OpenVINOLLM
from llama_index.core import Settings


# Load the exported models and set them as the LlamaIndex defaults
Settings.embed_model = OpenVINOEmbedding(model_id_or_path="./embedding_ov")
Settings.llm = OpenVINOLLM(model_id_or_path="./llm_ov")

# Rerank on CPU and keep only the 2 highest-scoring nodes
ov_rerank = OpenVINORerank(
    model_id_or_path="./rerank_ov", device="cpu", top_n=2
)
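
The `device` argument follows OpenVINO device naming: "cpu" here, but "gpu" (an Intel GPU) or "auto" (automatic device selection) can be passed where the hardware and drivers support it.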
python
# Build the vector index over the loaded documents
index = VectorStoreIndex.from_documents(documents=documents)
python
# Over-retrieve 10 candidates, then let the reranker filter them down to 2
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[ov_rerank],
)
response = query_engine.query(
    "What did Sam Altman do in this essay?",
)
python
print(response)
python
print(response.get_formatted_sources(length=200))
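
The reranker can also be applied outside a query engine. This minimal sketch (reusing the `index` and `ov_rerank` objects defined above) retrieves candidates with a plain retriever and reranks them directly:

python
# Sketch: retrieve candidates, then rerank them with postprocess_nodes
retriever = index.as_retriever(similarity_top_k=10)
candidates = retriever.retrieve("What did Sam Altman do in this essay?")
reranked = ov_rerank.postprocess_nodes(
    candidates, query_str="What did Sam Altman do in this essay?"
)
for node_with_score in reranked:
    print(f"{node_with_score.score:.4f} {node_with_score.node.get_content()[:80]!r}")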

Directly retrieve top 2 most similar nodes

python
# Retrieve only the top 2 nodes by embedding similarity, with no reranking
query_engine = index.as_query_engine(
    similarity_top_k=2,
)
response = query_engine.query(
    "What did Sam Altman do in this essay?",
)

Without reranking, the retrieved context is irrelevant and the response is hallucinated.

python
print(response)
python
print(response.get_formatted_sources(length=200))
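
Comparing these sources with the reranked run above shows the value of over-retrieving and reranking: the embedding model's top 2 hits on their own are weaker matches than the cross-encoder reranker's top 2 chosen from a pool of 10 candidates.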

For more information, refer to the OpenVINO documentation.