docs/examples/node_postprocessor/openvino_rerank.ipynb
<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/node_postprocessor/openvino_rerank.ipynb" target="_parent">Open In Colab</a>
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. The OpenVINO™ Runtime supports various hardware devices including x86 and ARM CPUs, and Intel GPUs. It can help boost deep learning performance in Computer Vision, Automatic Speech Recognition, Natural Language Processing, and other common tasks.
Hugging Face rerank models can be run with OpenVINO through the `OpenVINORerank` class.
If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-postprocessor-openvino-rerank
%pip install llama-index-embeddings-openvino
!pip install llama-index
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
from llama_index.embeddings.huggingface_openvino import OpenVINOEmbedding
Export the embedding model to OpenVINO format and save it locally:
OpenVINOEmbedding.create_and_save_openvino_model(
    "BAAI/bge-small-en-v1.5", "./embedding_ov"
)
from llama_index.postprocessor.openvino_rerank import OpenVINORerank
Export the rerank model to OpenVINO format and save it locally:
OpenVINORerank.create_and_save_openvino_model(
    "BAAI/bge-reranker-large", "./rerank_ov"
)
Export the LLM to OpenVINO IR format with int4 weight compression using the Optimum Intel CLI:
!optimum-cli export openvino --model HuggingFaceH4/zephyr-7b-beta --weight-format int4 llm_ov
from llama_index.postprocessor.openvino_rerank import OpenVINORerank
from llama_index.llms.openvino import OpenVINOLLM
from llama_index.core import Settings
Settings.embed_model = OpenVINOEmbedding(model_id_or_path="./embedding_ov")
Settings.llm = OpenVINOLLM(model_id_or_path="./llm_ov")
ov_rerank = OpenVINORerank(
    model_id_or_path="./rerank_ov", device="cpu", top_n=2
)
index = VectorStoreIndex.from_documents(documents=documents)
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[ov_rerank],
)
response = query_engine.query(
    "What did Sam Altman do in this essay?",
)
print(response)
print(response.get_formatted_sources(length=200))
For comparison, retrieve the top 2 most similar nodes directly, without the reranker:
query_engine = index.as_query_engine(
    similarity_top_k=2,
)
response = query_engine.query(
    "What did Sam Altman do in this essay?",
)
Without the reranker, the retrieved context is irrelevant and the response is hallucinated.
print(response)
print(response.get_formatted_sources(length=200))
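The contrast between the two query engines above can be sketched in plain Python. This is a toy illustration of the retrieve-then-rerank pattern, not the OpenVINO models: the first stage selects a broad candidate set with a cheap word-overlap score, and a hypothetical second-stage "reranker" rescores those candidates with a different heuristic and keeps only `top_n`.

```python
# Two-stage retrieval sketch. Both scoring functions are toy stand-ins for a
# real embedding retriever and a real cross-encoder reranker.

def retrieve(query, corpus, top_k):
    # First stage: cheap lexical-overlap score over the whole corpus.
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:top_k]

def rerank(query, candidates, top_n):
    # Second stage: rescore only the candidates with a different heuristic
    # (overlap normalized by document length), then keep the best top_n.
    def score(doc):
        terms = set(query.lower().split())
        hits = sum(1 for w in doc.lower().split() if w in terms)
        return hits / (1 + len(doc.split()))
    return sorted(candidates, key=score, reverse=True)[:top_n]

corpus = [
    "Paul Graham wrote essays about startups",
    "Sam Altman became the president of Y Combinator",
    "The essay discusses programming in Lisp",
    "Y Combinator funds early stage startups",
]

candidates = retrieve("What did Sam Altman do", corpus, top_k=3)
top = rerank("What did Sam Altman do", candidates, top_n=1)
print(top[0])
```

The key point is that the reranker only sees the small candidate set, so an expensive model (such as the OpenVINO cross-encoder above) stays affordable even over a large corpus.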
For more information, refer to the OpenVINO documentation.