.. meta::
   :description: Learn how to use Hugging Face Hub and Optimum Intel APIs to prepare generative models for inference.

Generative AI models are typically large and resource-intensive, so it is advisable to
optimize them for efficient inference. This article shows how to prepare generative models
for inference with OpenVINO by:

* `Downloading Models from Hugging Face <#download-generative-models-from-hugging-face-hub>`__
* `Downloading Models from Model Scope <#download-generative-models-from-model-scope>`__
* `Converting and Optimizing Generative Models <#convert-and-optimize-generative-models>`__

Download Generative Models From Hugging Face Hub
###############################################################################

Pre-converted and pre-optimized models are available in the `OpenVINO Toolkit <https://huggingface.co/OpenVINO>`__
organization, under the `model section <https://huggingface.co/OpenVINO#models>`__, or under
different model collections:

* `LLM <https://huggingface.co/collections/OpenVINO/llm-6687aaa2abca3bbcec71a9bd>`__
* `Speech-to-Text <https://huggingface.co/collections/OpenVINO/speech-to-text-672321d5c070537a178a8aeb>`__
* `Speculative Decoding Draft Models <https://huggingface.co/collections/OpenVINO/speculative-decoding-draft-models-673f5d944d58b29ba6e94161>`__

You can also use the ``huggingface_hub`` package to download models:

.. code-block:: console

   pip install huggingface_hub
   huggingface-cli download "OpenVINO/phi-2-fp16-ov" --local-dir model_path

The models can be used in OpenVINO immediately after download. The only dependency
required for downloading is ``huggingface_hub``.
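
The same download can also be scripted from Python. The sketch below assumes the
``snapshot_download`` function from ``huggingface_hub``. The ``download_model`` wrapper and
its ``downloader`` parameter are hypothetical names introduced here for illustration; the
parameter exists only so the function can be exercised without network access.

```python
from pathlib import Path
from typing import Callable, Optional


def download_model(
    repo_id: str,
    local_dir: str = "model_path",
    downloader: Optional[Callable[..., str]] = None,
) -> Path:
    """Download a model repository and return the local directory.

    `download_model` and `downloader` are illustrative names, not part of
    any library. By default the call is delegated to
    `huggingface_hub.snapshot_download`.
    """
    if downloader is None:
        # Imported lazily so this module loads even without huggingface_hub.
        from huggingface_hub import snapshot_download

        downloader = snapshot_download
    # snapshot_download returns the path of the downloaded files as a string.
    return Path(downloader(repo_id=repo_id, local_dir=local_dir))
```

Calling ``download_model("OpenVINO/phi-2-fp16-ov")`` would then mirror the
``huggingface-cli download`` command shown above.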

Download Generative Models From Model Scope
###############################################################################

To download models from `Model Scope <https://www.modelscope.cn/home>`__,
use the ``modelscope`` package:

.. code-block:: console

   pip install modelscope
   modelscope download --model "Qwen/Qwen2-7b" --local_dir model_path

Models downloaded via Model Scope are available in PyTorch format only and must
be :doc:`converted to OpenVINO IR <../../openvino-workflow/model-preparation/convert-model-to-ir>`
before inference.
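
To decide whether a downloaded directory still needs conversion, you can check for
OpenVINO IR files. The sketch below relies on the fact that an IR model is stored as an
``.xml`` file with a matching ``.bin`` file next to it. The helper name ``has_openvino_ir``
is hypothetical, and the check is only a heuristic based on file names.

```python
from pathlib import Path


def has_openvino_ir(model_dir: str) -> bool:
    """Return True if the directory holds at least one OpenVINO IR pair.

    An IR model consists of an .xml file (topology) and a .bin file
    (weights) with the same base name. Subdirectories are searched too,
    since some exports place each sub-model in its own folder.
    """
    root = Path(model_dir)
    return any(
        xml_file.with_suffix(".bin").is_file() for xml_file in root.rglob("*.xml")
    )
```

If the function returns ``False`` for a directory downloaded from Model Scope, the model
has to be converted before it can be used with OpenVINO.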

Convert and Optimize Generative Models
###############################################################################

OpenVINO works best with models in the OpenVINO IR format, both in full precision and
quantized. If your selected model has not been pre-optimized, you can convert and optimize
it yourself with a single ``optimum-cli`` command. First, make sure ``optimum-intel`` is
installed on your system:

.. code-block:: console

   pip install optimum-intel[openvino]

While optimizing models, you can decide to keep the original precision or select one that is lower.

.. tab-set::

   .. tab-item:: Keeping full model precision
      :sync: full-precision

      .. code-block:: console

         optimum-cli export openvino --model <model_id> --weight-format fp16 <exported_model_name>

      Examples:

      .. tab-set::

         .. tab-item:: LLM (text generation)
            :sync: llm-text-gen

            .. code-block:: console

               optimum-cli export openvino --model meta-llama/Llama-2-7b-chat-hf --weight-format fp16 ov_llama_2

         .. tab-item:: Diffusion models (text2image)
            :sync: diff-text-img

            .. code-block:: console

               optimum-cli export openvino --model stabilityai/stable-diffusion-xl-base-1.0 --weight-format fp16 ov_SDXL

         .. tab-item:: VLM (Image processing)
            :sync: vlm-img-proc

            .. code-block:: console

               optimum-cli export openvino --model openbmb/MiniCPM-V-2_6 --trust-remote-code --weight-format fp16 ov_MiniCPM-V-2_6

         .. tab-item:: Whisper models (speech2text)
            :sync: whisp-speech-txt

            .. code-block:: console

               optimum-cli export openvino --trust-remote-code --model openai/whisper-base ov_whisper

         .. tab-item:: SpeechT5 TTS models (text2speech)
            :sync: whisp-speech-txt

            .. code-block:: console

               optimum-cli export openvino --model microsoft/speecht5_tts --model-kwargs "{\"vocoder\": \"microsoft/speecht5_hifigan\"}" ov_speecht5_tts

   .. tab-item:: Exporting to selected precision
      :sync: low-precision

      .. code-block:: console

         optimum-cli export openvino --model <model_id> --weight-format int4 <exported_model_name>

      Examples:

      .. tab-set::

         .. tab-item:: LLM (text generation)
            :sync: llm-text-gen

            .. code-block:: console

               optimum-cli export openvino --model meta-llama/Llama-2-7b-chat-hf --weight-format int4 ov_llama_2

         .. tab-item:: Diffusion models (text2image)
            :sync: diff-text-img

            .. code-block:: console

               optimum-cli export openvino --model stabilityai/stable-diffusion-xl-base-1.0 --weight-format int4 ov_SDXL

         .. tab-item:: VLM (Image processing)
            :sync: vlm-img-proc

            .. code-block:: console

               optimum-cli export openvino -m model_path --task text-generation-with-past --weight-format int4 ov_MiniCPM-V-2_6

.. note::

   Any other ``model_id``, for example ``openbmb/MiniCPM-V-2_6``, or the path
   to a local model file can be used.

   You can also specify a different data type, such as ``int8``.
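
When many models have to be exported, the commands above can be assembled in a script.
The following is a minimal sketch, not part of ``optimum-intel``: the function name
``build_export_command`` and the ``KNOWN_WEIGHT_FORMATS`` tuple are hypothetical, and the
tuple lists only the weight formats mentioned in this article (``optimum-cli`` may accept
others). Only the flags that appear in the examples above are used.

```python
from typing import List, Optional

# Weight formats mentioned in this article; optimum-cli may accept others.
KNOWN_WEIGHT_FORMATS = ("fp16", "int8", "int4")


def build_export_command(
    model: str,
    output_dir: str,
    weight_format: Optional[str] = "fp16",
    trust_remote_code: bool = False,
    task: Optional[str] = None,
) -> List[str]:
    """Assemble an `optimum-cli export openvino` command as an argument list."""
    if weight_format is not None and weight_format not in KNOWN_WEIGHT_FORMATS:
        raise ValueError(f"unexpected weight format: {weight_format!r}")

    cmd = ["optimum-cli", "export", "openvino", "--model", model]
    if trust_remote_code:
        cmd.append("--trust-remote-code")
    if task is not None:
        cmd += ["--task", task]
    if weight_format is not None:
        cmd += ["--weight-format", weight_format]
    # The output directory is the final positional argument.
    cmd.append(output_dir)
    return cmd
```

The returned list can be passed to ``subprocess.run(cmd, check=True)``. Building the
command as a list rather than a single string avoids shell quoting problems with model
identifiers and paths.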

Additional Resources
###############################################################################

* `Full set of optimum-cli parameters <https://huggingface.co/docs/optimum/en/intel/openvino/export>`__
* :doc:`Model conversion in OpenVINO <../../openvino-workflow/model-preparation/convert-model-to-ir>`
* :doc:`Model optimization in OpenVINO <../../openvino-workflow/model-optimization>`