docs/articles_en/openvino-workflow/running-inference.rst
.. meta:: :description: Learn how to implement a typical inference pipeline of OpenVINO™ Runtime in an application.⠀
.. toctree:: :maxdepth: 1 :hidden:
running-inference/model-representation running-inference/model-input-output running-inference/inference-request running-inference/stateful-models running-inference/python-api-advanced-inference running-inference/python-api-exclusives running-inference/inference-devices-and-modes Optimize Inference <running-inference/optimize-inference>
OpenVINO Runtime is a set of C++ libraries with C and Python bindings, providing a common API to run inference on various devices. Each device (integrated with OpenVINO’s plugin architecture) offers the common, as well as hardware-specific APIs for more configuration options. Note that OpenVINO Runtime may also be integrated with other frameworks and work as their backend, for example, using torch.compile. The scheme below illustrates the typical workflow for deploying a trained deep learning model in an application:
.. image:: ../assets/images/IMPLEMENT_PIPELINE_with_API_C.svg
This guide will show you how to implement a typical OpenVINO™ Runtime inference pipeline
in your application. Before proceeding, check how
:doc:model conversion <model-preparation/convert-model-to-ir>
works in OpenVINO and how it may affect your applications’ performance. Make sure you have
installed OpenVINO Runtime and set environment variables (otherwise, the find_package
calls will not find OpenVINO_DIR):
.. tab-set::
.. tab-item:: Linux :sync: lin
.. code-block:: console
<INSTALL_DIR>/setupvars.sh
.. tab-item:: Windows :sync: win
PowerShell:
.. code-block:: console
<INSTALL_DIR>/setupvars.sh
Command Prompt
.. code-block:: console
cd <INSTALL_DIR>
setupvars.bat
Step 1. Create OpenVINO Runtime Core ###############################################################################################
Initiate working with OpenVINO in your application by including the OpenVINO™ Runtime components:
.. tab-set::
.. tab-item:: Python
:sync: py
.. doxygensnippet:: docs/snippets/src/main.py
:language: python
:fragment: [import]
.. doxygensnippet:: docs/snippets/src/main.py
:language: python
:fragment: [part1]
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/src/main.cpp
:language: cpp
:fragment: [include]
.. doxygensnippet:: docs/snippets/src/main.cpp
:language: cpp
:fragment: [part1]
.. tab-item:: C
:sync: c
.. doxygensnippet:: docs/snippets/src/main.c
:language: cpp
:fragment: [include]
.. doxygensnippet:: docs/snippets/src/main.c
:language: cpp
:fragment: [part1]
Step 2. Compile the Model ###############################################################################################
Compile the model with ov::Core::compile_model(), defining the device or mode to use
for inference. The following example uses the
:doc:AUTO mode <running-inference/inference-devices-and-modes/auto-device-selection>,
which selects the device for you. To learn more about supported devices and inference modes,
see the :doc:Inference Devices and Modes <running-inference/inference-devices-and-modes>
section.
.. tab-set::
.. tab-item:: Python
:sync: py
.. tab-set::
.. tab-item:: IR
:sync: ir
.. doxygensnippet:: docs/snippets/src/main.py
:language: python
:fragment: [part2_1]
.. tab-item:: ONNX
:sync: onnx
.. doxygensnippet:: docs/snippets/src/main.py
:language: python
:fragment: [part2_2]
.. tab-item:: PaddlePaddle
:sync: paddlepaddle
.. doxygensnippet:: docs/snippets/src/main.py
:language: python
:fragment: [part2_3]
.. tab-item:: TensorFlow
:sync: tensorflow
.. doxygensnippet:: docs/snippets/src/main.py
:language: python
:fragment: [part2_4]
.. tab-item:: TensorFlow Lite
:sync: tensorflow_lite
.. doxygensnippet:: docs/snippets/src/main.py
:language: python
:fragment: [part2_5]
.. tab-item:: ov::Model
:sync: openvinomodel
.. doxygensnippet:: docs/snippets/src/main.py
:language: python
:fragment: [part2_6]
.. tab-item:: C++
:sync: cpp
.. tab-set::
.. tab-item:: IR
:sync: ir
.. doxygensnippet:: docs/snippets/src/main.cpp
:language: cpp
:fragment: [part2_1]
.. tab-item:: ONNX
:sync: onnx
.. doxygensnippet:: docs/snippets/src/main.cpp
:language: cpp
:fragment: [part2_2]
.. tab-item:: PaddlePaddle
:sync: paddlepaddle
.. doxygensnippet:: docs/snippets/src/main.cpp
:language: cpp
:fragment: [part2_3]
.. tab-item:: TensorFlow
:sync: tensorflow
.. doxygensnippet:: docs/snippets/src/main.cpp
:language: cpp
:fragment: [part2_4]
.. tab-item:: TensorFlow Lite
:sync: tensorflow_lite
.. doxygensnippet:: docs/snippets/src/main.cpp
:language: cpp
:fragment: [part2_5]
.. tab-item:: ov::Model
:sync: openvinomodel
.. doxygensnippet:: docs/snippets/src/main.cpp
:language: cpp
:fragment: [part2_6]
.. tab-item:: C
:sync: c
.. tab-set::
.. tab-item:: IR
:sync: ir
.. doxygensnippet:: docs/snippets/src/main.c
:language: cpp
:fragment: [part2_1]
.. tab-item:: ONNX
:sync: onnx
.. doxygensnippet:: docs/snippets/src/main.c
:language: cpp
:fragment: [part2_2]
.. tab-item:: PaddlePaddle
:sync: paddlepaddle
.. doxygensnippet:: docs/snippets/src/main.c
:language: cpp
:fragment: [part2_3]
.. tab-item:: TensorFlow
:sync: tensorflow
.. doxygensnippet:: docs/snippets/src/main.c
:language: cpp
:fragment: [part2_4]
.. tab-item:: TensorFlow Lite
:sync: tensorflow_lite
.. doxygensnippet:: docs/snippets/src/main.c
:language: cpp
:fragment: [part2_5]
.. tab-item:: ov::Model
:sync: openvinomodel
.. doxygensnippet:: docs/snippets/src/main.c
:language: cpp
:fragment: [part2_6]
The ov::CompiledModel class represents a compiled model and enables you to get
information inputs or output ports by a tensor name or index. This approach is aligned with
most frameworks. The ov::Model object represents any models inside the OpenVINO™ Runtime.
For more details, refer to
:doc:OpenVINO™ Model representation <running-inference/model-representation>.
Step 3. Create an Inference Request ###############################################################################################
Use the ov::InferRequest class methods to create an infer request. For more details,
see the article on
:doc:InferRequest <running-inference/inference-request>.
.. tab-set::
.. tab-item:: Python
:sync: py
.. doxygensnippet:: docs/snippets/src/main.py
:language: python
:fragment: [part3]
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/src/main.cpp
:language: cpp
:fragment: [part3]
.. tab-item:: C
:sync: c
.. doxygensnippet:: docs/snippets/src/main.c
:language: cpp
:fragment: [part3]
Step 4. Set Inputs ###############################################################################################
Create ov::Tensor, you can use external memory for that , and use the
ov::InferRequest::set_input_tensor method to send this tensor to the device.
For more info on textual data as input, see the
:doc:String Tensors <running-inference/model-input-output/string-tensors> article.
.. tab-set::
.. tab-item:: Python
:sync: py
.. doxygensnippet:: docs/snippets/src/main.py
:language: python
:fragment: [part4]
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/src/main.cpp
:language: cpp
:fragment: [part4]
.. tab-item:: C
:sync: c
.. doxygensnippet:: docs/snippets/src/main.c
:language: cpp
:fragment: [part4]
Step 5. Start Inference ###############################################################################################
Use either ov::InferRequest::start_async or ov::infer_request.infer() to start model
inference. To learn how they work, see the
:doc:OpenVINO Inference Request <running-inference/inference-request>
article. The following example uses the asynchronous option and calls
ov::InferRequest::wait to wait for the inference results.
.. tab-set::
.. tab-item:: Python
:sync: py
.. doxygensnippet:: docs/snippets/src/main.py
:language: python
:fragment: [part5]
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/src/main.cpp
:language: cpp
:fragment: [part5]
.. tab-item:: C
:sync: c
.. doxygensnippet:: docs/snippets/src/main.c
:language: cpp
:fragment: [part5]
Step 6. Process the Inference Results #################################################################################################
Get output tensors and process the inference results.
For more info on textual data as input, see the
:doc:String Tensors <running-inference/model-input-output/string-tensors> article.
.. tab-set::
.. tab-item:: Python
:sync: py
.. doxygensnippet:: docs/snippets/src/main.py
:language: python
:fragment: [part6]
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/src/main.cpp
:language: cpp
:fragment: [part6]
.. tab-item:: C
:sync: c
.. doxygensnippet:: docs/snippets/src/main.c
:language: cpp
:fragment: [part6]
Step 7. [only for C] Release the allocated objects ###############################################################################################
To avoid memory leak, applications developed with the C API need to release the allocated objects in the following order.
.. tab-set::
.. tab-item:: C
:sync: c
.. doxygensnippet:: docs/snippets/src/main.c
:language: cpp
:fragment: [part8]
Build Your Application ###############################################################################################
If you have integrated OpenVINO with your application, you will need to adjust your
application build process as well. Of course, there are multiple ways this stage may be
done, so you will need to choose the one best for your project. To learn about the basics of
OpenVINO build process, refer to the
documentation on GitHub <https://github.com/openvinotoolkit/openvino/blob/master/docs/dev/build.md>__.
The following example uses a C++ & C application together with CMake, for project configuration.
Create Structure for project:
.. tab-set::
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/src/main.cpp
:language: cpp
:fragment: [part7]
:force:
.. tab-item:: C
:sync: c
.. doxygensnippet:: docs/snippets/src/main.c
:language: cpp
:fragment: [part7]
:force:
Configure the CMake build
For details on additional CMake build options, refer to the
CMake page <https://cmake.org/cmake/help/latest/manual/cmake.1.html#manual:cmake(1)>__.
.. tab-set::
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/CMakeLists.txt
:language: cpp
:fragment: [cmake:integration_example_cpp]
.. tab-item:: C
:sync: c
.. doxygensnippet:: docs/snippets/CMakeLists.txt
:language: cpp
:fragment: [cmake:integration_example_c]
.. tab-item:: C++ (PyPI)
:sync: cpp
.. doxygensnippet:: docs/snippets/CMakeLists.txt
:language: cpp
:fragment: [cmake:integration_example_cpp_py]
Build Project
Use CMake to build the project on your system:
.. code-block:: sh
cd build/ cmake ../project cmake --build .
Additional Resources ####################
Learn OpenVINO <../get-started/learn-openvino> section, including
OpenVINO™ Runtime API Tutorial <./../../notebooks/openvino-api-with-output.html>__.Hugging Face <https://huggingface.co/models>__.Using Encrypted Models with OpenVINO <encrypted-models>