The entrypoint for inference with DeepSpeed is ``deepspeed.init_inference()``.
Example usage:

.. code-block:: python

    engine = deepspeed.init_inference(model=net, config=config)
The ``DeepSpeedInferenceConfig`` is used to control all aspects of initializing
the ``InferenceEngine``. The config should be passed as a dictionary to
``init_inference``, but parameters can also be passed as keyword arguments.
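As a minimal sketch of the two equivalent calling styles (plain Python; the
variable names ``net`` and ``config`` are illustrative, and the
``init_inference`` calls are shown commented out so the snippet stands alone):

.. code-block:: python

    # Config-dictionary form
    config = {"dtype": "fp16", "tensor_parallel": {"tp_size": 2}}
    # engine = deepspeed.init_inference(model=net, config=config)

    # Keyword-argument form, equivalent to the dict above
    # engine = deepspeed.init_inference(model=net, dtype="fp16",
    #                                   tensor_parallel={"tp_size": 2})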
.. _DeepSpeedInferenceConfig:

.. autopydantic_model:: deepspeed.inference.config.DeepSpeedInferenceConfig

.. _DeepSpeedTPConfig:

.. autopydantic_model:: deepspeed.inference.config.DeepSpeedTPConfig

.. _DeepSpeedMoEConfig:

.. autopydantic_model:: deepspeed.inference.config.DeepSpeedMoEConfig

.. _QuantizationConfig:

.. autopydantic_model:: deepspeed.inference.config.QuantizationConfig

.. _InferenceCheckpointConfig:

.. autopydantic_model:: deepspeed.inference.config.InferenceCheckpointConfig
Example config:

.. code-block:: python

    config = {
        "kernel_inject": True,
        "tensor_parallel": {"tp_size": 4},
        "dtype": "fp16",
        "enable_cuda_graph": False
    }
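Because the config is a plain dictionary, it can also be kept as JSON and
loaded before initialization. A minimal sketch (the JSON mirrors the example
config above; no DeepSpeed import is required, and the ``init_inference``
call is shown commented out):

.. code-block:: python

    import json

    # The example config above, expressed as JSON (e.g. read from a file)
    config_json = """
    {
        "kernel_inject": true,
        "tensor_parallel": {"tp_size": 4},
        "dtype": "fp16",
        "enable_cuda_graph": false
    }
    """
    config = json.loads(config_json)
    # engine = deepspeed.init_inference(model=net, config=config)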
.. autofunction:: deepspeed.init_inference