The entrypoint for inference with DeepSpeed is ``deepspeed.init_inference()``.
Example usage:

.. code-block:: python

    engine = deepspeed.init_inference(model=net, config=config)
The ``DeepSpeedInferenceConfig`` is used to control all aspects of initializing
the ``InferenceEngine``. The config should be passed as a dictionary to
``init_inference``, but parameters can also be passed as keyword arguments.
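As a minimal sketch of the two equivalent calling styles (plain Python; the
variable names ``net`` and ``config`` are illustrative, and the
``init_inference`` calls are shown commented out so the snippet stands alone):

.. code-block:: python

    # Config-dictionary form
    config = {"dtype": "fp16", "tensor_parallel": {"tp_size": 2}}
    # engine = deepspeed.init_inference(model=net, config=config)

    # Keyword-argument form, equivalent to the dict above
    # engine = deepspeed.init_inference(model=net, dtype="fp16",
    #                                   tensor_parallel={"tp_size": 2})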
.. _DeepSpeedInferenceConfig:

.. autopydantic_model:: deepspeed.inference.config.DeepSpeedInferenceConfig

.. _DeepSpeedTPConfig:

.. autopydantic_model:: deepspeed.inference.config.DeepSpeedTPConfig

.. _DeepSpeedMoEConfig:

.. autopydantic_model:: deepspeed.inference.config.DeepSpeedMoEConfig

.. _QuantizationConfig:

.. autopydantic_model:: deepspeed.inference.config.QuantizationConfig

.. _InferenceCheckpointConfig:

.. autopydantic_model:: deepspeed.inference.config.InferenceCheckpointConfig
Example config:

.. code-block:: python

    config = {
        "kernel_inject": True,
        "tensor_parallel": {"tp_size": 4},
        "dtype": "fp16",
        "enable_cuda_graph": False
    }
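Because the config is a plain dictionary, it can also be kept as JSON and
loaded before initialization. A minimal sketch (the JSON mirrors the example
config above; no DeepSpeed import is required, and the ``init_inference``
call is shown commented out):

.. code-block:: python

    import json

    # The example config above, expressed as JSON (e.g. read from a file)
    config_json = """
    {
        "kernel_inject": true,
        "tensor_parallel": {"tp_size": 4},
        "dtype": "fp16",
        "enable_cuda_graph": false
    }
    """
    config = json.loads(config_json)
    # engine = deepspeed.init_inference(model=net, config=config)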
.. autofunction:: deepspeed.init_inference