docs/xinference_infer.md
Xinference is a unified inference platform that provides a single interface to different inference engines. It supports LLMs, text generation, image generation, and more, and is broadly comparable to Swift in scope.
Xinference can be installed with pip:

```shell
pip install "xinference[all]"
```
To run inference with Xinference for the first time, start the server; the model is downloaded automatically on first launch:

```shell
xinference
```

Then configure the model in the web UI as follows:

- Model engine: Transformers
- Model format: pytorch
- Model size: 8
- Quantization: none
- N-GPU: auto
- Replica: 1
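The web-UI settings above can also be applied from the command line once the server is running. A minimal sketch, assuming the default local endpoint and that the model name matches Xinference's model registry; flag names may vary across Xinference versions, so check `xinference launch --help`:

```shell
# Start a local server in one terminal (default port 9997).
xinference-local --host 0.0.0.0 --port 9997

# In another terminal, launch the model with the same settings as the web UI.
xinference launch \
  --model-name MiniCPM-Llama3-V-2_5 \
  --model-engine Transformers \
  --model-format pytorch \
  --size-in-billions 8 \
  --replica 1
```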
If you have already downloaded the MiniCPM-Llama3-V-2_5 model locally, you can run inference with Xinference the same way:

```shell
xinference
```

Use the same settings in the web UI:

- Model engine: Transformers
- Model format: pytorch
- Model size: 8
- Quantization: none
- N-GPU: auto
- Replica: 1
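Once the model is launched, Xinference serves an OpenAI-compatible endpoint, so a vision chat request is an ordinary chat-completion payload with an image part. The sketch below only builds that payload as plain Python data; the model UID and image URL are placeholders, and the field names follow the OpenAI chat format:

```python
# Sketch of an OpenAI-compatible chat payload for a vision-language model.
# The model UID and image URL are placeholders, not real values.

def build_vision_chat_payload(model_uid: str, prompt: str, image_url: str) -> dict:
    """Build a chat-completion request body with one text prompt and one image."""
    return {
        "model": model_uid,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_vision_chat_payload(
    "MiniCPM-Llama3-V-2_5",        # model UID assigned by Xinference (assumed)
    "What is in this image?",
    "https://example.com/cat.png",  # placeholder image URL
)
```

You would POST this payload to the server's `/v1/chat/completions` route (default local port 9997).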
If the web UI fails to open, check whether your firewall or macOS security settings are blocking it.