> [!WARNING]
> This document has been archived.

Deployment

⚠️ Important Announcement: GGUF Model Performance

The GGUF model has undergone quantization, but unfortunately, its performance cannot be guaranteed. As a result, we have decided to downgrade it.

💡 Alternative Solution: You can use Cloud Deployment or Local Deployment [vLLM] (if you have enough GPU resources) instead.

We appreciate your understanding and patience as we work to ensure the best possible experience.

Cloud Deployment

We recommend using Hugging Face Inference Endpoints for fast deployment. We provide two guides for reference:

English version: GUI Model Deployment Guide

Chinese version: GUI模型部署教程 (GUI Model Deployment Tutorial)

Local Deployment [vLLM]

We recommend using vLLM for fast deployment and inference. You need to use `vllm>=0.6.1`.

```bash
pip install -U transformers
VLLM_VERSION=0.6.6
CUDA_VERSION=cu124
pip install vllm==${VLLM_VERSION} --extra-index-url https://download.pytorch.org/whl/${CUDA_VERSION}
```

Download the Model

We provide three model sizes on Hugging Face: 2B, 7B, and 72B. To achieve the best performance, we recommend using the 7B-DPO or 72B-DPO model (based on your hardware configuration):
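As a sketch of the download step, the checkpoints can be fetched with the Hugging Face CLI. The repository id below (`bytedance-research/UI-TARS-7B-DPO`) corresponds to the recommended 7B-DPO model; verify the exact id on Hugging Face before downloading:

```shell
# Install the Hugging Face CLI, then download the model snapshot
# to a local directory (adjust the repo id for the 2B/72B variants).
pip install -U "huggingface_hub[cli]"
huggingface-cli download bytedance-research/UI-TARS-7B-DPO --local-dir ./UI-TARS-7B-DPO
```

The `--local-dir` path is then what you pass to `--model` when starting the API server.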

Start an OpenAI API Service

Run the command below to start an OpenAI-compatible API service:

```bash
python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model <path to your model>
```
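As a minimal sketch of what a client would send to this service: the request body follows the standard OpenAI chat-completions format with vision content parts, and the model name matches the `--served-model-name` flag above. The instruction text and base64 screenshot below are placeholders:

```python
import json

def build_chat_request(screenshot_b64: str, instruction: str) -> str:
    """Build an OpenAI-compatible /v1/chat/completions request body.

    The screenshot is passed as a base64 data URL in an image_url
    content part; "ui-tars" matches --served-model-name above.
    """
    payload = {
        "model": "ui-tars",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"},
                    },
                    {"type": "text", "text": instruction},
                ],
            }
        ],
    }
    return json.dumps(payload)

# Placeholder values for illustration only.
body = build_chat_request("iVBORw0...", "Click the search box")
```

This JSON string would be POSTed to `http://localhost:8000/v1/chat/completions` (vLLM's default port) with a standard HTTP client or the `openai` Python SDK.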

Input your API information

<!--
If you use Ollama, you can use the following settings to start the server:

```yaml
VLM Provider: ollama
VLM Base Url: http://localhost:11434/v1
VLM API Key: api_key
VLM Model Name: ui-tars
```
-->

Note: VLM Base Url must be an OpenAI-compatible API endpoint (see the OpenAI API protocol documentation for more details).
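For the local vLLM deployment above, the settings might look like the following sketch. The provider name and API key value are placeholders (vLLM does not require a real key by default), and the base URL assumes vLLM's default port 8000:

```yaml
VLM Provider: vllm                      # placeholder provider name
VLM Base Url: http://localhost:8000/v1  # vLLM's default OpenAI-compatible endpoint
VLM API Key: empty                      # any non-empty string if no key is configured
VLM Model Name: ui-tars                 # must match --served-model-name
```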