UI-TARS Desktop is a GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
<p align="center">
  <a href="https://arxiv.org/abs/2501.12326">Paper</a> |
  <a href="https://huggingface.co/bytedance-research/UI-TARS-7B-DPO">Hugging Face Models</a> |
  <a href="https://discord.gg/pTXwYVjfcs">Discord</a> |
  <a href="https://www.modelscope.cn/models/bytedance-research/UI-TARS-7B-DPO">ModelScope</a>
  <br>
  Desktop Application |
  <a href="https://github.com/web-infra-dev/midscene">Midscene (use in browser)</a>
</p>

| Instruction | Video |
|---|---|
| Get the current weather in SF using the web browser | <video src="https://github.com/user-attachments/assets/5235418c-ac61-4895-831d-68c1c749fc87" height="300" /> |
| Send a tweet with the content "hello world" | <video src="https://github.com/user-attachments/assets/737ccc11-9124-4464-b4be-3514cbced85c" height="300" /> |
See Quick Start.
See Deployment.
See CONTRIBUTING.md.
See `@ui-tars/sdk`.
UI-TARS Desktop is licensed under the Apache License 2.0.
If you find our paper and code useful in your research, please consider giving the project a star :star: and citing the paper :pencil:
@article{qin2025ui,
  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
  journal={arXiv preprint arXiv:2501.12326},
  year={2025}
}