UI-TARS Desktop

UI-TARS Desktop is a GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.

<p align="center"> &nbsp;&nbsp; 📑 <a href="https://arxiv.org/abs/2501.12326">Paper</a> &nbsp;&nbsp; | 🤗 <a href="https://huggingface.co/bytedance-research/UI-TARS-7B-DPO">Hugging Face Models</a>&nbsp;&nbsp; | &nbsp;&nbsp;🫨 <a href="https://discord.gg/pTXwYVjfcs">Discord</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://www.modelscope.cn/models/bytedance-research/UI-TARS-7B-DPO">ModelScope</a>&nbsp;&nbsp;

🖥️ Desktop Application &nbsp;&nbsp; | &nbsp;&nbsp; 👓 <a href="https://github.com/web-infra-dev/midscene">Midscene (use in browser)</a>

</p>

Showcases

| Instruction | Video |
| --- | --- |
| Get the current weather in SF using the web browser | <video src="https://github.com/user-attachments/assets/5235418c-ac61-4895-831d-68c1c749fc87" height="300" /> |
| Send a tweet with the content "hello world" | <video src="https://github.com/user-attachments/assets/737ccc11-9124-4464-b4be-3514cbced85c" height="300" /> |

News

  • [2025-04-17] - πŸŽ‰ We're excited to announce support for UI-TARS-1.5, featuring enhanced performance, precise control, and expanded scenario coverage (using computer and browser as operators). Now compatible with multiple models: UI-TARS-1.0, UI-TARS-1.5, and Doubao-1.5-UI-TARS!
  • [2025-02-20] - πŸ“¦ Introduced UI TARS SDK, is a powerful cross-platform toolkit for building GUI automation agents.
  • [2025-01-23] - πŸš€ We updated the Cloud Deployment section in the δΈ­ζ–‡η‰ˆ: GUIζ¨‘εž‹ιƒ¨η½²ζ•™η¨‹ with new information related to the ModelScope platform. You can now use the ModelScope platform for deployment.

Features

  • 🤖 Natural language control powered by Vision-Language Model
  • 🖥️ Screenshot and visual recognition support
  • 🎯 Precise mouse and keyboard control
  • 💻 Cross-platform support (Windows/macOS)
  • 🔄 Real-time feedback and status display
  • 🔐 Private and secure - fully local processing
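
To make the features above concrete, here is a minimal conceptual sketch of the loop that a screenshot-driven GUI agent typically runs: capture the screen, ask a vision-language model for the next action, execute it, and report status. This is not the UI-TARS Desktop implementation or the `@ui-tars/sdk` API. Every name in it (`Action`, `run_agent`, `take_screenshot`, `predict`, `execute`, `on_status`) is hypothetical, and the model and screen are replaced by stubs so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One GUI step proposed by the model (hypothetical shape, not the SDK's)."""
    kind: str          # e.g. "click", "type", or "finished"
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    instruction: str,
    take_screenshot: Callable[[], bytes],
    predict: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    on_status: Callable[[str], None] = print,
    max_steps: int = 10,
) -> List[Action]:
    """Repeat screenshot -> model prediction -> input action until finished."""
    history: List[Action] = []
    for step in range(max_steps):
        screen = take_screenshot()                      # visual input
        action = predict(instruction, screen, history)  # model decides next step
        history.append(action)
        on_status(f"step {step + 1}: {action.kind}")    # feedback for the user
        if action.kind == "finished":
            break
        execute(action)                                 # mouse / keyboard control
    return history


# Stubs standing in for a real screen grabber, model endpoint, and input driver.
def fake_screenshot() -> bytes:
    return b"fake-png-bytes"


def fake_predict(instruction: str, screen: bytes, history: List[Action]) -> Action:
    # Scripted policy: click once, type once, then stop.
    script = [
        Action("click", x=100, y=200),
        Action("type", text="weather in SF"),
        Action("finished"),
    ]
    return script[len(history)]


executed: List[Action] = []
statuses: List[str] = []
trace = run_agent(
    "Get the current weather in SF using the web browser",
    fake_screenshot,
    fake_predict,
    executed.append,
    statuses.append,
)
```

Passing the screenshot, prediction, and execution steps in as callables keeps the loop independent of any particular model or operating system, and the `max_steps` cap is an assumed safeguard against a model that never reports completion.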

Quick Start

See Quick Start.

Deployment

See Deployment.

Contributing

See CONTRIBUTING.md.

SDK (Experimental)

See @ui-tars/sdk.

License

UI-TARS Desktop is licensed under the Apache License 2.0.

Citation

If you find our paper and code useful in your research, please consider giving the project a star :star: and citing the paper :pencil:

BibTeX
@article{qin2025ui,
  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
  journal={arXiv preprint arXiv:2501.12326},
  year={2025}
}