English | 简体中文
<b>TARS<sup>*</sup></b> is a Multimodal AI Agent stack, currently shipping two projects: Agent TARS and UI-TARS-desktop:
<table> <thead> <tr> <th width="50%" align="center"><a href="#agent-tars">Agent TARS</a></th> <th width="50%" align="center"><a href="#ui-tars-desktop">UI-TARS-desktop</a></th> </tr> </thead> <tbody> <tr> <td align="center"> <video src="https://github.com/user-attachments/assets/c9489936-afdc-4d12-adda-d4b90d2a869d" width="50%"></video> </td> <td align="center"> <video src="https://github.com/user-attachments/assets/e0914ce9-ad33-494b-bdec-0c25c1b01a27" width="50%"></video> </td> </tr> <tr> <td align="left"> <b>Agent TARS</b> is a general multimodal AI Agent stack that brings the power of GUI Agent and Vision into your terminal, computer, browser and product. It primarily ships with a <a href="https://agent-tars.com/guide/basic/cli.html" target="_blank">CLI</a> and a <a href="https://agent-tars.com/guide/basic/web-ui.html" target="_blank">Web UI</a>.
It aims to provide a workflow that is closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with various real-world <a href="https://agent-tars.com/guide/basic/mcp.html" target="_blank">MCP</a> tools.
</td>
<td align="left">
<b>UI-TARS Desktop</b> is a desktop application that provides a native GUI Agent based on the <a href="https://github.com/bytedance/UI-TARS" target="_blank">UI-TARS</a> model.
It primarily ships
<a href="https://github.com/bytedance/UI-TARS-desktop/blob/main/docs/quick-start.md#get-model-and-run-local-operator" target="_blank">local</a> and
<a href="https://github.com/bytedance/UI-TARS-desktop/blob/main/docs/quick-start.md#run-remote-operator" target="_blank">remote</a> operators for both the computer and the browser.
</td>
</tr>
</tbody>
</table>
<b>Agent TARS</b> is a general multimodal AI Agent stack that brings the power of GUI Agent and Vision into your terminal, computer, browser and product.
It primarily ships with a <a href="https://agent-tars.com/guide/basic/cli.html" target="_blank">CLI</a> and a <a href="https://agent-tars.com/guide/basic/web-ui.html" target="_blank">Web UI</a>. It aims to provide a workflow that is closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with various real-world <a href="https://agent-tars.com/guide/basic/mcp.html" target="_blank">MCP</a> tools.
Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline
https://github.com/user-attachments/assets/772b0eef-aef7-4ab9-8cb0-9611820539d8
<table> <thead> <tr> <th width="50%" align="center">Booking Hotel</th> <th width="50%" align="center">Generate Chart with extra MCP Servers</th> </tr> </thead> <tbody> <tr> <td align="center"> <video src="https://github.com/user-attachments/assets/c9489936-afdc-4d12-adda-d4b90d2a869d" width="50%"></video> </td> <td align="center"> <video src="https://github.com/user-attachments/assets/a9fd72d0-01bb-4233-aa27-ca95194bbce9" width="50%"></video> </td> </tr> <tr> <td align="left"> <b>Instruction:</b> <i>I am in Los Angeles from September 1st to September 6th, with a budget of $5,000. Please help me book a Ritz-Carlton hotel closest to the airport on booking.com and compile a transportation guide for me</i> </td> <td align="left"> <b>Instruction:</b> <i>Draw me a chart of Hangzhou's weather for one month</i> </td> </tr> </tbody> </table>

For more use cases, please check out #842.
# Launch with `npx`.
npx @agent-tars/cli@latest
# Install globally (requires Node.js >= 22)
npm install @agent-tars/cli@latest -g
# Run with your preferred model provider
agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key
Visit the comprehensive Quick Start guide for detailed setup instructions.
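The commands above can be wrapped in a small pre-flight check. The following is a minimal sketch, not part of the Agent TARS CLI: the function names and the `AGENT_TARS_API_KEY` environment variable are hypothetical and chosen only for illustration. It uses only the flags shown above, verifies the Node.js >= 22 requirement, and keeps the API key out of the typed command line.

```shell
#!/bin/sh
# Illustrative helpers; none of these names are defined by the Agent TARS CLI.

# Print the major version from a string such as "v22.3.0".
node_major() {
  version="${1#v}"               # drop the leading "v"
  printf '%s\n' "${version%%.*}" # keep everything before the first dot
}

# Succeed only when the given version satisfies the Node.js >= 22 requirement.
node_is_supported() {
  [ "$(node_major "$1")" -ge 22 ]
}

# Launch Agent TARS with the key read from AGENT_TARS_API_KEY
# (a hypothetical variable name; export it yourself before calling).
launch_agent_tars() {
  if ! node_is_supported "$(node --version)"; then
    echo "Node.js >= 22 is required" >&2
    return 1
  fi
  if [ -z "${AGENT_TARS_API_KEY:-}" ]; then
    echo "AGENT_TARS_API_KEY is not set" >&2
    return 1
  fi
  agent-tars --provider anthropic --model claude-3-7-sonnet-latest \
    --apiKey "$AGENT_TARS_API_KEY"
}
```

After sourcing the file and exporting the variable, calling `launch_agent_tars` runs the same Anthropic command as above. Swap the `--provider` and `--model` values for any other combination listed in this section.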
<table> <thead> <tr> <th width="20%" align="center">Category</th> <th width="30%" align="center">Resource Link</th> <th width="50%" align="left">Description</th> </tr> </thead> <tbody> <tr> <td align="center">🏠 <strong>Central Hub</strong></td> <td align="center"> <a href="https://agent-tars.com">🌟 Explore Agent TARS Universe 🌟
</a>
</td>
<td align="left">Your gateway to the Agent TARS ecosystem</td>
</tr>
<tr>
<td align="center">📚 <strong>Quick Start</strong></td>
<td align="center">
<a href="https://agent-tars.com/guide/get-started/quick-start.html">Quick Start</a>
</td>
<td align="left">Zero to hero in 5 minutes</td>
</tr>
<tr>
<td align="center">🚀 <strong>What's New</strong></td>
<td align="center">
<a href="https://agent-tars.com/beta">What's New</a>
</td>
<td align="left">Discover cutting-edge features & vision</td>
</tr>
<tr>
<td align="center">🛠️ <strong>Developer Zone</strong></td>
<td align="center">
<a href="https://agent-tars.com/guide/get-started/introduction.html">Developer Zone</a>
</td>
<td align="left">Master every command and feature</td>
</tr>
<tr>
<td align="center">🎯 <strong>Showcase</strong></td>
<td align="center">
<a href="https://github.com/bytedance/UI-TARS-desktop/issues/842">Showcase</a>
</td>
<td align="left">View use cases built by the official team and the community</td>
</tr>
<tr>
<td align="center">🔧 <strong>Reference</strong></td>
<td align="center">
<a href="https://agent-tars.com/api/">Reference</a>
</td>
<td align="left">Complete technical reference</td>
</tr>
</tbody>
</table>
UI-TARS Desktop is a native GUI agent for your local computer, driven by UI-TARS and Seed-1.5-VL/1.6 series models.
<div align="center">
<p>
📑 <a href="https://arxiv.org/abs/2501.12326">Paper</a> |
🤗 <a href="https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B">Hugging Face Models</a> |
🫨 <a href="https://discord.gg/pTXwYVjfcs">Discord</a> |
🤖 <a href="https://www.modelscope.cn/collections/UI-TARS-bccb56fa1ef640">ModelScope</a>
<br>
🖥️ Desktop Application |
👓 <a href="https://github.com/web-infra-dev/midscene">Midscene (use in browser)</a>
</p>
</div>

| Instruction | Local Operator | Remote Operator |
|---|---|---|
| Please help me open the autosave feature of VS Code and delay AutoSave operations for 500 milliseconds in the VS Code setting. | <video src="https://github.com/user-attachments/assets/e0914ce9-ad33-494b-bdec-0c25c1b01a27" height="300" /> | <video src="https://github.com/user-attachments/assets/01e49b69-7070-46c8-b3e3-2aaaaec71800" height="300" /> |
| Could you help me check the latest open issue of the UI-TARS-Desktop project on GitHub? | <video src="https://github.com/user-attachments/assets/3d159f54-d24a-4268-96c0-e149607e9199" height="300" /> | <video src="https://github.com/user-attachments/assets/072fb72d-7394-4bfa-95f5-4736e29f7e58" height="300" /> |
See Quick Start
See CONTRIBUTING.md.
This project is licensed under the Apache License 2.0.
If you find our paper and code useful in your research, please consider giving us a star :star: and a citation :pencil:
@article{qin2025ui,
title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
journal={arXiv preprint arXiv:2501.12326},
year={2025}
}