docs/users/features/arena.md
Dispatch multiple AI models simultaneously to execute the same task, compare their solutions side-by-side, and select the best result to apply to your workspace.
> [!WARNING]
> Agent Arena is experimental. It has known limitations around display modes and session management.
Agent Arena lets you pit multiple AI models against each other on the same task. Each model runs as a fully independent agent in its own isolated Git worktree, so file operations never interfere. When all agents finish, you compare results and select a winner to merge back into your main workspace.
Unlike subagents, which delegate focused subtasks within a single session, Arena agents are complete, top-level agent instances — each with its own model, context window, and full tool access.
This page covers:

- What Agent Arena is and when to use it
- Launching an arena session and switching between agents
- Comparing results and selecting a winner
- Configuration, best practices, and troubleshooting
Agent Arena is most effective when you want to evaluate or compare how different models tackle the same problem.
Agent Arena uses significantly more tokens than a single session (each agent has its own context window and model calls). It works best when the value of comparison justifies the cost. For routine tasks where you trust your default model, a single session is more efficient.
Use the `/arena` slash command to launch a session. Specify the models you want to compete and the task:

```shell
/arena --models qwen3.5-plus,glm-5,kimi-k2.5 "Refactor the authentication module to use JWT tokens"
```

If you omit `--models`, an interactive model selection dialog appears, letting you pick from your configured providers.
Each agent's worktree is created under `~/.qwen/arena/<session-id>/worktrees/<model-name>/`. Each worktree mirrors your current working directory state exactly — including staged changes, unstaged changes, and untracked files.

Agent Arena currently supports in-process mode, where all agents run asynchronously within the same terminal process. A tab bar at the bottom of the terminal lets you switch between agents.
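The isolation described above is built on standard Git worktrees. The following self-contained sketch illustrates the underlying Git mechanism only (it is not Arena's internal code; the repository and "agent" names are fabricated for the demo): two worktrees of one repository can be modified independently without touching each other or the main checkout.

```shell
set -e
# Demo of the isolation mechanism behind Arena: one repository,
# one extra worktree per "agent", each on its own branch.
tmp=$(mktemp -d)
cd "$tmp"
git init -q main
cd main
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"
# One worktree per hypothetical agent:
git worktree add -q -b agent-a ../agent-a
git worktree add -q -b agent-b ../agent-b
echo "solution A" > ../agent-a/result.txt
echo "solution B" > ../agent-b/result.txt
# Each worktree sees only its own file; the main checkout sees neither.
cat ../agent-a/result.txt
cat ../agent-b/result.txt
```

Because every worktree is a full checkout with its own index, file operations by one agent can never clobber another agent's work in progress.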
> [!NOTE]
> Split-pane display modes are planned for the future. We intend to support tmux-based and iTerm2-based split-pane layouts, where each agent gets its own terminal pane for true side-by-side viewing. Currently, only in-process tab switching is available.
In in-process mode, use keyboard shortcuts to switch between agent views:
| Shortcut | Action |
|---|---|
| Right | Switch to the next agent tab |
| Left | Switch to the previous agent tab |
| Up | Switch focus to the input box |
| Down | Switch focus to the agent tab bar |
The tab bar shows each agent's current status:
| Indicator | Meaning |
|---|---|
| ● | Running or idle |
| ✓ | Completed successfully |
| ✗ | Failed |
| ○ | Cancelled |
When viewing an agent's tab, you can interact with it directly. Each agent is a full, independent session: anything you can do with the main agent, you can do with an arena agent.
When all agents complete, the Arena enters the result comparison phase. A selection dialog presents the successful agents: choose one to apply its changes to your main workspace, or discard all results. Press `p` to toggle a quick preview for the highlighted agent, or `d` to toggle that agent's detailed diff before selecting a winner.
If you want to inspect the complete reasoning path before deciding, each agent's full conversation history is still available via the tab bar while the selection dialog is active.
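Conceptually, applying a winner amounts to transferring that agent's worktree changes into your main checkout; Arena performs this step for you. A minimal sketch of the idea (illustrative only; the repository, file, and branch names here are made up for the demo):

```shell
set -e
# Sketch: capture a "winning" worktree's uncommitted changes as a
# patch and replay them in the main checkout.
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
echo "v1" > app.txt
git add app.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm "init"
git worktree add -q -b winner ../winner
echo "v2" > ../winner/app.txt         # the winning agent's edit
git -C ../winner diff > winner.patch  # capture its changes
git apply winner.patch                # apply them to the main workspace
cat app.txt
```

Discarding all results simply leaves the main checkout untouched, since no agent ever wrote to it directly.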
Arena behavior can be customized in `settings.json`:

```json
{
  "arena": {
    "worktreeBaseDir": "~/.qwen/arena",
    "maxRoundsPerAgent": 50,
    "timeoutSeconds": 600
  }
}
```
| Setting | Description | Default |
|---|---|---|
| `arena.worktreeBaseDir` | Base directory for arena worktrees | `~/.qwen/arena` |
| `arena.maxRoundsPerAgent` | Maximum reasoning rounds per agent | `50` |
| `arena.timeoutSeconds` | Timeout for each agent, in seconds | `600` |
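You typically only need to override the keys you want to change. For example, assuming nested settings are merged per-key (worth verifying in your version), a sketch that raises only the timeout for long-running tasks while keeping the other defaults:

```json
{
  "arena": {
    "timeoutSeconds": 1800
  }
}
```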
Arena is most valuable when you compare models with meaningfully different strengths. For example:
```shell
/arena --models qwen3.5-plus,glm-5,kimi-k2.5 "Optimize the database query layer"
```
Comparing three versions of the same model family yields less insight than comparing across providers.
Arena agents work independently with no communication. Tasks should be fully describable in the prompt without requiring back-and-forth:
- **Good:** "Refactor the payment module to use the strategy pattern. Update all tests."
- **Less effective:** "Let's discuss how to improve the payment module" — this benefits from conversation, which is better suited to a single session.
Up to 5 agents can run simultaneously. In practice, 2-3 agents provide the best balance of comparison value to resource cost. Each additional agent adds token usage and another result to review, with diminishing returns from the comparison.
Start with 2-3 and scale up only when the comparison value justifies it.
Arena shines when the stakes of the task justify the cost of running multiple models.
For routine changes like renaming a variable or updating a config file, a single session is faster and cheaper.
If an Arena session fails to start, work through the following checks:

- Confirm that each model passed to `--models` is properly configured with valid API credentials.
- Confirm the worktree base directory is writable (`~/.qwen/arena/` by default).
- Run `git worktree list` to check for stale worktrees from previous sessions, and clean them up with `git worktree prune`.
- Confirm your Git installation supports worktrees (`git --version`; requires Git 2.5+).

If agents run out of time, increase `arena.timeoutSeconds` in settings. Adjust `arena.maxRoundsPerAgent` if agents are spending too many rounds.

Agent Arena is experimental. Current limitations:

- Stale worktrees from interrupted sessions are not always cleaned up automatically; remove them with `git worktree prune`.

Agent Arena is one of several planned multi-agent modes in Qwen Code. Agent Team and Agent Swarm are not yet implemented — the table below describes their intended design for reference.
| | Agent Arena | Agent Team (planned) | Agent Swarm (planned) |
|---|---|---|---|
| Goal | Competitive: Find the best solution to the same task | Collaborative: Tackle different aspects together | Batch parallel: Dynamically spawn workers for bulk tasks |
| Agents | Pre-configured models compete independently | Teammates collaborate with assigned roles | Workers spawned on-the-fly, destroyed on completion |
| Communication | No inter-agent communication | Direct peer-to-peer messaging | One-way: results aggregated by parent |
| Isolation | Full: separate Git worktrees | Independent sessions with shared task list | Lightweight ephemeral context per worker |
| Output | One selected solution applied to workspace | Synthesized results from multiple perspectives | Aggregated results from parallel processing |
| Best for | Benchmarking, choosing between model approaches | Research, complex collaboration, cross-layer work | Batch operations, data processing, map-reduce tasks |
Explore related approaches for parallel and delegated work: