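WebdriverIO MCP is a Model Context Protocol (MCP) server that enables AI assistants to automate and interact with web browsers and mobile applications.
Through the @wdio/mcp package, it provides a unified interface for local browsers, mobile apps driven by Appium, and cloud providers.
This allows AI assistants to launch sessions, navigate pages, interact with elements, and capture screenshots.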
:::info
NOTE for mobile apps: Mobile automation requires a running Appium server with the appropriate drivers installed. See Prerequisites for setup instructions.
:::
The easiest way to use @wdio/mcp is via npx without any local installation:
npx @wdio/mcp
Or install it globally:
npm install -g @wdio/mcp
To use WebdriverIO MCP with Claude Desktop, add the server to its configuration file:
{
  "mcpServers": {
    "wdio-mcp": {
      "command": "npx",
      "args": ["-y", "@wdio/mcp"]
    }
  }
}
After adding the configuration, restart your harness. The WebdriverIO MCP tools will be available for browser and mobile automation tasks.
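If the configuration file already lists other MCP servers, the new entry should be merged in rather than replacing the whole file. The following is a minimal TypeScript sketch of that merge, not part of @wdio/mcp: the function name `addWdioMcpServer` and the pre-existing `other` server are hypothetical and used only for illustration.

```typescript
// Shape of the "mcpServers" section shown in the JSON above.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpClientConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Hypothetical helper: returns a copy of the config with the wdio-mcp
// entry added, keeping any servers that were already configured.
function addWdioMcpServer(config: McpClientConfig): McpClientConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      "wdio-mcp": { command: "npx", args: ["-y", "@wdio/mcp"] },
    },
  };
}

// Example input with one made-up server already present.
const existingConfig: McpClientConfig = {
  mcpServers: { other: { command: "node", args: ["server.js"] } },
};

console.log(JSON.stringify(addWdioMcpServer(existingConfig), null, 2));
```

The helper does not mutate its input, so the original configuration can still be written back unchanged if something goes wrong.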
Claude Code automatically detects MCP servers. You can configure it in your project's .claude/settings.json or .mcp.json.
Or add it globally to .claude.json by running:
claude mcp add --transport stdio wdio-mcp -- npx -y @wdio/mcp
Validate it by running the /mcp command inside Claude Code.
Ask Claude to automate browser tasks:
"Open Chrome and navigate to https://webdriver.io"
"Click the 'Get Started' button"
"Take a screenshot of the page"
"Find all visible links on the page"
Ask Claude to automate mobile apps:
"Start my iOS app on the iPhone 15 simulator"
"Tap the login button"
"Swipe up to scroll down"
"Take a screenshot of the current screen"
Browser automation:

| Feature | Description |
|---|---|
| Session Management | Launch Chrome, Firefox, Edge, or Safari in headed/headless mode with custom dimensions; attach to an existing Chrome instance via CDP |
| Navigation | Navigate to URLs; manage multiple tabs |
| Element Interaction | Click elements, type text, find elements by various selectors |
| Page Analysis | Get interactable elements (with pagination), accessibility tree (with role filtering) |
| Screenshots | Capture screenshots (auto-optimized to max 1MB) |
| Scrolling | Scroll up/down by configurable pixel amounts |
| Cookie Management | Get, set, and delete cookies |
| Device Emulation | Emulate mobile/tablet viewports in browser (BiDi required) |
| Script Execution | Execute custom JavaScript in browser context |

Mobile app automation:

| Feature | Description |
|---|---|
| Session Management | Launch apps on simulators, emulators, or real devices |
| Touch Gestures | Tap (element or coordinates), swipe, drag and drop |
| Element Detection | Smart element detection with multiple locator strategies and pagination |
| App Lifecycle | Get app state (foreground, background, not running, not installed) |
| Context Switching | Switch between native and webview contexts in hybrid apps |
| Device Control | Rotate device, keyboard control, GPS override |
| Permissions | Automatic permission and alert handling |
| Script Execution | Execute Appium mobile commands (pressKey, deepLink, shell, etc.) |

Cloud providers:

| Feature | Description |
|---|---|
| Browser Sessions | Run browser sessions on BrowserStack, Sauce Labs, TestMu, or TestingBot (Windows, macOS, Linux) |
| Mobile Sessions | Run app sessions on real devices via BrowserStack, Sauce Labs, TestMu, or TestingBot |
| App Management | Upload .apk/.ipa files; list previously uploaded apps across all four providers |
| Local Tunnel | Auto-manage provider-specific tunnel binaries for accessing localhost |
| Reporting | Tag sessions with project/build/session labels (works identically across all providers) |
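# iOS (macOS only): install the Xcode command line tools
xcode-select --install
# Install Appium and the XCUITest driver
npm install -g appium
appium driver install xcuitest
# Start the Appium server
appium
# Android: point ANDROID_HOME at the SDK (default macOS location shown)
export ANDROID_HOME=$HOME/Library/Android/sdk
export PATH=$PATH:$ANDROID_HOME/emulator
export PATH=$PATH:$ANDROID_HOME/platform-tools
# Install Appium and the UiAutomator2 driver
npm install -g appium
appium driver install uiautomator2
# Start the Appium server
appium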
WebdriverIO MCP acts as a bridge between AI assistants and browser/mobile automation:
┌─────────────────┐     MCP Protocol     ┌─────────────────┐
│ Claude Desktop  │ ◄──────────────────► │    @wdio/mcp    │
│ or Claude Code  │   (stdio or HTTP)    │     Server      │
└─────────────────┘                      └────────┬────────┘
                                                  │
                                           WebdriverIO API
                                                  │
                  ┌───────────────────────────────┼───────────────────────────────┐
                  │                               │                               │
          ┌───────▼───────┐               ┌───────▼───────┐               ┌───────▼───────┐
          │    Browser    │               │    Appium     │               │     Cloud     │
          │  (local/CDP)  │               │ (iOS/Android) │               │   Providers   │
          └───────────────┘               └───────────────┘               └───────────────┘
Sessions that preserve state (noReset: true) automatically detach on close.

The MCP server supports multiple selector strategies. See Selectors for detailed documentation.
# CSS Selectors
button.my-class
#element-id
[data-testid="login"]
# XPath
//button[@class='submit']
//a[contains(text(), 'Click')]
# Text Selectors (WebdriverIO specific)
button=Exact Button Text
a*=Partial Link Text
# Accessibility ID (recommended - works on iOS & Android)
~loginButton
# Android UiAutomator
android=new UiSelector().text("Login")
# iOS Predicate String
-ios predicate string:label == "Login"
# iOS Class Chain
-ios class chain:**/XCUIElementTypeButton[`label == "Login"`]
# XPath (works on both platforms)
//android.widget.Button[@text="Login"]
//XCUIElementTypeButton[@label="Login"]
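
To make the notations above easier to tell apart, here is a small TypeScript sketch that classifies a selector string by its prefix. It is an illustration only: `classifySelector` and the strategy labels are hypothetical names, and the matching rules are a simplification based on the examples listed above, not the logic the server or WebdriverIO actually uses.

```typescript
type SelectorStrategy =
  | "accessibility id"
  | "android uiautomator"
  | "ios predicate string"
  | "ios class chain"
  | "xpath"
  | "partial text"
  | "exact text"
  | "css";

// Hypothetical helper: guesses the strategy from the selector's prefix.
function classifySelector(selector: string): SelectorStrategy {
  const s = selector.trim();
  if (s.startsWith("~")) return "accessibility id";
  if (s.startsWith("android=")) return "android uiautomator";
  if (s.startsWith("-ios predicate string:")) return "ios predicate string";
  if (s.startsWith("-ios class chain:")) return "ios class chain";
  // XPath expressions start with a path step or a parenthesised group.
  if (s.startsWith("/") || s.startsWith("(")) return "xpath";
  // Text selectors: optional tag name followed by "*=" (partial) or "=" (exact).
  if (/^(?:[A-Za-z][\w-]*)?\*=/.test(s)) return "partial text";
  if (/^(?:[A-Za-z][\w-]*)?=/.test(s)) return "exact text";
  // Everything else is treated as a CSS selector.
  return "css";
}

for (const example of ["#element-id", "a*=Partial Link Text", "~loginButton"]) {
  console.log(`${example} -> ${classifySelector(example)}`);
}
```

The Android and iOS prefixes are checked before the text-selector patterns because `android=` would otherwise look like a tag name followed by `=`.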
The MCP server provides 29 tools for browser and mobile automation. See Tools for the complete reference.
| Tool | Platform | Description |
|---|---|---|
| start_session | all | Start a browser or mobile session (local or cloud provider) |
| close_session | all | Close or detach from the current session |
| launch_chrome | browser | Open Chrome with remote debugging for CDP attach |
| navigate | browser | Load a URL in the current tab |
| get_tabs | browser | List all open tabs |
| switch_tab | browser | Focus a tab by handle or index |
| switch_frame | browser | Switch into an iframe by selector, or back to top-level |
| click_element | browser | Click an element |
| set_value | all | Type text into an input |
| scroll | browser | Scroll the page up or down |
| get_elements | all | Get interactable elements (with filtering + pagination) |
| get_accessibility_tree | browser | Get accessibility tree (with role filtering) |
| get_screenshot | all | Capture screenshot (auto-optimized) |
| get_cookies | browser | Get all cookies or a specific cookie |
| set_cookie | browser | Set a browser cookie |
| delete_cookies | browser | Delete all or one cookie |
| emulate_device | browser | Emulate a mobile/tablet device viewport |
| execute_script | all | Run JavaScript (browser) or Appium commands (mobile) |
| tap_element | mobile | Tap an element or screen coordinates |
| swipe | mobile | Swipe gesture in a direction |
| drag_and_drop | mobile | Drag between elements or coordinates |
| get_contexts | mobile | List available native/webview contexts |
| switch_context | mobile | Switch between native and webview contexts |
| rotate_device | mobile | Rotate to portrait or landscape |
| hide_keyboard | mobile | Dismiss the software keyboard |
| set_geolocation | all | Override device GPS coordinates |
| get_app_state | mobile | Get app lifecycle state |
| list_apps | cloud | List uploaded apps (BrowserStack, Sauce Labs, TestMu, TestingBot) |
| upload_app | cloud | Upload an .apk/.ipa to a cloud provider |
In addition to tools, the server exposes live session state as MCP resources. See Resources for the complete reference.
| Resource URI | Description |
|---|---|
| wdio://sessions | Index of all sessions |
| wdio://session/current/elements | Interactable elements (prefer over screenshot) |
| wdio://session/current/screenshot | Screenshot as base64 |
| wdio://session/current/accessibility | Accessibility tree |
| wdio://session/current/cookies | Browser cookies |
| wdio://session/current/tabs | Open browser tabs |
| wdio://session/current/contexts | Available mobile contexts |
| wdio://session/current/context | Active mobile context |
| wdio://session/current/app-state/{bundleId} | Mobile app lifecycle state |
| wdio://session/current/geolocation | Current GPS override |
| wdio://session/current/logs | Session logs (browser console, logcat, crashlog) |
| wdio://session/current/capabilities | Raw WebDriver capabilities |
| wdio://session/current/code | Generated WebdriverIO JS |
| wdio://session/current/steps | Session step log |
| wdio://session/{sessionId}/code | Generated JS for past session |
| wdio://session/{sessionId}/steps | Steps for past session |
| wdio://browserstack/local-binary | BrowserStack Local setup instructions |
| wdio://saucelabs/local-binary | Sauce Connect Proxy setup instructions |
| wdio://testmu/local-binary | TestMu Tunnel setup instructions |
| wdio://testingbot/local-binary | TestingBot Tunnel setup instructions |
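
The session resource URIs above follow a `wdio://session/{sessionId}/{resource}` pattern, with `current` standing in for the active session. As a rough sketch of how a client might take such a URI apart, the TypeScript below splits it into its segments. `parseSessionResource` and the `SessionResource` type are hypothetical names for illustration, and `com.example.app` is a made-up bundle ID; nothing here is part of the server's API.

```typescript
interface SessionResource {
  sessionId: string; // "current" or a concrete session id
  resource: string; // e.g. "elements", "code", "app-state"
  parameter?: string; // e.g. the bundle id that follows "app-state/"
}

// Hypothetical helper: returns null for URIs outside wdio://session/...
function parseSessionResource(uri: string): SessionResource | null {
  const prefix = "wdio://session/";
  if (!uri.startsWith(prefix)) return null;
  const [sessionId, resource, ...rest] = uri.slice(prefix.length).split("/");
  if (!sessionId || !resource) return null;
  return rest.length > 0
    ? { sessionId, resource, parameter: rest.join("/") }
    : { sessionId, resource };
}

console.log(parseSessionResource("wdio://session/current/elements"));
console.log(parseSessionResource("wdio://session/current/app-state/com.example.app"));
console.log(parseSessionResource("wdio://sessions")); // not a per-session resource
```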
By default, the MCP server automatically grants app permissions (autoGrantPermissions: true), eliminating the need to manually handle permission dialogs during automation.
System alerts (like "Allow notifications?") are automatically accepted by default (autoAcceptAlerts: true). This can be configured to dismiss instead with autoDismissAlerts: true.
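
The defaults described above can be summarised in a short TypeScript sketch. It only models the three option names mentioned in the text. `resolveDialogHandling` is a hypothetical helper, and treating accept and dismiss as mutually exclusive is an assumption drawn from the word "instead", not confirmed server behaviour.

```typescript
interface DialogHandling {
  autoGrantPermissions: boolean;
  autoAcceptAlerts: boolean;
  autoDismissAlerts: boolean;
}

// Hypothetical helper: applies the documented defaults to partial overrides.
function resolveDialogHandling(overrides: Partial<DialogHandling> = {}): DialogHandling {
  const dismiss = overrides.autoDismissAlerts ?? false;
  return {
    // Documented default: permissions are granted automatically.
    autoGrantPermissions: overrides.autoGrantPermissions ?? true,
    // Assumption: choosing to dismiss alerts turns auto-accept off.
    autoAcceptAlerts: dismiss ? false : (overrides.autoAcceptAlerts ?? true),
    autoDismissAlerts: dismiss,
  };
}

console.log(resolveDialogHandling());
console.log(resolveDialogHandling({ autoDismissAlerts: true }));
```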
By default, the server runs over stdio (launched as a subprocess by the AI client). For clients that don't support subprocess-based MCP (llama.cpp, Codex secure mode), use HTTP transport:
npx @wdio/mcp --http --port 3000
See Transport for full options including --allowedHosts and --allowedOrigins.
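
For a client that launches the server itself, the two transports differ only in the arguments passed to npx. The TypeScript sketch below assembles those arguments from the flags shown above (`--http` and `--port`). `buildWdioMcpArgs` and `TransportOptions` are hypothetical names, and emitting `--port` only together with `--http` is an assumption of this sketch.

```typescript
interface TransportOptions {
  http?: boolean; // false or omitted means the default stdio transport
  port?: number;
}

// Hypothetical helper: builds the argument list that follows "npx".
function buildWdioMcpArgs(options: TransportOptions = {}): string[] {
  const args = ["-y", "@wdio/mcp"];
  if (options.http) {
    args.push("--http");
    // Assumption: a port is only meaningful for the HTTP transport.
    if (options.port !== undefined) args.push("--port", String(options.port));
  }
  return args;
}

console.log(["npx", ...buildWdioMcpArgs()].join(" "));
console.log(["npx", ...buildWdioMcpArgs({ http: true, port: 3000 })].join(" "));
```

Keeping the arguments as an array rather than a single string avoids shell quoting issues if the list is later handed to a process-spawning API.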
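The MCP server is optimized for efficient AI assistant communication.

All tools are designed with robust error handling.

If mobile automation fails, check the following:

- The Appium server is running (appium)
- The appiumConfig settings match the running Appium server
- The required drivers are installed (appium driver list)
- iOS simulators are available (xcrun simctl list devices)
- Android devices or emulators are connected (adb devices)
- The ANDROID_HOME environment variable is set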