website/docs/mcp/faq.md
Frequently asked questions about WebdriverIO MCP.
MCP (Model Context Protocol) is an open protocol that enables AI assistants like Claude to interact with external tools and services. WebdriverIO MCP implements this protocol to provide browser and mobile automation capabilities to Claude Desktop and Claude Code.
You can automate:
No! That's the main benefit of MCP. You can describe what you want to do in natural language, and Claude will use the appropriate tools to accomplish the task.
Example prompts:
You don't need to install it separately. The MCP server runs automatically via npx when you configure it in Claude Desktop or Claude Code.
Add this to your Claude Desktop config:
{
  "mcpServers": {
    "wdio-mcp": {
      "command": "npx",
      "args": ["-y", "@wdio/mcp"]
    }
  }
}
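If the config file already lists other MCP servers, the `wdio-mcp` entry should be merged in rather than replacing the whole `mcpServers` object. The sketch below shows one way to do that merge in memory. The helper name `withWdioMcp` and the `other` server entry are hypothetical and only serve as illustration; reading and writing the file itself is left out.

```typescript
// Shape of the part of claude_desktop_config.json that matters here.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Hypothetical helper (not part of @wdio/mcp): returns a copy of the
// config with the wdio-mcp entry added, keeping any existing servers.
function withWdioMcp(config: ClaudeDesktopConfig): ClaudeDesktopConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      "wdio-mcp": { command: "npx", args: ["-y", "@wdio/mcp"] },
    },
  };
}

// Example: a config that already has one (made-up) server configured.
const existingConfig: ClaudeDesktopConfig = {
  mcpServers: { other: { command: "node", args: ["server.js"] } },
};
console.log(JSON.stringify(withWdioMcp(existingConfig), null, 2));
```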
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json

No. Browser automation only requires Chrome to be installed. WebdriverIO handles ChromeDriver automatically.
Yes. Mobile automation requires:
- An Appium server, installed and running (npm install -g appium && appium)
- The platform driver (appium driver install xcuitest for iOS, appium driver install uiautomator2 for Android)

Currently, only Chrome is supported. Support for other browsers may be added in future versions.
Yes! Ask Claude to start the browser in headless mode:
"Start Chrome in headless mode"
Claude may also choose this option on its own when appropriate (e.g., in CI/CD contexts).
Yes. You can specify dimensions when starting the browser:
"Start Chrome with a window size of 1920x1080"
Supported dimensions: 400-3840 pixels wide, 400-2160 pixels tall. Default is 1920x1080.
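To make the limits concrete, here is a minimal sketch of how a requested window size could be checked against the documented range and default. The function `resolveWindowSize` is hypothetical and is not the MCP server's actual implementation; only the numeric limits and the default come from this FAQ.

```typescript
interface WindowSize {
  width: number;
  height: number;
}

// Limits and default as quoted in this FAQ.
const MIN_WIDTH = 400;
const MAX_WIDTH = 3840;
const MIN_HEIGHT = 400;
const MAX_HEIGHT = 2160;
const DEFAULT_WINDOW_SIZE: WindowSize = { width: 1920, height: 1080 };

// Hypothetical helper: fills in the default for missing values and
// rejects anything outside the supported range (including NaN).
function resolveWindowSize(requested: Partial<WindowSize> = {}): WindowSize {
  const width = requested.width ?? DEFAULT_WINDOW_SIZE.width;
  const height = requested.height ?? DEFAULT_WINDOW_SIZE.height;
  if (!(width >= MIN_WIDTH && width <= MAX_WIDTH)) {
    throw new RangeError(`width must be between ${MIN_WIDTH} and ${MAX_WIDTH}, got ${width}`);
  }
  if (!(height >= MIN_HEIGHT && height <= MAX_HEIGHT)) {
    throw new RangeError(`height must be between ${MIN_HEIGHT} and ${MAX_HEIGHT}, got ${height}`);
  }
  return { width, height };
}

console.log(resolveWindowSize());                             // falls back to the default size
console.log(resolveWindowSize({ width: 1280, height: 720 })); // within range, returned as-is
```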
Yes! Use the navigationUrl parameter:
"Start Chrome and navigate to https://webdriver.io"
This is more efficient than starting the browser and then navigating separately.
Simply ask Claude:
"Take a screenshot of the current page"
Screenshots are automatically optimized:
Currently, the MCP server operates on the main document. iframe interaction may be added in future versions.
Yes! Use the execute_script tool:
"Execute script to get the page title"
"Execute script: return document.querySelectorAll('button').length"
Ask Claude with the necessary details:
"Start my iOS app located at /path/to/MyApp.app on the iPhone 15 simulator"
Or for an installed app:
"Start the app with noReset enabled on the iPhone 15 simulator"
"Start my Android app at /path/to/app.apk on the Pixel 7 emulator"
Or for an installed app:
"Start the app with noReset enabled on the Pixel 7 emulator"
Yes! For real devices, you'll need the device UDID:
- Android: run adb devices in a terminal

Then ask Claude:
"Start my iOS app on the real device with UDID abc123..."
By default, permissions are automatically granted (autoGrantPermissions: true). If you need to test permission flows, you can disable this:
"Start my app without automatically granting permissions"
Note: long_press is available through execute_script with Appium mobile commands.
Use swipe gestures:
"Swipe up to scroll down"
"Swipe down to scroll up"
Yes:
"Rotate the device to landscape"
"Rotate the device to portrait"
For apps with webviews, you can switch contexts:
"Get available contexts"
"Switch to the webview context"
"Switch back to native context"
Yes! Use the execute_script tool:
Execute script "mobile: pressKey" with args [{ keycode: 4 }] // Press BACK on Android
Execute script "mobile: activateApp" with args [{ appId: "com.example.app" }]
Execute script "mobile: terminateApp" with args [{ bundleId: "com.example.app" }]
Claude uses the get_visible_elements tool to identify interactive elements on the page/screen. Each element comes with multiple selector strategies.
Use pagination to manage large element lists:
"Get the first 20 visible elements"
"Get visible elements with offset 20 and limit 20"
The response includes total, showing, and hasMore to help navigate through elements.
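The relationship between offset, limit, and the three response fields can be sketched as follows. This is an illustration of the idea, not the server's code: `paginateElements` and the `elements` field name are assumptions, while `total`, `showing`, and `hasMore` are the field names mentioned above.

```typescript
interface ElementPage<T> {
  total: number;    // number of elements available overall
  showing: number;  // number of elements in this page
  hasMore: boolean; // true if further elements follow this page
  elements: T[];    // assumed field name for the page contents
}

// Hypothetical helper: slices one page out of the full element list.
function paginateElements<T>(all: T[], offset = 0, limit = 20): ElementPage<T> {
  const elements = all.slice(offset, offset + limit);
  return {
    total: all.length,
    showing: elements.length,
    hasMore: offset + elements.length < all.length,
    elements,
  };
}

// Walk through 45 made-up selectors, 20 at a time.
const sampleSelectors = Array.from({ length: 45 }, (_, i) => `#item-${i}`);
let pageOffset = 0;
while (true) {
  const page = paginateElements(sampleSelectors, pageOffset, 20);
  console.log(`offset ${pageOffset}: showing ${page.showing} of ${page.total}`);
  if (!page.hasMore) break;
  pageOffset += page.showing;
}
```

A client keeps requesting the next offset for as long as `hasMore` is true, which matches the "offset 20 and limit 20" prompt shown earlier.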
Yes! Use the elementType parameter:
- interactable (default): Buttons, links, inputs
- visual: Images, SVGs
- all: Both types

"Get visible visual elements on the page"
You can be more specific:
- ~loginButton
- id=login_button
- -ios predicate string:label == "Login"

The accessibility tree provides semantic information about page elements (roles, names, states). Use get_accessibility when:
- get_visible_elements doesn't return expected elements

"Get accessibility tree filtered to button and link roles"
No. The MCP server uses a single-session model. Only one browser or app session can be active at a time.
It depends on the session type and settings:
- noReset: false: App terminates
- noReset: true or no appPath: App stays open (session detaches automatically)

Yes! Use the noReset option:
"Start my app with noReset enabled"
This preserves login state, preferences, and other app data.
Detach is useful when you want to manually inspect the state after automation.
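The close behavior for app sessions described earlier (terminate with noReset: false, detach with noReset: true or no appPath) can be summarized as a small decision function. This is a sketch of the documented rule only; `closeOutcome` and its types are hypothetical, and treating an unset noReset as false is an assumption.

```typescript
interface AppSessionOptions {
  appPath?: string;
  noReset?: boolean;
}

type CloseOutcome = "terminate" | "detach";

// Hypothetical helper that encodes the rule from this FAQ.
// Assumption: an unset noReset is treated like noReset: false.
function closeOutcome(options: AppSessionOptions): CloseOutcome {
  const keepsAppOpen = options.noReset === true || options.appPath === undefined;
  return keepsAppOpen ? "detach" : "terminate";
}

console.log(closeOutcome({ appPath: "/path/to/app.apk", noReset: false })); // "terminate"
console.log(closeOutcome({ appPath: "/path/to/app.apk", noReset: true }));  // "detach"
console.log(closeOutcome({ noReset: false }));                              // "detach" (no appPath)
```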
Increase the command timeout:
"Start my app with newCommandTimeout of 300 seconds"
Default is 60 seconds. For long debugging sessions, try 300-600 seconds.
This means no active session exists. Start a browser or app session first:
"Start Chrome and navigate to google.com"
The element might not be visible or might have a different selector. Try:
- Setting inViewportOnly: false to find off-screen elements

This is the most common issue when starting mobile automation.
- Check whether Appium is running: curl http://localhost:4723/status
- Start Appium if it is not: appium
- Verify that the drivers are installed: appium driver list --installed

:::tip
The MCP server requires Appium to be running before starting mobile sessions. Make sure to start Appium first:

appium

Future versions may include automatic Appium service management.
:::
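When scripting the status check, the body returned by the status endpoint can be inspected instead of eyeballing the curl output. The sketch below assumes the response is JSON with a `value.ready` boolean, which is how recent Appium versions typically respond; treat that shape as an assumption and verify it against your Appium version. The helper `isAppiumReady` is hypothetical.

```typescript
// Hypothetical helper: returns true only if the status body parses as
// JSON and reports value.ready === true (assumed response shape).
function isAppiumReady(statusBody: string): boolean {
  try {
    const parsed: unknown = JSON.parse(statusBody);
    if (typeof parsed !== "object" || parsed === null) return false;
    const value = (parsed as { value?: unknown }).value;
    if (typeof value !== "object" || value === null) return false;
    return (value as { ready?: unknown }).ready === true;
  } catch {
    // Not JSON at all, e.g. an empty body or a connection error message.
    return false;
  }
}

// Example with a made-up response body in the assumed shape.
const sampleStatusBody = '{"value":{"ready":true,"message":"ready"}}';
console.log(isAppiumReady(sampleStatusBody));
```

The string passed in would be the output of the curl command above; a refused connection yields no JSON and is treated as "not ready".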
- xcode-select --install
- xcrun simctl list devices
- Set ANDROID_HOME: export ANDROID_HOME=$HOME/Library/Android/sdk
- emulator -list-avds
- emulator -avd <avd-name>
- adb devices

Screenshots are automatically compressed to a maximum of 1 MB, so large screenshots will work but may be lower quality.
Mobile automation involves:
Tips for faster automation:
- Use inViewportOnly: true for element detection
- Use pagination (limit) to reduce token usage

The MCP server already optimizes element detection using XML page source parsing (2 HTTP calls vs 600+ for traditional element queries). Additional tips:
- Keep inViewportOnly: true (default)
- Keep includeContainers: false (default)
- Use limit and offset for pagination on large screens

Screenshots are automatically optimized:
This optimization reduces processing time and ensures Claude can handle the image.
WebdriverIO MCP is designed for interactive AI-assisted automation. For production CI/CD testing, consider using WebdriverIO's traditional test runner with full programmatic control.
The MCP server runs locally on your machine. All automation happens through local browser/Appium connections. No data is sent to external servers beyond what you explicitly navigate to.
Claude can see page content and interact with elements, but:
- <input type="password"> fields are masked

Visit the GitHub repository to: