website/docs/mcp/faq.md
Frequently asked questions about WebdriverIO MCP.
MCP (Model Context Protocol) is an open protocol that enables AI assistants like Claude to interact with external tools and services. WebdriverIO MCP implements this protocol to provide browser and mobile automation capabilities to Claude Desktop and Claude Code.
You can automate:
No! That's the main benefit of MCP. You can describe what you want to do in natural language, and Claude will use the appropriate tools to accomplish the task.
Example prompts:
You don't need to install it separately. The MCP server runs automatically via npx when you configure it in your harness. Add this to your config:
{
  "mcpServers": {
    "wdio-mcp": {
      "command": "npx",
      "args": ["-y", "@wdio/mcp"]
    }
  }
}
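If the config file already lists other MCP servers, the new entry has to be merged in rather than pasted over them. The sketch below shows one way to do that merge on an in-memory object. The `addWdioMcp` helper, the `HarnessConfig` type, and the `other` server entry are hypothetical names used only for illustration; only the `wdio-mcp` entry itself comes from the snippet above.

```typescript
// Shape of the config file, limited to the keys shown above.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface HarnessConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Hypothetical helper: returns a copy of `config` with the wdio-mcp entry added.
// Existing servers and unrelated top-level keys are preserved.
function addWdioMcp(config: HarnessConfig): HarnessConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      "wdio-mcp": { command: "npx", args: ["-y", "@wdio/mcp"] },
    },
  };
}

// Example: a config that already contains another (made-up) server.
const existing: HarnessConfig = {
  mcpServers: { other: { command: "node", args: ["server.js"] } },
};
const merged = addWdioMcp(existing);
console.log(JSON.stringify(merged, null, 2));
```

The helper copies instead of mutating, so the original object can still be written back unchanged if something goes wrong.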
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json

No. Browser automation only requires the target browser to be installed. WebdriverIO handles driver management automatically.
Yes. Mobile automation requires:
- An Appium server (npm install -g appium && appium)
- Appium drivers (appium driver install xcuitest for iOS, appium driver install uiautomator2 for Android)

Chrome, Firefox, Edge, and Safari are all supported. Use the browser parameter in start_session:
"Start a Firefox session"
"Start Chrome in headless mode"
Yes. Headless is the default (headless: true). Ask Claude to run headed if you want to see the browser:
"Start Chrome in headed mode (not headless)"
Yes. You can specify dimensions when starting the browser:
"Start Chrome with a window size of 1920x1080"
Supported dimensions: 400–3840 pixels wide, 400–2160 pixels tall. Default is 1920×1080.
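As a rough illustration of those limits, the following sketch validates a requested window size and falls back to the default. `resolveWindowSize` is a hypothetical helper, not part of the MCP server, and rejecting out-of-range values (rather than clamping them) is an assumption; the answer above does not say how the server reacts.

```typescript
interface WindowSize {
  width: number;
  height: number;
}

// Limits and default taken from the answer above.
const WIDTH_RANGE = { min: 400, max: 3840 };
const HEIGHT_RANGE = { min: 400, max: 2160 };
const DEFAULT_SIZE: WindowSize = { width: 1920, height: 1080 };

// Hypothetical helper: fills in defaults and rejects out-of-range values.
function resolveWindowSize(width?: number, height?: number): WindowSize {
  const size: WindowSize = {
    width: width ?? DEFAULT_SIZE.width,
    height: height ?? DEFAULT_SIZE.height,
  };
  if (size.width < WIDTH_RANGE.min || size.width > WIDTH_RANGE.max) {
    throw new RangeError(`width must be ${WIDTH_RANGE.min}-${WIDTH_RANGE.max}, got ${size.width}`);
  }
  if (size.height < HEIGHT_RANGE.min || size.height > HEIGHT_RANGE.max) {
    throw new RangeError(`height must be ${HEIGHT_RANGE.min}-${HEIGHT_RANGE.max}, got ${size.height}`);
  }
  return size;
}

console.log(resolveWindowSize());          // uses the default size
console.log(resolveWindowSize(1280, 720)); // explicit size within range
```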
Yes! Use the navigationUrl parameter:
"Start Chrome and navigate to https://webdriver.io"
This is more efficient than starting the browser and then navigating separately.
Simply ask:
"Take a screenshot of the current page"
Screenshots are automatically optimized:
Yes. Use the switch_frame tool to switch into an iframe by CSS or XPath selector. All subsequent click_element, set_value, and get_elements calls operate within the switched frame. Omit the selector to switch back to the top-level frame. Iframes must be from the same origin as the main page.
Yes! Use the execute_script tool:
"Execute script to get the page title"
"Execute script: return document.querySelectorAll('button').length"
Yes. Use launch_chrome first (opens Chrome with remote debugging), then start_session with attach: true.
"Launch Chrome with remote debugging, then attach to it"
Yes. Use get_tabs to list open tabs and switch_tab to focus a specific one:
"Get all open tabs"
"Switch to the tab at index 1"
Use start_session with the appropriate platform:
"Start my iOS app located at /path/to/MyApp.app on the iPhone 15 simulator"
"Start my Android app at /path/to/app.apk on the Pixel 7 emulator"
Or for an already-installed app:
"Start the app with noReset enabled on the iPhone 15 simulator"
Yes! For real devices, you'll need the device UDID:
- Android: run adb devices in terminal

Then ask:
"Start my iOS app on the real device with UDID abc123..."
By default, permissions are automatically granted (autoGrantPermissions: true). If you need to test permission flows, you can disable this:
"Start my app without automatically granting permissions"
- Tap (tap_element)
- Swipe (swipe)
- Drag and drop (drag_and_drop)

Note: long_press is available through execute_script with Appium mobile commands.
Use swipe gestures:
"Swipe up to scroll down"
"Swipe down to scroll up"
Yes:
"Rotate the device to landscape"
"Rotate the device to portrait"
For apps with webviews, you can switch contexts:
"Get available contexts"
"Switch to the webview context"
"Switch back to native context"
Yes! Use the execute_script tool:
Execute script "mobile: pressKey" with args [{ keycode: 4 }] // Press BACK on Android
Execute script "mobile: activateApp" with args [{ bundleId: "com.example.app" }]
Execute script "mobile: terminateApp" with args [{ bundleId: "com.example.app" }]
It uses the wdio://session/current/elements resource or get_elements tool to identify interactive elements on the page/screen. Each element comes with ready-to-use selectors.
Use pagination to manage large element lists:
"Get the first 20 elements"
"Get elements with offset 20 and limit 20"
The response includes total, showing, and hasMore to help navigate through elements.
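To make the relationship between these fields concrete, here is a minimal sketch of offset/limit pagination over an in-memory list. The field names total, showing, and hasMore come from the answer above; the `paginate` function, the `items` field, and the default limit of 20 are assumptions for illustration and may not match the server's actual response shape.

```typescript
interface Page<T> {
  total: number;    // number of elements overall
  showing: number;  // number of elements in this page
  hasMore: boolean; // true if elements remain after this page
  items: T[];       // hypothetical field holding the page contents
}

// Hypothetical helper: returns one page of `all`, starting at `offset`.
function paginate<T>(all: T[], offset = 0, limit = 20): Page<T> {
  const items = all.slice(offset, offset + limit);
  return {
    total: all.length,
    showing: items.length,
    hasMore: offset + items.length < all.length,
    items,
  };
}

// Example: 45 placeholder elements, fetched 20 at a time.
const elements = Array.from({ length: 45 }, (_, i) => `element-${i}`);
const first = paginate(elements, 0, 20);
const last = paginate(elements, 40, 20);
console.log(first.showing, first.hasMore);
console.log(last.showing, last.hasMore);
```

A caller would keep requesting pages, increasing the offset by the limit each time, until hasMore is false.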
You can be more specific:
- ~loginButton
- id=login_button
- -ios predicate string:label == "Login"

The accessibility tree provides semantic information about page elements (roles, names, states). Use get_accessibility_tree when:
- get_elements doesn't return expected elements

"Get accessibility tree filtered to button and link roles"
No. The MCP server uses a single-session model. Only one browser or app session can be active at a time.
It depends on the session type and settings:
- noReset: false: App terminates
- noReset: true or no appPath: App stays open (session detaches automatically)

Yes! Use the noReset option:
"Start my app with noReset enabled"
This preserves login state, preferences, and other app data.
Detach is useful when you want to manually inspect the state after automation.
Increase the command timeout:
"Start my app with newCommandTimeout of 300 seconds"
Default is 60 seconds. For very long debugging sessions, try 600 seconds.
This means no active session exists. Start a browser or app session first:
"Start Chrome and navigate to google.com"
The element might not be visible or might have a different selector. Try:
- inViewportOnly: false to find off-screen elements

This is the most common issue when starting mobile automation.
- Check that Appium is running: curl http://localhost:4723/status
- Start Appium if it is not: appium
- Check the connection settings (appiumConfig in start_session)
- Check installed drivers: appium driver list --installed

:::tip
The MCP server requires Appium to be running before starting mobile sessions. Make sure to start Appium first:

appium

Future versions may include automatic Appium service management.
:::
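The body returned by the /status endpoint can also be checked programmatically. The sketch below interprets such a body, assuming it follows the WebDriver status convention of `{ "value": { "ready": true, "message": "..." } }`; that shape is an assumption here, and older Appium versions may omit the ready field. `isAppiumReady` is a hypothetical helper, not part of WebdriverIO MCP.

```typescript
// Assumed response shape, following the WebDriver status convention:
// { "value": { "ready": boolean, "message": string } }
// Hypothetical helper: returns true only if the body reports ready === true.
function isAppiumReady(statusBody: string): boolean {
  try {
    const parsed: unknown = JSON.parse(statusBody);
    if (typeof parsed !== "object" || parsed === null) return false;
    const value = (parsed as { value?: unknown }).value;
    if (typeof value !== "object" || value === null) return false;
    return (value as { ready?: unknown }).ready === true;
  } catch {
    // Not JSON at all, e.g. an error page or an empty response.
    return false;
  }
}

// Example bodies (made up for illustration).
console.log(isAppiumReady('{"value":{"ready":true,"message":"ok"}}'));
console.log(isAppiumReady("connection refused"));
```

The string passed in would be whatever `curl http://localhost:4723/status` prints; a failed connection produces no JSON and is treated as not ready.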
- xcode-select --install
- xcrun simctl list devices
- ANDROID_HOME: export ANDROID_HOME=$HOME/Library/Android/sdk
- emulator -list-avds
- emulator -avd <avd-name>
- adb devices

Screenshots are automatically compressed to a maximum of 1MB, so large screenshots will work but may be lower quality.
Mobile automation involves:
Tips for faster automation:
- Use inViewportOnly: true for element detection
- Use pagination (limit) to reduce token usage

The MCP server already optimizes element detection using XML page source parsing (2 HTTP calls vs 600+ for traditional element queries). Additional tips:
- Use inViewportOnly: true to filter off-screen elements
- Keep includeContainers: false (default)
- Use limit and offset for pagination on large screens

Screenshots are automatically optimized:
This optimization reduces processing time and ensures Claude can handle the image.
- Same-origin iframes are accessible via switch_frame; cross-origin iframes are not accessible due to browser security restrictions

WebdriverIO MCP is designed for interactive AI-assisted automation. For production CI/CD testing, consider using WebdriverIO's traditional test runner with full programmatic control.
The MCP server runs locally on your machine. All automation happens through local browser/Appium connections. No data is sent to external servers beyond what you explicitly navigate to.
When using HTTP transport mode (--http), the server defaults to only accepting connections from localhost; use --allowedHosts and --allowedOrigins to control access. See Transport for details.
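To illustrate what a host allowlist of this kind typically does, the sketch below checks an incoming Host header against a list of permitted hostnames. This is not the server's implementation: `hostnameOf`, `isHostAllowed`, and the default list of localhost names are all assumptions chosen for the example.

```typescript
// Extracts the hostname from a Host header value, dropping any port.
function hostnameOf(hostHeader: string): string {
  const h = hostHeader.trim().toLowerCase();
  if (h.startsWith("[")) {
    // Bracketed IPv6 literal such as [::1]:3000
    const end = h.indexOf("]");
    return end === -1 ? h : h.slice(1, end);
  }
  const colon = h.lastIndexOf(":");
  return colon === -1 ? h : h.slice(0, colon);
}

// Assumed default: only loopback names are accepted.
const LOCAL_HOSTS = ["localhost", "127.0.0.1", "::1"];

// Hypothetical check: true if the request's Host header names an allowed host.
function isHostAllowed(hostHeader: string, allowed: string[] = LOCAL_HOSTS): boolean {
  return allowed.map((a) => a.toLowerCase()).includes(hostnameOf(hostHeader));
}

console.log(isHostAllowed("localhost:3000"));
console.log(isHostAllowed("evil.example.com:3000"));
console.log(isHostAllowed("ci.internal:3000", [...LOCAL_HOSTS, "ci.internal"]));
```

Extending the list, as in the last call, corresponds conceptually to passing extra hosts on the command line.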
Claude can see page content and interact with elements, but:
- <input type="password"> fields are masked

Visit the GitHub repository to: