docs/DRIVER.md
Internal implementation document for browser automation in Karate. WebSocket infrastructure:
io.karatelabs.http package (WsClient, WsFrame, CdpClient)
| Phase | Description | Status |
|---|---|---|
| 1-8 | CDP Driver (WebSocket + launch + elements + frames + intercept) | ✅ Complete |
| 9a | Gherkin/DSL Integration | ✅ Complete |
| 9b | Gherkin E2E Tests | ✅ Complete |
| 9c | PooledDriverProvider (browser reuse) | ✅ Complete |
| 10 | Playwright Backend | ⬜ Not started |
| 11 | W3C WebDriver Backend | ✅ Complete (100% pass, separate w3c Maven profile + CI job) |
| 12 | WebDriver BiDi (Future) | ⬜ Not started |
| 13 | Cloud Provider Integration | ⬜ Not started |
Deferred: Capabilities query API, Video recording (→ commercial app), karate-robot
io.karatelabs.driver/
├── Driver, Element, Locators, Finder # Backend-agnostic API
├── DriverOptions, DriverException # Configuration and errors
├── Mouse, Keys, Dialog # Input interfaces
├── DialogHandler, InterceptHandler # Functional interfaces
├── InterceptRequest, InterceptResponse # Data classes
├── DriverProvider, PooledDriverProvider # Lifecycle management
├── PageLoadStrategy # Enum
└── cdp/ # CDP implementation
    ├── CdpDriver, CdpMouse, CdpKeys, CdpDialog
    ├── CdpClient, CdpMessage, CdpEvent, CdpResponse
    └── CdpLauncher, CdpDriverOptions
| Aspect | V2 Approach |
|---|---|
| Gherkin DSL | Drop-in compatible - same keywords (click(), html(), etc.) |
| Java API | Clean break - redesigned, not constrained by v1 quirks |
| `Target` interface | Replaced by `DriverProvider` (simpler, more flexible) |
| `@AutoDef`, Plugin | Removed |
| `getDialogText()` polling | Replaced by `onDialog(handler)` callback |
| `showDriverLog` | No effect (TODO: implement driver log forwarding) |
Gherkin Syntax (unchanged from v1):
* configure driver = { type: 'chrome', headless: true }
* driver serverUrl + '/login'
* input('#username', 'admin')
* click('button[type=submit]')
* waitFor('#dashboard')
* match driver.title == 'Welcome'
Navigation:
| V1 Gherkin | V2 Status |
|---|---|
| `driver 'url'` | ✅ Working |
| `driver.url` | ✅ Working |
| `driver.title` | ✅ Working |
| `refresh()` | ✅ Working |
| `reload()` | ✅ Working |
| `back()` | ✅ Working |
| `forward()` | ✅ Working |
Element Actions:
| V1 Gherkin | V2 Status |
|---|---|
| `click(locator)` | ✅ Working |
| `input(locator, value)` | ✅ Working |
| `input(locator, ['a', Key.ENTER])` | ✅ Working |
| `focus(locator)` | ✅ Working |
| `clear(locator)` | ✅ Working |
| `value(locator, value)` | ✅ Working |
| `select(locator, text)` | ✅ Working |
| `scroll(locator)` | ✅ Working |
| `highlight(locator)` | ✅ Working |
Element State:
| V1 Gherkin | V2 Status |
|---|---|
| `html(locator)` | ✅ Working |
| `text(locator)` | ✅ Working |
| `value(locator)` | ✅ Working |
| `attribute(locator, name)` | ✅ Working |
| `enabled(locator)` | ✅ Working |
| `exists(locator)` | ✅ Working |
| `position(locator)` | ✅ Working |
Element Navigation:
V2 drops v1's parent, children, firstChild, lastChild, previousSibling, nextSibling element accessors by design, in favor of a lean selector-based surface that mirrors the native W3C DOM Element API. Hop-counting patterns like e.parent.parent are fragile under markup changes; selectors are not.
| API | V2 Status |
|---|---|
| `element.closest(selector)` | ✅ Working: nearest ancestor (or self) matching CSS selector |
| `element.matches(selector)` | ✅ Working: boolean "does this element match the selector" |
| `element.locate(childSelector)` | ✅ Working: scoped single match within this element (already available) |
| `element.locateAll(childSelector)` | ✅ Working: scoped multi-match within this element (already available) |
| `element.parent`, `.children`, `.firstChild`, `.lastChild`, `.previousSibling`, `.nextSibling` | ❌ Removed: use `closest` / scoped `locateAll`, or drop into `element.script()` for arbitrary DOM walks |
Wait Methods:
| V1 Gherkin | V2 Status |
|---|---|
| `waitFor(locator)` | ✅ Working |
| `waitForAny(loc1, loc2)` | ✅ Working |
| `waitForUrl('path')` | ✅ Working |
| `waitForText(loc, text)` | ✅ Working |
| `waitForEnabled(loc)` | ✅ Working |
| `waitForResultCount(loc, n)` | ✅ Working |
| `waitUntil('js')` | ✅ Working |
| `waitUntil(loc, 'js')` | ✅ Working |
Frames/Dialogs/Cookies:
| V1 Gherkin | V2 Status |
|---|---|
| `switchFrame(index)` | ✅ Working |
| `switchFrame(locator)` | ✅ Working |
| `switchFrame(null)` | ✅ Working |
| `dialog(accept)` | ✅ Working |
| `cookie(name)` | ✅ Working |
| `clearCookies()` | ✅ Working |
| `driver.intercept(config)` | ✅ Working (CDP only, supports mock and handler) |
Mouse/Keys:
| V1 Gherkin | V2 Status |
|---|---|
| `mouse()` | ✅ Working |
| `mouse(locator)` | ✅ Working |
| `keys()` | ✅ Working |
| `Key.ENTER`, `Key.TAB` | ✅ Working |
V2 preserves V1's driver lifecycle behavior for called features:
Auto-Start on First Use:
# In karate-config.js
karate.configure('driver', { type: 'chrome', headless: true })
# In feature file - driver auto-starts when keyword is encountered
* driver serverUrl + '/login'
When driver url is encountered:
1. `initDriver()` is called
2. `driverConfig` supplies the driver options
3. If a `DriverProvider` is set (via `Runner.driverProvider()`), it's used to acquire the driver

Config Inheritance for Called Features:
When a feature calls another feature (call read('sub.feature')), the called feature inherits:
- All configuration (copied via `KarateConfig.copyFrom()`), including driverConfig, ssl, proxy, headers, cookies, timeouts, etc.

This matches V1 behavior where the entire Config object is copied/shared with called features.
# main.feature - Scenario Outline entry point
Scenario Outline: <config>
* call read('orchestration.feature')
# orchestration.feature - receives inherited driverConfig
Background:
* driver serverUrl + '/index.html' # auto-starts using inherited config
Scenario:
* call read('sub.feature') # sub inherits the driver instance
# sub.feature - uses inherited driver, no new browser opened
Scenario:
* driver serverUrl + '/page2' # same browser, navigates to new page
* match driver.title == 'Page 2'
Driver Not Closed Until Top-Level Exit:
The driver is only closed when the top-level scenario (the entry point) completes:
- Called features that inherit the driver are marked (`driverInherited = true`) and do not close it

Driver Upward Propagation (Automatic):
For shared-scope calls (call read('...') without a result variable), drivers automatically propagate back to the caller — matching V1 behavior. No special configuration is needed.
# init-driver.feature - called feature creates driver
@ignore
Feature: Initialize driver
Background:
* configure driver = driverConfig
Scenario: Init driver
* driver serverUrl + '/page.html' # driver auto-propagates to caller
# main.feature - caller receives the driver
Scenario:
* call read('init-driver.feature')
* match driver.title == 'Page' # works - driver propagated from callee
| Scenario | Behavior |
|---|---|
| Caller has driver, callee inherits | ✅ Driver shared |
| Callee inits driver, shared-scope call | ✅ Propagated to caller |
| Callee inits driver, isolated-scope call (`def result = call ...`) | Released to pool |
Two Approaches to Driver Management:
| Approach | Use Case | How It Works |
|---|---|---|
| Default (PooledDriverProvider) | Parallel tests, browser pooling | Auto-created, pool size = thread count |
| Custom DriverProvider | Containers, cloud providers | Set via Runner.driverProvider() |
How it works:
- `configure driver` sets driver options (timeout, headless, etc.)

// Example: DriverProvider with config from karate-config.js
Runner.path("features/")
.configDir("classpath:karate-config.js") // sets driverConfig
.driverProvider(new PooledDriverProvider()) // uses driverConfig
.parallel(4);
void setUrl(String url) // Navigate and wait
String getUrl() // Get current URL
String getTitle() // Get page title
void waitForPageLoad(PageLoadStrategy strategy) // Wait for load
void waitForPageLoad(PageLoadStrategy, Duration) // With timeout
void refresh() // Soft reload
void reload() // Hard reload
void back() // Navigate back
void forward() // Navigate forward
Object script(String expression) // Execute JS
Object script(String locator, String expression) // JS on element (_ = element)
List<Object> scriptAll(String locator, String expression)
byte[] screenshot() // PNG bytes
byte[] screenshot(boolean embed) // Optional embed in report
void onDialog(DialogHandler handler) // Register callback
String getDialogText() // Get message
void dialog(boolean accept) // Accept/dismiss
void dialog(boolean accept, String input) // With prompt input
void switchFrame(int index) // By index
void switchFrame(String locator) // By locator (null = main)
Map<String, Object> getCurrentFrame() // Get frame info
Element click(String locator)
Element focus(String locator)
Element clear(String locator)
Element input(String locator, String value)
Element value(String locator, String value) // Set value
Element select(String locator, String text)
Element select(String locator, int index)
Element scroll(String locator)
Element highlight(String locator)
Select Matching Behavior:
| Syntax | Behavior |
|---|---|
| `select(loc, 'us')` | Match by value first, then fall back to text |
| `select(loc, 'United States')` | Falls back to text match if value not found |
| `select(loc, '{}United States')` | Match by exact text only |
| `select(loc, '{^}Unit')` | Match by text contains |
| `select(loc, 1)` | Match by index (0-based) |
Events dispatched: input then change, both with {bubbles: true} for React/Vue compatibility.
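The string-matching rules in the table above can be sketched in plain Java. This is a minimal illustration rather than the driver's implementation: `SelectOption` and `SelectMatcher` are hypothetical names, the unprefixed text fallback is assumed to be an exact match, and index-based selection is left out because it involves no matching.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for an <option>: its value attribute and its visible text.
record SelectOption(String value, String text) {}

final class SelectMatcher {

    // Returns the index of the first option selected by spec, or -1 if none matches.
    static int match(List<SelectOption> options, String spec) {
        if (spec.startsWith("{^}")) {
            String part = spec.substring(3);
            return firstIndex(options, o -> o.text().contains(part));   // text contains
        }
        if (spec.startsWith("{}")) {
            String text = spec.substring(2);
            return firstIndex(options, o -> o.text().equals(text));     // exact text only
        }
        int byValue = firstIndex(options, o -> o.value().equals(spec)); // value first
        return byValue >= 0
                ? byValue
                : firstIndex(options, o -> o.text().equals(spec));      // then text (assumed exact)
    }

    private static int firstIndex(List<SelectOption> options, Predicate<SelectOption> test) {
        for (int i = 0; i < options.size(); i++) {
            if (test.test(options.get(i))) {
                return i;
            }
        }
        return -1;
    }
}
```

With the options `us` / `United States` and `ca` / `Canada`, `match(options, "us")` picks the first option by value, while `match(options, "{}us")` finds nothing because no option has the exact text `us`.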
String text(String locator)
String html(String locator)
String value(String locator) // Get value
String attribute(String locator, String name)
Object property(String locator, String name)
boolean enabled(String locator)
boolean exists(String locator)
Map<String, Object> position(String locator)
Map<String, Object> position(String locator, boolean relative)
Element locate(String locator)
List<Element> locateAll(String locator)
Element optional(String locator) // No throw if missing
// On Element — selector-based DOM navigation matching native W3C Element API
Element closest(String selector) // Nearest ancestor (or self) matching CSS
boolean matches(String selector) // Does this element match the selector
closest returns an Element carrying a pure-JS locator, so it composes with every other element op — e.closest('form').attribute('id'), e.closest('tr').locateAll('td'), waitFor(e.closest('.row').getLocator()).
matches returns a boolean — useful in predicates and conditional walks.
# Walk up from a labelled input to its form
* def form = locate('#username').closest('form')
* match form.attribute('id') == 'test-form'
# Check membership against a selector
* match locate('#username').matches('input[type=text]') == true
# Row-scoped enumeration replaces v1's e.parent.children
* def cells = locate('//td[text()="John"]').closest('tr').locateAll('td')
For arbitrary DOM walks that don't map onto a CSS selector, drop into the browser via element.script():
* def nextId = locate('#anchor').script('_.nextElementSibling.id')
Element waitFor(String locator)
Element waitFor(String locator, Duration timeout)
Element waitForAny(String locator1, String locator2)
Element waitForAny(String[] locators)
Element waitForAny(String[] locators, Duration timeout)
Element waitForText(String locator, String expected)
Element waitForText(String locator, String expected, Duration timeout)
Element waitForEnabled(String locator)
Element waitForEnabled(String locator, Duration timeout)
String waitForUrl(String expected)
String waitForUrl(String expected, Duration timeout)
Element waitUntil(String locator, String expression)
Element waitUntil(String locator, String expression, Duration timeout)
boolean waitUntil(String expression)
boolean waitUntil(String expression, Duration timeout)
Object waitUntil(Supplier<Object> condition)
Object waitUntil(Supplier<Object> condition, Duration timeout)
List<Element> waitForResultCount(String locator, int count)
List<Element> waitForResultCount(String locator, int count, Duration timeout)
Map<String, Object> cookie(String name) // Get cookie
void cookie(Map<String, Object> cookie) // Set cookie
void deleteCookie(String name)
void clearCookies()
List<Map<String, Object>> getCookies()
void maximize()
void minimize()
void fullscreen()
Map<String, Object> getDimensions()
void setDimensions(Map<String, Object> dimensions)
void activate() // Bring to front
byte[] pdf(Map<String, Object> options)
byte[] pdf()
Mouse mouse() // At (0, 0)
Mouse mouse(String locator) // At element center
Mouse mouse(Number x, Number y) // At coordinates
Keys keys()
Keyboard Implementation Notes (CdpKeys):
- Key events are dispatched as `rawKeyDown` → `char` → `keyUp`
- Enter sends `text: "\r"` (required for form submission)
- Punctuation keys set `windowsVirtualKeyCode` (e.g., `.` = 190, `,` = 188)

List<String> getPages()
void switchPage(String titleOrUrl)
void switchPage(int index)
Finder rightOf(String locator)
Finder leftOf(String locator)
Finder above(String locator)
Finder below(String locator)
Finder near(String locator)
void intercept(List<String> patterns, InterceptHandler handler)
void intercept(InterceptHandler handler) // All requests
void stopIntercept()
void quit()
void close() // Alias for quit()
boolean isTerminated()
DriverOptions getOptions()
karate.driver(config)
The driver is exposed to JavaScript through `karate.driver(config)`:
// Create driver instance
var driver = karate.driver({ type: 'chrome', headless: true })
// Navigate
driver.setUrl('http://localhost:8080/login')
// Interact
driver.input('#username', 'admin')
driver.click('button[type=submit]')
driver.waitFor('#dashboard')
// Read state
var title = driver.title // Property access via ObjectLike
var url = driver.url
var cookies = driver.cookies
// Cleanup
driver.quit()
Driver implements ObjectLike for JS property access:
default Object get(String name) {
return switch (name) {
case "url" -> getUrl();
case "title" -> getTitle();
case "cookies" -> getCookies();
default -> null;
};
}
Accessible properties:
- `driver.url` → `getUrl()`
- `driver.title` → `getTitle()`
- `driver.cookies` → `getCookies()`

| Gherkin | JS Equivalent |
|---|---|
| `* driver 'url'` | `driver.setUrl('url')` |
| `* click('#id')` | `driver.click('#id')` |
| `* input('#id', 'val')` | `driver.input('#id', 'val')` |
| `* waitFor('#id')` | `driver.waitFor('#id')` |
| `* match driver.title == 'x'` | `driver.title` |
Driver methods are bound as globals for Gherkin compatibility:
engine.putRootBinding("click", (ctx, args) -> driver.click(args[0].toString()));
engine.putRootBinding("input", (ctx, args) -> driver.input(args[0].toString(), args[1].toString()));
engine.putRootBinding("Key", new JavaType(Keys.class));
// ... all driver methods
Hidden from getAllVariables() - keeps reports clean.
V2 automatically waits for elements to exist before performing operations. This reduces flaky tests caused by timing issues where elements haven't yet appeared in the DOM.
Auto-wait is built into all element operations:
- `click()`, `input()`, `value()`, `select()`, `focus()`, `clear()`
- `scroll()`, `highlight()`
- `text()`, `html()`, `attribute()`, `property()`, `enabled()`
- `position()`, `script(locator, expr)`, `scriptAll(locator, expr)`

How it works:
- Retries every `retryInterval` (default 500ms) up to `retryCount` times (default 3)
- Throws `DriverException` if element still not found after retries

Configuration:
* configure driver = { retryCount: 5, retryInterval: 200 }
Note: exists() does NOT auto-wait - it immediately returns true/false. Use waitFor() for explicit waiting with longer timeouts.
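The `retryCount` / `retryInterval` behavior described above amounts to a fixed-interval polling loop. The sketch below shows that loop in isolation. `FixedIntervalRetry` is a hypothetical name, `IllegalStateException` stands in for `DriverException`, and it assumes the first lookup counts as one of the `retryCount` attempts, which the source does not spell out.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class FixedIntervalRetry {

    // Calls lookup up to retryCount times, sleeping retryIntervalMillis between attempts.
    // Throws once every attempt has come back empty.
    static <T> T retry(Supplier<Optional<T>> lookup, int retryCount, long retryIntervalMillis) {
        for (int attempt = 1; attempt <= retryCount; attempt++) {
            Optional<T> found = lookup.get();
            if (found.isPresent()) {
                return found.get();
            }
            if (attempt < retryCount) {
                sleep(retryIntervalMillis); // no point sleeping after the last attempt
            }
        }
        throw new IllegalStateException("element not found after " + retryCount + " attempts");
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

A fixed interval keeps the worst-case wait easy to reason about: with the defaults it is roughly `retryCount` lookups plus the sleeps between them.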
# Default: auto-wait before click
* click('#button')
# Explicit: custom wait before action
* waitFor('#button').click()
* waitForEnabled('#button').click()
# Extended timeout
* retry(5, 10000).click('#button')
Fixed interval polling (v1-style):
driver.timeout(Duration.ofSeconds(30)) // Default timeout
driver.retryInterval(Duration.ofMillis(500)) // Poll interval
| Method | Behavior |
|---|---|
| `waitFor(locator)` | Wait until element exists |
| `waitForAny(locators)` | Wait for any match |
| `waitForText(loc, text)` | Wait for text content |
| `waitForEnabled(loc)` | Wait until not disabled |
| `waitForUrl(substring)` | Wait for URL to contain |
| `waitUntil(expression)` | Wait for JS truthy |
| `waitUntil(loc, expr)` | Wait for element + JS |
| `waitForResultCount(loc, n)` | Wait for element count |
Arrow function syntax in JS API (not underscore shorthand):
// JS API - arrow function
driver.waitUntil('#btn', el => !el.disabled)
driver.waitUntil('#btn', el => el.textContent.includes('Ready'))
// Gherkin - underscore shorthand (v1 compat)
* waitUntil('#btn', '!_.disabled')
public interface DriverProvider {
Driver acquire(ScenarioRuntime runtime, Map<String, Object> config);
void release(ScenarioRuntime runtime, Driver driver);
void shutdown();
}
Built-in implementation for parallel execution:
Runner.path("features/")
.driverProvider(new PooledDriverProvider())
.parallel(4); // Creates pool of 4 drivers
Features:
- Pool size matches `Runner.parallel(N)`
- Drivers are reset between scenarios (`about:blank`, clear cookies)

public class ContainerDriverProvider extends PooledDriverProvider {
private final ChromeContainer container;
public ContainerDriverProvider(ChromeContainer container) {
super();
this.container = container;
}
@Override
protected Driver createDriver(Map<String, Object> config) {
return CdpDriver.connect(container.getCdpUrl(), CdpDriverOptions.fromMap(config));
}
}
See Cloud Provider Integration for SauceLabs, BrowserStack examples.
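The acquire/release contract of a pooled provider can be illustrated with a small generic pool. This is a sketch under stated assumptions, not the `PooledDriverProvider` source: `SimpleDriverPool` is a hypothetical class, it pools any object type so that it runs without Karate on the classpath, and it omits the per-scenario reset step.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class SimpleDriverPool<D> {
    private final int maxSize;
    private final Supplier<D> factory;
    private final BlockingQueue<D> idle;
    private final AtomicInteger created = new AtomicInteger();

    SimpleDriverPool(int maxSize, Supplier<D> factory) {
        this.maxSize = maxSize;
        this.factory = factory;
        this.idle = new ArrayBlockingQueue<>(maxSize);
    }

    // Reuses an idle driver if there is one, creates a new one while under the cap,
    // and otherwise blocks until another thread releases a driver.
    D acquire() {
        D driver = idle.poll();
        if (driver != null) {
            return driver;
        }
        while (true) {
            int count = created.get();
            if (count >= maxSize) {
                return takeIdle();
            }
            if (created.compareAndSet(count, count + 1)) {
                return factory.get(); // this thread won the slot, so it creates the driver
            }
        }
    }

    // Returns a driver to the pool; capacity equals maxSize, so the offer always fits.
    void release(D driver) {
        idle.offer(driver);
    }

    int createdCount() {
        return created.get();
    }

    private D takeIdle() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a driver", e);
        }
    }
}
```

Sizing the pool to the thread count means `acquire()` should rarely block in practice, since each runner thread holds at most one driver at a time.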
DriverProvider works alongside `configure driver`:
1. `configure driver` in karate-config.js sets driver options (timeout, headless, etc.)
2. When `driver url` is encountered, `ScenarioRuntime.initDriver()` checks for a provider
3. If a provider is set, it calls `provider.acquire(runtime, configMap)` with the config

# karate-config.js
karate.configure('driver', { timeout: 30000, headless: true })
# feature file
* driver serverUrl + '/login' # uses provider if set, passes config
This allows:
type implies backend (v1 style):
| Config | Backend |
|---|---|
| `type: 'chrome'` | CDP |
| `type: 'playwright'` | Playwright |
| `type: 'chromedriver'` | WebDriver |
| Feature | CDP | Playwright | WebDriver |
|---|---|---|---|
| Navigation | ✅ | ✅ | ✅ |
| Element actions | ✅ | ✅ | ✅ |
| Wait methods | ✅ | ✅ | ✅ |
| Screenshots | ✅ | ✅ | ✅ |
| Frames | ✅ Explicit | ✅ Auto | ✅ Explicit |
| Dialogs | ✅ Callback | ✅ Callback | ⚠️ Limited |
| Request interception | ✅ | ✅ | ❌ |
| PDF generation | ✅ | ✅ | ❌ |
| `ariaTree()` | ✅ | ❌ (future) | ❌ |
| Raw protocol access | ✅ cdp() | ❌ | ❌ |
Throw UnsupportedOperationException at runtime:
public byte[] pdf() {
throw new UnsupportedOperationException(
"PDF generation not supported on WebDriver backend");
}
CdpLauncher resolves the browser executable in this order:
1. `executable` option: `configure driver = { executable: '/path/to/chrome' }`
2. `KARATE_CHROME_EXECUTABLE` env var: useful for Docker/CI where Chromium is at a non-standard path
3. Platform default paths: `/Applications/Google Chrome.app/...` (macOS), `/usr/bin/google-chrome` (Linux), `C:\Program Files\Google\Chrome\...` (Windows)

| Env Var | Purpose |
|---|---|
| `KARATE_CHROME_EXECUTABLE` | Override Chrome/Chromium executable path |
| `KARATE_CHROME_ARGS` | Extra args (space-separated), appended to every launch |
| `KARATE_DRIVER_HEADLESS` | Run browser headless when set to `true` |
# Docker: Debian Chromium at /usr/bin/chromium
export KARATE_CHROME_EXECUTABLE=/usr/bin/chromium
export KARATE_CHROME_ARGS="--no-sandbox --disable-gpu --disable-dev-shm-usage"
# CI: custom Chrome install
export KARATE_CHROME_EXECUTABLE=/opt/chrome/chrome
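The resolution order for the browser executable can be sketched as a small pure function. This is an illustration only, not the `CdpLauncher` code: `ExecutableResolver` is a hypothetical name, the environment and the existence check are passed in so the logic is testable, and checking platform defaults for existence is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class ExecutableResolver {

    // Resolution order: explicit option, then env var, then the first existing default.
    // Returns null when nothing can be resolved.
    static String resolve(Map<String, Object> driverOptions,
                          Map<String, String> env,
                          List<String> platformDefaults,
                          Predicate<String> exists) {
        Object configured = driverOptions.get("executable");
        if (configured != null && !configured.toString().isBlank()) {
            return configured.toString();
        }
        String fromEnv = env.get("KARATE_CHROME_EXECUTABLE");
        if (fromEnv != null && !fromEnv.isBlank()) {
            return fromEnv;
        }
        for (String candidate : platformDefaults) {
            if (exists.test(candidate)) { // assumed: defaults are probed on disk
                return candidate;
            }
        }
        return null;
    }
}
```

In real use the caller would likely pass `System.getenv()` and a check such as `p -> Files.isExecutable(Path.of(p))`.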
- `CdpClient` WebSocket implementation

Architecture:
- `PlaywrightClient` for wire protocol

Frame handling:
- `switchFrame()` as fallback for cross-backend tests

MVP Definition:
Status: Core implementation complete. Infrastructure works (session creation, navigation, element operations). Many edge cases still failing in E2E tests. Marked as experimental until stabilized.
Architecture:
- `W3cDriver` class in `io.karatelabs.driver.w3c` package
- `W3cBrowserType` enum drives all browser-specific differences (port, executable, CLI args, capabilities)
- `W3cSession` uses `java.net.http.HttpClient` for W3C protocol (clean separation from scenario HTTP)
- `W3cElement` extends `BaseElement` with native W3C element ID operations
- `W3cDriverOptions` supports `webDriverUrl`, `webDriverSession`, `capabilities`, `start` (v1-compatible)
- `W3cKeys` for keyboard input via sendKeys on active element

Type mapping (v1-compatible):
| `configure driver` type | Backend | Executable | Default Port |
|---|---|---|---|
| `chrome` (default) | CDP | (Chrome) | 9222 |
| `chromedriver` | W3C | chromedriver | 9515 |
| `geckodriver` | W3C | geckodriver | 4444 |
| `safaridriver` | W3C | safaridriver | 5555 |
| `msedgedriver` | W3C | msedgedriver | 9515 |
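The type mapping above lends itself to an enum that carries the backend and default port for each `type` value. The sketch below only mirrors the table. `DriverType` and `BackendKind` are hypothetical names, not the actual `W3cBrowserType` enum, and rejecting unknown types with an exception is an assumption.

```java
enum BackendKind { CDP, W3C }

enum DriverType {
    CHROME("chrome", BackendKind.CDP, 9222),
    CHROMEDRIVER("chromedriver", BackendKind.W3C, 9515),
    GECKODRIVER("geckodriver", BackendKind.W3C, 4444),
    SAFARIDRIVER("safaridriver", BackendKind.W3C, 5555),
    MSEDGEDRIVER("msedgedriver", BackendKind.W3C, 9515);

    final String configName;
    final BackendKind backend;
    final int defaultPort;

    DriverType(String configName, BackendKind backend, int defaultPort) {
        this.configName = configName;
        this.backend = backend;
        this.defaultPort = defaultPort;
    }

    // Maps the `type` value from `configure driver` to an entry; a missing type means chrome.
    static DriverType fromConfig(String type) {
        if (type == null) {
            return CHROME;
        }
        for (DriverType candidate : values()) {
            if (candidate.configName.equals(type)) {
                return candidate;
            }
        }
        throw new IllegalArgumentException("unknown driver type: " + type);
    }
}
```

Keeping the per-browser differences in one enum means dispatch code only has to ask for `backend` and `defaultPort` rather than switch on strings.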
Cloud/Remote configuration:
// SauceLabs / BrowserStack - set webDriverUrl to remote hub
configure driver = {
type: 'chromedriver',
webDriverUrl: 'https://ondemand.saucelabs.com:443/wd/hub',
capabilities: { platformName: 'Windows 10', 'sauce:options': { tunnelId: '...' } }
}
// Local chromedriver already running
configure driver = { type: 'chromedriver', webDriverUrl: 'http://localhost:9515' }
Capabilities configuration (v1-compatible):
- `{ capabilities: { alwaysMatch: { browserName: '...' } } }` is derived from `type`
- `capabilities` key merges into `alwaysMatch` for convenience
- `webDriverSession` provides full payload override (v1 backward compat)

Process management:
- If `webDriverUrl` is set, connects to existing server (no process launch)
- Otherwise, a local driver process is started on `W3cBrowserType.defaultPort`

CDP-only operations (throw `UnsupportedOperationException`):
- `mouse()`: coordinate-based mouse input
- `pdf()`: PDF generation
- `intercept()`: request interception
- `onDialog()`: dialog callback handler (use `dialog(true/false)` after dialog appears)
- `stopIntercept()`: request interception

Refactoring done for multi-backend support:
- `Element` extracted to interface (was concrete class)
- `BaseElement`: locator-based impl that delegates to `Driver` (works with any backend)
- `PooledDriverProvider.createDriver()` now dispatches on config type (CDP or W3C)
- `ScenarioRuntime.initDriver()` detects W3C types and creates `W3cDriver` accordingly

Current W3C test results: 110/110 pass (100%)
Main suite: 92/92 pass (CDP-only scenarios tagged @cdp and excluded).
Frame suite: 16/16 pass (dedicated W3cFrameFeatureTest, single-threaded).
Separate Maven profile: mvn verify -Pw3c -pl karate-core. Runs in parallel CI job.
What works:
- `__kjs` (driver.js) runtime injection (wildcard locators, shadow DOM traversal), same as CDP
- Auto `return` prefix for W3C `executeScript` (detects statement blocks vs value expressions)
- Clicks via JS `.click()` (not W3C element endpoint; more reliable across browsers)
- Key combos: `keys().press('Control+a')`, `keys().press('Shift+ArrowLeft')` etc.

Design decisions (v1 patterns retained):
| Operation | V1 WebDriver | V2 W3cDriver | Rationale |
|---|---|---|---|
| click | JS .click() | JS .click() | More reliable across browsers, handles shadow DOM |
| text/html/value/attribute | JS eval | JS eval | Avoids stale element references |
| input (sendKeys) | W3C endpoint | W3C endpoint | Only way to trigger framework event handlers |
| clear | JS value = '' | JS value = '' | More consistent than W3C clear endpoint |
| eval() retry | Single retry on JS error | Single retry on JS error | Handles transient page-load timing |
| elementId retry | Single retry on locator error | Single retry on locator error | Handles transient DOM changes |
| frame switch | Find index by iterating all iframes | W3C element reference directly | Improvement: simpler, no race conditions |
| key combos | W3C Actions API | W3C Actions API | Proper modifier key support |
CDP-only edge cases (tagged @cdp, excluded from W3C suite):
Completed:
- Separate Maven profile (`-Pw3c`) for W3C tests: keeps cicd fast (~1:30)
- `cicd.yml`: build (CDP) and w3c run concurrently
- `Runner.Builder.tags()` bug: multiple varargs now stored as List (was v2 regression from v1)

Remaining TODOs:
- `outline-xbrowser.feature` with examples table for chrome/chromedriver/geckodriver/safaridriver, `karate-config-xbrowser.js`, and `LocalParallelRunner.java` in e2e package. Not part of cicd (requires local browsers).
- `configure driver = { type: 'chromedriver', webDriverUrl: '...', capabilities: { ... } }`. Document working config in MIGRATION_GUIDE.md.

Cloud providers use WebDriver `se:cdp` capability:
public class SauceLabsDriverProvider implements DriverProvider {
@Override
public Driver acquire(ScenarioRuntime runtime, Map<String, Object> config) {
// 1. POST to SauceLabs /session with capabilities
String sessionUrl = "https://ondemand.saucelabs.com/wd/hub/session";
Map<String, Object> caps = buildCapabilities(config);
JsonNode response = httpClient.post(sessionUrl, caps);
// 2. Extract se:cdp WebSocket URL from response
String cdpUrl = response.at("/value/capabilities/se:cdp").asText();
// 3. Return CdpDriver.connect()
return CdpDriver.connect(cdpUrl, CdpDriverOptions.fromMap(config));
}
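@Override
public void release(ScenarioRuntime runtime, Driver driver) {
reportTestStatus(runtime); // PUT pass/fail to cloud API
driver.quit();
}
@Override
public void shutdown() {
// Required by DriverProvider; nothing is pooled in this sketch, so there is nothing to clean up
}
}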
| Provider | CDP Support | Method |
|---|---|---|
| SauceLabs | ✅ | se:cdp via WebDriver |
| BrowserStack | ✅ | se:cdp via WebDriver |
| LambdaTest | ✅ | se:cdp via WebDriver |
| Phase | Summary |
|---|---|
| 1 | WebSocket client, CDP message protocol |
| 2 | Browser launch, minimal driver |
| 3 | Testcontainers + ChromeContainer + TestPageServer |
| 4 | Screenshots, DOM, console, network utilities |
| 5 | Locators, Element class, wait methods, v1 bug fixes |
| 6 | Dialog handling (callback), frame switching |
| 7 | Intercept, cookies, window, PDF, Mouse, Keys, Finder |
| 8 | Package restructuring: Driver interface + cdp/ subpackage |
9a: Gherkin/DSL Integration
- `configure driver` support added to `KarateConfig`
- `driver` keyword added to `StepExecutor`
- `ObjectLike` for JS property access

9b: Gherkin E2E Tests
- `DriverFeatureTest.java` - JUnit runner with Testcontainers

9c: PooledDriverProvider
- `Runner.parallel(N)`

Architecture:
- `PlaywrightClient` for WebSocket protocol
- `Driver` interface, different implementation

Goals:

Frame handling:
- `switchFrame()`
- `switchFrame()` calls work on both

e2e/
├── feature/ # Gherkin E2E tests
│ ├── karate-config.js
│ ├── navigation.feature
│ ├── element.feature
│ ├── cookie.feature
│ ├── mouse.feature
│ ├── keys.feature
│ ├── frame.feature
│ └── dialog.feature
│
└── js/ # JS API E2E tests
├── navigation.js
├── element.js
└── ...
| Suite | Purpose |
|---|---|
| Gherkin E2E | V1 syntax compatibility, DSL coverage |
| JS API E2E | Pure JS API coverage, LLM workflow validation |
Both test suites must pass when switching backends:
- `type: 'chrome'` (CDP)
- `type: 'playwright'` (when implemented)

| Decision | Choice | Rationale |
|---|---|---|
| Interface name | Driver | V1 familiarity |
| CDP-only APIs | Graceful degradation | Returns null/no-op on WebDriver |
| Async events | Callback handlers | Driver stays pure sync |
| Error handling | Always verbose | AI-agent friendly |
| Docker | Testcontainers + chromedp/headless-shell | ~200MB, fast |
| Wait model | Auto-wait + override | Playwright-style, v1 compat |
| Retry | Fixed interval | Simple, predictable |
| JS API | Unified flat | All methods on driver object |
| Backend selection | type implies backend | v1 style, familiar |
| Cloud providers | Extension points only | DriverProvider pattern |
| ariaTree impl | CDP-only initially | Simpler scope |
driver.sessionTimeout(Duration) // Overall session
driver.pageLoadTimeout(Duration) // Navigation
driver.elementTimeout(Duration) // Element waits
driver.scriptTimeout(Duration) // JS execution
All errors include:
AI-agent friendly: detailed context aids debugging.
Wildcard locators like {div}Account need to:
- Expand tags to ARIA roles (`{button}` matches `<div role="button">`)

XPath-based solutions have semantic mismatches with JavaScript's DOM APIs (visibility, textContent, etc.), leading to edge cases.
Instead of expanding wildcards to XPath, we inject a JavaScript resolver into the browser:
{div}text → window.__kjs.resolve("div", "text", 1, false)
{^div}text → window.__kjs.resolve("div", "text", 1, true)
{div:2}text → window.__kjs.resolve("div", "text", 2, false)
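The three mappings above suggest a simple translation from wildcard locator to resolver call. The following is a minimal sketch of such a translation, assuming the grammar is exactly `{` + optional `^` + tag + optional `:index` + `}` + text. `WildcardLocator` is a hypothetical name and the real parsing in `Locators.java` may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WildcardLocator {

    // Groups: 1 = optional '^' (contains), 2 = tag, 3 = optional index, 4 = text.
    private static final Pattern WILDCARD =
            Pattern.compile("^\\{(\\^)?([^}:]*)(?::(\\d+))?\\}(.*)$", Pattern.DOTALL);

    // Translates a wildcard locator into the JS call evaluated in the browser.
    static String toJs(String locator) {
        Matcher m = WILDCARD.matcher(locator);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a wildcard locator: " + locator);
        }
        boolean contains = m.group(1) != null;
        String tag = m.group(2);
        int index = m.group(3) == null ? 1 : Integer.parseInt(m.group(3)); // index is 1-based
        String text = m.group(4);
        return "window.__kjs.resolve(" + quote(tag) + ", " + quote(text) + ", "
                + index + ", " + contains + ")";
    }

    // Wraps a value in double quotes, escaping backslashes and quotes for use in JS source.
    private static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Escaping the text before embedding it matters because locator text comes from test authors and may itself contain quotes.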
Resource: karate-core/src/main/resources/io/karatelabs/driver/driver.js
Namespace: window.__kjs (Karate JS Runtime):
- `__kjs.resolve(tag, text, index, contains)` - Wildcard resolver
- `__kjs.log(msg, data)` - Log for debugging
- `__kjs.getLogs()` - Get log entries (for LLM debugging)
- `__kjs.clearLogs()` - Clear log entries
- `__kjs.isVisible()`, `__kjs.getVisibleText()` - Shared utilities

| Feature | Implementation |
|---|---|
| Visibility | Checks display, visibility, aria-hidden, bounding rect |
| Text extraction | TreeWalker over text nodes, excludes hidden ancestors |
| Leaf preference | Skips elements if a matching descendant exists |
| Role expansion | {button} → button, [role="button"], input[type="submit"] |
| Index counting | Counts only visible, leaf-matched elements |
// Load once at class load
private static final String DRIVER_JS = loadResource("driver.js");
// Inject on-demand before wildcard evaluation
public Object script(String expression) {
if (expression.contains("__kjs")) {
ensureKjsRuntime();
}
return eval(expression);
}
private void ensureKjsRuntime() {
Boolean exists = (Boolean) evalDirect("typeof window.__kjs !== 'undefined'");
if (!Boolean.TRUE.equals(exists)) {
evalDirect(DRIVER_JS);
}
}
The driver.js is pure browser JavaScript - any driver backend can use it:
| Backend | Implementation |
|---|---|
| CDP | Inject via Runtime.evaluate |
| Playwright | Inject via page.evaluate or addInitScript |
| WebDriver | Inject via executeScript |
Future backends should implement the same injection pattern:
- Load `driver.js` from resources
- Check whether `window.__kjs` exists

The resolver expands certain tags to include ARIA roles:
{
'button': 'button, [role="button"], input[type="submit"], input[type="button"]',
'a': 'a[href], [role="link"]',
'select': 'select, [role="combobox"], [role="listbox"]',
'input': 'input:not([type="hidden"]), textarea, [role="textbox"]'
}
Modern web components (Lit, Shoelace, Material Web, Salesforce Lightning) use Shadow DOM to encapsulate their internals. Standard document.querySelector() cannot reach inside shadow roots.
- `hasShadowDOM()`: cached check for any shadow roots on the page. Zero overhead on non-shadow pages.
- Tries `document.querySelector()` first

Shadow DOM functions (on `window.__kjs`):
| Function | Description |
|---|---|
| `__kjs.hasShadowDOM()` | Cached check: does the page have any shadow roots? |
| `__kjs.querySelectorDeep(sel, root)` | Recursive single-element finder across shadow boundaries |
| `__kjs.querySelectorAllDeep(sel, root)` | Recursive all-elements finder across shadow boundaries |
| `__kjs.qsDeep(sel)` | Convenience: querySelector with shadow fallback (used by Locators.java) |
| `__kjs.qsaDeep(sel)` | Convenience: querySelectorAll with shadow fallback (used by Locators.java) |
| `__kjs._getShadowText(shadowRoot)` | Extract visible text from a shadow root |
| Operation | Shadow Support | Notes |
|---|---|---|
| CSS selectors (`#id`, `[attr]`) | Yes | Via `qsDeep()` fallback in Locators.java |
| Wildcard locators (`{button}Text`) | Yes | `resolve()` searches shadow roots after light DOM |
| `getVisibleText()` | Yes | Falls back to shadow root text if no light DOM text |
| `exists()`, `click()`, `input()`, `text()` | Yes | All use `Locators.selector()` which has shadow fallback |
| XPath locators | No | XPath is DOM-level-3, does not support shadow DOM |
CSS selectors at document scope use conditional shadow fallback:
// Generated JS for selector("#myBtn")
(window.__kjs && window.__kjs.qsDeep
? window.__kjs.qsDeep("#myBtn")
: document.querySelector("#myBtn"))
- Checks that `window.__kjs.qsDeep` exists before calling (backward compatible)
- `querySelector` (shadow elements already resolved)

`Page.startScreencast` streams frames