karate-js-test262/TEST262.md
ECMAScript test262 conformance harness for
karate-js. Reproducible pass/fail matrix across the ES surface area, declarative
skip list (etc/expectations.yaml), and the roadmap
for what to tackle next. Not published to Maven Central.
The bar is can karate-js run real-world JavaScript written in the wild, especially by LLMs? test262 is the scorecard; pragmatic ES6 coverage of idiomatic code is the goal — not spec-lawyer compliance for its own sake.
See also: ../karate-js/README.md — what karate-js is · ../docs/JS_ENGINE.md — engine architecture, slot family, prototype machinery, spec invariants, benchmarks. Design reference for every TODO below. · ../docs/DESIGN.md — wider project design · test262 INTERPRETING.md — authoritative test-runner spec.
This file is the roadmap. For why a TODO exists or how a subsystem is shaped, follow the JS_ENGINE.md anchors below.
Operating-mode maxims for the test262 conformance loop. Treat as load-bearing.
Real-world JS first; test262 is the scorecard, spec is ground truth. A fix that unblocks 500 idiomatic tests beats one that tightens a rare spec corner. Existing JUnit tests can be wrong: when the spec disagrees, the spec wins — fix the test along with the engine.
Errors must look like JavaScript, not Java. A raw
IndexOutOfBoundsException or at io.karatelabs.js.Interpreter.eval(...)
frame escaping Engine.eval(...) is a correctness bug, not cosmetic
noise. See JS_ENGINE.md § Exception Handling
and § Error routing & shape.
Fix friction before moving on. Bad error messages in results.jsonl,
parse-vs-runtime classification gaps, missing report fields, --single -vv
not showing what you need — stop and fix the tooling rather than working
around it.
Protect the hot path — pay edge-case cost on the edge case. Sentinels
over thrown signals, type-check rare cases after the common-case miss,
parse-time analysis over inner-loop checks. After any non-trivial engine
change, run EngineBenchmark profile and compare against
JS_ENGINE.md § Performance Benchmarks.
Code should be DRY and aligned with the JS spec. Near-duplicate dispatch and wrong-layer workarounds are clues that the layer below is wrong; collapse to a single spec-shaped seam. Fix it inline or file a Deferred TODO with concrete pointers (file, method, what the unification looks like) — vague "this could be cleaner" notes are worthless.
Batched commits are fine if the message enumerates the changes. What matters is that the commit message lets a future bisect attribute regressions.
Each session that touches the engine should:
--only before scoping. Old slice
numbers go stale fast — record fresh before/after pass counts in the
commit message and pin the run-dir.mvn -f pom.xml -pl karate-js -o test →
Tests run: 981+, Failures: 0, Errors: 0, Skipped: 2 (count grows as
SpecPinTest accretes invariants).results.jsonl against the previous
run. Zero regressions (PASS → FAIL). Document any flip in the commit
message.mvn -f pom.xml -pl karate-js -o install -DskipTests
mvn -f pom.xml -o test -pl karate-core
Tests run: 1969, Failures: 0, Errors: 0, Skipped: 1.Slice order, picked for feedback-loop speed + foundations first. Re-probe each session before scoping. Symbol is last, not first — Math/Number/Date/String are independent of Symbol.
| # | Slice | What's left |
|---|---|---|
| 1 | test/built-ins/Number/** | Object.prototype.toString.call(num) [object Number] (Symbol-gated, slice #7), MAX_VALUE / MIN_VALUE literal-form parser edge. See JS_ENGINE.md § Numeric / coercion. |
| 2 | test/built-ins/Date/** | Date.parse ISO format edges, UTC vs local hour math, invalid-date propagation. See JS_ENGINE.md § Date. |
| 3 | test/built-ins/String/** | substring / lastIndexOf / charAt edge cases tied to ToInteger spec corners, parser-blocked tests, Symbol-gated tail. Spec preamble pattern + JS-spec whitespace + replace substitution doc'd in JS_ENGINE.md § Spec preamble at built-in entry points. |
| 4 | test/built-ins/RegExp/** | Symbol.{match,replace,search,split,matchAll} (slice #7), RegExp.escape (ES2025), parser R_PAREN / Unicode-escape edges, cross-realm tests (multi-realm not modeled). See JS_ENGINE.md § Built-in accessor descriptors on prototypes and § JsRegex.replace. |
| 5 | test/built-ins/Object/** | defineProperty length-descriptor / array-length cluster (~10 fails post-Apr-27 fixes), seal (TypedArray-feature-gated), Annex-B arguments parameter-map aliasing (deferred — see Engine cleanup), Symbol-gated tail. See JS_ENGINE.md § Property attributes and § Own-key ordering. |
| 6 | test/built-ins/Array/** | splice / concat Symbol.species residuals (slice #7), parser-blocked async / generator paths, harness-feature-gated (Int8Array). See JS_ENGINE.md § Prototype machinery. |
| 7 | test/built-ins/Symbol/** + cascades | Full Symbol primitive: typeof === "symbol", unique identity, Symbol.for / keyFor / description, Object.getOwnPropertySymbols, Reflect.ownKeys. Touches Terms.typeOf / eq / coercion; PropertyKey abstraction across JsObject.props / isOwnProperty. 2–4 sessions. Unblocks the Array / String / RegExp Symbol-gated tail. |
Picked off opportunistically when nearby — not session-sized on their own.
String iterator splits surrogate pairs. IterUtils.stringIterator
walks charAt(i) so '🥰💩' yields four iterations instead of two
code-points. Spec §22.1.5.1 calls for code-point iteration. Affects
test262 Object/groupBy/string.js, Map/groupBy/string.js, and any
spec that consumes a string via for-of. Fix: walk codePointAt /
charCount in the iterator. ~1 h.
Array.prototype.values() returns raw List instead of a spec
iterator object. IterUtils.iteratorFromCallable was made lenient
(falls back to getIterator) to absorb the mismatch for set-algebra
iteration of [v].values()-style set-likes; the underlying gap is
that Array.prototype.values should return a wrapped iterator with
next / @@iterator so direct arr.values().next() works. Use
IterUtils.toIteratorObject(listIterator(...)). ~30 min.
Block-scope const redeclaration across sibling { ... } blocks.
test262 Set/prototype/{union,intersection,difference, symmetricDifference}/result-order.js declares const s1 inside each
of four sibling blocks; the second block reports
s1.union(s2) is not defined (the AST node text leaks into the
reference-error message), suggesting block scoping for const does
not isolate the declarations cleanly. Investigate parser/scope
handling for LET / CONST inside BLOCK nodes.
.length / .name rollout to remaining prototypes. JsBuiltinMethod
infra in place; most residual name.js fails are Symbol-gated.
Destructuring residuals. Lexer identifier-escape support, TDZ / init-order corners, negative parse tests needing pattern-vs-literal two-mode parsing.
Cleanup residuals. Occasional "null" NPE paths, IllegalName JDK
lambda leak, Java heap space OOM in array-slice paths.
Tracked but un-scheduled. Three flavors: feature gaps that are intentional non-goals today, engine cleanup needing a dedicated session, and harness-quality fixes. For design context behind each, see JS_ENGINE.md — anchors inline below.
Strict mode plumbing. Parse "use strict" directive prologue,
thread strictMode flag via CoreContext, flip ~7 lenient sites
(frozen-write / writable=false / read-only / set-only-accessor /
non-extensible-add / non-configurable-delete) through one
failSilentOrThrow helper. AccessorSlot.write already accepts a
strict arg. ~3–4 h. Risk: parser changes may regress
flags: [noStrict] test262 paths if directive parsing is over-eager.
Promises + async / await + setTimeout. Skipped (feature: Promise,
async-functions, Symbol.asyncIterator, include: promiseHelper.js).
karate-js is synchronous. Viable path: synchronous subset first —
Promise as eagerly-resolving thenable, async function runs
synchronously, await synchronously unwraps. Escalate to a microtask
runtime only when a real workload needs timer-driven scheduling.
Class syntax (ES6). Skipped (feature: class and friends). Engine
has prototype machinery + new but no parser support for class /
extends / super / method-definition. Parse fails loudly with
SyntaxError — the right shape until real workload demands it.
Symbol primitive. Slice #7 above. Tracked here as a feature gap
because it gates a long tail of fails across String / Array / RegExp /
Object that depend on @@iterator / Symbol.species / Symbol.toPrimitive.
Items needing a dedicated session — benchmark-gated or coordinated with other work. Design context in JS_ENGINE.md § Spec Invariants and the per-section anchors.
Prototype.toMap() rebuilds on every call. Allocates fresh
LinkedHashMap per invocation; iteration paths re-pay the cost.
Memoize on slot-map modification stamp or expose non-materializing
iterator. ~1 h. Defer until benchmark shows it matters.
HOLE → tombstone full elimination. Centralization fallback
landed; full elimination needs sparse-array storage rework (dense
list only set values; sparse positions consult props with
tombstoned DataSlot entries). Pair with parser in support.
Pinned in SpecPinTest. ~6–8 h.
HOLE leak audit at JsArray Java-interop seams. iterator(),
toArray(), toArray(T[]), subList(int,int), contains(Object),
indexOf(Object), lastIndexOf(Object) route to list.foo() raw —
HOLE escapes to Java consumers. get(int) already translates HOLE→null;
the rest don't. Centralize on a single unwrap helper (or wrap
list.iterator() in a translating Iterator). No test262 wins
expected (these are Java-interop seams), but spec-shape pinning in
SpecPinTest. Pairs with the full HOLE elimination above. ~30 min.
Abrupt-completion gap in non-if control-flow statements.
Interpreter.evalIfStmt now checks context.isStopped() after
evaluating the test expression so a thrown error in the condition
skips both branches (was: condition's throw returned undefined, the
truthy check read it as falsy, the else-branch ran silently — pinned
in SpecPinTest.ifConditionThrowPropagatesToCatch /
…ThrowInOrChain_propagates). The same gap exists at
evalWhileStmt / evalDoWhileStmt / evalForStmt (loop-test
expressions), evalSwitchStmt (switch discriminant + per-case),
evalTernary (?: test), and evalLogicalExpr (&& / || short-
circuit). The engine uses context.stopAndThrow sentinels rather
than Java exceptions, so each control-flow site that consumes a
sub-expression must check isStopped() before acting on the result.
Audit and fix in one pass with pins. ~1 h.
PropertyKey abstraction. Symbol prep. Defer to slice #7 itself —
introducing PropertyKey ahead of a concrete consumer is YAGNI.
Arguments → spec exotic Arguments object. Now a JsArray (cached
per-call frame) — supports arguments[i] / .length / iteration /
property writes. Not yet: arguments.callee (deprecated, strict-mode
TypeError), dynamic alias to formal parameters in non-strict mode
(function f(x){ arguments[0]=2; return x } → 2),
Object.prototype.toString.call(arguments) === "[object Arguments]".
Subclass JsArguments extends JsArray with the alias map +
Symbol.toStringTag override when a real workload demands.
CreateDataPropertyOrThrow + ArraySpeciesCreate proper. Array
result-allocation methods (slice / concat / splice / map /
filter / flat / flatMap) currently new JsArray(new ArrayList<>())
then populate via .add(). Spec shape uses ArraySpeciesCreate(O, len)
(depends on Symbol.species — slice #7) and
CreateDataPropertyOrThrow(A, k, v) per element. Defer until Symbol
lands; current shape passes ~all non-Symbol-gated tests.
Cases where current behavior is observably non-spec but not yet a slice priority. Pick up when the relevant slice surfaces them.
JsArray.handleLengthAssign return value dropped on direct
assignment. Returns boolean for "throw TypeError on writable=false /
partial-truncate"; only defineLength consumes it. Direct
arr.length = X silently no-ops on writable=false. Spec wants
TypeError under strict. Pair with the strict-mode plumbing TODO above.
ToObject coercion for non-empty string descriptor sources.
Non-empty strings short-circuit to TypeError matching the spec
end-state but skip the wrapper-iterate-each-char pipeline. Become
spec-shaped when adjacent.
JsArray.jsEntries vs [[OwnPropertyKeys]] semantics asymmetry.
jsEntries yields indices only (correct for Array.prototype.*).
For-in / Object.keys / defineProperties want indices PLUS named
non-index props. JsObjectConstructor works around it via its own
ownKeys helper. Cleaner long-term: split into arrayEntries(ctx)
vs ownEntries(ctx). ~1 h once a fourth caller surfaces the bug.
ToPropertyKey for ObjectLike — remaining no-ctx callers.
defineProperty now passes ctx through and toPropertyKey(o, ctx)
routes ObjectLike receivers through spec ToPrimitive(string) →
ToString (throws TypeError when neither toString nor valueOf yields
a primitive). Still on the no-ctx toPropertyKey(o) path:
JsObjectConstructor.hasOwn and getOwnPropertyDescriptor (both
wired as JsInvokable, no Context arg). Migrate when a real workload
passes non-string keys to either; switching the wiring to
JsCallable is the mechanical part. See
JS_ENGINE.md § Property attributes.
Integer-index accessors beyond JsArray.list.size().
JsArray.jsEntries Phase 1 walks 0..list.size(), Phase 2 yields
non-index named props in insertion order. An accessor installed at
e.g. index 100 on an empty array (via defineProperty(arr, "100", {get, enumerable:true})) is missed by both — Phase 1's bound, Phase 2's
parseIndex >= 0 skip. Spec §9.4.2 [[OwnPropertyKeys]] says
ALL integer-index keys ascend first. The defineOwnAccessor HOLE-pad
loop is the current workaround (extends list to idx + 1 so Phase 1
reaches the slot), but it allocates idx HOLE entries — wasteful at
large indices and a correctness liability if a later spec-shape change
drops the pad. Real fix: track integer-index entries in namedProps
and merge into Phase 1's ascending walk (or extend Phase 1's bound
to include the max integer-index key in namedProps). Pair with the
HOLE elimination above.
JsGlobalThis two-store reads. Data descriptors live on
BindingsStore, accessor descriptors on JsObject.props (because
BindingSlot is data-only). getMember / isOwnProperty / toMap
consult both, but the split is a smell. Either extend BindingSlot
to carry an AccessorSlot side-table, or commit to a unified
two-store contract with a single authoritative read seam. ~2 h. See
JS_ENGINE.md § Globals.
Symbol.toPrimitive is not dispatched. Matches our minimal
Symbol surface. Fix as part of slice #7.
(0, fn)() indirect call this-binding.
(1, Object.prototype.valueOf)() should pass undefined as this
(per spec the comma operator drops the reference base), invoking the
spec preamble's TypeError. Today the receiver still falls through to
globalThis. Audit evalCallExpr for the parenthesized comma case.
Replace hand-rolled YAML parser with SnakeYAML. Expectations.java /
Test262Metadata.java break on # in quoted reasons, block-scalar
description: fields, and block-form list values. Swap when next
touched.
--resume refreshes stale records. Currently echoes records for
tests that no longer exist or are now SKIP'd. Gate on test-still-exists
--resume-crash-only.Cache parsed harness ASTs, not source text. HarnessLoader
re-parses assert.js, sta.js, and per-test includes: on every
test; ~50k re-parses per full-suite run.
Surface per-test console capture in ResultRecord. evaluate(...)
already wires Engine.setOnConsoleLog(...) to a per-test sink
(discarded); plumb into ResultRecord for --single -vv and the
HTML drill-down.
phase: resolution (module-resolution) negative tests. Conflated
with runtime today. Modules globally skipped, so latent.
Structured $262 surface. AbstractModuleSource, IsHTMLDDA,
agent.broadcast/getReport/sleep/monotonicNow absent; all
feature-gated, unreachable today. Add stubs when a feature comes off
the skip list.
Parallel execution. Prior attempts showed no speedup (thread
context-switch beat ~1 ms per-test cost) and the engine doesn't poll
Thread.interrupt(). Revisit when per-test cost grows or the engine
learns cooperative abort.
readHeadSha path fragility. Test262Runner.readHeadSha walks up
cli.test262.getParent().getParent(). Prefer git rev-parse HEAD
or --karate-sha.
Commit target/test262/results.jsonl once stable. Gitignored
today; re-evaluate when engine churn slows.
All commands run from karate-js-test262/ (the runner resolves
etc/expectations.yaml and test262/ relative to cwd). Use -f ../pom.xml
so Maven finds the parent reactor. After any change under karate-js/,
re-install it first — the runner uses the karate-js jar from your local
Maven repo, not from the reactor.
cd karate-js-test262
etc/fetch-test262.sh # first time only — shallow clone
etc/run.sh --only 'test/language/expressions/addition/**' # install + run + HTML
open target/test262/html/index.html
Every run writes a fresh, self-contained directory:
target/test262/run-<timestamp>/ containing results.jsonl,
results.jsonl.partial, run-meta.json, progress.log, and html/. Old
runs are never touched (clean up with mvn clean). The runner prints
Run dir: <path> on completion; pass that to Test262Report --run-dir <path> to render the HTML. etc/run.sh plumbs the path through
automatically.
etc/run.sh does install + run + HTML in a single command:
etc/run.sh # full suite + HTML
etc/run.sh --only 'test/language/**' --max-duration 300000 # 5-min cap
# Live tap (path is printed by the runner; substitute the actual <ts>):
tail -f target/test262/run-<ts>/progress.log # live progress
tail -f target/test262/run-<ts>/results.jsonl.partial # live per-test JSONL
# 1. After editing karate-js/, refresh the local Maven repo
mvn -f ../pom.xml -pl karate-js -o install -DskipTests
# 2. Run the suite (sequential; minutes to tens of minutes)
# Run-dir defaults to target/test262/run-<timestamp>/; the runner prints it.
mvn -f ../pom.xml -pl karate-js-test262 -o exec:java \
-Dexec.args="--only test/language/expressions/**"
# 3. HTML report — required: --run-dir pointing at step 2's output
mvn -f ../pom.xml -pl karate-js-test262 -o exec:java \
-Dexec.mainClass=io.karatelabs.js.test262.Test262Report \
-Dexec.args="--run-dir target/test262/run-<ts>"
Why install instead of -am: exec:java is a direct goal, not a phase,
so Maven invokes it on every selected reactor project. With -am, the reactor
also includes karate-parent (no mainClass), and the goal aborts the run
before our module is reached. Installing karate-js to the local repo first,
then running without -am, sidesteps the reactor entirely.
| Flag | Default | Purpose |
|---|---|---|
--expectations <path> | etc/expectations.yaml | skip list manifest |
--test262 <path> | test262 | suite clone dir |
--run-dir <path> | target/test262/run-<timestamp>/ | output dir for this run; everything (results.jsonl, run-meta.json, progress.log, html/) goes inside |
--timeout-ms <n> | 10000 | per-test watchdog (infinite-loop guard) |
--max-duration <ms> | 0 (unlimited) | overall wall-clock cap; writes partial results and prints Aborted: on hit |
--only <glob> | — | restrict to matching paths |
--single <path> | — | run one test, no file writes |
-v / -vv | off | (with --single) -v prints parsed metadata + source location; -vv adds full source |
Runs are silent except failures + periodic progress. A FAIL <path> — <type>: <msg> line prints to stdout per failure; a [progress] line every
500 tests or every 5 seconds. Banner / progress / summary lines are mirrored
to <run-dir>/progress.log via Logback so tail -f is a lightweight live
view. Per-FAIL detail lives only in JSONL (mirroring would duplicate without
new signal).
The runner uses a single-thread ExecutorService to enforce --timeout-ms
per test. The karate-js engine doesn't poll Thread.interrupt(), so
cancel(true) can't stop the underlying thread. When a timeout fires, the
runner retires the executor (shuts it down, creates a fresh one) so
subsequent tests don't queue behind the stuck thread. Net cost of a genuine
hang: one abandoned daemon thread, one Timeout row in results.jsonl, a few
ms of recreate overhead.
Pass --max-duration <ms> so a catastrophic engine bug can't wedge the
caller. The periodic [progress] line gives a machine-parseable heartbeat —
stream stdout and tail the most recent line without waiting for the run.
There is only one concept: SKIP. A test matching any rule in
etc/expectations.yaml is not run and appears as
{"status":"SKIP",...} in results. Everything else is attempted; failures
are failures.
Match order: paths → flags → features → includes. First match wins.
Every entry requires a reason.
Precedence example. A test at test/language/statements/class/foo.js
with flags: [module] and features: [Symbol] is skipped with the module
reason (the flags match fires before features is consulted). If you want
features: [Symbol] to win, don't have a matching flag rule.
Starter set covers Symbol, BigInt, generators, class syntax, Proxy, Reflect,
Promises, async/await, Temporal, TypedArray beyond Uint8Array, WeakRef,
ArrayBuffer, and the suite directories test/intl402/, test/staging/,
test/annexB/. To add a skip: edit the YAML under the right section with a
reason. To remove a skip: delete the entry, re-run the relevant --only
glob, debug failures with --single -vv.
Two JSONL files during a run:
<run-dir>/results.jsonl.partial — appended per test as results
arrive, flushed per write. Run order, not sorted. Deleted on clean
exit; preserved on abort (--max-duration hit, Ctrl-C, JVM kill).<run-dir>/results.jsonl — canonical output, sorted alphabetically
by path, atomically written at end-of-run (tmp + rename). This is what
tooling reads and what --resume skips-seen against.Example line shape (same in both):
{"path":"test/language/expressions/addition/S11.6.1_A1.js","status":"PASS"}
{"path":"test/.../something.js","status":"FAIL","error_type":"TypeError","message":"foo is not a function"}
{"path":"test/.../bigint-test.js","status":"SKIP","reason":"BigInt not supported"}
Error types are classified into:
SyntaxError | TypeError | ReferenceError | RangeError | Error | Timeout | Harness | Unknown by inspecting message prefixes (the engine emits
"TypeError: ..." style messages at most failure sites). The classifier
itself is in ErrorUtils.
mvn -pl karate-js-test262 -o exec:java \
-Dexec.args="--single <path> -vv"
-v prints parsed YAML metadata (description / flags / features / includes
/ negative), the classification, and — if the engine attached a position
— a location: <path>:<line>:<col> line. -vv additionally prints the
full test source. --single does no file writes; pass any test path
(relative to the test262 clone root) directly. No HTML drill-down page is
generated — the details.html report shows path + error_type + message
inline; for full source / repro context, run --single -vv directly.
python3 -c "
import json, sys
def state(p): return {json.loads(l)['path']: json.loads(l)['status'] for l in open(p)}
prev = state(sys.argv[1])
curr = state(sys.argv[2])
regressed = sorted(p for p in prev if prev[p]=='PASS' and curr.get(p)=='FAIL')
new_pass = sorted(p for p in prev if prev[p]=='FAIL' and curr.get(p)=='PASS')
print(f'Regressed (PASS->FAIL): {len(regressed)}')
for p in regressed: print(f' {p}')
print(f'New PASS (FAIL->PASS): {len(new_pass)}')
for p in new_pass: print(f' {p}')
" target/test262/run-<prev>/results.jsonl target/test262/run-<curr>/results.jsonl
This is the per-session safety check against the slice baseline.
The conformance suite allocates a fresh Engine per test (~50k tests); small
regressions compound into minutes of wall time. Prefer profile mode — the
30 s warm loop is JIT-stable and directly comparable to the
reference table in JS_ENGINE.md.
mvn -pl karate-js -q test-compile
# Profile mode (30 s warm loop; JIT-stable, ~16k iterations averaged).
java -cp "karate-js/target/classes:karate-js/target/test-classes:$(find ~/.m2/repository -name 'slf4j-api-*.jar' | head -1):$(find ~/.m2/repository -name 'json-smart-*.jar' | head -1):$(find ~/.m2/repository -name 'accessors-smart-*.jar' | head -1):$(find ~/.m2/repository -name 'asm-9*.jar' | grep -v asm-tree | grep -v asm-commons | head -1)" \
io.karatelabs.parser.EngineBenchmark profile
# Fast mode (median of 10 cold runs) — noisy, gut-check only
java -cp "…same classpath…" io.karatelabs.parser.EngineBenchmark
If averages move >±10%, understand why before merging. If unavoidable
(correctness > speed), update the reference table in JS_ENGINE.md in the
same commit.
Edit TEST262_SHA=... at the top of etc/fetch-test262.sh, delete the local
test262/ directory, re-run the script. All subsequent runs use the new
commit. Coordinate bumps with whoever else is iterating — the suite itself
evolves.
| Symptom | Likely cause / fix |
|---|---|
expectations file not found: etc/expectations.yaml | Wrong directory. cd karate-js-test262 first. |
test262 directory not found: test262 | Haven't run etc/fetch-test262.sh yet. |
Failed to execute goal ... exec-maven-plugin ... on project karate-parent: 'mainClass' ... missing | Used -am with exec:java. Don't — install karate-js separately and run without -am. |
| Engine change has no effect on test262 output | Forgot mvn ... -pl karate-js -o install -DskipTests. The runner uses the local Maven repo jar, not the reactor classpath. |
Test262Report says --run-dir <path> is required | Pass the path the runner printed on completion: --run-dir target/test262/run-<ts>. etc/run.sh does this for you. |
| Where's my report? | The runner prints Run dir: <path> on completion. Look in <path>/html/index.html. Each invocation creates a fresh run-<timestamp>/ dir; nothing is overwritten. |
| Suite hangs on one test | Infinite loop; watchdog kicks in at --timeout-ms. The inner executor is retired and replaced; a genuine hang leaks one daemon thread and keeps going. Bisect with --only, or add --max-duration as a safety net. |
| Driving from a script that must not block | Pass --max-duration <ms>. On hit, partial results written and Aborted: replaces Summary:. |
| Tests that used to pass now fail | Run EngineBenchmark too — perf regression sometimes manifests as timeouts before correctness. |
target/test262/ growing unbounded across iteration sessions | No auto-pruning; each run writes its own run-<ts>/. mvn clean wipes the lot. |
karate-js-test262/
├── TEST262.md # this file
├── pom.xml # Maven module (deploy explicitly disabled)
├── etc/
│ ├── expectations.yaml # declarative SKIP list (committed)
│ ├── fetch-test262.sh # shallow clone of tc39/test262 at pinned SHA
│ └── run.sh # one-shot: install + run + HTML
├── src/main/java/…/test262/ # runner + report + helpers
├── src/test/java/…/test262/ # unit tests for the harness itself
├── src/main/resources/report/ # HTML/CSS/JS templates for the report
├── src/main/resources/logback.xml # logger config (file appender → target/test262/)
├── test262/ # [gitignored] the cloned suite
└── target/test262/ # [gitignored] one subdir per run
└── run-<timestamp>/ # self-contained per-run dir
├── results.jsonl # per-test pass/fail/skip, sorted by path (end of run)
├── results.jsonl.partial # live feed — appended per test, flushed; deleted on clean exit, kept on abort
├── run-meta.json # per-run context (test262 SHA, karate-js ver+SHA, JDK, OS, started/ended, counts)
├── progress.log # banner + [progress] lines + final summary
└── html/ # two-file static HTML report
├── index.html # tree + per-slice summary tiles
└── details.html # full per-test list with search + status filter
Each run is self-contained and immutable; old runs persist until mvn clean.
The CI workflow uploads target/test262/ (parent) as a single artifact.
A workflow_dispatch-only workflow at
.github/workflows/test262.yml runs
etc/fetch-test262.sh + the runner + the report, and uploads the whole
target/test262/ directory as a single artifact. Never triggered
automatically — kick off from the Actions tab when you want a fresh run. Two
inputs (only and timeout_ms) default to full-suite / 10 s per test.
The module's pom.xml sets maven.deploy.skip=true / gpg.skip=true /
skipPublishing=true so the release workflow does not publish this module to
Maven Central.