karate-js-test262/TEST262.md
ECMAScript test262 conformance harness for
karate-js. Reproducible pass/fail matrix across the ES surface area, declarative
skip list (etc/expectations.yaml), and the roadmap
for what to tackle next. Not published to Maven Central.
The bar: can karate-js run real-world JavaScript written in the wild, especially by LLMs? test262 is the scorecard; pragmatic ES6 coverage of idiomatic code is the goal, not spec-lawyer compliance for its own sake.
See also: ../karate-js/README.md — what karate-js is · ../docs/JS_ENGINE.md — engine architecture, slot family, prototype machinery, spec invariants, benchmarks. Design reference for every TODO below. · ../docs/DESIGN.md — wider project design · test262 INTERPRETING.md — authoritative test-runner spec.
This file is the roadmap. For why a TODO exists or how a subsystem is shaped, follow the JS_ENGINE.md anchors below.
Operating-mode maxims for the test262 conformance loop. Treat as load-bearing.
1. Real-world JS first; test262 is the scorecard, spec is ground truth. A fix that unblocks 500 idiomatic tests beats one that tightens a rare spec corner. Existing JUnit tests can be wrong: when the spec disagrees, the spec wins, so fix the test along with the engine.
2. Errors must look like JavaScript, not Java. A raw
IndexOutOfBoundsException or at io.karatelabs.js.Interpreter.eval(...)
frame escaping Engine.eval(...) is a correctness bug, not cosmetic
noise. See JS_ENGINE.md § Exception Handling
and § Error routing & shape.
3. Fix friction before moving on. Bad error messages in results.jsonl,
parse-vs-runtime classification gaps, missing report fields, --single -vv
not showing what you need: stop and fix the tooling rather than working
around it.
4. Protect the hot path; pay edge-case cost on the edge case. Sentinels
over thrown signals, type-check rare cases after the common-case miss,
parse-time analysis over inner-loop checks. After any non-trivial engine
change, run EngineBenchmark profile and compare against
JS_ENGINE.md § Performance Benchmarks.
5. Code should be DRY and aligned with the JS spec. Near-duplicate dispatch and wrong-layer workarounds are clues that the layer below is wrong; collapse to a single spec-shaped seam. Fix it inline or file a Deferred TODO with concrete pointers (file, method, what the unification looks like); vague "this could be cleaner" notes are worthless.
6. Batched commits are fine if the message enumerates the changes. What matters is that the commit message lets a future bisect attribute regressions.
7. Aggregate, don't dump: context is precious. A full run is ~53k JSONL rows. Treat run output as files to query, not streams to tail. Full rules in Context discipline.
8. Playbook hygiene is the work, not a chore. Stale counts, "past wins" narration, log patterns that flood context, JSONL the queries can't parse: fix the rot inline in the session that surfaced it. Fix it at the writer, not in a workaround. A playbook future sessions can trust is worth more than a museum piece.
9. Refactor, or rewrite, boldly; the regression net is the license.
This repo carries an unusually strong safety net: the test262 language
slice with byte-for-byte FAIL-set diffing (Diff two run-dirs),
1086+ unit tests with SpecPinTest spec-invariant pins, 2224+ karate-core
consumer tests, and JIT-stable benchmarks. That net exists so you can do the
right structural thing instead of accreting another local workaround. When a
subsystem is fighting you — near-duplicate traversals, a check at the wrong
layer, a seam that every new feature has to special-case — you are empowered to
restructure or rewrite it, not just patch around it. This is the active form of
principle #5: #5 says spot the wrong-layer smell; #9 says act on it. The
discipline that makes boldness safe, not reckless: (a) state the smell and the
target shape before cutting; (b) keep behavior-preserving refactors and new
behavior in separate commits; (c) gate every such change on the full net
— unit tests, test/language/** 0-regression diff, EngineBenchmark profile
within budget, karate-core consumer check — and quote the before/after in the
commit. A refactor that the net certifies as behavior-identical is always
cheaper than the compounding cost of the workaround it removes. (Worked example:
the 2026-05-30 fused early-error walk — three full-tree validation passes
collapsed to one, ~13% of parse CPU reclaimed, FAIL set byte-for-byte identical.
See Engine — cleanup → Fuse the early-error parse walks.)
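The fused early-error walk cited as the worked example can be pictured as a single recursive descent that threads the two pieces of top-down state (strict, inPattern) and calls small per-node helpers. The sketch below is an illustration only: the mini-AST shape (type, children, useStrict, raw) and every helper body are hypothetical and are not taken from JsParser; only the overall shape (one descent, helpers per node, strictness propagated to children) follows the description in this file.

```javascript
// Hypothetical mini-AST node: { type, children?, useStrict?, raw? }.
// These shapes and checks are assumptions for illustration, not karate-js code.

// Mode-independent check: a function declaration used directly as a loop body.
function genericNodeChecks(node, errors) {
  const body = (node.children ?? [])[0];
  if (node.type === "Loop" && body && body.type === "FunctionDecl") {
    errors.push("function declaration as loop body");
  }
}

// Pattern-dependent check: a shorthand initializer is only legal inside a pattern.
function patternNodeChecks(node, inPattern, errors) {
  if (node.type === "CoverInit" && !inPattern) {
    errors.push("shorthand initializer outside a pattern");
  }
}

// Strict-gated check; returns the strictness to propagate to the children.
function strictNodeChecks(node, strict, errors) {
  const childStrict = strict || node.useStrict === true || node.type === "Class";
  if (childStrict && node.type === "NumberLiteral" && /^0\d/.test(node.raw)) {
    errors.push(`legacy octal literal ${node.raw} in strict code`);
  }
  return childStrict;
}

// One descent replaces three whole-tree passes; check order per node is fixed.
function earlyErrors(node, strict, inPattern, errors = []) {
  genericNodeChecks(node, errors);
  patternNodeChecks(node, inPattern, errors);
  const childStrict = strictNodeChecks(node, strict, errors);
  const childInPattern = inPattern || node.type === "Pattern";
  for (const child of node.children ?? []) {
    earlyErrors(child, childStrict, childInPattern, errors);
  }
  return errors;
}

const program = {
  type: "Program",
  children: [
    { type: "Function", useStrict: true, children: [{ type: "NumberLiteral", raw: "0755" }] },
    { type: "NumberLiteral", raw: "0755" }, // sloppy code: legal
    { type: "Loop", children: [{ type: "FunctionDecl" }] },
    { type: "Pattern", children: [{ type: "CoverInit" }] }, // legal inside a pattern
    { type: "CoverInit" },
  ],
};

const fusedErrors = earlyErrors(program, false, false);
console.log(fusedErrors);
```

A new rule in this shape is one more per-node helper called from the same descent, which is the property the principle asks contributors to preserve.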
Each session that touches the engine should:
1. Run the target slice with --only before scoping. Old slice
numbers go stale fast: record fresh before/after pass counts in the
commit message and pin the run-dir. If target/test262/ has no
run-* dirs yet (clean clone, or after mvn clean), your first
--only invocation is the baseline; pin its run-dir in the commit
so the next session has a diff target.
2. Run the unit tests: mvn -f pom.xml -pl karate-js -o test →
Tests run: 1086+, Failures: 0, Errors: 0, Skipped: 2 (count grows as
SpecPinTest accretes invariants).
3. Diff results.jsonl against the previous
run. Zero regressions (PASS → FAIL). Document any flip in the commit
message.
4. Run the karate-core consumer check:
mvn -f pom.xml -pl karate-js -o install -DskipTests
mvn -f pom.xml -o test -pl karate-core
Tests run: 2224+, Failures: 0, Errors: 0, Skipped: 3.

A full conformance run is ~53k JSONL rows. A slice (test/language/**)
emits one FAIL <path> — <type>: <msg> line per failure on stdout plus a
growing results.jsonl.partial. Per-test -vv dumps full source. Pulling
any of this raw into your context burns the budget you need for the
actual engineering work. Treat run output as files to be queried, not
streams to be tailed.
Rules:
- Never tail -f or cat a full progress.log / results.jsonl.
  For liveness, tail -n 1 <progress.log> returns the last heartbeat
  (processed N pass M fail K skip L @ rate); that is the authoritative
  tests-done count in either mode. (wc -l <partial> counts only
  FAIL+SKIP in dev mode, so don't use it for total-processed.) For
  slicing use the Failure triage jq one-liners.
- Default --single to -v, not -vv. -v prints metadata +
  classification + the engine's location: <path>:<line>:<col>,
  which is usually enough to find the call site. Escalate to -vv (full test
  source) only after -v fails to localize the cause.
- Cap diff output. When comparing two run-dirs, emit counts + top-N representative paths + per-slice cluster breakdown. Never the full regressed / new-pass lists. The Diff two run-dirs recipe is already capped; use it as written.
- Delegate slice runs to a sub-agent with a strict return contract.
  Spawn a general-purpose agent (it has Bash) and require a ≤200-word
  digest: pass/fail/skip counts, top 3 failure clusters with one
  example each, anything surprising. The agent reads the full output;
  you receive the digest. See Delegate a slice run
  for the exact prompt template.
- Prefer reading engine source over reading log streams. A FAIL
  line tells you what threw; the engine source tells you why.
  Once you have one representative failing path and the call site
  from --single -v, close the JSONL and work from the code. The
  slice re-run to confirm the fix is a single etc/run.sh --only
  away (delegate it).
- Mvn output is verbose, so pipe it to tail -n 30. Unit tests,
  benchmark, and karate-core consumer check from the
  per-session ritual all dump compile noise
  before the summary. mvn ... -o test 2>&1 | tail -n 30 is enough
  to see Tests run: ... and any failures. Use -q to suppress
  compile chatter when you don't need it.
- etc/expectations.yaml is 175 lines: fine to Read whole when
  editing the skip list. Long-form files in target/test262/run-*/
  are not; query them.
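The "aggregate, don't dump" rules and the sub-agent digest contract (counts plus the top failure clusters with one example each) can be sketched as a small aggregation over JSONL text. This is a hedged illustration, not the project's tooling: the path and error_type fields appear in this file's jq one-liner, but the status field and its PASS/FAIL/SKIP values are assumptions about the row schema, and digest is a hypothetical helper name.

```javascript
// Assumed row shape (status is an assumption; path and error_type are named in this file):
//   {"path": "...", "status": "PASS" | "FAIL" | "SKIP", "error_type": "..."}
function digest(jsonlText, topN = 3) {
  const counts = { PASS: 0, FAIL: 0, SKIP: 0 };
  const clusters = new Map(); // error_type -> { count, example }

  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue;
    const row = JSON.parse(line);
    counts[row.status] = (counts[row.status] ?? 0) + 1;
    if (row.status !== "FAIL") continue;

    const key = row.error_type ?? "(none)";
    // Keep only the first failing path per cluster as the representative example.
    const cluster = clusters.get(key) ?? { count: 0, example: row.path };
    cluster.count += 1;
    clusters.set(key, cluster);
  }

  const top = [...clusters.entries()]
    .sort((a, b) => b[1].count - a[1].count)
    .slice(0, topN)
    .map(([errorType, c]) => ({ errorType, count: c.count, example: c.example }));

  return { counts, top };
}

// Tiny in-memory sample standing in for a results.jsonl file.
const sampleJsonl = [
  { path: "a.js", status: "PASS" },
  { path: "b.js", status: "FAIL", error_type: "MissingParseError" },
  { path: "c.js", status: "FAIL", error_type: "MissingParseError" },
  { path: "d.js", status: "FAIL", error_type: "TypeError" },
  { path: "e.js", status: "SKIP" },
].map((row) => JSON.stringify(row)).join("\n");

const sampleDigest = digest(sampleJsonl);
console.log(JSON.stringify(sampleDigest));
```

The output stays a fixed size regardless of how many rows the run produced, which is the point of the rules above.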
Strict mode + onlyStrict — DONE and un-skipped. The keystone landed: the
parser tracks lexical strictness (JsParser.checkStrictEarlyErrors, a strict-gated
post-parse walk: program prologue → function-body prologue → always-strict class
bodies) and enforces the simple-binding early errors (legacy/non-octal-decimal
literals 0755/08, eval/arguments as assign/update target or function-name /
param / var-binding name) plus the full BoundNames walk over binding patterns
(collectBoundNames — duplicate names in arrow params / non-simple parameter lists /
catch params, and eval/arguments bound inside any pattern). The runner prepends a
"use strict" directive for flags: [onlyStrict] (Test262Runner.evaluate), and
the flag: onlyStrict skip is removed. Measured onlyStrict un-skip
(test/language/**): 282 PASS / 146 FAIL, 0 regressions (lang pass 5433 → 5715).
⚠️ This only worked once a latent tooling bug was fixed — etc/run.sh ran the
runner via exec:java, which does not recompile, so edits to Test262Runner
(the strict-prepend) silently never took effect; run.sh now test-compiles the
module first. The prepend had measured as a no-op (71/357) for a full cycle because
of this.
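For orientation, the strict-gated early errors listed above can be observed in any conforming engine by checking whether a snippet parses at all. The sketch below uses new Function purely as a parse probe; the parses and strict helpers are illustrative names, not part of the test262 runner, and the results shown are what the ECMAScript spec requires rather than a measurement of karate-js.

```javascript
// Returns true when `source` parses as a function body, false on an early SyntaxError.
function parses(source) {
  try {
    new Function(source); // parse only; the body is never invoked
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// Mirrors what a runner does for flags: [onlyStrict]: prepend the directive.
const strict = (body) => `"use strict"; ${body}`;

const strictEarlyErrors = {
  sloppyOctal: parses("var n = 0755;"),                  // legal in sloppy code
  strictOctal: parses(strict("var n = 0755;")),          // legacy octal literal
  strictNonOctalDecimal: parses(strict("var n = 08;")),  // non-octal-decimal literal
  sloppyAssignEval: parses("eval = 1;"),                 // legal in sloppy code
  strictAssignEval: parses(strict("eval = 1;")),         // eval as assignment target
  strictArgumentsParam: parses(strict("function f(arguments) {}")),
  classBodyOctal: parses("class C { m() { return 0755; } }"), // class bodies are always strict
};

console.log(strictEarlyErrors);
```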
Next up — negative parse-phase early errors (the dominant test/language
cluster). A probe of test/language/statements/** buckets these under the new
MissingParseError error type (negative phase: parse tests the engine parses
instead of rejecting — see Results schema): 141 remain in
the statements slice (was 183; −42 from the declaration-in-statement-position
sweep below). Use it to scope: jq -r 'select(.error_type=="MissingParseError").path'.
Function- AND lexical/class-declaration-in-statement-position, AND per-scope
lexical-redeclaration are DONE (see Background sweeps — the latter cleared the
whole switch/syntax/redeclaration/** cluster, 24→0). The residual is the long
tail of other early errors: escaped-keyword / reserved-word misuse,
break/continue to undefined labels, new.target / super outside a method,
getter/setter arity, etc. Pick the next sub-cluster by count from the
MissingParseError histogram. The remaining for-(of|in)/dstr/** cluster (~56) is
a fragmented long tail of distinct destructuring-pattern rules — lower leverage per
unit effort. One known un-enforced parser corner carried over: a "use strict"
prologue inside a non-simple-parameter-list function is itself an early error.
(Lexical duplicate-BoundNames for let/const patterns — let {a,a}=… — is now
covered by the redeclaration walk's per-VAR_DECL BoundNames collection.)
Also residual from the onlyStrict un-skip: (2) ~16 runtime SyntaxError
not thrown; (3) ~14 strict-assignment runtime TypeError (arguments-object
write guards et al.). with-statement early error stays deferred (with lexes as
a call; statements/with/** path-skipped — negligible payoff). Details:
Deferred TODOs → Strict mode.
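To make the residual early-error tail concrete, here is one snippet per family named above, each of which the spec requires a parser to reject. This is a sketch under stated assumptions: the snippets are hand-written illustrations, not test262 files, and new Function is used only as a parse probe (new.target is omitted because it is legal inside the function body that new Function creates).

```javascript
// Returns true when the parser rejects `source` with an early SyntaxError.
function rejects(source) {
  try {
    new Function(source); // parse only; the body is never invoked
    return false;
  } catch (error) {
    if (error instanceof SyntaxError) return true;
    throw error;
  }
}

const residualFamilies = {
  "break to an undefined label": "while (true) { break missing; }",
  "continue to an undefined label": "while (true) { continue missing; }",
  "super outside a method": "super.x;",
  "getter with a parameter": "({ get x(a) { return a; } });",
  "setter with no parameter": "({ set x() {} });",
  "use strict with non-simple parameters": 'function f(a = 1) { "use strict"; }',
};

const rejected = Object.fromEntries(
  Object.entries(residualFamilies).map(([family, source]) => [family, rejects(source)])
);

console.log(rejected);
```

A negative phase: parse test passes only when the engine rejects the source before executing anything, so each of these corresponds to a MissingParseError row whenever the engine parses it instead.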
Beyond that, remaining work is concentrated in test/language/**, dominated by
destructuring-assignment pattern parsing (see Background sweeps).
Symbol stays parked — real-world JS doesn't use Symbol(...), and the
well-known symbols (@@iterator / @@toPrimitive / @@toStringTag)
already work as string stand-ins. For current pass/fail/skip counts,
query the latest run-dir (Recipes → Failure triage) —
counts go stale fast and don't belong in this file.
Qualitative verdict from a scoped probe of the data-type built-ins (the
methods business-rules and logic scripts actually lean on). Counts rot,
so they're omitted — re-probe with --only 'test/built-ins/<X>/**' for
fresh numbers. The shape of what's solid vs. gapped is the durable part:
- Number, Date, Object: toFixed/parseInt/parseFloat/
  toString(radix)/isNaN/isInteger; Date parse/format/getTime/
  getFullYear/arithmetic; keys/values/entries/assign/freeze/
  create/getPrototypeOf/hasOwn/spread/fromEntries all work.
  Residual fails are spec-corner arg-validation (missing TypeError on
  bad args), descriptor-attribute edges, and Symbol gates, not core
  method behavior. Object had zero Java-leak rows.
- String, Array: split/replace/slice/
  substring/indexOf/includes/trim/pad/case and push/map/
  filter/reduce/slice/concat/find/sort/from/spread all work.
  The low raw pass-% is dominated by strict coercion-error semantics and
  Symbol/feature gates, not everyday breakage. Caveats: a few principle-#2
  Java leaks: String.prototype.replaceAll/endsWith Range […) out of bounds, Array at near-2³³ lengths (Index out of bounds / VM-size /
  heap). Narrow but real; see Cleanup residuals.
- RegExp: test/exec/match/replace (string + function replacer) / split /
  search, g/i/m flags, and named-group capture (m.groups.name,
  $<name> substitution, the function-replacer groups arg) all work;
  Java Pattern is the backend. Remaining gaps: lookbehind, unicode
  property escapes, /v flag, group-name early-error validation; plus
  null-arg Java leaks in exec/test and one catastrophic-backtracking
  timeout. The Symbol.{replace,match,matchAll,split,search} protocol
  fails are conformance-only: the everyday str.replace(re, fn) path does
  not route through them and works.

Bottom line for the target workload: String/Number/Date/Array/Object
are dependable, and RegExp now covers the common path including named
groups; the residual RegExp tail (lookbehind / unicode escapes / /v /
early-error validation) is advanced-pattern territory.
| Slice | What's blocking it |
|---|---|
| test/language/statements/for-of | IteratorClose done: on destructuring (normal/throwing/non-object return(), rest-skips-close) AND on abrupt loop exit (break/return/throw closes the outer iterator); body-skip-on-abrupt-binding; member-expression LHS (for (x.attr of …)) is now PutValue (invokes setters) not a var declaration, which also fixed the body-put-error.js infinite-loop hang. (Interpreter.destructurePattern/evalForStmt + JsIterator.close.) Remaining: assignment-pattern target-eval-order ([ obj[sideEffect()] ] of … must evaluate the target reference before stepping the iterator; this is the *thrw-close* family, a rare spec corner); fn-name inference for [x = (function(){})] of …; negative-parse tightenings; array-elem-put-let.js-style ReferenceError-on-bad-target (now fires under in-body "use strict"; onlyStrict variants stay SKIP). |
| test/language/expressions/object | Escaped-keyword cover-name dominates; __proto__-duplicate edges; computed-key / spread / method-def tail. |
| test/language/expressions/assignment | Iterator-return semantics on default-expr throw. |
| test/language/{statements,expressions}/function + arrow-function | fn-name inference for [x = (function(){})]-style defaults; IteratorClose-on-throw; rest-element edges. |
| test/language/expressions/compound-assignment | Strict-mode ReferenceError on undeclared LHS now fires under in-body "use strict" (the onlyStrict-flagged variants stay SKIP until the runner runs a strict pass); valueOf / ToNumeric ordering for += / *=; A5.*_T2/T3 family (non-identifier LHS, Annex-B carve-out). |
| test/language/statements/{try,for,switch} | Control-flow tail; abrupt-completion handles headline cases. |
| test/built-ins/Array/** | splice / concat Symbol.species (Symbol-gated). |
| test/built-ins/RegExp/** | Named-group capture access done (result.groups + $<name> + function-replacer groups arg; see Background sweeps). Residual: group-name early-error validation, Symbol.{match,replace,search,split,matchAll} protocol (Symbol-gated, conformance-only; everyday str.replace(re,fn) doesn't use it), lookbehind / unicode-property-escapes / /v flag (feature-gated), one functional-replace-global ordering case. Null-arg Java leaks + one catastrophic-backtracking timeout in exec/test (principle #2; see Cleanup residuals). |
| test/built-ins/String/** | substring / lastIndexOf / charAt ToInteger corners; parser-blocked; Symbol-gated tail; replaceAll/endsWith Range […) out of bounds Java leaks (principle #2). See JS_ENGINE.md § Spec preamble at built-in entry points. |
| test/built-ins/Object/** | Descriptor edges; seal (TypedArray-gated); Annex-B arguments aliasing. See JS_ENGINE.md § Property attributes. |
| test/built-ins/JSON/** | JSON.stringify reviver/replacer 2-arg semantics; -0/__proto__ parser tail. Calibration: run JSONTestSuite; see JS_ENGINE.md § Future TODO Items. |
| test/built-ins/Number/** | [object Number] (Symbol-gated) + a literal-form parser edge. |
| test/built-ins/Date/** | ISO format edges + invalid-date propagation. See JS_ENGINE.md § Date. |
| test/built-ins/Symbol/** (parked) | Symbol primitive. Deprioritized: no real-world code uses it. Pick up after the language work. |
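The for-of row packs several IteratorClose rules into one cell. The sketch below shows the spec behaviour those rules describe, using a hand-rolled iterable that counts return() calls. It is written against standard JavaScript (it uses Symbol.iterator, whereas this file notes karate-js uses string stand-ins for well-known symbols), so treat it as a reference for the expected semantics rather than a karate-js script; countingIterable is a hypothetical helper.

```javascript
// Builds an iterable whose iterator counts how often return() is called.
function countingIterable(values) {
  const log = { returnCalls: 0 };
  const iterable = {
    [Symbol.iterator]() {
      let index = 0;
      return {
        next: () =>
          index < values.length
            ? { value: values[index++], done: false }
            : { value: undefined, done: true },
        return: () => {
          log.returnCalls += 1;
          return {};
        },
      };
    },
  };
  return { iterable, log };
}

// 1. Abrupt loop exit (break) closes the iterator.
const onBreak = countingIterable([1, 2, 3]);
for (const value of onBreak.iterable) {
  if (value === 1) break;
}

// 2. Running to exhaustion does not call return().
const exhausted = countingIterable([1, 2, 3]);
for (const value of exhausted.iterable) {
  void value;
}

// 3. Array destructuring closes an iterator it did not exhaust.
const partial = countingIterable([1, 2, 3]);
const [first] = partial.iterable;

// 4. A rest element drains the iterator, so no close happens.
const drained = countingIterable([1, 2, 3]);
const [head, ...tail] = drained.iterable;

// 5. A member-expression target is assigned via PutValue, so setters run.
const seen = [];
const target = {
  set attr(value) {
    seen.push(value);
  },
};
for (target.attr of [1, 2]) {
  // empty body: the assignment itself is the observable effect
}

console.log(onBreak.log, exhausted.log, partial.log, drained.log, seen, first, head, tail);
```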
Picked off opportunistically when nearby — not session-sized on their own.
Function-declaration-in-statement-position early error — DONE.
JsParser now rejects a FunctionDeclaration used as the sole body of a
Statement clause (its body STATEMENT directly wraps an FN_EXPR; a braced
body is a BLOCK and stays legal). Loop bodies (for / for-in / for-of /
while / do-while) are an early error in BOTH modes — no Annex B carve-out —
so they live in validateEarlyErrors (checkNoFunctionDeclarationBody). The
if/else clause is sloppy-legal (Annex B.3.4) but a strict-mode early error,
so it rides the strict-gated checkStrictEarlyErrors walk. Labelled-function
declarations (label: function f(){}) are NOT covered — the parser has no
LABELLED node type. Slice delta (test/language/statements/**): ~13 PASS, 0
regressions (if 8, plus one each in for/while/do-while/for-in/
for-of). Pinned by SpecPinTest.functionDeclAsLoopBody_* / functionDeclAsIfBody_*. Invariant recorded in
JS_ENGINE.md § Strict Mode Policy.
Lexical/class-declaration-in-statement-position early error — DONE.
JsParser now also rejects a LexicalDeclaration (let/const) or a
ClassDeclaration used as the sole body of an if/else/loop clause
(checkNoLexicalOrClassDeclarationBody, beside the function-decl helper, called
from the mode-independent validateEarlyErrors). Unlike FunctionDeclaration these
have no Annex B carve-out, so they are an early error in BOTH modes for every
clause including if/else (§13.6/§14.x). var hoists and stays legal; a braced
body is a BLOCK. The let-vs-var distinction lives on the VAR_STMT keyword
token, not VAR_DECL (isLexicalVarStmt). One sloppy-mode subtlety handled: let
is not reserved, so if (x) let\n y is let-the-identifier + ASI (the only
forbidden let-form at ExpressionStatement start is let [); a LineTerminator
after a let keyword (lineTerminatorFollows, scanning across WS/comments to the
next primary token) means it is NOT a lexical declaration here — const is
reserved and has no such escape. Slice delta (test/language/statements/**):
+42 PASS, 0 regressions (MissingParseError 183 → 141), dominated by the
let/syntax + const/syntax statement-position families. Pinned by
SpecPinTest.lexicalOrClassDeclAsClauseBody_isAlwaysParseError / letAsIdentifierWithLineTerminator_isNotALexicalDeclaration.
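The two declaration-in-statement-position entries above reduce to a small truth table. The sketch below probes it with new Function as a parse-only check; the parses and strict helpers are illustrative names, and the expected values are the spec-mandated outcomes described in the text, not a recorded karate-js run.

```javascript
// Returns true when `source` parses as a function body, false on an early SyntaxError.
function parses(source) {
  try {
    new Function(source); // parse only; the body is never invoked
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

const strict = (body) => `"use strict"; ${body}`;

const clauseBodies = {
  // FunctionDeclaration as a loop body: early error in both modes.
  whileFunction: parses("while (false) function f() {}"),
  // As an if body: sloppy-legal (Annex B), strict early error.
  sloppyIfFunction: parses("if (true) function f() {}"),
  strictIfFunction: parses(strict("if (true) function f() {}")),
  // A braced body is a Block and stays legal.
  bracedFunction: parses("while (false) { function f() {} }"),
  // let / const / class as a clause body: early error in both modes.
  ifLet: parses("if (true) let x = 1;"),
  ifConst: parses("if (true) const x = 1;"),
  whileClass: parses("while (false) class C {}"),
  // var hoists and stays legal.
  ifVar: parses("if (true) var x = 1;"),
  // Sloppy mode: `let` followed by a LineTerminator is the identifier `let`, then ASI.
  letThenNewline: parses("if (false) let\ny;"),
};

console.log(clauseBodies);
```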
Per-scope lexical-redeclaration early error — DONE. JsParser now enforces,
per lexical scope (Script, function body, plain Block, switch CaseBlock), that
LexicallyDeclaredNames has no duplicates and does not intersect VarDeclaredNames
(§14.2.1 / §14.12.1 / §16.1.1). It rides the strict-aware checkStrictEarlyErrors
walk (checkScopeRedeclaration + checkSwitchCaseBlockRedeclaration +
collectVarNames + declarationName + directStatements) because the only
Annex B.3.3 relaxation is strict-gated: a duplicate bound solely by
FunctionDeclarations is sloppy-legal but a strict early error (e.g. { function f(){} function f(){} }). The lexical∩var clash has no carve-out (always an error).
FunctionDeclarations are lexical in a Block/CaseBlock but var-scoped at Script /
function-body top level (the topLevel flag, derived for a BLOCK from whether its
parent is a function via Node.getParent). A hot-path guard returns immediately
when a scope has no lexical declarations (keeps the benchmark flat). Error message
aligned to the existing runtime wording (identifier 'X' has already been declared, CoreContext) so it reads the same whichever layer catches it; the two
JUnit tests that asserted the old runtime message (EvalTest.testConstRedeclare,
EngineTest.testConstRedeclareAcrossEvals) — correct in spirit, these ARE
phase: parse early errors — still pass unchanged (REPL cross-eval redeclaration
stays legal since each eval parses independently). Also subsumes the
previously-deferred let {a,a}=… duplicate-pattern rule (per-VAR_DECL BoundNames).
Slice delta (test/language/**): switch redeclaration cluster 24 → 0,
+134 PASS overall, 0 regressions. Pinned by SpecPinTest.duplicateLexicalNamesInScope_* / lexicalNameClashingWithVar_* / duplicateFunctionDeclarations_areSloppyLegalStrictError.
⚠️ Three positive tests still FAIL with the same has already been declared
message but from the runtime CoreContext check, not the parser — pre-existing
env-scoping gaps unrelated to this change (see Deferred TODOs → spec alignment):
indirect (0,eval)(...) must get a distinct declarative environment, and a switch
CaseBlock must get its own block environment at runtime (scope-lex-close-case.js).
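The per-scope redeclaration rules can be summarised with a handful of snippets. The sketch below again uses new Function as a parse-only probe (so "top level" here means function-body top level); helper names are illustrative and the expected outcomes follow the spec rules quoted in the entry above.

```javascript
// Returns true when `source` parses as a function body, false on an early SyntaxError.
function parses(source) {
  try {
    new Function(source); // parse only; the body is never invoked
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

const strict = (body) => `"use strict"; ${body}`;

const redeclarations = {
  // Duplicate lexical names in one scope: always an error.
  duplicateLet: parses("{ let a; let a; }"),
  // Lexical name clashing with a var name: always an error, no carve-out.
  letVarClash: parses("{ let a; var a; }"),
  // Duplicate var is fine.
  duplicateVar: parses("var a; var a;"),
  // Duplicates bound only by FunctionDeclarations in a Block: sloppy-legal, strict error.
  sloppyBlockFunctions: parses("{ function f() {} function f() {} }"),
  strictBlockFunctions: parses(strict("{ function f() {} function f() {} }")),
  // At function-body top level, function declarations are var-scoped: legal even in strict.
  strictTopLevelFunctions: parses(strict("function f() {} function f() {}")),
  // A switch CaseBlock is a single lexical scope across all its cases.
  switchCases: parses("switch (0) { case 0: let x; break; case 1: let x; }"),
  // Duplicate BoundNames inside one let pattern.
  duplicatePatternNames: parses("let { a, a } = {};"),
};

console.log(redeclarations);
```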
C-style for per-iteration let/const environment — DONE.
Interpreter.evalForStmt now models §14.7.4.3 ForBodyEvaluation properly: the
test + body run in one iteration scope (so a body closure captures that
iteration's distinct binding), then a fresh scope is seeded from the body's
end-of-iteration values and the increment runs in it. Fixes the infinite-loop
hang on an in-body update with no increment clause (for (let x = 0; x < 10;) { x++; } — previously the per-iteration scope discarded x++ and the snapshot
reset to 0). The old code wrote the body's values back to the LOOP_INIT slot via
update(), which corrupted closures created in the initializer (for (let i = 0, f = () => i; …) must keep returning 0); the rewrite threads values through an
explicit carry list, never resolving back to the captured LOOP_INIT slots. Also
fixed: loopVarNames collected initializer identifiers (for (let i = digits.length - 1; …) wrongly captured digits) — now only binding targets.
Slice delta (test/language/statements/**): +7 PASS, 0 regressions (4
continue/ timeout hangs + 3 let/for closure-scope tests). Pinned by
SpecPinTest.forLet_*.
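The three behaviours this entry describes are observable from plain script code. The sketch below is a minimal reproduction of each case as the spec defines it; variable names are illustrative and nothing here is taken from SpecPinTest.

```javascript
// 1. Each iteration gets its own binding, so body closures see distinct values.
const perIteration = [];
for (let i = 0; i < 3; i++) {
  perIteration.push(() => i);
}
const captured = perIteration.map((read) => read());

// 2. A closure created in the initializer keeps the initial binding,
//    no matter how many iterations and increments follow.
let initClosure;
for (let i = 0, f = () => i; i < 3; i++) {
  initClosure = f;
}
const initValue = initClosure();

// 3. An in-body update with no increment clause must carry over to the next
//    iteration; otherwise this loop never terminates.
let bodyUpdates = 0;
for (let x = 0; x < 10; ) {
  x++;
  bodyUpdates++;
}

console.log(captured, initValue, bodyUpdates);
```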
String iterator splits surrogate pairs — DONE. IterUtils.stringIterator
now walks code-points (codePointAt / Character.charCount) per spec
§22.1.5.1, so for-of over a string with astral chars / emoji yields one
element per code point. test262 for-of/string-astral.js now passes.
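A short illustration of the difference between code units and code points that this fix addresses, using an escape sequence for the astral character so the source stays ASCII:

```javascript
// U+1F600 is an astral character: one code point, two UTF-16 code units.
const text = "a\u{1F600}b";

const codeUnitCount = text.length; // counts UTF-16 code units
const codePoints = [...text];      // string iteration walks code points

const viaForOf = [];
for (const ch of text) {
  viaForOf.push(ch);
}

console.log(codeUnitCount, codePoints.length, viaForOf.length);
```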
Array.prototype.values() returns raw List — DONE. Now returns a
spec Array Iterator object via
IterUtils.toIteratorObject(IterUtils.listIterator(...)), so
arr.values().next() works. listIterator exposed package-private.
test262 Array/prototype/values/{iteration,returns-iterator,returns-iterator-from-object}.js
now pass; JsArrayTest.testArrayApi updated to spec iterator semantics.
Note: keys() / entries() still return raw List (same class of bug,
lower-value — arr.keys().next() is rare); apply the same fix when a
workload surfaces it.
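For reference, this is the iterator protocol that values() is expected to satisfy. The snippet shows standard behaviour; per the note above, the keys() lines describe the spec target and may not yet hold in karate-js.

```javascript
const letters = ["x", "y"];

// values() returns an iterator object, not a list.
const valueIterator = letters.values();
const firstStep = valueIterator.next();
const secondStep = valueIterator.next();
const finalStep = valueIterator.next();

// The same protocol applies to keys() (spec behaviour).
const firstKey = letters.keys().next();

console.log(firstStep, secondStep, finalStep, firstKey);
```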
Object-literal spread of arbitrary expressions — DONE. {...fn()} /
{...obj.method()} / {...{x:1}} now parse: object_elem() parses
expr(-1, true) after ... (mirrors array_elem). evalLitObject
evaluates the operand and merges own-enumerable props via spreadInto
(Map / JsArray index keys / String code-unit keys / null+undefined no-op),
which also fixed the latent {...array} / {...string} cases.
EvalTest.testObjectLiteralSpread covers it.
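The operand kinds listed above each produce a predictable result under the spec's own-enumerable copy. A minimal sketch (the makeDefaults function is a hypothetical stand-in for any call expression):

```javascript
// Hypothetical helper: any call expression can be a spread operand.
function makeDefaults() {
  return { retries: 3 };
}

const fromCall = { ...makeDefaults(), verbose: true };
const fromLiteral = { ...{ x: 1 } };
const fromArray = { ...["a", "b"] };          // index keys "0", "1"
const fromString = { ..."hi" };               // one key per code unit
const fromNullish = { ...null, ...undefined }; // both are no-ops

console.log(fromCall, fromLiteral, fromArray, fromString, fromNullish);
```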
.length / .name rollout to remaining prototypes —
JsBuiltinMethod infra in place; most residual name.js fails are
Symbol-gated.
RegExp named-group capture access (result.groups) — DONE.
JsRegex.exec / JsStringPrototype.match / matchAll now attach a
spec-shaped groups object (null prototype; name→value, undefined for
non-participating groups, undefined when the pattern has no named
groups); function replacers receive the trailing groups arg per spec
§22.1.3.18. Names derived once at construction via JsRegex.groupNames
(scans the source for (?<name>, skipping (?<=/(?<! lookbehind and
escaped/char-class contexts). feature: regexp-named-groups skip
removed. Slice delta (run-2026-05-30-003211 vs …-001414): +12
PASS, 0 regressions, covered in JsRegexTest.testNamedGroups*.
Residual named-groups/** tail (still failing, separate concerns):
group-name early-error validation ((?<__proto__>…) / (?<_>…)
should SyntaxError — engine accepts; part of the parser-tightening
sweep), the Symbol.replace/match protocol (Symbol-gated), and one
global functional-replace argument-ordering case
(named-groups/functional-replace-global.js — «badc» vs «bacd»;
worth a focused look, likely pre-dates this change).
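The shape of the groups object and the two replacement paths can be checked with a single pattern. The sketch below shows the spec behaviour described in this entry; the date pattern is an illustrative example, not taken from JsRegexTest.

```javascript
// The day group is optional, so it can be a non-participating group.
const datePattern = /(?<year>\d{4})-(?<month>\d{2})(?:-(?<day>\d{2}))?/;

const match = datePattern.exec("2026-05");
const groupsPrototype = Object.getPrototypeOf(match.groups); // null-prototype object
const dayIsOwnKey = "day" in match.groups;                   // present, value undefined

// $<name> substitution in a string replacement.
const swapped = "2026-05".replace(datePattern, "$<month>/$<year>");

// A function replacer receives the groups object as its trailing argument.
const viaFunction = "2026-05".replace(datePattern, (...args) => {
  const groups = args[args.length - 1];
  return `${groups.month}.${groups.year}`;
});

// Without named groups, groups is undefined.
const noNamedGroups = /(\d+)/.exec("42").groups;

console.log(match.groups.year, match.groups.month, match.groups.day, swapped, viaFunction);
```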
Destructuring BoundNames early-error walk — DONE. JsParser now has a
spec-shaped BoundNames walk over binding patterns (collectBoundNames +
collectObjectElemBoundNames + collectBindingBoundNames) that mirrors the
binding structure — {a: x = y} binds only {x}, so keys / defaults / renamed
targets do not false-positive (verified: 0 regressions across
test/language/**). Wired into checkFormalParameters (arrow params and
non-simple parameter lists: duplicate BoundNames always SyntaxError; plain
duplicate simple params in a sloppy non-arrow fn stay legal) and a new
checkCatchParameter (duplicate catch BoundNames always SyntaxError;
eval/arguments bound in any pattern under strict). try/early-catch-duplicates.js
un-skipped → PASS. Slice delta: +17 PASS, 0 regressions (13 arrow, 2
function, 1 object-method, 1 catch). Pinned by SpecPinTest.dupParams_* / boundNames_mirrorStructure_noFalsePositive / dupCatchParam_* / evalArguments_boundInsidePattern_strictOnly. Residual (deferred): lexical
duplicate-BoundNames for let/const patterns (let {a,a}=… — distinct
rule; VAR_DECL doesn't carry the let-vs-var distinction), and object-method
simple-param dup in sloppy code (rare).
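The point of a structure-mirroring BoundNames walk is that only binding targets count, never keys or default expressions. The probes below illustrate the accepted and rejected forms named in this entry; they are hand-written examples checked with new Function as a parse-only probe, with illustrative helper names.

```javascript
// Returns true when `source` parses as a function body, false on an early SyntaxError.
function parses(source) {
  try {
    new Function(source); // parse only; the body is never invoked
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

const strict = (body) => `"use strict"; ${body}`;

const boundNames = {
  // Arrow parameters: duplicate BoundNames are always an error.
  arrowDuplicate: parses("((a, a) => 1);"),
  arrowPatternDuplicate: parses("(({ a: x, b: x }) => x);"),
  // {a: x = a} binds only x; the key and the default are not bindings.
  keyAndDefaultAreNotBindings: parses("(({ a: x = a }) => x);"),
  nestedKeyIsNotBinding: parses("(({ a, b: { a: c } }) => a + c);"),
  // Plain duplicate simple params in a sloppy non-arrow function stay legal.
  sloppySimpleDuplicate: parses("function f(a, a) {}"),
  // A non-simple parameter list makes duplicates an error.
  nonSimpleDuplicate: parses("function f(a, [a]) {}"),
  // Duplicate names in a catch parameter pattern.
  catchDuplicate: parses("try {} catch ([e, e]) {}"),
  // eval bound inside a pattern: error only under strict.
  sloppyEvalInPattern: parses("function f({ eval }) {}"),
  strictEvalInPattern: parses(strict("function f({ eval }) {}")),
};

console.log(boundNames);
```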
Cleanup residuals — occasional "null" NPE paths, IllegalName
JDK lambda leak, Java heap space OOM in array-slice paths. Built-in
probe (2026-05-30) added concrete principle-#2 leaks to chase:
String.prototype.replaceAll/endsWith throwing Java Range […) out of bounds instead of behaving/throwing-as-JS; RegExp exec/test
surfacing Cannot invoke Object.toString() because args[N] is null on
null args + one catastrophic-backtracking Timeout
(RegExp/.../S15.10.2.8_A3_T17.js); Array at near-2³³ lengths leaking
Index out of bounds / Requested array size exceeds VM limit / heap
(unshift/splice/reverse). All confined to edge/pathological inputs.
Tracked but un-scheduled. Each item: a one-line what + why parked + file pointer. For how the subsystem is shaped, read the file. For spec invariants worth honoring, see JS_ENGINE.md § Spec Invariants.
"use strict" activates the spec runtime flips: this→undefined in
plain calls, ReferenceError on implicit-global assign, and TypeError
on write-to-frozen / read-only / getter-only / non-extensible and
delete of non-configurable. Strictness is lexical, cached on
JsFunctionNode.strict, threaded via CoreContext.strict. See
JS_ENGINE.md § Strict Mode Policy
for the flip table; pinned by SpecPinTest.strict_*. The parser now also
tracks lexical strictness (JsParser.checkStrictEarlyErrors, a strict-gated
post-parse walk: program prologue → function-body prologue → always-strict
class bodies) and raises parse-phase SyntaxError for legacy/non-octal-decimal
literals (0755/08), eval/arguments as assign/update target or as a
function-name / param / var-binding name, and duplicate simple params.
Pinned by SpecPinTest.strict_octalLiteral* / *EvalOrArguments* / *duplicateParameters* / *classBodyIsAlwaysStrict* / *parenthesizedDirective*. The runner prepends a
strict directive for flags: [onlyStrict] (Test262Runner.evaluate), the
BoundNames walk over binding patterns landed (collectBoundNames), and the
flag: onlyStrict skip is removed — measured 282 PASS / 146 FAIL, 0
regressions (lang pass 5433 → 5715). Remaining (the 146 residual, now visible
in probes): (1) ~99 negative parse-phase early errors — function-declaration in
statement position (if (x) function f(){}), block-scope function-decl rules; (2)
~16 runtime SyntaxError; (3) ~14 strict-assignment runtime TypeError
(arguments-object write guards). with-statement early error deferred
(path-skipped, lexes as a call). Two known un-enforced parser corners: a
"use strict" prologue inside a non-simple-parameter-list function is itself an
early error; lexical duplicate-BoundNames for let/const patterns (let {a,a}=…).

Async (feature: Promise,
async-functions, Symbol.asyncIterator). karate-js is synchronous.
Viable path: sync subset first — Promise as eager thenable,
async function runs sync, await sync-unwraps.

class declarations + expressions parse and evaluate: constructor, instance
methods, static methods, get/set accessors, computed method names,
default-constructor synthesis, always-strict bodies, constructor-without-new
TypeError, extends + super(...) + super.method(), and public
instance + static fields (x = 1 / static n = …, computed names, ASI,
enumerable own props, derived-class fields init after super() —
JsFunctionNode.instanceFields + Interpreter.runInstanceFieldInitializers). Desugared at eval
time onto the existing constructor-function + prototype machinery
(Interpreter.evalClassExpr → constructor JsFunctionNode whose .prototype
holds the methods; statics on the constructor; methods/accessors non-
enumerable). extends links both chains: Child.__proto__ = Parent (static
inheritance + the super(...) target) and Child.prototype.__proto__ = Parent.prototype (instance inheritance). super dispatch uses a
JsFunctionNode.homeObject ([[HomeObject]]) + a CoreContext.activeFunction
seam set per non-arrow call (arrows inherit their defining method's): a
super.m() reads off homeObject.getPrototype() with this=current
receiver; super(...) runs the parent constructor against the instance under
construction (Interpreter.runSuperConstructor — no invokeCallable
refactor needed, since the derived instance is created normally and super()
only initializes it). extends Error/built-ins works via a pragmatic
copy-own-props shim. Public fields ride defineOwn/putMember (enumerable,
unlike the non-enumerable methods); computed field names are resolved once at
class-definition time, the value per instance. New tokens
CLASS/EXTENDS/SUPER + NodeTypes
CLASS_EXPR/CLASS_METHOD/CLASS_FIELD/SUPER_EXPR (CLASS_METHOD also
carries fields — eval distinguishes by the trailing FN_EXPR). Covered by
JsClassTest (44 cases). Remaining conformance tail (deferred): private
#x fields/methods, generator/async methods, decorators, static-init blocks,
class early-errors, object-literal-method super (needs object
[[HomeObject]]), two super edge cases (this-TDZ before super(), super()
return-override), numeric/string-literal method-name canonicalization
(get 0x10(){} → key "16"; shared with object literals' NUMBER-key path),
escaped-keyword method names. Most have existing feature:-tag skips
(class-fields-private / class-methods-private / generators /
async-functions / decorators); see the Skip list note for
the path-skip un-skip plan.

Benchmark-gated or coordinated with other work.
- Fuse the early-error parse walks. JsParser.parse ran
three full post-parse traversals (validateEarlyErrors,
validateCoverInitializedNames, checkStrictEarlyErrors); a JFR profile showed
the three walks at ~13% of CPU on both EngineBenchmark and
RealisticBenchmark — as costly as the entire interpreter. They are now a
single descent: earlyErrors(node, strict, inPattern) threads the two pieces of
top-down state (strict, inPattern) and calls per-node helpers
(earlyErrorNodeChecks + the inlined CoverInitializedName/rest-element checks +
strictNodeChecks, which returns the propagated childStrict). Per-node check
order mirrors the former pass order so multi-error messages are unchanged.
Behavior-preserving: test/language/** FAIL set byte-for-byte identical
before/after (5849 PASS / 2397 FAIL, 0 regressions), all 1167 unit tests + 2235
karate-core tests green. Perf: EngineBenchmark array 1.41→1.33 ms / object
0.62→0.57 ms (+6.7% iters); RealisticBenchmark 68.6→62.3 µs/feature. This is now
the single seam for new parse-phase early errors — the dominant MissingParseError
backlog (escaped-keyword, undefined labels, new.target/super placement,
getter/setter arity, regex group-name, the non-simple-param "use strict" corner)
should each be added as another per-node helper in earlyErrors, never another
whole-tree walk. Reference table updated in
JS_ENGINE.md § Performance Benchmarks.
- Prototype.toMap() rebuilds per call: memoize on slot-map mod
  stamp or expose a non-materializing iterator. Defer until benchmark
  shows it matters.
- HOLE → tombstone full elimination. Sparse-array storage rework;
  pair with parser in support. Pinned in SpecPinTest. ~6–8 h.
- JsArray Java-interop seams:
  iterator() / toArray() / subList() / contains() / indexOf() /
  lastIndexOf() route raw; only get(int) translates HOLE→null.
  Centralize on one unwrap helper. ~30 min. Pairs with above.
- PropertyKey abstraction. Symbol prep; YAGNI until Symbol lands.
- Arguments → spec exotic Arguments object. Cached JsArray
  today; missing arguments.callee (strict TypeError), non-strict
  alias-to-formal-parameters, and [object Arguments] toStringTag.
  Subclass when a workload demands.
- CreateDataPropertyOrThrow + ArraySpeciesCreate. Array
  result-allocation (slice / concat / splice / map / filter /
  flat / flatMap) bypasses spec sequence; depends on Symbol.species.
  Defer until Symbol.

Observably non-spec; pick up when the owning slice surfaces them.
- The runtime CoreContext redeclaration check (identifier 'X' has already been declared,
not the parser) fires where the spec wants a fresh declarative environment.
(1) A switch CaseBlock needs its own block environment so let x inside a
case does not collide with an outer let x (statements/switch/scope-lex-close-case.js).
(2) Indirect (0,eval)('const x…') must evaluate in a NewDeclarativeEnvironment
off the global env, so the eval'd lexical binding does not collide with an outer
global const x (eval-code/indirect/lex-env-distinct-{const,let}.js). Both are
runtime scoping gaps, independent of the parse-phase early-error work.

JsArray.handleLengthAssign strict TypeError on non-writable length.
Strict-mode plumbing has landed (CoreContext.strict), but the length
write still routes through handleLengthAssign(value, ctx) with no strict
arg — PropertyAccess.setByName special-cases "length" before the
strict-aware putMember, so 'use strict'; arr.length = 0 on a
non-writable length is still a silent no-op. Thread strict into
handleLengthAssign to finish the flip; everyday code doesn't hit it.

ToObject for non-empty string descriptor sources — short-circuits
to TypeError (correct end-state), skips wrapper pipeline.

JsArray.jsEntries vs [[OwnPropertyKeys]] asymmetry —
jsEntries is indices only; for-in / Object.keys /
defineProperties want indices + named. JsObjectConstructor.ownKeys
works around it. Split into arrayEntries(ctx) / ownEntries(ctx)
when a 4th caller surfaces.

ToPropertyKey no-ctx callers — JsObjectConstructor.hasOwn and
getOwnPropertyDescriptor still on the no-ctx path. Migrate when a
workload passes non-string keys.

JsArray.list.size() — high-index
accessor via defineProperty is missed by jsEntries. Current
workaround: defineOwnAccessor HOLE-pads. Real fix: merge integer-index
namedProps into Phase 1. Pairs with HOLE elimination.

JsGlobalThis two-store reads — data in BindingsStore,
accessors in JsObject.props. Extend BindingSlot with accessor
side-table OR commit to a unified two-store contract. ~2 h.

Symbol.toPrimitive not dispatched — matches minimal Symbol
surface; fix with Symbol.

(0, fn)() indirect-call this-binding — comma should drop
reference base (→ this = undefined); today falls through to
globalThis. Audit evalCallExpr for the parenthesized-comma case.

etc/run.sh now compiles the runner. exec:java does not
trigger compilation, so for a full cycle the runner ran stale target/classes
and edits to Test262Runner (the onlyStrict strict-prepend) silently never
took effect — the prepend measured as a no-op (71/357) until a test-compile
step was added before exec:java. Lesson for harness edits: changes under
karate-js-test262/src need the module recompiled; only karate-js engine
changes are picked up by the install step alone.

Expectations.java /
Test262Metadata.java — breaks on # in quoted reasons, block scalars).

--resume echoes records for deleted / now-SKIP'd tests — gate or
rename to --resume-crash-only.

HarnessLoader (~50k re-parses per run).

ResultRecord (currently wired
and discarded by evaluate(...)).

phase: resolution (module-resolution) negatives conflated with
runtime — latent (modules skipped).

$262 surface stubs (AbstractModuleSource, IsHTMLDDA,
agent.*) — add when a feature unblocks.

Thread.interrupt(). Revisit when per-test cost grows.

Test262Runner.readHeadSha walks parent chain — prefer
git rev-parse HEAD or --karate-sha.

target/test262/results.jsonl once engine churn slows.

All commands run from karate-js-test262/ (the runner resolves
etc/expectations.yaml and test262/ relative to cwd). Use -f ../pom.xml
so Maven finds the parent reactor. After any change under karate-js/,
re-install it first — the runner uses the karate-js jar from your local
Maven repo, not from the reactor.
etc/run.sh does install + run (+ HTML on --full):
```bash
cd karate-js-test262
etc/fetch-test262.sh                                        # first time only — shallow clone
etc/run.sh                                                  # dev mode, full suite
etc/run.sh --only 'test/language/**' --max-duration 300000  # scoped, 5-min cap
etc/run.sh --full                                           # PASS rows + HTML
```
Each run writes a fresh target/test262/run-<timestamp>/ (the runner
prints the path) containing results.jsonl, results.jsonl.partial,
run-meta.json, progress.log; html/ only with --full. Old runs
are immutable; mvn clean wipes them.
Dev mode (default) keeps results.jsonl to FAIL+SKIP only; the
pass count is in run-meta.json (counts.pass). --full adds PASS
rows (for CI artifacts / audits / HTML).
Liveness sampling (never tail -f — see
Context discipline):
```bash
tail -n 1 <run-dir>/progress.log   # last heartbeat: processed N pass M fail K skip L
tail -n 5 <run-dir>/progress.log
```
If you need to invoke the runner or HTML report without etc/run.sh,
read etc/run.sh — it documents the install step and the
-am gotcha (exec:java is a direct goal; with -am the reactor
includes karate-parent, which has no mainClass, and aborts before
this module). Install karate-js separately, then run without -am.
Most-used flags below. Full set + defaults: read main(...) in
Test262Runner.
| Flag | Purpose |
|---|---|
| `--only <glob>` | restrict to matching paths |
| `--single <path> [-v] [-vv]` | run one test, no file writes. `-v` prints metadata + classification + engine location; `-vv` adds full source |
| `--full` | write PASS rows (default is FAIL+SKIP only); also gates HTML render in `etc/run.sh` |
| `--max-duration <ms>` | overall wall-clock cap (default unlimited); writes partial results + prints `Aborted:` on hit |
| `--timeout-ms <n>` | per-test watchdog (default 10s) |
| `--run-dir <path>` | output dir (default `target/test262/run-<ts>/`) |
Runs are silent except FAIL lines + periodic [progress]. FAIL lines
on stdout are capped at 20 (footer (… N more FAILs, see results.jsonl)
fires after). [progress] lines emit every 5000 tests or 60 s and are
mirrored to <run-dir>/progress.log. Per-FAIL detail lives only in
JSONL — sample progress.log for liveness, never tail -f
results.jsonl.partial.
The runner uses a single-thread ExecutorService to enforce --timeout-ms
per test. The karate-js engine doesn't poll Thread.interrupt(), so
cancel(true) can't stop the underlying thread. When a timeout fires, the
runner retires the executor (shuts it down, creates a fresh one) so
subsequent tests don't queue behind the stuck thread. Net cost of a genuine
hang: one abandoned daemon thread, one Timeout row in results.jsonl, a few
ms of recreate overhead.
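The retire-and-replace pattern described above can be sketched in Python. This is an illustration only, not the runner's Java code: `RetiringRunner` and its `run` method are hypothetical names, and Python's `ThreadPoolExecutor` stands in for the Java `ExecutorService`. The hang is simulated with a short sleep so the example terminates.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class RetiringRunner:
    """Hypothetical sketch: per-test watchdog that abandons a stuck worker."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.executor = ThreadPoolExecutor(max_workers=1)
        self.retired = 0  # number of executors abandoned so far

    def run(self, path, task):
        future = self.executor.submit(task)
        try:
            future.result(timeout=self.timeout_s)
            return {"path": path, "status": "PASS"}
        except FutureTimeout:
            # A task that is already running cannot be cancelled, so stop
            # waiting on it: retire this executor and give later tests a
            # fresh one instead of queueing them behind the stuck thread.
            self.executor.shutdown(wait=False)
            self.executor = ThreadPoolExecutor(max_workers=1)
            self.retired += 1
            return {"path": path, "status": "FAIL", "error_type": "Timeout"}
        except Exception as exc:
            return {"path": path, "status": "FAIL",
                    "error_type": type(exc).__name__, "message": str(exc)}


runner = RetiringRunner(timeout_s=0.05)
stuck = runner.run("test/stuck.js", lambda: time.sleep(0.4))  # simulated hang
after = runner.run("test/ok.js", lambda: None)                # runs on the fresh executor
print(stuck["error_type"], after["status"], runner.retired)
```

Without the swap, the second test would sit in the single-thread queue behind the sleeping task and time out as well.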
For scripts / agents driving the runner: pass --max-duration <ms> as a
safety net and follow Context discipline.
There is only one concept: SKIP. A test matching any rule in
etc/expectations.yaml is not run and appears as
{"status":"SKIP",...} in results. Everything else is attempted; failures
are failures.
Match order: paths → flags → features → includes. First match wins.
Every entry requires a reason.
Precedence example. A test at test/language/statements/class/foo.js
with flags: [module] and features: [Symbol] is skipped with the module
reason (the flags match fires before features is consulted). If you want
features: [Symbol] to win, don't have a matching flag rule.
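A minimal sketch of the first-match-wins order, assuming a simplified in-memory rule shape. The `RULES` layout, the reason strings, the `skip_reason` name, and the use of `fnmatch` for path globs are all assumptions for illustration; the real `etc/expectations.yaml` layout and glob semantics may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical rule shape: section -> list of (pattern or name, reason).
RULES = {
    "paths":    [("test/intl402/**", "Intl not supported")],
    "flags":    [("module", "ES modules not supported")],
    "features": [("Symbol", "Symbol not supported")],
    "includes": [],
}


def skip_reason(path, flags, features, includes, rules=RULES):
    """Return the reason of the first matching rule, or None to run the test."""
    # 1. paths are consulted first (glob matching here is an assumption).
    for pattern, reason in rules["paths"]:
        if fnmatchcase(path, pattern):
            return reason
    # 2-4. then flags, features, includes, in that order.
    declared = (("flags", flags), ("features", features), ("includes", includes))
    for section, names in declared:
        for name, reason in rules[section]:
            if name in names:
                return reason
    return None


# The precedence example: the flags rule fires before features is consulted.
print(skip_reason("test/language/statements/class/foo.js",
                  ["module"], ["Symbol"], []))
```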
Starter set covers Symbol, BigInt, generators, class syntax, Proxy, Reflect,
Promises, async/await, Temporal, TypedArray beyond Uint8Array, WeakRef,
ArrayBuffer, and the suite directories test/intl402/, test/staging/,
test/annexB/. To add a skip: edit the YAML under the right section with a
reason. To remove a skip: delete the entry, re-run the relevant --only
glob, debug failures with --single -v.
Adding a new unimplemented feature. If you hit FAILs for an ES surface
the engine genuinely doesn't implement (e.g. JSON.rawJSON / isRawJSON
from ES2024), add a features: rule with the test262 feature flag name —
not a paths: rule. The feature names match what the tests declare in
their YAML frontmatter (features: [json-parse-with-source]), which is
also what --single -v prints under features:. See existing entries
for the exact shape; precedence rules above still apply.
Two JSONL files during a run:
- <run-dir>/results.jsonl.partial — appended per test as results
  arrive, flushed per write. Run order, not sorted. Deleted on clean
  exit; preserved on abort (--max-duration hit, Ctrl-C, JVM kill).
- <run-dir>/results.jsonl — canonical output, sorted alphabetically
  by path, atomically written at end-of-run (tmp + rename). This is what
  tooling reads.

Dev mode (default): only FAIL and SKIP rows are written. The pass
count comes from run-meta.json (counts.pass). The Failure triage
and Diff two run-dirs recipes are designed to work without PASS rows.
--full mode: PASS rows are also written, one per attempted test
that didn't fail or get skipped. Use when you need the canonical full
record (CI artifact, deep audit) or want the HTML report (which
etc/run.sh gates on --full).
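The "sorted by path, tmp + rename" write used for the canonical results.jsonl can be sketched as follows. This shows the general technique only; `write_canonical` and the `.tmp` suffix are hypothetical, not the runner's actual names.

```python
import json
import os
import tempfile


def write_canonical(rows, dest):
    """Write rows sorted by path to dest via a temp file plus rename."""
    ordered = sorted(rows, key=lambda r: r["path"])
    tmp = dest + ".tmp"  # hypothetical temp name, same directory as dest
    with open(tmp, "w", encoding="utf-8") as out:
        for row in ordered:
            out.write(json.dumps(row) + "\n")
    # Readers see either the old file or the complete new one, never a
    # half-written file (the rename is atomic on the same filesystem).
    os.replace(tmp, dest)


run_dir = tempfile.mkdtemp()
dest = os.path.join(run_dir, "results.jsonl")
write_canonical(
    [{"path": "test/b.js", "status": "FAIL"},
     {"path": "test/a.js", "status": "SKIP"}],
    dest,
)
with open(dest, encoding="utf-8") as f:
    print(f.read(), end="")
```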
Example line shape (same in both):

```json
{"path":"test/language/expressions/addition/S11.6.1_A1.js","status":"PASS"}
{"path":"test/.../something.js","status":"FAIL","error_type":"TypeError","message":"foo is not a function"}
{"path":"test/.../bigint-test.js","status":"SKIP","reason":"BigInt not supported"}
```
(The PASS row only appears in --full mode.)
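Because dev mode omits PASS rows, a summary has to combine `counts.pass` from run-meta.json with the FAIL and SKIP rows in results.jsonl. A small sketch, assuming run-meta.json nests the pass count as `{"counts": {"pass": N}}` (inferred from the `counts.pass` reference; any other field of that file is not relied on here). `summarize` is a hypothetical helper name.

```python
import json


def summarize(run_meta_text, jsonl_text):
    """Combine counts.pass with FAIL/SKIP rows from a dev-mode results.jsonl."""
    passed = json.loads(run_meta_text)["counts"]["pass"]
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    failed = sum(1 for r in rows if r["status"] == "FAIL")
    skipped = sum(1 for r in rows if r["status"] == "SKIP")
    attempted = passed + failed  # skipped tests are never run
    rate = passed / attempted if attempted else 0.0
    return {"pass": passed, "fail": failed, "skip": skipped, "pass_rate": rate}


meta = '{"counts": {"pass": 3}}'
rows = "\n".join([
    '{"path":"test/.../something.js","status":"FAIL","error_type":"TypeError","message":"foo is not a function"}',
    '{"path":"test/.../bigint-test.js","status":"SKIP","reason":"BigInt not supported"}',
])
print(summarize(meta, rows))
```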
Error types are classified into:
SyntaxError | TypeError | ReferenceError | RangeError | Error | Timeout | Harness | Unknown by inspecting message prefixes (the engine emits
"TypeError: ..." style messages at most failure sites). The classifier
itself is in ErrorUtils. Two buckets are assigned by the runner (not the
classifier) for negative tests: ExpectedThrow (a non-parse negative test
completed normally) and MissingParseError (a phase: parse negative test
parsed instead of being rejected — the engine is missing that early error; the
code then ran and usually tripped the harness $DONOTEVALUATE() marker). Keeping
MissingParseError distinct stops the unimplemented-early-error backlog from
hiding inside Unknown alongside genuine engine crashes.
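The two layers described above (prefix classification, then the runner-assigned buckets for negative tests) can be sketched like this. It is a simplified illustration, not the `ErrorUtils` code: `classify` and `negative_bucket` are hypothetical names, the exact prefix rules are assumed, and the Timeout and Harness buckets are left out.

```python
JS_ERROR_NAMES = ("SyntaxError", "TypeError", "ReferenceError", "RangeError", "Error")


def classify(message):
    """Bucket a failure message by its leading JS error name, else Unknown."""
    for name in JS_ERROR_NAMES:
        if message.startswith(name + ":"):
            return name
    return "Unknown"


def negative_bucket(phase, parsed_ok, completed_normally):
    """Runner-side bucket for a negative test, or None if neither applies."""
    if phase == "parse":
        # The source should have been rejected at parse time but was accepted.
        return "MissingParseError" if parsed_ok else None
    # Non-parse negative test: it should have thrown but finished cleanly.
    return "ExpectedThrow" if completed_normally else None


print(classify("TypeError: foo is not a function"))
print(classify("java.lang.StackOverflowError"))
print(negative_bucket("parse", parsed_ok=True, completed_normally=False))
print(negative_bucket("runtime", parsed_ok=True, completed_normally=True))
```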
```bash
# Default: -v gives metadata + classification + location — usually enough
# to find the engine call site without dumping test source into context.
mvn -pl karate-js-test262 -o exec:java \
  -Dexec.args="--single <path> -v" 2>&1 | tail -n 40

# Escalate to -vv (full source) only if -v didn't pinpoint the cause:
mvn -pl karate-js-test262 -o exec:java \
  -Dexec.args="--single <path> -vv" 2>&1 | tail -n 200
```
-v prints parsed YAML metadata (description / flags / features / includes
/ negative), the classification, and — if the engine attached a position
— a location: <path>:<line>:<col> line. -vv additionally prints the
full test source. --single does no file writes. No HTML drill-down
page is generated — the details.html report shows path + error_type +
message inline.
Location-line caveat. location: only appears when the engine
itself threw and attached a position. Two common FAIL shapes carry no
location and you should skip straight to reading the relevant built-in
source:
- Test262Error: <expectation> — the harness assertion fired
  inside the test's own JS, not the engine. Find the failure inside
  the test source (or look at what the test is asserting) and trace
  back to the engine method that built the wrong value.
- Unknown: java.lang.StackOverflowError / NullPointerException /
  other Java exceptions — uncaught Java throwables surface without a
  JS-level position. Grep the stack for the engine class.

Compact rollups over results.jsonl. All return tens of lines, not
thousands. Use these instead of reading the raw JSONL when scoping a
slice or hunting for clusters.
```bash
RD=target/test262/run-<ts>      # the run-dir to analyze
JSONL=$RD/results.jsonl         # use .partial during an in-progress run

# PASS / FAIL / SKIP counts.
jq -r .status "$JSONL" | sort | uniq -c

# FAIL histogram by error_type — which classifier buckets dominate.
jq -r 'select(.status=="FAIL").error_type' "$JSONL" \
  | sort | uniq -c | sort -rn

# Top 20 FAIL message clusters (numbers normalized so near-duplicates merge).
jq -r 'select(.status=="FAIL").message' "$JSONL" \
  | sed 's/[0-9][0-9]*/N/g' \
  | sort | uniq -c | sort -rn | head -20

# FAIL counts per slice (two path components deep).
jq -r 'select(.status=="FAIL") | .path | split("/")[1:3] | join("/")' "$JSONL" \
  | sort | uniq -c | sort -rn | head -30

# One example failing path per error_type — for `--single -v` follow-up.
jq -r 'select(.status=="FAIL") | "\(.error_type)\t\(.path)"' "$JSONL" \
  | sort -u -k1,1 | head -20

# All FAILs under a specific slice — bounded with head, never raw.
jq -r 'select(.status=="FAIL" and (.path|startswith("test/language/statements/for-of"))) | .path' \
  "$JSONL" | head -30
```
FAIL-set difference — works in dev mode (no PASS rows needed). Capped
output: counts + first 10 of each list + per-slice cluster breakdown.
Assumes both runs covered the same --only scope (recorded in each
run-meta.json if you want to verify).
```bash
PREV=target/test262/run-<prev>/results.jsonl
CURR=target/test262/run-<curr>/results.jsonl
python3 - "$PREV" "$CURR" <<'PY'
import json, sys, collections

def fails(p):
    # Set of paths whose row has status FAIL; each line is parsed once.
    with open(p) as f:
        rows = (json.loads(l) for l in f if l.strip())
        return {r['path'] for r in rows if r['status'] == 'FAIL'}

prev, curr = fails(sys.argv[1]), fails(sys.argv[2])
regr  = sorted(curr - prev)   # newly failing — likely regressions
fixed = sorted(prev - curr)   # newly passing (or removed/skipped)

def by_slice(paths):
    c = collections.Counter('/'.join(p.split('/')[1:3]) for p in paths)
    return c.most_common(10)

def show(label, paths):
    print(f'{label}: {len(paths)}')
    for p in paths[:10]: print(f'  {p}')
    if len(paths) > 10: print(f'  ... {len(paths)-10} more')
    if paths: print(f'  by slice: {by_slice(paths)}')

show('Regressed (newly FAIL)', regr)
show('Fixed (no longer FAIL)', fixed)
PY
```
Per-session safety check against the slice baseline. If Regressed is
non-zero, drill into a couple of representative paths with
--single -v — do not paste the full list into context. Note: a
path appearing under "Fixed" could mean it now PASSes or that it was
moved to SKIP / removed from scope; cross-check with the SKIP set if
ambiguous.
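The SKIP cross-check mentioned above can be sketched as a small helper that splits the "Fixed" set into paths that are now SKIP and paths that are absent from the current dev-mode file (so they either PASS or fell out of scope). `statuses` and `split_fixed` are hypothetical names; the inputs are the text bodies of the two results.jsonl files.

```python
import json


def statuses(jsonl_text):
    """Map path -> status for every row in a results.jsonl body."""
    rows = (json.loads(line) for line in jsonl_text.splitlines() if line.strip())
    return {row["path"]: row["status"] for row in rows}


def split_fixed(prev_text, curr_text):
    """Partition paths that stopped failing into (now SKIP, PASS or out of scope)."""
    prev, curr = statuses(prev_text), statuses(curr_text)
    fixed = sorted(p for p, s in prev.items() if s == "FAIL" and curr.get(p) != "FAIL")
    now_skip = [p for p in fixed if curr.get(p) == "SKIP"]
    rest = [p for p in fixed if curr.get(p) != "SKIP"]
    return now_skip, rest


prev = '{"path":"test/a.js","status":"FAIL"}\n{"path":"test/b.js","status":"FAIL"}\n'
curr = '{"path":"test/a.js","status":"SKIP","reason":"moved to skip list"}\n'
print(split_fixed(prev, curr))
```

Only the second list is a candidate for a real fix; the first is bookkeeping in the skip list.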
For re-probing a slice or triaging a cluster, spawn a general-purpose
sub-agent and require a digest. The sub-agent reads the full output;
you receive only the summary.
Prompt template (copy, fill in <glob>, paste into Agent):
Run `etc/run.sh --only '<glob>' --max-duration 600000` from `karate-js-test262/`. After it completes, query the run-dir's `results.jsonl` (the runner prints `Run dir: <path>` on completion) and return ≤200 words:
- PASS / FAIL / SKIP counts for the slice.
- Top 3 FAIL clusters (group by error_type + normalized message prefix). For each: count, one example path, one example message.
- Anything surprising: Timeouts, NPE-shaped errors, `Java heap space`, `IllegalName` lambda leaks, parse-vs-runtime classification gaps.

Do not paste raw FAIL lines, full test source, or JSONL contents. If you need to inspect a specific test, use `--single -v` and quote ≤3 relevant lines.
Use it for: slice probes, cluster triage, "did my engine change regress anything" checks, post-edit slice re-runs. Skip for small targeted lookups (one test, one symbol) — run those inline.
The conformance suite allocates a fresh Engine per test (~50k tests); small
regressions compound into minutes of wall time. Prefer profile mode — the
30 s warm loop is JIT-stable and directly comparable to the
reference table in JS_ENGINE.md.
```bash
mvn -pl karate-js -q test-compile

# Profile mode (30 s warm loop; JIT-stable, ~16k iterations averaged).
java -cp "karate-js/target/classes:karate-js/target/test-classes:$(find ~/.m2/repository -name 'slf4j-api-*.jar' | head -1)" \
  io.karatelabs.parser.EngineBenchmark profile

# Fast mode (median of 10 cold runs) — noisy, gut-check only
java -cp "…same classpath…" io.karatelabs.parser.EngineBenchmark
```
If averages move >±10%, understand why before merging. If unavoidable
(correctness > speed), update the reference table in JS_ENGINE.md in the
same commit.
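The ±10% rule is simple arithmetic; a sketch of the comparison follows, using the RealisticBenchmark figures quoted earlier in this file (68.6 → 62.3 µs/feature). `percent_change` and `needs_explanation` are hypothetical helper names, not part of EngineBenchmark.

```python
def percent_change(before, after):
    """Signed change of `after` relative to `before`, in percent."""
    return (after - before) / before * 100.0


def needs_explanation(before, after, tolerance_pct=10.0):
    """True when the average moved by more than the tolerance in either direction."""
    return abs(percent_change(before, after)) > tolerance_pct


change = percent_change(68.6, 62.3)
print(f"{change:.1f}% -> investigate: {needs_explanation(68.6, 62.3)}")
```

Lower is better for time-per-iteration figures, so a large negative change is an improvement, but per the rule above it still deserves an explanation before merging.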
Edit TEST262_SHA=... at the top of etc/fetch-test262.sh, delete the local
test262/ directory, re-run the script. All subsequent runs use the new
commit. Coordinate bumps with whoever else is iterating — the suite itself
evolves.
| Symptom | Likely cause / fix |
|---|---|
| `expectations file not found: etc/expectations.yaml` | Wrong directory. `cd karate-js-test262` first. |
| `test262 directory not found: test262` | Haven't run `etc/fetch-test262.sh` yet. |
| `Failed to execute goal ... exec-maven-plugin ... on project karate-parent: 'mainClass' ... missing` | Used `-am` with `exec:java`. Don't — install karate-js separately and run without `-am`. |
| Engine change has no effect on test262 output | Forgot `mvn ... -pl karate-js -o install -DskipTests`. The runner uses the local Maven repo jar, not the reactor classpath. |
| `Test262Report` says `--run-dir <path> is required` | Pass the path the runner printed on completion: `--run-dir target/test262/run-<ts>`. `etc/run.sh` does this for you. |
| Where's my report? | The runner prints `Run dir: <path>` on completion. Look in `<path>/html/index.html`. Each invocation creates a fresh `run-<timestamp>/` dir; nothing is overwritten. |
| Suite hangs on one test | Infinite loop; watchdog kicks in at `--timeout-ms`. The inner executor is retired and replaced; a genuine hang leaks one daemon thread and keeps going. Bisect with `--only`, or add `--max-duration` as a safety net. |
| Driving from a script that must not block | Pass `--max-duration <ms>`. On hit, partial results are written and `Aborted:` replaces `Summary:`. |
| Tests that used to pass now fail | Run EngineBenchmark too — perf regression sometimes manifests as timeouts before correctness. |
| `target/test262/` growing unbounded across iteration sessions | No auto-pruning; each run writes its own `run-<ts>/`. `mvn clean` wipes the lot. |
```
karate-js-test262/
├── TEST262.md                          # this file
├── pom.xml                             # Maven module (deploy explicitly disabled)
├── etc/
│   ├── expectations.yaml               # declarative SKIP list (committed)
│   ├── fetch-test262.sh                # shallow clone of tc39/test262 at pinned SHA
│   └── run.sh                          # one-shot: install + run + HTML
├── src/main/java/…/test262/            # runner + report + helpers
├── src/test/java/…/test262/            # unit tests for the harness itself
├── src/main/resources/report/          # HTML/CSS/JS templates for the report
├── src/main/resources/logback.xml      # logger config (file appender → target/test262/)
├── test262/                            # [gitignored] the cloned suite
└── target/test262/                     # [gitignored] one subdir per run
    └── run-<timestamp>/                # self-contained per-run dir
        ├── results.jsonl               # per-test pass/fail/skip, sorted by path (end of run)
        ├── results.jsonl.partial       # live feed — appended per test, flushed; deleted on clean exit, kept on abort
        ├── run-meta.json               # per-run context (test262 SHA, karate-js ver+SHA, JDK, OS, started/ended, counts)
        ├── progress.log                # banner + [progress] lines + final summary
        └── html/                       # two-file static HTML report
            ├── index.html              # tree + per-slice summary tiles
            └── details.html            # full per-test list with search + status filter
```
Each run is self-contained and immutable; old runs persist until mvn clean.
The CI workflow uploads target/test262/ (parent) as a single artifact.
A workflow_dispatch-only workflow at
.github/workflows/test262.yml runs
etc/fetch-test262.sh + the runner + the report, and uploads the whole
target/test262/ directory as a single artifact. Never triggered
automatically — kick off from the Actions tab when you want a fresh run. Two
inputs (only and timeout_ms) default to full-suite / 10 s per test.
The module's pom.xml sets maven.deploy.skip=true / gpg.skip=true /
skipPublishing=true so the release workflow does not publish this module to
Maven Central.