JIT Testing

We would like to ensure that the CoreCLR contains sufficient test collateral and tooling to enable high-quality contributions to RyuJit or LLILC's JIT.

JIT testing is somewhat specialized and can't rely solely on the general framework tests or end to end application tests.

This document describes some of the work needed to bring JIT existing tests and technology into the CoreCLR, and touches on some areas as that open for innovation.

We expect to evolve this document into a road map for the overall JIT testing effort, and to spawn a set of issues in the CoreCLR and LLILC repos for implementing the needed capabilities.

Requirements and Assumptions

It must be easy to add new tests.
Tests must execute with high throughput. We anticipate needing to run thousands of tests to provide baseline level testing for JIT changes.
Tests should generally run on all supported/future chip architectures and all OS platforms.
Tests must be partitionable so CI latency is tolerable (test latency goal TBD).
Tests in CI can be run on private changes (currently tied to PRs; this may be sufficient).
Test strategy harmonious with other .NET repo test strategies.
Test harness behaves reasonably on test failure. Easy to get at repro steps for subsequent debugging.
Tests must allow fine-grained inspection of JIT outputs, for instance comparing the generated code versus a baseline JIT.
Tests must support collection of various quantitative measurements, eg time spent in the JIT, memory used by the JIT, etc.
For now, JIT test assets belong in the CoreCLR repo.
JIT tests use the same basic test xunit harness as existing CoreCLR tests.
JIT special-needs testing will rely on extensions/hooks. Examples below.

Tasks

Below are some broad task areas that we should consider as part of this plan. It seems sensible for Microsoft to focus on opening up the JIT self-host (aka JITSH) tests first. A few other tasks are also Microsoft specific and are marked with (MS) below.

Other than that the priority, task list, and possibly assignments are open to discussion.

(MS) Bring up equivalent of the JITSH tests

JITSH is a set of roughly 8000 tests that have been traditionally used by Microsoft JIT developers as the frontline JIT test suite.

We'll need to subset these tests for various reasons:

Some have shallow desktop CLR dependence (e.g. missing cases in string formatting).
Some have deep desktop CLR dependence (testing a desktop CLR feature that is not present in CoreCLR).
Some require tools not yet available in CoreCLR (ilasm in particular).
Some test windows features and won't be relevant to other OS platforms.
Some tests may not be able to be freely redistributed.

We have done an internal inventory and identified roughly 1000 tests that should be straightforward to port into CoreCLR, and have already started in on moving these.

Test script capabilities

We need to ensure that the CoreCLR repo contains a suitably hookable test script. Core testing is driven by xunit but there's typically a wrapper around this (run.cmd today) to facilitate test execution.

The proposal is to implement platform-neutral variant of run.cmd that contains all the existing functionality plus some additional capabilities for JIT testing. Initially this will mean:

Ability to execute tests with a JIT specified by the user (either as alt JIT or as the only JIT)
Ability to pass options through to the JIT (eg for dumping assembly or IR) or to the CoreCLR (eg to disable use of ngen images).

Cache prebuilt test assets

In general we want JIT tests to be built from sources. But given the volume of tests it can take a significant amount of time to compile those sources into assemblies. This in turn slows down the ability to test the JIT.

Given the volume of tests, we might reach a point where the default CoreCLR build does not build all the tests.

So it would be good if there was a regularly scheduled build of CoreCLR that would prebuild a matching set of tests and make them available.

Round out JITSH suite, filling in missing pieces

We need some way to run ILASM. Some suggestions here are to port the existing ILASM or find some equivalent we could run instead. We could also prebuild IL based tests and deploy as a package. Around 2400 JITSH tests are blocked by this.

There are also some VB tests which presumably can be brought over now that VB projects can build.

Native/interop tests may or may not require platform-specific adaption.

(MS) Port the devBVT tests.

devBVT is a broader part of CLR SelfHost that is useful for second-tier testing. Not yet clear what porting this entails.

Leverage peer repo test suites.

We should be able to directly leverage tests provided in peer repo suites, once they can run on top of CoreCLR. In particular libraries and Roslyn test cases could be good initial targets.

Note LLILC is currently working through the remaining issues that prevent it from being able to compile all of Roslyn. See the "needed for Roslyn" tags on the open LLILC issues.

Look for other CoreCLR hosted projects.

Similar to the above, as other projects are able to host on CoreCLR we can potentially use their tests for JIT testing.

Porting of existing suites/tools over to our repos.

Tools developed to test JVM Jits might be interesting to port over to .Net. Suggestions for best practices or effective techniques are welcome.

Bring up quantitative measurements.

For Jit testing we'll need various quantitative assessments of Jit behavior:

Time spent jitting
Speed of jitted code
Size of jitted code
Memory utilization by the jit (+ leak detection)
Debug info fidelity
Coverage ?

There will likely be work going on elsewhere to address some of these same measurement capabilities, so we should make sure to keep it all in sync.

Bring up alternate codegen capabilities.

For LLILC, implementing support for crossgen would provide the ability to drive lots of IL through the JIT. There is enough similarity between the JIT and crossgen paths that this would likely surface issues in both.

Alternatively one can imagine simple test drivers that load up assemblies and use reflection to enumerate methods and asks for method bodies to force the JIT to generate code for all the methods.

Bring up stress testing

The value of existing test assets can be leveraged through various stress testing modes. These modes use non-standard code generation or runtime mechanisms to try an flush out bugs.

GC stress. Here the runtime will GC with much higher frequency in an attempt to maximize the dependence on the GC info reported by the JIT.
Internal modes in the JIT to try and flush out bugs, eg randomized inlining, register allocation stress, volatile stress, randomized block layout, etc.

Bring up custom testing frameworks and tools.

We should invest in things like random program or IL generation tools.