docs/project/jit-testing.md
We would like to ensure that CoreCLR contains sufficient test collateral and tooling to enable high-quality contributions to RyuJIT or LLILC's JIT.
JIT testing is somewhat specialized and can't rely solely on the general framework tests or end-to-end application tests.
This document describes some of the work needed to bring existing JIT tests and technology into CoreCLR, and touches on some areas that are open for innovation.
We expect to evolve this document into a road map for the overall JIT testing effort, and to spawn a set of issues in the CoreCLR and LLILC repos for implementing the needed capabilities.
Below are some broad task areas that we should consider as part of this plan. It seems sensible for Microsoft to focus on opening up the JIT self-host (aka JITSH) tests first. A few other tasks are also Microsoft-specific and are marked with (MS) below.
Other than that, the priorities, task list, and possibly assignments are open to discussion.
JITSH is a set of roughly 8000 tests that have been traditionally used by Microsoft JIT developers as the frontline JIT test suite.
We'll need to subset these tests for various reasons:
We have done an internal inventory and identified roughly 1000 tests that should be straightforward to port into CoreCLR, and have already started moving these.
We need to ensure that the CoreCLR repo contains a suitably hookable test script. Core testing is driven by xunit but there's typically a wrapper around this (run.cmd today) to facilitate test execution.
The proposal is to implement a platform-neutral variant of run.cmd that contains all the existing functionality plus some additional capabilities for JIT testing. Initially this will mean:
In general we want JIT tests to be built from sources. But given the volume of tests it can take a significant amount of time to compile those sources into assemblies. This in turn slows down the ability to test the JIT.
Given the volume of tests, we might reach a point where the default CoreCLR build does not build all the tests.
So it would be good if there was a regularly scheduled build of CoreCLR that would prebuild a matching set of tests and make them available.
We need some way to run ILASM. Some suggestions here are to port the existing ILASM or find some equivalent we could run instead. We could also prebuild IL-based tests and deploy them as a package. Around 2400 JITSH tests are blocked by this.
There are also some VB tests which presumably can be brought over now that VB projects can build.
Native/interop tests may or may not require platform-specific adaptation.
devBVT is a broader part of CLR SelfHost that is useful for second-tier testing. It is not yet clear what porting this entails.
We should be able to directly leverage tests provided in peer repo suites, once they can run on top of CoreCLR. In particular libraries and Roslyn test cases could be good initial targets.
Note LLILC is currently working through the remaining issues that prevent it from being able to compile all of Roslyn. See the "needed for Roslyn" tags on the open LLILC issues.
Similar to the above, as other projects are able to host on CoreCLR we can potentially use their tests for JIT testing.
Tools developed to test JVM JITs might be interesting to port over to .NET. Suggestions for best practices or effective techniques are welcome.
For JIT testing we'll need various quantitative assessments of JIT behavior:
There will likely be work going on elsewhere to address some of these same measurement capabilities, so we should make sure to keep it all in sync.
For LLILC, implementing support for crossgen would provide the ability to drive lots of IL through the JIT. There is enough similarity between the JIT and crossgen paths that this would likely surface issues in both.
Alternatively, one can imagine simple test drivers that load up assemblies, use reflection to enumerate methods, and ask for method bodies to force the JIT to generate code for all the methods.
The value of existing test assets can be leveraged through various stress testing modes. These modes use non-standard code generation or runtime mechanisms to try to flush out bugs.
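As a rough illustration of how one existing test could be reused across stress modes, here is a minimal Python sketch that runs the same command once per mode and reports which modes fail. The environment variable names (`EXAMPLE_JIT_STRESS`, `EXAMPLE_GC_STRESS`) are hypothetical placeholders, not actual runtime settings, and the convention that a passing test exits with code 100 is an assumption of this sketch.

```python
import os
import subprocess
import sys

# Hypothetical stress configurations: each maps a mode name to environment
# overrides. The variable names are placeholders, not real runtime knobs.
STRESS_MODES = {
    "baseline": {},
    "jit-stress": {"EXAMPLE_JIT_STRESS": "1"},
    "gc-stress": {"EXAMPLE_GC_STRESS": "1"},
}


def run_under_modes(command, modes=STRESS_MODES, timeout=600):
    """Run `command` once per stress mode and return {mode_name: exit_code}."""
    results = {}
    for name, overrides in modes.items():
        env = dict(os.environ)   # start from the caller's environment
        env.update(overrides)    # layer the mode-specific settings on top
        completed = subprocess.run(command, env=env, timeout=timeout)
        results[name] = completed.returncode
    return results


def failing_modes(results, expected=100):
    """Return the sorted mode names whose exit code differs from `expected`.

    The default of 100 as the passing exit code is an assumption.
    """
    return sorted(name for name, code in results.items() if code != expected)


if __name__ == "__main__":
    # Usage: python stress_modes.py <test command and arguments...>
    outcome = run_under_modes(sys.argv[1:])
    print("failing modes:", failing_modes(outcome))
```

A test that passes in the baseline mode but fails in another mode points at behavior that only the stress setting exposes, which is the kind of bug these modes are meant to flush out.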
We should invest in things like random program or IL generation tools.
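To make the idea of random program generation concrete, here is a small Python sketch, not a proposal for the actual tool. It builds a random integer expression, computes the expected result itself using 32-bit wrapping arithmetic, and emits a C# test that checks the JIT-compiled code against that value. All names (`gen_expr`, `make_test`, `Compute`) are hypothetical, and the exit code of 100 for a passing test is an assumption. The sketch is limited to `+`, `-`, and `*` so that no division-by-zero handling is needed.

```python
import random


def wrap32(value):
    """Wrap a Python int to signed 32 bits, matching C# unchecked int math."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def gen_expr(rng, depth, names):
    """Return (C# source text, evaluator) for a random int expression."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            name = rng.choice(names)
            return name, lambda env: env[name]
        literal = rng.randint(0, 1000)
        return str(literal), lambda env: literal
    op = rng.choice("+-*")
    left_src, left = gen_expr(rng, depth - 1, names)
    right_src, right = gen_expr(rng, depth - 1, names)

    def evaluate(env):
        a, b = left(env), right(env)
        if op == "+":
            return wrap32(a + b)
        if op == "-":
            return wrap32(a - b)
        return wrap32(a * b)

    return f"({left_src} {op} {right_src})", evaluate


# C# test skeleton; doubled braces are literal braces for str.format.
TEMPLATE = """\
using System.Runtime.CompilerServices;

static class Program
{{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static int Compute(int a, int b, int c)
    {{
        return unchecked({expr});
    }}

    static int Main()
    {{
        // Assumed convention: 100 means pass, anything else means fail.
        return Compute({a}, {b}, {c}) == {expected} ? 100 : 1;
    }}
}}
"""


def make_test(seed, depth=4):
    """Generate a self-checking C# test; the same seed gives the same test."""
    rng = random.Random(seed)
    env = {name: rng.randint(-1000, 1000) for name in ("a", "b", "c")}
    expr, evaluate = gen_expr(rng, depth, list(env))
    return TEMPLATE.format(expr=expr, expected=evaluate(env), **env)


if __name__ == "__main__":
    print(make_test(seed=42))
```

Seeding the generator makes any failure reproducible from the seed alone, so a failing case can be regenerated rather than stored.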