docs/design/coreclr/jit/Perf-Score.md
A new JIT feature that estimates the dynamic execution cost of a method.
Developers who work on the JIT code base.
Currently, JIT developers generate "Asm Diffs" when making a change to the JIT to see whether their change improves the JITted code. We have a tool called "jit-diffs" that lets a developer quickly find the methods for which two different JITs generate different assembly code.
The "jit-analyze" tool sorts this information and lists the methods with the largest decreases and increases in code size. For many methods, smaller code size usually means faster execution and larger code size means slower execution. But for complex methods that contain loops, code size is not a good metric of code quality. For such methods we want the JIT to improve the code generated for the innermost loop, and we are willing to pay extra cost outside the loop. Total method code size therefore does not properly classify regressions or improvements for changes to code inside loops.
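To make the loop argument concrete, here is a toy calculation. The block weights and unit instruction costs below are invented for illustration; they are not the JIT's actual numbers:

```python
# Illustrative only: invented block weights and a unit cost per instruction.
# A hot loop body dominates the dynamic cost of a method, so trading one
# instruction inside the loop for two outside it is a win, even though the
# total code size grows.

LOOP_WEIGHT = 100.0   # estimated executions of the loop body
PROLOG_WEIGHT = 1.0   # code outside the loop runs once

def perf_score(instrs_in_loop, instrs_outside):
    # Toy model: every instruction costs 1.0, scaled by its block's weight.
    return instrs_in_loop * LOOP_WEIGHT + instrs_outside * PROLOG_WEIGHT

before = perf_score(10, 5)   # 10 instructions in the loop, 5 outside
after  = perf_score(9, 7)    # hoisted: 1 fewer inside, 2 more outside

assert after < before        # estimated dynamic cost improves...
assert (9 + 7) > (10 + 5)    # ...even though total instruction count grew
```

Under a code-size-only metric the "after" version looks like a regression; under a weighted dynamic-cost metric it is correctly classified as an improvement.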
Instead of using total code size as the metric, it is better to have the JIT produce an estimate of the dynamic execution cost of a method. This estimated dynamic execution cost is called the "Perf Score" of the method. The "jit-analyze" tool would also be changed to use the Perf Score instead of code size when determining the list of methods to display.
When making a performance change that impacts a benchmark, you should verify that the computed Perf Score and the benchmark's execution time improve in a similar way.
Computing a highly accurate value for the actual dynamic execution cost of a method on a particular CPU is a very hard problem. The difficulty arises because all modern hardware executes instructions using a pipelined, superscalar architecture: each instruction's execution is partitioned into a series of steps called a pipeline, and the CPU runs multiple instruction pipelines concurrently. On modern hardware there can easily be twenty or more instructions executing at one time. To accurately compute the actual dynamic execution cost, a software model of this process would need to be implemented, and it would be large and complex.
The Perf Score feature does NOT propose to accurately model the actual dynamic execution cost. Instead, we propose to return an estimate of it, making several simplifying assumptions that allow us to more quickly and easily produce a reasonable estimate to use as the Perf Score.
Hardware vendors, such as Intel and ARM, provide information for assembly language programmers and compiler writers. They typically provide the following details for each instruction:

- The instruction's latency: the number of cycles before its result is available to a dependent instruction.
- The instruction's throughput: how frequently an independent instance of the instruction can be issued.
For the Perf Score implementation, we will add a method, getInsExecutionCharacteristics, that returns the insExecutionCharacteristics for each instruction. We will use the returned information, along with the BasicBlock weight, to calculate a weighted execution cost for each instruction. This implementation will not try to model the actual def/use latencies; instead it will use a simplified model to estimate the instruction latency. If, in the future, we decide to add an instruction scheduler phase to the JIT, we can revisit this area; we believe it will be straightforward to add at that time.
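The simplified model can be sketched as follows. This Python model only mirrors the shape of the C++ design described above; the field names, the per-instruction numbers in the table, and the throughput-only cost formula are assumptions made for illustration, not the JIT's actual data or formula:

```python
from dataclasses import dataclass

@dataclass
class InsExecutionCharacteristics:
    # The two values a hardware vendor's optimization guide typically supplies.
    throughput: float  # average cycles between issues of independent instances
    latency: float     # cycles until the result is available to a consumer

# Toy lookup table with invented values; real data would come from vendor
# documentation for the target CPU.
INS_TABLE = {
    "add":  InsExecutionCharacteristics(throughput=0.25, latency=1.0),
    "imul": InsExecutionCharacteristics(throughput=1.0,  latency=3.0),
    "div":  InsExecutionCharacteristics(throughput=6.0,  latency=20.0),
}

def get_ins_execution_characteristics(ins):
    # Models the proposed getInsExecutionCharacteristics method.
    return INS_TABLE[ins]

def block_perf_score(instructions, block_weight):
    # Simplified model: charge each instruction its throughput cost and
    # ignore def/use latency chains, then scale by how often the block runs.
    cost = 0.0
    for ins in instructions:
        cost += get_ins_execution_characteristics(ins).throughput
    return cost * block_weight

score = block_perf_score(["add", "imul", "div"], block_weight=100.0)
```

Summing over every instruction in every BasicBlock of the method yields the method's Perf Score; ignoring latency chains trades accuracy for a model that is cheap to compute and easy to maintain.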
We also want the Perf Score to take into account the amount of code generated by the JIT for the method, so that code size still influences the metric.
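One way the size term could be folded in is sketched below. The additive form and the 0.1 multiplier are invented for this illustration, not the JIT's actual formula; the point is only that a small size-proportional term breaks ties in favor of smaller code without swamping the dynamic-cost estimate:

```python
# Hypothetical sketch: combine the weighted dynamic-cost estimate with a
# small code-size term. SIZE_WEIGHT is an invented constant.
SIZE_WEIGHT = 0.1

def total_perf_score(dynamic_cost, code_size_bytes):
    # Between two methods with equal estimated dynamic cost, the smaller
    # one receives the better (lower) score.
    return dynamic_cost + SIZE_WEIGHT * code_size_bytes

# Equal dynamic cost, different sizes: the smaller method scores better.
assert total_perf_score(907.0, 64) < total_perf_score(907.0, 80)
```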
We will modify the "jit-analyze" tool to use the Perf Score to identify the methods that have the largest regressions and improvements.