docs/design/coreclr/jit/lsra-throughput.md
There are a number of ways in which the current implementation of linear scan register allocation (LSRA) is sub-optimal:
TreeNodeInfoInit pass must be separate.
Further investigation is needed.Lowering and communicated to the register allocator via a gtLsraInfo field on the node, that
is otherwise unused, and is basically duplicated when the RefPositions are built for the node.
Lowering, where existing transformations already
take into account the parent context.
CodeGen::genCodeForTreeNode()), and then additionally when considering whether the operands of the current
node are contained.RefPositions, though see below.Lowering, and fundamentally requires more
space (it must specify register masks for sources, destination and any internal registers). In addition, the requirement for a new
register definition (the destination of the node, or any internal registers) is independent of the parent, so this could be done in
LinearScan::buildRefPositionsForNode() without having to do a dual traversal, unlike the identification of contained nodes.RefPositions, they are traversed in order to set the last use bits.
This is done separately because there are currently inconsistencies between the gtNext/gtPrev links and the actual order of codegen.
Once this is resolved, the lastUse bits should be set prior to register allocation by the liveness pass (#7256).RefPositions are all created prior to beginning the register allocation pass. However, they are only really needed in advance
for the lclVars, which, unlike the "tree temps", have multiple definitions and may be live across basic blocks.
The RefPositionss for the tree temps could potentially be allocated on-the-fly, saving memory and probably improving locality (#7257).LinearScan::tryAllocateFreeReg() and in LinearScan::allocateBusyReg() could be
short-circuited when a register is found that has the best possible score. Additionally, in the case of MinOpts, it could potentially
short-circuit as soon as a suitable candidate is found, though one would want to weight the throughput benefit against the code
quality impact.My original plan for this was to combine all of the functionality of the current TreeNodeInfoInit pass with the building of RefPositions,
and eliminate gtLsraInfo.
The idea was to later consider pulling the containment analysis back into the first phase of Lowering.
However, after beginning down that path (extracting the TreeNodeInfoInit methods into separate lsra{arch}.cpp files), I realized that
there would be a great deal of throw-away work to put the containment analysis into LinearScan, only to potentially pull it out later.
Furthermore, the representation of containedness is not very clean:
Lowering first communicates this as a combination of implicit knowledge of a node's behavior and its gtLsraInfo.dstCount.CodeGen, it is determined by combining similar implicit node characteristics with the presence or absence of a register.I propose instead to do the following:
GTF_REG_VAL for the non-LEGACY_BACKEND. Doing so will require some additional cleanup,
but in the process a number of hacks can be eliminated that are currently there to workaround the fact that the emitter was designed
to work with a code generator that dynamically assigned registers, and set that flag when the code had been generated to put it in a
register, unlike the RyuJIT backend, which assigns the registers before generating code.REG_UNK is assigned by Lowering when a register is required.REG_OPT is assigned by Lowering when a register is optional at both definition and use.REG_OPT_USE is assigned by Lowering when a register is required at the definition, but optional at the use.REG_OPT_DEF, but that could be added as well.IsContained().It may be more effective to use the extra bit for an actual GTF_CONTAINED flag, and that is something we might want to consider
eventually, but initially it is easier to simplify the containedness check using GTF_TREE_ROOT without having to change all the
places that currently mark nodes as contained.
Once we've changed containedness to use the above representation, we can move the code to set it into the first pass of Lowering.
There are likely to be some phase ordering challenges, but I don't think they will be prohibitive.
Issue #7225.
After the containedness changes above, all that remains to communicate via gtLsraInfo is the register requirements.
This step would still use the TreeNodeInfo data structure and the TreeNodeInfoInit() methods, but they would be called as
each node is handled by LinearScan::buildRefPositionsForNode().