Primary Objectives
Secondary Objectives
In RyuJIT, the concept of a type is very simplistic (which helps support the high throughput of the JIT). Rather than a symbol table to hold the properties of a type, RyuJIT primarily deals with types as simple values of an enumeration. When more detailed information is required about the structure of a type, we query the type system, across the JIT/EE interface. This is generally done only during the importer (translation from MSIL to the RyuJIT IR), during struct promotion analysis, and when determining how to pass or return struct values. As a result, struct types are generally treated as an opaque type (TYP_STRUCT) of unknown size and structure.
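The following minimal sketch illustrates this model: a node carries only an enumeration value, primitive sizes follow directly from that value, and only an opaque struct type needs a query to the type system. All names here (VarType, TypeSystem, sizeOf) are hypothetical simplifications, not RyuJIT's actual var_types or JIT/EE interface.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical, heavily simplified stand-in for RyuJIT's type enumeration.
enum VarType : uint8_t { TYP_INT, TYP_LONG, TYP_FLOAT, TYP_DOUBLE, TYP_REF, TYP_STRUCT };

// A node carries only a small type tag, not a pointer into a symbol table.
struct Node {
    VarType type;
};

using ClassHandle = uintptr_t; // hypothetical opaque class handle

// Hypothetical stand-in for the runtime side of the JIT/EE interface.
class TypeSystem {
public:
    void define(ClassHandle cls, unsigned size) { sizes_[cls] = size; }

    unsigned getClassSize(ClassHandle cls) {
        ++queryCount; // count how often the JIT has to "cross the interface"
        return sizes_.at(cls);
    }

    int queryCount = 0;

private:
    std::unordered_map<ClassHandle, unsigned> sizes_;
};

// Primitive sizes follow directly from the enum value; TYP_STRUCT is opaque,
// so its size has to be requested from the type system.
unsigned sizeOf(const Node& node, ClassHandle cls, TypeSystem& ts) {
    switch (node.type) {
    case TYP_INT:
    case TYP_FLOAT:
        return 4;
    case TYP_LONG:
    case TYP_DOUBLE:
        return 8;
    case TYP_REF:
        return static_cast<unsigned>(sizeof(void*));
    case TYP_STRUCT:
        assert(cls != 0 && "a struct size query needs a class handle");
        return ts.getClassSize(cls);
    }
    return 0;
}
```

Keeping the per-node type to a single small value is what keeps nodes small and the JIT fast; the cost is that anything beyond the enum value for a struct requires an explicit query.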
To treat fully-enregisterable struct types as "first class" types in RyuJIT, we created new types to represent vectors, so that the JIT can support operations on them:
TYP_SIMD8, TYP_SIMD12, TYP_SIMD16 and (where supported by the target) TYP_SIMD32.
These types are used for the SIMD vector types (Vector2, Vector3, Vector4 and Vector<T>) as well as for the types used by platform-specific hardware intrinsics (Vector64<T>, Vector128<T> and Vector256<T>).
They are used to represent vectors in SIMD and HWIntrinsic nodes.
We had previously proposed creating additional types to be used where struct types of the given size are passed and/or returned in registers:
TYP_STRUCT1, TYP_STRUCT2, TYP_STRUCT4, TYP_STRUCT8 (on 64-bit systems)
However, further investigation and implementation have suggested that these may not be necessary.
Rather, storage decisions should largely be deferred to the backend (Lowering and register allocation).
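As a rough illustration of how the TYP_SIMD* types line up with the vector types listed above, the hypothetical helper below maps the size of an already-recognized vector type to its SIMD type. It assumes the usual sizes (Vector2 is 8 bytes, Vector3 is 12, Vector4 is 16, and Vector64<T>/Vector128<T>/Vector256<T> are 8/16/32); Vector<T> is 16 or 32 bytes depending on the target. This is a sketch, not the importer's actual normalization logic.

```cpp
#include <cassert>

// Simplified subset of the JIT's type enumeration.
enum VarType { TYP_STRUCT, TYP_SIMD8, TYP_SIMD12, TYP_SIMD16, TYP_SIMD32 };

// Hypothetical helper, only meant to be called for structs already recognized
// as vector types: maps the vector's size in bytes to a TYP_SIMD* type.
// 'targetSupports32' models whether 32-byte vectors are available.
VarType simdTypeForVectorSize(unsigned sizeInBytes, bool targetSupports32) {
    assert(sizeInBytes != 0);
    switch (sizeInBytes) {
    case 8:  return TYP_SIMD8;   // e.g. Vector2, Vector64<T>
    case 12: return TYP_SIMD12;  // Vector3
    case 16: return TYP_SIMD16;  // e.g. Vector4, Vector128<T>
    case 32: return targetSupports32 ? TYP_SIMD32 : TYP_STRUCT; // e.g. Vector256<T>
    default: return TYP_STRUCT;  // no dedicated SIMD type for this size
    }
}
```

Ordinary structs that happen to be 8 or 16 bytes are not covered by this mapping; they remain TYP_STRUCT, which is why the proposed TYP_STRUCTn types came up at all.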
The following transformations need to be supported effectively for all struct types:
Correct and effective code generation for structs requires that the JIT have ready access to information about the shape and size of the struct. This information is obtained from the VM over the JIT/EE interface. This includes:
With the changes from @mikedn in #21705 (Pull struct type info out of GenTreeObj), this information is captured in a ClassLayout object, which records the size and GC layout of a struct type.
The associated ClassLayoutTable on the Compiler object supports lookup. This makes it possible to associate this shape information with all struct-typed nodes without impacting node size.
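A minimal sketch of the idea, assuming simplified hypothetical versions of ClassLayout and ClassLayoutTable (the real classes have different interfaces): the table creates one layout per class handle on first use, so every struct-typed node referring to that class can share it through a single pointer. The queryVM callback stands in for the JIT/EE interface.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

using ClassHandle = uintptr_t; // hypothetical opaque struct handle

// Simplified ClassLayout: the struct size plus a GC map with one entry per
// pointer-sized slot (true = the slot holds a GC reference).
class ClassLayout {
public:
    ClassLayout(ClassHandle cls, unsigned size, std::vector<bool> gcSlots)
        : cls_(cls), size_(size), gcSlots_(std::move(gcSlots)) {}
    ClassHandle handle() const { return cls_; }
    unsigned size() const { return size_; }
    bool hasGCPtrs() const {
        for (bool isGC : gcSlots_) {
            if (isGC) return true;
        }
        return false;
    }
private:
    ClassHandle cls_;
    unsigned size_;
    std::vector<bool> gcSlots_;
};

// Owned by the compiler: at most one ClassLayout per class handle.
class ClassLayoutTable {
public:
    explicit ClassLayoutTable(std::function<ClassLayout(ClassHandle)> queryVM)
        : queryVM_(std::move(queryVM)) {}

    const ClassLayout* getLayout(ClassHandle cls) {
        auto it = table_.find(cls);
        if (it == table_.end()) {
            // First request for this class: ask the VM once and cache the result.
            it = table_.emplace(cls, std::make_unique<ClassLayout>(queryVM_(cls))).first;
        }
        return it->second.get();
    }
private:
    std::function<ClassLayout(ClassHandle)> queryVM_;
    std::unordered_map<ClassHandle, std::unique_ptr<ClassLayout>> table_;
};

// A struct-typed node needs only one extra pointer to reach its full shape.
struct ObjNode {
    const ClassLayout* layout;
};
```

Because the layout objects are shared and immutable, nodes can compare layouts by pointer, and the node size grows by only a single pointer.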
These struct-typed nodes are created by the importer, but transformed in morph, and so are not encountered by most phases of the JIT:
GT_FIELD: This is transformed to a GT_LCL_VAR by the Compiler::fgMarkAddressExposedLocals() phase
if it's a promoted struct field, or to a GT_LCL_FLD or GT_IND by fgMorphField().
Proposed: a non-promoted struct-typed field should be transformed directly into a GT_OBJ, so that consistently all struct nodes, even r-values, have a ClassLayout.
The following struct-typed nodes are not transformed in morph:
GT_OBJ nodes represent struct types with a handle, and store a pointer to the ClassLayout object.
GT_BLK nodes represent struct types with no GC references, or opaque blocks of fixed size.
Proposed: GT_BLK nodes should be eliminated in favor of GT_OBJ since, after #21705, GT_OBJ nodes are no longer large nodes.
GT_STORE_OBJ and GT_STORE_BLK have the same structure as GT_OBJ and GT_BLK, respectively; for these store nodes, Data() is op2.
For GT_LCL_FLD nodes, we store a pointer to the ClassLayout in the node.
For GT_LCL_VAR nodes, the ClassLayout is obtained from the LclVarDsc.
Structs only appear as rvalues in the following contexts:
On the RHS of an assignment
As a call argument
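The argument will be one of GT_OBJ, GT_LCL_VAR, GT_LCL_FLD or GT_FIELD_LIST.
As an operand to a hardware intrinsic (for TYP_SIMD* only)
In this case the struct handle is generally assumed to be unneeded, as the type is captured (directly or indirectly) in the GT_HWINTRINSIC node; recognizing and optimizing these nodes would be simpler if they carried a ClassLayout.
After morph, a struct-typed value on the RHS of an assignment is one of: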
GT_IND: in this case the LHS is expected to provide the struct handle
Proposed: GT_IND would no longer be used for struct types.
GT_CALL
GT_LCL_VAR
GT_LCL_FLD
GT_OBJ nodes can also be used as rvalues when they are call arguments.
Proposed: GT_OBJ nodes can be used in any context where a struct rvalue or lvalue might occur, except after morph when the struct is independently promoted.
Ideally, we should be able to obtain a valid CLASS_HANDLE for any struct-valued node.
Once that is the case, we should be able to transform most or all uses of gtGetStructHandleIfPresent() to gtGetStructHandle().
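The distinction between the two lookups can be sketched as follows, using the handle sources described above (GT_OBJ and GT_LCL_FLD store a layout pointer, GT_LCL_VAR gets it from its LclVarDsc, and a struct GT_IND relies on its context). The node and compiler types, and the getStructHandleIfPresent/getStructHandle names, are hypothetical simplified analogues of gtGetStructHandleIfPresent() and gtGetStructHandle(), not their actual implementations.

```cpp
#include <cassert>
#include <cstdint>

using ClassHandle = uintptr_t;
constexpr ClassHandle NO_CLASS_HANDLE = 0;

struct ClassLayout { ClassHandle cls; unsigned size; };
struct LclVarDsc { const ClassLayout* layout; };

enum Oper { GT_OBJ, GT_LCL_VAR, GT_LCL_FLD, GT_IND };

// Hypothetical simplified node: 'layout' is used by GT_OBJ and GT_LCL_FLD,
// 'lclNum' by GT_LCL_VAR.
struct Node {
    Oper oper;
    const ClassLayout* layout;
    unsigned lclNum;
};

struct Compiler {
    LclVarDsc* lvaTable;

    // Returns NO_CLASS_HANDLE when the node itself does not identify its type.
    ClassHandle getStructHandleIfPresent(const Node& node) const {
        switch (node.oper) {
        case GT_OBJ:
        case GT_LCL_FLD:
            return node.layout != nullptr ? node.layout->cls : NO_CLASS_HANDLE;
        case GT_LCL_VAR: {
            const ClassLayout* layout = lvaTable[node.lclNum].layout;
            return layout != nullptr ? layout->cls : NO_CLASS_HANDLE;
        }
        default:
            return NO_CLASS_HANDLE; // e.g. a struct GT_IND: the LHS supplies the handle
        }
    }

    // The stricter form that becomes usable once every struct node has a handle.
    ClassHandle getStructHandle(const Node& node) const {
        ClassHandle handle = getStructHandleIfPresent(node);
        assert(handle != NO_CLASS_HANDLE && "struct-valued node without a handle");
        return handle;
    }
};
```

The goal stated above amounts to making the default case unreachable for struct-valued nodes, so that callers can use the asserting form everywhere.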
There are three main phases in the JIT that make changes to the representation of struct nodes and lclVars:
Importer
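Vector types are normalized to the appropriate TYP_SIMD* type. Other struct nodes have TYP_STRUCT.
Struct-valued nodes created with a class handle retain either a ClassLayout pointer or an index into the ClassLayout cache.
Struct promotion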
Global morph
Some promoted structs are forced to stack, and become “dependently promoted”.
Call args
If the struct has been promoted it is morphed to GT_FIELD_LIST
Proposed: this transformation would be deferred until Lowering.
If it is passed in a single register, it is morphed into a GT_LCL_FLD node of the appropriate primitive type.
Proposed: this would remain a GT_OBJ and would be appropriately transformed in Lowering, e.g. using GT_BITCAST.
If it is passed in multiple registers, a GT_FIELD_LIST is constructed that represents the load of each register using GT_LCL_FLD.
Proposed: this would also remain a GT_OBJ (or GT_FIELD_LIST if promoted) and would be transformed to a GT_FIELD_LIST with the appropriate load, assemble or extraction code as needed.
Otherwise, if it is passed by reference or on the stack, it is kept as GT_OBJ or GT_LCL_VAR.
Proposed: any copy required for a struct passed by reference would be deferred until Lowering, at which time the
liveness information can provide lastUse information to allow a dead struct to be passed
directly by reference instead of being copied.
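Why the copy matters, and why lastUse information makes it avoidable, can be sketched in plain C++. The sketch assumes an ABI in which the callee owns the memory of a struct passed by reference and may write to it; calleeByRef and passStructArg are hypothetical names, and a real implementation would also have to check other conditions (for example, that the struct is not aliased by another argument).

```cpp
#include <cassert>

struct Big { long a, b, c; }; // assumed too large for registers, so passed by reference

// Stand-in for a callee that receives its struct argument by reference and,
// as the ABI permits, treats that memory as its own copy.
long calleeByRef(Big* arg) {
    assert(arg != nullptr);
    long sum = arg->a + arg->b + arg->c;
    arg->a = 0; // the callee is free to overwrite "its" copy
    return sum;
}

// If the caller's struct is dead after the call (this is its last use), its own
// address can be passed; otherwise a temporary copy protects the caller's value.
long passStructArg(Big& local, bool isLastUse) {
    if (isLastUse) {
        return calleeByRef(&local); // no copy needed
    }
    Big tmp = local;                // defensive copy
    return calleeByRef(&tmp);
}
```

Deciding this in Lowering, after liveness, is what allows the isLastUse case to be detected at all.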
Related: #4524 Add optimization to avoid copying a struct if passed by reference and there are no
writes to and no reads after passed to a callee
It is proposed to add the following transformations in Lowering:
Transforming struct arguments into a GT_FIELD_LIST of GT_LCL_FLD nodes when the struct is non-enregisterable and is passed in multiple registers.
Inserting a GT_BITCAST when a promoted floating point field of a single-field struct is passed in an integer register.
This is a rough breakdown of the work into somewhat separable tasks. These work items are organized in priority order. Each work item should be able to proceed independently, though the aggregate effect of multiple work items may be greater than that of the individual work items alone.
Defer ABI-specific transformations to Lowering
This includes all copies and IR transformations that are only required to pass or return the arguments as required by the ABI.
Other transformations would remain:
Transformations (including GT_FIELD_LIST creation) required to expose references to promoted struct fields.
This would be done in multiple phases:
First, move these transformations to Lowering, but retain any "pessimizations"
(e.g. marking nodes as GTF_DONT_CSE or marking lclVars as lvDoNotEnregister).
Next, eliminate the "pessimizations": for cases where GT_LCL_FLD is currently used to "retype" the struct, change it to use either
GT_LCL_FLD, if it is already address-taken, or a GT_BITCAST otherwise.
This should address the problem covered by the test JIT\Regressions\JitBlue\GitHub_1161,
as well as #7200 Struct getters are generating unnecessary
instructions on x64 when struct contains floats
and #11413 Inefficient codegen for casts between same size types. See also LocalAddressVisitor::PostOrderVisit() for the GT_RETURN case.
Add support in Lowering and CodeGen to handle call arguments where the fields of a promoted struct
must be extracted or reassembled in order to pass the struct in non-matching registers. This probably
includes producing the appropriate IR, in order to correctly represent the register requirements.
Related issues: JIT\Regressions\JitBlue\GitHub_1133.
Most of the existing places in the code where structs are handled conservatively are marked
with TODO-1stClassStructs. This work item involves investigating these and making the
necessary improvements (or determining that they are infeasible and removing the TODO).
Related:
Full enregistration of structs would be enabled first by the "Defer ABI-specific transformations to Lowering" work item. The register allocator would then consider struct lclVars as candidates for enregistration.
First, fully enregister pointer-sized-or-less structs only if there are no field accesses and they are not
marked lvDoNotEnregister.
Next, fully enregister structs that are passed or returned in multiple registers and have no field accesses.
Next, when there are field accesses, but the struct is more frequently accessed as a
full struct (e.g. assignment or passing as the full struct), Lowering would expand the field accesses
as needed to extract the field from the register(s).
Related: #10045 Accessing a field of a Vector4 causes later codegen to be inefficient if inlined
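The field extraction described above can be modeled in plain C++: the whole struct lives in one 64-bit integer register, an integer field is obtained by truncation, and a floating-point field by a shift followed by a bitwise reinterpretation (the role GT_BITCAST would play in the IR). This sketch assumes a little-endian target such as x64 or Arm64, where the field at offset 0 occupies the low bits; Pair, toRegister, extractLo and extractHi are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// A struct small enough to live in a single 64-bit register.
struct Pair { int32_t lo; float hi; };
static_assert(sizeof(Pair) == 8, "expected no padding");

// Model of holding the full struct in one integer register.
uint64_t toRegister(const Pair& p) {
    uint64_t reg;
    std::memcpy(&reg, &p, sizeof(reg));
    return reg;
}

// Field at offset 0: just the low 32 bits (little-endian assumption).
int32_t extractLo(uint64_t reg) {
    return static_cast<int32_t>(static_cast<uint32_t>(reg));
}

// Field at offset 4: shift it down, then reinterpret the bits as a float.
float extractHi(uint64_t reg) {
    uint32_t bits = static_cast<uint32_t>(reg >> 32);
    float value;
    std::memcpy(&value, &bits, sizeof(value)); // bitwise move, like GT_BITCAST
    return value;
}
```

When such field accesses are rare compared with whole-struct uses, paying for a shift or bitcast on each access is cheaper than keeping the struct on the stack.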
fgMorphOneAsgBlockOp should probably be eliminated, and its functionality either moved to
Lowering or simply subsumed by the combination of the addition of fixed-size struct types and
the full enablement of struct optimizations. Doing so would also involve improving code generation
for block copies. See #21711 Improve init/copy block codegen.
This also includes cleaning up the block morphing methods, such as fgMorphBlkNode and fgMorphBlkOperand,
so that block nodes needn't be visited multiple times.
These methods were introduced to preserve old behavior, but should be simplified.
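As a point of reference for the block-copy codegen mentioned above, the sketch below models in C++ the kind of unrolled sequence a code generator might emit for a fixed-size block with no GC references: pointer-sized moves followed by 4-, 2- and 1-byte tail moves. The loop stands in for a sequence that would be fully unrolled, since the size is known at JIT time. unrolledCopy is a hypothetical name, and real codegen may instead use SIMD registers or overlapping moves; blocks containing GC references need GC-aware copying, which this ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies 'size' bytes using the widest moves first, then narrower tail moves.
void unrolledCopy(uint8_t* dst, const uint8_t* src, size_t size) {
    size_t offset = 0;
    for (; offset + 8 <= size; offset += 8) {
        uint64_t v;
        std::memcpy(&v, src + offset, 8);
        std::memcpy(dst + offset, &v, 8);
    }
    if (offset + 4 <= size) {
        uint32_t v;
        std::memcpy(&v, src + offset, 4);
        std::memcpy(dst + offset, &v, 4);
        offset += 4;
    }
    if (offset + 2 <= size) {
        uint16_t v;
        std::memcpy(&v, src + offset, 2);
        std::memcpy(dst + offset, &v, 2);
        offset += 2;
    }
    if (offset + 1 <= size) {
        dst[offset] = src[offset];
        offset += 1;
    }
    assert(offset == size); // 8/4/2/1 steps cover any remainder
}
```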
These are all marked with TODO-1stClassStructs or TODO-Cleanup in the last case:
The checking at the end of gtNewTempAssign() should be simplified.
When we create a struct assignment, we use impAssignStruct(). This code will, in some cases, create
or re-create address or block nodes when not necessary.
For Linux x64, the handling of arguments could be simplified. For a single argument (or for the same struct
class), there may be multiple calls to eeGetSystemVAmd64PassStructInRegisterDescriptor(), and in some cases
(e.g. look for the TODO-Cleanup in fgMakeTmpArgNode()) there are awkward workarounds to avoid additional
calls. It might be useful to cache the struct descriptors so we don't have to call across the JIT/EE interface
for the same struct class more than once. It would also potentially be useful to save the descriptor for the
current method return type on the Compiler object for use when handling RETURN nodes.
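A minimal sketch of such a cache, assuming a hypothetical simplified PassDescriptor in place of the real SysV AMD64 struct-passing descriptor, and a query callback in place of eeGetSystemVAmd64PassStructInRegisterDescriptor(). The point is only that the cross-interface query runs at most once per struct class.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

using ClassHandle = uintptr_t; // hypothetical opaque class handle

// Hypothetical, simplified stand-in for the SysV struct-passing descriptor.
struct PassDescriptor {
    bool passedInRegisters;
    unsigned eightByteCount;
};

// One cached descriptor per struct class for the duration of a compilation.
class PassDescriptorCache {
public:
    explicit PassDescriptorCache(std::function<PassDescriptor(ClassHandle)> query)
        : query_(std::move(query)) {}

    const PassDescriptor& get(ClassHandle cls) {
        assert(cls != 0);
        auto it = cache_.find(cls);
        if (it == cache_.end()) {
            // Only the first request for a class crosses the JIT/EE interface.
            it = cache_.emplace(cls, query_(cls)).first;
        }
        return it->second; // references to map elements stay valid across rehashing
    }

private:
    std::function<PassDescriptor(ClassHandle)> query_;
    std::unordered_map<ClassHandle, PassDescriptor> cache_;
};
```

The descriptor for the current method's return type could be stored the same way, as a single field on the compiler object.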
The following issues illustrate some of the motivation for improving the handling of value types (structs) in RyuJIT (these issues are also cited above, in the applicable sections):
#4308 JIT: Excessive copies when inlining
#10879 Unix: Unnecessary struct copy while passing struct of size <=16
#9839 [RyuJIT] Eliminate unnecessary copies when passing structs
GT_LCL_FLD to retype a value that needs
to be passed in a register. It may have been addressed by PR #37745