docs/design/coreclr/jit/jit-call-morphing.md
In C# and IL, and unlike C/C++, the evaluation order of the arguments for calls is strictly defined. Basically this is left to right in C# or the IL instruction ordering for IL.
One issue that must be addressed is the problem of nested calls. Consider Foo(x[i], Bar(y)).
We first must evaluate x[i] and possibly set up the first argument for Foo(). But immediately
after that, we must set up y as first argument of Bar(). Thus, when we evaluate x[i] we will
need to hold that value someplace while we set up and call Bar(). Arguments that contain an
assignment are another issue that we need to address. Most cases of this are rare except for
post/pre increment, perhaps like this: Foo(j, a[j++]). Here j is updated via assignment
when the second arg is evaluated, so the earlier uses of j would need to be evaluated and
saved in a new LclVar.
One simple approach would be to create new single definition, single use LclVars for every argument that is passed. This would preserve the evaluation order. However, it would potentially create hundreds of LclVar for moderately sized methods and that would overflow the limited number of tracked local variables in the JIT. One observation is that many arguments to methods are either constants or LclVars and can be set up anytime we want. They usually will not need a new LclVar to preserve the order of evaluation rule.
Each argument is an arbitrary expression tree. The JIT tracks a summary of observable side-effects
using a set of five bit flags in every GenTree node: GTF_ASG, GTF_CALL, GTF_EXCEPT, GTF_GLOB_REF,
and GTF_ORDER_SIDEEFF. These flags are propagated up the tree so that the top node has a particular
flag set if any of its child nodes has the flag set. Decisions about whether to evaluate arguments
into temp LclVars are made by examining these flags on each of the arguments.
Our design goal for call sites is to create a few temp LclVars as possible, while preserving the order of evaluation rules of IL and C#.
The most important data structure is the GenTreeCall node which represents a single
call site and is created by the Importer when it sees a call in the IL. It is also
used for internal calls that the JIT needs such as helper calls. Every GT_CALL node
should have a GTF_CALL flag set on it. Nodes that may be implemented using a function
call also should have the GTF_CALL flag set on them. The arguments for a single call
site are encapsulated in the CallArgs class. Every call has an instance of this class
in GenTreeCall::gtArgs. CallArgs contains two linked list of arguments: the "normal"
linked list, which can be enumerated via CallArgs::Args, and the "late" linked list,
enumerated via CallArgs::LateArgs.
The normal linked list is a linked list of CallArg structures, in normal argument
order. When the GenTreeCall is first created the late args list is empty and is
set up later when we call fgMorphArgs() during the global Morph of all nodes. The short
explanation of why we need two lists is that we may need to force the correct evaluation
order of arguments and also architecture-specific ways of passing some arguments. See
below and the documentation of fgMorphArgs and AddFinalArgsAndDetermineABIInfo for
more information about late args.
In addition to containing IR nodes, each CallArg entry also contains information about
how it was evaluated and ABI information describing how to pass it.
FEATURE_FIXED_OUT_ARGSAll architectures support passing a limited number of arguments in registers and the
additional arguments must be passed on the stack. There are two distinctly different
approaches for passing arguments of the stack. For the x86 architecture a push
instruction is typically used to pass a stack based argument. For all other architectures
that we support we allocate the lvaOutgoingArgSpaceVar, which is a variable-sized
stack based LclVar. Its size is determined by the call site that has the largest
requirement for stack based arguments. The define FEATURE_FIXED_OUT_ARGS is 1 for
architectures that use the outgoing arg space to pass their stack based arguments.
There is only one outgoing argument space and any values stored there are considered
to be killed by the very next call even if the next call doesn't take any stack based
arguments. For x86, we can push some arguments on the stack for one call and leave
them there while pushing some new arguments for a nested call. Thus we allow nested
calls for x86 but do not allow them for the other architectures.
During the first Morph phase known as global Morph we call CallArgs::ArgsComplete()
after we have completed determining ABI information for each arg. This method applies
the following rules:
GTF_ASG, then we
force all previous non-constant arguments to be evaluated into temps. This is very
conservative, but at this phase of the JIT it is rare to have an assignment subtree
as part of an argument.GTF_CALL flag, then
we force that argument and any previous argument that is marked with any of the
GTF_ALL_EFFECT flags into temps.
FEATURE_FIXED_OUT_ARGS, any previous stack based args that
we haven't marked as needing a temp but still need to store in the outgoing args
area is marked as needing a placeholder temp using needPlace.localloc to be evaluated into temps.GTF_GLOB_REF flag. For a special
case we call SetNeedsTemp() and set up the temp in fgMorphArgs.
SetNeedsTemp records the tmpNum used and sets isTmp so that we handle it
like the other temps. The special case is for a TYP_STRUCT argument passed by
value when we can't optimize away the extra copy.After calling ArgsComplete() the SortArgs() method is called to determine the
optimal way to evaluate the arguments. This sorting controls the order that we place
the nodes in the late argument list.
After calling SortArgs(), the EvalArgsToTemps() method is called to create
the temp assignments and to populate the LateArgs list.
gtNewTempAssign. This assignment replaces
the original argument in the early argument list. After we create the assignment
the argument is marked with m_isTmp = true.m_isTmp are treated similarly as
above except we don't create an assignment for them.TYP_STRUCT argument passed by value will have m_isTmp set to true
and will use a GT_COPYBLK or a GT_COPYOBJ to perform the assignment of the temp.arg1 SETUP in the JitDump.m_needPlace is true (only for FEATURE_FIXED_OUT_ARGS)
then the existing node is moved to the late argument list.After the Call node is fully morphed the LateArgs list will contain the arguments
passed in registers as well as additional ones for m_needPlace marked
arguments whenever we have a nested call for a stack based argument.
When m_needTmp is true the LateArg will be a LclVar that was created
to evaluate the arg (single-def/single-use). When m_needTmp is false
the LateArg can be an arbitrary expression tree.