Optimization

Bytecode interpreters commonly rely on a variety of optimizations to improve performance. This section describes how to apply these optimizations in Bytecode DSL interpreters.

Boxing elimination

A major source of overhead in interpreted code (for both Truffle AST and bytecode interpreters) is boxing. By default, values are passed between operations as objects, which forces primitive values to be boxed. Often, the boxed value is unboxed again as soon as it is consumed.

Boxing elimination avoids these unnecessary boxing steps. The interpreter can speculatively rewrite bytecode instructions to specialized instructions that pass primitive values whenever possible. Boxing elimination can also improve compiled performance, because Graal is not always able to remove box-unbox sequences during compilation.
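
To make the cost concrete, the following is a minimal sketch in plain Java. It is not Bytecode DSL code, and every name in it (BoxingSketch, box, addGeneric, addInt) is hypothetical. It evaluates (1 + 2) + 3 twice, once passing objects between operations and once passing primitives, and counts the simulated boxing operations.

```java
// Toy illustration only, not Bytecode DSL code. All names are hypothetical.
final class BoxingSketch {
    static int boxCount = 0; // number of simulated boxing operations

    // Boxes a value and records that boxing happened.
    static Object box(int value) {
        boxCount++;
        return Integer.valueOf(value);
    }

    // Generic path: operands arrive as objects, are unboxed, and the result is boxed again.
    static Object addGeneric(Object left, Object right) {
        return box((Integer) left + (Integer) right);
    }

    // Specialized path: operands and result stay primitive.
    static int addInt(int left, int right) {
        return left + right;
    }

    // Evaluates (1 + 2) + 3 by passing boxed values between operations.
    static int evalBoxed() {
        Object sum = addGeneric(box(1), box(2));
        return (Integer) addGeneric(sum, box(3));
    }

    // Evaluates (1 + 2) + 3 by passing primitives between operations.
    static int evalUnboxed() {
        return addInt(addInt(1, 2), 3);
    }

    public static void main(String[] args) {
        boxCount = 0;
        int boxed = evalBoxed();
        System.out.println(boxed + " computed with " + boxCount + " boxing operations");

        boxCount = 0;
        int unboxed = evalUnboxed();
        System.out.println(unboxed + " computed with " + boxCount + " boxing operations");
    }
}
```

In this sketch the boxed evaluation boxes five values (three constants and two intermediate results), while the primitive evaluation boxes none. A real interpreter also has to handle the case where the speculation on primitive types fails, which this sketch leaves out.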

To enable boxing elimination, specify a set of boxingEliminationTypes on the @GenerateBytecode annotation. For example, the following configuration

```java
@GenerateBytecode(
    ...
    boxingEliminationTypes = {int.class, long.class}
)
```

will instruct the interpreter to automatically avoid boxing for int and long values. (Note that boolean boxing elimination is supported, but is generally not worth the overhead of the additional instructions it produces.)

Boxing elimination is implemented using quickening, which is described below.

Quickening

Quickening is a general technique that replaces an instruction with a specialized version that (typically) requires less work. The Bytecode DSL supports quickened operations, which handle a subset of the specializations defined by an operation.

Quickened operations can be introduced to reduce the work required to evaluate an operation. For example, a quickened operation that only accepts int inputs might avoid operand boxing and the additional type checks required by the general operation. Additionally, a custom operation that has only one active specialization could be quickened to an operation that only supports that single specialization, avoiding extra specialization state checks.
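
The rewrite mechanism can be sketched with a toy stack interpreter in plain Java. This is not the code the Bytecode DSL generates, and all names (QuickeningSketch, the opcodes, run) are hypothetical. A generic ADD rewrites itself in place to ADD_INT once it sees int operands, and ADD_INT restores the generic instruction if its speculation later fails.

```java
// Toy stack interpreter, not Bytecode DSL code. All names are hypothetical.
final class QuickeningSketch {
    static final int PUSH = 0;    // PUSH <constant index>
    static final int ADD = 1;     // generic add: handles ints and strings
    static final int ADD_INT = 2; // quickened add: handles only ints
    static final int RETURN = 3;

    static Object run(int[] code, Object[] constants) {
        Object[] stack = new Object[8];
        int sp = 0;
        int pc = 0;
        while (true) {
            switch (code[pc]) {
                case PUSH:
                    stack[sp++] = constants[code[pc + 1]];
                    pc += 2;
                    break;
                case ADD: {
                    Object right = stack[--sp];
                    Object left = stack[--sp];
                    if (left instanceof Integer && right instanceof Integer) {
                        code[pc] = ADD_INT; // quicken: rewrite this instruction in place
                        stack[sp++] = (Integer) left + (Integer) right;
                    } else {
                        stack[sp++] = String.valueOf(left) + right;
                    }
                    pc++;
                    break;
                }
                case ADD_INT: {
                    Object right = stack[sp - 1];
                    Object left = stack[sp - 2];
                    if (!(left instanceof Integer && right instanceof Integer)) {
                        code[pc] = ADD; // speculation failed: restore the generic instruction
                        break;          // pc is unchanged, so the loop re-executes it as ADD
                    }
                    sp -= 2;
                    stack[sp++] = (Integer) left + (Integer) right;
                    pc++;
                    break;
                }
                case RETURN:
                    return stack[sp - 1];
                default:
                    throw new IllegalStateException("unknown opcode " + code[pc]);
            }
        }
    }

    public static void main(String[] args) {
        int[] code = {PUSH, 0, PUSH, 1, ADD, RETURN};
        System.out.println(run(code, new Object[] {2, 3}));     // int operands quicken ADD
        System.out.println(code[4] == ADD_INT);
        System.out.println(run(code, new Object[] {"a", "b"})); // strings undo the quickening
        System.out.println(code[4] == ADD);
    }
}
```

The toy keeps every operand boxed on an Object[] stack, so the quickened instruction saves very little here. It only illustrates the in-place rewrite and the fallback; a quickened instruction that also receives unboxed operands is what makes boxing elimination possible.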

At the moment, quickened instructions can only be specified manually using @ForceQuickening. In the future, tracing will be able to automatically infer useful quickenings.

Superinstructions

Note: Superinstructions are not yet supported.

Superinstructions combine common instruction sequences into single instructions. Using superinstructions can reduce the overhead of instruction dispatch, and it can enable the host compiler to perform optimizations across the instructions (e.g., eliding a stack push for a value that is subsequently popped).
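
As an illustration of the general idea only (the Bytecode DSL does not offer this feature yet), the plain-Java sketch below fuses a PUSH followed by an ADD into a single PUSH_ADD instruction and counts dispatches before and after. All names are hypothetical, and the toy bytecode has no branches, so the fusion pass does not need to worry about jump targets.

```java
import java.util.Arrays;

// Toy stack interpreter, not Bytecode DSL code. All names are hypothetical.
final class SuperinstructionSketch {
    static final int PUSH = 0;     // PUSH <int>: push an immediate value
    static final int ADD = 1;      // pop two ints, push their sum
    static final int PUSH_ADD = 2; // superinstruction: PUSH <int> immediately followed by ADD
    static final int RETURN = 3;

    static int dispatchCount = 0;  // number of instructions dispatched by run

    // Replaces every "PUSH <int>, ADD" sequence with "PUSH_ADD <int>".
    static int[] fuse(int[] code) {
        int[] out = new int[code.length];
        int n = 0;
        int pc = 0;
        while (pc < code.length) {
            int op = code[pc];
            if (op == PUSH && pc + 2 < code.length && code[pc + 2] == ADD) {
                out[n++] = PUSH_ADD;
                out[n++] = code[pc + 1];
                pc += 3;
            } else if (op == PUSH) {
                out[n++] = PUSH;
                out[n++] = code[pc + 1];
                pc += 2;
            } else {
                out[n++] = op;
                pc++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    static int run(int[] code) {
        int[] stack = new int[8];
        int sp = 0;
        int pc = 0;
        while (true) {
            dispatchCount++;
            switch (code[pc]) {
                case PUSH:
                    stack[sp++] = code[pc + 1];
                    pc += 2;
                    break;
                case ADD: {
                    int right = stack[--sp];
                    stack[sp - 1] += right;
                    pc++;
                    break;
                }
                case PUSH_ADD:
                    // The immediate is added directly, so the push and the pop are both elided.
                    stack[sp - 1] += code[pc + 1];
                    pc += 2;
                    break;
                case RETURN:
                    return stack[sp - 1];
                default:
                    throw new IllegalStateException("unknown opcode " + code[pc]);
            }
        }
    }

    public static void main(String[] args) {
        int[] code = {PUSH, 1, PUSH, 2, ADD, PUSH, 3, ADD, RETURN};
        dispatchCount = 0;
        System.out.println(run(code) + " in " + dispatchCount + " dispatches");
        dispatchCount = 0;
        System.out.println(run(fuse(code)) + " in " + dispatchCount + " dispatches");
    }
}
```

For this program, fusion reduces the dispatch count from six to four while producing the same result, and PUSH_ADD never writes the constant to the stack.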

In the future, tracing will be able to automatically infer useful superinstructions.

Tracing

Note: Tracing is not yet supported.

Determining which instructions are worth optimizing (via quickening or superinstructions) typically requires manual profiling and benchmarking. In the future, the Bytecode DSL will automatically infer optimization opportunities by tracing the execution of a representative corpus of code.
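
One simple way such an analysis could work, shown purely as an assumption-laden sketch and not as the planned Bytecode DSL design, is to count how often each pair of opcodes executes back to back. The most frequent pair is then a natural superinstruction candidate. All names and the four-opcode instruction set are hypothetical.

```java
// Toy trace analysis, not Bytecode DSL code. All names are hypothetical.
final class TracingSketch {
    static final int PUSH = 0;
    static final int ADD = 1;
    static final int PUSH_ADD = 2;
    static final int RETURN = 3;
    static final int OPCODES = 4; // opcodes are 0..3 in this toy

    // Counts how often opcode a is immediately followed by opcode b in an executed trace.
    static int[][] countPairs(int[] trace) {
        int[][] counts = new int[OPCODES][OPCODES];
        for (int i = 0; i + 1 < trace.length; i++) {
            counts[trace[i]][trace[i + 1]]++;
        }
        return counts;
    }

    // Returns {first, second} for the most frequent pair; ties go to the lowest opcodes.
    static int[] hottestPair(int[][] counts) {
        int[] best = {0, 0};
        for (int a = 0; a < OPCODES; a++) {
            for (int b = 0; b < OPCODES; b++) {
                if (counts[a][b] > counts[best[0]][best[1]]) {
                    best = new int[] {a, b};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Opcodes in the order they executed (operands omitted).
        int[] trace = {PUSH, PUSH, ADD, PUSH, ADD, RETURN};
        int[] pair = hottestPair(countPairs(trace));
        System.out.println("hottest pair: " + pair[0] + " -> " + pair[1]);
    }
}
```

For the trace in main, the pair PUSH followed by ADD occurs twice and every other pair occurs at most once, so it is reported as the hottest pair.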