third_party/xla/docs/effort_levels.md
XLA provides options to control the amount of effort the compiler will expend to optimize a program for execution time and to make it fit in memory.
Similar to the `-O` flags in gcc or clang, the optimization level lets the user influence how much work the compiler does when optimizing for execution time. It can be set via the `optimization_level` field of either the `ExecutableBuildOptionsProto` message or the `ExecutionOptions` message.
Lower optimization levels cause various HLO passes to behave differently, typically doing less work, and may disable certain HLO passes entirely. The optimization level may also influence the compiler backend, so the exact effect of this field depends on the target platform. As a general guideline, the following table describes the expected overall effect of each value:
| Level | Use Case |
|---|---|
| EFFORT_O0 | Fastest compilation, slowest runtime |
| EFFORT_O1 | Faster compilation with reasonable runtime |
| EFFORT_O2 | Strongly prioritize runtime (suitable default for production workloads) |
| EFFORT_O3 | Expensive or experimental optimizations |
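
The way a lower level can both reduce the work a pass does and disable a pass outright can be pictured with a toy model. The sketch below is not XLA code: `ToyPass`, `TOY_PIPELINE`, `plan_pipeline`, the pass names, the iteration budgets, and the integer values of the enum are all assumptions made up for illustration. Only the `EFFORT_O0` to `EFFORT_O3` names come from the table above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class EffortLevel(IntEnum):
    # Names follow the table above. The integer values are illustrative
    # ordinals chosen for this sketch, not the values used in XLA's protos.
    EFFORT_O0 = 0
    EFFORT_O1 = 1
    EFFORT_O2 = 2
    EFFORT_O3 = 3


@dataclass(frozen=True)
class ToyPass:
    """A hypothetical pass description; not an XLA class."""
    name: str
    min_level: EffortLevel  # the pass is skipped below this level
    max_iterations: int     # work budget at EFFORT_O2 and above


# Hypothetical pass names and budgets, for illustration only.
TOY_PIPELINE = [
    ToyPass("simplify", EffortLevel.EFFORT_O0, 8),
    ToyPass("fuse", EffortLevel.EFFORT_O1, 4),
    ToyPass("expensive-search", EffortLevel.EFFORT_O3, 2),
]


def plan_pipeline(level: EffortLevel) -> list[tuple[str, int]]:
    """Return (pass name, iteration budget) for each pass that runs at `level`."""
    plan = []
    for p in TOY_PIPELINE:
        if level < p.min_level:
            continue  # the pass is disabled entirely at this level
        if level >= EffortLevel.EFFORT_O2:
            budget = p.max_iterations
        else:
            # Below EFFORT_O2 the pass still runs, but does less work.
            budget = max(1, p.max_iterations // 2)
        plan.append((p.name, budget))
    return plan
```

Under these assumptions, `plan_pipeline(EffortLevel.EFFORT_O0)` runs only the first pass with a reduced budget, and `EFFORT_O3` is the only level that includes the most expensive pass. Real passes decide for themselves how to react to the level, and the outcome depends on the backend.
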
In XLA:GPU, several passes are disabled by default because they significantly increase compilation time by increasing the HLO size. For convenience, they are consolidated under the optimization level option, so that setting `optimization_level` to `EFFORT_O1` or above behaves as if the following flags were enabled:
*   `xla_gpu_enable_pipelined_all_gather`
*   `xla_gpu_enable_pipelined_all_reduce`
*   `xla_gpu_enable_pipelined_reduce_scatter`
*   `xla_gpu_enable_while_loop_double_buffering`
*   `xla_gpu_enable_latency_hiding_scheduler`

Another effort level option controls the degree to which the compiler will attempt to make the resulting program "fit in memory", where "fit" and "memory" have backend-dependent meanings (for example, in XLA:TPU, this option controls the degree to which the compiler works to keep the TPU's high-bandwidth memory (HBM) usage below the HBM capacity). It can be set via the `memory_fitting_level` field of the `ExecutableBuildOptionsProto` message, or the `memory_fitting_level` field of the `ExecutionOptions` message.
As with optimization level, the exact meaning of each effort level value is backend-dependent, but the following table describes the expected effect as a general guideline:
| Level | Use Case |
|---|---|
| EFFORT_O0 | Minimal effort to fit (fail compilation as quickly as possible instead) |
| EFFORT_O1 | Reduced effort to fit |
| EFFORT_O2 | Significant effort to fit (suitable default for production workloads) |
| EFFORT_O3 | Expensive or experimental algorithms to reduce memory usage |
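
As a rough illustration of how the two options sit side by side, the sketch below models them in plain Python. It is not an XLA API: `BuildOptionsSketch` and `implied_gpu_flags` are hypothetical names, the enum's integer values are illustrative ordinals, and the "O1 or above" rule is this sketch's reading of the XLA:GPU behavior described earlier. The field names and flag names are taken from this document.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class EffortLevel(IntEnum):
    # Illustrative ordinals for this sketch, not XLA's proto values.
    EFFORT_O0 = 0
    EFFORT_O1 = 1
    EFFORT_O2 = 2
    EFFORT_O3 = 3


# XLA:GPU flag names as listed in this document.
GPU_FLAGS_IMPLIED_AT_O1 = (
    "xla_gpu_enable_pipelined_all_gather",
    "xla_gpu_enable_pipelined_all_reduce",
    "xla_gpu_enable_pipelined_reduce_scatter",
    "xla_gpu_enable_while_loop_double_buffering",
    "xla_gpu_enable_latency_hiding_scheduler",
)


@dataclass(frozen=True)
class BuildOptionsSketch:
    """Hypothetical stand-in holding the two effort fields side by side."""
    optimization_level: EffortLevel    # effort spent on execution time
    memory_fitting_level: EffortLevel  # effort spent on fitting in memory


def implied_gpu_flags(options: BuildOptionsSketch) -> dict[str, bool]:
    """Model which XLA:GPU flags the optimization level implies.

    Only `optimization_level` matters here; `memory_fitting_level` is an
    independent setting and does not affect these flags in this sketch.
    """
    enabled = options.optimization_level >= EffortLevel.EFFORT_O1
    return {flag: enabled for flag in GPU_FLAGS_IMPLIED_AT_O1}
```

For example, `BuildOptionsSketch(EffortLevel.EFFORT_O0, EffortLevel.EFFORT_O2)` describes a configuration that compiles quickly but still makes a significant effort to fit in memory, and `implied_gpu_flags` reports every flag as `False` for it. The sketch models only what the level implies and says nothing about flags that a user sets explicitly.
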