rfcs/experimental/global_control_fast_leave/README.md
## Introduction

In oneTBB 2021.13.0, PR https://github.com/uxlfoundation/oneTBB/pull/1352 introduced a change to worker thread behavior that causes them to spin after completing work before leaving an arena (delayed leave). While this optimization improves performance for workloads with frequent parallel phases by keeping workers readily available, it can cause performance regressions for workloads that interleave short parallel phases with single-threaded work.
The root cause is that spinning worker threads increase CPU load even when yielding, which can reduce clock speeds due to thermal and power management. This is particularly problematic in scenarios where single-threaded code constitutes a significant portion of the workload.
Issue https://github.com/uxlfoundation/oneTBB/issues/1876 reports this regression with a
reproducible example demonstrating an ~8% slowdown in wall-clock time. The reporter notes that while
the parallel_phase API provides fine-grained per-arena control over this behavior, a simpler global
mechanism would be preferable for many use cases: many applications would benefit from a single
global switch that affects all arenas without changes to task_arena construction or parallel
algorithm calls.

The parallel_phase API (RFC: parallel_phase_for_task_arena)
provides per-arena control over worker retention through task_arena::leave_policy and the
start_parallel_phase/end_parallel_phase functions. This proposal complements that feature by
providing a global override mechanism.
## Proposal

This proposal introduces a new global_control parameter, leave_policy, that, when set to
task_arena::leave_policy::fast, would override the default behavior for arenas initialized with
leave_policy::automatic. Setting the parameter would not affect arenas that are already initialized
or that were initialized with an explicit leave_policy. After initialization, the parallel phase API
can independently modify the arena's leave behavior at runtime, allowing workers to be retained
during active parallel phases regardless of the initial state set by the global control.
The proposal adds a new enumeration value to the existing global_control::parameter enum:

```cpp
#define TBB_PREVIEW_PARALLEL_PHASE 1
#include <oneapi/tbb/global_control.h>

#define TBB_HAS_PARALLEL_PHASE 202xxx

namespace oneapi {
namespace tbb {

class global_control {
public:
    enum parameter {
        max_allowed_parallelism,
        thread_stack_size,
        terminate_on_exception,
        scheduler_handle, // not a public parameter
#if TBB_PREVIEW_PARALLEL_PHASE
        leave_policy, // NEW: controls worker fast leave behavior
#endif
        parameter_max
    };

    global_control(parameter p, size_t value);
    ~global_control();

    static size_t active_value(parameter p);
};

} // namespace tbb
} // namespace oneapi
```
The leave_policy parameter would control whether arenas are initialized with automatic or fast leave policy by default:
| Value | Behavior |
|---|---|
| `task_arena::leave_policy::automatic` (default) | Workers follow the default system-specific policy (may spin before leaving) |
| `task_arena::leave_policy::fast` | Workers leave immediately (fast leave enabled) |
When multiple global_control objects exist for leave_policy, their logical disjunction would be
used (consistent with the terminate_on_exception parameter). This means if any
global_control(leave_policy, task_arena::leave_policy::fast) is active, fast leave would be enabled globally.
The proposed implementation would modify thread_leave_manager::set_initial_state to include an additional check
of the global leave_policy parameter when determining the initial state for worker retention. When the global
parameter is set to fast, the method would treat the arena as if it had been initialized with leave_policy::fast.
```mermaid
flowchart TD
    Start(["🚀 Arena initialization:<br/><code>set_initial_state</code> called"])
    Start --> CheckExplicit{{"⚡ leave_policy = fast?"}}
    CheckExplicit -->|"✅ Yes"| SetFastLeave(["🚪 **Set state to**<br/>**FAST_LEAVE**"])
    CheckExplicit -->|"❌ No (automatic)"| CheckGlobal
    subgraph PROPOSED ["🆕 **PROPOSED:**"]
        CheckGlobal{{"🌐 global_control::<br/>active_value(leave_policy) = task_arena::leave_policy::fast?"}}
    end
    CheckGlobal -->|"✅ Yes"| SetFastLeave
    CheckGlobal -->|"❌ No"| PlatformPolicy(["📋 **Set state based on**<br/>**platform policy**"])

    style Start fill:whitesmoke,stroke:darkslategray,stroke-width:3px,color:darkslategray,font-weight:bold
    style CheckExplicit fill:papayawhip,stroke:orangered,stroke-width:2px,color:firebrick
    style SetFastLeave fill:darkseagreen,stroke:forestgreen,stroke-width:3px,color:darkgreen,font-weight:bold
    style PlatformPolicy fill:lavender,stroke:steelblue,stroke-width:2px,color:steelblue
    style CheckGlobal fill:peachpuff,stroke:darkorange,stroke-width:3px,color:orangered,font-weight:bold
    style PROPOSED fill:lightyellow,stroke:darkorange,stroke-width:4px,stroke-dasharray:5 5
    linkStyle 0 stroke:orangered,stroke-width:2px
    linkStyle 1 stroke:darkorange,stroke-width:2px
    linkStyle 2 stroke:forestgreen,stroke-width:3px
    linkStyle 3 stroke:steelblue,stroke-width:2px
```
The global_control::leave_policy parameter affects the initial state set by thread_leave_manager. Once
the initial state is determined, the parallel phase API independently modifies the state machine at runtime.
The global control is not consulted again after initialization.
| Arena leave_policy | Global leave_policy | Initial state |
|---|---|---|
| `fast` | any | `FAST_LEAVE` |
| `automatic` | `fast` | `FAST_LEAVE` |
| `automatic` | `automatic` (default) | Platform policy |
This design ensures that:

- An explicit leave_policy::fast always results in fast leave.
- start_parallel_phase() independently transitions the state machine to PARALLEL_PHASE.
- The global leave_policy parameter only affects the initial state of arenas initialized with
  leave_policy::automatic.

This API would be introduced under the TBB_PREVIEW_PARALLEL_PHASE macro, consistent with the
related parallel phase feature.
While the feature is in preview state, the parallel phase API reference would need to be extended
with documentation for:

- The global_control::leave_policy parameter
- The preview macro requirement for the global_control header and the leave_policy parameter
- The semantics of interaction between global_control::leave_policy and the per-arena leave_policy:
  global_control::leave_policy provides application-wide control, while task_arena::leave_policy
  and parallel_phase provide per-arena control

Once this feature is stabilized and moved from preview to supported status, the oneAPI
specification would need to be updated. Specifically, the global_control class documentation would
need to be extended with:

- The new value in the parameter enumeration
- The interaction with task_arena::leave_policy and the parallel_phase API

The implementation would use the existing thread-safe control_storage infrastructure:

- global_control construction/destruction would be thread-safe
- active_value() queries would be thread-safe

The proposed implementation would have minimal performance impact:
| Scenario | Impact |
|---|---|
| Arena initialization | One additional branch in thread_leave_manager::set_initial_state |
| Memory overhead | One additional control_storage object per process |
The additional branch is only evaluated once per arena initialization (not on the worker leave hot path).
The control_storage object array size would change, requiring a library rebuild but no changes to
existing binaries.

## Usage examples

```cpp
#define TBB_PREVIEW_PARALLEL_PHASE 1
#include <oneapi/tbb/global_control.h>
#include <oneapi/tbb/parallel_for.h>
#include <oneapi/tbb/task_arena.h>

void do_serial_work();
void do_parallel_work(int);

int main() {
    // Enable fast leave globally for the duration of this scope
    tbb::global_control gc(tbb::global_control::leave_policy,
                           tbb::task_arena::leave_policy::fast);

    for (int i = 0; i < 1000; ++i) {
        // Single-threaded work benefits from reduced CPU load
        do_serial_work();

        // Parallel work - workers leave immediately after completion
        tbb::parallel_for(0, 1000000, [](int j) {
            do_parallel_work(j);
        });
    }
}
```
```cpp
#define TBB_PREVIEW_PARALLEL_PHASE 1
#include <oneapi/tbb/global_control.h>
#include <oneapi/tbb/parallel_for.h>
#include <oneapi/tbb/task_arena.h>

void do_parallel_work(int);

int main() {
    // No global_control is active yet; this call lazily initializes the
    // implicit arena with automatic (the default). Once initialized, the
    // implicit arena's leave policy is fixed and not affected by later
    // global_control changes.
    tbb::parallel_for(0, 1000000, [](int i) { do_parallel_work(i); });

    {
        tbb::global_control gc1(tbb::global_control::leave_policy,
                                tbb::task_arena::leave_policy::fast);

        tbb::task_arena arena1;
        // arena1 is lazily initialized on first use; since gc1 is active,
        // it is initialized with fast leave.
        arena1.execute([&] { /* ... */ });

        // The implicit arena was already initialized with automatic above;
        // gc1 does not affect it retroactively.
        tbb::parallel_for(0, 1000000, [](int i) { do_parallel_work(i); });

        {
            tbb::global_control gc2(tbb::global_control::leave_policy,
                                    tbb::task_arena::leave_policy::automatic);

            tbb::task_arena arena2;
            // Both gc1 (fast) and gc2 (automatic) are active.
            // Disjunction rule: any fast value present => fast wins.
            arena2.execute([&] { /* ... */ });
        }
    }

    tbb::task_arena arena3;
    // Both gc1 and gc2 are now destroyed, so arena3 is initialized with
    // automatic (the default).
    arena3.execute([&] { /* ... */ });
}
```
```cpp
#define TBB_PREVIEW_PARALLEL_PHASE 1
#include <oneapi/tbb/global_control.h>
#include <oneapi/tbb/parallel_for.h>
#include <oneapi/tbb/parallel_sort.h>
#include <oneapi/tbb/task_arena.h>

int main() {
    tbb::task_arena arena;
    tbb::global_control gc(tbb::global_control::leave_policy,
                           tbb::task_arena::leave_policy::fast);

    // Before entering the parallel phase, the arena is initialized with
    // FAST_LEAVE due to the active global control.

    // The global leave_policy is active, but parallel_phase overrides it:
    arena.start_parallel_phase();

    arena.execute([&] {
        tbb::parallel_for(/* ... */);
    });

    // Some serial computation

    // More parallel work without worker re-acquisition overhead
    arena.execute([&] {
        tbb::parallel_sort(/* ... */);
    });

    arena.end_parallel_phase();
    // After the parallel phase ends, workers leave immediately due to the
    // arena's initial FAST_LEAVE state.
}
```
## Testing

The following test scenarios would be required:

Functional tests:

- active_value(leave_policy) returns correct values

Interaction tests:

- Interaction with explicitly constructed leave_policy::fast arenas
- Interaction with leave_policy::automatic arenas

Performance tests:
## Alternatives considered

### Environment variable

An environment variable (e.g., TBB_LEAVE_POLICY=FAST) could control the behavior at startup.

Pros:

Cons:

### Changing the default behavior

The default behavior could be changed to fast leave, making delayed leave opt-in.

Pros:

Cons:
## Open questions

- Naming: should the parameter name convey "default override" semantics?
  - Reuse leave_policy, matching task_arena::leave_policy for consistency
  - Or a name such as override_default_leave_policy, clarifying that it has no effect on already
    initialized arenas
- Scope and granularity: should the setting affect arenas that are already initialized when the
  global_control is set?
- Composition: should multiple global controls combine by logical disjunction, first-registered
  wins, last-set wins, etc.?
## Exit criteria

The following conditions need to be met to move the feature from experimental to fully supported: