docs/Optimizer_Layering_Annotations.md
Layering annotations are per-node metadata strings that guide graph partitioning by indicating which execution provider (EP) layer a node belongs to. They are loaded from the ONNX model's NodeProto metadata (key "layer_ann") and consumed during the partitioning phase to influence EP assignment.
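As a rough illustration of the lookup described above, the sketch below models a node's metadata as a plain string map and reads the "layer_ann" key from it. `ToyMetadata` and `ReadLayeringAnnotation` are hypothetical names invented for this example; the actual loader in the runtime is not shown in this document and may differ.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Toy stand-in for a node's metadata: key/value strings (an assumption,
// not the real NodeProto API).
using ToyMetadata = std::unordered_map<std::string, std::string>;

// Returns the layering annotation if the "layer_ann" key is present.
std::optional<std::string> ReadLayeringAnnotation(const ToyMetadata& metadata) {
  const auto it = metadata.find("layer_ann");
  if (it == metadata.end()) {
    return std::nullopt;  // the node carries no layering hint
  }
  return it->second;
}
```

A node without the key simply yields no annotation, which is the case partitioning has to handle when an optimizer forgets to propagate it.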
Graph optimizers run in ordered levels:
Level 0 (Basic) ─► Level 1 (Extended) ─► Partitioning ─► Level 2+ (Layout, etc.)
After partitioning, Graph::RemoveAllLayeringAnnotations() clears all annotations.
Key rule: Only Level 0 and Level 1 optimizers need to propagate layering annotations.
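The ordering can be sketched with a toy model: partitioning reads the annotations, then they are cleared, so later optimizers only ever see EP assignments. Everything here (`ToyNode`, `ToyPartition`, `ToyPartitionAndClear`, and the rule that annotated nodes go to one provider) is a simplifying assumption for illustration, not the real partitioner.

```cpp
#include <optional>
#include <string>
#include <vector>

// Toy node: only the two fields this document talks about.
struct ToyNode {
  std::string name;
  std::optional<std::string> layering_annotation;  // hint for partitioning
  std::string ep;  // empty until partitioning assigns a provider
};

// Hypothetical partitioning rule: annotated nodes go to `annotated_ep`,
// everything else goes to `default_ep`.
void ToyPartition(std::vector<ToyNode>& nodes,
                  const std::string& annotated_ep,
                  const std::string& default_ep) {
  for (ToyNode& node : nodes) {
    node.ep = node.layering_annotation ? annotated_ep : default_ep;
  }
}

// Plays the role of Graph::RemoveAllLayeringAnnotations() in this toy model.
void ToyRemoveAllLayeringAnnotations(std::vector<ToyNode>& nodes) {
  for (ToyNode& node : nodes) {
    node.layering_annotation.reset();
  }
}

// Runs partitioning and the clearing step, returning the nodes as a
// Level 2+ optimizer would see them.
std::vector<ToyNode> ToyPartitionAndClear(std::vector<ToyNode> nodes) {
  ToyPartition(nodes, "CUDAExecutionProvider", "CPUExecutionProvider");
  ToyRemoveAllLayeringAnnotations(nodes);
  return nodes;
}
```

In this sketch a node that lost its annotation before partitioning is indistinguishable from one that never had it, which is why propagation matters only in the levels that run before partitioning.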
When an optimizer replaces, fuses, or decomposes nodes, the original annotated node is removed and new nodes are created. If the new nodes do not carry the original annotation, partitioning loses the assignment hint for that subgraph, potentially causing incorrect EP placement.
AddNode Overload with annotation_source
Graph::AddNode provides overloads that accept a const Node& annotation_source parameter. The new node automatically inherits the layering annotation from the source node.
// Instead of:
Node& new_node = graph.AddNode(name, op_type, description, inputs, outputs);
// Missing annotation propagation!
// Use:
Node& new_node = graph.AddNode(name, op_type, description, inputs, outputs,
                               original_node);  // annotation_source
All standard AddNode signatures have a corresponding annotation_source variant:
// With const NodeAttributes*
Node& AddNode(name, op_type, description,
              gsl::span<NodeArg* const> inputs,
              gsl::span<NodeArg* const> outputs,
              const Node& annotation_source,
              const NodeAttributes* attributes = nullptr,
              const std::string& domain = kOnnxDomain);
// With NodeAttributes&&
Node& AddNode(name, op_type, description,
              gsl::span<NodeArg* const> inputs,
              gsl::span<NodeArg* const> outputs,
              const Node& annotation_source,
              NodeAttributes&& attributes,
              const std::string& domain = kOnnxDomain);
// initializer_list variants also available
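To make the effect of the annotation_source parameter concrete, here is a minimal self-contained sketch that mimics the overload pair on a toy graph. `ToyGraph`, `ToyNode`, and `FusedNodeKeepsAnnotation` are hypothetical stand-ins written for this example; they only imitate the inheritance behavior described above and do not reproduce the real Graph class.

```cpp
#include <deque>
#include <optional>
#include <string>

struct ToyNode {
  std::string name;
  std::string op_type;
  std::optional<std::string> layering_annotation;
};

class ToyGraph {
 public:
  // Plain overload: the new node starts without an annotation.
  ToyNode& AddNode(const std::string& name, const std::string& op_type) {
    nodes_.push_back(ToyNode{name, op_type, std::nullopt});
    return nodes_.back();
  }

  // annotation_source overload: the new node inherits the source's annotation.
  ToyNode& AddNode(const std::string& name, const std::string& op_type,
                   const ToyNode& annotation_source) {
    nodes_.push_back(
        ToyNode{name, op_type, annotation_source.layering_annotation});
    return nodes_.back();
  }

 private:
  std::deque<ToyNode> nodes_;  // deque keeps element references valid on push_back
};

// Replaces an annotated Div node with a Gelu node, with or without
// propagation, and reports whether the annotation survived.
bool FusedNodeKeepsAnnotation(bool use_annotation_source) {
  ToyGraph graph;
  ToyNode& div = graph.AddNode("div_0", "Div");
  div.layering_annotation = "layer_gpu";
  ToyNode& gelu = use_annotation_source
                      ? graph.AddNode("gelu_0", "Gelu", div)
                      : graph.AddNode("gelu_0", "Gelu");
  return gelu.layering_annotation == div.layering_annotation;
}
```

Calling `FusedNodeKeepsAnnotation(false)` models the "missing annotation propagation" case: the fused node exists, but the hint that partitioning would have used is gone.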
DuplicateNodeAnnotation
The utility function optimizer_utils::DuplicateNodeAnnotation(src, dst) copies annotations between existing nodes. This is still used when the annotation source is conditional (e.g., when the source node pointer may be null). Prefer the AddNode overload for unconditional propagation.
Graph::AddNode(const Node& other) — the copy overload used for duplicating nodes — automatically copies annotations. No additional action is needed when duplicating a node via this overload.
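The conditional case can be sketched as follows: the node is created first, and the annotation is copied afterwards only if a source node is actually available. `ToyDuplicateNodeAnnotation` and `MakeNodeWithOptionalSource` are hypothetical helpers for this example; in particular, the assumption that the copy overwrites whatever the destination held is mine, not something this document states about the real utility.

```cpp
#include <optional>
#include <string>

struct ToyNode {
  std::string name;
  std::optional<std::string> layering_annotation;
};

// Toy counterpart of optimizer_utils::DuplicateNodeAnnotation(src, dst).
// Assumption: the destination's annotation is overwritten with the source's.
void ToyDuplicateNodeAnnotation(const ToyNode& src, ToyNode& dst) {
  dst.layering_annotation = src.layering_annotation;
}

// Creates a node and copies the annotation only when a source node exists.
ToyNode MakeNodeWithOptionalSource(const std::string& name,
                                   const ToyNode* src_node) {
  ToyNode new_node{name, std::nullopt};
  if (src_node != nullptr) {
    ToyDuplicateNodeAnnotation(*src_node, new_node);
  }
  return new_node;
}
```

The null check is what makes a separate utility useful here: an overload taking `const Node&` cannot express "maybe there is no source".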
Although Level 2+ optimizers do not deal with layering annotations directly (they have been cleared), they must still propagate execution provider (EP) assignments. EP assignments are the downstream result of the annotation-driven partitioning step. After partitioning, each node carries an EP assignment (e.g., CUDAExecutionProvider, CPUExecutionProvider) that determines where the node's kernel runs.
When a Level 2+ optimizer creates new nodes that replace or derive from existing ones, it must copy the EP assignment from the source node:
Node& new_node = graph.AddNode(name, op_type, description, inputs, outputs);
new_node.SetExecutionProviderType(original_node.GetExecutionProviderType());
Failing to propagate the EP assignment causes the new node to fall back to the default provider (typically CPU), silently breaking the intended placement and potentially degrading performance or correctness. This requirement predates the layering annotation feature and applies to all optimizers that run after partitioning.
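The failure mode can be illustrated with a small toy model in which an unassigned node resolves to the CPU provider. `ToyNode`, `ToyEffectiveProvider`, and `MakeReplacement` are hypothetical names, and the fallback rule is a simplified assumption about how an empty assignment is treated.

```cpp
#include <string>

struct ToyNode {
  std::string name;
  std::string ep;  // empty means "no EP assigned"
};

// Hypothetical fallback rule: unassigned nodes run on the CPU provider.
std::string ToyEffectiveProvider(const ToyNode& node) {
  return node.ep.empty() ? "CPUExecutionProvider" : node.ep;
}

// Creates a replacement node and copies the EP assignment from the original.
ToyNode MakeReplacement(const std::string& name, const ToyNode& original) {
  ToyNode replacement{name, ""};
  replacement.ep = original.ep;  // the step that is easy to forget
  return replacement;
}
```

Without the copy inside `MakeReplacement`, a replacement for a CUDA-assigned node would resolve to the CPU provider in this model, with no error to signal the change.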
Note: The AddNode overload with annotation_source propagates only the layering annotation; the EP assignment must still be set separately. Layering annotations and EP assignments serve different stages of the pipeline and are managed independently.
// GeluFusion: fusing Div + Erf + Add + Mul + Mul into a single Gelu
Node& gelu_node = graph.AddNode(
    graph.GenerateNodeName("Gelu"),
    "Gelu", "fused Gelu subgraphs",
    {gelu_input}, {gelu_output},
    div_node);  // propagate annotation from the root matched node
// STFT decomposition: each new node inherits from the original STFT node
auto [reshape_node, reshape_out] = AddNode(graph, "Reshape", ep, inputs, &stft);
auto [conv_node, conv_out] = AddNode(graph, "Conv", ep, conv_inputs, &stft);
auto [concat_node, concat_out] = AddNode(graph, "Concat", ep, concat_inputs, &stft);
// Conditional source: copy the annotation only when a source node exists
Node& q_node = graph.AddNode(...);
if (src_node) {
  optimizer_utils::DuplicateNodeAnnotation(*src_node, q_node);
}
For every graph.AddNode(...) call that creates a replacement node, use the annotation_source overload.
When the annotation source is conditional, call optimizer_utils::DuplicateNodeAnnotation after the AddNode call.