tensorflow/core/ir/README.md
This directory contains the definition of the Intermediate Representation (IR) for TensorFlow graphs using MLIR.
This directory defines an MLIR dialect, the “TensorFlow Graph dialect”, that accurately represents TensorFlow graphs. Contrary to the previous TensorFlow dialect, which made some opinionated choices that diverged from GraphDef and TensorFlow Graph semantics, this dialect embraces TensorFlow Graph as it is. In particular, the concepts of control dependencies, requested device, assigned device, and node name are all first-class attributes on the MLIR operations in this dialect.
The main principle that drove the development of this dialect is to ensure perfect round-trip and general compatibility with existing TensorFlow semantics, so that this solution can be deployed by default in any situation where "Graph Optimization" and Grappler transformations are involved today, whether TensorFlow V1 or V2. This new approach is also made possible by evolutions in MLIR that allow representing graphs in a way that wasn’t possible before (more in the Graph operation design section below).
MLIR started with a basic structure reflecting LLVM in that it defined a
Module containing a list of Functions. Each of these was defining a body
constrained to be a Control-Flow Graph (CFG): a list of Blocks, each of them
containing a list of Operations. A fundamental aspect of the CFG
representation is the notion of “control”: the abstract semantic model considers
that a single Operation executes at a given time, and the next Operation to
execute is necessarily the one listed immediately after[^1]. The last
Operation in a Block is a Terminator: it decides which Block control will be
transferred to next (think of a branch).
When MLIR started, a first dialect -- that we were referring to as “TF control
dialect” -- was developed to model TensorFlow graphs. This dialect supported
control dependencies, but didn’t allow cycles in the graph, which forced some
tricks to model TensorFlow V1 loops and in particular the NextIteration
operation. While this dialect enabled some experimentation, it wasn’t seen as
really practical and another dialect was co-existing: the “tf” dialect that
we’re using currently. This dialect was designed before TF2.0
was released,
and made strong assumptions about TensorFlow evolving towards a world where
eager execution and function execution become unified and V1-specific constructs
would be deprecated and disappear. As such, control dependencies are not
supported and are instead implicit; control-flow V1 ops (such as Switch & Merge)
and deadness aren’t supported[^2]; and new device placement modelling solutions
were considered. These choices in the model enabled us to write graph
transformations as stateless DAG-to-DAG patterns that can be applied to a
subgraph without considering the entire graph.
The combination of the TensorFlow and executor dialects allows for importing most TensorFlow graphs, and the TensorFlow dialect has proven sufficient to implement the TF/XLA bridge, the TFLite converter, and TFRT. However, the intent was for TensorFlow 2.0 to trace TensorFlow functions directly in the TensorFlow dialect, leaving the executor dialect only as a way to provide limited support for TensorFlow V1 graphs.
However, the implementation of TensorFlow 2.0 didn't break away from TensorFlow
V1 entirely; instead, TensorFlow functions are layered on top of TensorFlow V1
and expose a leaky abstraction over the classical graph. As a result, the
TensorFlow dialect never got in a position to be enabled by default in
TensorFlow. In particular, there are many subtle ways in which TensorFlow
functions diverge from the sequential eager interpretation. For example, the
following pattern has been recommended to users who intended to call a function
`bar` knowing that the first argument wasn’t necessary if they only used the
first result:
```python
@tf.function
def foo(z):
  x = tf.Placeholder(tf.int32)
  y, _ = bar(x, z)
  return y
```
The use of a placeholder would throw an exception in eager mode, but “works” in graph mode as long as inlining and pruning ensure the placeholder is removed before execution.
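The pruning step can be sketched as a simple reachability computation; the dict-based graph below is a hypothetical encoding of the example above, not TensorFlow's actual data structures:

```python
# A minimal sketch (not TensorFlow's implementation) of reachability-based
# pruning: only nodes reachable from the fetched outputs are kept, so an
# unused Placeholder disappears before the graph ever executes.
def prune(graph, fetches):
    """graph maps each node name to the list of its input nodes."""
    keep = set()
    worklist = list(fetches)
    while worklist:
        node = worklist.pop()
        if node not in keep:
            keep.add(node)
            worklist.extend(graph[node])
    return {n: ins for n, ins in graph.items() if n in keep}

# Hypothetical graph for the foo/bar example: only the first result of
# bar is fetched, and it does not depend on the placeholder.
graph = {
    "placeholder": [],
    "z": [],
    "bar_result0": ["z"],
    "bar_result1": ["placeholder", "z"],
}
pruned = prune(graph, ["bar_result0"])
print(sorted(pruned))  # ['bar_result0', 'z'] -- the placeholder is gone
```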
Other cases involve the need for control dependencies beyond what the auto-control-dependency tracking offers. For example, `tf.recompute_grad` creates control dependencies on non-side-effecting ops for finer-grained control of memory usage.
Finally, the error modelling in TensorFlow can also be surprising. While in
eager op-by-op mode the execution is interrupted as soon as an error occurs,
`tf.function` tracing does not consider error handling as side-effecting
(otherwise it would have to add a control dependency between every node!), and
as such a program like:
```python
@tf.function
def foo(x, y, variable):
  b = tf.matmul(x, y)
  variable.assign(1.0)
  return b
```
does not guarantee that the assignment to the variable won’t occur if an error occurs while processing the matmul, so calling:
```python
foo(1., 2., variable)
```
throws an exception because `tf.matmul` expects rank-2 tensors, but the variable
may or may not have been assigned. As such, a user may want to opt in to a safer
behavior for their function:
```python
@tf.function
def foo(x, y, variable):
  b = tf.matmul(x, y)
  with tf.control_dependencies([b]):
    variable.assign(1.0)
  return b
```
However, this control dependency cannot be modelled in the TensorFlow dialect:
it is simply dropped! There is no way today to prevent the variable assignment
from being executed ahead of the matmul in the TensorFlow dialect.
While many of these cases could be modeled with different constructs at the source level, this would be a major overhaul of TensorFlow itself, and more importantly its ecosystem. Instead, we recognize that the TensorFlow dialect as it exists today cannot support all of these use cases, which has prevented MLIR from providing a general graph transformation solution for TensorFlow, contributing to more fragmentation instead of reducing it as promised.
The rest of this document describes how this new dialect follows a more pragmatic approach to enable MLIR deployment in TensorFlow.
This new dialect is intended to allow us to replace Grappler and existing graph transformations, for TensorFlow V1 and V2, without constraints. As such, the main principle is to support a perfect round-trip between TensorFlow Graph/GraphDef and MLIR.
An individual TensorFlow NodeDef is translated into an individual MLIR
operation using the following form:
```mlir
%AddV2, %ctl = tfg.AddV2(%placeholder, %placeholder_1) [%ctl_1, %ctl_2]
                 device("GPU") assigned_device("TPU") name("add")
                 {some_attribute = "some attr!"}
                 : (tensor<*xi32>, tensor<*xi32>) -> (tensor<*xi32>)
```
This structure allows for a perfect round-trip to NodeDef, while still being
ergonomic when manipulating it in MLIR (compared to the tf_executor dialect
for example). The tradeoff we are making here is that we preserve all
attributes, including the “derived” ones[^3], which creates some amount of
redundancy with the signature. We may consider pruning these redundant
attributes in the future in the same way as we do in the TensorFlow dialect.
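For reference, the data and control edges shown above are encoded in a NodeDef as `input` strings. A small sketch of that (real) GraphDef convention, using hypothetical node names:

```python
# NodeDef encodes each incoming edge as a string: "name" means output 0
# of node "name", "name:2" means output 2, and "^name" is a control
# dependency -- the edges listed in [%ctl_1, %ctl_2] in the tfg form.
def parse_input(s):
    if s.startswith("^"):
        return (s[1:], None)  # control edge: no data output index
    name, _, idx = s.partition(":")
    return (name, int(idx) if idx else 0)

print(parse_input("placeholder"))    # ('placeholder', 0)
print(parse_input("cond/Switch:1"))  # ('cond/Switch', 1)
print(parse_input("^add"))           # ('add', None)
```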
A structural operation is introduced as a container: tfg.graph acts as a bag
of unordered TensorFlow operations, and carries a “version” attribute that
corresponds to the
VersionDef
present in GraphDef:
```mlir
tfg.graph #tfg.version<producer = 42, min_consumer = 33> {
  %arg0, %ctl_0 = tfg.placeholder() : () -> (tensor<*xi32>)
  %add, %ctl_1 = tfg.AddV2(%arg0, %arg1)
       : (tensor<*xi32>, tensor<*xi32>) -> (tensor<*xi32>)
  %arg1, %ctl_2 = tfg.placeholder() : () -> (tensor<*xi32>)
}
```
Note that the AddV2 operation is using the result of a placeholder operation
that is defined later in the list. This wasn’t possible in MLIR 2 years ago when
the TensorFlow dialect was designed. It was actually
attempted to allow such unordered semantics
and break away from the CFG-centric representation, but we couldn’t reach a
consensus, and some key members of the team believed that a departure from
CFG/SSA would limit the reusability of many algorithms. On the other hand, this
choice prevented us from designing a graph dialect that can just replace the
TensorFlow Graph structure as-is. Since then, MLIR has evolved to become more
general and this feature is now available (it was motivated by the
support for HW synthesis tools).
Another recent development that also made it more friendly is the
removal of the requirement for terminators:
the tfg.graph operation above contains a single block listing operations, and
a terminator does not have any role to play. Finally, a Dialect can now
act as a fallback for OpInterfaces,
which allows us to reuse more of the TensorFlow registry to provide information
to MLIR passes about TensorFlow operations without having to register them with
MLIR in the first place.
The tfg.graph operation round-trips almost perfectly to
Graph,
except for the Function Library, which is addressed below.
Functions in TensorFlow are stored as
FunctionDef,
which has a signature, holds attributes, identifies argument and returned
values, and finally contains a list of nodes for its body. While on the surface
this repeated NodeDef node_def field looks identical to the body of
GraphDef,
there are fundamental differences in the representation, and in particular the
format in which the edges are represented is different.
To understand these differences, it is important to realize that a key aspect of
FunctionDefs is that they are stored uninstantiated, and can be considered in
a similar way to a C++ template function. The signature is actually an OpDef,
and just like any regular TensorFlow operation, the types of the arguments and
the results are encoded and constrained with attributes. These attributes are
only provided or inferred based on the function’s use: the call-site is
responsible for instantiating a function before its body can be represented as
a Graph. Because of this, the body of an uninstantiated function is modeled
differently than a Graph body:
```mlir
tfg.func generic @foo(%arg0 : !tfg.tensor {tfg.name = "input"},
                      %arg1 : !tfg.tensor {tfg.name = "another_input"})
    -> (!tfg.tensor {tfg.name = "result1"},
        !tfg.tensor {tfg.name = "result2"})
    attributes {description = "function foo"} {
  %Greater, %ctl_0 = tfg.Greater(%arg0, %arg1) name("Greater")
  %G_z = tfg.get_result(%Greater) "z" : 0
  %Switch, %ctl_1 = tfg.Switch(%G_z, %G_z) name("cond/Switch")
  %s_true = tfg.get_result %Switch "output_true" : 0
  %s_false = tfg.get_result %Switch "output_false" : 0
  tfg.return(%s_true, %s_false) [%ctl_0]
}
```
Note how the tensor types !tfg.tensor are opaque, and every operation returns
a single tensor output and a control token. The tensor output is then unpacked
by looking up individual results by name. This is particularly visible with the
Switch operation, where the two results are accessed using tfg.get_result,
looking them up by name output_true:0 and output_false:0. This is required
because the OpDef can define the number of outputs based on attributes present
on the NodeDef, and these attributes can in turn depend on the attributes
added to the function during instantiation (you can read more about it in the
description of the placeholder attribute value).
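The dependence of output counts on instantiation can be sketched as follows; the dict-based stand-ins for OpDef output args and NodeDef attributes are hypothetical simplifications of the real protos:

```python
# Sketch of why output counts are unknown pre-instantiation: an OpDef
# output_arg may carry a number_attr or type_list_attr, so the number of
# concrete outputs depends on attributes supplied at instantiation time.
def num_outputs(op_def_outputs, node_attrs):
    total = 0
    for out in op_def_outputs:
        if "number_attr" in out:        # repeated output, count from an attr
            total += node_attrs[out["number_attr"]]
        elif "type_list_attr" in out:   # list-typed output
            total += len(node_attrs[out["type_list_attr"]])
        else:
            total += 1                  # plain single output
    return total

# e.g. Split produces num_split outputs, known only once the attr is set.
split_outputs = [{"number_attr": "num_split"}]
print(num_outputs(split_outputs, {"num_split": 3}))  # 3
```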
Post-instantiation, a function body is similar to the one of a graph:
```mlir
tfg.func @foo(%arg0 : tensor<*xf32> {tfg.name = "input"},
              %arg1 : tensor<*xf32> {tfg.name = "another_input"})
    -> (tensor<*xi1> {tfg.name = "result1"},
        tensor<*xi1> {tfg.name = "result2"})
    attributes {description = "function foo"} {
  %Greater, %ctl_0 = tfg.Greater(%arg0, %arg1) [%arg1.ctl] name("Greater")
       : (tensor<*xf32>, tensor<*xf32>) -> tensor<*xi1>
  %Switch:2, %ctl_1 = tfg.Switch(%Greater, %Greater) name("cond/Switch")
       : (tensor<*xi1>, tensor<*xi1>) -> tensor<*xi1>
  tfg.return(%Switch#0, %Switch#1) [%ctl_0]
}
```
The operations aren’t ordered, except for the tfg.return, which is a terminator
and must be the last operation. The only remaining difference from a graph is in
the handling of the function signature (arguments and returned values) and
attributes.
There is one aspect of the modelling worth mentioning from the MLIR point of
view: FunctionDef allows for nodes in a graph to express input control
dependencies from function arguments. However, in MLIR you need an actual
SSA value to add
an edge between two operations. These values are typed and this is why
operations define a control token (like %ctl_0). We apply the same recipe for
arguments: for each of them, we define a control token. We omit these “shadow
arguments” from the textual form, but in memory the MLIR function really has 4
arguments:
```mlir
tfg.func @foo(%arg0 : tensor<*xf32> {tfg.name = "input"}, %arg0.ctl : !tfg.control,
              %arg1 : tensor<*xf32> {tfg.name = "another_input"}, %arg1.ctl : !tfg.control)
    -> (tensor<*xi1> {tfg.name = "result1"},
        tensor<*xi1> {tfg.name = "result2"})
    attributes {description = "function foo"} {
  ...
```
The convention is that callers are only exposed to the non-control inputs
(%arg0 and %arg1), while the control tokens are only intended to be visible
and used in the body. This aligns closely with how TensorFlow works.
Inside the body, values for the control dependencies on the arguments are
available with a .ctl suffix (i.e. %arg0.ctl and %arg1.ctl).
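Assuming the interleaved layout shown in the expanded signature above (each data argument immediately followed by its shadow control token), the index mapping can be sketched as:

```python
# Sketch of the in-memory argument layout described above: a function
# with N caller-visible arguments really has 2*N MLIR arguments, with
# each data argument followed by its shadow control token.
def data_index(i):
    return 2 * i        # position of %arg<i>

def control_index(i):
    return 2 * i + 1    # position of %arg<i>.ctl

args = ["%arg0", "%arg0.ctl", "%arg1", "%arg1.ctl"]
print(args[data_index(1)], args[control_index(1)])  # %arg1 %arg1.ctl
```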
The basic blocks above are enough to model GraphDef, but not the entirety of
SavedModel. However, most of the use cases that we’re targeting right now are in
the scope of the existing GraphOptimization and Grappler APIs, which aren’t
really coupled to SavedModel. The user can load a SavedModel independently of
MLIR and invoke MLIR transformations on a Function or Graph from there. There is
also already a dialect to model the specific aspects of SavedModel; it
currently wraps around the TensorFlow executor dialect and the TensorFlow
dialect, and we may look into integrating it with the tfg dialect in the
future. For these reasons, we mostly leave modeling the SavedModel as future
work for now.
Functional control-flow is modeled with nodes in the graph invoking functions in
the library. MLIR supports regions, which is a concept that allows attaching
subgraphs directly inside a graph, making it more friendly to optimizations. For
example, a conditional operation can represent its two branch subgraphs
directly in the TensorFlow dialect as follows:
```mlir
%0, %1, %2 = "tf.IfRegion"(%arg0) ({
  %t0 = "tf.Abs"(%arg1) : (tensor<2xf32>) -> tensor<2xf32>
  %t1 = "tf.Acos"(%arg1) : (tensor<2xf32>) -> tensor<2xf32>
  %t2 = "tf.Acosh"(%arg1) : (tensor<2xf32>) -> tensor<2xf32>
  "tf.Yield"(%t0, %t1, %t2) : (tensor<2xf32>, tensor<2xf32>, tensor<2xf32>) -> ()
}, {
  %e0 = "tf.Neg"(%arg1) : (tensor<2xf32>) -> tensor<2xf32>
  %e1 = "tf.Relu"(%arg1) : (tensor<2xf32>) -> tensor<2xf32>
  %e2 = "tf.Sin"(%arg1) : (tensor<2xf32>) -> tensor<2xf32>
  "tf.Yield"(%e0, %e1, %e2) : (tensor<2xf32>, tensor<2xf32>, tensor<2xf32>) -> ()
}) : (tensor<i1>) -> (tensor<2xf32>, tensor<2xf32>, tensor<2xf32>)
%3 = "tf.Add"(%0, %1) : (tensor<2xf32>, tensor<2xf32>) -> tensor<2xf32>
%4 = "tf.Add"(%2, %3) : (tensor<2xf32>, tensor<2xf32>) -> tensor<2xf32>
```
MLIR transformations in this dialect will operate on a module containing at
most one graph operation as well as a list of functions. This interface will
make such transformations suitable to fit within Grappler or as a
GraphOptimization interchangeably.
Instead of a flat graph, an entry function will be provided when feeds/fetches are available for the main graph (PRE_PLACEMENT graph optimizations execute in Session before feeds/fetches are provided).
The executor dialect wasn’t designed for writing transformations: it is designed as a wrapper around the TensorFlow dialect. The intent was for it to be a stepping stone to integrate MLIR and TensorFlow, and then disappear when TensorFlow V1 graphs would be deprecated. This new dialect embraces TensorFlow as it is instead of as we wish it would be.
In particular, the executor dialect represents each TensorFlow node as an isolated “subgraph” nested under an “island” operation. This requires 3 operations and an extra region for each TensorFlow node, which is quite inefficient in memory and also requires extra indirection when pattern matching or updating nodes in the graph.
The existing TensorFlow dialect is suitable for representing a large subset of TensorFlow programs (like models that intend to convert to TFLite, or XLA), and for such cases we will continue to use it.
This new TensorFlow Graph dialect could be used to replace the executor dialect as the standalone staging import format. Importing from GraphDef/Graph would always go through the TensorFlow Graph dialect before using some clustering or promotion algorithms to raise some subgraphs to the TensorFlow dialect, just like we do now to cluster island operations in the TensorFlow executor dialect. The details of such mechanisms are left for future work.
<!-- Footnotes -->

[^1]: While the semantic model is sequential, this does not prevent an implementation from executing operations in parallel when proven safe. This is similar to how a superscalar CPU involves implicit parallelism. For example, when mapping the TensorFlow dialect to TFRT, only side-effecting operations (variable accesses for example) are sequenced.

[^2]: One of the first tools built with this was the TF->TFLite converter (replacing TOCO). Since V1 control-flow isn’t supported on TFLite, this wasn’t a limitation.

[^3]: Derived attributes is a concept used in the TensorFlow dialect: since MLIR models type and shape information on each individual result produced by an operation, some attributes that are inserted for the sole purpose of typing are redundant and eliminated in MLIR.