TOSA Lowerings

Introduction

Overview

This document provides pseudo-code lowerings from TensorFlow and TensorFlow Lite MLIR Dialects (https://www.tensorflow.org/mlir/dialects) to the TOSA Dialect (https://mlir.llvm.org/docs/Dialects/TOSA/).

The documentation is a work-in-progress: sections with missing legalizations are in the process of being written.

Syntax

The pseudo-code syntax used in this document is described below.

Primitive Datatypes

int8: signed 8-bit integer
uint8: unsigned 8-bit integer
int16: signed 16-bit integer
int32: signed 32-bit integer
int64: signed 64-bit integer
uint32: unsigned 32-bit integer
float32: IEEE-754 32-bit floating point format
float64: IEEE-754 64-bit floating point format
bool: boolean

Value

In the pseudo-code, a symbol starting with "%" denotes a value. A value is produced by an operator at run time, and an operator consumes only a list of values as operands. Note that a value's tensor type is determined at compile time; only the evaluation happens at run time. A data flow subgraph can be constructed by following each value's producer and consumers.

Tensor Type

The tensor type is an attribute determined by legalization at compile time. It describes the shape and the element data type, and is written as tensor<shape, dtype>, or in shorthand as tensor<%t.type>.

Operator Prototype

In pseudo-code, a TOSA operator is prototyped in the following format.

%<output_value> = tosa.<OPERATOR>(%<input_value>) {<attribute> = …}

Value Attributes

For brevity and clarity, the pseudo-code in this document allows the following shorthand notation for value attributes.

Shorthand	Description
`%t.shape`	Shape vector for the tensor
`%t.shape[i]`	Size of dimension i for the tensor
`%t.rank`	Rank of the tensor
`%t.dtype`	Datatype of the tensor
`%t.scale`	Quantized scaling parameter (float64)
`%t.zp`	Quantized zero-point (int64)
`%t.signed`	Boolean indicating the type is signed
`%t.num_bits`	Number of bits in the datatype
`%t.num_elements`	Number of elements in the tensor
`%t.type`	Tuple of `tensor<%t.shape, %t.dtype>`
`%t.size`	For tensor lists: the number of tensors in the list

Tensor Dimension Shorthand

Where the TOSA Specification allows the use of named dimensions, the following names may be used.

Name	Description
`N`	Batch dimension
`H`	Height dimension
`W`	Width dimension
`C`	Channel dimension
`M`	Depthwise multiplier

Each of these may be prefixed with I for the input dimension, O for the output dimension, or K for the kernel dimension.

Common Legalization Functions

The following pseudo-code helper functions are used to canonicalize arguments from different frameworks to the TOSA dialect.

.as_constant(): Matched as Constant

Wherever %tensor.as_constant() is specified, a constant vector will be created to hold the value in the %tensor at compile time. This only succeeds if %tensor is fed by a constant type operator. If constant matching fails, the lowering will fail and be terminated.

get_padding_values_from_explicit_pad_attr()

vector<int64> get_padding_values_from_explicit_pad_attr(vector<int64> explicit_pad,
                                                         tensorflow::TensorFormat data_format_tf)
{
    int64 pad_before, pad_after
    vector<int64> computed_paddings

    for (int32 i = 0; i < 2; i++) {
        int64 dim = GetTensorSpatialDimIndex(4, data_format_tf, i)
        pad_before = explicit_pad[dim * 2]
        pad_after  = explicit_pad[dim * 2 + 1]
        computed_paddings.push_back(pad_before)
        computed_paddings.push_back(pad_after)
    }

    return computed_paddings
}

get_padding_values_from_pad_type()

Calculate the explicit padding array based on the pad type.

vector<int64> get_padding_values_from_pad_type(tensorflow::Padding padding, tensorflow::TensorFormat data_format,
                                        uint32 first_filter_spatial_dim, type input_type, type filter_type,
                                        vector strides, vector dilations)
{
    assert(padding != tensorflow::Padding::EXPLICIT);

    vector<int64> computed_paddings;

    // Padding over H and W dimensions
    for (int32 i = 0; i < 2; i++) {
        int32 ifm_dim = get_tensor_spatial_dim_index(4, data_format, i);

        int32 filter_dim = first_filter_spatial_dim + i;

        int32 dim_dilation = dilations[ifm_dim];
        int32 dim_stride   = strides[ifm_dim];

        int64 op_size, pad_before_tf, pad_after_tf;

        tensorflow::GetWindowedOutputSizeVerbose(input_type.shape[ifm_dim], filter_type.shape[filter_dim],
                                                   dim_dilation, dim_stride, padding,
                                                   // Outputs
                                                   &op_size, &pad_before_tf, &pad_after_tf);
        computed_paddings.push_back(pad_before_tf);
        computed_paddings.push_back(pad_after_tf);
    }

    return computed_paddings;
}

positive_axis()

// Canonicalize a scalar axis attribute to a non-negative scalar axis attribute
int32 positive_axis(int32 axis, int32 rank)
{
   if (axis < 0)
       axis += rank;

   return axis;
}
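
As a minimal sketch, the same canonicalization can be written in Python. The function mirrors the pseudo-code above; like the pseudo-code, it assumes the caller passes an axis in the range [-rank, rank) and does no further validation.

```python
def positive_axis(axis: int, rank: int) -> int:
    """Map a possibly negative axis onto the non-negative range [0, rank)."""
    if axis < 0:
        axis += rank
    return axis


print(positive_axis(-1, 4))  # last axis of a rank-4 tensor -> 3
print(positive_axis(2, 4))   # already non-negative -> 2
```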

compute_scale_32()

void compute_scale_32(float64 scale, int32& multiplier, int32& shift)
{
    /* Generates mantissa and shift values where mantissa is in (-1.0, -0.5] or
    [0.5, 1.0) such that
    scale = mantissa * 2^shift */

    const float64 mantissa = std::frexp(scale, &shift);
    auto shifted_m = std::round(mantissa * (int64(1) << 31));

    assert(shifted_m <= (int64(1) << 31)); // can't be greater than 1.0
    if (shifted_m == (int64(1) << 31)) {
        shifted_m /= 2;
        shift++;
    }
    // TOSA expects the right shift to be positive, and embeds (1 << 31) into
    // the right shift bits
    shift = (-shift) + 31;

    assert(shifted_m <= std::numeric_limits<int32>::max());

    multiplier = static_cast<int32>(shifted_m);

}
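
The decomposition can be checked numerically with a short Python sketch. It follows the pseudo-code step by step using `math.frexp`; the one assumed difference is that Python's `round` resolves exact ties to the nearest even integer, whereas `std::round` rounds ties away from zero, so results may differ by one unit in that rare case.

```python
import math


def compute_scale_32(scale: float):
    """Return (multiplier, shift) such that scale ~= multiplier * 2**(-shift)."""
    # frexp gives scale = mantissa * 2**exponent with 0.5 <= |mantissa| < 1.0
    mantissa, exponent = math.frexp(scale)
    shifted_m = int(round(mantissa * (1 << 31)))

    assert shifted_m <= (1 << 31)  # mantissa cannot exceed 1.0
    if shifted_m == (1 << 31):
        # Rounding pushed the mantissa up to 1.0: renormalize
        shifted_m //= 2
        exponent += 1

    # Express the exponent as a positive right shift that includes the 2**31 factor
    shift = -exponent + 31

    assert shifted_m <= (1 << 31) - 1
    return shifted_m, shift


multiplier, shift = compute_scale_32(0.3)
print(multiplier, shift, multiplier * 2.0 ** -shift)
```

Multiplying the returned multiplier by 2 to the power of minus the shift should reproduce the input scale to within the precision of a 31-bit mantissa.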

lower_batch_to_space_nd_op()

Value lower_batch_to_space_nd_op(Value %input, Value %block_shape, Value %crops, shape_t output_shape)
{
    // %block_shape is a rank-1 tensor with M elements; %crops has shape [M, 2]
    vector <size_t> block = %block_shape.as_constant()
    vector <std::pair<size_t, size_t>> crops_arr

    size_t block_rank = %block_shape.shape[0]
    size_t remaining_shape_rank = %input.rank - block_rank - 1
    size_t crops_dim = %crops.shape[0]

    // %crops.as_constant() is the flattened [crops_dim][2] array
    for (int32 i = 0; i < crops_dim; i++) {
        crops_arr.push_back(std::make_pair(%crops.as_constant()[i * 2 + 0],
                                           %crops.as_constant()[i * 2 + 1]))
    }

    size_t block_num_elems = 1
    for (int32 i = 0; i < block_rank; i++) {
        block_num_elems *= block[i]
    }

    // Step 1: Reshape input to
    // [block_shape[0],
    //  ...
    //  block_shape[M-1],
    //  batch / prod(block_shape),
    //  input_shape[1],
    //  ...
    //  input_shape[N-1]]
    vector <size_t> a1_shape(block_rank + %input.rank)

    for (int32 i = 0; i < block_rank; i++) {
        a1_shape[i] = block[i]
    }

    a1_shape[block_rank] = %input.shape[0] / block_num_elems

    for (int32 i = 1; i < %input.rank; i++) {
        a1_shape[i + block_rank] = %input.shape[i]
    }

    // Step 2: Permute to shape
    // [batch / prod(block_shape),
    //  input_shape[1], block_shape[0],
    //  ...
    //  input_shape[M], block_shape[M-1]]
    //  + remaining input shapes input_shape[M+1 .. N-1]
    vector <size_t> a2_perm(block_rank + %input.rank)

    a2_perm[0] = block_rank
    for (int32 i = 0; i < block_rank; i++) {
        a2_perm[1 + i * 2 + 0] = block_rank + 1 + i
        a2_perm[1 + i * 2 + 1] = i
    }

    // Remaining dimensions keep their position
    for (int32 i = 0; i < remaining_shape_rank; i++) {
        a2_perm[1 + 2 * block_rank + i] = 1 + 2 * block_rank + i
    }

    // Step 3: Reshape to
    // [batch / prod(block_shape),
    //  input_shape[1] * block_shape[0],
    //  ...
    //  input_shape[M] * block_shape[M-1]]
    //  + remaining input shapes input_shape[M+1 .. N-1]
    vector <size_t> a3_shape(%input.rank)

    a3_shape[0] = %input.shape[0] / block_num_elems
    for (int32 i = 0; i < block_rank; i++) {
        a3_shape[i + 1] = %input.shape[i + 1] * block[i]
    }

    for (int32 i = 0; i < remaining_shape_rank; i++) {
        a3_shape[1 + block_rank + i] = %input.shape[1 + block_rank + i]
    }

    // Step 4: Crop the start/end dimensions using slice
    vector <size_t> a4_start(%input.rank), a4_size(%input.rank)

    for (int32 i = 0; i < %input.rank; i++) {
        if (i == 0 || i > crops_dim) {
            a4_start[i] = 0
            a4_size[i] = output_shape[i]
        } else {
            a4_start[i] = crops_arr[i - 1].first
            a4_size[i] = a3_shape[i] - crops_arr[i - 1].first - crops_arr[i - 1].second
        }
    }

    %a1_reshape = tosa.RESHAPE(%input) {new_shape=a1_shape}
    %a2_transpose = tosa.TRANSPOSE(%a1_reshape) {perms=a2_perm}
    %a3_reshape = tosa.RESHAPE(%a2_transpose) {new_shape=a3_shape}
    %output = tosa.SLICE(%a3_reshape) {start=a4_start, size=a4_size}

    return %output
}

lower_concatv2_op()

Value lower_concatv2_op(Type output_type, Value %values, int32 axis)
{
    int32 tosa_axis = positive_axis(axis, %values:0.rank)

    assert(%values.size >= 2)

    // Convert scalar inputs to a tensor
    if (%values:0.rank == 0) {
       for (int32 i = 0; i < %values.size; i++) {
          %values:i = tosa.RESHAPE(%values:i) {new_shape={1}}
       }
    }

    // Rescale inputs whose quantization parameters differ from the output
    for (int32 i = 0; i < %values.size; i++) {
        %val = %values:i
        if (%val.zp != output_type.zp || %val.scale != output_type.scale) {
            float64 rescale_scale = %val.scale / output_type.scale
            %values:i = tosa.RESCALE(%val) {scale=rescale_scale, input_zp=%val.zp, output_zp=output_type.zp}
        }
    }

    %concat_op = tosa.CONCAT(%values:0, %values:1) {axis=tosa_axis}

    for (int32 i = 2; i < %values.size; i++) {
        %concat_op = tosa.CONCAT(%concat_op, %values:i) {axis=tosa_axis}
    }

    return %concat_op
}

lower_depth_to_space_op()

Value lower_depth_to_space_op(Value %input, size_t block_size[], Format_t data_format)
{
    assert(data_format == 'NHWC')

    vector <size_t> a2_shape = {%input.shape[0],
                                %input.shape[1],
                                %input.shape[2],
                                block_size[0],
                                block_size[1],
                                %input.shape[3] / (block_size[0] * block_size[1])}

    vector <size_t> a4_shape = {%input.shape[0],
                                %input.shape[1] * block_size[0],
                                %input.shape[2] * block_size[1],
                                %input.shape[3] / (block_size[0] * block_size[1])}

    %a2_reshape = tosa.RESHAPE(%input) {new_shape=a2_shape}
    %a3_transpose = tosa.TRANSPOSE(%a2_reshape) {perms={0, 1, 3, 2, 4, 5}}
    %output = tosa.RESHAPE(%a3_transpose) {new_shape=a4_shape}

    return %output
}

lower_elu_op()

Value lower_elu_op(Value %value)
{
    // elu(x) = x < 0 ? (exp(x) - 1) : x
    // Create constants for 0/1 and reshape to match the rank
    // of %value
    %one_const = tosa.CONST() {value={1}}
    %zero_const = tosa.CONST() {value={0}}

    vector bcast_shape
    for (int32 i = 0; i < %value.rank; i++) {
        bcast_shape.push_back(1)
    }

    %one_reshape = tosa.RESHAPE(%one_const) {new_shape=bcast_shape}
    %zero_reshape = tosa.RESHAPE(%zero_const) {new_shape=bcast_shape}

    %exp_in = tosa.EXP(%value)
    %sub = tosa.SUB(%exp_in, %one_reshape)
    %ge  = tosa.GREATER_EQUAL(%value, %zero_reshape)
    %output = tosa.SELECT(%ge, %value, %sub)
    return %output
}

lower_expand_dims()

Value lower_expand_dims(Value %input, int32 axis)
{
    vector<size_t> reshape_dims

    if (axis < 0 || axis >= %input.rank) {
        // Insert at the end of the tensor
        axis += %input.rank
        for (int32 i = 0; i < %input.rank; i++) {
           reshape_dims.push_back(%input.shape[i])
        }
        reshape_dims.push_back(1)
    } else {
        for (int32 i= 0 ; i < %input.rank; i++) {
            if (i == axis) {
                reshape_dims.push_back(1)
            }
            reshape_dims.push_back(%input.shape[i])
        }
    }

    %output = tosa.RESHAPE(%input) {new_shape=reshape_dims}
    return %output
}

lower_fake_quant_op()

Value lower_fake_quant_op(Value %inputs, type output_type, float64 min, float64 max,
                            int64 num_bits, bool narrow_range)
{
    assert(num_bits == 8 || num_bits == 16)

    int64 qmax = (1L << (num_bits - 1)) - 1;
    int64 qmin = -(1L << (num_bits - 1))

    if (narrow_range) {
       qmin = qmin + 1
    }

    float64 scale = (max - min) / float64(qmax - qmin)

    int64 zeropoint = (int64)std::round((-min) / scale + float64(qmin))

    %quantized = lower_quantize_op(%inputs.type, %inputs, 1.0 / scale, zeropoint)

    %dequantized = lower_dequantize_op(%quantized, scale, zeropoint)

    return %dequantized
}
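
The scale and zero-point arithmetic can be illustrated with a small Python sketch. It only reproduces the parameter computation from the pseudo-code above, not the TOSA operators. The helper names `round_half_away` and `fake_quant_params` are hypothetical and introduced here for illustration; `round_half_away` is assumed to stand in for `std::round`, which rounds ties away from zero.

```python
import math


def round_half_away(x: float) -> int:
    """Round to nearest, ties away from zero (stand-in for C++ std::round)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def fake_quant_params(min_val: float, max_val: float, num_bits: int, narrow_range: bool):
    """Return (scale, zeropoint) for a signed num_bits quantized range."""
    assert num_bits in (8, 16)

    qmax = (1 << (num_bits - 1)) - 1
    qmin = -(1 << (num_bits - 1))
    if narrow_range:
        qmin += 1

    scale = (max_val - min_val) / float(qmax - qmin)
    zeropoint = round_half_away(-min_val / scale + qmin)
    return scale, zeropoint


print(fake_quant_params(0.0, 255.0, 8, False))    # range [0, 255] over [-128, 127]
print(fake_quant_params(-127.0, 127.0, 8, True))  # symmetric narrow range
```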

lower_floor_div()

Value lower_floor_div(Value %lhs, Value %rhs)
{
    %recip = tosa.RECIPROCAL(%rhs)
    %mul = tosa.MUL(%lhs, %recip)
    %output = tosa.FLOOR(%mul)

    return %output
}

lower_floor_mod()

Value lower_floor_mod(Value %lhs, Value %rhs)
{
    // floor_mod(lhs, rhs) = rhs * (lhs / rhs - floor(lhs / rhs))
    %recip = tosa.RECIPROCAL(%rhs)
    %mul = tosa.MUL(%lhs, %recip)
    %floor = tosa.FLOOR(%mul)
    %sub = tosa.SUB(%mul, %floor)
    %output = tosa.MUL(%rhs, %sub)
    return %output
}
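
Floor modulo satisfies the identity floor_mod(x, y) = y * (x / y - floor(x / y)): the fractional part of the quotient, scaled back by the divisor. The Python sketch below evaluates that identity with one scalar operation per TOSA operator, assuming floating-point inputs and a non-zero divisor. Results can differ from an exact modulo by floating-point rounding, since the division is replaced by a multiplication with the reciprocal.

```python
import math


def floor_mod(lhs: float, rhs: float) -> float:
    """Floor modulo built from reciprocal, multiply, floor and subtract."""
    recip = 1.0 / rhs            # tosa.RECIPROCAL
    quotient = lhs * recip       # tosa.MUL
    floored = math.floor(quotient)  # tosa.FLOOR
    frac = quotient - floored    # tosa.SUB: fractional part of the quotient
    return rhs * frac            # tosa.MUL: scale back by the divisor


print(floor_mod(7.0, 2.0))
print(floor_mod(-7.0, 2.0))
print(floor_mod(7.0, -2.0))
```

The result takes the sign of the divisor, matching Python's own `%` operator for these inputs.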

lower_quantize_op()

Value lower_quantize_op(Type output_type, Value %input, float64 scale, int64 zeropoint)
{
    %const_scale = tosa.CONST() {value={scale}}
    %const_zp = tosa.CONST() {value={zeropoint}}
    %op1_mul_in_scale = tosa.MUL(%input, %const_scale)
    %op2_add_op1_zp = tosa.ADD(%op1_mul_in_scale, %const_zp)
    %op3_cast_op2 = tosa.CAST(%op2_add_op1_zp) // f32->%output.dtype

    return %op3_cast_op2
}

lower_dequantize_op()

Value lower_dequantize_op(Value %input, float64 scale, int64 zeropoint)
{
    %const_scale = tosa.CONST() {value={scale}}
    %const_zp = tosa.CONST() {value={(float64)zeropoint}}
    %op1_cast_in = tosa.CAST(%input) // %input.dtype->f32
    %op2_sub_op1_zp = tosa.SUB(%op1_cast_in, %const_zp)
    %op3_mul_op2_scale = tosa.MUL(%op2_sub_op1_zp, %const_scale)

    return %op3_mul_op2_scale
}

lower_log_softmax_op()

Value lower_log_softmax_op(Value %logits)
{
    %op1 = tosa.EXP(%logits)
    %op2 = tosa.REDUCE_SUM(%op1) {axis=(%logits.rank-1)}
    %op3 = tosa.RECIPROCAL(%op2)
    %op4 = tosa.MUL(%op1, %op3)
    %op5 = tosa.LOG(%op4)

    return %op5
}
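
The operator sequence above can be traced on a plain list of numbers. The following Python sketch applies the same five steps to a rank-1 input, with one line per TOSA operator. It is an illustration of the arithmetic only and, like the pseudo-code, does not subtract the maximum logit first, so very large inputs may overflow in `exp`.

```python
import math


def log_softmax(logits):
    """log(softmax(x)) over a 1-D list, following the lowering step by step."""
    exps = [math.exp(v) for v in logits]   # tosa.EXP
    total = sum(exps)                      # tosa.REDUCE_SUM over the last axis
    recip = 1.0 / total                    # tosa.RECIPROCAL
    probs = [e * recip for e in exps]      # tosa.MUL
    return [math.log(p) for p in probs]    # tosa.LOG


print(log_softmax([1.0, 2.0, 3.0]))
```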

lower_pack_op()

Value lower_pack_op(Value %input[], int32 axis)
{
    size_t input_tensor_rank = %input[0].rank

    // The packed output has rank (input_tensor_rank + 1)
    size_t pack_axis = positive_axis(axis, input_tensor_rank + 1)

    // Convert any rank 0 to rank 1 with reshape
    if (input_tensor_rank == 0) {
        for (int32 i = 0; i < %input.size; i++) {
            %input[i] = tosa.RESHAPE(%input[i]) {new_shape={1}}
        }
    }

    // When packing along a new trailing axis, concatenate along axis 0
    // and transpose the result afterwards
    size_t concat_axis = (pack_axis == input_tensor_rank) ? 0 : pack_axis

    // First pair of tensors
    %concat = tosa.CONCAT(%input[0], %input[1]) {axis=concat_axis}

    // Remaining tensors
    for (int32 i = 2; i < %input.size; i++) {
        %concat = tosa.CONCAT(%concat, %input[i]) {axis=concat_axis}
    }

    if (input_tensor_rank == 0) {
        // No reshape needed for rank 0, already done
        return %concat
    }

    // Insert the new dimension of size %input.size at concat_axis
    vector <size_t> output_shape
    for (int32 i = 0; i < input_tensor_rank; i++) {
        if (i == concat_axis) {
            output_shape.push_back(%input.size)
        }
        output_shape.push_back(%input[0].shape[i])
    }

    %reshape = tosa.RESHAPE(%concat) {new_shape=output_shape}

    if (pack_axis == input_tensor_rank) {
        // %reshape has shape [N, A, B, C, ..] in this case,
        // need to transpose to [A, B, C, .., N] with perm [1, 2, 3, .. 0]
        vector <size_t> perms
        for (int32 i = 0; i < input_tensor_rank; i++)
            perms.push_back(i + 1)
        perms.push_back(0)

        %output = tosa.TRANSPOSE(%reshape) {perms=perms}
    } else {
        %output = %reshape
    }

    return %output
}

lower_reduce_op()

Value lower_reduce_op<tosa_op_t OP>(Value %input, shape_t output_shape, Value %axes, bool keep_dims, float64 input_scale=1.0f, int32 input_zp=0, float64 output_scale=1.0f, int32 output_zp=0)
{

    vector axes_vec = %axes.as_constant();

    // Special case of no axes means no transformation
    if (axes_vec.size() == 0) {
       return tosa.IDENTITY(%input)
    }

    bool is_quantized = isa<QuantizedType>(%input.dtype) ? true : false

    shape_t shape = %input.shape;
    %output = %input;

    if (is_quantized) {
        %output = tosa.RESCALE(%output) {scale=input_scale, input_zp=input_zp, output_zp=0}
    }

    for (int32 i = 0; i < axes_vec.size(); i++) {
        int32 axis = positive_axis(axes_vec[i], %input.rank);

        shape[axis] = 1;
        %output = tosa.OP(%output) {axis=axis}
    }

    if (!keep_dims) {
       %output = tosa.RESHAPE(%output) {new_shape=output_shape}
    }

    if (is_quantized) {
        %output = tosa.RESCALE(%output) {scale=output_scale, input_zp=0, output_zp=output_zp}
    }

    return %output;
}

lower_resize_op()

Value lower_resize_op(Value %images, Value %size, shape output_shape, dtype output_dtype, mode_t mode,
                      bool align_corners, bool half_pixel_centers)
{
    int32 input_height  = %images.shape[1]
    int32 input_width   = %images.shape[2]
    int32 output_height = output_shape[1]
    int32 output_width  = output_shape[2]

    float64 in_center_h  = static_cast<float64>(input_height - 1) / 2.0
    float64 in_center_w  = static_cast<float64>(input_width - 1) / 2.0
    float64 out_center_h = static_cast<float64>(output_height - 1) / 2.0
    float64 out_center_w = static_cast<float64>(output_width - 1) / 2.0

    float64 fp_stride_y, fp_stride_x
    if (align_corners && output_height > 1)
        fp_stride_y = static_cast<float64>(input_height - 1) / static_cast<float64>(output_height - 1)
    else
        fp_stride_y = static_cast<float64>(input_height) / static_cast<float64>(output_height)
    if (align_corners && output_width > 1)
        fp_stride_x = static_cast<float64>(input_width - 1) / static_cast<float64>(output_width - 1)
    else
        fp_stride_x = static_cast<float64>(input_width) / static_cast<float64>(output_width)

    float64 fp_offset_y = 0.0, fp_offset_x = 0.0
    if (half_pixel_centers) {
        fp_offset_y = fp_stride_y * 0.5 - 0.5
        fp_offset_x = fp_stride_x * 0.5 - 0.5
    }

    if (output_dtype == float) {
        %op1_resize_in = tosa.RESIZE(%images) {stride={fp_stride_y, fp_stride_x}, offset={fp_offset_y, fp_offset_x}, shift=0, resize_mode=mode}
        return %op1_resize_in
    } else {
        // Integer path: strides and offsets are fixed-point values scaled by (1 << shift)
        int32 shift = 10
        float64 unit = static_cast<float64>(1 << shift)
        int32 stride_y = fp_stride_y * unit
        int32 stride_x = fp_stride_x * unit
        int32 offset_y = fp_offset_y * unit
        int32 offset_x = fp_offset_x * unit

        %op1_resize_in = tosa.RESIZE(%images) {stride={stride_y, stride_x}, offset={offset_y, offset_x}, shift=shift, resize_mode=mode}

        if (mode == "BILINEAR") {
            // Scale the bilinear result back down by 2 * shift bits, rounding towards zero
            %const_zero = tosa.CONST() {value={0}}
            %const_twenty = tosa.CONST() {value={20}}
            %op2_ge_op1 = tosa.GREATER_EQUAL(%op1_resize_in, %const_zero)
            %op3_abs_op1 = tosa.ABS(%op1_resize_in)
            %op4_rshift_op3 = tosa.ARITHMETIC_RIGHT_SHIFT(%op3_abs_op1, %const_twenty)
            %op5_negate_op4 = tosa.NEGATE(%op4_rshift_op3)
            %op6_select_op2_op4_op5 = tosa.SELECT(%op2_ge_op1, %op4_rshift_op3, %op5_negate_op4)
            %op7_cast_op6 = tosa.CAST(%op6_select_op2_op4_op5) // i32/i48->%output.dtype
            return %op7_cast_op6
        }

        return %op1_resize_in
    }
}

lower_reversev2_op()

Value lower_reversev2_op(Value %tensor, Value %axis)
{
    Value %output = %tensor

    if (%axis.num_elements == 0) {
       %output = tosa.IDENTITY(%tensor)
    } else {
        for (int32 i = 0; i < %axis.shape[0]; i++) {
            size_t axis_val = positive_axis(%axis.as_constant()[i], %tensor.rank)
            %output = tosa.REVERSE(%output) {axis=axis_val}
        }
    }

    return %output
}

lower_round_op()

Value lower_round_op(Value %x)
{
    %half = tosa.CONST() {value={0.5}}
    %add = tosa.ADD(%x, %half)
    %output = tosa.FLOOR(%add)

    return %output
}
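
The lowering computes floor(x + 0.5), which rounds exact halves towards positive infinity. A short Python sketch of the scalar arithmetic makes that tie behaviour visible; the function name `round_via_floor` is hypothetical and used only for this illustration.

```python
import math


def round_via_floor(x: float) -> float:
    """Round by adding one half and taking the floor (ADD followed by FLOOR)."""
    return float(math.floor(x + 0.5))


for value in (1.4, 2.5, -1.6, -2.5):
    print(value, round_via_floor(value))
```

Note that ties such as 2.5 and -2.5 both move upwards (to 3.0 and -2.0), which differs from round-half-to-even and from round-half-away-from-zero.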

lower_selectv2_op()

Value lower_selectv2_op(Value %condition, Value %t, Value %e, shape output_shape)
{
    // Reshape condition so that ranks match to support
    // broadcasting (if necessary)

    if (%condition.rank != output_shape.size) {
       vector <size_t> cond_shape = %condition.shape
       for (int32 i = 0; i < (output_shape.size - %condition.rank); i++) {
           cond_shape.push_front(1)
       }

       %condition = tosa.RESHAPE(%condition) {new_shape=cond_shape}
    }

    %output = tosa.SELECT(%condition, %t, %e)

    return %output
}

lower_shape_op()

Value lower_shape_op(Value %input)
{
    vector <size_t> input_shape = %input.shape

    %shape = tosa.CONST() {value={input_shape}}
    return %shape
}

lower_space_to_batch_nd_op()

Value lower_space_to_batch_nd_op(Value %input, Value %block_shape, Value %padding)
{
    // %block_shape is a rank-1 tensor with M elements; %padding has shape [M, 2]
    vector <size_t> block = %block_shape.as_constant()
    size_t block_rank = %block_shape.shape[0]
    size_t remaining_shape_rank = %input.rank - block_rank - 1

    // Step 1. Pad based on paddings operand (flattened representation of [input.rank][2]-shaped array).
    // %padding only covers the block_rank spatial dimensions; the batch and
    // remaining dimensions are not padded.
    vector <size_t> a1_padding(%input.rank * 2, 0)

    for (int32 i = 0; i < %padding.num_elements; i++) {
        a1_padding[i + 2] = %padding.as_constant()[i]
    }

    %a1_pad = tosa.PAD(%input) {padding=a1_padding}

    // Step 2. Reshape to
    // [batch] + [padded_shape[1] / block_shape[0], block_shape[0], ...
    //    padded_shape[M] / block_shape[M-1], block_shape[M-1]] +
    //    remaining_shape

    vector <size_t> a2_shape(1 + block_rank * 2 + remaining_shape_rank)
    a2_shape[0] = %input.shape[0]
    for (int32 i = 0; i < block_rank; i++) {
        a2_shape[1 + i * 2 + 0] = %a1_pad.shape[1 + i] / block[i]
        a2_shape[1 + i * 2 + 1] = block[i]
    }

    for (int32 i = 0; i < remaining_shape_rank; i++) {
        a2_shape[1 + block_rank * 2 + i] = %input.shape[1 + block_rank + i]
    }

    %a2_reshape = tosa.RESHAPE(%a1_pad) {new_shape=a2_shape}

    // Step 3. Transpose to
    //  block_shape +
    //  [batch] +
    //  [padded_shape[1] / block_shape[0],
    //   ...
    //   padded_shape[M] / block_shape[M-1]] +
    //  remaining_shape
    vector <size_t> a3_perm(%a2_reshape.rank)
    size_t block_num_elems = 1

    for (int32 i = 0; i < block_rank; i++) {
        a3_perm[i] = 1 + 2 * i + 1
        a3_perm[block_rank + 1 + i] = 2 * i + 1
        block_num_elems *= block[i]
    }

    a3_perm[block_rank] = 0
    for (int32 i = (1 + block_rank * 2); i < %a2_reshape.rank; i++) {
        a3_perm[i] = i
    }

    %a3_transpose = tosa.TRANSPOSE(%a2_reshape) {perms=a3_perm}

    // Step 4. Reshape transposed tensor to
    // [batch * prod(block_shape)] +
    // [padded_shape[1] / block_shape[0],
    //  ...,
    //  padded_shape[M] / block_shape[M-1]] +
    // remaining_shape

    vector <size_t> a4_shape(%input.rank)
    a4_shape[0] = %input.shape[0] * block_num_elems

    for (int32 i = 0; i < block_rank; i++) {
        a4_shape[i + 1] = %a1_pad.shape[i + 1] / block[i]
    }

    for (int32 i = 0; i < remaining_shape_rank; i++) {
        a4_shape[1 + block_rank + i] = %input.shape[1 + block_rank + i]
    }

    %output = tosa.RESHAPE(%a3_transpose) {new_shape=a4_shape}

    return %output
}

lower_space_to_depth_op()

Value lower_space_to_depth_op(Value %input, size_t block_size[], Format_t data_format)
{
    assert(data_format == 'NHWC')

    vector <size_t> a2_shape = {%input.shape[0],
                                %input.shape[1] / block_size[0],
                                block_size[0],
                                %input.shape[2] / block_size[1],
                                block_size[1],
                                %input.shape[3]}
    %a2_reshape = tosa.RESHAPE(%input) {new_shape=a2_shape}
    %a3_transpose = tosa.TRANSPOSE(%a2_reshape) {perms={0, 1, 3, 2, 4, 5}}

    vector <size_t> a4_shape = {%input.shape[0],
                                %input.shape[1] / block_size[0],
                                %input.shape[2] / block_size[1],
                                %input.shape[3] * block_size[0] * block_size[1]}
    %output = tosa.RESHAPE(%a3_transpose) {new_shape=a4_shape}
    return %output
}

lower_split_op()

Value lower_split_op(Value %value, size_t axis, size_t num_split)
{
    Value %output[]

    size_t slice_size = %value.shape[axis] / num_split

    for (int32 i = 0; i < num_split; i++) {
        vector <size_t> begin_vals, size_vals

        for (int32 j = 0; j < %value.rank; j++) {
            if (j == axis) {
               begin_vals.push_back(slice_size * i)
               size_vals.push_back(slice_size)
            } else {
               begin_vals.push_back(0)
               size_vals.push_back(%value.shape[j])
            }
        }

        // One slice per output, once its start/size vectors are complete
        %output[i] = tosa.SLICE(%value) {start=begin_vals, size=size_vals}
    }

    %output_list = tosa.IDENTITYN(%output)
    return %output_list
}
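
To make the slice bookkeeping concrete, the Python sketch below computes the start and size vectors that each `tosa.SLICE` would receive for an evenly divided split. It works on shapes only, not on tensor data, and the function name `split_slices` is hypothetical.

```python
def split_slices(shape, axis, num_split):
    """Return a list of (start, size) vectors, one pair per output slice."""
    slice_size = shape[axis] // num_split
    slices = []
    for i in range(num_split):
        # Along the split axis each slice starts one slice_size further on;
        # every other axis is taken in full.
        start = [slice_size * i if j == axis else 0 for j in range(len(shape))]
        size = [slice_size if j == axis else shape[j] for j in range(len(shape))]
        slices.append((start, size))
    return slices


for start, size in split_slices([4, 6], axis=1, num_split=3):
    print(start, size)
```

For a [4, 6] tensor split three ways along axis 1, this yields three slices of size [4, 2] starting at columns 0, 2 and 4.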

lower_splitv_op()

Value lower_splitv_op(Value %value, vector <size_t> size_split, size_t axis)
{
   Value %output[]

   size_t curr_split_start = 0

   for (int32 i = 0; i < size_split.size(); i++) {
       vector <size_t> begin_vals, size_vals

       for (int32 j = 0; j < %value.rank; j++) {
           if (j == axis) {
              begin_vals.push_back(curr_split_start)
              size_vals.push_back(size_split[i])
           } else {
              begin_vals.push_back(0)
              size_vals.push_back(%value.shape[j])
           }
       }

       %output[i] = tosa.SLICE(%value) {start=begin_vals, size=size_vals}

       curr_split_start += size_split[i]
   }

    %output_list = tosa.IDENTITYN(%output)
    return %output_list
}

lower_squeeze_op()

Value lower_squeeze_op(Value %input, vector<size_t> squeeze_dims)
{
    vector <size_t> reshape_dims

    if (squeeze_dims.size() == 0) {
       // Remove all 1-dims
       for (int32 i = 0; i < %input.rank; i++) {
           if (%input.shape[i] != 1) {
              reshape_dims.push_back(%input.shape[i])
           }
       }
    } else {
      // Remove the specified dimensions
      for (int32 i = 0; i < %input.rank; i++) {
          if (!squeeze_dims.find(i) || %input.shape[i] != 1) {
              reshape_dims.push_back(%input.shape[i])
          }
      }
    }

    %output = tosa.RESHAPE(%input) {new_shape=reshape_dims}

    return %output
}

lower_strided_slice_op()

Value lower_strided_slice_op(Value %input, Value %begin_val, Value %end_val, Value %strides_val,
                               size_t begin_mask, size_t end_mask, size_t ellipsis_mask,
                               size_t new_axis_mask, size_t shrink_axis_mask)
{
    // Note: does not implement ellipsis_mask or reverse stride at this time
    assert(ellipsis_mask == 0)

    // Signed element type: begin and end may be negative before wrap-around
    vector <int64> begin(%begin_val.as_constant()), end(%end_val.as_constant()), strides(%strides_val.as_constant())
    vector <size_t> a1_start, a1_size, a2_shape, a3_start, a3_size, a4_shape

    for (int32 i = 0; i < %input.rank; i++) {
        if (begin_mask & (1 << i)) {
           begin[i] = 0
        }

        if (end_mask & (1 << i)) {
           end[i] = %input.shape[i]
        }

        // Wrap around index if begin and end are negative
        if (begin[i] < 0) {
           begin[i] += %input.shape[i]
        }

        if (end[i] < 0) {
           end[i] += %input.shape[i]
        }

        a1_start[i] = begin[i]
        a1_size[i] = end[i] - begin[i]

        a2_shape[i*2 + 0] = a1_size[i] / strides[i]
        a2_shape[i*2 + 1] = strides[i]

        a3_start[i*2 + 0] = 0
        a3_start[i*2 + 1] = 0

        if (shrink_axis_mask & (1 << i)) {
           a3_size[i*2 + 0] = 1
        } else {
           a3_size[i*2 + 0] = a1_size[i] / strides[i]
        }
        a3_size[i*2 + 1] = 1

        if (!(shrink_axis_mask & (1 << i))) {
           if (new_axis_mask & (1 << i)) {
              a4_shape.push_back(1)
           }
           a4_shape.push_back((a1_size[i] / strides[i]))
        }
    }

    // Step 1: Slice the input array
    %a1_slice = tosa.SLICE(%input) {start=a1_start, size=a1_size}

    // Step 2: Reshape the sliced array: 2x as many dimensions as %input
    %a2_reshape = tosa.RESHAPE(%a1_slice) {new_shape=a2_shape}

    // Step 3: Take a slice of the [0] index along each of the stride dimensions (odd dimensions)
    %a3_slice = tosa.SLICE(%a2_reshape) {start=a3_start, size=a3_size}

    // Step 4: Reshape the now-strided tensor back down to the desired number of dimensions
    %output = tosa.RESHAPE(%a3_slice) {new_shape=a4_shape}

    return %output
}
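
The slice, reshape, slice, reshape sequence can be illustrated on a one-dimensional Python list. The sketch below is a simplification under stated assumptions: a single dimension, a positive stride, no masks, and a sliced length that is divisible by the stride (which the reshape in step 2 requires). The function name `strided_slice_1d` is hypothetical.

```python
def strided_slice_1d(data, begin, end, stride):
    """Emulate data[begin:end:stride] with slice and reshape steps only."""
    n = len(data)
    # Wrap around negative indices
    if begin < 0:
        begin += n
    if end < 0:
        end += n

    # Step 1: contiguous slice of the input
    sliced = data[begin:end]
    assert len(sliced) % stride == 0, "reshape needs size divisible by stride"

    # Step 2: reshape to [size / stride, stride]
    rows = [sliced[r * stride:(r + 1) * stride] for r in range(len(sliced) // stride)]

    # Step 3: keep index 0 along the stride dimension -> shape [size / stride, 1]
    picked = [[row[0]] for row in rows]

    # Step 4: reshape back down to one dimension
    return [row[0] for row in picked]


print(strided_slice_1d(list(range(12)), 2, 10, 2))
```

The result matches Python's native `list(range(12))[2:10:2]`.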

lower_unpack_op()

Value lower_unpack_op(Value %value, size_t axis, uint64_t num)
{
    axis = positive_axis(axis, %value.rank)

    Value %output_arr[]

    // Step 1: transpose 'axis' to left-most dimension, if necessary
    Value %transposed_value

    if (axis != 0) {
       vector <size_t> perms

       perms.push_back(axis)
       for (int32 i = 0; i < %value.rank; i++) {
           if (i != axis)
              perms.push_back(i)
       }

       %transposed_value = tosa.TRANSPOSE(%value) {perms=perms}
    } else {
       %transposed_value = %value
    }

    // Step 2: Slice [N, A, B, C] into N tensors of shape [A, B, C]
    for (int32 i = 0; i < %transposed_value.shape[0]; i++) {
        vector <size_t> begin_vals, size_vals, shape_vals

        begin_vals.push_back(i)
        size_vals.push_back(1)

        for (int32 j = 1; j < %transposed_value.rank; j++) {
            begin_vals.push_back(0)
            size_vals.push_back(%transposed_value.shape[j])
            shape_vals.push_back(%transposed_value.shape[j])
        }

        // Slice out [1, A, B, C], then drop the leading dimension
        %slice = tosa.SLICE(%transposed_value) {start=begin_vals, size=size_vals}
        %output_arr[i] = tosa.RESHAPE(%slice) {new_shape=shape_vals}
    }

    // Combine array of sliced tensors into a list of tensors
    %output = tosa.IDENTITYN(%output_arr)
    return %output
}

get_transpose_conv2d_padding_values_from_pad_type()

vector<int64> get_transpose_conv2d_padding_values_from_pad_type(tensorflow::Padding padding, tensorflow::TensorFormat data_format,
                                                         uint32 first_filter_spatial_dim, type input_type, type filter_type,
                                                         shape output_dims, vector strides, vector dilations)
{
    int64 pad_before, pad_after
    vector<int64> computed_padding

    for (int32 i = 0; i < 2; i++) {
        int64 ifm_dim = GetTensorSpatialDimIndex(4, data_format, i)
        int64 ofm_dim = GetTensorSpatialDimIndex(4, data_format, i)
        int64 filter_dim = first_filter_spatial_dim + i

        int64 ifm_size = input_type.shape[ifm_dim]
        int64 ofm_size = output_dims[ofm_dim]
        int64 filter_size = filter_type.shape[filter_dim]
        int64 dim_dilation = dilations[i]
        int64 dim_stride = strides[i]
        int32 effective_filter_size = (filter_size - 1) * dim_dilation + 1
        int32 total_padding = ((ifm_size - 1) * dim_stride + effective_filter_size - ofm_size)
        total_padding = total_padding > 0 ? total_padding : 0

        pad_before = total_padding / 2
        pad_after = total_padding - pad_before

        computed_padding.push_back(pad_before)
    }

    return computed_padding
}
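
The per-dimension padding arithmetic can be tried out with a small Python sketch. It handles one spatial dimension at a time and returns both `pad_before` and `pad_after` for inspection, even though the pseudo-code above only records `pad_before`. The function name `transpose_conv2d_pad` and its flat argument list are hypothetical.

```python
def transpose_conv2d_pad(ifm_size, ofm_size, filter_size, stride, dilation):
    """Return (pad_before, pad_after) for one spatial dimension."""
    effective_filter_size = (filter_size - 1) * dilation + 1
    total_padding = (ifm_size - 1) * stride + effective_filter_size - ofm_size
    total_padding = max(total_padding, 0)  # never pad by a negative amount

    pad_before = total_padding // 2
    pad_after = total_padding - pad_before
    return pad_before, pad_after


# 4 input rows, stride 2, 3-tap filter, requested output of 8 rows
print(transpose_conv2d_pad(4, 8, 3, 2, 1))
```

When the total padding is odd, the extra element goes to `pad_after`.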

lower_fused_activation()

Value lower_fused_activation(Value %input, string activation)
{
    bool is_quantized = isa<QuantizedType>(%input.dtype) ? true : false

    if (is_quantized) {
        if (activation == "NONE") {
            return %input
        }
        else if (activation == "RELU") {
            int32 quantized_0 = %input.zp
            int32 quantized_max = %input.storage_max
            return tosa.CLAMP(%input) {min_int=quantized_0, max_int=quantized_max}
        }
        else if (activation == "RELU6") {
            int32 quantized_0 = %input.zp
            int32 quantized_6 = %input.zp + (6.0 / %input.scale)
            return tosa.CLAMP(%input) {min_int=quantized_0, max_int=quantized_6}
        }
        else if (activation == "RELU_N1_TO_1") {
            int32 quantized_n1 = %input.zp + (-1.0 / %input.scale)
            int32 quantized_1 = %input.zp + (1.0 / %input.scale)
            return tosa.CLAMP(%input) {min_int=quantized_n1, max_int=quantized_1}
        }
    }
    else {
        if (activation == "NONE") {
            return %input
        }
        else if (activation == "RELU") {
            return tosa.RELUN(%input) {max_fp=numeric_limits<float32>::max()}
        }
        else if (activation == "RELU6") {
            return tosa.RELUN(%input) {max_fp=6.0}
        }
        else if (activation == "RELU_N1_TO_1") {
            return tosa.CLAMP(%input) {min_fp=-1.0, max_fp=1.0}
        }
        else if (activation == "TANH") {
            return tosa.TANH(%input)
        }
    }
}
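
For the quantized branch, the clamp bounds are the real-valued limits mapped into the quantized domain. The following Python sketch illustrates that mapping; `quantized_clamp_bounds` is a hypothetical name, and truncating the quotient with `int()` is an assumption standing in for the implicit float-to-int32 conversion in the pseudo-code.

```python
def quantized_clamp_bounds(activation, scale, zero_point, storage_max):
    """Return (min_int, max_int) for tosa.CLAMP on a quantized input.

    Hypothetical helper. int() truncation is an assumption about how the
    pseudo-code converts the float quotient to int32.
    """
    if activation == "RELU":
        return zero_point, storage_max
    if activation == "RELU6":
        return zero_point, zero_point + int(6.0 / scale)
    if activation == "RELU_N1_TO_1":
        return zero_point + int(-1.0 / scale), zero_point + int(1.0 / scale)
    raise ValueError("no clamp needed or unsupported activation: " + activation)


print(quantized_clamp_bounds("RELU6", 0.5, -128, 127))
```

With a scale of 0.5 and a zero point of -128, the real value 6.0 lies 12 quantization steps above the zero point.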

get_table_const_tensor()

Value get_table_const_tensor(function func)
{
    array<int16, 513> table_array
    for (int32 i = -256; i <= 256; i++) {
        table_array[i + 256] = func(i)
    }

    return tosa.CONST() {value=table_array}
}
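
A short Python sketch of the table construction follows. It assumes the 513 entries are stored zero-based, so that input value `i` lands at index `i + 256`; `build_table` and the sample function are hypothetical.

```python
def build_table(func):
    """Evaluate func on -256..256 and store the results in a 513-entry list.

    Entry k holds func(k - 256), which is an assumption about the table layout.
    """
    return [func(i) for i in range(-256, 257)]


def sample_func(x):
    # Hypothetical function: scale by 128 and saturate to the int16 range.
    return max(-32768, min(32767, x * 128))


table = build_table(sample_func)
print(len(table), table[0], table[256], table[512])
```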

lower_gather_op()

Value lower_gather_op(Value %params, Value %indices, int32 batch_dims, int32 axis)
{
    assert batch_dims <= %indices.rank
    assert axis >= batch_dims

    int32 N = W = K = C = 1

    for (int32 i = 0; i < batch_dims; i++) N *= %params.shape[i]
    for (int32 i = batch_dims; i < %indices.rank; i++) W *= %indices.shape[i]
    K = %params.shape[axis]
    for (int32 i = batch_dims; i < axis; i++) C *= %params.shape[i]
    for (int32 i = (axis + 1); i < %params.rank; i++) C *= %params.shape[i]

    vector<int32> params_idx_batch, params_idx_left, params_idx_indices, params_idx_right
    for (int32 i = 0; i < %params.rank; i++) {
        if (i < batch_dims && i < axis)
            params_idx_batch.push_back(i)
        else if (i < axis)
            params_idx_left.push_back(i)
        else if (i < (axis + 1))
            params_idx_indices.push_back(i)
        else
            params_idx_right.push_back(i)
    }

    vector<int32> params_perm = {params_idx_batch, params_idx_left, params_idx_indices, params_idx_right}
    vector<int32> result_perm
    for (int32 i = 0; i < batch_dims; i++)
        result_perm.push_back(i)
    for (int32 i = 0; i < params_idx_left.size(); i++)
        result_perm.push_back(params_idx_left[i])
    for (int32 i = batch_dims; i < %indices.rank; i++)
        result_perm.push_back(i)
    for (int32 i = 0; i < params_idx_right.size(); i++)
        result_perm.push_back(params_idx_right[i])

    %const_params_perm = tosa.CONST() {value=params_perm}
    %const_result_perm = tosa.CONST() {value=result_perm}

    %op1_transpose_params = tosa.TRANSPOSE(%params, %const_params_perm)
    %op2_reshape_op1 = tosa.RESHAPE(%op1_transpose_params) {shape={N,K,C}}
    %op3_reshape_indices = tosa.RESHAPE(%indices) {shape={N,W}}
    %op4_gather_op2_op3 = tosa.GATHER(%op2_reshape_op1, %op3_reshape_indices)
    %op5_reshape_op4 = tosa.RESHAPE(%op4_gather_op2_op3) {shape={N,W,C}}
    %op6_transpose_op5 = tosa.TRANSPOSE(%op5_reshape_op4, %const_result_perm)
}
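
The N, W, K and C factors computed at the top of lower_gather_op() determine the 3-D shapes passed to tosa.GATHER. A minimal Python sketch of that bookkeeping, with the hypothetical helper name `gather_dims`:

```python
from math import prod


def gather_dims(params_shape, indices_shape, batch_dims, axis):
    """Return (N, W, K, C) as computed by the loops in lower_gather_op()."""
    n = prod(params_shape[:batch_dims])       # batch dimensions
    w = prod(indices_shape[batch_dims:])      # number of indices per batch
    k = params_shape[axis]                    # size of the gathered axis
    c = prod(params_shape[batch_dims:axis]) * prod(params_shape[axis + 1:])
    return n, w, k, c


print(gather_dims((2, 3, 5, 7), (2, 4), batch_dims=1, axis=2))
```

For these shapes the params tensor is viewed as {N,K,C} = {2,5,21} and the indices as {N,W} = {2,4}.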

lower_gather_nd_op()

Value lower_gather_nd_op(Value %params, Value %indices)
{
    int32 N = W = K = C = ND = 1

    ND = %indices.shape[%indices.rank - 1]

    assert ND < %params.rank

    for (int32 i = 0; i < (%indices.rank - 1); i++) W *= %indices.shape[i]
    for (int32 i = 0; i < ND; i++) K *= %params.shape[i]
    for (int32 i = ND; i < %params.rank; i++) C *= %params.shape[i]

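    // Row-major stride of each of the ND indexed dimensions within K
    vector<int32> flatten_coeff_vec
    for (int32 i = 1; i < ND; i++) flatten_coeff_vec.push_back(%params.shape[i])
    flatten_coeff_vec.push_back(1)
    for (int32 i = ND - 1; i > 0; i--) flatten_coeff_vec[i - 1] *= flatten_coeff_vec[i]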

    %const_flatten_coeff = tosa.CONST() {value=flatten_coeff_vec}
    %op1_reshape_params = tosa.RESHAPE(%params) {shape={N,K,C}}
    %op2_reshape_indices = tosa.RESHAPE(%indices) {shape={W,ND}}
    %op3_mul_op2_flatten_coeff = tosa.MUL(%op2_reshape_indices, %const_flatten_coeff)
    %op4_rsum_op3 = tosa.REDUCE_SUM(%op3_mul_op2_flatten_coeff) {axis=1}
    %op5_reshape_op4 = tosa.RESHAPE(%op4_rsum_op3) {shape={N,W}}
    %op6_gather_op1_op5 = tosa.GATHER(%op1_reshape_params, %op5_reshape_op4)
    %op7_reshape_op6 = tosa.RESHAPE(%op6_gather_op1_op5) {shape={N,W,C}}
}
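
The tosa.MUL and tosa.REDUCE_SUM pair collapses each ND-element index row into a single offset along the flattened K dimension. The Python sketch below illustrates this under the assumption that the coefficients are row-major strides of the first ND dimensions of params; both function names are hypothetical.

```python
def flatten_coefficients(params_shape, nd):
    """Row-major strides of the first nd dimensions (assumed layout)."""
    coeffs = [1] * nd
    for i in range(nd - 2, -1, -1):
        coeffs[i] = coeffs[i + 1] * params_shape[i + 1]
    return coeffs


def flatten_indices(indices, params_shape):
    """Multiply each index row by the coefficients and sum along the row."""
    nd = len(indices[0])
    coeffs = flatten_coefficients(params_shape, nd)
    return [sum(i * c for i, c in zip(row, coeffs)) for row in indices]


print(flatten_indices([[0, 0], [1, 2], [2, 3]], (3, 4, 5)))
```

Here ND is 2, K is 3 * 4 = 12, and every flattened offset falls in the range 0..11.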

lower_one_hot_op()

Value lower_one_hot_op(Value %indices, Value %depth, Value %on_value, Value %off_value, int32 axis)
{
    int32 N = W = C = 1
    int32 K = %depth.as_constant()
    int32 left_dim = right_dim = 1
    for (int32 i = 0; i < %indices.rank; i++) {
        int32 dim = %indices.shape[i]
        N *= dim
        if (i >= axis)
            right_dim *= dim
        else
            left_dim *= dim
    }

    %perm_const = tosa.CONST() {value={0, 2, 1}}
    %op1_reshape_on_value = tosa.RESHAPE(%on_value) {shape={1, 1, 1}}
    %op2_tile_op1 = tosa.TILE(%op1_reshape_on_value) {multiples={N, W, C}}
    %op3_reshape_off_value = tosa.RESHAPE(%off_value) {shape={1, 1, 1}}
    %op4_tile_op1 = tosa.TILE(%op3_reshape_off_value) {multiples={N, K, C}}
    %op5_reshape_indices = tosa.RESHAPE(%indices) {shape={N, W}}
    %op6_scatter_op4_op5_op2 = tosa.SCATTER(%op4_tile_op1, %op5_reshape_indices, %op2_tile_op1)
    %op7_reshape_op6 = tosa.RESHAPE(%op6_scatter_op4_op5_op2) {shape={left_dim, right_dim, K}}
    %op8_transpose_op7 = tosa.TRANSPOSE(%op7_reshape_op6, %perm_const)
    %op9_reshape_op8 = tosa.RESHAPE(%op8_transpose_op7) {shape=%output.shape}
}
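
The scatter at the heart of this lowering can be pictured with a plain Python reference. This sketch only covers the flattened {N, K} view before the final reshape and transpose; `one_hot_rows` is a hypothetical name, and skipping out-of-range indices is an assumption that follows the TensorFlow one-hot behaviour.

```python
def one_hot_rows(indices, depth, on_value, off_value):
    """Build one row of length depth per index.

    Hypothetical reference for the TILE + SCATTER steps.
    """
    rows = []
    for index in indices:
        row = [off_value] * depth       # off_value tiled across K
        if 0 <= index < depth:          # assumption: out-of-range stays all-off
            row[index] = on_value       # on_value scattered at the index
        rows.append(row)
    return rows


print(one_hot_rows([0, 2], depth=3, on_value=1, off_value=0))
```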


## MLIR Passes Management

Legalization is built on multiple MLIR passes.

| MLIR Pass Name    | Input Dialect   | Output Dialect  | Description                                      |
| ----------------- | --------------- | --------------- | ------------------------------------------------ |
| legalize_tf       | TensorFlow      | TOSA            | Legalize TensorFlow dialect to TOSA dialect      |
| fuse_tf_bias      | TensorFlow      | TOSA            | Mapping tf.BiasAdd + tf.Conv2D to tosa.CONV2D    |
| legalize_tfl      | TensorFlow Lite | TOSA            | Legalize TensorFlow Lite dialect to TOSA dialect |
| convert_tfl_uint8 | TensorFlow Lite | TensorFlow Lite | Convert quantized uint8 graph to int8 graph      |

TF to TOSA legalization can be summarized by the following pseudocode:

void legalize_tf_to_tosa(mlir::Module module)
{
    mlir::PassManager pm

    // other MLIR passes to optimize TF

    pm.addPass(fuse_tf_bias)
    pm.addPass(legalize_tf)

    // other MLIR passes to optimize TOSA
}

TFLite to TOSA legalization can be summarized by the following pseudocode:

void legalize_tfl_to_tosa(mlir::Module module)
{
    mlir::PassManager pm

    // other MLIR passes to optimize TFLite

    pm.addPass(convert_tfl_uint8)
    pm.addPass(legalize_tfl)

    // other MLIR passes to optimize TOSA
}

Each of the passes is described in more detail in the subsequent chapters.

TensorFlow MLIR Dialect Legalization (legalize_tf)

tf.Abs

This operator is trivially lowered to tosa.ABS.

tf.AddN

TensorFlow Dialect

%output = tf.AddN(%inputs)

TOSA Lowering

%output = tosa.ADD(%inputs:0, %inputs:1)
for (int32 i = 2; i < %inputs.size; i++) {
    %output = tosa.ADD(%inputs:i, %output)
}
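
The chain of binary additions can be modelled in a few lines of Python. This is a sketch on flat lists rather than tensors; `tosa_add` and `add_n` are hypothetical names, and at least two inputs are assumed, as in the pseudo-code.

```python
def tosa_add(a, b):
    """Element-wise addition of two equally sized lists."""
    return [x + y for x, y in zip(a, b)]


def add_n(inputs):
    """Fold N inputs into one result using only binary additions."""
    output = tosa_add(inputs[0], inputs[1])
    for tensor in inputs[2:]:
        output = tosa_add(tensor, output)
    return output


print(add_n([[1, 2], [3, 4], [5, 6]]))
```

N inputs therefore produce N - 1 tosa.ADD operators.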

tf.Add

Element-wise addition.

TensorFlow Dialect

%output = tf.Add(%x, %y)

TOSA Lowering

This operator is trivially lowered to tosa.ADD.

tf.AddV2

Element-wise addition.

TensorFlow Dialect

%output = tf.AddV2(%x, %y)

TOSA Lowering

This operator is trivially lowered to tosa.ADD.

tf.All

Computes the "logical and" of elements across dimensions of a tensor.

TensorFlow Dialect

%output = tf.All(%input, %reduction_indices) {keep_dims}

TOSA Lowering

%output = lower_reduce_op<tosa.REDUCE_ALL>(%input, %output.shape, %reduction_indices, keep_dims)

tf.Any

Computes the "logical or" of elements across dimensions of a tensor.

TensorFlow Dialect

%output = tf.Any(%input, %reduction_indices) {keep_dims}

TOSA Lowering

%output = lower_reduce_op<tosa.REDUCE_ANY>(%input, %output.shape, %reduction_indices, keep_dims)

tf.ArgMax

Returns the index with the largest value across the given axis of the input tensor.

TensorFlow Dialect

%output = tf.ArgMax(%input, %dimension)

TOSA Lowering

int64 axis = positive_axis(%dimension)
%output = tosa.ARGMAX(%input) {axis=axis}

tf.ArgMin

Returns the index with the smallest value across the given axis of the input tensor.

TensorFlow Dialect

%output = tf.ArgMin(%input, %dimension)

TOSA Lowering

No TOSA lowering defined.

tf.Assert

Asserts that the given condition is true.

TensorFlow Dialect

%output = tf.Assert(%condition, %summarize)

TOSA Lowering

No TOSA lowering defined.

tf.AssignAddVariableOp

Adds a value to the current value of a variable.

TensorFlow Dialect

%output = tf.AssignAddVariableOp(%resource, %value, %dtype)

TOSA Lowering

No TOSA lowering defined.

tf.AssignSubVariableOp

Subtracts a value from the current value of a variable.

TensorFlow Dialect

%output = tf.AssignSubVariableOp(%resource, %value, %dtype)

TOSA Lowering

No TOSA lowering defined.

tf.AssignVariableOp

Assigns a new value to a variable.

TensorFlow Dialect

%output = tf.AssignVariableOp(%resource, %value, %dtype)

TOSA Lowering

No TOSA lowering defined.

tf.AvgPool

Performs average pooling on the input.

TensorFlow Dialect

%output = tf.AvgPool(%value) {ksize, strides, padding, data_format}

TOSA Lowering

assert(data_format == "NHWC")

tosa_padding =
     get_padding_values_from_pad_type(%value, ksize, padding, data_format,
                                      FORMAT_OHWI, strides, {1, 1, 1, 1})
%output = tosa.AVG_POOL2D(%value) {ksize=ksize, strides=strides, padding=tosa_padding}

tf.BatchMatMul

Multiplies slices of two tensors in batches.

TensorFlow Dialect

%output = tf.BatchMatMul(%x, %y, %adj_x, %adj_y)

TOSA Lowering

No TOSA lowering defined.

tf.BatchMatMulV2

Multiplies slices of two tensors in batches.

TensorFlow Dialect

%output = tf.BatchMatMulV2(%x, %y, %adj_x, %adj_y)

TOSA Lowering

No TOSA lowering defined.

tf.BatchNormWithGlobalNormalization

✗ Deprecated operator.

tf.BatchToSpaceND

BatchToSpaceND for N-D tensors of type T.

TensorFlow Dialect

%output = tf.BatchToSpaceND(%input, %block_shape, %crops)

TOSA Lowering

%output = lower_batch_to_space_nd_op(%input, %block_shape, %crops, %output.shape)

tf.BiasAddGrad

Training profile: TOSA lowering not yet defined.

tf.BiasAdd

Add bias to value.

TensorFlow Dialect

%output = tf.BiasAdd(%value, %bias) {data_format}

TOSA Lowering

assert(data_format == 'NHWC')
%output = tosa.ADD(%value, %bias)

tf.Bitcast

Bitcasts a tensor from one type to another without copying data.

TensorFlow Dialect

%output = tf.Bitcast(%input, %dtype)

TOSA Lowering

No TOSA lowering defined.

tf.BitwiseAnd

This operator is trivially lowered to tosa.BITWISE_AND.

tf.BitwiseOr

This operator is trivially lowered to tosa.BITWISE_OR.

tf.BroadcastGradientArgs

Training profile: TOSA lowering not yet defined.

tf.BroadcastTo

No TOSA lowering defined.

tf.Cast

This operator is trivially lowered to tosa.CAST.

tf.Ceil

This operator is trivially lowered to tosa.CEIL.

tf.CheckNumerics

No TOSA lowering defined.

tf.ComplexAbs

No TOSA lowering defined.

tf.Complex

No TOSA lowering defined.

tf.ConcatOffset

Training profile: TOSA lowering not yet defined.

tf.Concat

No TOSA lowering defined.

tf.ConcatV2

Concatenates tensors along one dimension.

TensorFlow Dialect

%output = tf.ConcatV2(%values, %axis)

TOSA Lowering

%output = lower_concatv2_op(%values, %axis)

tf.Conj

No TOSA lowering defined.

tf.Const

This operator is trivially lowered to tosa.CONST.

tf.Conv2DBackpropFilter

No TOSA lowering defined.

tf.Conv2DBackpropInput

Computes the gradients of convolution with respect to the input.

TensorFlow Dialect

%output = tf.Conv2DBackpropInput(%input_sizes, %filter, %out_backprop) {strides, use_cudnn_on_gpu, padding, explicit_paddings, data_format, dilations}

TOSA Lowering

// Transpose filter from HWIO to OHWI
%tosa_filter = tosa.TRANSPOSE(%filter) {perms={2, 0, 1, 3}}

vector output_shape

for (int32 i = 0; i < input_sizes.size(); i++) {
    output_shape.push_back(input_sizes[i])
}

if (padding == "EXPLICIT") {
    tosa_padding =
        get_padding_values_from_explicit_pad_attr(explicit_paddings, data_format)
} else {
    tosa_padding =
        get_transpose_conv2d_padding_values_from_pad_type(%input_sizes, %filter, output_shape, padding, data_format, FORMAT_HWIO, strides, dilations)
}

// Create a zero bias tensor
%zero_bias = tosa.CONST() {value={0}}
%output = tosa.TRANSPOSE_CONV2D(%out_backprop) {weight=%tosa_filter, bias=%zero_bias, outpad=tosa_padding, stride=strides, dilation=dilations, out_shape=output_shape}

tf.Conv2D

Computes a 2-D convolution given 4-D input and filter tensors.

TensorFlow Dialect

%output = tf.Conv2D(%input, %filter) {strides, padding, explicit_paddings, data_format, dilations}

TOSA Lowering

assert(data_format == "NHWC")

// Transpose filter from HWIO to OHWI
%filter_transpose = tosa.TRANSPOSE(%filter) {perms={3, 0, 1, 2}}

if (padding == "EXPLICIT") {
    tosa_padding =
        get_padding_values_from_explicit_pad_attr(explicit_paddings, data_format)
} else {
    tosa_padding =
        get_padding_values_from_pad_type(%input, %filter.shape, padding, data_format,
                                         FORMAT_HWIO, strides, dilations)
}

// Create a zero bias tensor
%zero_bias = tosa.CONST() {value={0}}

%output = tosa.CONV2D(%input, %filter_transpose, %zero_bias) {padding=tosa_padding, stride=strides, dilation=dilations}

tf.Conv3D

TOSA lowering to tosa.CONV3D to be defined.

tf.Cos

No TOSA lowering defined.

tf.CrossReplicaSum

No TOSA lowering defined.

tf.DepthToSpace

DepthToSpace for tensors of type T.

TensorFlow Dialect

%output = tf.DepthToSpace(%input) {block_size, data_format}

TOSA Lowering

%output = lower_depth_to_space_op(%input, block_size, data_format)

tf.DepthwiseConv2dNative

Computes a 2-D depthwise convolution given 4-D input and filter tensors.

TensorFlow Dialect

%output = tf.DepthwiseConv2dNative(%input, %filter) {strides, padding, explicit_paddings, data_format, dilations}

TOSA Lowering

if (padding == "EXPLICIT") {
    tosa_padding =
        get_padding_values_from_explicit_pad_attr(explicit_paddings, data_format)
} else {
    tosa_padding =
        get_padding_values_from_pad_type(%input, %filter.shape, padding, data_format,
                                         FORMAT_HWIO, strides, dilations)
}

bias_dim = %filter.shape[2] * %filter.shape[3]

// Create a zero-bias tensor
%zero_bias = tosa.CONST() {value={0} * bias_dim}

%output = tosa.DEPTHWISE_CONV2D(%input, %filter, %zero_bias) {stride=strides, dilation=dilations, padding=tosa_padding}

tf.DivNoNan

No TOSA lowering defined.

tf.Div

No TOSA lowering defined.

tf.DynamicStitch

No TOSA lowering defined.

tf.Einsum

No TOSA lowering defined.

tf.Elu

Computes exponential linear: exp(features) - 1 if features < 0, features otherwise.

TensorFlow Dialect

%output = tf.Elu(%features)

TOSA Lowering

%output = lower_elu_op(%features)

tf.EmptyTensorList

No TOSA lowering defined.

tf.Equal

Returns the truth value of (x == y) element-wise with broadcasting.

TensorFlow Dialect

%output = tf.Equal(%x, %y)

TOSA Lowering

This operator is trivially lowered to tosa.EQUAL.

tf.Exp

This operator is trivially lowered to tosa.EXP.

tf.ExpandDims

Inserts a dimension of 1 into a tensor’s shape

TensorFlow Dialect

%output = tf.ExpandDims(%input, %axis)

TOSA Lowering

%output = lower_expand_dims(%input, %axis.to_constant())

tf.FakeQuantWithMinMaxArgs

Fake-quantize the 'inputs' tensor, type float to 'outputs' tensor of same type.

TensorFlow Dialect

%output = tf.FakeQuantWithMinMaxArgs(%inputs) {min, max, num_bits, narrow_range}

TOSA Lowering

%output = lower_fake_quant_op(%inputs, min, max, num_bits, narrow_range)

tf.FakeQuantWithMinMaxVars

Fake-quantize the 'inputs' tensor of type float via global float scalars min and max.

TensorFlow Dialect

%output = tf.FakeQuantWithMinMaxVars(%inputs, %min, %max) {num_bits, narrow_range}

TOSA Lowering

%output = lower_fake_quant_op(%inputs, %output.type, %min.to_constant(), %max.to_constant(), num_bits, narrow_range)

tf.FakeQuantWithMinMaxVarsPerChannel

Fake-quantize the 'inputs' tensor of type float and one of the shapes [d].

TensorFlow Dialect

%output = tf.FakeQuantWithMinMaxVarsPerChannel(%inputs, %min, %max) {num_bits, narrow_range}

No TOSA lowering defined.

tf.Fill

Creates a tensor filled with a scalar value

TensorFlow Dialect

%output = tf.Fill(%dims, %value)

TOSA Lowering

int64 total_size = 1

for (int32 i = 0; i < %dims.shape[0]; i++) {
    total_size *= %dims[i]
}

vector<%value.dtype> fill_arr(total_size, %value)

%output = tosa.CONST() {value={fill_arr}}

tf.FloorDiv

Returns x // y element-wise.

TensorFlow Dialect

%output = tf.FloorDiv(%x, %y)

TOSA Lowering

%output = lower_floor_div(%x, %y)

tf.FloorMod

Returns the element-wise remainder of division. When x < 0 xor y < 0 is true, the result follows floored-division semantics.

TensorFlow Dialect

%output = tf.FloorMod(%x, %y)

TOSA Lowering

%output = lower_floor_mod(%x, %y)

tf.Floor

This operator is trivially lowered to tosa.FLOOR.

tf.FusedBatchNormGrad

Training profile: TOSA lowering not yet defined.

tf.FusedBatchNormGradV2

Training profile: TOSA lowering not yet defined.

tf.FusedBatchNormGradV3

Training profile: TOSA lowering not yet defined.

tf.FusedBatchNorm

Batch normalization.

TensorFlow Dialect

%output = tf.FusedBatchNorm(%x, %scale, %offset, %mean, %variance) {epsilon, data_format, is_training}

TOSA Lowering

assert(data_format == 'NHWC')
assert(is_training == false)

%epsilon_const = tosa.CONST() {value={epsilon}}

%op1 = tosa.SUB(%x, %mean)
%op2 = tosa.ADD(%variance, %epsilon_const)
%op3 = tosa.RSQRT(%op2)
%op4 = tosa.MUL(%op1, %op3)
%op5 = tosa.MUL(%op4, %scale)
%output = tosa.ADD(%op5, %offset)
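
The operator sequence is the inference-time batch normalization formula, output = (x - mean) * rsqrt(variance + epsilon) * scale + offset. A scalar Python sketch, with the hypothetical name `fused_batch_norm`, makes the correspondence explicit:

```python
import math


def fused_batch_norm(x, scale, offset, mean, variance, epsilon):
    """Scalar reference for the SUB / ADD / RSQRT / MUL / MUL / ADD chain."""
    op1 = x - mean                     # tosa.SUB
    op2 = variance + epsilon           # tosa.ADD
    op3 = 1.0 / math.sqrt(op2)         # tosa.RSQRT
    op4 = op1 * op3                    # tosa.MUL
    op5 = op4 * scale                  # tosa.MUL
    return op5 + offset                # tosa.ADD


print(fused_batch_norm(3.0, scale=2.0, offset=0.5, mean=1.0, variance=4.0, epsilon=0.0))
```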

tf.FusedBatchNormV3

Batch normalization.

TensorFlow Dialect

%output = tf.FusedBatchNormV3(%x, %scale, %offset, %mean, %variance) {epsilon, data_format, is_training}

TOSA Lowering

assert(data_format == 'NHWC')
assert(is_training == false)

%epsilon_const = tosa.CONST() {value={epsilon}}

%op1 = tosa.SUB(%x, %mean)
%op2 = tosa.ADD(%variance, %epsilon_const)
%op3 = tosa.RSQRT(%op2)
%op4 = tosa.MUL(%op1, %op3)
%op5 = tosa.MUL(%op4, %scale)
%output = tosa.ADD(%op5, %offset)

tf.GatherNd

Gather slices from params into a Tensor with shape specified by indices.

TensorFlow Dialect

%output = tf.GatherNd(%params, %indices)

TOSA Lowering

%output = lower_gather_nd_op(%params, %indices)

tf.Gather

Gathers slices from params according to indices.

TensorFlow Dialect

%output = tf.Gather(%params, %indices)

TOSA Lowering

%output = lower_gather_op(%params, %indices, 0, 0)

tf.GatherV2

Gathers slices from params axis according to indices.

TensorFlow Dialect

%output = tf.GatherV2(%params, %indices, %axis) {batch_dims}

TOSA Lowering

%output = lower_gather_op(%params, %indices, batch_dims, %axis.to_constant())

tf.GreaterEqual

Returns the truth value of (x >= y) element-wise with broadcasting.

TensorFlow Dialect

%output = tf.GreaterEqual(%x, %y)

TOSA Lowering

This operator is trivially lowered to tosa.GREATER_EQUAL.

tf.Greater

Returns the truth value of (x > y) element-wise with broadcasting.

TensorFlow Dialect

%output = tf.Greater(%x, %y)

TOSA Lowering

This operator is trivially lowered to tosa.GREATER.

tf.HashTableV2

No TOSA lowering defined.

tf.IdentityN

Returns a list of tensors with the same shapes and contents as the input.

TensorFlow Dialect

%output = tf.IdentityN(%input)

TOSA Lowering

%output = tosa.IDENTITYN(%input)

tf.Identity

Returns a tensor with the same shape and contents as the input.

TensorFlow Dialect

%output = tf.Identity(%input)

TOSA Lowering

%output = tosa.IDENTITY(%input)

tf.If

No TOSA lowering defined.

tf.Imag

No TOSA lowering defined.

tf.InfeedDequeueTuple

No TOSA lowering defined.

tf.Invert

This operator is trivially lowered to tosa.BITWISE_NOT.

tf.InvertPermutation

No TOSA lowering defined.

tf.IsFinite

No TOSA lowering defined.

tf.IteratorGetNext

No TOSA lowering defined.

tf.L2Loss

Training profile: TOSA lowering not yet defined.

tf.LRN

No TOSA lowering defined.

tf.LeakyRelu

Computes rectified linear: max(features, features * alpha).

TensorFlow Dialect

%output = tf.LeakyRelu(%features) {alpha}

TOSA Lowering

%alpha_tensor = tosa.CONST() {value={alpha}}
%features_alpha = tosa.MUL(%features, %alpha_tensor)
%greater = tosa.GREATER(%features, %features_alpha)
%output = tosa.SELECT(%greater, %features, %features_alpha)
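
The MUL / GREATER / SELECT sequence computes max(features, features * alpha) without a dedicated maximum operator. A Python sketch on a flat list, using the hypothetical name `leaky_relu`:

```python
def leaky_relu(features, alpha):
    """Element-wise max(f, f * alpha) expressed as compare-and-select."""
    output = []
    for f in features:
        f_alpha = f * alpha                            # tosa.MUL
        greater = f > f_alpha                          # tosa.GREATER
        output.append(f if greater else f_alpha)       # tosa.SELECT
    return output


print(leaky_relu([-2.0, 0.0, 3.0], alpha=0.5))
```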

tf.LeftShift

Computes the bitwise left-shift of x by y bits, element-wise.

TensorFlow Dialect

%output = tf.LeftShift(%x, %y)

TOSA Lowering

This operator is trivially lowered to tosa.LOGICAL_LEFT_SHIFT.

tf.LegacyCall

No TOSA lowering defined.

tf.LessEqual

Returns the truth value of (x <= y) element-wise with broadcasting.

TensorFlow Dialect

%output = tf.LessEqual(%x, %y)

TOSA Lowering

%output_greater = tosa.GREATER(%x, %y)
%output = tosa.LOGICAL_NOT(%output_greater)

tf.Less

Returns the truth value of (x < y) element-wise with broadcasting.

TensorFlow Dialect

%output = tf.Less(%x, %y)

TOSA Lowering

%output_greater_equal = tosa.GREATER_EQUAL(%x, %y)
%output = tosa.LOGICAL_NOT(%output_greater_equal)

tf.LinSpace

No TOSA lowering defined.

tf.Log1p

No TOSA lowering defined.

tf.Log

This operator is trivially lowered to tosa.LOG.

tf.LogSoftmax

Computes log softmax activations.

TensorFlow Dialect

%output = tf.LogSoftmax(%logits)

TOSA Lowering

%output = lower_log_softmax_op(%logits)

tf.LogicalAnd

Returns the truth value of x AND y, element-wise.

TensorFlow Dialect

%output = tf.LogicalAnd(%x, %y)

TOSA Lowering

This operator is trivially lowered to tosa.LOGICAL_AND.

tf.LogicalNot

This operator is trivially lowered to tosa.LOGICAL_NOT.

tf.LogicalOr

Returns the truth value of x OR y, element-wise.

TensorFlow Dialect

%output = tf.LogicalOr(%x, %y)

TOSA Lowering

This operator is trivially lowered to tosa.LOGICAL_OR.

tf.LookupTableFindV2

No TOSA lowering defined.

tf.LookupTableInputV2

No TOSA lowering defined.

tf.LookupTableSizeV2

No TOSA lowering defined.

tf.MatMul

Multiply the matrix a by the matrix b

TensorFlow Dialect

%output = tf.MatMul(%a, %b)

TOSA Lowering

%output = tosa.MATMUL(%a, %b)

tf.MatrixDiag

No TOSA lowering defined.

tf.MatrixDiagV2

No TOSA lowering defined.

tf.MatrixDiagV3

No TOSA lowering defined.

tf.MatrixSetDiag

No TOSA lowering defined.

tf.MatrixSetDiagV2

No TOSA lowering defined.

tf.MatrixSetDiagV3

No TOSA lowering defined.

tf.Max

Computes the maximum of elements across dimensions of a tensor.

TensorFlow Dialect

%output = tf.Max(%input, %reduction_indices) {keep_dims}

TOSA Lowering

%output = lower_reduce_op<tosa.REDUCE_MAX>(%input, %output.shape, %reduction_indices, keep_dims)

tf.MaxPoolGrad

Training profile: TOSA lowering not yet defined.

tf.MaxPool

Performs max pooling on the input.

TensorFlow Dialect

%output = tf.MaxPool(%input) {ksize, strides, padding, data_format}

TOSA Lowering

assert(data_format == "NHWC")

tosa_padding =
     get_padding_values_from_pad_type(%input, ksize, padding, data_format,
                                      FORMAT_OHWI, strides, {1, 1, 1, 1})
%output = tosa.MAX_POOL2D(%input) {ksize=ksize, strides=strides, padding=tosa_padding}

tf.Maximum

This operator is trivially lowered to tosa.MAXIMUM.

tf.Mean

Computes the mean of elements across dimensions of a tensor.

TensorFlow Dialect

%output = tf.Mean(%input, %reduction_indices) {keep_dims}

TOSA Lowering

int32 num_elements_on_axis = 1
for (int32 axis : %reduction_indices) {
    num_elements_on_axis *= %input.shape[axis]
}
float32 div_scale = 1.0 / num_elements_on_axis

%cst_div_scale = tosa.CONST() {value={div_scale}}
%op1_rsum_in = lower_reduce_op<tosa.REDUCE_SUM>(%input, %output.shape, %reduction_indices, keep_dims)
%output = tosa.MUL(%op1_rsum_in, %cst_div_scale)
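
The mean is expressed as a sum followed by a multiplication with a precomputed reciprocal, since the element count along the reduced axes is known at compile time. A Python sketch of both steps, with hypothetical helper names and a 2-D input reduced along its last axis:

```python
from math import prod


def mean_div_scale(input_shape, reduction_indices):
    """Reciprocal of the number of elements along the reduced axes."""
    return 1.0 / prod(input_shape[axis] for axis in reduction_indices)


def reduce_mean_last_axis(rows):
    """REDUCE_SUM along the last axis, then MUL by the constant scale."""
    div_scale = mean_div_scale((len(rows), len(rows[0])), [1])
    return [sum(row) * div_scale for row in rows]


print(mean_div_scale((2, 4, 8), [1, 2]))
print(reduce_mean_last_axis([[1.0, 2.0, 3.0, 4.0]]))
```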

tf.Min

Computes the minimum of elements across dimensions of a tensor.

TensorFlow Dialect

%output = tf.Min(%input, %reduction_indices) {keep_dims}

TOSA Lowering

%output = lower_reduce_op<tosa.REDUCE_MIN>(%input, %output.shape, %reduction_indices, keep_dims)

tf.Minimum

This operator is trivially lowered to tosa.MINIMUM.

tf.MirrorPad

No TOSA lowering defined.

tf.MlirPassthroughOp

No TOSA lowering defined.

tf.MulNoNan

No TOSA lowering defined.

tf.Mul

Returns the product of x and y, element-wise.

TensorFlow Dialect

%output = tf.Mul(%x, %y)

TOSA Lowering

This operator is trivially lowered to tosa.MUL.

tf.Neg

This operator is trivially lowered to tosa.NEGATE.

tf.NoOp

No TOSA lowering defined.

tf.NonMaxSuppressionV4

No TOSA lowering defined.

tf.NonMaxSuppressionV5

No TOSA lowering defined.

tf.NotEqual

Returns the truth value of (x != y) element-wise with broadcasting.

TensorFlow Dialect

%output = tf.NotEqual(%x, %y)

TOSA Lowering

%equal = tosa.EQUAL(%x, %y)
%output = tosa.LOGICAL_NOT(%equal)

tf.OneHot

OneHot operator.

TensorFlow Dialect

%output = tf.OneHot(%indices, %depth, %on_value, %off_value) {axis}

TOSA Lowering

%output = lower_one_hot_op(%indices, %depth, %on_value, %off_value, axis)

tf.OutputEnqueueTuple

No TOSA lowering defined.

tf.Pack

Packs a list of N rank-R tensors into one rank-(R+1) tensor.

TensorFlow Dialect

%output = tf.Pack(%values) {axis}

TOSA Lowering

%output = lower_pack_op(%values, axis)

tf.Pad

This operator is trivially lowered to tosa.PAD.

tf.PadV2

No TOSA lowering defined.

tf.ParseExampleV2

No TOSA lowering defined.

tf.PartitionedCall

No TOSA lowering defined.

tf.Placeholder

Not seen in practice. No lowering needed.

tf.PlaceholderWithDefault

Not seen in practice. No lowering needed.

tf.Pow

This operator is trivially lowered to tosa.POW.

tf.PreventGradient

Training profile: TOSA lowering not yet defined.

tf.Prod

Computes the product of elements across dimensions of a tensor.

TensorFlow Dialect

%output = tf.Prod(%input, %reduction_indices) {keep_dims}

TOSA Lowering

%output = lower_reduce_op<tosa.REDUCE_PRODUCT>(%input, %output.shape, %reduction_indices, keep_dims)

tf.QuantizeAndDequantize

No TOSA lowering defined.

tf.QuantizeAndDequantizeV2

No TOSA lowering defined.

tf.QuantizeAndDequantizeV3

No TOSA lowering defined.

tf.RFFT

No TOSA lowering defined.

tf.RandomShuffle

No TOSA lowering defined.

tf.RandomStandardNormal

No TOSA lowering defined.

tf.RandomUniform

No TOSA lowering defined.

tf.Range

No TOSA lowering defined.

tf.Rank

Returns the rank of the tensor.

TensorFlow Dialect

%output = tf.Rank(%input)

TOSA Lowering

%output = tosa.CONST() {value={%input.rank}}

tf.ReadVariableOp

No TOSA lowering defined.

tf.RealDiv

Returns x / y element-wise for real types.

TensorFlow Dialect

%output = tf.RealDiv(%x, %y)

TOSA Lowering

%recip = tosa.RECIPROCAL(%y)
%output = tosa.MUL(%x, %recip)

tf.Real

No TOSA lowering defined.

tf.Reciprocal

This operator is trivially lowered to tosa.RECIPROCAL.

tf.Relu6

Computes rectified linear 6: min(max(features, 0), 6).

TensorFlow Dialect

%output = tf.Relu6(%features)

TOSA Lowering

%output = tosa.RELUN(%features) {max_val=6}

tf.ReluGrad

Training profile: TOSA lowering not yet defined.

tf.Relu

Computes rectified linear: max(features, 0).

TensorFlow Dialect

%output = tf.Relu(%features)

TOSA Lowering

%output = tosa.RELUN(%features) {max_val=numeric_limits<float32>::max()}

tf.Reshape

Reshapes a tensor.

TensorFlow Dialect

%output = tf.Reshape(%tensor, %shape)

TOSA Lowering

%output = tosa.RESHAPE(%tensor) {new_shape=%shape.as_constant()}

tf.ResizeBilinear

Resizes images to size using bilinear interpolation.

TensorFlow Dialect

%output = tf.ResizeBilinear(%images, %size) {align_corners, half_pixel_centers}

inferred from output shape.

TOSA Lowering

%output = lower_resize_op(%images, %size, float, "BILINEAR")

tf.ResizeNearestNeighbor

Resizes images to size using nearest neighbor interpolation.

TensorFlow Dialect

%output = tf.ResizeNearestNeighbor(%images, %size) {align_corners, half_pixel_centers}

inferred from output shape.

TOSA Lowering

%output = lower_resize_op(%images, %size, %output, float, "NEAREST_NEIGHBOR")

tf.ResourceApplyAdam

Training profile: TOSA lowering not yet defined.

tf.ResourceApplyGradientDescent

Training profile: TOSA lowering not yet defined.

tf.ResourceApplyKerasMomentum

Training profile: TOSA lowering not yet defined.

tf.ResourceGather

Training profile: TOSA lowering not yet defined.

tf.ResourceScatterUpdate

Training profile: TOSA lowering not yet defined.

tf.ReverseSequence

No TOSA lowering defined.

tf.ReverseV2

Reverses specific dimensions of a tensor.

TensorFlow Dialect

%output = tf.ReverseV2(%tensor, %axis)

TOSA Lowering

%output = lower_reversev2_op(%tensor, %axis)

tf.RightShift

Computes the bitwise right-shift of x by y bits, element-wise.

TensorFlow Dialect

%output = tf.RightShift(%x, %y)

TOSA Lowering

if (is_unsigned(%x.dtype)) {
  %output = tosa.LOGICAL_RIGHT_SHIFT(%x, %y)
} else {
  %output = tosa.ARITHMETIC_RIGHT_SHIFT(%x, %y)
}
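
The two TOSA operators differ only in what is shifted in from the left: zeros for the logical shift, copies of the sign bit for the arithmetic shift. A Python sketch of the selection, where `right_shift` is a hypothetical name and masking to `num_bits` models a fixed-width unsigned value:

```python
def right_shift(x, y, num_bits, is_unsigned):
    """Pick a logical or arithmetic right shift based on signedness."""
    if is_unsigned:
        mask = (1 << num_bits) - 1
        return (x & mask) >> y    # logical: zeros are shifted in
    return x >> y                 # Python's >> on ints is arithmetic


print(right_shift(0xF8, 1, 8, is_unsigned=True))
print(right_shift(-8, 1, 8, is_unsigned=False))
```

The same 8-bit pattern 0xF8 yields 124 when treated as unsigned and -4 when treated as the signed value -8.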

tf.Round

Rounds the values of a tensor to the nearest integer, element-wise.

TensorFlow Dialect

%output = tf.Round(%x)

TOSA Lowering

%output = lower_round_op(%x)

tf.RsqrtGrad

Training profile: TOSA lowering not yet defined.

tf.Rsqrt

This operator is trivially lowered to tosa.RSQRT.

tf.SegmentMax

No TOSA lowering defined.

tf.SegmentMean

No TOSA lowering defined.

tf.SegmentMin

No TOSA lowering defined.

tf.SegmentProd

No TOSA lowering defined.

tf.SegmentSum

No TOSA lowering defined.

tf.Select

No TOSA lowering defined.

tf.SelectV2

Selects elements from t or e depending on condition.

TensorFlow Dialect

%output = tf.SelectV2(%condition, %t, %e)

TOSA Lowering

%output = lower_selectv2_op(%condition, %t, %e, %output.shape)

tf.ShapeN

No TOSA lowering defined.

tf.Shape

Returns the shape of a tensor.

TensorFlow Dialect

%output = tf.Shape(%input)

TOSA Lowering

%output = lower_shape_op(%input)

tf.Sigmoid

This operator is trivially lowered to tosa.SIGMOID.

tf.Sign

No TOSA lowering defined.

tf.Sin

No TOSA lowering defined.

tf.Size

No TOSA lowering defined.

tf.Slice

Returns a slice from input.

TensorFlow Dialect

%output = tf.Slice(%input, %begin, %size)

TOSA Lowering

vector <size_t> output_size
try {
  output_size = %size.as_constant()
} except(ConversionFailed) {
  output_size = %output.shape
}

%output = tosa.SLICE(%input) {start=%begin.as_constant(), size=output_size}

tf.Snapshot

No TOSA lowering defined.

tf.SoftmaxCrossEntropyWithLogits

Training profile: TOSA lowering not yet defined.

tf.Softmax

Computes softmax activations

TensorFlow Dialect

%output = tf.Softmax(%logits)

TOSA Lowering

%op1 = tosa.EXP(%logits)
%op2 = tosa.REDUCE_SUM(%op1) {reduce_axis=(%logits.rank - 1)}
%op3 = tosa.RECIPROCAL(%op2)
%output = tosa.MUL(%op1, %op3)
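
The four operators compute exp(x) / sum(exp(x)) along the last axis, with the division replaced by a reciprocal and a multiplication. A Python sketch for a single row, using the hypothetical name `softmax`; like the lowering, it does not subtract the row maximum before exponentiating:

```python
import math


def softmax(logits):
    """Softmax over one row: EXP, REDUCE_SUM, RECIPROCAL, MUL."""
    op1 = [math.exp(v) for v in logits]   # tosa.EXP
    op2 = sum(op1)                        # tosa.REDUCE_SUM over the last axis
    op3 = 1.0 / op2                       # tosa.RECIPROCAL
    return [v * op3 for v in op1]         # tosa.MUL


print(softmax([0.0, 0.0]))
```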

tf.Softplus

No TOSA lowering defined.

tf.SpaceToBatchND

SpaceToBatch for N-D tensors of type T.

TensorFlow Dialect

%output = tf.SpaceToBatchND(%input, %block_shape, %paddings)

TOSA Lowering

%output = lower_space_to_batch_nd_op(%input, %block_shape, %paddings)

tf.SpaceToDepth

SpaceToDepth for tensors of type T.

TensorFlow Dialect

%output = tf.SpaceToDepth(%input) {block_size, data_format}

TOSA Lowering

%output = lower_space_to_depth_op(%input, block_size, data_format)

tf.SparseMatMul

No TOSA lowering defined.

tf.SparseSoftmaxCrossEntropyWithLogits

No TOSA lowering defined.

tf.SparseToDense

No TOSA lowering defined.

tf.Split

Splits a tensor into num_split tensors along one dimension

TensorFlow Dialect

%output = tf.Split(%split_dim, %value) {num_split}

TOSA Lowering

%output = lower_split_op(%value, %split_dim.as_constant(), num_split)

tf.SplitV

Splits a tensor into num_split tensors along one dimension

TensorFlow Dialect

%output = tf.SplitV(%value, %size_splits, %split_dim) {num_split}

TOSA Lowering

%output = lower_splitv_op(%value, %size_splits.as_constant(), %split_dim.as_constant())

tf.Sqrt

No TOSA lowering defined.

tf.Square

Computes the square of x, element-wise.

TensorFlow Dialect

%output = tf.Square(%x)

TOSA Lowering

%output = tosa.MUL(%x, %x)

tf.SquareDifference

Computes (x-y)*(x-y) element-wise

TensorFlow Dialect

%output = tf.SquareDifference(%x, %y)

TOSA Lowering

%diff = tosa.SUB(%x, %y)
%output = tosa.MUL(%diff, %diff)

tf.Squeeze

Removes dimensions of size 1 from the shape of a tensor.

TensorFlow Dialect

%output = tf.Squeeze(%input) {squeeze_dims}

TOSA Lowering

%output = lower_squeeze_op(%input, squeeze_dims)

tf.StatefulPartitionedCall

No TOSA lowering defined.

tf.StopGradient

Training profile: TOSA lowering not yet defined.

tf.StridedSliceGrad

Training profile: TOSA lowering not yet defined.

tf.StridedSlice

Return a strided slice from input.

TensorFlow Dialect

%output = tf.StridedSlice(%input, %begin, %end, %strides) {begin_mask, end_mask, ellipsis_mask, new_axis_mask, shrink_axis_mask}

TOSA Lowering

%output = lower_strided_slice_op(%input, %begin, %end, %strides, begin_mask, end_mask, ellipsis_mask, new_axis_mask, shrink_axis_mask)

tf.Sub

This operator is trivially lowered to tosa.SUB.

tf.Sum

Computes the sum of elements across dimensions of a tensor.

TensorFlow Dialect

%output = tf.Sum(%input, %reduction_indices) {keep_dims}

TOSA Lowering

%output = lower_reduce_op<tosa.REDUCE_SUM>(%input, %output.shape, %reduction_indices, keep_dims)

tf.TPUCompilationResult

No TOSA lowering defined.

tf.TPUCopyWithLayout

No TOSA lowering defined.

tf.TPUExecuteAndUpdateVariables

No TOSA lowering defined.

tf.TPUExecute

No TOSA lowering defined.

tf.TPUGetLayout

No TOSA lowering defined.

tf.TPUReplicateMetadata

No TOSA lowering defined.

tf.TPUReplicatedInput

No TOSA lowering defined.

tf.TPUReplicatedOutput

No TOSA lowering defined.

tf.TPUReshardVariables

No TOSA lowering defined.

tf.TanhGrad

Training profile: TOSA lowering not yet defined.

tf.Tanh

This operator is trivially lowered to tosa.TANH.

tf.TensorListFromTensor

No TOSA lowering defined.

tf.TensorListGetItem

No TOSA lowering defined.

tf.TensorListLength

No TOSA lowering defined.

tf.TensorListPushBack

No TOSA lowering defined.

tf.TensorListReserve

No TOSA lowering defined.

tf.TensorListResize

No TOSA lowering defined.

tf.TensorListSetItem

No TOSA lowering defined.

tf.TensorListStack

No TOSA lowering defined.

tf.TensorScatterUpdate

No TOSA lowering defined.

tf.Tile

Constructs a tensor by tiling a given tensor.

TensorFlow Dialect

%output = tf.Tile(%input, %multiples)

TOSA Lowering

%output = tosa.TILE(%input) {multiples=%multiples.as_constant()}

tf.ToBool

No TOSA lowering defined.

tf.TopKV2

No TOSA lowering defined.

tf.Transpose

Shuffle dimensions of x according to a permutation.

TensorFlow Dialect

%output = tf.Transpose(%x, %perm)

TOSA Lowering

%output = tosa.TRANSPOSE(%x) {perm=%perm.as_constant()}

tf.TruncateDiv

No TOSA lowering defined.

tf.Unique

No TOSA lowering defined.

tf.Unpack

Unpacks a given dimension of a rank-R tensor into num rank-(R-1) tensors.

TensorFlow Dialect

%output = tf.Unpack(%value) {axis, num}

TOSA Lowering

%output = lower_unpack_op(%value, axis, num)

tf.UnsortedSegmentMax

No TOSA lowering defined.

tf.UnsortedSegmentMin

No TOSA lowering defined.

tf.UnsortedSegmentProd

No TOSA lowering defined.

tf.UnsortedSegmentSum

No TOSA lowering defined.

tf.VarHandle

No TOSA lowering defined.

tf.VariableShape

No TOSA lowering defined.

tf.Where

No TOSA lowering defined.

tf.While

No TOSA lowering defined.

tf.Xdivy

No TOSA lowering defined.

tf.XlaDynamicUpdateSlice

No TOSA lowering defined.

tf.XlaSharding

No TOSA lowering defined.

tf.ZerosLike

Returns a tensor of zeros with the same shape and type as x.

TensorFlow Dialect

%output = tf.ZerosLike(%x)

TOSA Lowering

%output = tosa.CONST() {value={0} * %x.num_elements}

TensorFlow Lite MLIR Dialect Legalization (legalize_tfl)

tfl.abs

This operator is trivially lowered to tosa.ABS

tfl.add_n

add_n operator.

TensorFlow Lite Dialect

%output = tfl.add_n(%inputs)

TOSA Lowering

%output = tosa.ADD(%inputs:0, %inputs:1)
for (int32 i = 2; i < %inputs.size; i++) {
    %output = tosa.ADD(%inputs:i, %output)
}
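
The chained ADD above amounts to a left fold over the input list. A minimal Python sketch, assuming each tensor is a flat list of equal length (the helper names are hypothetical):

```python
def add_n(inputs):
    """Accumulate element-wise sums the way the ADD chain does."""
    if len(inputs) < 2:
        raise ValueError("the listed lowering starts from two inputs")

    def add(a, b):
        # Element-wise stand-in for tosa.ADD on flat lists
        return [x + y for x, y in zip(a, b)]

    output = add(inputs[0], inputs[1])
    for i in range(2, len(inputs)):
        output = add(inputs[i], output)
    return output

total = add_n([[1, 2], [10, 20], [100, 200]])
```

For N inputs this emits N-1 ADD operators, each consuming the previous partial sum.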

tfl.add

Element-wise addition operation.

TensorFlow Lite Dialect

%output = tfl.add(%lhs, %rhs)

TOSA Lowering

If input/output tensors are all non-quantized typed,

Legalization:

%result = tosa.ADD(%lhs, %rhs)

If input/output tensors are all quantized typed,

Prepare:

float64 max_scale_2x = 2.0 * max(%lhs.scale, %rhs.scale)
float64 lhs_scale = float64(1 << input_shift) * %lhs.scale / max_scale_2x
float64 rhs_scale = float64(1 << input_shift) * %rhs.scale / max_scale_2x
float64 output_scale = max_scale_2x / (%output.scale * float64(1 << input_shift))

Legalization:

%op1_rescale_lhs = tosa.RESCALE(%lhs) {scale=lhs_scale, input_zp=%lhs.zp, output_zp=0} // %lhs.dtype->i32
%op2_rescale_rhs = tosa.RESCALE(%rhs) {scale=rhs_scale, input_zp=%rhs.zp, output_zp=0} // %rhs.dtype->i32
%op3_add_op1_op2 = tosa.ADD(%op1_rescale_lhs, %op2_rescale_rhs)
%op4_rescale_op3 = tosa.RESCALE(%op3_add_op1_op2) {scale=output_scale, input_zp=0, output_zp=%output.zp} // i32->%output.dtype
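
The scale arithmetic for the quantized case can be checked with a small Python sketch. The document does not define `input_shift`; the value 20 used here is an assumption, as is adding the output zero-point in the final rescale. Python's `round` stands in for RESCALE rounding, so tie handling may differ.

```python
def quantized_add(q_lhs, q_rhs, lhs, rhs, out, input_shift=20):
    """lhs, rhs, out are (scale, zero_point) pairs; input_shift=20 is assumed."""
    max_scale_2x = 2.0 * max(lhs[0], rhs[0])
    lhs_scale = float(1 << input_shift) * lhs[0] / max_scale_2x
    rhs_scale = float(1 << input_shift) * rhs[0] / max_scale_2x
    output_scale = max_scale_2x / (out[0] * float(1 << input_shift))

    op1 = round((q_lhs - lhs[1]) * lhs_scale)    # RESCALE lhs -> i32
    op2 = round((q_rhs - rhs[1]) * rhs_scale)    # RESCALE rhs -> i32
    op3 = op1 + op2                              # ADD in i32
    return round(op3 * output_scale) + out[1]    # RESCALE back (zero-point assumed)

# Real values: 100 * 0.5 = 50 and (50 - 10) * 0.25 = 10, so the sum is 60.
q_out = quantized_add(100, 50, lhs=(0.5, 0), rhs=(0.25, 10), out=(1.0, -5))
```

Dividing both input scales by `max_scale_2x` keeps each rescaled operand at or below half of `1 << input_shift` times its zero-point-corrected value, which leaves headroom for the addition in 32 bits.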

tfl.arg_max

ArgMax operator.

TensorFlow Lite Dialect

%output = tfl.arg_max(%input, %dim)

TOSA Lowering

%result = tosa.ARGMAX(%input) {axis=positive_axis(%dim.as_constant(), %input.rank)}

tfl.arg_min

No TOSA lowering defined.

tfl.average_pool_2d

Average_pool_2d operator.

TensorFlow Lite Dialect

%output = tfl.average_pool_2d(%input) {filter_height, filter_width, padding, stride_h, stride_w, fused_activation_function}

TOSA Lowering

Prepare:

tosa_padding =
     get_padding_values_from_pad_type(padding, NHWC, 1,
                                      %input.type, tensor<{filter_height, filter_width}, tosa.int32>,
                                      {1, stride_h, stride_w, 1}, {1, 1, 1, 1})

If input/output tensors are all non-quantized typed,

Legalization:

%avgpool2d = tosa.AVG_POOL2D(%input) {kernel={filter_height, filter_width}, stride={stride_h, stride_w}, padding=tosa_padding}
if(fused_activation_function != NONE) {
    %result = convert_fused_activation(%avgpool2d, fused_activation_function)
}
else {
    %result = %avgpool2d
}

If input/output tensors are all quantized typed,

Legalization:

%avgpool2d = tosa.AVG_POOL2D(%input) {kernel={filter_height, filter_width}, stride={stride_h, stride_w}, padding=tosa_padding, quantization_info={input_zp=%input.zp, output_zp=%output.zp}}
if(fused_activation_function != NONE) {
    %result = convert_fused_activation(%avgpool2d, fused_activation_function)
}
else {
    %result = %avgpool2d
}

tfl.basic_lstm

No TOSA lowering defined.

tfl.batch_to_space_nd

BatchToSpaceNd operator.

TensorFlow Lite Dialect

%output = tfl.batch_to_space_nd(%input, %block_shape, %indices)

TOSA Lowering

%result = convert_batch_to_space_nd_op(%input, %block_shape, %indices)

tfl.cast

This operator is trivially lowered to tosa.CAST

tfl.ceil

Ceil operator.

TensorFlow Lite Dialect

%y = tfl.ceil(%x)

TOSA Lowering

If input/output tensors are all non-quantized typed,

%result = tosa.CEIL(%x)

tfl.concatenation

Concatenation operator.

TensorFlow Lite Dialect

%output = tfl.concatenation(%values) {axis}

TOSA Lowering

%result = lower_concatv2_op(%values, axis)

tfl.pseudo_const

This operator is trivially lowered to tosa.CONST

tfl.conv_2d

Convolution operator.

TensorFlow Lite Dialect

%output = tfl.conv_2d(%input, %filter, %bias) {dilation_h_factor, dilation_w_factor, fused_activation_function, padding, stride_h, stride_w}

TOSA Lowering

If input/output tensors are all non-quantized typed,

Prepare:

tosa_padding =
     get_padding_values_from_pad_type(padding, NHWC, 1,
                                      %input.type, %filter.type,
                                      {1, stride_h, stride_w, 1}, {1, dilation_h_factor, dilation_w_factor, 1})

Legalization:

%conv2d = tosa.CONV2D(%input, %filter, %bias) {padding=tosa_padding, stride={stride_h, stride_w}, dilation={dilation_h_factor, dilation_w_factor}}
if(fused_activation_function != NONE) {
    %result = convert_fused_activation(%conv2d, fused_activation_function)
}
else {
    %result = %conv2d
}

If input/output tensors are all quantized typed,

Prepare:

float64 output_rescale_scale = (%input.scale * %filter.scale) / %output.scale

tosa_padding =
     get_padding_values_from_pad_type(padding, NHWC, 1,
                                      %input.type, %filter.type,
                                      {1, stride_h, stride_w, 1}, {1, dilation_h_factor, dilation_w_factor, 1})

Legalization:

%conv2d = tosa.CONV2D(%input, %filter, %bias) {padding=tosa_padding, stride={stride_h, stride_w}, dilation={dilation_h_factor, dilation_w_factor}, quantization_info={input_zp=%input.zp, weight_zp=%filter.zp}}
%rescale = tosa.RESCALE(%conv2d) {scale=output_rescale_scale, input_zp=0, output_zp=%output.zp} // %conv2d.dtype->%output.dtype
if(fused_activation_function != NONE) {
    %result = convert_fused_activation(%rescale, fused_activation_function)
}
else {
    %result = %rescale
}

tfl.convolution_2d_transpose_bias

No TOSA lowering defined.

tfl.cos

No TOSA lowering defined.

tfl.densify

No TOSA lowering defined.

tfl.depth_to_space

DepthToSpace operator.

TensorFlow Lite Dialect

%output = tfl.depth_to_space(%input) {block_size}

TOSA Lowering

%output = lower_depth_to_space_op(%input, block_size, "NHWC")

tfl.depthwise_conv_2d

Depthwise-separable convolution operator.

TensorFlow Lite Dialect

%output = tfl.depthwise_conv_2d(%input, %filter, %bias) {dilation_h_factor, dilation_w_factor, fused_activation_function, padding, stride_h, stride_w, depth_multiplier}

TOSA Lowering

If input/output tensors are all non-quantized typed,

Prepare:

tosa_padding =
     get_padding_values_from_pad_type(padding, NHWC, 1,
                                      %input.type, %filter.type,
                                      {1, stride_h, stride_w, 1}, {1, dilation_h_factor, dilation_w_factor, 1})

Legalization:

%depthwise_conv2d = tosa.DEPTHWISE_CONV2D(%input, %filter, %bias) {padding=tosa_padding, stride={stride_h, stride_w}, dilation={dilation_h_factor, dilation_w_factor}}
if(fused_activation_function != NONE) {
    %result = convert_fused_activation(%depthwise_conv2d, fused_activation_function)
}
else {
    %result = %depthwise_conv2d
}

If input/output tensors are all quantized typed,

Prepare:

float64 output_rescale_scale = (%input.scale * %filter.scale) / %output.scale

tosa_padding =
     get_padding_values_from_pad_type(padding, NHWC, 1,
                                      %input.type, %filter.type,
                                      {1, stride_h, stride_w, 1}, {1, dilation_h_factor, dilation_w_factor, 1})

Legalization:

%depthwise_conv2d = tosa.DEPTHWISE_CONV2D(%input, %filter, %bias) {padding=tosa_padding, stride={stride_h, stride_w}, dilation={dilation_h_factor, dilation_w_factor}, quantization_info={input_zp=%input.zp, weight_zp=%filter.zp}}
%rescale = tosa.RESCALE(%depthwise_conv2d) {scale=output_rescale_scale, input_zp=0, output_zp=%output.zp} // %depthwise_conv2d.dtype->%output.dtype
if(fused_activation_function != NONE) {
    %result = convert_fused_activation(%rescale, fused_activation_function)
}
else {
    %result = %rescale
}

tfl.dequantize

Dequantize operator.

TensorFlow Lite Dialect

%output = tfl.dequantize(%input)

TOSA Lowering

%result = lower_dequantize_op(%input, %input.scale, %input.zp)

tfl.div

Division operator.

TensorFlow Lite Dialect

%output = tfl.div(%lhs, %rhs)

TOSA Lowering

If input/output tensors are all non-quantized typed,

%rcp = tosa.RECIPROCAL(%rhs)
%mul = tosa.MUL(%lhs, %rcp)

tfl.elu

Exponential Linear Unit operator.

TensorFlow Lite Dialect

%y = tfl.elu(%x)

TOSA Lowering

If input/output tensors are all non-quantized typed,

%result = lower_elu_op(%x)

tfl.embedding_lookup

Embedding lookup operator.

TensorFlow Lite Dialect

%output = tfl.embedding_lookup(%lookup, %value)

tfl.equal

This operator is trivially lowered to tosa.EQUAL

tfl.exp

Natural exponentiation operator.

TensorFlow Lite Dialect

%y = tfl.exp(%x)

TOSA Lowering

If input/output tensors are all non-quantized typed,

%result = tosa.EXP(%x)

tfl.expand_dims

Inserts a dimension of 1 into a tensor’s shape.

TensorFlow Lite Dialect

%output = tfl.expand_dims(%input, %dim)

TOSA Lowering

%result = lower_expand_dims(%input, %dim.as_constant())

tfl.external_const

No TOSA lowering defined.

tfl.fake_quant

FakeQuant operator

TensorFlow Lite Dialect

%output = tfl.fake_quant(%input) {min, max, num_bits, narrow_range}

TOSA Lowering

%result = convert_fake_quant_op(%input, min, max, num_bits, narrow_range)

tfl.fill

Fill the tensor with given value.

TensorFlow Lite Dialect

%res = tfl.fill(%dims, %value)

TOSA Lowering

Prepare:

total_size = 1
dim_vec = %dims.as_constant()
for(int32 i = 0; i < dim_vec.size(); i++) {
    total_size *= dim_vec[i]
}
filled_val = %value.as_constant()[0]
output_type = tensor<dim_vec, filled_val.dtype>

Legalization:

%result = tosa.CONST() {value={filled_val} * total_size}
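
The Prepare step above reduces to a product over the dimension vector. A minimal Python sketch, with a dictionary standing in for the constant tensor (the name `lower_fill` and the dictionary layout are hypothetical):

```python
def lower_fill(dims, value):
    """Compute the flat element count and build the constant payload."""
    total_size = 1
    for d in dims:
        total_size *= d
    # The shape comes from dims; the payload is total_size copies of the scalar.
    return {"shape": list(dims), "value": [value] * total_size}

const = lower_fill([2, 3], 7)
```

Both `%dims` and `%value` must be compile-time constants for this to work, which is why the pseudo-code calls `as_constant()` on each.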

tfl.floor_div

Floor div operator.

TensorFlow Lite Dialect

%output = tfl.floor_div(%lhs, %rhs)

TOSA Lowering

If input/output tensors are all non-quantized typed,

%recip = tosa.RECIPROCAL(%rhs)
%mul = tosa.MUL(%lhs, %recip)
%result = tosa.FLOOR(%mul)

tfl.floor_mod

Division remainder.

TensorFlow Lite Dialect

%output = tfl.floor_mod(%lhs, %rhs)

TOSA Lowering

If input/output tensors are all non-quantized typed,

%recip = tosa.RECIPROCAL(%rhs)
%mul = tosa.MUL(%lhs, %recip)
%floor = tosa.FLOOR(%mul)
%result = tosa.SUB(%mul, %floor)
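
To see what the listed operators compute, the steps can be traced in Python on scalars. This is only a sketch of the sequence as written above; the function name is hypothetical.

```python
import math

def floor_mod_steps(lhs, rhs):
    """Trace RECIPROCAL -> MUL -> FLOOR -> SUB exactly as listed."""
    recip = 1.0 / rhs          # tosa.RECIPROCAL
    mul = lhs * recip          # tosa.MUL
    floor = math.floor(mul)    # tosa.FLOOR
    return mul - floor         # tosa.SUB

frac = floor_mod_steps(7.5, 2.0)   # fractional part of 7.5 / 2.0
remainder = frac * 2.0             # scaled back by rhs
```

As listed, the sequence yields the fractional part of the quotient `lhs / rhs`. Multiplying that value by `rhs` gives the conventional floor-mod remainder (`7.5 % 2.0` in Python). This document does not state whether the legalization performs that final multiply.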

tfl.floor

This operator is trivially lowered to tosa.FLOOR

tfl.fully_connected

Fully connected op.

TensorFlow Lite Dialect

%output = tfl.fully_connected(%input, %filter, %bias) {fused_activation_function}

TOSA Lowering

If input/output tensors are all non-quantized typed,

Prepare:

// input[N, IC] x filter[OC, IC] + bias[OC] -> output[N, OC]
auto input_reshape_shape = {%input.num_elements / %filter.shape[1], %filter.shape[1]}

Legalization:

if(!(%bias)) {
    %bias_val = tosa.CONST() {value={0} * %filter.shape[0]}
}
else {
    %bias_val = %bias
}
if(%input.rank != 2) {
    %input_val = tosa.RESHAPE(%input) {shape=input_reshape_shape}
}
else {
    %input_val = %input
}
%fc = tosa.FULLY_CONNECTED(%input_val, %filter, %bias_val)
if(fused_activation_function != NONE) {
    %result = convert_fused_activation(%fc, fused_activation_function)
}
else {
    %result = %fc
}

If input/output tensors are all quantized typed,

Prepare:

auto input_reshape_shape = {%input.num_elements / %filter.shape[1], %filter.shape[1]}
float64 output_rescale_scale = (%input.scale * %filter.scale) / %output.scale

Legalization:

if(!(%bias)) {
    %bias_val = tosa.CONST() {value={0} * %filter.shape[0]}
}
else {
    %bias_val = %bias
}
if(%input.rank != 2) {
    %input_val = tosa.RESHAPE(%input) {shape=input_reshape_shape}
}
else {
    %input_val = %input
}
%fc = tosa.FULLY_CONNECTED(%input_val, %filter, %bias_val)
%rescale = tosa.RESCALE(%fc) {scale=output_rescale_scale, input_zp=0, output_zp=%output.zp} // %fc.dtype->%output.dtype
if(fused_activation_function != NONE) {
    %result = convert_fused_activation(%rescale, fused_activation_function)
}
else {
    %result = %rescale
}
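
The quantized path can be sketched in Python to show where `output_rescale_scale` applies. This assumes the `input[N, IC] x filter[OC, IC] + bias[OC]` layout from the comment above, that zero-points are subtracted before accumulation, and that the bias is already in the accumulator scale; the function name is hypothetical and Python's `round` stands in for RESCALE rounding.

```python
def quantized_fully_connected(x, w, bias, in_q, w_q, out_q):
    """x: [N][IC] ints, w: [OC][IC] ints, bias: [OC] ints; *_q = (scale, zp)."""
    output_rescale_scale = (in_q[0] * w_q[0]) / out_q[0]
    result = []
    for row in x:
        out_row = []
        for oc, w_row in enumerate(w):
            # Integer accumulation, standing in for tosa.FULLY_CONNECTED
            acc = bias[oc] + sum(
                (a - in_q[1]) * (b - w_q[1]) for a, b in zip(row, w_row)
            )
            # tosa.RESCALE: accumulator -> output dtype, then add output zero-point
            out_row.append(round(acc * output_rescale_scale) + out_q[1])
        result.append(out_row)
    return result

y = quantized_fully_connected(
    [[12, 14]], [[3, 1], [0, 2]], [0, 0],
    in_q=(0.5, 10), w_q=(0.25, 0), out_q=(0.125, 3),
)
```

The accumulator carries an implicit scale of `input.scale * filter.scale`, so dividing by `output.scale` converts it to output units in a single rescale.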

tfl.gather_nd

Gather_nd operator.

TensorFlow Lite Dialect

%output = tfl.gather_nd(%params, %indices)

TOSA Lowering

%output = lower_gather_nd_op(%params, %indices)

tfl.gather

Gather operator.

TensorFlow Lite Dialect

%output = tfl.gather(%params, %indices) {axis}

TOSA Lowering

%output = lower_gather_op(%params, %indices, 0, axis)

tfl.greater_equal

This operator is trivially lowered to tosa.GREATER_EQUAL

tfl.greater

This operator is trivially lowered to tosa.GREATER

tfl.hard_swish

Hardswish activation function.

TensorFlow Lite Dialect

%output = tfl.hard_swish(%input)

TOSA Lowering

If input/output tensors are all non-quantized typed,

%const_3 = tosa.CONST() {value={3.0}}
%const_rcp6 = tosa.CONST() {value={1.0 / 6.0}}
%op1_add_in_3 = tosa.ADD(%input, %const_3)
%op2_relun_op1 = tosa.RELUN(%op1_add_in_3) {max=6.0}
%op3_mul_in_op2 = tosa.MUL(%input, %op2_relun_op1)
%op4_mul_op3_rcp6 = tosa.MUL(%op3_mul_in_op2, %const_rcp6)
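
The non-quantized sequence computes `x * relu6(x + 3) / 6`. A scalar Python sketch of the same steps (the function name is hypothetical):

```python
def hard_swish(x):
    """Trace ADD -> RELUN -> MUL -> MUL for one float value."""
    op1 = x + 3.0                    # tosa.ADD with const 3.0
    op2 = min(max(op1, 0.0), 6.0)    # tosa.RELUN with max 6.0
    op3 = x * op2                    # tosa.MUL
    return op3 * (1.0 / 6.0)         # tosa.MUL with const 1/6

samples = [hard_swish(v) for v in (-4.0, 0.0, 3.0)]
```

Inputs at or below -3 map to zero and inputs at or above 3 pass through unchanged, which matches the piecewise definition of the activation.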

If input/output tensors are all quantized typed,

Prepare:

float64 input_sample_grain = 1.0 / 64.0
auto hardswish_func = [input_sample_grain](int32 x) -> int32 {
    float64 v = (float64)x * input_sample_grain
    float64 w = v + 3.0
    w = (w < 0.0) ? 0.0 : ((w > 6.0) ? 6.0 : w)
    v = (v * w) / 6.0
    return std::lround(32768.0 * v)
}
float64 input_rescale_scale = (%input.scale * 128.0) / input_sample_grain
float64 output_rescale_scale = 1.0 / (128.0 * 32768.0 * %output.scale)
int32 quantized_3 = (int32)(std::ceil(3.0 / %input.scale)) + %input.zp

Legalization:

%table_const = get_table_const_tensor(hardswish_func)
%const_3 = tosa.CONST() {value={quantized_3}}
%op1_rescale_in = tosa.RESCALE(%input) {scale=input_rescale_scale, input_zp=%input.zp, output_zp=0} // %input.dtype->i16
%op2_table_op1 = tosa.TABLE(%op1_rescale_in, %table_const)
%op3_rescale_op2 = tosa.RESCALE(%op2_table_op1) {scale=output_rescale_scale, input_zp=0, output_zp=%output.zp} // i32->%output.dtype
%op4_rescale_in = tosa.RESCALE(%input) {scale=1.0, input_zp=0, output_zp=0} // %input.dtype->i32
%op5_ge_op4 = tosa.GREATER_EQUAL(%op4_rescale_in, %const_3)
%op6_select_op5_in_op3 = tosa.SELECT(%op5_ge_op4, %input, %op3_rescale_op2)

tfl.l2_normalization

No TOSA lowering defined.

tfl.lstm

No TOSA lowering defined.

tfl.leaky_relu

Leaky Relu Operator.

TensorFlow Lite Dialect

%output = tfl.leaky_relu(%input) {alpha}

TOSA Lowering

If input/output tensors are all non-quantized typed,

Legalization:

%const_0 = tosa.CONST() {value={0.0}}
%const_alpha = tosa.CONST() {value={alpha}}
%op1_mul_in_alpha = tosa.MUL(%input, %const_alpha)
%op2_ge_in_0 = tosa.GREATER_EQUAL(%input, %const_0)
%op3_select_op2_in_op1 = tosa.SELECT(%op2_ge_in_0, %input, %op1_mul_in_alpha)
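
A minimal Python sketch of the MUL, GREATER_EQUAL, SELECT sequence above, assuming the tensor is a flat list of floats (the function name is hypothetical):

```python
def leaky_relu(values, alpha):
    """Select the input where it is non-negative, else the alpha-scaled input."""
    op1 = [v * alpha for v in values]    # tosa.MUL
    op2 = [v >= 0.0 for v in values]     # tosa.GREATER_EQUAL against 0.0
    # tosa.SELECT(condition, on_true, on_false)
    return [v if keep else scaled for keep, v, scaled in zip(op2, values, op1)]

out = leaky_relu([-2.0, 0.0, 3.0], 0.5)
```

Both branches are computed for every element, and SELECT picks between them, so no control flow is needed in the lowered graph.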

If input/output tensors are all quantized typed,

Prepare:

float32 scaled_alpha = (%input.scale * alpha) / %output.scale
float32 scaled_identity = %input.scale / %output.scale

Legalization:

%const_0 = tosa.CONST() {value={0}}
%op1_rescale_in = tosa.RESCALE(%input) {scale=1.0, input_zp=%input.zp} // %input.dtype->i32
%op2_ge_in_0 = tosa.GREATER_EQUAL(%op1_rescale_in, %const_0)
%op3_rescale_in_alpha = tosa.RESCALE(%input) {scale=scaled_alpha, input_zp=%input.zp, output_zp=%output.zp} // %input.dtype->%output.dtype
%op4_rescale_in_identity = tosa.RESCALE(%input) {scale=scaled_identity, input_zp=%input.zp, output_zp=%output.zp} // %input.dtype->%output.dtype
%op5_select_op2_op3_op4 = tosa.SELECT(%op2_ge_in_0, %op4_rescale_in_identity, %op3_rescale_in_alpha)

tfl.less_equal

Less_equal operator.

TensorFlow Lite Dialect

%output = tfl.less_equal(%lhs, %rhs)

TOSA Lowering

If input/output tensors are all non-quantized typed,

Legalization:

%op1_greater_lhs_rhs = tosa.GREATER(%lhs, %rhs)
%op2_not_op1 = tosa.LOGICAL_NOT(%op1_greater_lhs_rhs)

If input/output tensors are all quantized typed,

Legalization:

assert (%lhs.scale == %rhs.scale) && (%lhs.zp == %rhs.zp)

%op1_rescale_lhs = tosa.RESCALE(%lhs) {scale=1.0, input_zp=%lhs.zp, output_zp=0} // %lhs.dtype->i32
%op2_rescale_rhs = tosa.RESCALE(%rhs) {scale=1.0, input_zp=%rhs.zp, output_zp=0} // %rhs.dtype->i32
%op3_greater_op1_op2 = tosa.GREATER(%op1_rescale_lhs, %op2_rescale_rhs)
%op4_not_op3 = tosa.LOGICAL_NOT(%op3_greater_op1_op2)

tfl.less

Less operator.

TensorFlow Lite Dialect

%output = tfl.less(%lhs, %rhs)

TOSA Lowering

If input/output tensors are all non-quantized typed,

Legalization:

%op1_ge_lhs_rhs = tosa.GREATER_EQUAL(%lhs, %rhs)
%op2_not_op1 = tosa.LOGICAL_NOT(%op1_ge_lhs_rhs)

If input/output tensors are all quantized typed,

Legalization:

assert (%lhs.scale == %rhs.scale) && (%lhs.zp == %rhs.zp)

%op1_rescale_lhs = tosa.RESCALE(%lhs) {scale=1.0, input_zp=%lhs.zp, output_zp=0} // %lhs.dtype->i32
%op2_rescale_rhs = tosa.RESCALE(%rhs) {scale=1.0, input_zp=%rhs.zp, output_zp=0} // %rhs.dtype->i32
%op3_ge_op1_op2 = tosa.GREATER_EQUAL(%op1_rescale_lhs, %op2_rescale_rhs)
%op4_not_op3 = tosa.LOGICAL_NOT(%op3_ge_op1_op2)

tfl.local_response_normalization

No TOSA lowering defined.

tfl.log

No TOSA lowering defined.

tfl.log_softmax

Log softmax operator.

TensorFlow Lite Dialect

%output = tfl.log_softmax(%input)

TOSA Lowering

If input/output tensors are all non-quantized typed,

Legalization:

%output = lower_log_softmax_op(%input)

No TOSA lowering defined if input/output tensors are all quantized typed.

tfl.logical_and

This operator is trivially lowered to tosa.LOGICAL_AND

tfl.logical_not

This operator is trivially lowered to tosa.LOGICAL_NOT

tfl.logical_or

This operator is trivially lowered to tosa.LOGICAL_OR

tfl.logistic

Logistic operator.

TensorFlow Lite Dialect

%y = tfl.logistic(%x)

TOSA Lowering

If input/output tensors are all non-quantized typed,

Legalization:

%op1_sigmoid_in = tosa.SIGMOID(%x)

If input/output tensors are all quantized typed,

Prepare:

float64 input_sample_grain = 1.0 / 16.0
auto sigmoid_func = [input_sample_grain](int32 x) -> int32 {
  float64 v = static_cast<float64>(x) * input_sample_grain
  v = 1.0 / (1.0 + std::exp(-v))
  return std::lround(32768.0 * v)
}

float32 input_rescale_scale = (%x.scale * 128.0) / input_sample_grain
float32 output_rescale_scale = 1.0 / (%y.scale * 32768.0 * 128.0)

Legalization:

%table_const = get_table_const_tensor(sigmoid_func)
%op1_rescale_in = tosa.RESCALE(%x) {scale=input_rescale_scale, input_zp=%x.zp, output_zp=0} // %x.dtype->i16
%op2_table_op1 = tosa.TABLE(%op1_rescale_in, %table_const)
%op3_rescale_op2 = tosa.RESCALE(%op2_table_op1) {scale=output_rescale_scale, input_zp=0, output_zp=%y.zp} // i32->%y.dtype
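
The table used by the quantized path can be sketched in Python. `get_table_const_tensor` is not defined in this section, so the layout below (one entry per integer input from -256 to 256, clamped to the int16 range) is an assumption, and `build_table` and `lround` are hypothetical helper names.

```python
import math

def lround(x):
    """Round half away from zero, like std::lround."""
    if x >= 0:
        return int(math.floor(x + 0.5))
    return -int(math.floor(-x + 0.5))

def build_table(func, lo=-256, hi=256):
    """Assumed layout: one int16-clamped entry per integer input in [lo, hi]."""
    return [max(-32768, min(32767, func(i))) for i in range(lo, hi + 1)]

input_sample_grain = 1.0 / 16.0

def sigmoid_func(x):
    v = float(x) * input_sample_grain      # table index -> real-valued input
    v = 1.0 / (1.0 + math.exp(-v))         # logistic function
    return lround(32768.0 * v)             # scale the result by 2^15

table = build_table(sigmoid_func)
```

With a sample grain of 1/16, the 513 entries cover real inputs from -16 to 16. The middle entry corresponds to an input of zero and holds 16384, which is 0.5 scaled by 32768.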

tfl.matrix_diag

No TOSA lowering defined.

tfl.matrix_set_diag

No TOSA lowering defined.

tfl.max_pool_2d

Max Pool 2d op.

TensorFlow Lite Dialect

%output = tfl.max_pool_2d(%input) {filter_height, filter_width, padding, stride_h, stride_w, fused_activation_function}

TOSA Lowering

Prepare:

tosa_padding =
     get_padding_values_from_pad_type(padding, NHWC, 1,
                                      %input.type, tensor<{filter_height, filter_width}, tosa.int32>,
                                      {1, stride_h, stride_w, 1}, {1, 1, 1, 1})

If input/output tensors are all non-quantized typed,

Legalization:

%maxpool2d = tosa.MAX_POOL2D(%input) {kernel={filter_height, filter_width}, stride={stride_h, stride_w}, padding=tosa_padding}
if(fused_activation_function != NONE) {
    %result = convert_fused_activation(%maxpool2d, fused_activation_function)
}
else {
    %result = %maxpool2d
}

If input/output tensors are all quantized typed,

Legalization:

%maxpool2d = tosa.MAX_POOL2D(%input) {kernel={filter_height, filter_width}, stride={stride_h, stride_w}, padding=tosa_padding, quantization_info={input_zp=%input.zp, output_zp=%output.zp}}
if(fused_activation_function != NONE) {
    %result = convert_fused_activation(%maxpool2d, fused_activation_function)
}
else {
    %result = %maxpool2d
}

tfl.max_pooling_with_argmax_2d

No TOSA lowering defined.

tfl.max_unpooling_2d

No TOSA lowering defined.

tfl.maximum

This operator is trivially lowered to tosa.MAXIMUM

tfl.mean

Mean operator.

TensorFlow Lite Dialect

%output = tfl.mean(%input, %axis) {keep_dims}

TOSA Lowering

Prepare:

int32 num_elements_on_axis = 1
for (int32 i : %axis.as_constant()) {
    num_elements_on_axis *= %input.shape[i]
}
float32 div_scale = 1.0 / num_elements_on_axis

If input/output tensors are all non-quantized typed,

Legalization:

%cst_div_scale = tosa.CONST() {value={div_scale}}
%op1_rsum_in = lower_reduce_op<tosa.REDUCE_SUM>(%input, %output.shape, %axis, keep_dims)
%op2_mul_op1 = tosa.MUL(%op1_rsum_in, %cst_div_scale)
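
The non-quantized mean is a sum followed by a multiply with a precomputed reciprocal. A minimal Python sketch for a 2-D input reduced over its last axis (the function name is hypothetical):

```python
def mean_over_last_axis(rows):
    """REDUCE_SUM over the last axis, then MUL by 1 / element count."""
    num_elements_on_axis = len(rows[0])
    div_scale = 1.0 / num_elements_on_axis
    op1 = [sum(r) for r in rows]            # tosa.REDUCE_SUM
    return [s * div_scale for s in op1]     # tosa.MUL with the constant

means = mean_over_last_axis([[1.0, 2.0, 3.0, 6.0], [4.0, 4.0, 4.0, 4.0]])
```

Because the reduced extent is known at compile time, the division folds into a constant and no divide operator is needed.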

If input/output tensors are all quantized typed,

Legalization:

%rsum = lower_reduce_op<tosa.REDUCE_SUM>(%input, %output.shape, %axis, keep_dims, 1.0f, %input.zp, div_scale * %input.scale / %output.scale, %output.zp)

tfl.minimum

This operator is trivially lowered to tosa.MINIMUM

tfl.mirror_pad

No TOSA lowering defined.

tfl.mul

Mul operator.

TensorFlow Lite Dialect

%output = tfl.mul(%lhs, %rhs)

TOSA Lowering

If input/output tensors are all non-quantized typed,

Legalization:

%op1_mul_in = tosa.MUL(%lhs, %rhs)

If input/output tensors are all quantized typed,

Legalization:

%op1_rescale_lhs = tosa.RESCALE(%lhs) {scale=1.0f, input_zp=%lhs.zp, output_zp=0} // %lhs.dtype->i32
%op2_rescale_rhs = tosa.RESCALE(%rhs) {scale=1.0f, input_zp=%rhs.zp, output_zp=0} // %rhs.dtype->i32
%op3_mul_op1_op2 = tosa.MUL(%op1_rescale_lhs, %op2_rescale_rhs)
%op4_rescale_op3 = tosa.RESCALE(%op3_mul_op1_op2) {scale=%lhs.scale * %rhs.scale / %output.scale, input_zp=0, output_zp=%output.zp} // i32->%output.dtype
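
The quantized multiply can be checked on scalars with a short Python sketch. Quantization parameters are passed as hypothetical `(scale, zero_point)` pairs, and Python's `round` stands in for RESCALE rounding.

```python
def quantized_mul(q_lhs, q_rhs, lhs, rhs, out):
    """lhs, rhs, out are (scale, zero_point) pairs."""
    op1 = q_lhs - lhs[1]                 # RESCALE scale=1.0: remove lhs zero-point
    op2 = q_rhs - rhs[1]                 # RESCALE scale=1.0: remove rhs zero-point
    op3 = op1 * op2                      # tosa.MUL in i32
    scale = lhs[0] * rhs[0] / out[0]     # combined rescale factor
    return round(op3 * scale) + out[1]   # RESCALE to the output dtype

# Real values: (20 - 4) * 0.5 = 8 and (10 - 2) * 0.25 = 2, so the product is 16.
q_out = quantized_mul(20, 10, lhs=(0.5, 4), rhs=(0.25, 2), out=(0.5, 1))
```

The product of two zero-point-corrected integers carries the scale `lhs.scale * rhs.scale`, so a single rescale by that product over `output.scale` lands in output units.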

tfl.neg

This operator is trivially lowered to tosa.NEGATE

tfl.non_max_suppression_v4

No TOSA lowering defined.

tfl.non_max_suppression_v5

No TOSA lowering defined.

tfl.not_equal

Not_equal operator.

TensorFlow Lite Dialect

%output = tfl.not_equal(%lhs, %rhs)

TOSA Lowering

If input/output tensors are all non-quantized typed,

Legalization:

%op1_equal_lhs_rhs = tosa.EQUAL(%lhs, %rhs)
%op2_not_op1 = tosa.LOGICAL_NOT(%op1_equal_lhs_rhs)

If input/output tensors are all quantized typed,

Legalization:

assert (%lhs.scale == %rhs.scale) && (%lhs.zp == %rhs.zp)

%op1_rescale_lhs = tosa.RESCALE(%lhs) {scale=1.0f, input_zp=%lhs.zp, output_zp=0} // %lhs.dtype->i32
%op2_rescale_rhs = tosa.RESCALE(%rhs) {scale=1.0f, input_zp=%rhs.zp, output_zp=0} // %rhs.dtype->i32
%op3_equal_op1_op2 = tosa.EQUAL(%op1_rescale_lhs, %op2_rescale_rhs)
%op4_not_op3 = tosa.LOGICAL_NOT(%op3_equal_op1_op2) // i32->%output.dtype

tfl.NumericVerify

No TOSA lowering defined.

tfl.one_hot

OneHot operator.

TensorFlow Lite Dialect

%output = tfl.one_hot(%indices, %depth, %on_value, %off_value) {axis}

TOSA Lowering

%output = lower_one_hot_op(%indices, %depth, %on_value, %off_value, axis)

tfl.prelu

No TOSA lowering defined.

tfl.pack

Packs a list of tensors along a dimension into one tensor.

TensorFlow Lite Dialect

%output = tfl.pack(%values) {axis}

TOSA Lowering

%output = lower_pack_op(%values, axis)

tfl.pad

This operator is trivially lowered to tosa.PAD

tfl.padv2

No TOSA lowering defined.

tfl.pow

No TOSA lowering defined.

tfl.pseudo_qconst

This operator is trivially lowered to tosa.CONST

tfl.quantize

Quantize operator

TensorFlow Lite Dialect

%output = tfl.quantize(%input)

TOSA Lowering

Legalization:

if (isa<QuantizedType>(%input.dtype)) {
    %output = tosa.RESCALE(%input) {scale=%input.scale / %output.scale, input_zp=%input.zp, output_zp=%output.zp}
}
else {
    %output = lower_quantize_op(%output.dtype, %input, %output.zp, %output.scale)
}

tfl.range

No TOSA lowering defined.

tfl.rank

Rank operator

TensorFlow Lite Dialect

%output = tfl.rank(%input)

TOSA Lowering

Legalization:

%const = tosa.CONST() {value={%input.rank}}

tfl.reduce_any

Computes the "logical or" of elements across dimensions of a tensor.

TensorFlow Lite Dialect

%output = tfl.reduce_any(%input, %reduction_indices) {keep_dims}

TOSA Lowering

Legalization:

%op1_rsum_in = lower_reduce_op<tosa.REDUCE_ANY>(%input, %output.shape, %reduction_indices, keep_dims)

tfl.reduce_max

Max-reduction operator.

TensorFlow Lite Dialect

%output = tfl.reduce_max(%input, %axes) {keep_dims}

TOSA Lowering

Legalization:

%op1_rmax_in = lower_reduce_op<tosa.REDUCE_MAX>(%input, %output.shape, %axes, keep_dims)

tfl.reduce_min

Computes the min reduction along the specified axes.

TensorFlow Lite Dialect

%output = tfl.reduce_min(%input, %axes) {keep_dims}

TOSA Lowering

Legalization:

%op1_rmin_in = lower_reduce_op<tosa.REDUCE_MIN>(%input, %output.shape, %axes, keep_dims)

tfl.reduce_prod

Prod-reduction operator.

TensorFlow Lite Dialect

%output = tfl.reduce_prod(%input, %axes) {keep_dims}

TOSA Lowering

If input/output tensors are all float typed,

Legalization:

%op1_rprod_in = lower_reduce_op<tosa.REDUCE_PROD>(%input, %output.shape, %axes, keep_dims)

tfl.relu_n1_to_1

No TOSA lowering defined.

tfl.relu6

Relu6 operator.

TensorFlow Lite Dialect

%y = tfl.relu6(%x)

TOSA Lowering

If input/output tensors are all non-quantized typed,

Legalization:

%op1_relun_in = tosa.RELUN(%x) {max_int=0, max_fp=6.0}

If input/output tensors are all quantized typed,

Legalization:

%op1_rescale_in = tosa.RESCALE(%x) {scale=%x.scale / %y.scale, input_zp=%x.zp, output_zp=0} // %x.dtype->i32
%op2_relun_op1 = tosa.RELUN(%op1_rescale_in) {max_int=(6.0 / %y.scale), max_fp=0.0}
%op3_rescale_op2 = tosa.RESCALE(%op2_relun_op1) {scale=1.0, input_zp=0, output_zp=%y.zp} // i32->%y.dtype
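
The quantized Relu6 path clamps in the output scale before the output zero-point is added. A scalar Python sketch under stated assumptions: quantization parameters are hypothetical `(scale, zero_point)` pairs, `6.0 / y.scale` is truncated to an integer, and Python's `round` stands in for RESCALE rounding.

```python
def quantized_relu6(q_x, x, y):
    """x and y are (scale, zero_point) pairs for the input and output."""
    op1 = round((q_x - x[1]) * (x[0] / y[0]))   # RESCALE to i32 in output scale
    max_int = int(6.0 / y[0])                   # real 6.0 in output units (truncation assumed)
    op2 = min(max(op1, 0), max_int)             # tosa.RELUN
    return op2 + y[1]                           # RESCALE scale=1.0 adds output zero-point

x_q, y_q = (0.25, 10), (0.125, -20)
outs = [quantized_relu6(q, x_q, y_q) for q in (0, 14, 50)]
```

The three inputs represent real values -2.5, 1.0 and 10.0, which clamp to 0.0, 1.0 and 6.0 before being expressed in the output quantization.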

tfl.relu

Relu operator.

TensorFlow Lite Dialect

%y = tfl.relu(%x)

TOSA Lowering

If input/output tensors are all non-quantized typed,

Legalization:

%op1_relun_in = tosa.RELUN(%x) {max_int=0, max_fp=std::numeric_limits<float>::max()}

If input/output tensors are all quantized typed,

Legalization:

%op1_rescale_in = tosa.RESCALE(%x) {scale=%x.scale / %y.scale, input_zp=%x.zp, output_zp=0} // %x.dtype->i32
%op2_relun_op1 = tosa.RELUN(%op1_rescale_in) {max_int=std::numeric_limits<int32>::max(), max_fp=0.0}
%op3_rescale_op2 = tosa.RESCALE(%op2_relun_op1) {scale=1.0, input_zp=0, output_zp=%y.zp} // i32->%y.dtype

tfl.reshape

This operator is trivially lowered to tosa.RESHAPE

tfl.resize_bilinear

ResizeBilinear Op.

TensorFlow Lite Dialect

%output = tfl.resize_bilinear(%input, %size) {align_corners, half_pixel_centers}

TOSA Lowering

%output = lower_resize_op(%input, %size, %input.dtype, "BILINEAR")

tfl.resize_nearest_neighbor

ResizeNearestNeighbor Op.

TensorFlow Lite Dialect

%output = tfl.resize_nearest_neighbor(%input, %size) {align_corners, half_pixel_centers}

TOSA Lowering

%output = lower_resize_op(%input, %size, %input.dtype, "NEAREST_NEIGHBOR")

tfl.reverse_sequence

No TOSA lowering defined.

tfl.reverse_v2

ReverseV2 Operator.

TensorFlow Lite Dialect

%output = tfl.reverse_v2(%input, %axis)

TOSA Lowering

%output = lower_reversev2_op(%input, %axis)

tfl.round

Round operator.

TensorFlow Lite Dialect

%output = tfl.round(%input)

TOSA Lowering

%const_half = tosa.CONST() {value={0.5}}
%op1_add_in_half = tosa.ADD(%input, %const_half)
%op2_floor_op1 = tosa.FLOOR(%op1_add_in_half)
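
The ADD and FLOOR pair above rounds to the nearest integer with ties going upward. A scalar Python sketch (the function name is hypothetical):

```python
import math

def round_via_floor(x):
    """Trace ADD 0.5 -> FLOOR for one float value."""
    op1 = x + 0.5              # tosa.ADD with const 0.5
    return math.floor(op1)     # tosa.FLOOR

results = [round_via_floor(v) for v in (1.4, 2.5, -1.5)]
```

For exact ties this differs from round-half-to-even: `round_via_floor(2.5)` gives 3, whereas Python's built-in `round(2.5)` gives 2.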

tfl.rsqrt

No TOSA lowering defined.

tfl.svdf

No TOSA lowering defined.

tfl.segment_sum

No TOSA lowering defined.

tfl.select

This operator is trivially lowered to tosa.SELECT

tfl.select_v2

This operator is trivially lowered to tosa.SELECT

tfl.shape

Shape operator

TensorFlow Lite Dialect

%output = tfl.shape(%input)

TOSA Lowering

Legalization:

%const = tosa.CONST() {value=%input.shape}

tfl.sin

No TOSA lowering defined.

tfl.slice

This operator is trivially lowered to tosa.SLICE

tfl.softmax

Softmax operator.

TensorFlow Lite Dialect

%output = tfl.softmax(%input)

TOSA Lowering

If input/output tensors are all non-quantized typed,

Legalization:

%op1_exp_in = tosa.EXP(%input)
%op2_rsum_op1 = tosa.REDUCE_SUM(%op1_exp_in) {axis=(%input.rank-1)}
%op3_rcp_op2 = tosa.RECIPROCAL(%op2_rsum_op1)
%op4_mul_op1_op3 = tosa.MUL(%op1_exp_in, %op3_rcp_op2)

If input/output tensors are all quantized typed,

Prepare:

float64 exp_sample_grain = 1.0 / 16.0
auto exp_func = [exp_sample_grain](int32 x) -> int32 {
  double v = static_cast<float64>(x) * exp_sample_grain
  v = v < 0.0 ? std::exp(v) : 1.0
  return std::lround(32768.0 * v)
}

float64 one_over_one_plus_x_sample_grain = 1.0 / 256.0
auto one_over_one_plus_x_func = [one_over_one_plus_x_sample_grain](int32 x) -> int32 {
  double v = static_cast<float64>(x) * one_over_one_plus_x_sample_grain
  v = v < 0.0 ? 1.0 : 1.0 / (1.0 + v)
  return std::lround(32768.0 * v)
}

float64 op4_rescale_scale = (%input.scale * 128.0) / exp_sample_grain
float64 op19_rescale_scale = 1.0 / (%output.scale * 256.0)
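
To make the two table functions concrete, here is a hedged Python transcription that can be evaluated at individual integer sample points. The `lround` helper is a hypothetical stand-in for `std::lround` (round half away from zero). How `get_table_const_tensor` samples these functions is not shown in this sketch.

```python
import math

EXP_SAMPLE_GRAIN = 1.0 / 16.0
ONE_OVER_ONE_PLUS_X_SAMPLE_GRAIN = 1.0 / 256.0

def lround(v: float) -> int:
    """Round to the nearest integer, halves away from zero (like std::lround)."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))

def exp_func(x: int) -> int:
    """exp(v) for negative v, clamped to 1.0 otherwise, scaled by 2^15."""
    v = x * EXP_SAMPLE_GRAIN
    v = math.exp(v) if v < 0.0 else 1.0
    return lround(32768.0 * v)

def one_over_one_plus_x_func(x: int) -> int:
    """1 / (1 + v) for non-negative v, clamped to 1.0 otherwise, scaled by 2^15."""
    v = x * ONE_OVER_ONE_PLUS_X_SAMPLE_GRAIN
    v = 1.0 if v < 0.0 else 1.0 / (1.0 + v)
    return lround(32768.0 * v)

print(exp_func(0), exp_func(-16), one_over_one_plus_x_func(256))
```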

Legalization:

%const_exp_table = get_table_const_tensor(exp_func)
%const_one_over_one_plus_x_table = get_table_const_tensor(one_over_one_plus_x_func)
%const_3 = tosa.CONST() {value={3}}
%const_34 = tosa.CONST() {value={12+20-8}}
%const_2_to_31 = tosa.CONST() {value={1<<31}}
%const_16 = tosa.CONST() {value={16}}

%op1_rescale_in = tosa.RESCALE(%input) {scale=1.0f, input_zp=%input.zp, output_zp=0} // %input.dtype->i32
%op2_rmax_op1 = tosa.REDUCE_MAX(%op1_rescale_in) {axis=(%input.rank-1)}
%op3_sub_op1_op2 = tosa.SUB(%op1_rescale_in, %op2_rmax_op1)
%op4_rescale_op3 = tosa.RESCALE(%op3_sub_op1_op2) {scale=op4_rescale_scale, input_zp=0, output_zp=0} // i32->i16
%op5_table_op4 = tosa.TABLE(%op4_rescale_op3, %const_exp_table)
%op6_rshift_op5_3 = tosa.ARITHMETIC_RIGHT_SHIFT(%op5_table_op4, %const_3)
%op7_rsum_op6 = tosa.REDUCE_SUM(%op6_rshift_op5_3) {axis=(%input.rank-1)}
%op8_clz_op7 = tosa.CLZ(%op7_rsum_op6)
%op9_sub_34_op8 = tosa.SUB(%const_34, %op8_clz_op7)
%op10_lshift_op7_op8 = tosa.LOGICAL_LEFT_SHIFT(%op7_rsum_op6, %op8_clz_op7)
%op11_sub_op10 = tosa.SUB(%op10_lshift_op7_op8, %const_2_to_31)
%op12_rshift_op11_16 = tosa.ARITHMETIC_RIGHT_SHIFT(%op11_sub_op10, %const_16)
%op13_cast_op12 = tosa.CAST(%op12_rshift_op11_16) // i32->i16
%op14_table_op13 = tosa.TABLE(%op13_cast_op12, %const_one_over_one_plus_x_table)
%op15_rescale_op14 = tosa.RESCALE(%op14_table_op13) {scale=1.0/128.0, input_zp=0, output_zp=0} // i32->i16
%op16_rescale_op5 = tosa.RESCALE(%op5_table_op4) {scale=1.0/128.0, input_zp=0, output_zp=0} // i32->i16
%op17_mul_op16_op15 = tosa.MUL(%op15_rescale_op14, %op16_rescale_op5)
%op18_rshift_op17_op9 = tosa.ARITHMETIC_RIGHT_SHIFT(%op17_mul_op16_op15, %op9_sub_34_op8)
%op19_rescale_op18 = tosa.RESCALE(%op18_rshift_op17_op9) {scale=op19_rescale_scale, input_zp=0, output_zp=%output.zp}

tfl.space_to_batch_nd

SpaceToBatchNd operator.

TensorFlow Lite Dialect

%output = tfl.space_to_batch_nd(%input, %block_shape, %paddings)

TOSA Lowering

%output = lower_space_to_batch_nd_op(%input, %block_shape, %paddings)

tfl.space_to_depth

SpaceToDepth operator.

TensorFlow Lite Dialect

%output = tfl.space_to_depth(%input) {block_size}

TOSA Lowering

%output = lower_space_to_depth_op(%input, block_size, "NHWC")

tfl.pseudo_sparse_const

No TOSA lowering defined.

tfl.pseudo_sparse_qconst

No TOSA lowering defined.

tfl.sparse_to_dense

No TOSA lowering defined.

tfl.split

Splits a tensor into num_split tensors along one dimension.

TensorFlow Lite Dialect

%output = tfl.split(%split_dim, %value) {num_split}

TOSA Lowering

%output = lower_split_op(%value, %split_dim.as_constant(), num_split)

tfl.split_v

Splits a tensor into num_splits tensors along one dimension, with sizes given by %size_splits.

TensorFlow Lite Dialect

%output = tfl.split_v(%value, %size_splits, %split_dim) {num_splits}

TOSA Lowering

%output = lower_splitv_op(%value, %size_splits.as_constant(), %split_dim.as_constant())

tfl.sqrt

No TOSA lowering defined.

tfl.square

Square operator.

TensorFlow Lite Dialect

%y = tfl.square(%x)

TOSA Lowering

If input/output tensors are all non-quantized typed,

Legalization:

%op1_mul_in = tosa.MUL(%x, %x)

If input/output tensors are all quantized typed,

Legalization:

%op1_rescale_x = tosa.RESCALE(%x) {scale=1.0f, input_zp=%x.zp, output_zp=0} // %x.dtype->i32
%op2_mul_op1_op1 = tosa.MUL(%op1_rescale_x, %op1_rescale_x)
%op3_rescale_op2 = tosa.RESCALE(%op2_mul_op1_op1) {scale=(%x.scale * %x.scale) / %y.scale, input_zp=0, output_zp=%y.zp} // i32->%y.dtype
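
The quantized path can be checked numerically with the sketch below, which follows the same three steps on a single element. It is an illustration under stated assumptions: the output is taken to be int8 (the `qmin`/`qmax` defaults), the rescale is done in floating point instead of the multiplier-and-shift form a real tosa.RESCALE uses, and the names `quantized_square` and `lround` are hypothetical.

```python
import math

def lround(v: float) -> int:
    """Round to the nearest integer, halves away from zero."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))

def quantized_square(q_x, x_scale, x_zp, y_scale, y_zp, qmin=-128, qmax=127):
    centered = q_x - x_zp                  # RESCALE: remove the input zero point, widen to i32
    squared = centered * centered          # MUL on i32 values
    scale = (x_scale * x_scale) / y_scale  # product of input scales over the output scale
    q_y = lround(squared * scale) + y_zp   # RESCALE: apply the scale, add the output zero point
    return max(qmin, min(qmax, q_y))       # saturate to the assumed int8 range

# Real value 0.5 * (7 - 3) = 2.0, squared 4.0, re-quantized as 4.0 / 0.25 + (-1).
print(quantized_square(7, 0.5, 3, 0.25, -1))
```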

tfl.squared_difference

Squared difference operator.

TensorFlow Lite Dialect

%output = tfl.squared_difference(%lhs, %rhs)

TOSA Lowering

Legalization:

%op1_sub_in = tosa.SUB(%lhs, %rhs)
%op2_mul_op1 = tosa.MUL(%op1_sub_in, %op1_sub_in)

tfl.squeeze

Removes dimensions of size 1 from the shape of a tensor.

TensorFlow Lite Dialect

%output = tfl.squeeze(%input) {squeeze_dims}

TOSA Lowering

%output = lower_squeeze_op(%input, squeeze_dims)

tfl.strided_slice

StridedSlice Op.

TensorFlow Lite Dialect

%output = tfl.strided_slice(%input, %begin, %end, %strides) {begin_mask, end_mask, ellipsis_mask, new_axis_mask, shrink_axis_mask}

TOSA Lowering

%output = lower_strided_slice_op(%input, %begin, %end, %strides, begin_mask, end_mask, ellipsis_mask, new_axis_mask, shrink_axis_mask)

tfl.sub

This operator is trivially lowered to tosa.SUB

tfl.sum

Sum operator.

TensorFlow Lite Dialect

%output = tfl.sum(%input, %axis) {keep_dims}

TOSA Lowering

If input/output tensors are all non-quantized typed,

Legalization:

%op1_rsum_in = lower_reduce_op<tosa.REDUCE_SUM>(%input, %output.shape, %axis, keep_dims)

If input/output tensors are all quantized typed,

Legalization:

%rsum = lower_reduce_op<tosa.REDUCE_SUM>(%input, %output.shape, %axis, keep_dims, 1.0f, %input.zp, (%input.scale / %output.scale), %output.zp)
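
The arithmetic implied by these arguments can be sketched as follows, assuming that lower_reduce_op wraps the reduction between an input rescale (scale 1.0, removing the input zero point) and an output rescale (scale %input.scale / %output.scale, adding the output zero point). That reading of the helper, the int8 output range, and the names `quantized_sum` and `lround` are assumptions of this sketch.

```python
import math

def lround(v: float) -> int:
    """Round to the nearest integer, halves away from zero."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))

def quantized_sum(q_values, in_scale, in_zp, out_scale, out_zp, qmin=-128, qmax=127):
    centered = [q - in_zp for q in q_values]  # input rescale: scale 1.0, zero point removed
    acc = sum(centered)                       # REDUCE_SUM on i32 values
    q_out = lround(acc * (in_scale / out_scale)) + out_zp  # output rescale
    return max(qmin, min(qmax, q_out))        # saturate to the assumed int8 range

# Real values 2.0 + 3.0 + 4.0 = 9.0, re-quantized as 9.0 / 1.0 + 2.
print(quantized_sum([5, 7, 9], 0.5, 1, 1.0, 2))
```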

tfl.tanh

Hyperbolic tangent operator.

TensorFlow Lite Dialect

%y = tfl.tanh(%x)

TOSA Lowering

If input/output tensors are all non-quantized typed,

Legalization:

%op1_tanh_in = tosa.TANH(%x)

If input/output tensors are all quantized typed,

Prepare:

float64 input_sample_grain = 1.0 / 32.0
auto tanh_func = [input_sample_grain](int32 x) -> int32 {
  float64 v = static_cast<float64>(x) * input_sample_grain
  v = std::exp(-2.0 * v)
  v = (1.0 - v) / (1.0 + v)
  return std::lround(32768.0 * v)
}

float32 input_rescale_scale = (%x.scale * 128.0) / input_sample_grain
float32 output_rescale_scale = 1.0 / (%y.scale * 32768.0 * 128.0)
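
For reference, the Prepare step can be transcribed into Python as below. This is a sketch, not the reference implementation: it calls `math.tanh`, which is algebraically equal to the (1 - e^(-2v)) / (1 + e^(-2v)) form in the pseudo-code and avoids overflow in `math.exp` for large negative inputs. The `lround` helper and the two scale functions are hypothetical names.

```python
import math

INPUT_SAMPLE_GRAIN = 1.0 / 32.0

def lround(v: float) -> int:
    """Round to the nearest integer, halves away from zero (like std::lround)."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))

def tanh_func(x: int) -> int:
    """tanh sampled at x * INPUT_SAMPLE_GRAIN, scaled by 2^15."""
    v = x * INPUT_SAMPLE_GRAIN
    return lround(32768.0 * math.tanh(v))

def input_rescale_scale(x_scale: float) -> float:
    return (x_scale * 128.0) / INPUT_SAMPLE_GRAIN

def output_rescale_scale(y_scale: float) -> float:
    return 1.0 / (y_scale * 32768.0 * 128.0)

print(tanh_func(0), tanh_func(32), input_rescale_scale(1.0 / 128.0))
```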

Legalization:

%table_const = get_table_const_tensor(tanh_func)
%op1_rescale_in = tosa.RESCALE(%x) {scale=input_rescale_scale, input_zp=%x.zp, output_zp=0} // %x.dtype->i16
%op2_table_op1 = tosa.TABLE(%op1_rescale_in, %table_const)
%op3_rescale_op2 = tosa.RESCALE(%op2_table_op1) {scale=output_rescale_scale, input_zp=0, output_zp=%y.zp} // i32->%y.dtype

tfl.tile

This operator is trivially lowered to tosa.TILE

tfl.topk_v2

No TOSA lowering defined.

tfl.transpose_conv

Transpose convolution operator.

TensorFlow Lite Dialect

%output = tfl.transpose_conv(%output_shape, %weights, %input) {padding, stride_h, stride_w}

TOSA Lowering

Prepare:

tosa_padding =
    get_transpose_conv2d_padding_values_from_pad_type(%input.type, %weights.type, %output_shape, padding, "NHWC", FORMAT_HWIO, {stride_h, stride_w}, {1, 1})

If input/output tensors are all non-quantized typed,

Legalization:

%bias = tosa.CONST() {value={0.0} * %output.shape[3]}
%conv2d = tosa.TRANSPOSE_CONV2D(%input, %weights, %bias) {padding=tosa_padding, stride={stride_h, stride_w}, dilation={1, 1}}

If input/output tensors are all quantized typed,

Prepare:

float64 output_rescale_scale = (%input.scale * %weights.scale) / %output.scale

Legalization:

%bias = tosa.CONST() {value={0} * %output.shape[3]}
%conv2d = tosa.TRANSPOSE_CONV2D(%input, %weights, %bias) {padding=tosa_padding, stride={stride_h, stride_w}, dilation={1, 1}}
%rescale = tosa.RESCALE(%conv2d) {scale=output_rescale_scale, input_zp=0, output_zp=%output.zp} // %conv2d.dtype->%output.dtype

tfl.transpose

This operator is trivially lowered to tosa.TRANSPOSE

tfl.unidirectional_sequence_lstm

No TOSA lowering defined.

tfl.unidirectional_sequence_rnn

No TOSA lowering defined.

tfl.unique

No TOSA lowering defined.

tfl.unpack

Unpacks a tensor along a dimension into multiple tensors.

TensorFlow Lite Dialect

%output = tfl.unpack(%input) {num, axis}

TOSA Lowering

%output = lower_unpack_op(%input, axis, num)

tfl.where

No TOSA lowering defined.

tfl.while

No TOSA lowering defined.

tfl.yield

This operator is trivially lowered to tosa.YIELD

tfl.zeros_like

ZerosLike operator.

TensorFlow Lite Dialect

%output = tfl.zeros_like(%input)

TOSA Lowering

%output = tosa.CONST() {value={0} * %input.num_elements}

fuse_tf_bias

Legalize (tf.Conv2D + tf.BiasAdd) to tosa.CONV2D. This is currently the only N:1 mapping in TOSA legalization.

From:

%conv2d = tf.Conv2D(%input, %filter) {...}
%bias_add = tf.BiasAdd(%conv2d, %bias)

To:

%conv2d = tosa.CONV2D(%input, %filter, %bias)
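
The rewrite can be illustrated on a toy op list in Python. The dictionary-based op representation and the function name `fuse_conv2d_bias` are hypothetical, and the sketch assumes the tf.BiasAdd immediately follows the tf.Conv2D it consumes and is that convolution's only user. The fused op keeps the BiasAdd result name so that later consumers stay valid.

```python
def fuse_conv2d_bias(ops):
    """Replace each (tf.Conv2D, tf.BiasAdd) pair with a single tosa.CONV2D."""
    fused = []
    i = 0
    while i < len(ops):
        cur = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        is_pair = (
            cur["op"] == "tf.Conv2D"
            and nxt is not None
            and nxt["op"] == "tf.BiasAdd"
            and nxt["inputs"][0] == cur["name"]  # BiasAdd consumes this conv
        )
        if is_pair:
            inp, flt = cur["inputs"]
            bias = nxt["inputs"][1]
            fused.append({"name": nxt["name"], "op": "tosa.CONV2D",
                          "inputs": [inp, flt, bias]})
            i += 2  # both source ops are consumed by the fused op
        else:
            fused.append(cur)
            i += 1
    return fused

graph = [
    {"name": "%conv2d", "op": "tf.Conv2D", "inputs": ["%input", "%filter"]},
    {"name": "%bias_add", "op": "tf.BiasAdd", "inputs": ["%conv2d", "%bias"]},
]
print(fuse_conv2d_bias(graph))
```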

convert_tfl_uint8

This pass does three things:

Convert constants from quantized uint8 to quantized int8, remapping the stored values accordingly.
If an input placeholder is quantized uint8 typed, insert "tosa.RESCALE() {scale=1.0, input_zp=input_zp, output_zp=input_zp-128} // qu8->qi8" between it and its consumers.
If an output tensor is quantized uint8 typed, insert "tosa.RESCALE() {scale=1.0, input_zp=output_zp-128, output_zp=output_zp} // qi8->qu8" between its producer and the output.
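
The uint8-to-int8 remapping amounts to shifting both the stored value and the zero point by 128, which leaves the represented real value unchanged. The sketch below shows this on single elements; the function names are hypothetical, and the rescale is modeled as plain integer arithmetic because the scale is 1.0.

```python
def qu8_to_qi8(q_u8: int, zp_u8: int):
    """Model RESCALE with scale=1.0, input_zp=zp_u8, output_zp=zp_u8 - 128."""
    zp_i8 = zp_u8 - 128
    q_i8 = (q_u8 - zp_u8) + zp_i8  # simplifies to q_u8 - 128
    return q_i8, zp_i8

def dequantize(q: int, scale: float, zp: int) -> float:
    """Real value represented by a quantized element."""
    return scale * (q - zp)

scale = 0.1
q_u8, zp_u8 = 200, 128
q_i8, zp_i8 = qu8_to_qi8(q_u8, zp_u8)
# Both encodings represent the same real value.
print(dequantize(q_u8, scale, zp_u8), dequantize(q_i8, scale, zp_i8))
```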