Operator Changelog

This file is automatically generated from the def files via this script. Do not modify directly and instead edit operator definitions.

For an operator input/output's differentiability, it can be differentiable, non-differentiable, or undefined. If a variable's differentiability is not specified, that variable has undefined differentiability.

ai.onnx (default)

Version 1 of the default ONNX operator set

<a name="Abs-1"></a>Abs-1</a>

Absolute takes one input data (Tensor<T>) and produces one output data (Tensor<T>) where the absolute is, y = abs(x), is applied to the tensor elementwise.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>consumed_inputs</tt> : list of ints</dt> <dd>legacy optimization attribute.</dd> </dl>

Inputs

<dl> <dt><tt>X</tt> : T</dt> <dd>Input tensor</dd> </dl>

Outputs

<dl> <dt><tt>Y</tt> : T</dt> <dd>Output tensor</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="Add-1"></a>Add-1</a>

Performs element-wise binary addition (with limited broadcast support).

If necessary the right-hand-side argument will be broadcasted to match the shape of left-hand-side argument. When broadcasting is specified, the second tensor can either be of element size 1 (including a scalar tensor and any tensor with rank equal to or smaller than the first tensor), or having its shape as a contiguous subset of the first tensor's shape. The starting of the mutually equal shape is specified by the argument "axis", and if it is not set, suffix matching is assumed. 1-dim expansion doesn't work yet.

For example, the following tensor shapes are supported (with broadcast=1):

shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar tensor
shape(A) = (2, 3, 4, 5), shape(B) = (1, 1), i.e. B is an 1-element tensor
shape(A) = (2, 3, 4, 5), shape(B) = (5,)
shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
shape(A) = (2, 3, 4, 5), shape(B) = (2), with axis=0

Attribute broadcast=1 needs to be passed to enable broadcasting.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>axis</tt> : int</dt> <dd>If set, defines the broadcast dimensions. See doc for details.</dd> <dt><tt>broadcast</tt> : int (default is 0)</dt> <dd>Pass 1 to enable broadcasting</dd> <dt><tt>consumed_inputs</tt> : list of ints</dt> <dd>legacy optimization attribute.</dd> </dl>

Inputs

<dl> <dt><tt>A</tt> : T</dt> <dd>First operand, should share the type with the second operand.</dd> <dt><tt>B</tt> : T</dt> <dd>Second operand. With broadcasting can be of smaller size than A. If broadcasting is disabled it should be of the same size.</dd> </dl>

Outputs

<dl> <dt><tt>C</tt> : T</dt> <dd>Result, has same dimensions and type as A</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="And-1"></a>And-1</a>

Returns the tensor resulted from performing the and logical operation elementwise on the input tensors A and B.

If broadcasting is enabled, the right-hand-side argument will be broadcasted to match the shape of left-hand-side argument. See the doc of Add for a detailed description of the broadcasting rules.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>axis</tt> : int</dt> <dd>If set, defines the broadcast dimensions.</dd> <dt><tt>broadcast</tt> : int (default is 0)</dt> <dd>Enable broadcasting</dd> </dl>

Inputs

<dl> <dt><tt>A</tt> : T</dt> <dd>Left input tensor for the logical operator.</dd> <dt><tt>B</tt> : T</dt> <dd>Right input tensor for the logical operator.</dd> </dl>

Outputs

<dl> <dt><tt>C</tt> : T1</dt> <dd>Result tensor.</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(bool)</dt> <dd>Constrain input to boolean tensor.</dd> <dt><tt>T1</tt> : tensor(bool)</dt> <dd>Constrain output to boolean tensor.</dd> </dl>

<a name="ArgMax-1"></a>ArgMax-1</a>

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>axis</tt> : int (default is 0)</dt> <dd>The axis in which to compute the arg indices.</dd> <dt><tt>keepdims</tt> : int (default is 1)</dt> <dd>Keep the reduced dimension or not, default 1 means keep reduced dimension.</dd> </dl>

Inputs

<dl> <dt><tt>data</tt> : T</dt> <dd>An input tensor.</dd> </dl>

Outputs

<dl> <dt><tt>reduced</tt> : tensor(int64)</dt> <dd>Reduced output tensor with integer data type.</dd> </dl>

Type Constraints

<a name="ArgMin-1"></a>ArgMin-1</a>

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

Inputs

<dl> <dt><tt>data</tt> : T</dt> <dd>An input tensor.</dd> </dl>

Outputs

<dl> <dt><tt>reduced</tt> : tensor(int64)</dt> <dd>Reduced output tensor with integer data type.</dd> </dl>

Type Constraints

<a name="AveragePool-1"></a>AveragePool-1</a>

output_spatial_shape[i] = floor((input_spatial_shape[i] + pad_shape[i] - kernel_spatial_shape[i]) / strides_spatial_shape[i] + 1)

* pad_shape[i] is sum of pads along axis i

auto_pad is a DEPRECATED attribute. If you are using them currently, the output spatial shape will be following:

VALID: output_spatial_shape[i] = ceil((input_spatial_shape[i] - kernel_spatial_shape[i] + 1) / strides_spatial_shape[i])
SAME_UPPER or SAME_LOWER: output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides_spatial_shape[i])

And pad shape will be following if SAME_UPPER or SAME_LOWER:

pad_shape[i] = (output_spatial_shape[i] - 1) * strides_spatial_shape[i] + kernel_spatial_shape[i] - input_spatial_shape[i]

The output of each pooling window is divided by the number of elements exclude pad.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

Inputs

Outputs

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="BatchNormalization-1"></a>BatchNormalization-1</a>

If the pads parameter is provided the shape of the output is calculated via the following equation:

output_shape[i] = stride[i] * (input_size[i] - 1) + output_padding[i] + ((kernel_shape[i] - 1) * dilations[i] + 1) - pads[start_i] - pads[end_i]

output_shape can also be explicitly specified in which case pads values are auto generated using these equations:

total_padding[i] = stride[i] * (input_size[i] - 1) + output_padding[i] + ((kernel_shape[i] - 1) * dilations[i] + 1) - output_shape[i]
If (auto_pads != SAME_UPPER): pads[start_i] = total_padding[i]/2; pads[end_i] = total_padding[i] - (total_padding[i]/2)
Else: pads[start_i] = total_padding[i] - (total_padding[i]/2); pads[end_i] = (total_padding[i]/2).

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

Inputs (2 - 3)

<dl> <dt><tt>X</tt> : T</dt> <dd>Input data tensor from previous layer; has size (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and width. Note that this is for the 2D image. Otherwise the size is (N x C x D1 x D2 ... x Dn)</dd> <dt><tt>W</tt> : T</dt> <dd>The weight tensor that will be used in the convolutions; has size (C x M/group x kH x kW), where C is the number of channels, and kH and kW are the height and width of the kernel, and M is the number of feature maps. For more than 2 dimensions, the weight shape will be (C x M/group x k1 x k2 x ... x kn), where (k1 x k2 x ... x kn) is the dimension of the kernel. The number of channels in the output should be equal to W.shape[1] * group (assuming zero based indices of the shape array)</dd> <dt><tt>B</tt> (optional) : T</dt> <dd>Optional 1D bias to be added to the convolution, has size of M.</dd> </dl>

Outputs

<dl> <dt><tt>Y</tt> : T</dt> <dd>Output data tensor that contains the result of the convolution. The output dimensions are functions of the kernel size, stride size, pad lengths and group count. The number of channels in the output should be equal to W.shape[1] * group (assuming zero based indices of the shape array)</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="DepthToSpace-1"></a>DepthToSpace-1</a>

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>blocksize</tt> : int (required)</dt> <dd>Blocks of [blocksize, blocksize] are moved.</dd> </dl>

Inputs

<dl> <dt><tt>input</tt> : T</dt> <dd>Input tensor of [N,C,H,W], where N is the batch axis, C is the channel or depth, H is the height and W is the width.</dd> </dl>

Outputs

<dl> <dt><tt>output</tt> : T</dt> <dd>Output tensor of [N, C/(blocksize * blocksize), H * blocksize, W * blocksize].</dd> </dl>

Type Constraints

<a name="Div-1"></a>Div-1</a>

Performs element-wise binary division (with limited broadcast support).

For example, the following tensor shapes are supported (with broadcast=1):

shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar tensor
shape(A) = (2, 3, 4, 5), shape(B) = (1, 1), i.e. B is an 1-element tensor
shape(A) = (2, 3, 4, 5), shape(B) = (5,)
shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
shape(A) = (2, 3, 4, 5), shape(B) = (2), with axis=0

Attribute broadcast=1 needs to be passed to enable broadcasting.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

Inputs

Outputs

<dl> <dt><tt>C</tt> : T</dt> <dd>Result, has same dimensions and type as A</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="Dropout-1"></a>Dropout-1</a>

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>consumed_inputs</tt> : list of ints</dt> <dd>legacy optimization attribute.</dd> <dt><tt>is_test</tt> : int (default is 0)</dt> <dd>(int, default 0) if nonzero, run dropout in test mode where the output is simply Y = X.</dd> <dt><tt>ratio</tt> : float (default is 0.5)</dt> <dd>(float, default 0.5) the ratio of random dropout</dd> </dl>

Inputs

<dl> <dt><tt>data</tt> : T</dt> <dd>The input data as Tensor.</dd> </dl>

Outputs (1 - 2)

<dl> <dt><tt>output</tt> : T</dt> <dd>The output.</dd> <dt><tt>mask</tt> (optional) : T</dt> <dd>The output mask. If is_test is nonzero, this output is not filled.</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="Elu-1"></a>Elu-1</a>

Elu takes one input data (Tensor<T>) and produces one output data (Tensor<T>) where the function f(x) = alpha * (exp(x) - 1.) for x < 0, f(x) = x for x >= 0., is applied to the tensor elementwise.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>alpha</tt> : float (default is 1.0)</dt> <dd>Coefficient of ELU default to 1.0.</dd> <dt><tt>consumed_inputs</tt> : list of ints</dt> <dd>legacy optimization attribute.</dd> </dl>

Inputs

<dl> <dt><tt>X</tt> : T</dt> <dd>Input tensor</dd> </dl>

Outputs

<dl> <dt><tt>Y</tt> : T</dt> <dd>Output tensor</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="Equal-1"></a>Equal-1</a>

<a name="GRU-1"></a>GRU-1</a>

Computes an one-layer GRU. This operator is usually supported via some custom implementation such as CuDNN.

Notations:

X - input tensor

z - update gate

r - reset gate

h - hidden gate

t - time step (t-1 means previous time step)

W[zrh] - W parameter weight matrix for update, reset, and hidden gates

R[zrh] - R recurrence weight matrix for update, reset, and hidden gates

Wb[zrh] - W bias vectors for update, reset, and hidden gates

Rb[zrh] - R bias vectors for update, reset, and hidden gates

WB[zrh] - W parameter weight matrix for backward update, reset, and hidden gates

RB[zrh] - R recurrence weight matrix for backward update, reset, and hidden gates

WBb[zrh] - W bias vectors for backward update, reset, and hidden gates

RBb[zrh] - R bias vectors for backward update, reset, and hidden gates

H - Hidden state

num_directions - 2 if direction == bidirectional else 1

Activation functions:

Relu(x)                - max(0, x)

Tanh(x)                - (1 - e^{-2x})/(1 + e^{-2x})

Sigmoid(x)             - 1/(1 + e^{-x})

(NOTE: Below are optional)

Affine(x)              - alpha*x + beta

LeakyRelu(x)           - x if x >= 0 else alpha * x

ThresholdedRelu(x)     - x if x >= alpha else 0

ScaledTanh(x)          - alpha*Tanh(beta*x)

HardSigmoid(x)         - min(max(alpha*x + beta, 0), 1)

Elu(x)                 - x if x >= 0 else alpha*(e^x - 1)

Softsign(x)            - x/(1 + |x|)

Softplus(x)            - log(1 + e^x)

Equations (Default: f=Sigmoid, g=Tanh):

- zt = f(Xt*(Wz^T) + Ht-1*Rz + Wbz + Rbz)

- rt = f(Xt*(Wr^T) + Ht-1*Rr + Wbr + Rbr)

- ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*Rh + Rbh + Wbh) # default, when linear_before_reset = 0

- ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*Rh + Rbh) + Wbh) # when linear_before_reset != 0

- Ht = (1 - zt) (.) ht + zt (.) Ht-1

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>activation_alpha</tt> : list of floats</dt> <dd>Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM.</dd> <dt><tt>activation_beta</tt> : list of floats</dt> <dd>Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM.</dd> <dt><tt>activations</tt> : list of strings</dt> <dd>A list of 2 (or 4 if bidirectional) activation functions for update, reset, and hidden gates. The activation functions must be one of the activation functions specified above. Optional: See the equations for default if not specified.</dd> <dt><tt>clip</tt> : float</dt> <dd>Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.</dd> <dt><tt>direction</tt> : string (default is foward)</dt> <dd>Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional.</dd> <dt><tt>hidden_size</tt> : int</dt> <dd>Number of neurons in the hidden layer</dd> <dt><tt>output_sequence</tt> : int (default is 0)</dt> <dd>The sequence output for the hidden is optional if 0. Default 0.</dd> </dl>

Inputs (3 - 6)

<dl> <dt><tt>X</tt> : T</dt> <dd>The input sequences packed (and potentially padded) into one 3-D tensor with the shape of `[seq_length, batch_size, input_size]`.</dd> <dt><tt>W</tt> : T</dt> <dd>The weight tensor for the gates. Concatenation of `W[zrh]` and `WB[zrh]` (if bidirectional) along dimension 0. This tensor has shape `[num_directions, 3*hidden_size, input_size]`.</dd> <dt><tt>R</tt> : T</dt> <dd>The recurrence weight tensor. Concatenation of `R[zrh]` and `RB[zrh]` (if bidirectional) along dimension 0. This tensor has shape `[num_directions, 3*hidden_size, hidden_size]`.</dd> <dt><tt>B</tt> (optional) : T</dt> <dd>The bias tensor for the gates. Concatenation of `[Wb[zrh], Rb[zrh]]` and `[WBb[zrh], RBb[zrh]]` (if bidirectional) along dimension 0. This tensor has shape `[num_directions, 6*hidden_size]`. Optional: If not specified - assumed to be 0</dd> <dt><tt>sequence_lens</tt> (optional) : T1</dt> <dd>Optional tensor specifying lengths of the sequences in a batch. If not specified - assumed all sequences in the batch to have length `seq_length`. It has shape `[batch_size]`.</dd> <dt><tt>initial_h</tt> (optional) : T</dt> <dd>Optional initial value of the hidden. If not specified - assumed to be 0. It has shape `[num_directions, batch_size, hidden_size]`.</dd> </dl>

Outputs

<dl> <dt><tt>Y</tt> (optional) : T</dt> <dd>A tensor that concats all the intermediate output values of the hidden. It has shape `[seq_length, num_directions, batch_size, hidden_size]`. It is optional if `output_sequence` is 0.</dd> <dt><tt>Y_h</tt> : T</dt> <dd>The last output value of the hidden. It has shape `[num_directions, batch_size, hidden_size]`.</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> <dt><tt>T1</tt> : tensor(int32)</dt> <dd>Constrain seq_lens to integer tensor.</dd> </dl>

<a name="Gather-1"></a>Gather-1</a>

Given data tensor of rank r >= 1, and indices tensor of rank q, gather entries of the axis dimension of data (by default outer-most one as axis=0) indexed by indices, and concatenates them in an output tensor of rank q + (r - 1). Example 1:

  data = [
      [1.0, 1.2],
      [2.3, 3.4],
      [4.5, 5.7],
  ]
  indices = [
      [0, 1],
      [1, 2],
  ]
  output = [
      [
          [1.0, 1.2],
          [2.3, 3.4],
      ],
      [
          [2.3, 3.4],
          [4.5, 5.7],
      ],
  ]

Example 2:

  data = [
      [1.0, 1.2, 1.9],
      [2.3, 3.4, 3.9],
      [4.5, 5.7, 5.9],
  ]
  indices = [
      [0, 2],
  ]
  axis = 1,
  output = [
      [[1.0, 1.9]],
      [[2.3, 3.9]],
      [[4.5, 5.9]],
  ]

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>axis</tt> : int (default is 0)</dt> <dd>Which axis to gather on. Negative value means counting dimensions from the back. Accepted range is [-r, r-1]</dd> </dl>

Inputs

<dl> <dt><tt>data</tt> : T</dt> <dd>Tensor of rank r >= 1.</dd> <dt><tt>indices</tt> : Tind</dt> <dd>Tensor of int32/int64 indices, of any rank q. All index values are expected to be within bounds. It is an error if any of the index values are out of bounds.</dd> </dl>

Outputs

<dl> <dt><tt>output</tt> : T</dt> <dd>Tensor of rank q + (r - 1).</dd> </dl>

Type Constraints

<a name="Gemm-1"></a>Gemm-1</a>

General Matrix multiplication: https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms#Level_3 Compute Y = alpha * A * B + beta * C, where input tensor A has dimension (M X K), input tensor B has dimension (K X N), input tensor C and output tensor Y have dimension (M X N). If attribute broadcast is non-zero, input tensor C will be broadcasted to match the dimension requirement. A will be transposed before doing the computation if attribute transA is non-zero, same for B and transB.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>alpha</tt> : float (default is 1.0)</dt> <dd>Scalar multiplier for the product of input tensors A * B, the default value is 1.0.</dd> <dt><tt>beta</tt> : float (default is 1.0)</dt> <dd>Scalar multiplier for input tensor C, the default value is 1.0.</dd> <dt><tt>broadcast</tt> : int (default is 0)</dt> <dd>Whether C should be broadcasted</dd> <dt><tt>transA</tt> : int (default is 0)</dt> <dd>Whether A should be transposed</dd> <dt><tt>transB</tt> : int (default is 0)</dt> <dd>Whether B should be transposed</dd> </dl>

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="Greater-1"></a>Greater-1</a>

Returns the tensor resulted from performing the greater logical operation elementwise on the input tensors A and B.

If broadcasting is enabled, the right-hand-side argument will be broadcasted to match the shape of left-hand-side argument. See the doc of Add for a detailed description of the broadcasting rules.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>axis</tt> : int</dt> <dd>If set, defines the broadcast dimensions.</dd> <dt><tt>broadcast</tt> : int (default is 0)</dt> <dd>Enable broadcasting</dd> </dl>

Inputs

<dl> <dt><tt>A</tt> : T</dt> <dd>Left input tensor for the logical operator.</dd> <dt><tt>B</tt> : T</dt> <dd>Right input tensor for the logical operator.</dd> </dl>

Outputs

<dl> <dt><tt>C</tt> : T1</dt> <dd>Result tensor.</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input to float tensors.</dd> <dt><tt>T1</tt> : tensor(bool)</dt> <dd>Constrain output to boolean tensor.</dd> </dl>

<a name="HardSigmoid-1"></a>HardSigmoid-1</a>

HardSigmoid takes one input data (Tensor<T>) and produces one output data (Tensor<T>) where the HardSigmoid function, y = max(0, min(1, alpha * x + beta)), is applied to the tensor elementwise.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>alpha</tt> : float (default is 0.2)</dt> <dd>Value of alpha default to 0.2</dd> <dt><tt>beta</tt> : float (default is 0.5)</dt> <dd>Value of beta default to 0.5</dd> <dt><tt>consumed_inputs</tt> : list of ints</dt> <dd>legacy optimization attribute.</dd> </dl>

Inputs

<dl> <dt><tt>X</tt> : T</dt> <dd>Input tensor</dd> </dl>

Outputs

<dl> <dt><tt>Y</tt> : T</dt> <dd>Output tensor</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="Hardmax-1"></a>Hardmax-1</a>

The operator computes the hardmax (1 for the first maximum value, and 0 for all others) values for each layer in the batch of the given input. The input is a 2-D tensor (Tensor<float>) of size (batch_size x input_feature_dimensions). The output tensor has the same shape and contains the hardmax values of the corresponding input.

Input does not need to explicitly be a 2D vector; rather, it will be coerced into one. For an arbitrary n-dimensional tensor input \in [a_0, a_1, ..., a_{k-1}, a_k, ..., a_{n-1}] and k is the axis provided, then input will be coerced into a 2-dimensional tensor with dimensions [a_0 * ... * a_{k-1}, a_k * ... * a_{n-1}]. For the default case where axis=1, this means the input tensor will be coerced into a 2D tensor of dimensions [a_0, a_1 * ... * a_{n-1}], where a_0 is often the batch size. In this situation, we must have a_0 = N and a_1 * ... * a_{n-1} = D. Each of these dimensions must be matched correctly, or else the operator will throw errors.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>axis</tt> : int (default is 1)</dt> <dd>Describes the axis of the inputs when coerced to 2D; defaults to one because the 0th axis most likely describes the batch_size</dd> </dl>

Inputs

<dl> <dt><tt>input</tt> : T</dt> <dd>The input tensor that's coerced into a 2D matrix of size (NxD) as described above.</dd> </dl>

Outputs

<dl> <dt><tt>output</tt> : T</dt> <dd>The output values with the same shape as input tensor (the original size without coercion).</dd> </dl>

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="LRN-1"></a>LRN-1</a>

Local Response Normalization proposed in the AlexNet paper. It normalizes over local input regions. The local region is defined across the channels. For an element X[n, c, d1, ..., dk] in a tensor of shape (N x C x D1 x D2, ..., Dk), its region is {X[n, i, d1, ..., dk] | max(0, c - floor((size - 1) / 2)) <= i <= min(C - 1, c + ceil((size - 1) / 2))}.

square_sum[n, c, d1, ..., dk] = sum(X[n, i, d1, ..., dk] ^ 2), where max(0, c - floor((size - 1) / 2)) <= i <= min(C - 1, c + ceil((size - 1) / 2)).

Y[n, c, d1, ..., dk] = X[n, c, d1, ..., dk] / (bias + alpha / size * square_sum[n, c, d1, ..., dk] ) ^ beta

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>alpha</tt> : float (default is 0.0001)</dt> <dd>Scaling parameter.</dd> <dt><tt>beta</tt> : float (default is 0.75)</dt> <dd>The exponent.</dd> <dt><tt>bias</tt> : float (default is 1.0)</dt> <dd></dd> <dt><tt>size</tt> : int (required)</dt> <dd>The number of channels to sum over</dd> </dl>

Inputs

Outputs

<dl> <dt><tt>Y</tt> : T</dt> <dd>Output tensor, which has the shape and type as input tensor</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="LSTM-1"></a>LSTM-1</a>

Computes an one-layer LSTM. This operator is usually supported via some custom implementation such as CuDNN.

Notations:

X - input tensor

i - input gate

o - output gate

f - forget gate

c - cell gate

t - time step (t-1 means previous time step)

W[iofc] - W parameter weight matrix for input, output, forget, and cell gates

R[iofc] - R recurrence weight matrix for input, output, forget, and cell gates

Wb[iofc] - W bias vectors for input, output, forget, and cell gates

Rb[iofc] - R bias vectors for input, output, forget, and cell gates

P[iof] - P peephole weight vector for input, output, and forget gates

WB[iofc] - W parameter weight matrix for backward input, output, forget, and cell gates

RB[iofc] - R recurrence weight matrix for backward input, output, forget, and cell gates

WBb[iofc] - W bias vectors for backward input, output, forget, and cell gates

RBb[iofc] - R bias vectors for backward input, output, forget, and cell gates

PB[iof] - P peephole weight vector for backward input, output, and forget gates

H - Hidden state

num_directions - 2 if direction == bidirectional else 1

Activation functions:

Relu(x)                - max(0, x)

Tanh(x)                - (1 - e^{-2x})/(1 + e^{-2x})

Sigmoid(x)             - 1/(1 + e^{-x})

(NOTE: Below are optional)

Affine(x)              - alpha*x + beta

LeakyRelu(x)           - x if x >= 0 else alpha * x

ThresholdedRelu(x)     - x if x >= alpha else 0

ScaledTanh(x)          - alpha*Tanh(beta*x)

HardSigmoid(x)         - min(max(alpha*x + beta, 0), 1)

Elu(x)                 - x if x >= 0 else alpha*(e^x - 1)

Softsign(x)            - x/(1 + |x|)

Softplus(x)            - log(1 + e^x)

Equations (Default: f=Sigmoid, g=Tanh, h=Tanh):

- it = f(Xt*(Wi^T) + Ht-1*Ri + Pi (.) Ct-1 + Wbi + Rbi)

- ft = f(Xt*(Wf^T) + Ht-1*Rf + Pf (.) Ct-1 + Wbf + Rbf)

- ct = g(Xt*(Wc^T) + Ht-1*Rc + Wbc + Rbc)

- Ct = ft (.) Ct-1 + it (.) ct

- ot = f(Xt*(Wo^T) + Ht-1*Ro + Po (.) Ct + Wbo + Rbo)

- Ht = ot (.) h(Ct)

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>activation_alpha</tt> : list of floats</dt> <dd>Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as of corresponding ONNX operators.For example with LeakyRelu, the default alpha is 0.01.</dd> <dt><tt>activation_beta</tt> : list of floats</dt> <dd>Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as of corresponding ONNX operators.</dd> <dt><tt>activations</tt> : list of strings</dt> <dd>A list of 3 (or 6 if bidirectional) activation functions for input, output, forget, cell, and hidden. The activation functions must be one of the activation functions specified above. Optional: See the equations for default if not specified.</dd> <dt><tt>clip</tt> : float</dt> <dd>Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.</dd> <dt><tt>direction</tt> : string (default is forward)</dt> <dd>Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional.</dd> <dt><tt>hidden_size</tt> : int</dt> <dd>Number of neurons in the hidden layer</dd> <dt><tt>input_forget</tt> : int (default is 0)</dt> <dd>Couple the input and forget gates if 1, default 0.</dd> <dt><tt>output_sequence</tt> : int (default is 0)</dt> <dd>The sequence output for the hidden is optional if 0. Default 0.</dd> </dl>

Inputs (3 - 8)

<dl> <dt><tt>X</tt> : T</dt> <dd>The input sequences packed (and potentially padded) into one 3-D tensor with the shape of `[seq_length, batch_size, input_size]`.</dd> <dt><tt>W</tt> : T</dt> <dd>The weight tensor for the gates. Concatenation of `W[iofc]` and `WB[iofc]` (if bidirectional) along dimension 0. The tensor has shape `[num_directions, 4*hidden_size, input_size]`.</dd> <dt><tt>R</tt> : T</dt> <dd>The recurrence weight tensor. Concatenation of `R[iofc]` and `RB[iofc]` (if bidirectional) along dimension 0. This tensor has shape `[num_directions, 4*hidden_size, hidden_size]`.</dd> <dt><tt>B</tt> (optional) : T</dt> <dd>The bias tensor for input gate. Concatenation of `[Wb[iofc], Rb[iofc]]`, and `[WBb[iofc], RBb[iofc]]` (if bidirectional) along dimension 0. This tensor has shape `[num_directions, 8*hidden_size]`. Optional: If not specified - assumed to be 0.</dd> <dt><tt>sequence_lens</tt> (optional) : T1</dt> <dd>Optional tensor specifying lengths of the sequences in a batch. If not specified - assumed all sequences in the batch to have length `seq_length`. It has shape `[batch_size]`.</dd> <dt><tt>initial_h</tt> (optional) : T</dt> <dd>Optional initial value of the hidden. If not specified - assumed to be 0. It has shape `[num_directions, batch_size, hidden_size]`.</dd> <dt><tt>initial_c</tt> (optional) : T</dt> <dd>Optional initial value of the cell. If not specified - assumed to be 0. It has shape `[num_directions, batch_size, hidden_size]`.</dd> <dt><tt>P</tt> (optional) : T</dt> <dd>The weight tensor for peepholes. Concatenation of `P[iof]` and `PB[iof]` (if bidirectional) along dimension 0. It has shape `[num_directions, 3*hidde_size]`. Optional: If not specified - assumed to be 0.</dd> </dl>

Outputs (0 - 3)

<dl> <dt><tt>Y</tt> (optional) : T</dt> <dd>A tensor that concats all the intermediate output values of the hidden. It has shape `[seq_length, num_directions, batch_size, hidden_size]`. It is optional if `output_sequence` is 0.</dd> <dt><tt>Y_h</tt> (optional) : T</dt> <dd>The last output value of the hidden. It has shape `[num_directions, batch_size, hidden_size]`.</dd> <dt><tt>Y_c</tt> (optional) : T</dt> <dd>The last output value of the cell. It has shape `[num_directions, batch_size, hidden_size]`.</dd> </dl>

Type Constraints

<a name="LeakyRelu-1"></a>LeakyRelu-1</a>

LeakyRelu takes input data (Tensor<T>) and an argument alpha, and produces one output data (Tensor<T>) where the function f(x) = alpha * x for x < 0, f(x) = x for x >= 0, is applied to the data tensor elementwise.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>alpha</tt> : float (default is 0.01)</dt> <dd>Coefficient of leakage default to 0.01.</dd> <dt><tt>consumed_inputs</tt> : list of ints</dt> <dd>legacy optimization attribute.</dd> </dl>

Inputs

<dl> <dt><tt>X</tt> : T</dt> <dd>Input tensor</dd> </dl>

Outputs

<dl> <dt><tt>Y</tt> : T</dt> <dd>Output tensor</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="Less-1"></a>Less-1</a>

Returns the tensor resulted from performing the less logical operation elementwise on the input tensors A and B.

If broadcasting is enabled, the right-hand-side argument will be broadcasted to match the shape of left-hand-side argument. See the doc of Add for a detailed description of the broadcasting rules.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>axis</tt> : int</dt> <dd>If set, defines the broadcast dimensions.</dd> <dt><tt>broadcast</tt> : int (default is 0)</dt> <dd>Enable broadcasting</dd> </dl>

Inputs

<dl> <dt><tt>A</tt> : T</dt> <dd>Left input tensor for the logical operator.</dd> <dt><tt>B</tt> : T</dt> <dd>Right input tensor for the logical operator.</dd> </dl>

Outputs

<dl> <dt><tt>C</tt> : T1</dt> <dd>Result tensor.</dd> </dl>

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="Loop-1"></a>Loop-1</a>

Generic Looping construct. This loop has multiple termination conditions:

Trip count. Iteration count specified at runtime. Set by specifying the input M. Optional. Set to empty string to omit. Note that a static trip count (specified at graph construction time) can be specified by passing in a constant node for input M.
Loop termination condition. This is an input to the op that determines whether to run the first iteration and also a loop-carried dependency for the body graph. The body graph must yield a value for the condition variable, whether this input is provided or not.

This table summarizes the operating modes of this operator with equivalent C-style code:

Operator inputs defined as (max_trip_count, condition_var).

  input ("", ""):
      for (int i=0; ; ++i) {
        cond = ... // Note this value is ignored, but is required in the body
      }

  input ("", cond) // Note this is analogous to a while loop
      bool cond = ...;
      for (int i=0; cond; ++i) {
        cond = ...;
      }

  input ("", 1) // Note this is analogous to a do-while loop
      bool cond = true
      for (int i=0; cond; ++i) {
        cond = ...;
      }

  input (trip_count, "") // Note this is analogous to a for loop
      int trip_count = ...
      for (int i=0; i < trip_count; ++i) {
        cond = ...; // ignored
      }

  input (trip_count, cond)
      int trip_count = ...;
      bool cond = ...;
      for (int i=0; i < trip_count && cond; ++i) {
        cond = ...;
      }

Sample usage - cond as well as trip count

  graph predict-net {
    %a = Constant[value = <Scalar Tensor [3]>]()
    %b = Constant[value = <Scalar Tensor [6]>]()
    %keepgoing = Constant[value = <Scalar Tensor [1]>]()
    %max_trip_count = Constant[value = <Scalar Tensor [10]>]()
    %keepgoing_out, %b_out, %user_defined_vals = Loop[body = <graph body-net>](%max_trip_count, %keepgoing, %b)
    return
  }

  graph body-net (
    %i[INT32, scalar]
    %keepgoing[BOOL, scalar]
    %b[INT32, scalar]
  ) {
    %my_local = Add(%a, %b)
    %b_out = Sub(%a, %b)
    %keepgoing_out = Greater(%my_local, %b_out)
    %user_defined_vals = Add(%b, %b)
    return %keepgoing_out, %b_out, %user_defined_vals
  }

Sample equivalent C code

  {
    /* User-defined code (enclosing scope) */
    int a = 3, b = 6;
    bool keepgoing = true; // Analogous to input cond
    /* End user-defined code */

    /* Implicitly-defined code */
    const int max_trip_count = 10; // Analogous to input M
    int user_defined_vals[]; // Imagine this is resizable
    /* End implicitly-defined code */
    for (int i=0; i < max_trip_count && keepgoing; ++i) {
      /* User-defined code (loop body) */
      int my_local = a + b; // Reading values in the enclosing scope is fine
      b = a - b; // writes fine if we specify b as a loop-carried dependency
      keepgoing = my_local > b; // keepgoing is a loop-carried dependency
      user_defined_vals[i] = b + b;
      /* End user-defined code */
    }
    // my_local = 123; // Can't do this. my_local was defined in the body

    // These below values are live-out from the loop and therefore accessible
    b_out; user_defined_vals; keepgoing_out;
  }

There are several things of note in this code snippet:

Values from the enclosing scope (i.e. variable a here) are in scope and can be referenced in the inputs of the loop.
Any variables which you wish to make available in the enclosing scope (i.e. the variables b and keepgoing) must be declared as either loop-carried dependencies (both at the op inputs and output and at the body net input and output) or scan_outputs.
Values created in the body cannot be accessed in the enclosing scope.

Note that the semantics of this op support "diagonal" or "wavefront" execution. (See Step 3 here for an example: https://devblogs.nvidia.com/optimizing-recurrent-neural-networks-cudnn-5/). Frontends should emit multi-layer RNNs as a series of While operators (with time being the inner looping dimension), with each successive layer consuming the scan_outputs from the previous layer, possibly going through several point-wise operators (e.g. dropout, residual connections, linear layer).

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>body</tt> : graph (required)</dt> <dd>The graph run each iteration. It has 2+N inputs: (iteration_num, condition, loop carried dependencies...). It has 1+N+K outputs: (condition, loop carried dependencies..., scan_outputs...). Each scan_output is created by concatenating the value of the specified output value at the end of each iteration of the loop. It is an error if the dimensions or data type of these scan_outputs change across loop iterations.</dd> </dl>

Inputs (3 - ∞)

<dl> <dt><tt>M</tt> (optional) : I</dt> <dd>A maximum trip-count for the loop specified at runtime. Optional. Pass empty string to skip.</dd> <dt><tt>cond</tt> (optional) : B</dt> <dd>A boolean termination condition. Optional. Pass empty string to skip.</dd> <dt><tt>v_initial</tt> (variadic, heterogeneous) : V</dt> <dd>The initial values of any loop-carried dependencies (values that change across loop iterations)</dd> </dl>

Outputs (1 - ∞)

<dl> <dt><tt>v_final_and_scan_outputs</tt> (variadic, heterogeneous) : V</dt> <dd>Final N loop carried dependency values then K scan_outputs</dd> </dl>

Type Constraints

Outputs

<dl> <dt><tt>max</tt> : T</dt> <dd>Output tensor. Same dimension as inputs.</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="MaxPool-1"></a>MaxPool-1</a>

output_spatial_shape[i] = floor((input_spatial_shape[i] + pad_shape[i] - kernel_spatial_shape[i]) / strides_spatial_shape[i] + 1)

* pad_shape[i] is sum of pads along axis i

auto_pad is a DEPRECATED attribute. If you are using them currently, the output spatial shape will be following:

VALID: output_spatial_shape[i] = ceil((input_spatial_shape[i] - kernel_spatial_shape[i] + 1) / strides_spatial_shape[i])
SAME_UPPER or SAME_LOWER: output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides_spatial_shape[i])

And pad shape will be following if SAME_UPPER or SAME_LOWER:

pad_shape[i] = (output_spatial_shape[i] - 1) * strides_spatial_shape[i] + kernel_spatial_shape[i] - input_spatial_shape[i]

The output of each pooling window is maximum number of elements exclude pad.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<a name="Mul-1"></a>Mul-1</a>

Performs element-wise binary multiplication (with limited broadcast support).

For example, the following tensor shapes are supported (with broadcast=1):

shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar tensor
shape(A) = (2, 3, 4, 5), shape(B) = (1, 1), i.e. B is an 1-element tensor
shape(A) = (2, 3, 4, 5), shape(B) = (5,)
shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
shape(A) = (2, 3, 4, 5), shape(B) = (2), with axis=0

<dl> <dt><tt>T</tt> : tensor(bool)</dt> <dd>Constrain input/output to boolean tensors.</dd> </dl>

<a name="Or-1"></a>Or-1</a>

Returns the tensor resulted from performing the or logical operation elementwise on the input tensors A and B.

If broadcasting is enabled, the right-hand-side argument will be broadcasted to match the shape of left-hand-side argument. See the doc of Add for a detailed description of the broadcasting rules.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>axis</tt> : int</dt> <dd>If set, defines the broadcast dimensions.</dd> <dt><tt>broadcast</tt> : int (default is 0)</dt> <dd>Enable broadcasting</dd> </dl>

Inputs

<dl> <dt><tt>A</tt> : T</dt> <dd>Left input tensor for the logical operator.</dd> <dt><tt>B</tt> : T</dt> <dd>Right input tensor for the logical operator.</dd> </dl>

Outputs

<dl> <dt><tt>C</tt> : T1</dt> <dd>Result tensor.</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(bool)</dt> <dd>Constrain input to boolean tensor.</dd> <dt><tt>T1</tt> : tensor(bool)</dt> <dd>Constrain output to boolean tensor.</dd> </dl>

<a name="PRelu-1"></a>PRelu-1</a>

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>consumed_inputs</tt> : list of ints</dt> <dd>legacy optimization attribute.</dd> </dl>

Inputs

<dl> <dt><tt>X</tt> : T</dt> <dd>Input tensor</dd> <dt><tt>slope</tt> : T</dt> <dd>Slope tensor. If `Slope` is of size 1, the value is sharedacross different channels</dd> </dl>

Outputs

<dl> <dt><tt>Y</tt> : T</dt> <dd>Output tensor</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="Pad-1"></a>Pad-1</a>

Given data tensor, paddings, mode, and value. Example: Insert 0 paddings to the beginning of the second dimension. data = [ [1.0, 1.2], [2.3, 3.4], [4.5, 5.7], ] paddings = [0, 0, 2, 0] output = [ [ [0.0, 0.0, 1.0, 1.2], [0.0, 0.0, 2.3, 3.4], [0.0, 0.0, 4.5, 5.7], ], ]

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>mode</tt> : string (default is constant)</dt> <dd>Three modes: constant(default), reflect, edge</dd> <dt><tt>paddings</tt> : list of ints (required)</dt> <dd>List of integers indicate the padding element count at the beginning and end of each axis, for 2D it is the number of pixel. `paddings` rank should be double of the input's rank. `paddings` format should be as follow [x1_begin, x2_begin...x1_end, x2_end,...], where xi_begin the number of pixels added at the beginning of axis `i` and xi_end, the number of pixels added at the end of axis `i`.</dd> <dt><tt>value</tt> : float (default is 0.0)</dt> <dd>One float, indicates the value to be filled, default is 0</dd> </dl>

Inputs

<dl> <dt><tt>data</tt> : T</dt> <dd>Input tensor.</dd> </dl>

Outputs

<dl> <dt><tt>output</tt> : T</dt> <dd>Tensor after padding.</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="Pow-1"></a>Pow-1</a>

Pow takes input data (Tensor<T>) and exponent Tensor, and produces one output data (Tensor<T>) where the function f(x) = x^exponent, is applied to the data tensor elementwise.

For example, the following tensor shapes are supported (with broadcast=1):

shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar tensor
shape(A) = (2, 3, 4, 5), shape(B) = (1, 1), i.e. B is an 1-element tensor
shape(A) = (2, 3, 4, 5), shape(B) = (5,)
shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
shape(A) = (2, 3, 4, 5), shape(B) = (2), with axis=0

Attribute broadcast=1 needs to be passed to enable broadcasting.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>axis</tt> : int</dt> <dd>If set, defines the broadcast dimensions. See doc for details.</dd> <dt><tt>broadcast</tt> : int (default is 0)</dt> <dd>Pass 1 to enable broadcasting</dd> </dl>

Inputs

<dl> <dt><tt>X</tt> : T</dt> <dd>Input tensor of any shape, base of the exponent.</dd> <dt><tt>Y</tt> : T</dt> <dd>Input tensor of any shape broadcastable to X shape, the exponent component.</dd> </dl>

Outputs

<dl> <dt><tt>Z</tt> : T</dt> <dd>Output tensor (same size as X)</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="RNN-1"></a>RNN-1</a>

Computes an one-layer simple RNN. This operator is usually supported via some custom implementation such as CuDNN.

Notations:

X - input tensor

i - input gate

t - time step (t-1 means previous time step)

Wi - W parameter weight matrix for input gate

Ri - R recurrence weight matrix for input gate

Wbi - W parameter bias vector for input gate

Rbi - R parameter bias vector for input gate

WBi - W parameter weight matrix for backward input gate

RBi - R recurrence weight matrix for backward input gate

WBbi - WR bias vectors for backward input gate

RBbi - RR bias vectors for backward input gate

H - Hidden state

num_directions - 2 if direction == bidirectional else 1

Activation functions:

Relu(x)                - max(0, x)

Tanh(x)                - (1 - e^{-2x})/(1 + e^{-2x})

Sigmoid(x)             - 1/(1 + e^{-x})

(NOTE: Below are optional)

Affine(x)              - alpha*x + beta

LeakyRelu(x)           - x if x >= 0 else alpha * x

ThresholdedRelu(x)     - x if x >= alpha else 0

ScaledTanh(x)          - alpha*Tanh(beta*x)

HardSigmoid(x)         - min(max(alpha*x + beta, 0), 1)

Elu(x)                 - x if x >= 0 else alpha*(e^x - 1)

Softsign(x)            - x/(1 + |x|)

Softplus(x)            - log(1 + e^x)

Equations (Default: f=Tanh):

- Ht = f(Xt*(Wi^T) + Ht-1*Ri + Wbi + Rbi)

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

Inputs (3 - 6)

Outputs (0 - 2)

<dl> <dt><tt>Y</tt> (optional) : T</dt> <dd>A tensor that concats all the intermediate output values of the hidden. It has shape `[seq_length, num_directions, batch_size, hidden_size]`. It is optional if `output_sequence` is 0.</dd> <dt><tt>Y_h</tt> (optional) : T</dt> <dd>The last output value of the hidden. It has shape `[num_directions, batch_size, hidden_size]`.</dd> </dl>

Type Constraints

<a name="RandomNormal-1"></a>RandomNormal-1</a>

Generate a tensor with random values drawn from a normal distribution. The shape of the tensor is specified by the shape argument and the parameter of the normal distribution specified by mean and scale.

The data type is specified by the 'dtype' argument. The 'dtype' argument must be one of the data types specified in the 'DataType' enum field in the TensorProto message.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>dtype</tt> : int (default is 1)</dt> <dd>The data type for the elements of the output tensor. Default is TensorProto::FLOAT.</dd> <dt><tt>mean</tt> : float (default is 0.0)</dt> <dd>The mean of the normal distribution.</dd> <dt><tt>scale</tt> : float (default is 1.0)</dt> <dd>The standard deviation of the normal distribution.</dd> <dt><tt>seed</tt> : float</dt> <dd>(Optional) Seed to the random generator, if not specified we will auto generate one.</dd> <dt><tt>shape</tt> : list of ints (required)</dt> <dd>The shape of the output tensor.</dd> </dl>

Inputs

Outputs

<dl> <dt><tt>output</tt> : T</dt> <dd>Output tensor of random values drawn from normal distribution</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain output types to float tensors.</dd> </dl>

<a name="RandomNormalLike-1"></a>RandomNormalLike-1</a>

Generate a tensor with random values drawn from a normal distribution. The shape of the output tensor is copied from the shape of the input tensor, and the parameters of the normal distribution are specified by mean and scale.

The data type is specified by the 'dtype' argument, or copied from the input tensor if not provided. The 'dtype' argument must be one of the data types specified in the 'DataType' enum field in the TensorProto message, and be valid as an output type.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>dtype</tt> : int</dt> <dd>(Optional) The data type for the elements of the output tensor, if not specified, we will use the data type of the input tensor.</dd> <dt><tt>mean</tt> : float (default is 0.0)</dt> <dd>The mean of the normal distribution.</dd> <dt><tt>scale</tt> : float (default is 1.0)</dt> <dd>The standard deviation of the normal distribution.</dd> <dt><tt>seed</tt> : float</dt> <dd>(Optional) Seed to the random generator, if not specified we will auto generate one.</dd> </dl>

Inputs

<dl> <dt><tt>input</tt> : T1</dt> <dd>Input tensor to copy shape and optionally type information from.</dd> </dl>

Outputs

<dl> <dt><tt>output</tt> : T2</dt> <dd>Output tensor of random values drawn from normal distribution</dd> </dl>

Type Constraints

<dl> <dt><tt>T1</tt> : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)</dt> <dd>Constrain to any tensor type. If the dtype attribute is not provided this must be a valid output type.</dd> <dt><tt>T2</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain output types to float tensors.</dd> </dl>

<a name="RandomUniform-1"></a>RandomUniform-1</a>

Generate a tensor with random values drawn from a uniform distribution. The shape of the tensor is specified by the shape argument and the range by low and high.

The data type is specified by the 'dtype' argument. The 'dtype' argument must be one of the data types specified in the 'DataType' enum field in the TensorProto message.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>dtype</tt> : int (default is 1)</dt> <dd>The data type for the elements of the output tensor. If not specified, default is TensorProto::FLOAT.</dd> <dt><tt>high</tt> : float (default is 1.0)</dt> <dd>Upper boundary of the output values.</dd> <dt><tt>low</tt> : float (default is 0.0)</dt> <dd>Lower boundary of the output values.</dd> <dt><tt>seed</tt> : float</dt> <dd>(Optional) Seed to the random generator, if not specified we will auto generate one.</dd> <dt><tt>shape</tt> : list of ints (required)</dt> <dd>The shape of the output tensor.</dd> </dl>

<dl> <dt><tt>X</tt> : T</dt> <dd>Input tensor</dd> </dl>

Outputs

<dl> <dt><tt>Y</tt> : T</dt> <dd>Output tensor</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="ReduceL1-1"></a>ReduceL1-1</a>

Computes the L1 norm of the input tensor's element along the provided axes. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equal 0, then the resulted tensor have the reduced dimension pruned. Input tensors of rank zero are valid. Reduction over an empty set of values yields 0.

The above behavior is similar to numpy, with the exception that numpy defaults keepdims to False instead of True.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>axes</tt> : list of ints</dt> <dd>A list of integers, along which to reduce. The default is to reduce over all the dimensions of the input tensor.</dd> <dt><tt>keepdims</tt> : int (default is 1)</dt> <dd>Keep the reduced dimension or not, default 1 means keep reduced dimension.</dd> </dl>

Inputs

<dl> <dt><tt>data</tt> : T</dt> <dd>An input tensor.</dd> </dl>

Outputs

<dl> <dt><tt>reduced</tt> : T</dt> <dd>Reduced output tensor.</dd> </dl>

Type Constraints

This version of the operator has been available since version 1 of the default ONNX operator set.

Inputs

<dl> <dt><tt>data</tt> : T</dt> <dd>An input tensor.</dd> </dl>

Outputs

<dl> <dt><tt>shape</tt> : T1</dt> <dd>Shape of the input tensor</dd> </dl>

Type Constraints

<a name="Sigmoid-1"></a>Sigmoid-1</a>

Sigmoid takes one input data (Tensor<T>) and produces one output data (Tensor<T>) where the sigmoid function, y = 1 / (1 + exp(-x)), is applied to the tensor elementwise.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>consumed_inputs</tt> : list of ints</dt> <dd>legacy optimization attribute.</dd> </dl>

Inputs

<dl> <dt><tt>X</tt> : T</dt> <dd>Input tensor</dd> </dl>

Outputs

<dl> <dt><tt>Y</tt> : T</dt> <dd>Output tensor</dd> </dl>

Type Constraints

<dl> <dt><tt>T</tt> : tensor(float16), tensor(float), tensor(double)</dt> <dd>Constrain input and output types to float tensors.</dd> </dl>

<a name="Size-1"></a>Size-1</a>

Takes a tensor as input and outputs a int64 scalar that equals to the total number of elements of the input tensor.

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Inputs

<dl> <dt><tt>data</tt> : T</dt> <dd>An input tensor.</dd> </dl>

Outputs

<dl> <dt><tt>size</tt> : T1</dt> <dd>Total number of elements of the input tensor</dd> </dl>

Type Constraints

<a name="Slice-1"></a>Slice-1</a>

Produces a slice of the input tensor along multiple axes. Similar to numpy: https://numpy.org/doc/stable/reference/routines.indexing.html Slices uses axes, starts and ends attributes to specify the start and end dimension for each axis in the list of axes, it uses this information to slice the input data tensor. If a negative value is passed for any of the start or end indices, it represent number of elements before the end of that dimension. If the value passed to start or end is larger than the n (the number of elements in this dimension), it represents n. For slicing to the end of a dimension with unknown size, it is recommended to pass in INT_MAX. If axes are omitted, they are set to [0, ..., ndim-1]. Example 1: data = [ [1, 2, 3, 4], [5, 6, 7, 8], ] axes = [0, 1] starts = [1, 0] ends = [2, 3] result = [ [5, 6, 7], ] Example 2: data = [ [1, 2, 3, 4], [5, 6, 7, 8], ] starts = [0, 1] ends = [-1, 1000] result = [ [2, 3, 4], ]

Version

This version of the operator has been available since version 1 of the default ONNX operator set.

Attributes

<dl> <dt><tt>axes</tt> : list of ints</dt> <dd>Axes that `starts` and `ends` apply to. It's optional. If not present, will be treated as [0, 1, ..., len(`starts`) - 1].</dd> <dt><tt>ends</tt> : list of ints (required)</dt> <dd>Ending indices (exclusive) of corresponding axis in axes`</dd> <dt><tt>starts</tt> : list of ints (required)</dt> <dd>Starting indices of corresponding axis in `axes`</dd> </dl>

Inputs

<dl> <dt><tt>data</tt> : T</dt> <dd>Tensor of data to extract slices from.</dd> </dl>

Outputs

<dl> <dt><tt>output</tt> : T</dt> <dd>Sliced data tensor.</dd> </dl>

<dl> <dt><tt>input</tt> : T</dt> <dd>Input tensor of [N,C,H,W], where N is the batch axis, C is the channel or depth, H is the height and W is the width.</dd> </dl>

Outputs

<dl> <dt><tt>output</tt> : T</dt> <dd>Output tensor of [N, C * blocksize * blocksize, H/blocksize, W/blocksize].</dd> </dl>

<dl> <dt><tt>axes</tt> : list of ints</dt> <dd>List of non-negative integers, indicate the dimensions to squeeze.</dd> </dl>

Inputs

<dl> <dt><tt>data</tt> : T</dt> <dd>Tensors with at least max(dims) dimensions.</dd> </dl>

Outputs

<dl> <dt><tt>squeezed</tt> : T</dt> <dd>Reshaped tensor with same data as input.</dd> </dl>

Type Constraints

<a name="Sub-1"></a>Sub-1</a>

Performs element-wise binary subtraction (with limited broadcast support).

For example, the following tensor shapes are supported (with broadcast=1):

shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar tensor
shape(A) = (2, 3, 4, 5), shape(B) = (1, 1), i.e. B is an 1-element tensor
shape(A) = (2, 3, 4, 5), shape(B) = (5,)
shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
shape(A) = (2, 3, 4, 5), shape(B) = (2), with axis=0

<dl> <dt><tt>perm</tt> : list of ints</dt> <dd>A list of integers. By default, reverse the dimensions, otherwise permute the axes according to the values given.</dd> </dl>

Inputs

<dl> <dt><tt>data</tt> : T</dt> <dd>An input tensor.</dd> </dl>

Outputs

<dl> <dt><tt>transposed</tt> : T</dt> <dd>Transposed output.</dd> </dl>

Type Constraints

<dl> <dt><tt>input</tt> : T</dt> <dd>The tensor to split</dd> </dl>

Outputs (1 - ∞)

<dl> <dt><tt>outputs</tt> (variadic) : T</dt> <dd>One or more outputs forming list of tensors after splitting</dd> </dl>

Type Constraints

Version 3 of the default ONNX operator set

<a name="GRU-3"></a>GRU-3</a>