Operators

AbsVal

y = abs(x)
  • one_blob_only
  • support_inplace

ArgMax

y = argmax(x, out_max_val, topk)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | out_max_val | int | 0 | |
| 1 | topk | int | 1 | |

BatchNorm

y = (x - mean) / sqrt(var + eps) * slope + bias
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | channels | int | 0 | |
| 1 | eps | float | 0.f | |

| weight | type | shape |
| ------ | ---- | ----- |
| slope_data | float | [channels] |
| mean_data | float | [channels] |
| var_data | float | [channels] |
| bias_data | float | [channels] |
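
The per-channel statistics fold into a single scale and shift, which is how the formula above is usually applied. A minimal plain-C++ sketch; the layout and names are illustrative, not ncnn's internal implementation:

```cpp
#include <cmath>
#include <vector>

// Apply y = (x - mean) / sqrt(var + eps) * slope + bias per channel.
// x is laid out as [channels][spatial]; all names here are illustrative.
void batchnorm(std::vector<float>& x, int channels, int spatial,
               const std::vector<float>& slope, const std::vector<float>& mean,
               const std::vector<float>& var, const std::vector<float>& bias,
               float eps)
{
    for (int c = 0; c < channels; c++)
    {
        // fold the per-channel statistics into one scale and one shift
        float a = slope[c] / std::sqrt(var[c] + eps);
        float b = bias[c] - mean[c] * a;
        for (int i = 0; i < spatial; i++)
            x[c * spatial + i] = x[c * spatial + i] * a + b;
    }
}
```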

Bias

y = x + bias
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | bias_data_size | int | 0 | |

| weight | type | shape |
| ------ | ---- | ----- |
| bias_data | float | [channels] |

BinaryOp

This operation is used for binary computation, and the calculation rule depends on the broadcasting rule.

C = binaryop(A, B)

if with_scalar = 1:

C = binaryop(A, b)

  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | op_type | int | 0 | Operation type as follows |
| 1 | with_scalar | int | 0 | with_scalar=0 B is a matrix, with_scalar=1 B is a scalar |
| 2 | b | float | 0.f | When B is a scalar, B = b |

Operation type:

  • 0 = ADD
  • 1 = SUB
  • 2 = MUL
  • 3 = DIV
  • 4 = MAX
  • 5 = MIN
  • 6 = POW
  • 7 = RSUB
  • 8 = RDIV
  • 9 = RPOW
  • 10 = ATAN2
  • 11 = RATAN2
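
For the scalar case (with_scalar = 1) each element of A is combined with the constant b. A sketch of the per-element rule for the op_type values above; this is an illustration, not ncnn's dispatch code, and blob-vs-blob broadcasting is not shown:

```cpp
#include <algorithm>
#include <cmath>

// One element of C = binaryop(A, b), following the operation type list above.
float binary_op(int op_type, float a, float b)
{
    switch (op_type)
    {
        case 0:  return a + b;             // ADD
        case 1:  return a - b;             // SUB
        case 2:  return a * b;             // MUL
        case 3:  return a / b;             // DIV
        case 4:  return std::max(a, b);    // MAX
        case 5:  return std::min(a, b);    // MIN
        case 6:  return std::pow(a, b);    // POW
        case 7:  return b - a;             // RSUB
        case 8:  return b / a;             // RDIV
        case 9:  return std::pow(b, a);    // RPOW
        case 10: return std::atan2(a, b);  // ATAN2
        case 11: return std::atan2(b, a);  // RATAN2
        default: return a;
    }
}
```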

BNLL

y = x + log(1 + e^(-x)), x > 0
y = log(1 + e^x),        x < 0
  • one_blob_only
  • support_inplace

Cast

y = cast(x)
  • one_blob_only
  • support_packing
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | type_from | int | 0 | |
| 1 | type_to | int | 0 | |

Element type:

  • 0 = auto
  • 1 = float32
  • 2 = float16
  • 3 = int8
  • 4 = bfloat16

CELU

if x < 0    y = (exp(x / alpha) - 1.f) * alpha
else        y = x
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | alpha | float | 1.f | |

Clip

y = clamp(x, min, max)
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | min | float | -FLT_MAX | |
| 1 | max | float | FLT_MAX | |

Concat

y = concat(x0, x1, x2, ...) by axis
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | axis | int | 0 | |

Convolution

x2 = pad(x, pads, pad_value)
x3 = conv(x2, weight, kernel, stride, dilation) + bias
y = activation(x3, act_type, act_params)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | |
| 1 | kernel_w | int | 0 | |
| 2 | dilation_w | int | 1 | |
| 3 | stride_w | int | 1 | |
| 4 | pad_left | int | 0 | |
| 5 | bias_term | int | 0 | |
| 6 | weight_data_size | int | 0 | |
| 8 | int8_scale_term | int | 0 | |
| 9 | activation_type | int | 0 | |
| 10 | activation_params | array | [ ] | |
| 11 | kernel_h | int | kernel_w | |
| 12 | dilation_h | int | dilation_w | |
| 13 | stride_h | int | stride_w | |
| 14 | pad_top | int | pad_left | |
| 15 | pad_right | int | pad_left | |
| 16 | pad_bottom | int | pad_top | |
| 18 | pad_value | float | 0.f | |
| 19 | dynamic_weight | int | 0 | |

| weight | type | shape |
| ------ | ---- | ----- |
| weight_data | float/fp16/int8 | [kernel_w, kernel_h, num_input, num_output] |
| bias_data | float | [num_output] |
| weight_data_int8_scales | float | [num_output] |
| bottom_blob_int8_scales | float | [1] |
| top_blob_int8_scales | float | [1] |
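
A naive sketch of the conv step above on an already-padded input, with stride and dilation but without groups, activation, or any of ncnn's actual optimizations; layout and names are illustrative:

```cpp
// Naive 2D convolution: y[oc][oy][ox] = bias[oc] + sum over ic, ky, kx.
// x is the padded input [inc][h][w]; weight is [outc][inc][kh][kw].
void conv2d(const float* x, int w, int h, int inc,
            const float* weight, const float* bias,
            float* y, int outw, int outh, int outc,
            int kw, int kh, int stride_w, int stride_h,
            int dilation_w, int dilation_h)
{
    for (int oc = 0; oc < outc; oc++)
    for (int oy = 0; oy < outh; oy++)
    for (int ox = 0; ox < outw; ox++)
    {
        float sum = bias ? bias[oc] : 0.f;
        for (int ic = 0; ic < inc; ic++)
        for (int ky = 0; ky < kh; ky++)
        for (int kx = 0; kx < kw; kx++)
        {
            // dilated tap position inside the padded input
            int ix = ox * stride_w + kx * dilation_w;
            int iy = oy * stride_h + ky * dilation_h;
            sum += x[(ic * h + iy) * w + ix]
                 * weight[((oc * inc + ic) * kh + ky) * kw + kx];
        }
        y[(oc * outh + oy) * outw + ox] = sum;
    }
}
```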

Convolution1D

x2 = pad(x, pads, pad_value)
x3 = conv1d(x2, weight, kernel, stride, dilation) + bias
y = activation(x3, act_type, act_params)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | |
| 1 | kernel_w | int | 0 | |
| 2 | dilation_w | int | 1 | |
| 3 | stride_w | int | 1 | |
| 4 | pad_left | int | 0 | |
| 5 | bias_term | int | 0 | |
| 6 | weight_data_size | int | 0 | |
| 9 | activation_type | int | 0 | |
| 10 | activation_params | array | [ ] | |
| 15 | pad_right | int | pad_left | |
| 18 | pad_value | float | 0.f | |
| 19 | dynamic_weight | int | 0 | |

| weight | type | shape |
| ------ | ---- | ----- |
| weight_data | float/fp16/int8 | [kernel_w, num_input, num_output] |
| bias_data | float | [num_output] |

Convolution3D

x2 = pad(x, pads, pad_value)
x3 = conv3d(x2, weight, kernel, stride, dilation) + bias
y = activation(x3, act_type, act_params)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | |
| 1 | kernel_w | int | 0 | |
| 2 | dilation_w | int | 1 | |
| 3 | stride_w | int | 1 | |
| 4 | pad_left | int | 0 | |
| 5 | bias_term | int | 0 | |
| 6 | weight_data_size | int | 0 | |
| 9 | activation_type | int | 0 | |
| 10 | activation_params | array | [ ] | |
| 11 | kernel_h | int | kernel_w | |
| 12 | dilation_h | int | dilation_w | |
| 13 | stride_h | int | stride_w | |
| 14 | pad_top | int | pad_left | |
| 15 | pad_right | int | pad_left | |
| 16 | pad_bottom | int | pad_top | |
| 17 | pad_behind | int | pad_front | |
| 18 | pad_value | float | 0.f | |
| 21 | kernel_d | int | kernel_w | |
| 22 | dilation_d | int | dilation_w | |
| 23 | stride_d | int | stride_w | |
| 24 | pad_front | int | pad_left | |

| weight | type | shape |
| ------ | ---- | ----- |
| weight_data | float/fp16/int8 | [kernel_w, kernel_h, kernel_d, num_input, num_output] |
| bias_data | float | [num_output] |

ConvolutionDepthWise

x2 = pad(x, pads, pad_value)
x3 = conv(x2, weight, kernel, stride, dilation, group) + bias
y = activation(x3, act_type, act_params)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | |
| 1 | kernel_w | int | 0 | |
| 2 | dilation_w | int | 1 | |
| 3 | stride_w | int | 1 | |
| 4 | pad_left | int | 0 | |
| 5 | bias_term | int | 0 | |
| 6 | weight_data_size | int | 0 | |
| 7 | group | int | 1 | |
| 8 | int8_scale_term | int | 0 | |
| 9 | activation_type | int | 0 | |
| 10 | activation_params | array | [ ] | |
| 11 | kernel_h | int | kernel_w | |
| 12 | dilation_h | int | dilation_w | |
| 13 | stride_h | int | stride_w | |
| 14 | pad_top | int | pad_left | |
| 15 | pad_right | int | pad_left | |
| 16 | pad_bottom | int | pad_top | |
| 18 | pad_value | float | 0.f | |
| 19 | dynamic_weight | int | 0 | |

| weight | type | shape |
| ------ | ---- | ----- |
| weight_data | float/fp16/int8 | [kernel_w, kernel_h, num_input / group, num_output / group, group] |
| bias_data | float | [num_output] |
| weight_data_int8_scales | float | [group] |
| bottom_blob_int8_scales | float | [1] |
| top_blob_int8_scales | float | [1] |

ConvolutionDepthWise1D

x2 = pad(x, pads, pad_value)
x3 = conv1d(x2, weight, kernel, stride, dilation, group) + bias
y = activation(x3, act_type, act_params)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | |
| 1 | kernel_w | int | 0 | |
| 2 | dilation_w | int | 1 | |
| 3 | stride_w | int | 1 | |
| 4 | pad_left | int | 0 | |
| 5 | bias_term | int | 0 | |
| 6 | weight_data_size | int | 0 | |
| 7 | group | int | 1 | |
| 9 | activation_type | int | 0 | |
| 10 | activation_params | array | [ ] | |
| 15 | pad_right | int | pad_left | |
| 18 | pad_value | float | 0.f | |
| 19 | dynamic_weight | int | 0 | |

| weight | type | shape |
| ------ | ---- | ----- |
| weight_data | float/fp16/int8 | [kernel_w, num_input / group, num_output / group, group] |
| bias_data | float | [num_output] |

ConvolutionDepthWise3D

x2 = pad(x, pads, pad_value)
x3 = conv3d(x2, weight, kernel, stride, dilation, group) + bias
y = activation(x3, act_type, act_params)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | |
| 1 | kernel_w | int | 0 | |
| 2 | dilation_w | int | 1 | |
| 3 | stride_w | int | 1 | |
| 4 | pad_left | int | 0 | |
| 5 | bias_term | int | 0 | |
| 6 | weight_data_size | int | 0 | |
| 7 | group | int | 1 | |
| 9 | activation_type | int | 0 | |
| 10 | activation_params | array | [ ] | |
| 11 | kernel_h | int | kernel_w | |
| 12 | dilation_h | int | dilation_w | |
| 13 | stride_h | int | stride_w | |
| 14 | pad_top | int | pad_left | |
| 15 | pad_right | int | pad_left | |
| 16 | pad_bottom | int | pad_top | |
| 17 | pad_behind | int | pad_front | |
| 18 | pad_value | float | 0.f | |
| 21 | kernel_d | int | kernel_w | |
| 22 | dilation_d | int | dilation_w | |
| 23 | stride_d | int | stride_w | |
| 24 | pad_front | int | pad_left | |

| weight | type | shape |
| ------ | ---- | ----- |
| weight_data | float/fp16/int8 | [kernel_w, kernel_h, kernel_d, num_input / group, num_output / group, group] |
| bias_data | float | [num_output] |

CopyTo

self[offset] = src
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | woffset | int | 0 | |
| 1 | hoffset | int | 0 | |
| 13 | doffset | int | 0 | |
| 2 | coffset | int | 0 | |
| 9 | starts | array | [ ] | |
| 11 | axes | array | [ ] | |

Crop

y = crop(x)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | woffset | int | 0 | |
| 1 | hoffset | int | 0 | |
| 13 | doffset | int | 0 | |
| 2 | coffset | int | 0 | |
| 3 | outw | int | 0 | |
| 4 | outh | int | 0 | |
| 14 | outd | int | 0 | |
| 5 | outc | int | 0 | |
| 6 | woffset2 | int | 0 | |
| 7 | hoffset2 | int | 0 | |
| 15 | doffset2 | int | 0 | |
| 8 | coffset2 | int | 0 | |
| 9 | starts | array | [ ] | |
| 10 | ends | array | [ ] | |
| 11 | axes | array | [ ] | |
| 19 | starts_expr | str | "" | |
| 20 | ends_expr | str | "" | |
| 21 | axes_expr | str | "" | |

CumulativeSum

If axis < 0, we use axis = x.dims + axis

It implements https://pytorch.org/docs/stable/generated/torch.cumsum.html

  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | axis | int | 0 | |

Deconvolution

x2 = deconv(x, weight, kernel, stride, dilation) + bias
x3 = depad(x2, pads, pad_value)
y = activation(x3, act_type, act_params)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | |
| 1 | kernel_w | int | 0 | |
| 2 | dilation_w | int | 1 | |
| 3 | stride_w | int | 1 | |
| 4 | pad_left | int | 0 | |
| 5 | bias_term | int | 0 | |
| 6 | weight_data_size | int | 0 | |
| 9 | activation_type | int | 0 | |
| 10 | activation_params | array | [ ] | |
| 11 | kernel_h | int | kernel_w | |
| 12 | dilation_h | int | dilation_w | |
| 13 | stride_h | int | stride_w | |
| 14 | pad_top | int | pad_left | |
| 15 | pad_right | int | pad_left | |
| 16 | pad_bottom | int | pad_top | |
| 18 | output_pad_right | int | 0 | |
| 19 | output_pad_bottom | int | output_pad_right | |
| 20 | output_w | int | 0 | |
| 21 | output_h | int | output_w | |
| 28 | dynamic_weight | int | 0 | |

| weight | type | shape |
| ------ | ---- | ----- |
| weight_data | float/fp16 | [kernel_w, kernel_h, num_input, num_output] |
| bias_data | float | [num_output] |

Deconvolution1D

x2 = deconv1d(x, weight, kernel, stride, dilation) + bias
x3 = depad(x2, pads, pad_value)
y = activation(x3, act_type, act_params)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | |
| 1 | kernel_w | int | 0 | |
| 2 | dilation_w | int | 1 | |
| 3 | stride_w | int | 1 | |
| 4 | pad_left | int | 0 | |
| 5 | bias_term | int | 0 | |
| 6 | weight_data_size | int | 0 | |
| 9 | activation_type | int | 0 | |
| 10 | activation_params | array | [ ] | |
| 15 | pad_right | int | pad_left | |
| 18 | output_pad_right | int | 0 | |
| 20 | output_w | int | 0 | |
| 28 | dynamic_weight | int | 0 | |

| weight | type | shape |
| ------ | ---- | ----- |
| weight_data | float/fp16 | [kernel_w, num_input, num_output] |
| bias_data | float | [num_output] |

Deconvolution3D

x2 = deconv3d(x, weight, kernel, stride, dilation) + bias
x3 = depad(x2, pads, pad_value)
y = activation(x3, act_type, act_params)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | |
| 1 | kernel_w | int | 0 | |
| 2 | dilation_w | int | 1 | |
| 3 | stride_w | int | 1 | |
| 4 | pad_left | int | 0 | |
| 5 | bias_term | int | 0 | |
| 6 | weight_data_size | int | 0 | |
| 9 | activation_type | int | 0 | |
| 10 | activation_params | array | [ ] | |
| 11 | kernel_h | int | kernel_w | |
| 12 | dilation_h | int | dilation_w | |
| 13 | stride_h | int | stride_w | |
| 14 | pad_top | int | pad_left | |
| 15 | pad_right | int | pad_left | |
| 16 | pad_bottom | int | pad_top | |
| 17 | pad_behind | int | pad_front | |
| 18 | output_pad_right | int | 0 | |
| 19 | output_pad_bottom | int | output_pad_right | |
| 20 | output_pad_behind | int | output_pad_right | |
| 21 | kernel_d | int | kernel_w | |
| 22 | dilation_d | int | dilation_w | |
| 23 | stride_d | int | stride_w | |
| 24 | pad_front | int | pad_left | |
| 25 | output_w | int | 0 | |
| 26 | output_h | int | output_w | |
| 27 | output_d | int | output_w | |

| weight | type | shape |
| ------ | ---- | ----- |
| weight_data | float/fp16 | [kernel_w, kernel_h, kernel_d, num_input, num_output] |
| bias_data | float | [num_output] |

DeconvolutionDepthWise

x2 = deconv(x, weight, kernel, stride, dilation, group) + bias
x3 = depad(x2, pads, pad_value)
y = activation(x3, act_type, act_params)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | |
| 1 | kernel_w | int | 0 | |
| 2 | dilation_w | int | 1 | |
| 3 | stride_w | int | 1 | |
| 4 | pad_left | int | 0 | |
| 5 | bias_term | int | 0 | |
| 6 | weight_data_size | int | 0 | |
| 7 | group | int | 1 | |
| 9 | activation_type | int | 0 | |
| 10 | activation_params | array | [ ] | |
| 11 | kernel_h | int | kernel_w | |
| 12 | dilation_h | int | dilation_w | |
| 13 | stride_h | int | stride_w | |
| 14 | pad_top | int | pad_left | |
| 15 | pad_right | int | pad_left | |
| 16 | pad_bottom | int | pad_top | |
| 18 | output_pad_right | int | 0 | |
| 19 | output_pad_bottom | int | output_pad_right | |
| 20 | output_w | int | 0 | |
| 21 | output_h | int | output_w | |
| 28 | dynamic_weight | int | 0 | |

| weight | type | shape |
| ------ | ---- | ----- |
| weight_data | float/fp16 | [kernel_w, kernel_h, num_input / group, num_output / group, group] |
| bias_data | float | [num_output] |

DeconvolutionDepthWise1D

x2 = deconv1d(x, weight, kernel, stride, dilation, group) + bias
x3 = depad(x2, pads, pad_value)
y = activation(x3, act_type, act_params)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | |
| 1 | kernel_w | int | 0 | |
| 2 | dilation_w | int | 1 | |
| 3 | stride_w | int | 1 | |
| 4 | pad_left | int | 0 | |
| 5 | bias_term | int | 0 | |
| 6 | weight_data_size | int | 0 | |
| 7 | group | int | 1 | |
| 9 | activation_type | int | 0 | |
| 10 | activation_params | array | [ ] | |
| 15 | pad_right | int | pad_left | |
| 18 | output_pad_right | int | 0 | |
| 20 | output_w | int | 0 | |
| 28 | dynamic_weight | int | 0 | |

| weight | type | shape |
| ------ | ---- | ----- |
| weight_data | float/fp16 | [kernel_w, num_input / group, num_output / group, group] |
| bias_data | float | [num_output] |

DeconvolutionDepthWise3D

x2 = deconv3d(x, weight, kernel, stride, dilation, group) + bias
x3 = depad(x2, pads, pad_value)
y = activation(x3, act_type, act_params)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | |
| 1 | kernel_w | int | 0 | |
| 2 | dilation_w | int | 1 | |
| 3 | stride_w | int | 1 | |
| 4 | pad_left | int | 0 | |
| 5 | bias_term | int | 0 | |
| 6 | weight_data_size | int | 0 | |
| 7 | group | int | 1 | |
| 9 | activation_type | int | 0 | |
| 10 | activation_params | array | [ ] | |
| 11 | kernel_h | int | kernel_w | |
| 12 | dilation_h | int | dilation_w | |
| 13 | stride_h | int | stride_w | |
| 14 | pad_top | int | pad_left | |
| 15 | pad_right | int | pad_left | |
| 16 | pad_bottom | int | pad_top | |
| 17 | pad_behind | int | pad_front | |
| 18 | output_pad_right | int | 0 | |
| 19 | output_pad_bottom | int | output_pad_right | |
| 20 | output_pad_behind | int | output_pad_right | |
| 21 | kernel_d | int | kernel_w | |
| 22 | dilation_d | int | dilation_w | |
| 23 | stride_d | int | stride_w | |
| 24 | pad_front | int | pad_left | |
| 25 | output_w | int | 0 | |
| 26 | output_h | int | output_w | |
| 27 | output_d | int | output_w | |

| weight | type | shape |
| ------ | ---- | ----- |
| weight_data | float/fp16 | [kernel_w, kernel_h, kernel_d, num_input / group, num_output / group, group] |
| bias_data | float | [num_output] |

DeformableConv2D

x2 = deformableconv2d(x, offset, mask, weight, kernel, stride, dilation) + bias
y = activation(x2, act_type, act_params)
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | |
| 1 | kernel_w | int | 0 | |
| 2 | dilation_w | int | 1 | |
| 3 | stride_w | int | 1 | |
| 4 | pad_left | int | 0 | |
| 5 | bias_term | int | 0 | |
| 6 | weight_data_size | int | 0 | |
| 9 | activation_type | int | 0 | |
| 10 | activation_params | array | [ ] | |
| 11 | kernel_h | int | kernel_w | |
| 12 | dilation_h | int | dilation_w | |
| 13 | stride_h | int | stride_w | |
| 14 | pad_top | int | pad_left | |
| 15 | pad_right | int | pad_left | |
| 16 | pad_bottom | int | pad_top | |

| weight | type | shape |
| ------ | ---- | ----- |
| weight_data | float/fp16/int8 | [kernel_w, kernel_h, num_input, num_output] |
| bias_data | float | [num_output] |

Dequantize

y = x * scale + bias
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | scale_data_size | int | 1 | |
| 1 | bias_data_size | int | 0 | |

| weight | type | shape |
| ------ | ---- | ----- |
| scale_data | float | [scale_data_size] |
| bias_data | float | [bias_data_size] |

Diag

y = diag(x, diagonal)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | diagonal | int | 0 | |

Dropout

y = x * scale
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | scale | float | 1.f | |

Eltwise

y = elementwise_op(x0, x1, ...)
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | op_type | int | 0 | |
| 1 | coeffs | array | [ ] | |

Operation type:

  • 0 = PROD
  • 1 = SUM
  • 2 = MAX

ELU

if x < 0    y = (exp(x) - 1) * alpha
else        y = x
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | alpha | float | 0.1f | |

Embed

y = embedding(x)
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | |
| 1 | input_dim | int | 0 | |
| 2 | bias_term | int | 0 | |
| 3 | weight_data_size | int | 0 | |
| 18 | int8_scale_term | int | 0 | |

| weight | type | shape |
| ------ | ---- | ----- |
| weight_data | float | [weight_data_size] |
| bias_data | float | [num_output] |
| weight_data_int8_scales | float | [1] |

Exp

if base == -1   y = exp(shift + x * scale)
else            y = pow(base, (shift + x * scale))
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | base | float | -1.f | |
| 1 | scale | float | 1.f | |
| 2 | shift | float | 0.f | |

ExpandDims

  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 3 | axes | array | [ ] | |

Flatten

Reshape blob to 1 dimension

  • one_blob_only

Flip

  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | axes | array | [ ] | |

Fold

y = fold(x)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | |
| 1 | kernel_w | int | 0 | |
| 2 | dilation_w | int | 1 | |
| 3 | stride_w | int | 1 | |
| 4 | pad_left | int | 0 | |
| 11 | kernel_h | int | kernel_w | |
| 12 | dilation_h | int | dilation_w | |
| 13 | stride_h | int | stride_w | |
| 14 | pad_top | int | pad_left | |
| 15 | pad_right | int | pad_left | |
| 16 | pad_bottom | int | pad_top | |
| 20 | output_w | int | 0 | |
| 21 | output_h | int | output_w | |

GELU

if fast_gelu == 1   y = 0.5 * x * (1 + tanh(0.79788452 * (x + 0.044715 * x * x * x)));
else                y = 0.5 * x * erfc(-0.70710678 * x)
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | fast_gelu | int | 0 | use approximation |
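
Both variants in plain C++ (0.79788452 ≈ sqrt(2/pi), and 0.5 * erfc(-x / sqrt(2)) is the exact gaussian CDF); a sketch, not ncnn's implementation:

```cpp
#include <cmath>

// GELU following the two formulas above.
float gelu(float x, int fast_gelu)
{
    if (fast_gelu)
        return 0.5f * x * (1.f + std::tanh(0.79788452f * (x + 0.044715f * x * x * x)));
    return 0.5f * x * (float)std::erfc(-0.70710678f * x);
}
```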

GLU

If axis < 0, we use axis = x.dims + axis

GLU(a,b)=a⊗σ(b)

where a is the first half of the input matrix and b is the second half.

axis specifies the dimension to split the input

  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | axis | int | 0 | |

Gemm

a = transA ? transpose(x0) : x0
b = transB ? transpose(x1) : x1
c = x2
y = (gemm(a, b) + c * beta) * alpha
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | alpha | float | 1.f | |
| 1 | beta | float | 1.f | |
| 2 | transA | int | 0 | |
| 3 | transB | int | 0 | |
| 4 | constantA | int | 0 | |
| 5 | constantB | int | 0 | |
| 6 | constantC | int | 0 | |
| 7 | constantM | int | 0 | |
| 8 | constantN | int | 0 | |
| 9 | constantK | int | 0 | |
| 10 | constant_broadcast_type_C | int | 0 | |
| 11 | output_N1M | int | 0 | |
| 12 | output_elempack | int | 0 | |
| 13 | output_elemtype | int | 0 | |
| 14 | output_transpose | int | 0 | |
| 18 | int8_scale_term | int | 0 | |
| 20 | constant_TILE_M | int | 0 | |
| 21 | constant_TILE_N | int | 0 | |
| 22 | constant_TILE_K | int | 0 | |

| weight | type | shape |
| ------ | ---- | ----- |
| A_data | float/fp16/int8 | [M, K] or [K, M] |
| B_data | float/fp16/int8 | [N, K] or [K, N] |
| C_data | float | [1] or [M] or [N] or [1, M] or [N, 1] or [N, M] |
| A_data_int8_scales | float | [M] |
| B_data_int8_scales | float | [1] |

GridSample

Given an input and a flow-field grid, computes the output using input values and pixel locations from grid.

For each output location output[:, h2, w2], the size-2 vector grid[h2, w2, 2] specifies input pixel[:, h1, w1] locations x and y, 
which are used to interpolate the output value output[:, h2, w2]

This function is often used in conjunction with affine_grid() to build Spatial Transformer Networks.
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | sample_type | int | 1 | |
| 1 | padding_mode | int | 1 | |
| 2 | align_corner | int | 0 | |
| 3 | permute_fusion | int | 0 | fuse with permute |

Sample type:

  • 1 = Nearest
  • 2 = Bilinear
  • 3 = Bicubic

Padding mode:

  • 1 = zeros
  • 2 = border
  • 3 = reflection
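
A single-channel sketch of sample_type = 2 (Bilinear) with padding_mode = 1 (zeros), showing how align_corner changes the coordinate unnormalization; names and layout are illustrative:

```cpp
#include <cmath>

// gx, gy are normalized grid coordinates in [-1, 1].
float grid_sample_bilinear(const float* image, int w, int h,
                           float gx, float gy, int align_corner)
{
    // unnormalize from [-1, 1] to pixel coordinates
    float fx = align_corner ? (gx + 1.f) / 2.f * (w - 1) : ((gx + 1.f) * w - 1.f) / 2.f;
    float fy = align_corner ? (gy + 1.f) / 2.f * (h - 1) : ((gy + 1.f) * h - 1.f) / 2.f;

    int x0 = (int)std::floor(fx);
    int y0 = (int)std::floor(fy);
    float ax = fx - x0;
    float ay = fy - y0;

    // padding_mode = zeros: out-of-bounds taps contribute nothing
    auto at = [&](int x, int y) -> float {
        return (x < 0 || x >= w || y < 0 || y >= h) ? 0.f : image[y * w + x];
    };

    return at(x0, y0) * (1.f - ax) * (1.f - ay) + at(x0 + 1, y0) * ax * (1.f - ay)
         + at(x0, y0 + 1) * (1.f - ax) * ay + at(x0 + 1, y0 + 1) * ax * ay;
}
```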

GroupNorm

split x along channel axis into group x0, x1 ...
mean-variance normalize each group x0, x1 ...
y = x * gamma + beta
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | group | int | 1 | |
| 1 | channels | int | 0 | |
| 2 | eps | float | 0.001f | x = x / sqrt(var + eps) |
| 3 | affine | int | 1 | |

| weight | type | shape |
| ------ | ---- | ----- |
| gamma_data | float | [channels] |
| beta_data | float | [channels] |

GRU

Apply a single-layer GRU to a feature sequence of T timesteps. The input blob shape is [w=input_size, h=T] and the output blob shape is [w=num_output, h=T].

y = gru(x)
y0, hidden y1 = gru(x0, hidden x1)
  • one_blob_only if bidirectional
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | hidden size of output |
| 1 | weight_data_size | int | 0 | total size of weight matrix |
| 2 | direction | int | 0 | 0=forward, 1=reverse, 2=bidirectional |

| weight | type | shape |
| ------ | ---- | ----- |
| weight_xc_data | float/fp16/int8 | [input_size, num_output * 3, num_directions] |
| bias_c_data | float/fp16/int8 | [num_output, 4, num_directions] |
| weight_hc_data | float/fp16/int8 | [num_output, num_output * 3, num_directions] |

Direction flag:

  • 0 = forward only
  • 1 = reverse only
  • 2 = bidirectional

HardSigmoid

y = clamp(x * alpha + beta, 0, 1)
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | alpha | float | 0.2f | |
| 1 | beta | float | 0.5f | |

HardSwish

y = x * clamp(x * alpha + beta, 0, 1)
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | alpha | float | 0.2f | |
| 1 | beta | float | 0.5f | |

InnerProduct

x2 = innerproduct(x, weight) + bias
y = activation(x2, act_type, act_params)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | |
| 1 | bias_term | int | 0 | |
| 2 | weight_data_size | int | 0 | |
| 8 | int8_scale_term | int | 0 | |
| 9 | activation_type | int | 0 | |
| 10 | activation_params | array | [ ] | |

| weight | type | shape |
| ------ | ---- | ----- |
| weight_data | float/fp16/int8 | [num_input, num_output] |
| bias_data | float | [num_output] |
| weight_data_int8_scales | float | [num_output] |
| bottom_blob_int8_scales | float | [1] |

Input

y = input
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | w | int | 0 | |
| 1 | h | int | 0 | |
| 11 | d | int | 0 | |
| 2 | c | int | 0 | |

InstanceNorm

split x along channel axis into instance x0, x1 ...
mean-variance normalize each channel instance x0, x1 ...
y = x * gamma + beta
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | channels | int | 0 | |
| 1 | eps | float | 0.001f | x = x / sqrt(var + eps) |
| 2 | affine | int | 1 | |

| weight | type | shape |
| ------ | ---- | ----- |
| gamma_data | float | [channels] |
| beta_data | float | [channels] |

Interp

if dynamic_target_size == 0     y = resize(x) by fixed size or scale
else                            y = resize(x0, size(x1))
  • one_blob_only if dynamic_target_size == 0
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | resize_type | int | 0 | |
| 1 | height_scale | float | 1.f | |
| 2 | width_scale | float | 1.f | |
| 3 | output_height | int | 0 | |
| 4 | output_width | int | 0 | |
| 5 | dynamic_target_size | int | 0 | |
| 6 | align_corner | int | 0 | |
| 9 | size_expr | str | "" | |

Resize type:

  • 1 = Nearest
  • 2 = Bilinear
  • 3 = Bicubic

InverseSpectrogram

x1 = x as complex
x1 = x1 * sqrt(norm) if normalized
y = istft(x1)
y1 = unpad(y) if center

if returns == 0 return y1 as complex
if returns == 1 return y1 real
if returns == 2 return y1 imag
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | n_fft | int | 0 | |
| 1 | returns | int | 1 | |
| 2 | hoplen | int | n_fft / 4 | |
| 3 | winlen | int | n_fft | |
| 4 | window_type | int | 0 | 0=ones 1=hann 2=hamming |
| 5 | center | int | 1 | |
| 7 | normalized | int | 0 | 0=no 1=n_fft 2=window-l2-energy |

LayerNorm

split x along outmost axis into part x0, x1 ...
mean-variance normalize each part x0, x1 ...
y = x * gamma + beta by elementwise
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | affine_size | int | 0 | |
| 1 | eps | float | 0.001f | x = x / sqrt(var + eps) |
| 2 | affine | int | 1 | |

| weight | type | shape |
| ------ | ---- | ----- |
| gamma_data | float | [affine_size] |
| beta_data | float | [affine_size] |
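
A sketch of one normalized part of affine_size elements, following the normalize-then-affine steps above (illustrative, not ncnn's kernel):

```cpp
#include <cmath>

// Normalize a contiguous part in place, then apply the elementwise affine.
void layernorm_part(float* x, int size, const float* gamma, const float* beta,
                    float eps)
{
    float mean = 0.f;
    for (int i = 0; i < size; i++) mean += x[i];
    mean /= size;

    float var = 0.f;
    for (int i = 0; i < size; i++) var += (x[i] - mean) * (x[i] - mean);
    var /= size;

    float scale = 1.f / std::sqrt(var + eps);
    for (int i = 0; i < size; i++)
        x[i] = (x[i] - mean) * scale * gamma[i] + beta[i];
}
```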

Log

if base == -1   y = log(shift + x * scale)
else            y = log(shift + x * scale) / log(base)
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | base | float | -1.f | |
| 1 | scale | float | 1.f | |
| 2 | shift | float | 0.f | |

LRN

if region_type == ACROSS_CHANNELS   square_sum = sum of channel window of local_size
if region_type == WITHIN_CHANNEL    square_sum = sum of spatial window of local_size
y = x * pow(bias + alpha * square_sum / (local_size * local_size), -beta)
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | region_type | int | 0 | |
| 1 | local_size | int | 5 | |
| 2 | alpha | float | 1.f | |
| 3 | beta | float | 0.75f | |
| 4 | bias | float | 1.f | |

Region type:

  • 0 = ACROSS_CHANNELS
  • 1 = WITHIN_CHANNEL

LSTM

Apply a single-layer LSTM to a feature sequence of T timesteps. The input blob shape is [w=input_size, h=T] and the output blob shape is [w=num_output, h=T].

y = lstm(x)
y0, hidden y1, cell y2 = lstm(x0, hidden x1, cell x2)
  • one_blob_only if bidirectional
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | output size |
| 1 | weight_data_size | int | 0 | total size of IFOG weight matrix |
| 2 | direction | int | 0 | 0=forward, 1=reverse, 2=bidirectional |
| 3 | hidden_size | int | num_output | hidden size |

| weight | type | shape |
| ------ | ---- | ----- |
| weight_xc_data | float/fp16/int8 | [input_size, hidden_size * 4, num_directions] |
| bias_c_data | float/fp16/int8 | [hidden_size, 4, num_directions] |
| weight_hc_data | float/fp16/int8 | [num_output, hidden_size * 4, num_directions] |
| weight_hr_data | float/fp16/int8 | [hidden_size, num_output, num_directions] |

Direction flag:

  • 0 = forward only
  • 1 = reverse only
  • 2 = bidirectional
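
A hedged usage sketch of the stateful 3-input/3-output form through the public ncnn API; the blob names in0..in2 and out0..out2 are placeholders that depend on the actual param file, and the shapes in the comments follow the tables above:

```cpp
#include "net.h"

// Run one chunk of T timesteps while carrying hidden and cell state forward.
void run_lstm_step(ncnn::Net& net, const ncnn::Mat& x,
                   ncnn::Mat& hidden, ncnn::Mat& cell, ncnn::Mat& y)
{
    ncnn::Extractor ex = net.create_extractor();
    ex.input("in0", x);      // [w=input_size, h=T]
    ex.input("in1", hidden); // carried hidden state
    ex.input("in2", cell);   // carried cell state
    ex.extract("out0", y);   // [w=num_output, h=T]
    ex.extract("out1", hidden);
    ex.extract("out2", cell);
}
```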

MemoryData

y = data
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | w | int | 0 | |
| 1 | h | int | 0 | |
| 11 | d | int | 0 | |
| 2 | c | int | 0 | |
| 21 | load_type | int | 1 | 1=fp32 |

| weight | type | shape |
| ------ | ---- | ----- |
| data | float | [w, h, d, c] |

Mish

y = x * tanh(log(exp(x) + 1))
  • one_blob_only
  • support_inplace

MultiHeadAttention

q_affine = affine(q) / sqrt(embed_dim / num_heads)
k_affine = affine(k) or reuse kv_cache part
v_affine = affine(v) or reuse kv_cache part
split q k v into num_heads parts q0, k0, v0, q1, k1, v1 ...
for each head
    qk = q * k
    qk = qk + attn_mask if attn_mask exists
    softmax(qk)
    qkv = qk * v
    merge qkv to out
y = affine(out)
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | embed_dim | int | 0 | |
| 1 | num_heads | int | 1 | |
| 2 | weight_data_size | int | 0 | qdim = weight_data_size / embed_dim |
| 3 | kdim | int | embed_dim | |
| 4 | vdim | int | embed_dim | |
| 5 | attn_mask | int | 0 | |
| 6 | scale | float | 1.f / sqrt(embed_dim / num_heads) | |
| 7 | kv_cache | int | 0 | |
| 18 | int8_scale_term | int | 0 | |

| weight | type | shape |
| ------ | ---- | ----- |
| q_weight_data | float/fp16/int8 | [embed_dim * qdim] |
| q_bias_data | float | [embed_dim] |
| k_weight_data | float/fp16/int8 | [embed_dim * kdim] |
| k_bias_data | float | [embed_dim] |
| v_weight_data | float/fp16/int8 | [embed_dim * vdim] |
| v_bias_data | float | [embed_dim] |
| out_weight_data | float/fp16/int8 | [qdim * embed_dim] |
| out_bias_data | float | [qdim] |
| q_weight_data_int8_scales | float | [embed_dim] |
| k_weight_data_int8_scales | float | [embed_dim] |
| v_weight_data_int8_scales | float | [embed_dim] |
| out_weight_data_int8_scales | float | [1] |
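
The per-head core of the pseudo code above in plain C++, with the usual max-subtraction inside softmax for numerical stability; shapes and names are illustrative, not ncnn's implementation:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// One attention head: qk = scale * q k^T (+ mask), row softmax, out = qk v.
// q is [M, D], k and v are [N, D], attn_mask (optional) is [M, N], out is [M, D].
void attention_head(const float* q, const float* k, const float* v,
                    const float* attn_mask, float* out,
                    int M, int N, int D, float scale)
{
    std::vector<float> qk(N);
    for (int i = 0; i < M; i++)
    {
        float maxv = -1e30f;
        for (int j = 0; j < N; j++)
        {
            float s = 0.f;
            for (int d = 0; d < D; d++) s += q[i * D + d] * k[j * D + d];
            s *= scale;
            if (attn_mask) s += attn_mask[i * N + j];
            qk[j] = s;
            maxv = std::max(maxv, s);
        }
        float sum = 0.f;
        for (int j = 0; j < N; j++)
        {
            qk[j] = std::exp(qk[j] - maxv);
            sum += qk[j];
        }
        for (int d = 0; d < D; d++)
        {
            float o = 0.f;
            for (int j = 0; j < N; j++) o += qk[j] / sum * v[j * D + d];
            out[i * D + d] = o;
        }
    }
}
```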

MVN

if normalize_variance == 1 && across_channels == 1      y = (x - mean) / (sqrt(var) + eps) of whole blob
if normalize_variance == 1 && across_channels == 0      y = (x - mean) / (sqrt(var) + eps) of each channel
if normalize_variance == 0 && across_channels == 1      y = x - mean of whole blob
if normalize_variance == 0 && across_channels == 0      y = x - mean of each channel
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | normalize_variance | int | 0 | |
| 1 | across_channels | int | 0 | |
| 2 | eps | float | 0.0001f | x = x / (sqrt(var) + eps) |

Noop

y = x

Normalize

if across_spatial == 1 && across_channel == 1      x2 = normalize(x) of whole blob
if across_spatial == 1 && across_channel == 0      x2 = normalize(x) of each channel
if across_spatial == 0 && across_channel == 1      x2 = normalize(x) of each position
y = x2 * scale
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | across_spatial | int | 0 | |
| 1 | channel_shared | int | 0 | |
| 2 | eps | float | 0.0001f | see eps mode |
| 3 | scale_data_size | int | 0 | |
| 4 | across_channel | int | 0 | |
| 9 | eps_mode | int | 0 | |

| weight | type | shape |
| ------ | ---- | ----- |
| scale_data | float | [scale_data_size] |

Eps Mode:

  • 0 = caffe/mxnet x = x / sqrt(var + eps)
  • 1 = pytorch x = x / max(sqrt(var), eps)
  • 2 = tensorflow x = x / sqrt(max(var, eps))

Packing

y = wrap_packing(x)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | out_elempack | int | 1 | |
| 1 | use_padding | int | 0 | |
| 2 | cast_type_from | int | 0 | |
| 3 | cast_type_to | int | 0 | |
| 4 | storage_type_from | int | 0 | |
| 5 | storage_type_to | int | 0 | |

Padding

y = pad(x, pads)
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | top | int | 0 | |
| 1 | bottom | int | 0 | |
| 2 | left | int | 0 | |
| 3 | right | int | 0 | |
| 4 | type | int | 0 | |
| 5 | value | float | 0 | |
| 6 | per_channel_pad_data_size | int | 0 | |
| 7 | front | int | 0 | |
| 8 | behind | int | 0 | |

| weight | type | shape |
| ------ | ---- | ----- |
| per_channel_pad_data | float | [per_channel_pad_data_size] |

Padding type:

  • 0 = CONSTANT
  • 1 = REPLICATE
  • 2 = REFLECT

Permute

y = reorder(x)
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | order_type | int | 0 | |

Order Type:

  • 0 = WH WHC WHDC
  • 1 = HW HWC HWDC
  • 2 = WCH WDHC
  • 3 = CWH DWHC
  • 4 = HCW HDWC
  • 5 = CHW DHWC
  • 6 = WHCD
  • 7 = HWCD
  • 8 = WCHD
  • 9 = CWHD
  • 10 = HCWD
  • 11 = CHWD
  • 12 = WDCH
  • 13 = DWCH
  • 14 = WCDH
  • 15 = CWDH
  • 16 = DCWH
  • 17 = CDWH
  • 18 = HDCW
  • 19 = DHCW
  • 20 = HCDW
  • 21 = CHDW
  • 22 = DCHW
  • 23 = CDHW

PixelShuffle

if mode == 0    y = depth_to_space(x) where x channel order is sw-sh-outc
if mode == 1    y = depth_to_space(x) where x channel order is outc-sw-sh
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | upscale_factor | int | 1 | |
| 1 | mode | int | 0 | |

Pooling

x2 = pad(x, pads)
x3 = pooling(x2, kernel, stride)
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | pooling_type | int | 0 | |
| 1 | kernel_w | int | 0 | |
| 2 | stride_w | int | 1 | |
| 3 | pad_left | int | 0 | |
| 4 | global_pooling | int | 0 | |
| 5 | pad_mode | int | 0 | |
| 6 | avgpool_count_include_pad | int | 0 | |
| 7 | adaptive_pooling | int | 0 | |
| 8 | out_w | int | 0 | |
| 11 | kernel_h | int | kernel_w | |
| 12 | stride_h | int | stride_w | |
| 13 | pad_top | int | pad_left | |
| 14 | pad_right | int | pad_left | |
| 15 | pad_bottom | int | pad_top | |
| 18 | out_h | int | out_w | |

Pooling type:

  • 0 = MAX
  • 1 = AVG

Pad mode:

  • 0 = full padding
  • 1 = valid padding
  • 2 = tensorflow padding=SAME or onnx padding=SAME_UPPER
  • 3 = onnx padding=SAME_LOWER
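
For pad_mode = 2 the output size and padding along one axis follow the usual SAME rule, sketched below under those assumptions; SAME_LOWER (pad_mode = 3) puts the odd extra pixel on the leading side instead:

```cpp
#include <algorithm>

// SAME_UPPER along one axis: ceil(w / stride) outputs, odd pixel on the right.
void same_upper_pad(int w, int kernel_w, int stride_w,
                    int* outw, int* pad_left, int* pad_right)
{
    *outw = (w + stride_w - 1) / stride_w; // ceil(w / stride)
    int pad = std::max((*outw - 1) * stride_w + kernel_w - w, 0);
    *pad_left = pad / 2;
    *pad_right = pad - pad / 2;
}
```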

Pooling1D

x2 = pad(x, pads)
x3 = pooling1d(x2, kernel, stride)
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | pooling_type | int | 0 | |
| 1 | kernel_w | int | 0 | |
| 2 | stride_w | int | 1 | |
| 3 | pad_left | int | 0 | |
| 4 | global_pooling | int | 0 | |
| 5 | pad_mode | int | 0 | |
| 6 | avgpool_count_include_pad | int | 0 | |
| 7 | adaptive_pooling | int | 0 | |
| 8 | out_w | int | 0 | |
| 14 | pad_right | int | pad_left | |

Pooling type:

  • 0 = MAX
  • 1 = AVG

Pad mode:

  • 0 = full padding
  • 1 = valid padding
  • 2 = tensorflow padding=SAME or onnx padding=SAME_UPPER
  • 3 = onnx padding=SAME_LOWER

Pooling3D

x2 = pad(x, pads)
x3 = pooling3d(x2, kernel, stride)
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | pooling_type | int | 0 | |
| 1 | kernel_w | int | 0 | |
| 2 | stride_w | int | 1 | |
| 3 | pad_left | int | 0 | |
| 4 | global_pooling | int | 0 | |
| 5 | pad_mode | int | 0 | |
| 6 | avgpool_count_include_pad | int | 0 | |
| 7 | adaptive_pooling | int | 0 | |
| 8 | out_w | int | 0 | |
| 11 | kernel_h | int | kernel_w | |
| 12 | stride_h | int | stride_w | |
| 13 | pad_top | int | pad_left | |
| 14 | pad_right | int | pad_left | |
| 15 | pad_bottom | int | pad_top | |
| 16 | pad_behind | int | pad_front | |
| 18 | out_h | int | out_w | |
| 21 | kernel_d | int | kernel_w | |
| 22 | stride_d | int | stride_w | |
| 23 | pad_front | int | pad_left | |
| 28 | out_d | int | out_w | |

Pooling type:

  • 0 = MAX
  • 1 = AVG

Pad mode:

  • 0 = full padding
  • 1 = valid padding
  • 2 = tensorflow padding=SAME or onnx padding=SAME_UPPER
  • 3 = onnx padding=SAME_LOWER

Power

y = pow((shift + x * scale), power)
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | power | float | 1.f | |
| 1 | scale | float | 1.f | |
| 2 | shift | float | 0.f | |

PReLU

if x < 0    y = x * slope
else        y = x
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_slope | int | 0 | |

| weight | type | shape |
| ------ | ---- | ----- |
| slope_data | float | [num_slope] |

Quantize

y = float2int8(x * scale)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | scale_data_size | int | 1 | |

| weight | type | shape |
| ------ | ---- | ----- |
| scale_data | float | [scale_data_size] |

Reduction

y = reduce_op(x * coeff)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | operation | int | 0 | |
| 1 | reduce_all | int | 1 | |
| 2 | coeff | float | 1.f | |
| 3 | axes | array | [ ] | |
| 4 | keepdims | int | 0 | |
| 5 | fixbug0 | int | 0 | hack for bug fix, should be 1 |

Operation type:

  • 0 = SUM
  • 1 = ASUM
  • 2 = SUMSQ
  • 3 = MEAN
  • 4 = MAX
  • 5 = MIN
  • 6 = PROD
  • 7 = L1
  • 8 = L2
  • 9 = LogSum
  • 10 = LogSumExp

ReLU

if x < 0    y = x * slope
else        y = x
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | slope | float | 0.f | |

Reorg

if mode == 0    y = space_to_depth(x) where x channel order is sw-sh-outc
if mode == 1    y = space_to_depth(x) where x channel order is outc-sw-sh
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | stride | int | 1 | |
| 1 | mode | int | 0 | |

Requantize

x2 = x * scale_in + bias
x3 = activation(x2)
y = float2int8(x3 * scale_out)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | scale_in_data_size | int | 1 | |
| 1 | scale_out_data_size | int | 1 | |
| 2 | bias_data_size | int | 0 | |
| 3 | activation_type | int | 0 | |
| 4 | activation_params | array | [ ] | |

| weight | type | shape |
| ------ | ---- | ----- |
| scale_in_data | float | [scale_in_data_size] |
| scale_out_data | float | [scale_out_data_size] |
| bias_data | float | [bias_data_size] |
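
The three int8 layers compose as quantize → int8 kernels → requantize or dequantize. A sketch of the elementwise float↔int8 conversions, assuming symmetric saturating round-to-nearest to [-127, 127] (an assumption for illustration, not ncnn's exact rounding code):

```cpp
#include <cmath>

// Saturating round-to-nearest into the symmetric int8 range.
signed char float2int8(float v)
{
    int i = (int)std::lround(v);
    if (i > 127) return 127;
    if (i < -127) return -127;
    return (signed char)i;
}

// Quantize: y = float2int8(x * scale)
signed char quantize(float x, float scale) { return float2int8(x * scale); }

// Dequantize: y = x * scale + bias
float dequantize(signed char q, float scale, float bias) { return q * scale + bias; }
```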

Reshape

y = reshape(x)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | w | int | -233 | |
| 1 | h | int | -233 | |
| 11 | d | int | -233 | |
| 2 | c | int | -233 | |
| 6 | shape_expr | str | "" | |

Reshape flag:

  • 0 = copy from bottom
  • -1 = remaining
  • -233 = drop this dim (default)
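
A worked example of the flags, assuming a bottom blob of c=2, h=6, w=8 (96 elements):

```cpp
#include <cstdio>

int main()
{
    const int total = 2 * 6 * 8; // 96 elements
    // w=-1, h=-233, c=-233 : drop h and c, infer w      -> [96]
    printf("[%d]\n", total);
    // w=0, h=-1, c=-233    : keep w=8, drop c, infer h  -> [8 x 12]
    printf("[8 x %d]\n", total / 8);
    // w=4, h=-1, c=2       : infer h = 96 / (4 * 2)     -> [4 x 12 x 2]
    printf("[4 x %d x 2]\n", total / (4 * 2));
    return 0;
}
```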

RMSNorm

split x along outmost axis into part x0, x1 ...
root mean square normalize for each part x0, x1 ...
y = x * gamma by elementwise
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | affine_size | int | 0 | |
| 1 | eps | float | 0.001f | x = x / sqrt(var + eps) |
| 2 | affine | int | 1 | |

| weight | type | shape |
| ------ | ---- | ----- |
| gamma_data | float | [affine_size] |

RNN

Apply a single-layer RNN to a feature sequence of T timesteps. The input blob shape is [w=input_size, h=T] and the output blob shape is [w=num_output, h=T].

y = rnn(x)
y0, hidden y1 = rnn(x0, hidden x1)
  • one_blob_only if bidirectional
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | hidden size of output |
| 1 | weight_data_size | int | 0 | total size of weight matrix |
| 2 | direction | int | 0 | 0=forward, 1=reverse, 2=bidirectional |

| weight | type | shape |
| ------ | ---- | ----- |
| weight_xc_data | float/fp16/int8 | [input_size, num_output, num_directions] |
| bias_c_data | float/fp16/int8 | [num_output, 1, num_directions] |
| weight_hc_data | float/fp16/int8 | [num_output, num_output, num_directions] |

Direction flag:

  • 0 = forward only
  • 1 = reverse only
  • 2 = bidirectional

RotaryEmbed

Apply rotary positional embeddings with cos and sin cache

y1 = x1 * cos - x2 * sin
y2 = x1 * sin + x2 * cos
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | interleaved | int | 0 | |

Scale

if scale_data_size == -233  y = x0 * x1
else                        y = x * scale + bias
  • one_blob_only if scale_data_size != -233
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | scale_data_size | int | 0 | |
| 1 | bias_term | int | 0 | |

| weight | type | shape |
| ------ | ---- | ----- |
| scale_data | float | [scale_data_size] |
| bias_data | float | [scale_data_size] |

SDPA

scaled dot product attention
for each head
    qk = q * k
    qk = qk + attn_mask if attn_mask exists
    softmax(qk)
    qkv = qk * v
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 5 | attn_mask | int | 0 | |
| 6 | scale | float | 0.f | auto = 1.f / sqrt(embed_dim) |
| 7 | kv_cache | int | 0 | |
| 18 | int8_scale_term | int | 0 | |

SELU

if x < 0    y = (exp(x) - 1.f) * alpha * lambda
else        y = x * lambda
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | alpha | float | 1.67326324f | |
| 1 | lambda | float | 1.050700987f | |

Shrink

if x < -lambd y = x + bias
if x >  lambd y = x - bias
else          y = x
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | bias | float | 0.0f | |
| 1 | lambd | float | 0.5f | |

ShuffleChannel

if reverse == 0     y = shufflechannel(x) by group
if reverse == 1     y = shufflechannel(x) by channel / group
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | group | int | 1 | |
| 1 | reverse | int | 0 | |
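
Channel shuffle amounts to viewing the channels as a [group, channels_per_group] matrix and transposing it; reverse mode swaps the two view dimensions. A sketch with illustrative layout:

```cpp
// x and y are [channels][spatial]; channels must be divisible by group.
void shuffle_channel(const float* x, float* y, int channels, int spatial, int group)
{
    int cpg = channels / group; // channels per group
    for (int g = 0; g < group; g++)
        for (int c = 0; c < cpg; c++)
            for (int i = 0; i < spatial; i++)
                y[(c * group + g) * spatial + i] = x[(g * cpg + c) * spatial + i];
}
```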

Sigmoid

y = 1 / (1 + exp(-x))
  • one_blob_only
  • support_inplace

Slice

split x along axis into slices; the size of each slice is given by the slices array
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | slices | array | [ ] | |
| 1 | axis | int | 0 | |
| 2 | indices | array | [ ] | |

Softmax

softmax(x, axis)
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | axis | int | 0 | |
| 1 | fixbug0 | int | 0 | hack for bug fix, should be 1 |
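
A minimal sketch over one contiguous run of values, with the standard max-subtraction for numerical stability; applying softmax along an arbitrary axis amounts to looping this over all the other indices (illustrative, not ncnn's kernel):

```cpp
#include <cmath>

void softmax(float* x, int size)
{
    float maxv = x[0];
    for (int i = 1; i < size; i++) maxv = x[i] > maxv ? x[i] : maxv;

    float sum = 0.f;
    for (int i = 0; i < size; i++)
    {
        x[i] = std::exp(x[i] - maxv);
        sum += x[i];
    }
    for (int i = 0; i < size; i++) x[i] /= sum;
}
```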

Softplus

y = log(exp(x) + 1)
  • one_blob_only
  • support_inplace

Spectrogram

x1 = pad(x) if center
y = stft(x1)
y = y / sqrt(norm) if normalized

if power == 0 return y as real
if power == 1 return magnitude
if power == 2 return square of magnitude
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | n_fft | int | 0 | |
| 1 | power | int | 0 | |
| 2 | hoplen | int | n_fft / 4 | |
| 3 | winlen | int | n_fft | |
| 4 | window_type | int | 0 | 0=ones 1=hann 2=hamming |
| 5 | center | int | 1 | |
| 6 | pad_type | int | 2 | 0=CONSTANT 1=REPLICATE 2=REFLECT |
| 7 | normalized | int | 0 | 0=no 1=n_fft 2=window-l2-energy |
| 8 | onesided | int | 1 | |

Split

y0, y1 ... = x

Squeeze

  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | squeeze_w | int | 0 | |
| 1 | squeeze_h | int | 0 | |
| 11 | squeeze_d | int | 0 | |
| 2 | squeeze_c | int | 0 | |
| 3 | axes | array | [ ] | |

Swish

y = x / (1 + exp(-x))
  • one_blob_only
  • support_inplace

TanH

y = tanh(x)
  • one_blob_only
  • support_inplace

Threshold

if x > threshold    y = 1
else                y = 0
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | threshold | float | 0.f | |

Tile

y = repeat tiles along axis for x
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | axis | int | 0 | |
| 1 | tiles | int | 1 | |
| 2 | repeats | array | [ ] | |

UnaryOp

y = unaryop(x)
  • one_blob_only
  • support_inplace
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | op_type | int | 0 | Operation type as follows |

Operation type:

  • 0 = ABS
  • 1 = NEG
  • 2 = FLOOR
  • 3 = CEIL
  • 4 = SQUARE
  • 5 = SQRT
  • 6 = RSQ
  • 7 = EXP
  • 8 = LOG
  • 9 = SIN
  • 10 = COS
  • 11 = TAN
  • 12 = ASIN
  • 13 = ACOS
  • 14 = ATAN
  • 15 = RECIPROCAL
  • 16 = TANH
  • 17 = LOG10
  • 18 = ROUND
  • 19 = TRUNC

Unfold

y = unfold(x)
  • one_blob_only
| param id | name | type | default | description |
| -------- | ---- | ---- | ------- | ----------- |
| 0 | num_output | int | 0 | |
| 1 | kernel_w | int | 0 | |
| 2 | dilation_w | int | 1 | |
| 3 | stride_w | int | 1 | |
| 4 | pad_left | int | 0 | |
| 11 | kernel_h | int | kernel_w | |
| 12 | dilation_h | int | dilation_w | |
| 13 | stride_h | int | stride_w | |
| 14 | pad_top | int | pad_left | |
| 15 | pad_right | int | pad_left | |
| 16 | pad_bottom | int | pad_top | |