nmsPlugin

Table Of Contents

Description
- Structure
Parameters
- CodeType
- inputOrder
Additional resources
License
Changelog
Known issues

Description

NOTE: nmsPlugin is deprecated since TensorRT 9.0. Its functionality has been superseded by the INMSLayer and EfficientNMS plugin.

The nmsPlugin, similar to the batchedNMSPlugin, implements a non_max_suppression (NMS) operation over bounding boxes for object detection networks. This plugin is included in TensorRT.

Additionally, the nmsPlugin has a bounding box decoding step prior to the non_max_suppression step. The nmsPlugin takes the predicted encoded bounding box data as input, decodes them, followed by the non maximum suppression step in a GPU-accelerated fashion in TensorRT.

Structure

This plugin takes three inputs, loc_data, conf_data and prior_data and generates one output containing the following information:

image id
bounding box label
confidence score
bounding box labels

and another output containing the number of valid detections for each batch item after non maximum suppression.

Where:

loc_data is the predicted bounding box data subject to decoding. It has a shape of [batchSize, numPriors * numLocClasses * 4, 1, 1]. Where:
- numPriors is the total number of prior boxes for one sample
- numLocClasses is 1 if each bounding box predicts the probability for all candidate classes, else the number of candidate classes if the bounding box only does binary prediction for one candidate class. For example, say you have 81 candidate classes, if your bounding box could predict the probability for all the 81 candidate classes, numLocClasses will be 1 in this case. This is also the original implementation of the SSD model. However, if your bounding box was designed to do binary classification, you would need 81 bounding boxes for one anchor box, numLocClasses will be 81 in this case.
conf_data is the object classification confidence probability distribution of the bounding box. It has a shape of [batchSize, numPriors * numClasses, 1, 1].
prior_data is the anchor box data with additional scaling factor or variance used for bounding box encoding and decoding generated from the custom plugin gridAnchorPlugin. It has a shape of [batchSize, 2, numPriors * 4, 1]. The prior_data input has two channels.
- The first channel is the anchor box data.
- The second channel is the scaling factor or variance used for bounding box encoding and decoding.

After decoding, the decoded boxes will proceed to the non maximum suppression step, which performs the same action as batchedNMSPlugin. The only difference is that instead of generating four outputs:

nmsed box count (1 value)
nmsed box locations (4 values)
nmsed box scores (1 value)
nmsed box class IDs (1 value)

The nmsPlugin generates an output of shape [batchSize, 1, keepTopK, 7] which contains the same information as the outputs nmsed box locations, nmsed box scores, and nmsed box class IDs from batchedNMSPlugin, and an another output of shape [batchSize, 1, 1, 1] which contains the same information as the output nmsed box count from batchedNMSPlugin.

Parameters

The plugin has the plugin creator class NMSPluginCreator and the plugin class DetectionOutput.

The DetectionOutput plugin instance is created using an array of DetectionOutputParameters type parameters. DetectionOutputParameters consists of the following parameters:

Type	Parameter	Description
`bool`	`shareLocation`	If `true`, the bounding boxes are shared among different classes.
`bool`	`varianceEncodedInTarget`	If `true`, variance is encoded in target, you will not use the variance values to adjust the predicted bounding box. Otherwise, you will use the variance values to adjust the predicted bounding box accordingly.
`int`	`backgroundLabelId`	The background label ID. If there is no background class, set it to `-1`.
`int`	`numClasses`	Number of classes to be predicted.
`int`	`topK`	Number of boxes per image with top confidence scores that are fed into the NMS algorithm.
`int`	`keepTopK`	Number of total bounding boxes to be kept per image after the NMS step.
`float`	`confidenceThreshold`	Considers detections whose confidences are larger than a threshold.
`float`	`nmsThreshold`	Intersection over union (IoU) threshold to be used in NMS.
`codeTypeSSD`	`codeType`	Type of coding method for `bbox`.
`int`	`inputOrder`	Specifies the order of inputs `{loc_data, conf_data, priorbox_data}`, in other words, `inputOrder[0]` is for `loc_data`, `inputOrder[1]` is for `conf_data` and `inputOrder[2]` is for `priorbox_data`. For example, if your inputs in the memory are in the order of `loc_data`, `priorbox_data`, `conf_data`, then `inputOrder` should be `[0, 2, 1]`.
`bool`	`confSigmoid`	Set to `true` to calculate sigmoid of confidence scores.
`bool`	`isNormalized`	Set to `true` if bounding box data is normalized by the network, in other words, the bounding box coordinates used in the model are not pixel coordinates.
`int`	`scoreBits`	The number of bits to represent the score values during radix sort. The number of bits to represent score values(confidences) during radix sort. This valid range is 0 < scoreBits <= 10. The default value is 16(which means to use full bits in radix sort). Setting this parameter to any invalid value will result in the same effect as setting it to 16. This parameter can be tuned to strike for a best trade-off between performance and accuracy. Lowering scoreBits will improve performance but with some minor degradation to the accuracy. This parameter is only valid for FP16 data type for now.

`CodeType`

The bounding boxes are used in an encoded format in the model. In order to get the exact bounding box coordinates on the original input image, we need to know the encoding method and decode them.

The bounding box is decoded using the decodeBBoxes CUDA kernel defined in the decodeBBoxes.cu file based on the encoding and decoding method used during model training. Currently, we support the following encoding methods:

CodeTypeSSD::CORNER
CodeTypeSSD::CENTER_SIZE
CodeTypeSSD::CORNER_SIZE
CodeTypeSSD::TF_CENTER

CodeTypeSSD is defined in NvInferPlugin.h and has a brief description on NvInferPlugin.h File Reference. The mathematical formulation of the coding methods are listed below.

`CodeTypeSSD::CORNER`

Without using or having variance encoded, the encoded bounding box representation is:

[x_{min, gt} - x_{min, anchor}, y_{min, gt} - y_{min, anchor}, x_{max, gt} - x_{max, anchor}, y_{max, gt} - y_{max, anchor}]

Using or having variance encoded, the encoded bounding box representation is:

[(x_{min, gt} - x_{min, anchor}) / variance_0, (y_{min, gt} - y_{min, anchor}) / variance_1, (x_{max, gt} - x_{max, anchor}) / variance_2, (y_{max, gt} - y_{max, anchor}) / variance_3]

`CodeTypeSSD::CENTER_SIZE`

Without using or having variance encoded, the encoded bounding box representation is:

[(x_{center, gt} - x_{center, anchor}) / w_{anchor}, (y_{center, gt} - y_{center, anchor}) / h_{anchor}, ln(w_{gt} / w_{anchor}), ln(h_{gt} / h_{anchor})]

Using or having variance encoded, the encoded bounding box representation is:

[(x_{center, gt} - x_{center, anchor}) / w_{anchor} / variance_0, (y_{center, gt} - y_{center, anchor}) / h_{anchor} / variance_1, ln(w_{gt} / w_{anchor}) / variance_2, ln(h_{gt} / h_{anchor}) / variance_3]

`CodeTypeSSD::CORNER_SIZE`

Without using or having variance encoded, the encoded bounding box representation is:

[(x_{min, gt} - x_{min, anchor}) / w_{anchor}, (y_{min, gt} - y_{min, anchor}) / h_{anchor}, (x_{max, gt} - x_{max, anchor}) / w_{anchor}, (y_{max, gt} - y_{max, anchor}) / h_{anchor}]

Using or having variance encoded, the encoded bounding box representation is:

[(x_{min, gt} - x_{min, anchor}) / w_{anchor} / variance_0, (y_{min, gt} - y_{min, anchor}) / h_{anchor} / variance_1, (x_{max, gt} - x_{max, anchor}) / w_{anchor} / variance_2, (y_{max, gt} - y_{max, anchor}) / h_{anchor} / variance_3]

`CodeTypeSSD::TF_CENTER`

Using or having variance encoded, the encoded bounding box representation is:

[(y_{center, gt} - y_{center, anchor}) / h_{anchor} / variance_0, (x_{center, gt} - x_{center, anchor}) / w_{anchor} / variance_1, ln(h_{gt} / h_{anchor}) / variance_2, ln(w_{gt} / w_{anchor}) / variance_3]

Note: This code is almost the same to CodeTypeSSD::CENTER_SIZE using variance encoded except that the order of coordinates were different.

`inputOrder`

When converting the frozen graph pb file to the unified framework format uff file using convert-to-uff, make sure to generate a human readable graph file with argument -t. The order of the tensor inputs to the plugin will be exactly the same to the order of tensor inputs in the corresponding node shown in the human readable graph file.

Additional resources

The following resources provide a deeper understanding of the nmsPlugin plugin:

Networks:

License

For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement documentation.

Changelog

June 2023 Add deprecation note.

May 2019 This is the first release of this README.md file.

Known issues

There are no known issues in this plugin.