# Action Recognition
Action recognition classifies the activity, behavior, or gesture occurring over a sequence of video frames. The DNNs typically use image classification backbones with an added temporal dimension. For example, the ResNet18-based pre-trained models use a window of 16 frames. You can also skip frames to lengthen the window of time over which the model classifies actions.
The `actionNet` object accepts one video frame at a time, buffers them as input to the model, and outputs the class with the highest confidence. `actionNet` can be used from Python and C++.
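The buffering happens internally to `actionNet`; as a rough illustration of the behavior described above (all names here are hypothetical, and the stub "model" stands in for the real network), a sketch might look like:

```python
from collections import deque

WINDOW = 16  # the ResNet18-based pre-trained models use a 16-frame window

class ActionBufferSketch:
    """Hypothetical stand-in illustrating actionNet's buffering behavior:
    frames accumulate until a full temporal window is available, then the
    model runs over the window and the top-scoring class is returned."""
    def __init__(self, model, window=WINDOW):
        self.model = model                  # callable: list of frames -> {class: confidence}
        self.frames = deque(maxlen=window)  # rolling buffer of the most recent frames

    def classify(self, frame):
        self.frames.append(frame)
        if len(self.frames) < self.frames.maxlen:
            return None                     # not enough frames buffered yet
        scores = self.model(list(self.frames))
        # return the (class, confidence) pair with the highest confidence
        return max(scores.items(), key=lambda kv: kv[1])

# toy "model" that always favors 'walking' (illustration only)
toy = ActionBufferSketch(lambda frames: {"walking": 0.8, "running": 0.2})
results = [toy.classify(f) for f in range(20)]
```

With a 16-frame window, the first 15 calls return `None` while the buffer fills; from the 16th frame onward, each call classifies the rolling window.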
As examples of using the `actionNet` class, there are sample programs for C++ and Python:

* actionnet.cpp (C++)
* actionnet.py (Python)

To run action recognition on a live camera stream or video, pass in a device or file path from the Camera Streaming and Multimedia page.
``` bash
# C++
$ ./actionnet /dev/video0           # V4L2 camera input, display output (default)
$ ./actionnet input.mp4 output.mp4  # video file input/output (mp4, mkv, avi, flv)

# Python
$ ./actionnet.py /dev/video0           # V4L2 camera input, display output (default)
$ ./actionnet.py input.mp4 output.mp4  # video file input/output (mp4, mkv, avi, flv)
```
These optional command-line arguments can be used with actionnet/actionnet.py:

```
  --network=NETWORK    pre-trained model to load, one of the following:
                           * resnet-18 (default)
                           * resnet-34
  --model=MODEL        path to custom model to load (.onnx)
  --labels=LABELS      path to text file containing the labels for each class
  --input-blob=INPUT   name of the input layer (default is 'input')
  --output-blob=OUTPUT name of the output layer (default is 'output')
  --threshold=CONF     minimum confidence threshold for classification (default is 0.01)
  --skip-frames=SKIP   how many frames to skip between classifications (default is 1)
```
By default, the model processes every other frame to lengthen the window of time over which actions are classified. You can change this with the `--skip-frames` parameter (using `--skip-frames=0` will process every frame).
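The arithmetic behind this can be made concrete. With a 16-frame model window, `--skip-frames=N` takes one frame from every run of N+1 source frames, so the window spans 16 × (N+1) source frames. Assuming a 30 FPS camera (an assumption, not stated in this page):

```python
MODEL_WINDOW = 16  # frames per classification window (ResNet18-based models)
FPS = 30           # assumed camera frame rate

def window_seconds(skip_frames, fps=FPS, model_window=MODEL_WINDOW):
    """Seconds of video covered by one classification window,
    given how many frames are skipped between sampled frames."""
    return model_window * (skip_frames + 1) / fps

# the default --skip-frames=1 doubles the window versus processing every frame
assert window_seconds(1) == 2 * window_seconds(0)
```

At 30 FPS, `--skip-frames=0` covers roughly 0.53 seconds per window, while the default `--skip-frames=1` covers roughly 1.07 seconds.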
Below are the pre-trained action recognition models available, along with the associated `--network` argument to actionnet used for loading them:
| Model | CLI argument | Classes |
|---|---|---|
| Action-ResNet18-Kinetics | `resnet-18` | 1040 |
| Action-ResNet34-Kinetics | `resnet-34` | 1040 |
The default is `resnet-18`. These models were trained on the Kinetics 700 and Moments in Time datasets (see here for the list of class labels).
<p align="right">Back | <b><a href="posenet.md">Pose Estimation with PoseNet</a></b></p>
<p align="center"><sup>© 2016-2021 NVIDIA | </sup><a href="../README.md#hello-ai-world"><sup>Table of Contents</sup></a></p>