# Action Recognition
Action recognition classifies the activity, behavior, or gesture occurring over a sequence of video frames. The DNNs typically use image classification backbones with an added temporal dimension. For example, the ResNet18-based pre-trained models use a window of 16 frames. You can also skip frames to lengthen the window of time over which the model classifies actions.
The `actionNet` object accepts one video frame at a time, buffers them as input to the model, and outputs the class with the highest confidence. `actionNet` can be used from Python and C++.
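The buffering happens internally to `actionNet`; as a rough illustration of the behavior described above (all names here are hypothetical, and the stub "model" stands in for the real network), a sketch might look like:

```python
from collections import deque

WINDOW = 16  # the ResNet18-based pre-trained models use a 16-frame window

class ActionBufferSketch:
    """Hypothetical stand-in illustrating actionNet's buffering behavior:
    frames accumulate until a full temporal window is available, then the
    model runs over the window and the top-scoring class is returned."""
    def __init__(self, model, window=WINDOW):
        self.model = model                  # callable: list of frames -> {class: confidence}
        self.frames = deque(maxlen=window)  # rolling buffer of the most recent frames

    def classify(self, frame):
        self.frames.append(frame)
        if len(self.frames) < self.frames.maxlen:
            return None                     # not enough frames buffered yet
        scores = self.model(list(self.frames))
        # return the (class, confidence) pair with the highest confidence
        return max(scores.items(), key=lambda kv: kv[1])

# toy "model" that always favors 'walking' (illustration only)
toy = ActionBufferSketch(lambda frames: {"walking": 0.8, "running": 0.2})
results = [toy.classify(f) for f in range(20)]
```

With a 16-frame window, the first 15 calls return `None` while the buffer fills; from the 16th frame onward, each call classifies the rolling window.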
As examples of using the `actionNet` class, there are sample programs for C++ and Python:

* actionnet.cpp (C++)
* actionnet.py (Python)

To run action recognition on a live camera stream or video, pass in a device or file path from the Camera Streaming and Multimedia page.
``` bash
# C++
$ ./actionnet /dev/video0           # V4L2 camera input, display output (default)
$ ./actionnet input.mp4 output.mp4  # video file input/output (mp4, mkv, avi, flv)

# Python
$ ./actionnet.py /dev/video0           # V4L2 camera input, display output (default)
$ ./actionnet.py input.mp4 output.mp4  # video file input/output (mp4, mkv, avi, flv)
```
These optional command-line arguments can be used with actionnet/actionnet.py:

```
  --network=NETWORK    pre-trained model to load, one of the following:
                           * resnet-18 (default)
                           * resnet-34
  --model=MODEL        path to custom model to load (.onnx)
  --labels=LABELS      path to text file containing the labels for each class
  --input-blob=INPUT   name of the input layer (default is 'input')
  --output-blob=OUTPUT name of the output layer (default is 'output')
  --threshold=CONF     minimum confidence threshold for classification (default is 0.01)
  --skip-frames=SKIP   how many frames to skip between classifications (default is 1)
```
By default, the model processes every other frame to lengthen the window of time over which actions are classified. You can change this with the `--skip-frames` parameter (using `--skip-frames=0` will process every frame).
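The arithmetic behind this can be made concrete. With a 16-frame model window, `--skip-frames=N` takes one frame from every run of N+1 source frames, so the window spans 16 × (N+1) source frames. Assuming a 30 FPS camera (an assumption, not stated in this page):

```python
MODEL_WINDOW = 16  # frames per classification window (ResNet18-based models)
FPS = 30           # assumed camera frame rate

def window_seconds(skip_frames, fps=FPS, model_window=MODEL_WINDOW):
    """Seconds of video covered by one classification window,
    given how many frames are skipped between sampled frames."""
    return model_window * (skip_frames + 1) / fps

# the default --skip-frames=1 doubles the window versus processing every frame
assert window_seconds(1) == 2 * window_seconds(0)
```

At 30 FPS, `--skip-frames=0` covers roughly 0.53 seconds per window, while the default `--skip-frames=1` covers roughly 1.07 seconds.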
Below are the pre-trained action recognition models available, along with the associated `--network` argument to actionnet used for loading them:
| Model | CLI argument | Classes |
|---|---|---|
| Action-ResNet18-Kinetics | `resnet-18` | 1040 |
| Action-ResNet34-Kinetics | `resnet-34` | 1040 |
The default is `resnet-18`. These models were trained on the Kinetics 700 and Moments in Time datasets (see here for the list of class labels).
<p align="right">Back | <b><a href="posenet.md">Pose Estimation with PoseNet</a></b></p>
<p align="center"><sup>© 2016-2021 NVIDIA | </sup><a href="../README.md#hello-ai-world"><sup>Table of Contents</sup></a></p>