alexnet/README.md
The AlexNet model architecture comes from the paper "One weird trick for parallelizing convolutional neural networks". To generate the .wts file, refer to pytorchx/alexnet. For the PyTorch implementation of AlexNet, refer to HERE.
AlexNet consists of 3 major parts: features, adaptive average pooling, and classifier. The features part is a stack of CRP (conv-relu-pool) and CR (conv-relu) layers, and the classifier is a stack of fc-relu layers. All layers can be implemented with the TensorRT API, including addConvolution, addActivation, addPooling, addMatrixMultiply, addElementWise, etc.
We can use torchvision to load the pretrained AlexNet model:
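import torchvision
# pretrained=True is the older form; newer torchvision releases prefer the weights= argument
alexnet = torchvision.models.alexnet(pretrained=True)
print(alexnet)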
The model structure is:
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
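The shapes implied by this structure can be checked by hand with the usual convolution/pooling size formula. The sketch below is only an illustration: it assumes a 224 x 224 input (the structure printout does not state the input size), and the layer names such as `conv1` and `pool1` are labels chosen here, not identifiers taken from the C++ code.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis for a conv or max-pool layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


# (label, kernel, stride, padding, out_channels) for the shape-changing layers in
# `features`; ReLU layers are omitted because they keep the shape unchanged.
# out_channels is None for pooling layers, which keep the channel count.
FEATURES = [
    ("conv1", 11, 4, 2, 64),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 192),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 256),
    ("conv5", 3, 1, 1, 256),
    ("pool3", 3, 2, 0, None),
]


def feature_shapes(input_size=224, channels=3):
    """Return (label, channels, height, width) after each shape-changing layer."""
    shapes = []
    size = input_size  # assumption: square input
    for label, kernel, stride, padding, out_channels in FEATURES:
        size = conv_out(size, kernel, stride, padding)
        if out_channels is not None:
            channels = out_channels
        shapes.append((label, channels, size, size))
    return shapes


shapes = feature_shapes()
for label, c, h, w in shapes:
    print(f"{label}: {c} x {h} x {w}")

_, c, h, w = shapes[-1]
flattened = c * h * w
print("flattened size:", flattened)
```

Under the 224 x 224 assumption the last pooling layer already produces a 6 x 6 map, so the adaptive average pooling to (6, 6) leaves the shape unchanged, and the flattened size matches the `in_features=9216` of the first Linear layer.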
Use gen_wts.py to generate the .wts file:
python3 gen_wts.py
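For readers unfamiliar with the .wts file, here is a rough sketch of what such a weight dump can look like. It assumes a plain-text layout (first line: number of tensors; then one line per tensor with its name, element count, and each float32 value as big-endian hex). This layout and the helper names `write_wts` and `read_wts` are assumptions for illustration; check the actual gen_wts.py before relying on them.

```python
import io
import struct


def write_wts(state, stream):
    """Write {name: list of floats} using the assumed .wts text layout."""
    stream.write(f"{len(state)}\n")
    for name, values in state.items():
        # Each value is stored as the hex form of its big-endian float32 bytes.
        hex_words = [struct.pack(">f", float(v)).hex() for v in values]
        stream.write(f"{name} {len(values)} " + " ".join(hex_words) + "\n")


def read_wts(stream):
    """Parse the assumed layout back into {name: list of floats}."""
    count = int(stream.readline())
    state = {}
    for _ in range(count):
        parts = stream.readline().split()
        name, size = parts[0], int(parts[1])
        words = parts[2:]
        if len(words) != size:
            raise ValueError(f"{name}: expected {size} values, found {len(words)}")
        state[name] = [struct.unpack(">f", bytes.fromhex(w))[0] for w in words]
    return state


# Hypothetical tiny "model": real weights would come from the PyTorch state dict.
demo = {"features.0.bias": [0.5, -1.25], "classifier.6.bias": [2.0]}
buf = io.StringIO()
write_wts(demo, buf)
buf.seek(0)
restored = read_wts(buf)
print(buf.getvalue())
```

The demo values are exactly representable in float32, so the round trip returns them unchanged; arbitrary Python floats would come back rounded to float32 precision.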
Build the C++ code:
pushd tensorrtx/alexnet
cmake -S . -B build -G Ninja --fresh
cmake --build build
Serialize the .wts model to an engine file:
./build/alexnet -s
Run inference:
./build/alexnet -d
The output looks like:
...
====
Execution time: 1ms
0.1234, -0.5678, ...
====
prediction result:
Top: 0 idx: 285, logits: 9.9, label: Egyptian cat
Top: 1 idx: 281, logits: 8.304, label: tabby, tabby cat
Top: 2 idx: 282, logits: 6.859, label: tiger cat
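The ranked list above is just the largest logits sorted in descending order. A minimal Python sketch of that step follows; the `top_k` helper and the mostly-zero logits vector are made up for illustration, and only the three non-zero entries and their labels are taken from the sample output.

```python
def top_k(logits, k=3):
    """Return (index, logit) pairs for the k largest logits, best first."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [(i, logits[i]) for i in order[:k]]


# Hypothetical 1000-class logits vector; all other entries are left at 0.0.
logits = [0.0] * 1000
logits[285] = 9.9
logits[281] = 8.304
logits[282] = 6.859

# Only the labels that appear in the sample output are listed here.
labels = {285: "Egyptian cat", 281: "tabby, tabby cat", 282: "tiger cat"}

for rank, (idx, value) in enumerate(top_k(logits)):
    print(f"Top: {rank} idx: {idx}, logits: {value}, label: {labels[idx]}")
```

Comparing the top indices between PyTorch and the TensorRT engine is a cheap first sanity check before looking at the raw values.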
If your output differs from PyTorch, you have to find out which TensorRT API call or which part of your code causes the difference. A simple approach is to check the .engine output part by part; e.g., you can set an earlier layer of AlexNet as the output:
pool3->getOutput(0)->setName(OUTPUT_NAME); // original is: "fc3_1->getOutput(0)", renamed so the marked tensor carries OUTPUT_NAME
network->markOutput(*pool3->getOutput(0)); // original is: "*fc3_1->getOutput(0)"
Here the output of the "features" part of AlexNet is used and the rest of the model is ignored. Don't forget to change the OUTPUT_SIZE macro at the top of the file to match the new output, then rebuild the .engine file to apply the changes.
You can sum all output values from the C++ code and compare the result with the PyTorch output; in PyTorch, you can do this with torch.sum(x) while debugging. The ideal deviation between the two sums would be in the range $[10^{-2}, 10^{-1}]$. In this example, since the "features" output has $256 \times 6 \times 6$ elements (batch = 1), the final error would be roughly $10^{-4}$.
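The quick sum comparison can be sketched in plain Python as below. The `sum_check` helper, the tolerance default, and the synthetic data are assumptions for illustration; in practice the reference sum would come from torch.sum(x) and the candidate values from the C++ program.

```python
import math


def sum_check(reference, candidate, tolerance=1e-1):
    """Compare the sums of two flat value lists; return (abs difference, within tolerance)."""
    diff = abs(math.fsum(reference) - math.fsum(candidate))
    return diff, diff <= tolerance


# Synthetic stand-in for the "features" output: 256 * 6 * 6 values for batch = 1.
n = 256 * 6 * 6
reference = [0.001 * (i % 100) for i in range(n)]
# Hypothetical engine output with a small, constant per-element error.
candidate = [v + 1e-6 for v in reference]

diff, ok = sum_check(reference, candidate)
print(f"sum difference: {diff:.6f}, within tolerance: {ok}")
```

Keep in mind that a sum can hide errors of opposite sign that cancel each other, so a passing sum check does not prove the tensors match.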
Note: This is a quick check. For a more accurate check, save the output tensor to a file and compare it value by value, although this is rarely necessary.
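A value-by-value comparison could look like the following sketch. It assumes both outputs were dumped as plain text with whitespace- or comma-separated floats; that dump format, the helper names, and the tolerance are assumptions, and `io.StringIO` stands in for the real files here.

```python
import io


def load_values(stream):
    """Parse whitespace- or comma-separated floats from a text stream."""
    text = stream.read().replace(",", " ")
    return [float(token) for token in text.split()]


def compare(reference, candidate, atol=1e-4):
    """Return (max abs difference, index of the worst element, count of elements above atol)."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    worst = max(range(len(diffs)), key=diffs.__getitem__)
    mismatches = sum(d > atol for d in diffs)
    return diffs[worst], worst, mismatches


# Hypothetical dumps; with real files, pass open("some_dump.txt") instead.
pytorch_dump = io.StringIO("0.1234, -0.5678, 1.0")
tensorrt_dump = io.StringIO("0.12341 -0.5678 1.5")

reference = load_values(pytorch_dump)
candidate = load_values(tensorrt_dump)
max_diff, worst, mismatches = compare(reference, candidate)
print(f"max diff {max_diff} at index {worst}, {mismatches} element(s) above tolerance")
```

The index of the worst element is often the most useful part of the result, since it hints at which channel or spatial position goes wrong.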