
InsightFace Edge Inference and Deployment


In this tutorial, we give examples and benchmarks of running InsightFace models on edge devices, mainly using 8-bit quantization to accelerate inference.

Recognition

In the recognition tutorial, we use an open-source model, IR50 trained on Glint360K, and evaluate it on a hard private 1:N test set (N=50,000). The metrics are Rank-1 accuracy and TAR@FAR<=1e-3.
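The two metrics can be sketched as follows. This is a minimal illustration of a 1:N protocol, not the exact private-test-set evaluation code; the function name and the quantile-based threshold selection are assumptions.

```python
import numpy as np

def rank1_and_tar(probe_feats, gallery_feats, probe_ids, gallery_ids,
                  far_target=1e-3):
    """Sketch of a 1:N evaluation (hypothetical helper, not the official code).

    probe_feats:   (P, D) L2-normalised probe embeddings
    gallery_feats: (N, D) L2-normalised gallery embeddings
    Rank-1: fraction of probes whose top gallery match shares their identity.
    TAR@FAR: fraction of genuine comparisons accepted at the similarity
    threshold where impostor comparisons pass at the target false-accept rate.
    """
    sims = probe_feats @ gallery_feats.T            # cosine similarities
    top = sims.argmax(axis=1)
    rank1 = np.mean(gallery_ids[top] == probe_ids)

    same = probe_ids[:, None] == gallery_ids[None, :]
    genuine = sims[same]
    impostor = sims[~same]
    # Pick the threshold as the (1 - FAR) quantile of impostor scores.
    thr = np.quantile(impostor, 1.0 - far_target)
    tar = np.mean(genuine >= thr)
    return rank1, tar
```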

Granularity and symmetry are both quantization settings, mostly determined by the hardware provider. Symmetric quantization stores quantized values as INT8, while asymmetric quantization uses UINT8.
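The difference between the two schemes can be sketched as below: symmetric quantization fixes the zero-point at 0 and maps values into signed INT8, while asymmetric quantization shifts the range with a zero-point and maps into unsigned UINT8. This is a generic per-tensor sketch, not any particular backend's implementation.

```python
import numpy as np

def quantize_symmetric(x, num_bits=8):
    """Per-tensor symmetric quantization: zero-point fixed at 0,
    values stored as signed INT8 in [-127, 127]."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def quantize_asymmetric(x, num_bits=8):
    """Per-tensor asymmetric quantization: a zero-point shifts the range
    so values are stored as unsigned UINT8 in [0, 255]."""
    qmax = 2 ** num_bits - 1                    # 255 for 8 bits
    scale = (x.max() - x.min()) / qmax
    zero_point = np.clip(np.round(-x.min() / scale), 0, qmax)
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax).astype(np.uint8)
    return q, scale, zero_point

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q_s, s = quantize_symmetric(x)
q_a, s_a, zp = quantize_asymmetric(x)
x_s = q_s.astype(np.float32) * s                # dequantize symmetric
x_a = (q_a.astype(np.float32) - zp) * s_a       # dequantize asymmetric
```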

| Hardware | Provider | Type | Backend | Time | Granularity | Symmetry | Rank1-Acc | TAR@FAR<=1e-3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| V100 | NVIDIA | GPU | onnxruntime | 4ms | - | - | 80.94 | 30.77 |
| Jetson NX | NVIDIA | GPU | TensorRT | 16ms | Per-channel | Symmetric | 79.26 | 31.07 |
| A311D | Khadas | ASIC | Tengine | 26ms | Per-tensor | Asymmetric | 77.83 | 26.58 |
| A311D* | Khadas | ASIC | Tengine | 26ms | Per-tensor | Asymmetric | 79.38 | 28.59 |
| NXP-IMX8P | NXP | ASIC | Tengine | 24ms | Per-tensor | Asymmetric | 77.87 | 26.80 |
| NXP-IMX8P* | NXP | ASIC | Tengine | 24ms | Per-tensor | Asymmetric | 79.42 | 28.39 |
| RV1126 | Rockchip | ASIC | RKNN | 38ms | Per-tensor | Asymmetric | 75.60 | 24.23 |
| RV1126* | Rockchip | ASIC | RKNN | 38ms | Per-tensor | Asymmetric | 77.82 | 26.30 |

The suffix * denotes mixed mode: the float32 model is used for gallery images, while the quantized model is used for probe images. Output features are float32 in all cases.
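Mixed mode can be sketched as follows. The model functions here are hypothetical stand-ins (the quantized model is simulated by round-tripping features through INT8); the point is only the pipeline shape: gallery enrolled once with the float32 network, probes embedded on-device with the quantized one, then plain float32 cosine matching.

```python
import numpy as np

def embed_fp32(images):
    # Stand-in for the float32 recognition network: flatten and L2-normalise.
    feats = images.reshape(len(images), -1).astype(np.float32)
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

def embed_int8(images):
    # Stand-in for the quantized network: simulate 8-bit error by
    # round-tripping the float embedding through symmetric INT8.
    feats = embed_fp32(images)
    scale = np.abs(feats).max() / 127.0
    feats = np.clip(np.round(feats / scale), -127, 127) * scale
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

# Mixed mode: float32 gallery (enrolled offline), quantized probes (on-device).
# Both feature sets end up float32, so matching is ordinary cosine similarity.
gallery_imgs = np.random.default_rng(0).normal(size=(5, 112, 112, 3))
probe_imgs = gallery_imgs + 0.001           # probes close to their gallery mates

gallery = embed_fp32(gallery_imgs)
probes = embed_int8(probe_imgs)
matches = (probes @ gallery.T).argmax(axis=1)
```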

Example code for running the quantized networks can currently be found in Tengine. Later, we will add a copy here along with a full tutorial on quantizing recognition models from scratch.

Detection

TODO