DynamicLoadingReport

Conclusion

  • The main bottleneck of auto inference (dynamic loading) is the overhead of CPU-GPU data transfer.
  • The larger the layer, the more acceleration the GPU provides, so larger layers should be placed on the GPU first.
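
The second point can be illustrated with a back-of-envelope cost model (the throughput and overhead numbers below are assumptions for illustration, not measurements from this report): each GPU layer pays a roughly fixed per-layer launch/transfer overhead, so larger layers amortize it better and see higher acceleration.

```python
# Illustrative cost model: fixed per-layer overhead (launch + CPU-GPU
# transfer latency) is amortized better by larger layers.
# All constants below are assumed values, not measured in this report.

CPU_BYTES_PER_S = 4e9      # assumed effective CPU compute rate per byte of weights
GPU_BYTES_PER_S = 120e9    # assumed effective GPU compute rate per byte of weights
FIXED_OVERHEAD_S = 200e-6  # assumed per-layer launch + transfer latency

def per_layer_speedup(layer_bytes: float) -> float:
    """GPU speedup over CPU for one layer, including fixed per-layer overhead."""
    t_cpu = layer_bytes / CPU_BYTES_PER_S
    t_gpu = layer_bytes / GPU_BYTES_PER_S + FIXED_OVERHEAD_S
    return t_cpu / t_gpu

for mb in (4, 16, 64):
    print(f"{mb} MB layer: ~{per_layer_speedup(mb * 2**20):.1f}x")
```

Under this model the speedup grows monotonically with layer size, matching the trend across the 4MB/16MB/64MB tables below.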

Hardware: i9-14900K, 64GB memory, RTX 4090

Sequential Layer

| Device | Num of Layers | Layer Size | Model Size | Num of Layers on GPU | Num of Layers on CPU | Average Inference (ms) | Acceleration | % of Layers on GPU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CPU | 512 | 4MB | 2GB | - | - | 939.8 | 1.0 | 0% |
| Auto | 512 | 4MB | 2GB | 0 | 512 | 490 | 1.9 | 0% |
| Auto | 512 | 4MB | 2GB | 253 | 259 | 272 | 3.5 | 49.4% |
| Auto | 512 | 4MB | 2GB | 512 | 0 | 32 | 29.4 | 100% |
| GPU | 512 | 4MB | 2GB | - | - | 32.4 | 29.0 | 100% |
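
The Acceleration column is the CPU-only baseline divided by the measured average inference time. Recomputing it from the 512-layer / 4MB table above:

```python
# Acceleration = CPU baseline time / measured average inference time,
# using the Auto rows of the 512-layer / 4MB table.

cpu_baseline_ms = 939.8
auto_rows = {  # layers on GPU -> average inference (ms)
    0: 490.0,
    253: 272.0,
    512: 32.0,
}
for n_gpu, ms in auto_rows.items():
    print(f"{n_gpu:3d} layers on GPU: {cpu_baseline_ms / ms:.1f}x")
```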

Sequential Layer, Deeper Model

| Device | Num of Layers | Layer Size | Model Size | Num of Layers on GPU | Num of Layers on CPU | Average Inference (ms) | Acceleration | % of Layers on GPU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CPU | 1024 | 4MB | 4GB | - | - | 1839.8 | 1.0 | 0% |
| Auto | 1024 | 4MB | 4GB | 0 | 1024 | 954 | 1.9 | 0% |
| Auto | 1024 | 4MB | 4GB | 252 | 772 | 787 | 2.3 | 24.6% |
| Auto | 1024 | 4MB | 4GB | 508 | 516 | 530 | 3.5 | 49.6% |
| Auto | 1024 | 4MB | 4GB | 764 | 260 | 312.5 | 5.9 | 74.6% |
| Auto | 1024 | 4MB | 4GB | 1020 | 4 | 69.7 | 26.9 | 99.6% |
| GPU | 1024 | 4MB | 4GB | - | - | 65.9 | 27.9 | 100% |

Sequential Layer, Larger Layer (16MB)

| Device | Num of Layers | Layer Size | Model Size | Num of Layers on GPU | Num of Layers on CPU | Average Inference (ms) | Acceleration | % of Layers on GPU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CPU | 256 | 16MB | 4GB | - | - | 864 | 1.0 | 0% |
| Auto | 256 | 16MB | 4GB | 0 | 256 | 844.7 | 1.02 | 0% |
| Auto | 256 | 16MB | 4GB | 60 | 196 | 669.9 | 1.3 | 23.4% |
| Auto | 256 | 16MB | 4GB | 124 | 132 | 494.2 | 1.7 | 48.4% |
| Auto | 256 | 16MB | 4GB | 188 | 68 | 372.7 | 2.3 | 73.4% |
| Auto | 256 | 16MB | 4GB | 252 | 4 | 152.5 | 5.7 | 98.4% |
| GPU | 256 | 16MB | 4GB | - | - | 119 | 7.3 | 100% |

Sequential Layer, Even Larger Layer (64MB)

| Device | Num of Layers | Layer Size | Model Size | Num of Layers on GPU | Num of Layers on CPU | Average Inference (ms) | Acceleration | % of Layers on GPU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CPU | 64 | 64MB | 4GB | - | - | 8501 | 1.0 | 0% |
| Auto | 64 | 64MB | 4GB | 0 | 64 | 898 | 9.5 | 0% |
| Auto | 64 | 64MB | 4GB | 12 | 52 | 755.2 | 11.3 | 18.8% |
| Auto | 64 | 64MB | 4GB | 28 | 36 | 598 | 14.2 | 43.8% |
| Auto | 64 | 64MB | 4GB | 44 | 20 | 419.7 | 20.2 | 68.8% |
| Auto | 64 | 64MB | 4GB | 60 | 4 | 263.7 | 32.3 | 93.8% |
| Auto | 64 | 64MB | 4GB | 64 | 0 | 70.54 | 121 | 100% |
| GPU | 64 | 64MB | 4GB | - | - | 69.8 | 121.7 | 100% |

Hardware: Xeon W-2133, 32GB memory, GTX 1066

| Device | Num of Layers | Layer Size | Model Size | Num of Layers on GPU | Num of Layers on CPU | Average Inference (ms) | Acceleration | % of Layers on GPU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CPU | 64 | 64MB | 4GB | - | - | 17419 | 1.0 | 0% |
| Auto | 64 | 64MB | 4GB | 0 | 64 | 3783.4 | 4.6 | 0% |
| Auto | 64 | 64MB | 4GB | 12 | 52 | 3415 | 5.1 | 18.8% |
| Auto | 64 | 64MB | 4GB | 28 | 36 | 3004 | 5.79 | 43.8% |
| Auto | 64 | 64MB | 4GB | 44 | 20 | 2536 | 6.86 | 68.8% |
| Auto | 64 | 64MB | 4GB | 60 | 4 | 2101 | 8.29 | 93.8% |
| Auto | 64 | 64MB | 4GB | 64 | 0 | 1163 | 14.97 | 100% |
| GPU | 64 | 64MB | 4GB | - | - | 1213 | 14.3 | 100% |
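
A minimal sketch of the kind of timing loop behind the "Average Inference (ms)" columns (the exact harness used for this report is not shown; this is an assumed shape): a few warmup passes to exclude one-time costs, then the mean over timed runs. With a GPU involved, the device must be synchronized before reading the clock.

```python
# Sketch of an average-inference timer: warmup passes, then mean wall time.
# `forward` is a stand-in callable; swap in a real model forward pass
# (and synchronize the GPU before timing if one is involved).

import time

def average_inference_ms(forward, warmup: int = 3, runs: int = 10) -> float:
    """Mean wall-clock time of `forward()` in milliseconds over `runs` calls."""
    for _ in range(warmup):  # exclude allocation / first-touch costs
        forward()
    start = time.perf_counter()
    for _ in range(runs):
        forward()
    return (time.perf_counter() - start) / runs * 1e3

# Example with a dummy workload:
print(f"{average_inference_ms(lambda: sum(range(10000))):.3f} ms")
```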