docs/developer-guide/layer-support-behavior.md
# support_XYZ Properties in ncnn's Layer Class

This document is for developers implementing new layers in ncnn. It explains the `support_XYZ` boolean properties in the `ncnn::Layer` base class. Correctly setting these properties declares the capabilities of your layer to the ncnn inference engine. This allows the engine to apply specific optimizations, such as SIMD, half-precision floating-point computation, or Vulkan GPU acceleration, to achieve optimal performance and memory efficiency.
## Setting support Properties

A layer can set its `support_XYZ` properties in two ways:
- **Constructor**: For capabilities that are static and known up front, set the properties in the layer's constructor.
- **create_pipeline**: If the layer's capabilities depend on parameters loaded from `load_param` or `load_model` (e.g., the data type of weights), you can set these properties dynamically within the `create_pipeline` method, as in the sketch below.
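As a minimal sketch of the second approach, a hypothetical layer might flip a flag in `create_pipeline` once its weights are known (the class name `MyConvolution` and its `weight_data` member are illustrative, not ncnn API):

```cpp
// Hypothetical layer: decide fp16 capability only after the model
// weights have been loaded, rather than statically in the constructor.
int MyConvolution::create_pipeline(const Option& opt)
{
    // weight_data is an illustrative member holding the loaded weights.
    if (opt.use_fp16_storage && weight_data.elembits() == 16)
    {
        support_fp16_storage = true;
    }
    return 0;
}
```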
Here is a detailed breakdown of each support property and what it means for your layer's implementation.

## one_blob_only

Declares that the layer accepts only one input blob and produces only one output blob.

When `true`: You must implement the single-input, single-output version of the `forward` method:
```cpp
virtual int forward(const Mat& bottom_blob, Mat& top_blob, const Option& opt) const;
```
When `true`, ncnn calls this overload. If `false` (the default), the `std::vector<Mat>` version of `forward` is called.

## support_inplace

Declares that the layer can compute its output by modifying the input blob directly (common for activation layers).

When `true`: You must implement the `forward_inplace` method. Depending on whether `one_blob_only` is also enabled, implement the corresponding version:
```cpp
// If one_blob_only is true
virtual int forward_inplace(Mat& bottom_top_blob, const Option& opt) const;

// If one_blob_only is false
virtual int forward_inplace(std::vector<Mat>& bottom_top_blobs, const Option& opt) const;
```
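As a minimal sketch of the `one_blob_only` + `support_inplace` combination (the class `MyScale` and its doubling behavior are illustrative, not an ncnn layer):

```cpp
#include "layer.h"

// Hypothetical elementwise layer: one input blob, modified in place.
class MyScale : public ncnn::Layer
{
public:
    MyScale()
    {
        one_blob_only = true;
        support_inplace = true;
    }

    virtual int forward_inplace(ncnn::Mat& bottom_top_blob, const ncnn::Option& opt) const
    {
        // Each channel's data is contiguous, so one flat loop suffices.
        const int size = bottom_top_blob.w * bottom_top_blob.h * bottom_top_blob.d * bottom_top_blob.elempack;

        #pragma omp parallel for num_threads(opt.num_threads)
        for (int q = 0; q < bottom_top_blob.c; q++)
        {
            float* ptr = bottom_top_blob.channel(q);
            for (int i = 0; i < size; i++)
                ptr[i] *= 2.f; // illustrative: double every element
        }
        return 0;
    }
};
```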
## support_vulkan

Declares that the layer provides a Vulkan GPU implementation.

When `true`:
- You must implement `forward` / `forward_inplace` methods that accept `VkMat` for input and output.
- You should implement `upload_model` to transfer weight data to the GPU.
- You should implement `create_pipeline` and `destroy_pipeline` to manage Vulkan `Pipeline` objects and other GPU resources. A sketch of this method surface follows.
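A sketch of the typical overrides, assuming a hypothetical `MyLayer_vulkan` class (the virtual hooks themselves are declared by `ncnn::Layer`):

```cpp
// Hypothetical Vulkan-capable layer: the overrides below are the usual
// surface a GPU implementation fills in.
class MyLayer_vulkan : public MyLayer
{
public:
    MyLayer_vulkan()
    {
        support_vulkan = true;
    }

    // Create/destroy ncnn::Pipeline objects and other GPU resources.
    virtual int create_pipeline(const Option& opt);
    virtual int destroy_pipeline(const Option& opt);

    // Transfer weight data to GPU memory.
    virtual int upload_model(VkTransfer& cmd, const Option& opt);

    // GPU forward path operating on VkMat.
    virtual int forward(const VkMat& bottom_blob, VkMat& top_blob, VkCompute& cmd, const Option& opt) const;
};
```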
## support_packing (for CPU)

Declares that the layer's CPU implementation can handle `Mat` data with a "packing" memory layout (i.e., `elempack > 1`). This is crucial for SIMD optimizations (e.g., processing 4 or 8 floats at once with NEON or AVX).

When `true`:

- When the input `Mat` channel count is a multiple of the SIMD width, the ncnn engine ensures that the input `Mat` passed to `forward` / `forward_inplace` is packed (e.g., `elempack=4` or `elempack=8`).
- Your implementation must correctly handle `Mat` data where `cstep` and `elempack` are not their default values.

When `false`:
- The ncnn engine guarantees that the input `Mat` passed to your layer will always have `elempack=1`. The engine will automatically insert conversions if the preceding layer produced a packed output.

Output: Your layer may output a `Mat` with any `elempack`. However, it is highly recommended to output a `Mat` with an adaptive `elempack` to avoid unnecessary conversions in subsequent layers, as in the sketch below.
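A minimal sketch of picking an adaptive output `elempack` inside a `forward` method, assuming NEON-style 4-wide packing; `outw`, `outh`, and `out_channels` are hypothetical shape values of the layer:

```cpp
// Choose the widest packing the output shape allows, then allocate
// the output Mat in that layout.
int out_elempack = 1;
#if __ARM_NEON
if (opt.use_packing_layout)
    out_elempack = out_channels % 4 == 0 ? 4 : 1;
#endif
size_t out_elemsize = 4u * out_elempack; // fp32 elements

top_blob.create(outw, outh, out_channels / out_elempack, out_elemsize, out_elempack, opt.blob_allocator);
if (top_blob.empty())
    return -100;
```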
## support_any_packing (for CPU)

This property extends `support_packing`. It declares that the layer's CPU implementation is flexible enough to handle a `Mat` with any `elempack` value (1, 4, 8, etc.).

When `true`: The ncnn engine can pass an input `Mat` with any packing layout to your `forward` method, without forcing a conversion to the hardware's "optimal" `elempack`. For example, on an AVX512 system where `elempack=16` is optimal, your layer can still accept `elempack=1`, 4, or 8.

When `false` (but `support_packing` is `true`): The engine will try to provide an input `Mat` with an optimal `elempack` for the target architecture.

Output: The output typically follows the input `Mat`, which can have any `elempack`. The sketch below shows why elementwise kernels are naturally elempack-agnostic.
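A sketch of an elempack-agnostic elementwise kernel (the ReLU-style clamp is illustrative; `<arm_neon.h>` is assumed where `__ARM_NEON` is defined):

```cpp
// Packed channel data stays contiguous, so treating each channel as
// size = w*h*d*elempack flat floats works for elempack = 1, 4, or 8 alike.
const int size = bottom_top_blob.w * bottom_top_blob.h * bottom_top_blob.d * bottom_top_blob.elempack;

for (int q = 0; q < bottom_top_blob.c; q++)
{
    float* ptr = bottom_top_blob.channel(q);
    int i = 0;
#if __ARM_NEON
    for (; i + 3 < size; i += 4)
    {
        float32x4_t _p = vld1q_f32(ptr + i);
        _p = vmaxq_f32(_p, vdupq_n_f32(0.f)); // illustrative clamp
        vst1q_f32(ptr + i, _p);
    }
#endif
    for (; i < size; i++)
        ptr[i] = ptr[i] > 0.f ? ptr[i] : 0.f;
}
```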
## support_vulkan_packing (for Vulkan)

The Vulkan counterpart of `support_packing`. It declares that the layer's Vulkan implementation can handle `VkMat` with `elempack=4`.

When `true`: When the input `VkMat` has a channel count that is a multiple of 4, the ncnn engine will provide a packed `VkMat` (with `elempack=4`) to your Vulkan `forward` methods.

When `false`: The engine will ensure the input `VkMat` has `elempack=1`.

Note: `support_packing` and `support_vulkan_packing` are independent. A layer can support packing on CPU but not on Vulkan, or vice-versa.

## support_vulkan_any_packing (for Vulkan)

This property extends `support_vulkan_packing`. It declares that the layer's Vulkan implementation can handle a `VkMat` with any supported `elempack` value (e.g., 1, 4).

When `true`: The ncnn engine can pass an input `VkMat` with any supported packing layout to your Vulkan `forward` method. This allows the engine to avoid unnecessary repacking operations on the GPU.

When `false` (but `support_vulkan_packing` is `true`): The engine will try to provide a `VkMat` with `elempack=4` if the channel count is a multiple of 4.

Note: This is the Vulkan analogue of `support_any_packing`.
## support_bf16_storage

Declares that the layer can work with `bfloat16` data.

When `true`:

- The `forward` method may receive an input `Mat` of type bfloat16 (`elembits() == 16`) or fp32.
- Inside your `forward` implementation, you must check `opt.use_bf16_storage` and `bottom_blob.elembits()` to determine whether to use a bfloat16-optimized code path.

When `false`: The ncnn engine ensures your layer will not receive a bfloat16 `Mat`.

Output: Either a bfloat16 or fp32 `Mat`. When `opt.use_bf16_storage` is active, outputting bfloat16 is recommended to maintain precision and performance across the network. A sketch of such a bf16 path follows.
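A sketch of a bf16 code path; the truncation-based converters are written inline so the example stays self-contained (ncnn itself provides cast helpers), and the clamp is illustrative:

```cpp
// Simplified bf16 <-> fp32 converters (truncation, no rounding).
static inline unsigned short f32_to_bf16(float v)
{
    union { float f; unsigned int u; } tmp = { v };
    return (unsigned short)(tmp.u >> 16); // keep the top 16 bits
}

static inline float bf16_to_f32(unsigned short v)
{
    union { unsigned int u; float f; } tmp;
    tmp.u = (unsigned int)v << 16;
    return tmp.f;
}

// Inside a bf16 branch selected by the elembits()/opt checks:
// bf16 elements are 2 bytes wide, so view the blob as unsigned short.
unsigned short* ptr = bottom_top_blob; // iterate per channel in real code
const int size = (int)bottom_top_blob.total();
for (int i = 0; i < size; i++)
{
    float v = bf16_to_f32(ptr[i]);
    v = v > 6.f ? 6.f : v; // illustrative clamp
    ptr[i] = f32_to_bf16(v);
}
```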
## support_fp16_storage

Declares that the layer can work with `float16` data for half-precision inference.

When `true`:

- As with `support_bf16_storage`, the `forward` method may receive an fp16 or fp32 `Mat`.
- Check `opt.use_fp16_storage` and `bottom_blob.elembits()` to select the correct code path.

When `false`: The ncnn engine ensures your layer will not receive an fp16 `Mat`.

Output: Either an fp16 or fp32 `Mat`. When `opt.use_fp16_storage` is active, outputting an fp16 `Mat` is recommended.
## support_int8_storage

Declares that the layer supports int8 quantized inference.

When `true`:

- When `opt.use_int8_inference` is `true`, the `forward` method may receive an int8 or fp32 `Mat`.
- If the input arrives as fp32, your `forward` implementation is responsible for dynamically quantizing it to int8 before performing computations, as shown in the sketch below.

When `false`: The ncnn engine ensures your layer will not receive an int8 `Mat`.

Output: Either int8 or fp32, depending on your layer's design.
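A sketch of that dynamic quantization step; the helper name and the single-scale scheme (derived from `absmax`, the largest absolute value in the blob) are illustrative, not ncnn API:

```cpp
#include <math.h>
#include "mat.h"

// Hypothetical helper: quantize an fp32 blob to int8 with a single scale.
static ncnn::Mat quantize_to_int8(const ncnn::Mat& bottom_blob, float absmax, const ncnn::Option& opt)
{
    const float scale = 127.f / absmax;

    ncnn::Mat bottom_blob_int8;
    bottom_blob_int8.create(bottom_blob.w, bottom_blob.h, bottom_blob.c, (size_t)1u, opt.workspace_allocator);

    for (int q = 0; q < bottom_blob.c; q++)
    {
        const float* src = bottom_blob.channel(q);
        signed char* dst = bottom_blob_int8.channel(q);
        for (int i = 0; i < bottom_blob.w * bottom_blob.h; i++)
        {
            int v = (int)roundf(src[i] * scale);
            dst[i] = (signed char)(v < -127 ? -127 : (v > 127 ? 127 : v));
        }
    }
    return bottom_blob_int8;
}
```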
## Combining Storage Formats

A layer can set `support_fp16_storage` and `support_bf16_storage` to `true` simultaneously. The ncnn engine prioritizes these formats based on the `Option` flags. As seen in the `convert_layout` function in `src/net.cpp`, if `opt.use_bf16_storage` is true, the engine will prefer converting inputs to bfloat16. Otherwise, it falls back to fp16 if `opt.use_fp16_storage` is true.

The chosen `elempack` also depends on the precision. For instance, with SIMD, the priority might be:

- For 16-bit storage (fp16/bf16): `elempack=8` (if supported), then `elempack=4`, then 1.
- For fp32: `elempack=4`, then 1.

Your `forward` implementation should reflect this by checking `elembits()` and `elempack` to dispatch to the correct kernel, as in the sketch below.
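A sketch of the dispatch pattern, assuming hypothetical per-precision kernels `forward_inplace_bf16s` / `forward_inplace_fp16s` / `forward_inplace_fp32` (the bf16-before-fp16 order mirrors the conversion priority described above):

```cpp
int MyLayer::forward_inplace(ncnn::Mat& bottom_top_blob, const ncnn::Option& opt) const
{
    const int elembits = bottom_top_blob.elembits();

    // bf16 takes priority over fp16, matching convert_layout in src/net.cpp.
    if (support_bf16_storage && opt.use_bf16_storage && elembits == 16)
        return forward_inplace_bf16s(bottom_top_blob, opt);

    if (support_fp16_storage && opt.use_fp16_storage && elembits == 16)
        return forward_inplace_fp16s(bottom_top_blob, opt);

    // fp32 path; each kernel handles its own elempack variants internally.
    return forward_inplace_fp32(bottom_top_blob, opt);
}
```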
## Example: Clip_arm

The `Clip_arm` layer provides a great example of these concepts in practice.
### Declaring Support in the Constructor

The constructor declares support for packing and, conditionally, for fp16 and bf16 storage.
```cpp
// From: src/layer/arm/clip_arm.cpp
Clip_arm::Clip_arm()
{
#if __ARM_NEON
    support_packing = true;
#if NCNN_ARM82
    support_fp16_storage = cpu_support_arm_asimdhp();
#endif
#endif // __ARM_NEON

#if NCNN_BF16
    support_bf16_storage = true;
#endif
}
```
### Dispatching in forward_inplace
The `forward_inplace` method acts as a dispatcher. It first checks the element size (`elembits`) and the corresponding `opt` flag to decide whether to call a specialized low-precision implementation (`fp16s` or `bf16s`). If neither is applicable, it defaults to the standard fp32 implementation.
```cpp
// From: src/layer/arm/clip_arm.cpp
int Clip_arm::forward_inplace(Mat& bottom_top_blob, const Option& opt) const
{
    int elembits = bottom_top_blob.elembits();

#if NCNN_ARM82
    if (support_fp16_storage && opt.use_fp16_storage && elembits == 16)
        return forward_inplace_fp16s(bottom_top_blob, opt);
#endif

#if NCNN_BF16
    if (opt.use_bf16_storage && elembits == 16)
        return forward_inplace_bf16s(bottom_top_blob, opt);
#endif

    // Default fp32 implementation follows...
    int w = bottom_top_blob.w;
    // ...
}
```
## Recommended Development Workflow

Adopting a gradual approach can simplify the development of a new layer:
1. Start with all `support_XYZ` properties set to `false`. Focus on getting the mathematical logic correct using standard fp32 data and `elempack=1`.
2. Enable `support_packing = true`. Modify your code to handle `elempack > 1` and implement SIMD optimizations (e.g., using NEON intrinsics).
3. Add low-precision support for fp16, bf16, or int8. Set the corresponding `support_*_storage` flags to `true` and add branches in your `forward` method to handle these data types based on the `opt` flags.
4. Set `support_vulkan = true` and implement the Vulkan-specific methods.

This incremental process allows you to tackle one challenge at a time, making it easier to develop a highly optimized and feature-rich layer.