Back to Sglang

W8A8 Block FP8 Kernel Configurations

python/sglang/srt/layers/quantization/configs/README.md

0.5.12651 B
Original Source

W8A8 Block FP8 Kernel Configurations

This directory contains optimized kernel configurations for the W8A8 block FP8 matrix multiplication kernel.

Configuration File Format

Configuration files are named using the following pattern:

N={N},K={K},device_name={DEVICE_NAME},dtype=fp8_w8a8,block_shape=[{BLOCK_N},{BLOCK_K}].json

Where:

  • N: Output dimension (number of columns in weight matrix)
  • K: Input dimension (number of columns in activation matrix)
  • DEVICE_NAME: GPU device name with spaces replaced by underscores (e.g., NVIDIA_H100_80GB_HBM3)
  • BLOCK_N, BLOCK_K: Block quantization granularity (typically [128,128])