# DeepSeek-V3 Weight File Documentation
## New Fields in `config.json`

- **model_type**: Specifies the model type, which is updated to `deepseek_v3` in this release.
- **num_nextn_predict_layers**: Indicates the number of Multi-Token Prediction (MTP) Modules.
- **quantization_config**: Describes the configuration for FP8 quantization.

## Weight Structure Overview

The DeepSeek-V3 weight file consists of two main components: **Main Model Weights** and **MTP Modules**.
### 1. Main Model Weights

- **Embedding Layer**:
  - `model.embed_tokens.weight`
- **Transformer Hidden Layers**:
  - `model.layers.0` to `model.layers.60`, totaling `num_hidden_layers` layers.
- **Output Layer**:
  - `model.norm.weight`
  - `lm_head.weight`

### 2. Multi-Token Prediction (MTP) Modules

- **Composition**:
  - Additional MTP Modules defined by the `num_nextn_predict_layers` field. In this model, the value is set to 1.
- **Additional Transformer Hidden Layer**:
  - `model.layers.61.self_attn & mlp` (structure identical to the Main Model hidden layers).

## Loading Rules

- **Main Model Weights**: Loaded via the `num_hidden_layers` parameter in `config.json`.
- **MTP Modules**: Loaded via the `num_nextn_predict_layers` parameter, with layer IDs appended immediately after the Main Model hidden layers. For example:
  - If `num_hidden_layers = 61` and `num_nextn_predict_layers = 1`, the MTP Module's layer ID is `61`.
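For illustration only, the following Python sketch expresses the layer-ID rule above; the helper name is hypothetical and not part of the official loader, and the two values would be read from `config.json` in practice:

```python
# Values from config.json (hard-coded here for illustration).
num_hidden_layers = 61
num_nextn_predict_layers = 1

def layer_kind(layer_id: int) -> str:
    """Classify a `model.layers.<id>` index as Main Model or MTP Module."""
    if layer_id < num_hidden_layers:
        return "main"  # model.layers.0 .. model.layers.60
    if layer_id < num_hidden_layers + num_nextn_predict_layers:
        return "mtp"   # model.layers.61
    raise ValueError(f"unexpected layer id: {layer_id}")

assert layer_kind(60) == "main"
assert layer_kind(61) == "mtp"
```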
## FP8 Weight Documentation

DeepSeek-V3 natively supports FP8 weight format with 128x128 block scaling.

### FP8 Configuration

The FP8 weight file introduces a `quantization_config` field to describe the quantization method. Below is an example configuration:
"quantization_config": {
"activation_scheme": "dynamic",
"fmt": "e4m3",
"quant_method": "fp8",
"weight_block_size": [128, 128]
}
- **Quantization Format**:
  - Format type: `fp8`, with format `e4m3` (corresponding to `torch.float8_e4m3fn`).
  - Weight block size: `128x128`.
- **Activation Quantization Scheme**:
  - Dynamic activation quantization (`dynamic`).
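To make these fields concrete, here is a minimal sketch (hypothetical loader code, not from this repo) of how a loader might read them from `config.json`:

```python
import json

with open("config.json") as f:
    config = json.load(f)

qcfg = config.get("quantization_config")
if qcfg is not None and qcfg["quant_method"] == "fp8":
    assert qcfg["fmt"] == "e4m3"                   # maps to torch.float8_e4m3fn
    block_m, block_n = qcfg["weight_block_size"]   # [128, 128]
    dynamic_acts = qcfg["activation_scheme"] == "dynamic"
    print(f"FP8 weights, {block_m}x{block_n} block scaling, "
          f"dynamic activation quantization: {dynamic_acts}")
```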
### Dequantization Method

The FP8 weight file includes a `weight_scale_inv` field, which stores the dequantization scale for each weight block.

- **Storage Format**: a `float32` tensor, stored alongside the weight data.
- **Dequantization Formula**: dequantization is computed as `(128x128 weight block) * weight_scale_inv`.

Through dequantization of the FP8 weights, runtime operations enable **online quantization** at a granularity of `per-token-per-128-channel`.
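A minimal PyTorch sketch of this block-wise dequantization follows; the function name and the broadcast-then-trim approach are illustrative (real deployments typically fuse this into GPU kernels), but the arithmetic is the `(128x128 weight block) * weight_scale_inv` formula above:

```python
import torch

def dequantize_fp8_block(weight: torch.Tensor,
                         weight_scale_inv: torch.Tensor,
                         block: int = 128) -> torch.Tensor:
    """Dequantize an FP8 (e4m3) weight tensor using per-block scales.

    weight:           (M, N) tensor of dtype torch.float8_e4m3fn
    weight_scale_inv: (ceil(M/block), ceil(N/block)) float32 tensor holding
                      one dequantization scale per 128x128 weight block
    """
    M, N = weight.shape
    w = weight.to(torch.float32)
    # Broadcast each per-block scale over its 128x128 region, then trim
    # back to the weight shape so unaligned edge blocks are handled.
    scales = weight_scale_inv.repeat_interleave(block, dim=0)
    scales = scales.repeat_interleave(block, dim=1)
    return w * scales[:M, :N]
```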