PP-Structure Model List

1. Layout Analysis

model name	description	inference model size	download	dict path
picodet_lcnet_x1_0_fgd_layout	English layout analysis model trained on the PubLayNet dataset, based on PicoDet LCNet_x1_0 and FGD. The model can recognize 5 types of areas: Text, Title, Table, Picture, and List	9.7M	inference model / trained model	PubLayNet dict
ppyolov2_r50vd_dcn_365e_publaynet	English layout analysis model trained on the PubLayNet dataset, based on PP-YOLOv2	221.0M	inference model / trained model	same as above
picodet_lcnet_x1_0_fgd_layout_cdla	Chinese layout analysis model trained on the CDLA dataset. The model can recognize 10 types of areas, such as Figure, Figure caption, Table, Table caption, Header, Footer, Reference, and Equation	9.7M	inference model / trained model	CDLA dict
picodet_lcnet_x1_0_fgd_layout_table	Layout analysis model trained on the table dataset. The model can detect tables in Chinese and English documents	9.7M	inference model / trained model	Table dict
ppyolov2_r50vd_dcn_365e_tableBank_word	Layout analysis model trained on the TableBank Word dataset, based on PP-YOLOv2. The model can detect tables in English documents	221.0M	inference model	same as above
ppyolov2_r50vd_dcn_365e_tableBank_latex	Layout analysis model trained on the TableBank Latex dataset, based on PP-YOLOv2. The model can detect tables in English documents	221.0M	inference model	same as above

2. OCR and Table Recognition

2.1 OCR

model name	description	inference model size	download
en_ppocr_mobile_v2.0_table_det	Text detection model for English table scenes, trained on the PubTabNet dataset	4.7M	inference model / trained model
en_ppocr_mobile_v2.0_table_rec	Text recognition model for English table scenes, trained on the PubTabNet dataset	6.9M	inference model / trained model

To use other OCR models, download one from the PP-OCR model_list or use a model you trained yourself, then set the det_model_dir and rec_model_dir fields to point to it.
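
As a minimal sketch of this configuration step, the helper below packages custom detection and recognition model directories under the det_model_dir and rec_model_dir field names. The helper name build_ocr_model_config and the example paths are hypothetical; only the two field names come from this document, and how the resulting dictionary is passed on depends on the entry point you use.

```python
from pathlib import Path
from typing import Dict


def build_ocr_model_config(det_dir: str, rec_dir: str,
                           check_exists: bool = True) -> Dict[str, str]:
    """Return a config dict with det_model_dir / rec_model_dir fields.

    Hypothetical helper for illustration: it only validates the paths and
    stores them under the field names mentioned in this document.
    """
    config = {"det_model_dir": det_dir, "rec_model_dir": rec_dir}
    if check_exists:
        for field, value in config.items():
            # Fail early if a field does not point to an existing directory.
            if not Path(value).is_dir():
                raise FileNotFoundError(
                    f"{field} does not point to a directory: {value}"
                )
    return config


if __name__ == "__main__":
    # Placeholder paths, not real download locations.
    cfg = build_ocr_model_config(
        "inference/my_det_model", "inference/my_rec_model", check_exists=False
    )
    print(cfg)
```

Checking the directories up front surfaces a wrong path before any model loading is attempted.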

2.2 Table Recognition

model name	description	inference model size	download
en_ppocr_mobile_v2.0_table_structure	English table recognition model based on TableRec-RARE, trained on the PubTabNet dataset	6.8M	inference model / trained model
en_ppstructure_mobile_v2.0_SLANet	English table recognition model based on SLANet, trained on the PubTabNet dataset	9.2M	inference model / trained model
ch_ppstructure_mobile_v2.0_SLANet	Chinese table recognition model based on SLANet	9.3M	inference model / trained model

3. KIE

On the XFUND_zh dataset, the accuracy and time cost of different models on a V100 GPU are as follows.

Model	Backbone	Task	Config	Hmean	Time cost (ms)	Download link
VI-LayoutXLM	VI-LayoutXLM-base	SER	ser_vi_layoutxlm_xfund_zh_udml.yml	93.19%	15.49	trained model
LayoutXLM	LayoutXLM-base	SER	ser_layoutxlm_xfund_zh.yml	90.38%	19.49	trained model
LayoutLM	LayoutLM-base	SER	ser_layoutlm_xfund_zh.yml	77.31%	-	trained model
LayoutLMv2	LayoutLMv2-base	SER	ser_layoutlmv2_xfund_zh.yml	85.44%	31.46	trained model
VI-LayoutXLM	VI-LayoutXLM-base	RE	re_vi_layoutxlm_xfund_zh_udml.yml	83.92%	15.49	trained model
LayoutXLM	LayoutXLM-base	RE	re_layoutxlm_xfund_zh.yml	74.83%	19.49	trained model
LayoutLMv2	LayoutLMv2-base	RE	re_layoutlmv2_xfund_zh.yml	67.77%	31.46	trained model

Note: The time costs above cover inference only, excluding preprocessing and postprocessing. Test environment: V100 GPU + CUDA 10.2 + cuDNN 8.1.1 + TensorRT 7.2.3.4.
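
To illustrate what timing inference only can look like, here is a minimal sketch that times just the inference call on inputs assumed to be already preprocessed. The function mean_inference_ms, the warm-up count, and the stand-in predictor are assumptions made for illustration; this is not the benchmark script used for the table above.

```python
import time
from statistics import mean
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def mean_inference_ms(infer: Callable[[T], object],
                      inputs: Sequence[T],
                      warmup: int = 2) -> float:
    """Average wall-clock time of infer() in milliseconds.

    Inputs are assumed to be preprocessed already and outputs are not
    postprocessed, so only the inference call itself is timed.
    """
    if not inputs:
        raise ValueError("inputs must not be empty")
    # Warm-up runs are excluded so one-time setup cost does not skew the mean.
    for sample in inputs[:warmup]:
        infer(sample)
    timings = []
    for sample in inputs:
        start = time.perf_counter()
        infer(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


if __name__ == "__main__":
    # Stand-in for a real predictor: sleeps for roughly 2 ms per call.
    def fake_infer(_sample: int) -> None:
        time.sleep(0.002)

    print(f"{mean_inference_ms(fake_infer, list(range(10))):.2f} ms")
```

On a GPU, the framework may execute asynchronously, so an explicit device synchronization before reading the timer may be needed; that step is framework-specific and not shown here.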

On the wildreceipt dataset, the results are as follows:

Model	Backbone	Config	Hmean	Download link
SDMGR	VGG6	configs/kie/sdmgr/kie_unet_sdmgr.yml	86.70%	trained model
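
The Hmean column in the tables above is reported as a percentage. Assuming it denotes the harmonic mean of precision and recall, which is the usual meaning of this metric in OCR and KIE evaluation, it can be computed as sketched below. The function name and the sample counts are illustrative only and are not taken from the reported results.

```python
def hmean(true_positives: int, predicted: int, ground_truth: int) -> float:
    """Harmonic mean of precision and recall, as a fraction in [0, 1].

    predicted is the number of items the model output, ground_truth is the
    number of annotated items, and true_positives is how many outputs match.
    """
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / ground_truth if ground_truth else 0.0
    if precision + recall == 0:
        # No correct predictions at all: define the score as zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up counts: 8 correct out of 10 predictions and 16 annotations.
    print(f"{hmean(8, 10, 16) * 100:.2f}%")
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot reach a high Hmean by trading recall for precision or the reverse.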