LayoutXLM (Document Foundation Model)

Multimodal (text + layout/format + image) pre-training for multilingual Document AI

Introduction

LayoutXLM is a multimodal pre-trained model for multilingual document understanding that aims to bridge language barriers in visually-rich document understanding. Experimental results show that it significantly outperforms existing state-of-the-art cross-lingual pre-trained models on the XFUND dataset.

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding, Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei, arXiv preprint, 2021

Models

layoutxlm-base | [huggingface](https://huggingface.co/microsoft/layoutxlm-base)

Fine-tuning Example on XFUND

Installation

Please refer to the layoutlmft installation instructions.

Fine-tuning for Semantic Entity Recognition

```bash
cd layoutlmft
python -m torch.distributed.launch --nproc_per_node=4 examples/run_xfun_ser.py \
        --model_name_or_path microsoft/layoutxlm-base \
        --output_dir /tmp/test-ner \
        --do_train \
        --do_eval \
        --lang zh \
        --max_steps 1000 \
        --warmup_ratio 0.1 \
        --fp16
```
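LayoutLM-family models, including LayoutXLM, consume token bounding boxes normalized into a 0–1000 coordinate space relative to the page size. A minimal sketch of that normalization (the helper name is ours, not part of layoutlmft):

```python
def normalize_bbox(bbox, page_width, page_height):
    """Scale an (x0, y0, x1, y1) pixel box into the 0-1000 space
    that LayoutLM-family models expect."""
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]

# Example: a word box on a 612x792 pt page (US Letter).
print(normalize_bbox((100, 200, 300, 220), 612, 792))  # → [163, 252, 490, 277]
```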

Fine-tuning for Relation Extraction

```bash
cd layoutlmft
python -m torch.distributed.launch --nproc_per_node=4 examples/run_xfun_re.py \
        --model_name_or_path microsoft/layoutxlm-base \
        --output_dir /tmp/test-ner \
        --do_train \
        --do_eval \
        --lang zh \
        --max_steps 2500 \
        --per_device_train_batch_size 2 \
        --warmup_ratio 0.1 \
        --fp16
```
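With `--nproc_per_node=4` and `--per_device_train_batch_size 2`, the effective (global) batch size is 4 × 2 = 8, so over `--max_steps 2500` the run processes 20,000 training samples. A quick sketch of that arithmetic:

```python
nproc_per_node = 4      # processes launched by torch.distributed.launch
per_device_batch = 2    # --per_device_train_batch_size
max_steps = 2500        # --max_steps

global_batch = nproc_per_node * per_device_batch
samples_seen = global_batch * max_steps
print(global_batch, samples_seen)  # → 8 20000
```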

Results on XFUND

Language-specific Fine-tuning

| Task | Model | FUNSD | ZH | JA | ES | FR | IT | DE | PT | Avg. |
|------|-------|-------|----|----|----|----|----|----|----|------|
| Semantic Entity Recognition | xlm-roberta-base | 0.667 | 0.8774 | 0.7761 | 0.6105 | 0.6743 | 0.6687 | 0.6814 | 0.6818 | 0.7047 |
| | infoxlm-base | 0.6852 | 0.8868 | 0.7865 | 0.6230 | 0.7015 | 0.6751 | 0.7063 | 0.7008 | 0.7207 |
| | layoutxlm-base | 0.794 | 0.8924 | 0.7921 | 0.7550 | 0.7902 | 0.8082 | 0.8222 | 0.7903 | 0.8056 |
| Relation Extraction | xlm-roberta-base | 0.2659 | 0.5105 | 0.5800 | 0.5295 | 0.4965 | 0.5305 | 0.5041 | 0.3982 | 0.4769 |
| | infoxlm-base | 0.2920 | 0.5214 | 0.6000 | 0.5516 | 0.4913 | 0.5281 | 0.5262 | 0.4170 | 0.4910 |
| | layoutxlm-base | 0.5483 | 0.7073 | 0.6963 | 0.6896 | 0.6353 | 0.6415 | 0.6551 | 0.5718 | 0.6432 |

Zero-shot Transfer Learning

| Task | Model | FUNSD | ZH | JA | ES | FR | IT | DE | PT | Avg. |
|------|-------|-------|----|----|----|----|----|----|----|------|
| SER | xlm-roberta-base | 0.667 | 0.4144 | 0.3023 | 0.3055 | 0.371 | 0.2767 | 0.3286 | 0.3936 | 0.3824 |
| | infoxlm-base | 0.6852 | 0.4408 | 0.3603 | 0.3102 | 0.4021 | 0.2880 | 0.3587 | 0.4502 | 0.4119 |
| | layoutxlm-base | 0.794 | 0.6019 | 0.4715 | 0.4565 | 0.5757 | 0.4846 | 0.5252 | 0.539 | 0.5561 |
| RE | xlm-roberta-base | 0.2659 | 0.1601 | 0.2611 | 0.2440 | 0.2240 | 0.2374 | 0.2288 | 0.1996 | 0.2276 |
| | infoxlm-base | 0.2920 | 0.2405 | 0.2851 | 0.2481 | 0.2454 | 0.2193 | 0.2027 | 0.2049 | 0.2423 |
| | layoutxlm-base | 0.5483 | 0.4494 | 0.4408 | 0.4708 | 0.4416 | 0.4090 | 0.3820 | 0.3685 | 0.4388 |

Multitask Fine-tuning

| Task | Model | FUNSD | ZH | JA | ES | FR | IT | DE | PT | Avg. |
|------|-------|-------|----|----|----|----|----|----|----|------|
| SER | xlm-roberta-base | 0.6633 | 0.883 | 0.7786 | 0.6223 | 0.7035 | 0.6814 | 0.7146 | 0.6726 | 0.7149 |
| | infoxlm-base | 0.6538 | 0.8741 | 0.7855 | 0.5979 | 0.7057 | 0.6826 | 0.7055 | 0.6796 | 0.7106 |
| | layoutxlm-base | 0.7924 | 0.8973 | 0.7964 | 0.7798 | 0.8173 | 0.821 | 0.8322 | 0.8241 | 0.8201 |
| RE | xlm-roberta-base | 0.3638 | 0.6797 | 0.6829 | 0.6828 | 0.6727 | 0.6937 | 0.6887 | 0.6082 | 0.6341 |
| | infoxlm-base | 0.3699 | 0.6493 | 0.6473 | 0.6828 | 0.6831 | 0.6690 | 0.6384 | 0.5763 | 0.6145 |
| | layoutxlm-base | 0.6671 | 0.8241 | 0.8142 | 0.8104 | 0.8221 | 0.8310 | 0.7854 | 0.7044 | 0.7823 |

Citation

If you find LayoutXLM useful in your research, please cite the following paper:

```latex
@article{Xu2020LayoutXLMMP,
  title         = {LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding},
  author        = {Yiheng Xu and Tengchao Lv and Lei Cui and Guoxin Wang and Yijuan Lu and Dinei Florencio and Cha Zhang and Furu Wei},
  year          = {2021},
  eprint        = {2104.08836},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}
```

License

The content of this project itself is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

Contact Information

For help or issues using LayoutXLM, please submit a GitHub issue.

For other communications related to LayoutXLM, please contact Lei Cui ([email protected]), Furu Wei ([email protected]).