Few-shot Learning with Multilingual Language Models

Introduction

In this work, we train multilingual generative language models, dubbed XGLM, on a balanced corpus covering a diverse set of languages, and study their few- and zero-shot learning capabilities on a wide range of tasks. Our largest model, with 7.5 billion parameters, sets a new state of the art in few-shot learning on more than 20 representative languages, outperforming GPT-3 of comparable size on multilingual commonsense reasoning (+7.4 accuracy points in the 0-shot setting and +9.4 in the 4-shot setting) and natural language inference (+5.4 in each of the 0-shot and 4-shot settings). We have included a model card of XGLM for transparency and accountability.

Data and Languages

XGLM models are trained on a new multilingual corpus extracted from CommonCrawl (CC100-XL), a significantly larger multilingual dataset covering 68 Common Crawl (CC) snapshots (from Summer 2013 to March/April 2020) and 134 languages. The detailed languages and data statistics are reported in the paper (Table A.1).

Pre-trained models

| Model | Layers | Model Dim | Languages | Download |
|-------|--------|-----------|-----------|----------|
| XGLM 564M | 24 | 1024 | trained on 30 languages | xglm.564M.tar.gz |
| XGLM 1.7B | 24 | 2048 | trained on 30 languages | xglm.1.7B.tar.gz |
| XGLM 2.9B | 48 | 2048 | trained on 30 languages | xglm.2.9B.tar.gz |
| XGLM 7.5B | 32 | 4096 | trained on 30 languages | xglm.7.5B.tar.gz |
| XGLM 4.5B | 48 | 2048 | trained on 134 languages | xglm.4.5B.tar.gz |
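
Each tarball decompresses to a model directory that can be loaded through fairseq's language-model hub interface. The snippet below is a minimal usage sketch, not part of the original README: the model directory path, prompt, and candidate strings are placeholders, and it illustrates zero-shot prediction by ranking candidate completions with language-model scores.

```python
# Minimal sketch: load a decompressed XGLM checkpoint via fairseq's hub
# interface and rank candidate continuations by their summed token log-probs.
from fairseq.models.transformer_lm import TransformerLanguageModel

model_dir = 'path_to_decompressed_tar_gz_dir'  # placeholder: extracted tarball
lm = TransformerLanguageModel.from_pretrained(model_dir, bpe='sentencepiece')
lm.eval()  # disable dropout so scores are deterministic

prompt = "The capital of France is"       # hypothetical zero-shot prompt
candidates = ["Paris", "Rome", "Berlin"]  # hypothetical candidate answers

def sequence_logprob(text):
    # lm.score() returns per-token positional scores; sum them for a sequence score.
    return lm.score(text)['positional_scores'].sum().item()

# Pick the candidate whose completed sentence the model finds most likely.
best = max(candidates, key=lambda c: sequence_logprob(f"{prompt} {c}."))
print(best)
```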

Evaluation

Coming soon.

Citation

Coming soon.