# Few-shot Learning with Multilingual Language Models (XGLM)
In this work, we train multilingual generative language models, dubbed XGLM, on a balanced corpus covering a diverse set of languages, and study their few- and zero-shot learning capabilities on a wide range of tasks. Our largest model, with 7.5 billion parameters, sets a new state of the art in few-shot learning on more than 20 representative languages, outperforming GPT-3 of comparable size on multilingual commonsense reasoning (+7.4 accuracy points 0-shot, +9.4 points 4-shot) and natural language inference (+5.4 points in both the 0-shot and 4-shot settings). We have included a model card for XGLM for transparency and accountability.
XGLM models are trained on a new multilingual corpus extracted from CommonCrawl (CC100-XL), a significantly larger multilingual dataset covering 68 Common Crawl (CC) snapshots (from Summer 2013 to March/April 2020) and consisting of 134 languages. The detailed language list and data statistics are reported in the paper (Table A.1).
| Model | Layers | Model Dim | Languages | Download |
|---|---|---|---|---|
| XGLM 564M | 24 | 1024 | trained on 30 languages | xglm.564M.tar.gz |
| XGLM 1.7B | 24 | 2048 | trained on 30 languages | xglm.1.7B.tar.gz |
| XGLM 2.9B | 48 | 2048 | trained on 30 languages | xglm.2.9B.tar.gz |
| XGLM 7.5B | 32 | 4096 | trained on 30 languages | xglm.7.5B.tar.gz |
| XGLM 4.5B | 48 | 2048 | trained on 134 languages | xglm.4.5B.tar.gz |
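
The archives above are standard fairseq checkpoints, so after extracting one of them you can load it through fairseq's language-model hub interface. The snippet below is a minimal sketch, assuming the extracted directory contains a `model.pt` checkpoint and a `sentencepiece.bpe.model` tokenizer; adjust the paths and file names to match the actual contents of your download.

```python
from fairseq.models.transformer_lm import TransformerLanguageModel

# Path to the extracted archive, e.g. after `tar -xzf xglm.564M.tar.gz`.
# The file names below (model.pt, sentencepiece.bpe.model) are assumptions;
# change them to whatever the archive actually contains.
model_dir = 'xglm.564M'

lm = TransformerLanguageModel.from_pretrained(
    model_dir,
    checkpoint_file='model.pt',
    bpe='sentencepiece',
    sentencepiece_model=f'{model_dir}/sentencepiece.bpe.model',
)
lm = lm.eval()  # inference only, disable dropout

# Score a sentence under the language model; the hub interface returns
# per-token log-probabilities in 'positional_scores'.
hypo = lm.score('Water boils at 100 degrees Celsius.')
print(hypo['positional_scores'].mean())  # average per-token log-probability
```

The same scoring call can be used for zero-shot classification by scoring each candidate verbalization of a label and picking the highest-scoring one.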