We are hiring at all levels (including FTE researchers and interns)! If you are interested in working with us on Foundation Models (aka large-scale pre-trained models) and General AI, NLP, MT, Speech, Document AI and Multimodal AI, please send your resume to <a href="mailto:[email protected]" class="x-hidden-focus">[email protected]</a>.
Fundamental research to develop new architectures for foundation models and AI, focusing on modeling generality and capability, as well as training stability and efficiency.
Stability - DeepNet: scaling Transformers to 1,000 Layers and beyond
Generality - Foundation Transformers (Magneto): towards true general-purpose modeling across tasks and modalities (including language, vision, speech, and multimodal)
Capability - A Length-Extrapolatable Transformer
Efficiency & Transferability - X-MoE: scalable & finetunable sparse Mixture-of-Experts (MoE)
BitNet: 1-bit Transformers for Large Language Models
RetNet: Retentive Network: A Successor to Transformer for Large Language Models
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Kosmos-2.5: A Multimodal Literate Model
Kosmos-2: Grounding Multimodal Large Language Models to the World
Kosmos-1: A Multimodal Large Language Model (MLLM)
MetaLM: Language Models are General-Purpose Interfaces
The Big Convergence - Large-scale self-supervised pre-training across tasks (predictive and generative), languages (100+ languages), and modalities (language, image, audio, layout/format + language, vision + language, audio + language, etc.)
UniLM: unified pre-training for language understanding and generation
InfoXLM/XLM-E: multilingual/cross-lingual pre-trained models for 100+ languages
DeltaLM/mT6: encoder-decoder pre-training for language generation and translation for 100+ languages
MiniLM: small and fast pre-trained models for language understanding and generation
AdaLM: domain, language, and task adaptation of pre-trained models
EdgeLM (NEW): small pre-trained models on edge/client devices
SimLM (NEW): large-scale pre-training for similarity matching
E5 (NEW): text embeddings
MiniLLM (NEW): Knowledge Distillation of Large Language Models
BEiT/BEiT-2: generative self-supervised pre-training for vision / BERT Pre-Training of Image Transformers
DiT: self-supervised pre-training for Document Image Transformers
TextDiffuser/TextDiffuser-2 (NEW): Diffusion Models as Text Painters
WavLM: speech pre-training for full stack tasks
VALL-E: a neural codec language model for TTS
LayoutLM/LayoutLMv2/LayoutLMv3: multimodal (text + layout/format + image) Document Foundation Model for Document AI (e.g. scanned documents, PDF, etc.)
LayoutXLM: multimodal (text + layout/format + image) Document Foundation Model for multilingual Document AI
MarkupLM: markup language model pre-training for visually-rich document understanding
XDoc: unified pre-training for cross-format document understanding
UniSpeech: unified pre-training for self-supervised learning and supervised learning for ASR
UniSpeech-SAT: universal speech representation learning with speaker-aware pre-training
SpeechT5: encoder-decoder pre-training for spoken language processing
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
VLMo: Unified vision-language pre-training
VL-BEiT (NEW): Generative Vision-Language Pre-training - evolution of BEiT to multimodal
BEiT-3 (NEW): a general-purpose multimodal foundation model, and a major milestone of The Big Convergence of Large-scale Pre-training Across Tasks, Languages, and Modalities.
s2s-ft: sequence-to-sequence fine-tuning toolkit
Aggressive Decoding (NEW): lossless and efficient sequence-to-sequence decoding algorithm
TrOCR: transformer-based OCR w/ pre-trained models
LayoutReader: pre-training of text and layout for reading order detection
XLM-T: multilingual NMT w/ pretrained cross-lingual encoders
General technology for enabling AI capabilities w/ LLMs and MLLMs.
Curating General, Code, Math, and QA Data for Large Language Models.
This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the transformers project.
Microsoft Open Source Code of Conduct
For help or issues using the pre-trained models, please submit a GitHub issue.
For other communications, please contact Furu Wei ([email protected]).