Natural language inference

Natural language inference is the task of determining whether a "hypothesis" is true (entailment), false (contradiction), or undetermined (neutral) given a "premise".

Example:

| Premise | Label | Hypothesis |
| --- | --- | --- |
| A man inspects the uniform of a figure in some East Asian country. | contradiction | The man is sleeping. |
| An older and younger man smiling. | neutral | Two men are smiling and laughing at the cats playing on the floor. |
| A soccer game with multiple males playing. | entailment | Some men are playing a sport. |
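The task format above can be represented as labeled premise/hypothesis pairs. A minimal sketch in Python (the `NLIExample` class and `LABELS` tuple are illustrative, not part of any dataset release):

```python
from dataclasses import dataclass

# The three NLI labels: the hypothesis is entailed by, contradicts,
# or is undetermined given the premise.
LABELS = ("entailment", "contradiction", "neutral")

@dataclass
class NLIExample:
    premise: str
    hypothesis: str
    label: str  # one of LABELS

# Examples taken from the table above.
examples = [
    NLIExample(
        "A soccer game with multiple males playing.",
        "Some men are playing a sport.",
        "entailment",
    ),
    NLIExample(
        "A man inspects the uniform of a figure in some East Asian country.",
        "The man is sleeping.",
        "contradiction",
    ),
]

# Every example carries exactly one of the three labels.
assert all(ex.label in LABELS for ex in examples)
```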

SNLI

The Stanford Natural Language Inference (SNLI) Corpus contains around 550k hypothesis/premise pairs. Models are evaluated based on accuracy.
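Accuracy here is simply the fraction of pairs whose predicted label matches the gold label. A minimal sketch (the label strings are illustrative):

```python
def accuracy(gold, predicted):
    """Fraction of examples where the predicted label matches the gold label."""
    assert len(gold) == len(predicted), "gold and predictions must align"
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

gold = ["entailment", "neutral", "contradiction", "entailment"]
pred = ["entailment", "contradiction", "contradiction", "entailment"]
print(accuracy(gold, pred))  # → 0.75 (3 of 4 labels match)
```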

State-of-the-art results can be seen on the SNLI website.

MultiNLI

The Multi-Genre Natural Language Inference (MultiNLI) corpus contains around 433k hypothesis/premise pairs. It is similar to the SNLI corpus, but covers a range of genres of spoken and written text and supports cross-genre evaluation. The data can be downloaded from the MultiNLI website.

Public leaderboards for in-genre (matched) and cross-genre (mismatched) evaluation are available, but entries do not correspond to published models.

| Model | Matched | Mismatched | Paper / Source | Code |
| --- | --- | --- | --- | --- |
| RoBERTa (Liu et al., 2019) | 90.8 | 90.2 | RoBERTa: A Robustly Optimized BERT Pretraining Approach | Official |
| XLNet-Large (ensemble) (Yang et al., 2019) | 90.2 | 89.8 | XLNet: Generalized Autoregressive Pretraining for Language Understanding | Official |
| MT-DNN-ensemble (Liu et al., 2019) | 87.9 | 87.4 | Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding | Official |
| Snorkel MeTaL (ensemble) (Ratner et al., 2018) | 87.6 | 87.2 | Training Complex Models with Multi-Task Weak Supervision | Official |
| Finetuned Transformer LM (Radford et al., 2018) | 82.1 | 81.4 | Improving Language Understanding by Generative Pre-Training | |
| Multi-task BiLSTM + Attn (Wang et al., 2018) | 72.2 | 72.1 | GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding | |
| GenSen (Subramanian et al., 2018) | 71.4 | 71.3 | Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning | |

SciTail

The SciTail entailment dataset consists of 27k premise/hypothesis pairs. In contrast to SNLI and MultiNLI, it was not crowd-sourced but created from sentences that already exist "in the wild": hypotheses were created from science questions and the corresponding answer candidates, while relevant web sentences from a large corpus were used as premises. Models are evaluated based on accuracy.

| Model | Accuracy | Paper / Source |
| --- | --- | --- |
| Finetuned Transformer LM (Radford et al., 2018) | 88.3 | Improving Language Understanding by Generative Pre-Training |
| Hierarchical BiLSTM Max Pooling (Talman et al., 2018) | 86.0 | Natural Language Inference with Hierarchical BiLSTM Max Pooling |
| CAFE (Tay et al., 2018) | 83.3 | A Compare-Propagate Architecture with Alignment Factorization for Natural Language Inference |