# Sentiment analysis

Sentiment analysis is the task of classifying the polarity of a given text.
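As a minimal illustration of the task, the sketch below classifies polarity with a tiny hand-made word lexicon; the word lists and function name are invented for this example, and real systems (see the tables below) learn such signals from data.

```python
# Toy lexicon-based polarity classifier, for illustration only.
# These word sets are hypothetical, not a real sentiment resource.
POSITIVE = {"good", "great", "excellent", "wonderful", "enjoyable"}
NEGATIVE = {"bad", "terrible", "awful", "boring", "disappointing"}

def classify_polarity(text: str) -> str:
    """Return 'positive' or 'negative' by counting lexicon hits."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score >= 0 else "negative"
```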

## IMDb

The IMDb dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The dataset contains an equal number of positive and negative reviews. Only highly polarizing reviews are considered: a negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. Models are evaluated based on accuracy.
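The labeling rule and the evaluation metric above can be sketched directly; `imdb_label` and `accuracy` are illustrative names, not part of any dataset API.

```python
def imdb_label(score: int):
    """Map a 1-10 review score to a label per the IMDb dataset rule:
    <= 4 is negative, >= 7 is positive; mid-range reviews are excluded."""
    if score <= 4:
        return "negative"
    if score >= 7:
        return "positive"
    return None  # scores 5-6 are not polarizing enough and are excluded

def accuracy(predictions, gold):
    """Fraction of predictions matching the gold labels."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```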

| Model | Accuracy | Paper / Source |
| --- | --- | --- |
| XLNet (Yang et al., 2019) | 96.21 | XLNet: Generalized Autoregressive Pretraining for Language Understanding |
| BERT_large+ITPT (Sun et al., 2019) | 95.79 | How to Fine-Tune BERT for Text Classification? |
| BERT_base+ITPT (Sun et al., 2019) | 95.63 | How to Fine-Tune BERT for Text Classification? |
| ULMFiT (Howard and Ruder, 2018) | 95.4 | Universal Language Model Fine-tuning for Text Classification |
| Block-sparse LSTM (Gray et al., 2017) | 94.99 | GPU Kernels for Block-Sparse Weights |
| oh-LSTM (Johnson and Zhang, 2016) | 94.1 | Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings |
| Virtual adversarial training (Miyato et al., 2016) | 94.1 | Adversarial Training Methods for Semi-Supervised Text Classification |
| BCN+Char+CoVe (McCann et al., 2017) | 91.8 | Learned in Translation: Contextualized Word Vectors |

## SST

The Stanford Sentiment Treebank contains 215,154 phrases with fine-grained sentiment labels in the parse trees of 11,855 sentences from movie reviews. Models are evaluated on either fine-grained (five-way) or binary classification, based on accuracy.
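The binary task (SST-2) is commonly derived from the five-way labels by merging the two negative and two positive classes and dropping neutral sentences; a sketch, assuming labels 0 (very negative) through 4 (very positive):

```python
def sst5_to_sst2(label: int):
    """Collapse a five-way SST label (0=very negative .. 4=very positive)
    to a binary label; neutral sentences (2) are dropped in SST-2."""
    if label in (0, 1):
        return "negative"
    if label in (3, 4):
        return "positive"
    return None  # neutral: excluded from the binary task
```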

Fine-grained classification (SST-5, 94.2k examples):

| Model | Accuracy | Paper / Source |
| --- | --- | --- |
| BCN+Suffix BiLSTM-Tied+CoVe (Brahma, 2018) | 56.2 | Improved Sentence Modeling using Suffix Bidirectional LSTM |
| BCN+ELMo (Peters et al., 2018) | 54.7 | Deep contextualized word representations |
| BCN+Char+CoVe (McCann et al., 2017) | 53.7 | Learned in Translation: Contextualized Word Vectors |

Binary classification (SST-2, 56.4k examples):

| Model | Accuracy | Paper / Source | Code |
| --- | --- | --- | --- |
| XLNet-Large (ensemble) (Yang et al., 2019) | 96.8 | XLNet: Generalized Autoregressive Pretraining for Language Understanding | Official |
| MT-DNN-ensemble (Liu et al., 2019) | 96.5 | Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding | Official |
| Snorkel MeTaL (ensemble) (Ratner et al., 2018) | 96.2 | Training Complex Models with Multi-Task Weak Supervision | Official |
| MT-DNN (Liu et al., 2019) | 95.6 | Multi-Task Deep Neural Networks for Natural Language Understanding | Official |
| Bidirectional Encoder Representations from Transformers (Devlin et al., 2018) | 94.9 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Official |
| Block-sparse LSTM (Gray et al., 2017) | 93.2 | GPU Kernels for Block-Sparse Weights | Official |
| bmLSTM (Radford et al., 2017) | 91.8 | Learning to Generate Reviews and Discovering Sentiment | Unofficial |
| Single-layer BiLSTM distilled from BERT (Tang et al., 2019) | 90.7 | Distilling Task-Specific Knowledge from BERT into Simple Neural Networks | |
| BCN+Char+CoVe (McCann et al., 2017) | 90.3 | Learned in Translation: Contextualized Word Vectors | Official |
| Neural Semantic Encoder (Munkhdalai and Yu, 2017) | 89.7 | Neural Semantic Encoders | |
| BLSTM-2DCNN (Zhou et al., 2017) | 89.5 | Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling | |

## Yelp

The Yelp Review dataset consists of more than 500,000 Yelp reviews. There is both a binary and a fine-grained (five-class) version of the dataset. Models are evaluated based on error (1 - accuracy; lower is better).
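The reported metric can be sketched as follows; `classification_error` is an illustrative helper, not a library function.

```python
def classification_error(predictions, gold):
    """Error = 1 - accuracy, the metric used in the Yelp tables
    (lower is better)."""
    wrong = sum(p != g for p, g in zip(predictions, gold))
    return wrong / len(gold)
```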

Fine-grained classification:

| Model | Error | Paper / Source |
| --- | --- | --- |
| XLNet (Yang et al., 2019) | 27.80 | XLNet: Generalized Autoregressive Pretraining for Language Understanding |
| BERT_large+ITPT (Sun et al., 2019) | 28.62 | How to Fine-Tune BERT for Text Classification? |
| BERT_base+ITPT (Sun et al., 2019) | 29.42 | How to Fine-Tune BERT for Text Classification? |
| ULMFiT (Howard and Ruder, 2018) | 29.98 | Universal Language Model Fine-tuning for Text Classification |
| DPCNN (Johnson and Zhang, 2017) | 30.58 | Deep Pyramid Convolutional Neural Networks for Text Categorization |
| CNN (Johnson and Zhang, 2016) | 32.39 | Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings |
| Char-level CNN (Zhang et al., 2015) | 37.95 | Character-level Convolutional Networks for Text Classification |

Binary classification:

| Model | Error | Paper / Source |
| --- | --- | --- |
| XLNet (Yang et al., 2019) | 1.55 | XLNet: Generalized Autoregressive Pretraining for Language Understanding |
| BERT_large+ITPT (Sun et al., 2019) | 1.81 | How to Fine-Tune BERT for Text Classification? |
| BERT_base+ITPT (Sun et al., 2019) | 1.92 | How to Fine-Tune BERT for Text Classification? |
| ULMFiT (Howard and Ruder, 2018) | 2.16 | Universal Language Model Fine-tuning for Text Classification |
| DPCNN (Johnson and Zhang, 2017) | 2.64 | Deep Pyramid Convolutional Neural Networks for Text Categorization |
| CNN (Johnson and Zhang, 2016) | 2.90 | Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings |
| Char-level CNN (Zhang et al., 2015) | 4.88 | Character-level Convolutional Networks for Text Classification |

## SemEval

SemEval (International Workshop on Semantic Evaluation) runs a dedicated sentiment analysis task. The most recent overview of this task (Task 4) is available at: http://www.aclweb.org/anthology/S17-2088

SemEval-2017 Task 4 consists of five subtasks, each offered for both Arabic and English:

  1. Subtask A: Given a tweet, decide whether it expresses POSITIVE, NEGATIVE or NEUTRAL sentiment.

  2. Subtask B: Given a tweet and a topic, classify the sentiment conveyed towards that topic on a two-point scale: POSITIVE vs. NEGATIVE.

  3. Subtask C: Given a tweet and a topic, classify the sentiment conveyed in the tweet towards that topic on a five-point scale: STRONGLY POSITIVE, WEAKLY POSITIVE, NEUTRAL, WEAKLY NEGATIVE, and STRONGLY NEGATIVE.

  4. Subtask D: Given a set of tweets about a topic, estimate the distribution of tweets across the POSITIVE and NEGATIVE classes.

  5. Subtask E: Given a set of tweets about a topic, estimate the distribution of tweets across the five classes: STRONGLY POSITIVE, WEAKLY POSITIVE, NEUTRAL, WEAKLY NEGATIVE, and STRONGLY NEGATIVE.
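Subtasks D and E ask for a class distribution over a set of tweets rather than per-tweet labels. A simple count-based estimator (real systems may instead aggregate calibrated probabilities):

```python
from collections import Counter

def estimate_distribution(predicted_labels, classes):
    """Estimate the class distribution of a topic's tweets from
    per-tweet predictions, as simple normalized counts."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {c: counts[c] / total for c in classes}
```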

Subtask A results:

| Model | F1-score | Paper / Source |
| --- | --- | --- |
| LSTMs+CNNs ensemble with multiple conv. ops (Cliche, 2017) | 0.685 | BB twtr at SemEval-2017 Task 4: Twitter Sentiment Analysis with CNNs and LSTMs |
| Deep Bi-LSTM+attention (Baziotis et al., 2017) | 0.677 | DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis |

## Aspect-based sentiment analysis

### Sentihood

Sentihood is a dataset for targeted aspect-based sentiment analysis (TABSA), which aims to identify fine-grained polarity towards a specific aspect. The dataset consists of 5,215 sentences, 3,862 of which contain a single target, and the remainder multiple targets.
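One line of work in the table below (Sun et al., 2019) casts TABSA as sentence-pair classification by generating an auxiliary sentence for each (target, aspect) pair. A sketch of the pair construction using the four Sentihood aspects; the exact template wording here is illustrative and varies across papers:

```python
# The four aspect categories evaluated on Sentihood.
ASPECTS = ["general", "price", "safety", "transit-location"]

def make_pairs(sentence: str, target: str):
    """Build (sentence, auxiliary question) pairs for each aspect of a
    target, in the spirit of Sun et al. (2019); template is illustrative."""
    return [(sentence, f"what do you think of the {aspect} of {target} ?")
            for aspect in ASPECTS]
```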

Dataset mirror: https://github.com/uclmr/jack/tree/master/data/sentihood

| Model | Aspect (F1) | Sentiment (acc) | Paper / Source | Code |
| --- | --- | --- | --- | --- |
| QACG-BERT (Wu and Ong, 2020) | 89.7 | 93.8 | Context-Guided BERT for Targeted Aspect-Based Sentiment Analysis | Official |
| Sun et al. (2019) | 87.9 | 93.6 | Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence | Official |
| Liu et al. (2018) | 78.5 | 91.0 | Recurrent Entity Networks with Delayed Memory Update for Targeted Aspect-based Sentiment Analysis | Official |
| SenticLSTM (Ma et al., 2018) | 78.2 | 89.3 | Targeted Aspect-Based Sentiment Analysis via Embedding Commonsense Knowledge into an Attentive LSTM | |
| LSTM-LOC (Saeidi et al., 2016) | 69.3 | 81.9 | Sentihood: Targeted aspect based sentiment analysis dataset for urban neighbourhoods | |

### SemEval-2014 Task 4

SemEval-2014 Task 4 provides two domain-specific datasets, for laptops and restaurants, consisting of over 6,000 sentences with fine-grained aspect-level human annotations.

The task consists of the following subtasks:

  • Subtask 1: Aspect term extraction

  • Subtask 2: Aspect term polarity

  • Subtask 3: Aspect category detection

  • Subtask 4: Aspect category polarity

Preprocessed dataset: https://github.com/songyouwei/ABSA-PyTorch/tree/master/datasets/semeval14
https://github.com/howardhsu/BERT-for-RRC-ABSA (with both subtask 1 and subtask 2)
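Subtask 1 (aspect term extraction) is commonly framed as sequence labeling with BIO tags. A sketch using naive exact token matching, for illustration only:

```python
def bio_tags(tokens, aspect_terms):
    """Mark each token B/I if it starts/continues an annotated aspect
    term, else O. Matching is naive exact-token matching, for illustration."""
    tags = ["O"] * len(tokens)
    for term in aspect_terms:
        term_toks = term.split()
        n = len(term_toks)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == term_toks:
                tags[i] = "B"
                for j in range(i + 1, i + n):
                    tags[j] = "I"
    return tags
```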

Subtask 1 results (SemEval-2014 Task 4 for Laptop and SemEval-2016 Task 5 for Restaurant):

| Model | Laptop (F1) | Restaurant (F1) | Paper / Source | Code |
| --- | --- | --- | --- | --- |
| ACE + fine-tune (Wang et al., 2020) | 87.4 | 81.3 | Automated Concatenation of Embeddings for Structured Prediction | Official |
| BERT-PT (Hu, Xu, et al., 2019) | 84.26 | 77.97 | BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis | Official |
| DE-CNN (Hu, Xu, et al., 2018) | 81.59 | 74.37 | Double Embeddings and CNN-based Sequence Labeling for Aspect Extraction | Official |
| MIN (Li, Xin, et al., 2017) | 77.58 | 73.44 | Deep Multi-Task Learning for Aspect Term Extraction with Memory Interaction | |
| RNCRF (Wang, Wenya, et al., 2016) | 78.42 | 69.74 | Recursive Neural Conditional Random Fields for Aspect-based Sentiment Analysis | Official |

Subtask 2 results:

| Model | Restaurant (acc) | Laptop (acc) | Paper / Source | Code |
| --- | --- | --- | --- | --- |
| BERT-ADA (Rietzler, Alexander, et al., 2019) | 87.89 | 80.23 | Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification | Official |
| LCF-BERT (Zeng, Yang, et al., 2019) | 87.14 | 82.45 | LCF: A Local Context Focus Mechanism for Aspect-Based Sentiment Classification | Official / Link |
| BERT-PT (Hu, Xu, et al., 2019) | 84.95 | 78.07 | BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis | Official |
| AOA (Huang, Binxuan, et al., 2018) | 81.20 | 74.50 | Aspect Level Sentiment Classification with Attention-over-Attention Neural Networks | Link |
| TNet (Li, Xin, et al., 2018) | 80.79 | 76.01 | Transformation Networks for Target-Oriented Sentiment Classification | Official / Link |
| RAM (Chen, Peng, et al., 2017) | 80.23 | 74.49 | Recurrent Attention Network on Memory for Aspect Sentiment Analysis | Link |
| MemNet (Tang, Duyu, et al., 2016) | 80.95 | 72.21 | Aspect Level Sentiment Classification with Deep Memory Network | Official / Link |
| IAN (Ma, Dehong, et al., 2017) | 78.60 | 72.10 | Interactive Attention Networks for Aspect-Level Sentiment Classification | Link |
| ATAE-LSTM (Wang, Yequan, et al., 2016) | 77.20 | 68.70 | Attention-based LSTM for Aspect-level Sentiment Classification | Link |
| TD-LSTM (Tang, Duyu, et al., 2016) | 75.63 | 68.13 | Effective LSTMs for Target-Dependent Sentiment Classification | Official / Link |

## Sentiment classification with user and product information

This is the same sentiment classification task, where the given text is a review, but we are additionally given (a) the user who wrote the text and (b) the product the text is written about. There are three widely used datasets, introduced by Tang et al. (2015): IMDB, Yelp 2013, and Yelp 2014. Evaluation uses both accuracy and RMSE, but for brevity only accuracy is shown here; please see the papers for the RMSE values.
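RMSE over predicted versus true rating classes can be computed as below; `rmse` is an illustrative helper, not a library function.

```python
import math

def rmse(predictions, gold):
    """Root-mean-square error over predicted vs. true rating classes."""
    n = len(gold)
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(predictions, gold)) / n)
```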

| Model | IMDB (acc) | Yelp 2013 (acc) | Yelp 2014 (acc) | Paper / Source | Code |
| --- | --- | --- | --- | --- | --- |
| MA-BERT (Zhang, et al., 2021) | 57.3 | 70.3 | 71.4 | MA-BERT: Learning Representation by Incorporating Multi-Attribute Knowledge in Transformers | Link |
| IUPC (Lyu, et al., 2020) | 53.8 | 70.5 | 71.2 | Improving Document-Level Sentiment Analysis with User and Product Context | Link |
| BiLSTM+CHIM (Amplayo, 2019) | 56.4 | 67.8 | 69.2 | Rethinking Attribute Representation and Injection for Sentiment Classification | Link |
| BiLSTM + linear-basis-cust (Kim, et al., 2019) | - | 67.1 | - | Categorical Metadata Representation for Customized Text Classification | Link |
| CMA (Ma, et al., 2017) | 54.0 | 66.4 | 67.6 | Cascading Multiway Attention for Document-level Sentiment Classification | - |
| DUPMN (Long, et al., 2018) | 53.9 | 66.2 | 67.6 | Dual Memory Network Model for Biased Product Review Classification | - |
| HCSC (Amplayo, et al., 2018) | 54.2 | 65.7 | - | Cold-Start Aware User and Product Attention for Sentiment Classification | Link |
| NSC (Chen, et al., 2016) | 53.3 | 65.0 | 66.7 | Neural Sentiment Classification with User and Product Attention | Link |
| UPDMN (Dou, 2017) | 46.5 | 63.9 | 61.3 | Capturing User and Product Information for Document Level Sentiment Analysis with Deep Memory Network | - |
| UPNN (Tang, et al., 2016) | 43.5 | 59.6 | 60.8 | Learning Semantic Representations of Users and Products for Document Level Sentiment Classification | Link |

## Subjectivity analysis

A task related to sentiment analysis is subjectivity analysis, the goal of which is to label an opinion as either subjective or objective.

### SUBJ

The subjectivity dataset comprises 5,000 subjective and 5,000 objective processed sentences.

| Model | Accuracy | Paper / Source |
| --- | --- | --- |
| AdaSent (Zhao et al., 2015) | 95.50 | Self-Adaptive Hierarchical Sentence Model |
| CNN+MCFA (Amplayo et al., 2018) | 94.80 | Translations as Additional Contexts for Sentence Classification |
| Byte mLSTM (Radford et al., 2017) | 94.60 | Learning to Generate Reviews and Discovering Sentiment |
| USE (Cer et al., 2018) | 93.90 | Universal Sentence Encoder |
| Fast Dropout (Wang and Manning, 2013) | 93.60 | Fast Dropout Training |

Go back to the README