Grammatical Error Correction

Grammatical Error Correction (GEC) is the task of correcting different kinds of errors in text such as spelling, punctuation, grammatical, and word choice errors.

GEC is typically formulated as a sentence correction task. A GEC system takes a potentially erroneous sentence as input and is expected to transform it into its corrected version. See the example below:

| Input (Erroneous) | Output (Corrected) |
| --- | --- |
| She see Tom is catched by policeman in park at last night. | She saw Tom caught by a policeman in the park last night. |
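
To make the input/output framing concrete, the sketch below derives token-level edits from a sentence pair like the one above. It uses Python's standard `difflib` purely for illustration: the function names `token_edits` and `apply_edits` are hypothetical, and the whitespace tokenization and generic sequence alignment are simplifying assumptions. The scorers used by the benchmarks on this page rely on their own alignment and edit-extraction procedures.

```python
from difflib import SequenceMatcher


def token_edits(source: str, corrected: str):
    """Return (start, end, original, replacement) edits over whitespace tokens.

    start/end are token offsets into the source sentence. This is a generic
    diff, not the alignment used by any official GEC scorer.
    """
    src = source.split()
    tgt = corrected.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only replace / delete / insert operations
            edits.append((i1, i2, " ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return edits


def apply_edits(source: str, edits):
    """Apply edits right-to-left so earlier token offsets stay valid."""
    tokens = source.split()
    for start, end, _, replacement in reversed(edits):
        tokens[start:end] = replacement.split()
    return " ".join(tokens)


if __name__ == "__main__":
    src = "She see Tom is catched by policeman in park at last night."
    tgt = "She saw Tom caught by a policeman in the park last night."
    for edit in token_edits(src, tgt):
        print(edit)
    # Applying the extracted edits to the source reproduces the correction.
    print(apply_edits(src, token_edits(src, tgt)) == tgt)
```

Representing a correction as a set of span edits, rather than as a whole rewritten sentence, is what allows span-based precision and recall to be computed against annotator edits.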

CoNLL-2014 Shared Task

The CoNLL-2014 shared task test set is the most widely used dataset to benchmark GEC systems. The test set contains 1,312 English sentences with error annotations by 2 expert annotators. Models are evaluated with the MaxMatch scorer (Dahlmeier and Ng, 2012), which computes a span-based F<sub>β</sub>-score (β is set to 0.5 to weight precision twice as much as recall).
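
The F<sub>β</sub>-score combines precision P and recall R as (1 + β²)·P·R / (β²·P + R). The following is a minimal sketch of that formula computed from edit counts, not the MaxMatch scorer itself. The function name `f_beta` and the example counts are hypothetical, and treating precision or recall as 1.0 when its denominator is zero is an assumption made here for illustration.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from counts of correct (tp), spurious (fp) and missed (fn) edits."""
    # Assumption: an empty denominator yields 1.0 (nothing proposed means
    # nothing wrong; nothing to find means nothing missed).
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 1.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Two hypothetical systems with mirrored precision and recall.
    cautious = f_beta(tp=8, fp=2, fn=8)   # P = 0.8, R = 0.5
    eager = f_beta(tp=8, fp=8, fn=2)      # P = 0.5, R = 0.8
    print(round(cautious, 4), round(eager, 4))
```

With β = 0.5 the higher-precision system scores higher even though the two have the same F<sub>1</sub>, which reflects the preference in GEC for avoiding incorrect changes over catching every error.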

The shared task setting requires that systems use only publicly available datasets for training, to ensure a fair comparison between systems. The highest published scores on the CoNLL-2014 test set are given below. A distinction is made between papers that report results in the restricted CoNLL-2014 shared task setting, training on publicly available datasets only (Restricted), and those that made use of large, non-public datasets (Unrestricted).

Restricted:

| Model | F0.5 | Paper / Source | Code |
| --- | --- | --- | --- |
| Majority-voting ensemble (7 systems) (Omelianchuk et al., BEA 2024) | 72.8 | Pillars of Grammatical Error Correction: Comprehensive Inspection Of Contemporary Approaches In The Era of Large Language Models | Official |
| GRECO (Qorib and Ng, EMNLP 2023) | 71.12 | System Combination via Quality Estimation for Grammatical Error Correction | Official |
| ESC (Qorib et al., NAACL 2022) | 69.51 | Frustratingly Easy System Combination for Grammatical Error Correction | Official |
| T5 (t5.1.1.xxl) trained on cLang-8 (Rothe et al., ACL-IJCNLP 2021) | 68.87 | A Simple Recipe for Multilingual Grammatical Error Correction | T5, cLang-8 |
| Tagged corruptions - ensemble (Stahlberg and Kumar, 2021) | 68.3 | Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models | Official |
| Sequence tagging + token-level transformations + two-stage fine-tuning, DeBERTa + ELECTRA + RoBERTa ensemble (Mesham et al., EACL 2023) | 67.93 | An Extended Sequence Tagging Vocabulary for Grammatical Error Correction | Official |
| TMTC (Lai et al., ACL Findings 2022) | 67.02 | Type-Driven Multi-Turn Corrections for Grammatical Error Correction | Official |
| Sequence tagging + token-level transformations + two-stage fine-tuning + (BERT, RoBERTa, XLNet), ensemble (Omelianchuk et al., BEA 2020) | 66.5 | GECToR – Grammatical Error Correction: Tag, Not Rewrite | Official |
| Shallow Aggressive Decoding with BART (12+2), single model (beam=1) (Sun et al., ACL 2021) | 66.4 | Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding | Official |
| Sequence tagging + token-level transformations + two-stage fine-tuning, DeBERTa (Mesham et al., EACL 2023) | 66.06 | An Extended Sequence Tagging Vocabulary for Grammatical Error Correction | Official |
| DeBERTa(L) + RoBERTa(L) + XLNet (Tarnavskyi et al., ACL 2022) | 65.3 | Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction | Official |
| Sequence tagging + token-level transformations + two-stage fine-tuning + XLNet, single model (Omelianchuk et al., BEA 2020) | 65.3 | GECToR – Grammatical Error Correction: Tag, Not Rewrite | Official |
| Transformer + Pre-train with Pseudo Data + BERT (Kaneko et al., ACL 2020) | 65.2 | Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction | Official |
| Transformer + Pre-train with Pseudo Data (Kiyono et al., EMNLP 2019) | 65.0 | An Empirical Study of Incorporating Pseudo Data into Grammatical Error Correction | Official |
| Seq2Edits ensemble + Full sequence rescoring (Stahlberg and Kumar, EMNLP 2020) | 62.7 | Seq2Edits: Sequence Transduction Using Span-level Edit Operations | Official |
| Sequence Labeling with edits using BERT, Faster inference (Ensemble) (Awasthi et al., EMNLP 2019) | 61.2 | Parallel Iterative Edit Models for Local Sequence Transduction | Official |
| Copy-Augmented Transformer + Pre-train (Zhao and Wang, NAACL 2019) | 61.15 | Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data | Official |
| Sequence Labeling with edits using BERT, Faster inference (Single Model) (Awasthi et al., EMNLP 2019) | 59.7 | Parallel Iterative Edit Models for Local Sequence Transduction | Official |
| CNN Seq2Seq + Quality Estimation (Chollampatt and Ng, EMNLP 2018) | 56.52 | Neural Quality Estimation of Grammatical Error Correction | Official |
| SMT + BiGRU (Grundkiewicz and Junczys-Dowmunt, 2018) | 56.25 | Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation | NA |
| Transformer (Junczys-Dowmunt et al., 2018) | 55.8 | Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task | Official |
| CNN Seq2Seq (Chollampatt and Ng, 2018) | 54.79 | A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction | Official |

Unrestricted:

| Model | F0.5 | Paper / Source | Code |
| --- | --- | --- | --- |
| CNN Seq2Seq + Fluency Boost (Ge et al., 2018) | 61.34 | Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study | NA |

Restricted: uses only publicly available datasets. Unrestricted: uses non-public datasets.

CoNLL-2014 10 Annotations

Bryant and Ng (2015) released 8 additional annotations (beyond the two official annotations) for the CoNLL-2014 shared task test set.

Restricted:

| Model | F0.5 | Paper / Source | Code |
| --- | --- | --- | --- |
| GRECO (Qorib and Ng, EMNLP 2023) | 85.21 | System Combination via Quality Estimation for Grammatical Error Correction | Official |
| SMT + BiGRU (Grundkiewicz and Junczys-Dowmunt, 2018) | 72.04 | Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation | NA |
| CNN Seq2Seq (Chollampatt and Ng, 2018) | 70.14 (measured by Ge et al., 2018) | A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction | Official |

Unrestricted:

| Model | F0.5 | Paper / Source | Code |
| --- | --- | --- | --- |
| CNN Seq2Seq + Fluency Boost (Ge et al., 2018) | 76.88 | Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study | NA |

Restricted: uses only publicly available datasets. Unrestricted: uses non-public datasets.

JFLEG

The JFLEG test set, released by Napoles et al. (2017), consists of 747 English sentences with 4 references for each sentence. Models are evaluated with the GLEU metric (Napoles et al., 2016).

Restricted:

| Model | GLEU | Paper / Source | Code |
| --- | --- | --- | --- |
| Tagged corruptions (Stahlberg and Kumar, 2021) | 64.7 | Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models | Official |
| Transformer + Pre-train with Pseudo Data + BERT (Kaneko et al., ACL 2020) | 62.0 | Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction | Official |
| SMT + BiGRU (Grundkiewicz and Junczys-Dowmunt, 2018) | 61.50 | Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation | NA |
| Transformer (Junczys-Dowmunt et al., 2018) | 59.9 | Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task | NA |
| CNN Seq2Seq (Chollampatt and Ng, 2018) | 57.47 | A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction | Official |

Unrestricted:

| Model | GLEU | Paper / Source | Code |
| --- | --- | --- | --- |
| CNN Seq2Seq + Fluency Boost and inference (Ge et al., 2018) | 62.42 | Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study | NA |

Restricted: uses only publicly available datasets. Unrestricted: uses non-public datasets.

BEA Shared Task - 2019

The dataset released for the BEA 2019 Shared Task on Grammatical Error Correction provides a newer and larger benchmark for evaluating GEC models in 3 tracks (Restricted, Unrestricted, and Low Resource), which differ in the datasets permitted for training.

Training and development sets are released publicly, and a GEC model's performance is evaluated by F<sub>0.5</sub> score. Model outputs on the test set must be uploaded to CodaLab (publicly available), where category-wise error metrics are displayed. The test set consists of 4,477 sentences (larger and more diverse than the CoNLL-2014 test set), and the outputs are scored with the ERRANT toolkit. The released data are collected from 2 sources:

  • Write & Improve, an online web platform that assists non-native English students with their writing.
  • LOCNESS, a corpus consisting of essays written by native English students.

The description of tracks from the BEA site is given below:

Restricted Track: In the restricted track, participants may only use the following learner datasets:

  • FCE (Yannakoudakis et al., 2011)
  • Lang-8 Corpus of Learner English (Mizumoto et al., 2011; Tajiri et al., 2012)
  • NUCLE (Dahlmeier et al., 2013)
  • W&I+LOCNESS (Bryant et al., 2019; Granger, 1998)
    Note that we restrict participants to the preprocessed Lang-8 Corpus of Learner English rather than the raw, multilingual Lang-8 Learner Corpus because participants would otherwise need to filter the raw corpus themselves. We also do not allow the use of the CoNLL 2013/2014 shared task test sets in this track.

Unrestricted Track: In the unrestricted track, participants may use anything and everything to build their systems. This includes proprietary datasets and software.

Low Resource Track (formerly Unsupervised Track): In the low resource track, participants may only use the following learner dataset: W&I+LOCNESS development set.

Since current state-of-the-art systems rely on as much annotated learner data as possible to reach the best performance, the goal of the low resource track is to encourage research into systems that do not rely on large amounts of learner data. This track should be of particular interest to researchers working on GEC for languages where large learner corpora do not exist.

Results on the W&I+LOCNESS test set:

Restricted track:

| Model | F0.5 | Paper / Source | Code |
| --- | --- | --- | --- |
| Majority-voting ensemble (7 systems) (Omelianchuk et al., BEA 2024) | 81.4 | Pillars of Grammatical Error Correction: Comprehensive Inspection Of Contemporary Approaches In The Era of Large Language Models | Official |
| GRECO (Qorib and Ng, EMNLP 2023) | 80.84 | System Combination via Quality Estimation for Grammatical Error Correction | Official |
| ESC (Qorib et al., NAACL 2022) | 79.90 | Frustratingly Easy System Combination for Grammatical Error Correction | Official |
| TMTC (Lai et al., ACL Findings 2022) | 77.93 | Type-Driven Multi-Turn Corrections for Grammatical Error Correction | Official |
| RedPenNet (Didenko & Sameliuk, UNLP 2023) | 77.60 | RedPenNet for Grammatical Error Correction: Outputs to Tokens, Attentions to Spans | Official |
| RoBERTa(L) + EditScorer (Sorokin, EMNLP 2022) | 77.1 | Improved grammatical error correction by ranking elementary edits | Official |
| Sequence tagging + token-level transformations + two-stage fine-tuning, DeBERTa + ELECTRA + RoBERTa ensemble (Mesham et al., EACL 2023) | 76.17 | An Extended Sequence Tagging Vocabulary for Grammatical Error Correction | Official |
| DeBERTa(L) + RoBERTa(L) + XLNet (Tarnavskyi et al., ACL 2022) | 76.05 | Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction | Official |
| GECToR large without synthetic pre-training - ensemble (Tarnavskyi and Omelianchuk, 2021) | 76.05 | Improving Sequence Tagging for Grammatical Error Correction | Official |
| T5 (t5.1.1.xxl) trained on cLang-8 (Rothe et al., ACL-IJCNLP 2021) | 75.88 | A Simple Recipe for Multilingual Grammatical Error Correction | T5, cLang-8 |
| Tagged corruptions - ensemble (Stahlberg and Kumar, 2021) | 74.9 | Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models | Official |
| Sequence tagging + token-level transformations + two-stage fine-tuning + (BERT, RoBERTa, XLNet), ensemble (Omelianchuk et al., BEA 2020) | 73.6 | GECToR – Grammatical Error Correction: Tag, Not Rewrite | Official |
| BEA Combination | 73.18 | Learning to Combine Grammatical Error Corrections | Official |
| Sequence tagging + token-level transformations + two-stage fine-tuning, DeBERTa (Mesham et al., EACL 2023) | 73.09 | An Extended Sequence Tagging Vocabulary for Grammatical Error Correction | Official |
| Shallow Aggressive Decoding with BART (12+2), single model (beam=1) (Sun et al., ACL 2021) | 72.9 | Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding | Official |
| Sequence tagging + token-level transformations + two-stage fine-tuning + XLNet, single model (Omelianchuk et al., BEA 2020) | 72.4 | GECToR – Grammatical Error Correction: Tag, Not Rewrite | Official |
| Transformer + Pre-train with Pseudo Data (Kiyono et al., EMNLP 2019) | 70.2 | An Empirical Study of Incorporating Pseudo Data into Grammatical Error Correction | NA |
| Transformer + Pre-train with Pseudo Data + BERT (Kaneko et al., ACL 2020) | 69.8 | Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction | Official |
| Transformer | 69.47 | Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data | Official: Code to be updated soon |
| Transformer | 69.00 | A Neural Grammatical Error Correction System Built On Better Pre-training and Sequential Transfer Learning | Official |
| Ensemble of models | 66.78 | The LAIX Systems in the BEA-2019 GEC Shared Task | NA |

Low-resource track:

| Model | F0.5 | Paper / Source | Code |
| --- | --- | --- | --- |
| Transformer | 64.24 | Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data | Official: Code to be updated soon |
| Transformer | 58.80 | A Neural Grammatical Error Correction System Built On Better Pre-training and Sequential Transfer Learning | Official |
| Ensemble of models | 51.81 | The LAIX Systems in the BEA-2019 GEC Shared Task | NA |
