# Coreference resolution
Coreference resolution is the task of clustering mentions in text that refer to the same underlying real-world entities.
Example:

```
             +-------------+
             |             |
"I voted for Obama because he was most aligned with my values", she said.
 |                                                  |           |
 +--------------------------------------------------+-----------+
```

"I", "my", and "she" belong to one cluster, and "Obama" and "he" belong to another.
Experiments are conducted on the data of the CoNLL-2012 shared task, which uses OntoNotes coreference annotations. Papers report the precision, recall, and F1 of the MUC, B<sup>3</sup>, and CEAF<sub>φ4</sub> metrics using the official CoNLL-2012 evaluation scripts. The main evaluation metric is the average F1 of the three metrics.
| Model | Avg F1 | Paper / Source | Code |
|---|---|---|---|
| wl-coref + RoBERTa | 81.0 | Word-Level Coreference Resolution | Official |
| s2e+Longformer-Large | 80.3 | Coreference Resolution without Span Representations | Official |
| Xu et al. (2020) | 80.2 | Revealing the Myth of Higher-Order Inference in Coreference Resolution | Official |
| Joshi et al. (2019)<sup>1</sup> | 79.6 | SpanBERT: Improving Pre-training by Representing and Predicting Spans | Official |
| Joshi et al. (2019)<sup>2</sup> | 76.9 | BERT for Coreference Resolution: Baselines and Analysis | Official |
| Kantor and Globerson (2019) | 76.6 | Coreference Resolution with Entity Equalization | Official |
| Fei et al. (2019) | 73.8 | End-to-end Deep Reinforcement Learning Based Coreference Resolution | |
| (Lee et al., 2017)+ELMo (Peters et al., 2018)+coarse-to-fine & second-order inference (Lee et al., 2018) | 73.0 | Higher-order Coreference Resolution with Coarse-to-fine Inference | Official |
| (Lee et al., 2017)+ELMo (Peters et al., 2018) | 70.4 | Deep contextualized word representations | |
| Lee et al. (2017) | 67.2 | End-to-end Neural Coreference Resolution | |
<a name="myfootnote1">[1]</a> Joshi et al. (2019): (Lee et al., 2017)+coarse-to-fine & second-order inference (Lee et al., 2018)+SpanBERT (Joshi et al., 2019)
<a name="myfootnote2">[2]</a> Joshi et al. (2019): (Lee et al., 2017)+coarse-to-fine & second-order inference (Lee et al., 2018)+BERT (Devlin et al., 2019)
Experiments are conducted on the GAP dataset. The metrics are F1 score on Masculine (M) and Feminine (F) examples, Overall F1, and a Bias factor calculated as F / M.
| Model | Overall F1 | Masculine F1 (M) | Feminine F1 (F) | Bias (F/M) | Paper / Source | Code |
|---|---|---|---|---|---|---|
| Attree et al. (2019) | 92.5 | 94.0 | 91.1 | 0.97 | Gendered Ambiguous Pronouns Shared Task: Boosting Model Confidence by Evidence Pooling | GREP |
| Chada et al. (2019) | 90.2 | 90.9 | 89.5 | 0.98 | Gendered Pronoun Resolution using BERT and an extractive question answering formulation | CorefQA |
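The Bias column is simply the ratio of the two gendered F1 scores; values below 1.0 indicate weaker performance on feminine examples. Using the numbers from the table above:

```python
def bias_factor(feminine_f1: float, masculine_f1: float) -> float:
    """GAP bias factor: feminine F1 divided by masculine F1."""
    return feminine_f1 / masculine_f1

# Attree et al. (2019): F = 91.1, M = 94.0
print(round(bias_factor(91.1, 94.0), 2))  # -> 0.97
# Chada et al. (2019): F = 89.5, M = 90.9
print(round(bias_factor(89.5, 90.9), 2))  # -> 0.98
```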