Named entity recognition

Named entity recognition (NER) is the task of tagging entities in text with their corresponding type. Approaches typically use BIO notation, which differentiates the beginning (B) and the inside (I) of entities. O is used for non-entity tokens.

Example:

| Mark | Watney | visited | Mars |
| --- | --- | --- | --- |
| B-PER | I-PER | O | B-LOC |

(NER definition taken from english/named_entity_recognition.md)
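
As a minimal sketch of how BIO tags map back to entities, the snippet below groups a tag sequence into typed spans. The function name `bio_to_spans` and the lenient handling of an `I-` tag that has no preceding `B-` (it simply starts a new entity) are assumptions made for illustration, not part of any shared-task tooling.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group BIO tags into (start, end_exclusive, entity_type) spans."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open entity ends on O, on any B-, or on an I- of another type.
        ends_entity = (
            tag == "O"
            or tag.startswith("B-")
            or (tag.startswith("I-") and tag[2:] != label)
        )
        if ends_entity:
            if label is not None:
                spans.append((start, i, label))
            start, label = None, None
        # A B- tag opens an entity; a stray I- is treated the same way (assumption).
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


tokens = ["Mark", "Watney", "visited", "Mars"]
tags = ["B-PER", "I-PER", "O", "B-LOC"]
for start, end, label in bio_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
```

For the example sentence this yields one PER span covering "Mark Watney" and one LOC span covering "Mars".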

CANTEMIST 2020

The CANTEMIST-NER 2020 task uses a corpus of Spanish oncology clinical reports annotated with a single entity type (MORFOLOGIA_NEOPLASIA). Models are evaluated by span-based F1 on the test set; see the evaluation scripts.
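
To make the metric concrete, here is a rough sketch of span-based F1 under exact matching. It is not the official evaluation script: the `(doc_id, start, end)` tuple representation, the function name `span_f1`, and the toy offsets are all assumptions chosen for illustration.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (doc_id, start_offset, end_offset), hypothetical layout


def span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """F1 where a predicted span counts only if it matches a gold span exactly."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data: two exact matches, one boundary error, one spurious prediction.
gold = {("doc1", 10, 19), ("doc1", 40, 52), ("doc2", 5, 14)}
pred = {("doc1", 10, 19), ("doc1", 40, 50), ("doc2", 5, 14), ("doc2", 30, 38)}
print(round(span_f1(gold, pred), 4))
```

Here precision is 2/4 and recall is 2/3, so F1 is 4/7. Note that the prediction with a wrong end offset earns no partial credit under exact matching.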

The CANTEMIST shared task also includes an entity linking subtrack (CANTEMIST-NORM) and a document indexing subtrack (CANTEMIST-CODING).

Data link: Zenodo

| Model | F1 | Paper / Source | Code |
| --- | --- | --- | --- |
| MRC mBERT-MLP (Xiong et al., 2020) | 87.0 | A Joint Model for Medical Named Entity Recognition and Normalization | Official |
| BETO-SciBERT (Garcia-Pablos et al., 2020) | 86.9 | Vicomtech at CANTEMIST 2020 | |
| BiLSTM-CRF+GloVe+SME+CWE (López-Úbeda et al., 2020) | 85.5 | Extracting Neoplasms Morphology Mentions in Spanish Clinical Cases through Word Embeddings | |
| Biaffine Classifier (Lange et al., 2020) | 85.3 | NLNDE at CANTEMIST: Neural Sequence Labeling and Parsing Approaches for Clinical Concept Extraction | |
| BETO (Han et al., 2020) | 85.0 | Pre-trained Language Model for CANTEMIST Named Entity Recognition | |
| BiLSTM-CRF+FastText+Char (Carreto Fidalgo et al., 2020) | 84.5 | Recognai’s Working Notes for CANTEMIST-NER Track | Official |
| BiLSTM-BiLSTM-CRF+FastText+PoS+Char (Santamaria Carrasco et al., 2020) | 83.4 | Using Embeddings and Bi-LSTM+CRF Model to Detect Tumor Morphology Entities in Spanish Clinical Cases | Official |

ProfNER 2021

The ProfNER-NER 2021 task uses a corpus of Spanish COVID-19-related tweets annotated with four entity types (PROFESION, SITUACION_LABORAL, ACTIVIDAD, FIGURATIVA). Models are evaluated by F1 on the test set, where a prediction must match both the span and the label; see Task 7 of the Codalab SMM4H competition.
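
With several entity types, requiring the label to match changes the score. The sketch below contrasts span-and-label matching with span-only matching. The `Mention` tuple, the `use_label` switch, the micro-averaging over all mentions, and the toy offsets are illustrative assumptions, not the behaviour of the official Codalab scorer.

```python
from typing import NamedTuple, Set


class Mention(NamedTuple):
    tweet_id: str
    start: int
    end: int
    label: str


def micro_f1(gold: Set[Mention], pred: Set[Mention], use_label: bool = True) -> float:
    """Micro-averaged F1; with use_label=False only the offsets must match."""
    def key(m: Mention):
        return tuple(m) if use_label else tuple(m[:3])

    gold_keys = {key(m) for m in gold}
    pred_keys = {key(m) for m in pred}
    true_positives = len(gold_keys & pred_keys)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_keys)
    recall = true_positives / len(gold_keys)
    return 2 * precision * recall / (precision + recall)


gold = {
    Mention("t1", 0, 8, "PROFESION"),
    Mention("t1", 20, 30, "SITUACION_LABORAL"),
    Mention("t2", 5, 12, "PROFESION"),
}
# Same offsets as gold, but the second mention carries the wrong label.
pred = {
    Mention("t1", 0, 8, "PROFESION"),
    Mention("t1", 20, 30, "PROFESION"),
    Mention("t2", 5, 12, "PROFESION"),
}
print(micro_f1(gold, pred, use_label=True))
print(micro_f1(gold, pred, use_label=False))
```

In this toy case the span-only score is perfect, while the span-and-label score drops to 2/3 because of the single mislabelled mention.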

The ProfNER shared task also includes a tweet classification subtrack (ProfNER-Track A).

Data link: Zenodo

| Model | F1 | Paper / Source | Code |
| --- | --- | --- | --- |
| BETO-Linear-CRF (David Carreto Fidalgo et al., 2021) | 83.9 | Recognai | Official |
| 3xBiLSTM-CRF+BPE+FastText+BETOemb (Usama Yaseen et al., 2021) | 82.4 | MIC-NLP | |
| BiLSTM-LSTM-CRF+Char+STE+SME+BETO+Syllabes+POS (Sergio Santamaría Carrasco et al., 2021) | 82.3 | Troy | Official |
| BiGRU-BiLSTM-TokenClassification-CRF+STE+Char (David Carreto Fidalgo et al., 2021) | 76.4 | Recognai | Official |
| BiLSTM-CRF+Char+STE+SME+WikiFastText (Vasile Pais et al., 2021) | 75.7 | RACAI | |
| 30xBETO-BiLSTM (Tong Zhou et al., 2021) | 73.3 | CASIA_Unisound | Official |
| Dictionaries-CRF (Alberto Mesa Murgado et al., 2021) | 72.8 | SINAI | Official |
| BiLSTM-CRF+FLAIR+FastText (Pedro Ruas et al., 2021) | 72.7 | Lasige-BioTM | Official |

Go back to the README