Named entity recognition

Named entity recognition (NER) is the task of tagging entities in text with their corresponding type. Approaches typically use BIO notation, which differentiates the beginning (B) and the inside (I) of entities. O is used for non-entity tokens.

Example:

| Mark | Watney | visited | Mars |
| --- | --- | --- | --- |
| B-PER | I-PER | O | B-LOC |
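
The BIO scheme can be decoded back into entity spans by grouping each B- tag with the I- tags of the same type that follow it. The following is a minimal sketch of that grouping, not the decoder used by any of the systems listed on this page. The function name `bio_to_spans` is hypothetical, and dropping a stray I- tag that has no open entity of the same type is an assumption made here for simplicity; other conventions treat such a tag as the start of a new entity.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity, closing any open one first.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O" closes the open entity. A stray I- tag is also treated
            # as a boundary and dropped (assumption of this sketch).
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    # Flush an entity that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


tokens = ["Mark", "Watney", "visited", "Mars"]
tags = ["B-PER", "I-PER", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))
# [('Mark Watney', 'PER'), ('Mars', 'LOC')]
```

Because a B- tag always starts a new span, two adjacent entities of the same type stay separate, which is the distinction the B/I split exists to preserve.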

ArmanPersoNERCorpus

The ArmanPersoNERCorpus dataset contains 7,682 sentences with 250,015 tokens, tagged in IOB format with six classes: Organization, Person, Location, Facility, Event, and Product.

Download Links: ARMAN

| Model | F1 | Paper / Source | Code |
| --- | :---: | --- | --- |
| ParsBERT (Farahani et al., 2020) | 99.84 | ParsBERT: Transformer-based Model for Persian Language Understanding | Official |
| LSTM-CRF (Hafezi, Rezaeian, 2018) | 86.55 | Neural Architecture for Persian Named Entity Recognition | - |
| mBERT (Taher et al., 2020) | 84.03 | Beheshti-NER: Persian Named Entity Recognition Using BERT | Official |
| Deep-CRF (Bokaei, Mahmoudi, 2018) | 81.50 | Improved Deep Persian Named Entity Recognition | - |
| Deep-Local (Bokaei, Mahmoudi, 2018) | 79.19 | Improved Deep Persian Named Entity Recognition | - |
| BiLSTM-CRF (Poostchi et al., 2018) | 77.45 | BiLSTM-CRF for Persian Named-Entity Recognition | - |
| SVM-HMM (Poostchi et al., 2016) | 72.59 | PersoNER: Persian Named-Entity Recognition | - |

PEYMA

The PEYMA dataset includes 7,145 sentences with 302,530 tokens, of which 41,148 are tagged in IOB format with seven classes: Organization, Percent, Money, Location, Date, Time, and Person.

Download Links: PEYMA

| Model | F1 | Paper / Source | Code |
| --- | :---: | --- | --- |
| ParsBERT (Farahani et al., 2020) | 93.40 | ParsBERT: Transformer-based Model for Persian Language Understanding | Official |
| mBERT (Taher et al., 2020) | 90.59 | Beheshti-NER: Persian Named Entity Recognition Using BERT | Official |
| Rule-Based-CRF (Shahshahani et al., 2018) | 84.00 | PEYMA: A Tagged Corpus for Persian Named Entities | - |