Chinese Word Segmentation

Task

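Chinese word segmentation is the task of splitting Chinese text (a sequence of Chinese characters) into words. Because written Chinese does not separate words with spaces, a segmenter must decide where each word boundary falls.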

Example:

'上海浦东开发与建设同步' ("the development and construction of Shanghai's Pudong proceed in step") → ['上海', '浦东', '开发', '与', '建设', '同步']
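Many of the systems listed below (the BiLSTM-CRF models, for instance) treat segmentation as character-level sequence labeling. The following is a minimal sketch that assumes the common four-tag BMES scheme (Begin, Middle, End, Single), which this page does not itself specify; the helper names `to_bmes` and `from_bmes` are hypothetical. It converts the example above into per-character tags and back.

```python
from __future__ import annotations


def to_bmes(words: list[str]) -> list[tuple[str, str]]:
    """Map a segmented sentence to (character, tag) pairs using BMES tags (assumed scheme)."""
    pairs: list[tuple[str, str]] = []
    for word in words:
        if len(word) == 1:
            pairs.append((word, "S"))  # single-character word
        else:
            pairs.append((word[0], "B"))  # first character of a multi-character word
            pairs.extend((ch, "M") for ch in word[1:-1])  # interior characters, if any
            pairs.append((word[-1], "E"))  # last character of the word
    return pairs


def from_bmes(chars: list[str], tags: list[str]) -> list[str]:
    """Rebuild words from characters and (possibly predicted) BMES tags."""
    words: list[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:  # a new word starts: close any open one
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):  # this character ends a word
            words.append(current)
            current = ""
    if current:  # flush a trailing word left open by an ill-formed tag sequence
        words.append(current)
    return words


sentence = ["上海", "浦东", "开发", "与", "建设", "同步"]
pairs = to_bmes(sentence)
print("".join(tag for _, tag in pairs))  # BEBEBESBEBE
chars = [ch for ch, _ in pairs]
tags = [tag for _, tag in pairs]
print(from_bmes(chars, tags) == sentence)  # True
```

Decoding closes an open word whenever a B or S tag appears, so even an ill-formed predicted tag sequence yields a segmentation that covers every character.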

Systems

♠ marks systems that use character unigrams as input; ♣ marks systems that use character bigrams as input.

  • Tian et al. (2020): ZEN + key-value memory networks ♠
  • Huang et al. (2019): BERT + model compression + multi-criteria learning ♠
  • Yang et al. (2018): Lattice LSTM-CRF + BPE subword embeddings ♠♣
  • Ma et al. (2018): BiLSTM-CRF + hyperparameter search ♠♣
  • Yang et al. (2017): Transition-based + beam search + rich pretraining ♠♣
  • Zhou et al. (2017): Greedy search + word context ♠
  • Chen et al. (2017): BiLSTM-CRF + adversarial loss ♠♣
  • Cai et al. (2017): Greedy search + span representation ♠
  • Kurita et al. (2017): Transition-based + joint model ♠
  • Liu et al. (2016): Neural semi-CRF ♠
  • Cai and Zhao (2016): Greedy search ♠
  • Chen et al. (2015a): Gated recursive NN ♠♣
  • Chen et al. (2015b): BiLSTM-CRF ♠♣
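The ♠ and ♣ markers distinguish what a model reads at each character position. Below is a minimal sketch of the two input views, assuming that the bigram at position i is the character joined with its right neighbor and that a hypothetical `</s>` symbol pads the last position. The listed papers differ in how they form bigrams (some also use the preceding character), so treat this as illustrative only.

```python
from __future__ import annotations

PAD = "</s>"  # hypothetical end-of-sentence padding symbol


def char_unigrams(text: str) -> list[str]:
    """Character unigrams: one feature per character (the ♠ input)."""
    return list(text)


def char_bigrams(text: str) -> list[str]:
    """Character bigrams: each character joined with its right neighbor (the ♣ input)."""
    bigrams = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else PAD  # pad past the last character
        bigrams.append(ch + nxt)
    return bigrams


text = "上海浦东"
print(char_unigrams(text))  # ['上', '海', '浦', '东']
print(char_bigrams(text))   # ['上海', '海浦', '浦东', '东</s>']
```

In a ♠♣ model, one common choice is to look up an embedding for each unigram and each bigram and concatenate the two vectors before passing them to the encoder.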

Evaluation

Metrics

F1-score
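In the usual formulation for this task, a predicted word counts as correct only when both of its boundaries match a gold-standard word. Precision is taken over predicted words, recall over gold words, and F1 is their harmonic mean; the tables below report it multiplied by 100. The following is a minimal sketch of that computation with hypothetical helper names. It is not the official SIGHAN scoring script.

```python
from __future__ import annotations


def word_spans(words: list[str]) -> set[tuple[int, int]]:
    """Convert a segmentation into a set of (start, end) character offsets."""
    spans: set[tuple[int, int]] = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_prf(gold: list[str], pred: list[str]) -> tuple[float, float, float]:
    """Word-level precision, recall and F1 of a predicted segmentation."""
    if "".join(gold) != "".join(pred):
        raise ValueError("gold and predicted segmentations must cover the same text")
    gold_spans, pred_spans = word_spans(gold), word_spans(pred)
    correct = len(gold_spans & pred_spans)  # words whose both boundaries match
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = ["上海", "浦东", "开发", "与", "建设", "同步"]
pred = ["上海浦东", "开发", "与", "建设", "同步"]  # merges 上海 and 浦东 into one word
p, r, f1 = segmentation_prf(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.800 R=0.667 F1=0.727
```

Here the merged token 上海浦东 is one precision error (4 of 5 predicted words are correct), and the missing 上海 and 浦东 are two recall errors (4 of 6 gold words are recovered). Both under- and over-segmentation therefore lower F1.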

Dataset

Chinese Treebank 6

| Model | F1 | Paper / Source | Code |
| --- | --- | --- | --- |
| Huang et al. (2019) | 97.6 | Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning | |
| Tian et al. (2020) | 97.3 | Improving Chinese Word Segmentation with Wordhood Memory Networks | Github |
| Ma et al. (2018) | 96.7 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | |
| Yang et al. (2018) | 96.3 | Subword Encoding in Lattice LSTM for Chinese Word Segmentation | Github |
| Yang et al. (2017) | 96.2 | Neural Word Segmentation with Rich Pretraining | Github |
| Zhou et al. (2017) | 96.2 | Word-Context Character Embeddings for Chinese Word Segmentation | |
| Chen et al. (2017) | 96.2 | Adversarial Multi-Criteria Learning for Chinese Word Segmentation | Github |
| Chen et al. (2015b) | 96.0 | Long Short-Term Memory Neural Networks for Chinese Word Segmentation | Github |
| Liu et al. (2016) | 95.5 | Exploring Segment Representations for Neural Segmentation Models | Github |

Chinese Treebank 7

| Model | F1 | Paper / Source | Code |
| --- | --- | --- | --- |
| Ma et al. (2018) | 96.6 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | |
| Kurita et al. (2017) | 96.2 | Neural Joint Model for Transition-based Chinese Syntactic Analysis | |

AS

| Model | F1 | Paper / Source | Code |
| --- | --- | --- | --- |
| Tian et al. (2020) | 96.6 | Improving Chinese Word Segmentation with Wordhood Memory Networks | Github |
| Huang et al. (2019) | 96.6 | Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning | |
| Ma et al. (2018) | 96.2 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | |
| Yang et al. (2017) | 95.7 | Neural Word Segmentation with Rich Pretraining | Github |
| Cai et al. (2017) | 95.3 | Fast and Accurate Neural Word Segmentation for Chinese | Github |
| Chen et al. (2017) | 94.8 | Adversarial Multi-Criteria Learning for Chinese Word Segmentation | Github |

CityU

| Model | F1 | Paper / Source | Code |
| --- | --- | --- | --- |
| Tian et al. (2020) | 97.9 | Improving Chinese Word Segmentation with Wordhood Memory Networks | Github |
| Huang et al. (2019) | 97.6 | Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning | |
| Ma et al. (2018) | 97.2 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | |
| Yang et al. (2017) | 96.9 | Neural Word Segmentation with Rich Pretraining | Github |
| Cai et al. (2017) | 95.6 | Fast and Accurate Neural Word Segmentation for Chinese | Github |
| Chen et al. (2017) | 95.6 | Adversarial Multi-Criteria Learning for Chinese Word Segmentation | Github |

PKU

| Model | F1 | Paper / Source | Code |
| --- | --- | --- | --- |
| Huang et al. (2019) | 96.6 | Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning | |
| Tian et al. (2020) | 96.5 | Improving Chinese Word Segmentation with Wordhood Memory Networks | Github |
| Yang et al. (2017) | 96.3 | Neural Word Segmentation with Rich Pretraining | Github |
| Ma et al. (2018) | 96.1 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | |
| Yang et al. (2018) | 95.9 | Subword Encoding in Lattice LSTM for Chinese Word Segmentation | Github |
| Cai et al. (2017) | 95.8 | Fast and Accurate Neural Word Segmentation for Chinese | Github |
| Liu et al. (2016) | 95.7 | Exploring Segment Representations for Neural Segmentation Models | Github |
| Cai and Zhao (2016) | 95.7 | Neural Word Segmentation Learning for Chinese | Github |
| Chen et al. (2017) | 94.3 | Adversarial Multi-Criteria Learning for Chinese Word Segmentation | Github |

MSR

| Model | F1 | Paper / Source | Code |
| --- | --- | --- | --- |
| Tian et al. (2020) | 98.4 | Improving Chinese Word Segmentation with Wordhood Memory Networks | Github |
| Ma et al. (2018) | 98.1 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | |
| Huang et al. (2019) | 97.9 | Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning | |
| Yang et al. (2018) | 97.8 | Subword Encoding in Lattice LSTM for Chinese Word Segmentation | Github |
| Liu et al. (2016) | 97.6 | Exploring Segment Representations for Neural Segmentation Models | Github |
| Yang et al. (2017) | 97.5 | Neural Word Segmentation with Rich Pretraining | Github |
| Cai et al. (2017) | 97.1 | Fast and Accurate Neural Word Segmentation for Chinese | Github |
| Cai and Zhao (2016) | 96.4 | Neural Word Segmentation Learning for Chinese | Github |
| Chen et al. (2017) | 96.0 | Adversarial Multi-Criteria Learning for Chinese Word Segmentation | Github |
