[1]WANG Quanbin,TAN Ying.Chinese grammatical error correction method based on data augmentation and copy mechanism[J].CAAI Transactions on Intelligent Systems,2020,15(1):99-106.[doi:10.11992/tis.202001014]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume:
15
Issue:
2020 1
Page number:
99-106
Column:
Academic Papers: Natural Language Processing and Understanding
Publication date:
2020-01-05
- Title: Chinese grammatical error correction method based on data augmentation and copy mechanism
- Author(s): WANG Quanbin; TAN Ying (School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China)
- Keywords: self-attention mechanism; copy mechanism; sequence to sequence learning; Chinese; grammatical error correction; neural networks; text generation; fluency
- CLC: TP389.1
- DOI: 10.11992/tis.202001014
- Abstract:
Chinese is a widely used language. However, because it differs inherently from Indo-European languages, learners of Chinese tend to make a variety of grammatical errors. This article proposes an automatic grammatical error correction method for errors such as typos and improper word order. First, we build the C-Transformer model, which incorporates a copy mechanism into the self-attention model to translate an erroneous text sequence into the correct one. Second, starting from a public data set, a pure sequence-to-sequence method is used to generate erroneous text from correct text, and an error-text filter is designed based on fluency, semantic, and syntactic measurements. Finally, since Chinese words are pictographic, some error samples are constructed artificially from collected dictionaries of homographs and homophones to expand the training data. The experimental results show that our method corrects typos, improper word order, missing words, redundant words, and other errors well, and achieves state-of-the-art performance on the standard test set for Chinese grammatical error correction.
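The dictionary-based augmentation step mentioned in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `CONFUSION` dictionary, the `corrupt` and `make_pairs` helpers, and the `rate` parameter are all hypothetical names, and the three dictionary entries are toy examples standing in for the collected homograph and homophone dictionaries.

```python
import random

# Hypothetical confusion dictionary (an assumption for illustration):
# each character maps to characters that sound or look similar.
CONFUSION = {
    "在": ["再"],
    "的": ["得", "地"],
    "他": ["她", "它"],
}


def corrupt(sentence, confusion, rate=0.3, rng=None):
    """Return a copy of `sentence` with some characters replaced by confusable ones."""
    if rng is None:
        rng = random.Random()
    out = []
    for ch in sentence:
        candidates = confusion.get(ch)
        # Replace only characters that have an entry, with probability `rate`.
        if candidates and rng.random() < rate:
            out.append(rng.choice(candidates))
        else:
            out.append(ch)
    return "".join(out)


def make_pairs(sentences, confusion, rate=0.3, seed=0):
    """Build (erroneous, correct) training pairs, skipping unchanged sentences."""
    rng = random.Random(seed)
    pairs = []
    for correct in sentences:
        noisy = corrupt(correct, confusion, rate, rng)
        if noisy != correct:
            pairs.append((noisy, correct))
    return pairs
```

Calling `make_pairs(["他在家"], CONFUSION, rate=1.0)` would yield one pair whose first element has every confusable character replaced and whose second element is the original sentence. In this sketch, pairs built this way would simply be added to the training data alongside the real learner-error pairs.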