[1]ZHANG Lin,LIU Mingtong,ZHANG Yujie,et al.Explore the low-resource iterative paraphrase generation enhancement method[J].CAAI Transactions on Intelligent Systems,2022,17(4):680-687.[doi:10.11992/tis.202106032]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 17
Issue: 2022(4)
Pages: 680-687
Column: Academic Papers - Machine Learning
Publication date: 2022-07-05
- Title:
Explore the low-resource iterative paraphrase generation enhancement method
- Author(s):
ZHANG Lin; LIU Mingtong; ZHANG Yujie; XU Jin’an; CHEN Yufeng
School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
- Keywords:
low-resource; iterative; paraphrase generation; data enhancement; screening algorithm; neural network model; encoder-decoder; attention mechanism
- CLC:
TP18
- DOI:
10.11992/tis.202106032
- Abstract:
Paraphrase generation aims to convert a given sentence into semantically consistent but differently worded sentences within the same language. At present, the success of deep neural network-based paraphrase generation models depends on large-scale paraphrase parallel corpora, and when such models face a new language or a new domain, their performance drops sharply. To address this dilemma, we propose a low-resource iterative paraphrase generation enhancement method, which makes maximal use of monolingual data and a small-scale paraphrase parallel corpus to train the paraphrase generation model iteratively, generating paraphrase pseudo data that in turn enhances model performance. Furthermore, we propose a pseudo-data screening algorithm based on fluency, semantic similarity, and expression diversity, which selects high-quality paraphrase pseudo data in each round of iterative training. Experimental results on the public Quora dataset show that, using only 30% of the paraphrase corpus, the proposed method exceeds the baseline model on semantic and diversity indicators, verifying its effectiveness.
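The abstract describes screening generated (source, paraphrase) pairs on three criteria: fluency, semantic similarity to the source, and diversity of expression. The sketch below illustrates that filtering idea only; the concrete metrics and thresholds (`word_jaccard`, `ngram_overlap`, `min_sim`, `max_copy`, `min_len`) are hypothetical stand-ins, not the paper's actual scorers, which would typically use a language model for fluency and sentence embeddings for similarity.

```python
def ngram_overlap(a, b, n=2):
    """Fraction of the n-grams of sentence `a` that also occur in `b`.
    High overlap means the paraphrase mostly copies the source wording."""
    grams = lambda s: {tuple(s.split()[i:i + n])
                       for i in range(len(s.split()) - n + 1)}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga) if ga else 0.0

def word_jaccard(a, b):
    """Toy semantic-similarity proxy: Jaccard index over word sets
    (a real system would use sentence embeddings instead)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def screen_pseudo_pairs(pairs, min_sim=0.3, max_copy=0.8, min_len=3):
    """Keep pseudo pairs that are (1) plausibly fluent (crude length
    check as a placeholder for an LM score), (2) semantically close to
    the source, and (3) not near-copies, so the wording stays diverse."""
    kept = []
    for src, para in pairs:
        fluent = len(para.split()) >= min_len   # toy fluency criterion
        sim = word_jaccard(src, para)           # semantic-similarity proxy
        copy = ngram_overlap(para, src)         # low value = diverse wording
        if fluent and sim >= min_sim and copy <= max_copy:
            kept.append((src, para))
    return kept
```

In each iteration, only the pairs that pass all three filters would be added back into the training data; a near-copy fails the diversity check, and an unrelated or fragmentary output fails the similarity or fluency check.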