[1]GULNAZ Alimjan,HURXIDA Jumahun,SUN Tieli,et al.The nearest neighbor text classification method based on support vector[J].CAAI Transactions on Intelligent Systems,2018,13(5):799-807.[doi:10.11992/tis.201711007]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 13
Issue: 2018, No. 5
Page number: 799-807
Column: Academic Papers: Natural Language Processing and Understanding
Publication date: 2018-09-05
- Title: The nearest neighbor text classification method based on support vector
- Author(s): GULNAZ Alimjan (1, 2, 3); HURXIDA Jumahun (1); SUN Tieli (2); LIANG Yi (1)
  1. Department of Electronics and Information Engineering, Yili Normal University, Yining 835000, China;
  2. School of Information Science and Technology, Northeast Normal University, Changchun 130117, China;
  3. Department of Geographical Science, Nor
- Keywords: stemming; preprocessing; support vector machines; text categorization; classification accuracy
- CLC: TP309
- DOI: 10.11992/tis.201711007
- Abstract:
Text categorization automatically assigns one or more predefined categories or topics to a document. In text classification, the document representation strongly influences the performance of the learning machine. To achieve Kazakh text classification, a stemming procedure based on Kazakh grammar rules is designed to complete the preprocessing of Kazakh text. A sample distance formula based on nearest support vectors from a support vector machine (SVM) is proposed, which avoids the selection of the parameter k. Kazakh texts are then classified with a combination of the SVM and k-nearest neighbor (KNN) classification algorithms (SV-NN). Text categorization simulation experiments were conducted on a Kazakh text corpus constructed by the authors. The numerical experiments demonstrated the effectiveness of the proposed algorithm and confirmed the theoretical results.
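The abstract does not give the distance formula itself, so the following is only a minimal sketch of one plausible reading of the idea: a sample is assigned to the class whose support vectors are closest on average, which removes the need to choose k. The function names `euclidean` and `sv_nn_predict`, the mean-distance rule, and the toy two-dimensional vectors are assumptions made for illustration and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sv_nn_predict(sample, support_vectors):
    """Assign `sample` to the class with the smallest mean distance
    to that class's support vectors.

    `support_vectors` maps a class label to a list of vectors. In this
    sketch they are supplied by hand; in practice they would come from
    an already trained SVM (assumption, not the paper's stated procedure).
    """
    best_label, best_dist = None, math.inf
    for label, vectors in support_vectors.items():
        mean_dist = sum(euclidean(sample, v) for v in vectors) / len(vectors)
        if mean_dist < best_dist:
            best_label, best_dist = label, mean_dist
    return best_label


# Toy stand-ins for per-class support vectors (hypothetical values).
support_vectors = {
    "sports": [[1.0, 0.0], [0.8, 0.2]],
    "politics": [[0.0, 1.0], [0.2, 0.9]],
}

print(sv_nn_predict([0.9, 0.1], support_vectors))
print(sv_nn_predict([0.1, 0.8], support_vectors))
```

If scikit-learn were used for the SVM step, a fitted `sklearn.svm.SVC` exposes its support vectors through the `support_vectors_` attribute and their training-set indices through `support_`, which could be grouped by class label to build the dictionary above.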