[1] HE Li, TAN Shuang, JIA Yan, et al. Hierarchical text classification with non-labeled web data[J]. CAAI Transactions on Intelligent Systems, 2014, 9(3): 330-335. [doi:10.3969/j.issn.1673-4785.201310014]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 9
Issue: 2014, No. 3
Pages: 330-335
Column: Academic Papers: Natural Language Processing and Understanding
Publication date: 2014-06-25
- Title: Hierarchical text classification with non-labeled web data
- Author(s): HE Li; TAN Shuang; JIA Yan; HAN Weihong
- Affiliation: School of Computer, National University of Defense Technology, Changsha 410073, China
- Keywords: hierarchical text classification; topic hierarchy; classification without labeled data; support vector machine
- CLC: TP181
- DOI: 10.3969/j.issn.1673-4785.201310014
- Abstract:
Traditional text classification methods require a labeled corpus to train classifiers, but labeling a corpus manually is costly and time-consuming. This paper proposes a hierarchical text classification method that trains text classifiers on web data without any classification labels. The method constructs web queries by combining classification knowledge with topic hierarchy information, retrieves relevant documents, and extracts training samples from several kinds of web data. It then derives a classification basis to supervise the learning process and trains the classifiers with a hierarchical support vector machine. Experimental results show that the method can train classifiers from unlabeled web data and achieves good classification results, with performance close to that of supervised classification methods trained on labeled samples.
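The pipeline described in the abstract (topic-path queries, retrieved documents used as pseudo-labeled samples, then hierarchical SVM training) can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the authors' implementation: the hierarchy `CHILDREN`, the seed terms `KEYWORDS`, and the `RETRIEVED` dictionary (a stand-in for real web search results) are hypothetical. The local classifier is a Pegasos-style linear SVM trained one-vs-rest over the children of each internal node, and documents are classified top-down. The paper's "classification basis" step is not modeled.

```python
from collections import Counter

# Hypothetical toy topic hierarchy (parent -> children) and seed keywords.
CHILDREN = {"root": ["sports", "tech"], "sports": ["football", "tennis"]}
KEYWORDS = {
    "sports": ["sports"],
    "tech": ["technology"],
    "football": ["football", "goal"],
    "tennis": ["tennis", "racket"],
}
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}


def build_query(node):
    """Prefix a node's keywords with those of its ancestors (its topic path)."""
    terms = []
    while node in KEYWORDS:
        terms = KEYWORDS[node] + terms
        node = PARENT[node]
    return " ".join(terms)


def subtree(node):
    """Return the node together with all of its descendants."""
    nodes = {node}
    for child in CHILDREN.get(node, []):
        nodes |= subtree(child)
    return nodes


def features(text):
    """Bag-of-words counts plus a constant bias feature."""
    x = Counter(text.lower().split())
    x["__bias__"] = 1
    return x


def score(w, x):
    return sum(w[f] * v for f, v in x.items())


def train_linear_svm(xs, ys, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on the primal hinge-loss SVM."""
    w = Counter()
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):  # y is +1 or -1
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * score(w, x) < 1
            for f in list(w):  # L2 regularization shrink step
                w[f] *= 1 - eta * lam
            if violated:  # hinge-loss subgradient step
                for f, v in x.items():
                    w[f] += eta * y * v
    return w


def train_hierarchy(docs):
    """Train one-vs-rest SVMs over the children of every internal node.

    docs holds (text, node) pairs; a document counts toward a child if its
    pseudo-label lies in that child's subtree.
    """
    models = {}
    for children in CHILDREN.values():
        local = [(features(text), child) for text, label in docs
                 for child in children if label in subtree(child)]
        xs = [x for x, _ in local]
        for child in children:
            ys = [1 if c == child else -1 for _, c in local]
            models[child] = train_linear_svm(xs, ys)
    return models


def classify(text, models):
    """Descend from the root, picking the best-scoring child at each level."""
    x = features(text)
    node = "root"
    while node in CHILDREN:
        node = max(CHILDREN[node], key=lambda c: score(models[c], x))
    return node


# Hypothetical stand-in for web search: documents assumed to have been
# returned by each node's query, so that node serves as the pseudo-label.
RETRIEVED = {
    "football": ["football match goal striker", "goal keeper football league"],
    "tennis": ["tennis racket serve", "tennis grand slam racket"],
    "tech": ["technology software chip", "new technology smartphone software"],
}
docs = [(text, node) for node, texts in RETRIEVED.items() for text in texts]
models = train_hierarchy(docs)
print(build_query("football"))  # sports football goal
print(classify("a late goal in the football league", models))  # football
```

Training one classifier per internal node keeps each local decision small: a document is compared only against its siblings at each level, and an error near the root cannot be corrected further down. That is a known trade-off of top-down hierarchical classification.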