[1] HU Feng, LI Luzheng, DAI Jin, et al. Active learning combined with clustering boundary sampling[J]. CAAI Transactions on Intelligent Systems, 2024, 19(2): 482-492. [doi: 10.11992/tis.202205020]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785 / CN 23-1538/TP]
Volume: 19
Issue: 2024, No. 2
Pages: 482-492
Column: Artificial Intelligence Deans Forum
Publication date: 2024-03-05
- Title: Active learning combined with clustering boundary sampling
- Author(s): HU Feng; LI Luzheng; DAI Jin; LIU Qun
- Affiliation: School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Keywords: active learning; machine learning; cluster boundary; density peak clustering; geometric sampling; entropy; version space; active clustering
- CLC: TP301
- DOI: 10.11992/tis.202205020
- Abstract:
Active learning is a machine learning method that selects the most valuable samples for labeling. In practice, active learning faces certain challenges: it relies on prior assumptions about the classifier, which can lead to unexpected declines in classifier performance, and it requires a certain number of labeled samples as an initial condition. Clustering, which can reduce the complexity of a problem, is an effective tool for active learning. This study focuses on active learning methods based on density-clustering boundary sampling. First, a method for sampling boundary points in density peak clustering is introduced; it calculates sample densities to locate the cluster boundary region, where classification errors are most likely to occur. Then, with a definition of density entropy, an active learning method based on cluster boundary sampling is proposed; density entropy guides a heuristic search of the cluster boundary regions. Experimental results show that, compared with five active learning algorithms from the literature, the proposed algorithm achieves equal or even higher classification performance with fewer labeled samples, demonstrating that it is an effective active learning algorithm. When the number of labeled samples is less than 20% of the total number of unlabeled samples, the algorithm achieves better results on the accuracy and F-score metrics.
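The core idea of cluster-boundary sampling can be illustrated with a minimal sketch: compute a Gaussian-kernel local density for each unlabeled point (as in density peak clustering) and query the lowest-density points, which tend to lie in boundary regions between clusters where misclassification is most likely. This is an illustrative simplification under assumed parameter names (`d_c` for the cutoff distance, `k` for the query budget), not the authors' exact algorithm, which additionally uses density entropy to guide the search.

```python
import numpy as np

def local_density(X, d_c):
    """Gaussian-kernel local density, as used in density peak clustering."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # sum of kernel weights over all other points (subtract the self-term)
    return np.exp(-(d / d_c) ** 2).sum(axis=1) - 1.0

def boundary_query(X, d_c, k):
    """Query the k lowest-density points as cluster-boundary candidates."""
    rho = local_density(X, d_c)
    return np.argsort(rho)[:k]

# toy data: two dense blobs plus one point lying between them
rng = np.random.default_rng(0)
A = rng.normal([0.0, 0.0], 0.2, size=(30, 2))
B = rng.normal([3.0, 0.0], 0.2, size=(30, 2))
mid = np.array([[1.5, 0.0]])       # boundary point, index 60
X = np.vstack([A, B, mid])

picked = boundary_query(X, d_c=0.5, k=1)
# the isolated point between the two blobs has the lowest density
```

In a full active learning loop, the queried indices would be sent to an oracle for labeling, the classifier retrained, and the densities of the remaining pool re-examined for the next round.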