[1] ZHOU Jingyu, WANG Shitong. Multi-source online transfer learning for imbalanced target domains[J]. CAAI Transactions on Intelligent Systems, 2022, 17(2): 248-256. [doi:10.11992/tis.202012019]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP]
Volume: 17
Issue: No. 2, 2022
Pages: 248-256
Column: Academic Papers, Machine Learning
Publication date: 2022-03-05
- Title:
-
Multi-source online transfer learning for imbalanced target domains
- Author(s):
-
ZHOU Jingyu, WANG Shitong
-
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
-
- Keywords:
-
multi-source transfer learning; online learning; target domain; imbalanced data; oversampling; k-nearest neighbor; input space; feature space
- CLC number:
-
TP181
- DOI:
-
10.11992/tis.202012019
- Abstract:
-
Multi-source online transfer learning is widely used in applications where related source domains contain large amounts of labeled data and the target-domain data arrive as a data stream. However, the class distribution of the target domain is sometimes imbalanced. For imbalanced binary classification problems in which several target-domain samples arrive online at a time, this paper proposes a multi-source online transfer learning algorithm that oversamples the target-domain samples. The algorithm searches the samples of previous batches for the k-nearest neighbors of the samples in the current batch, first generates a small number of majority-class samples, and then generates minority-class samples so that the class distribution of the current batch is balanced. In each batch, the synthetic and real samples are used together to train the target-domain function, which improves its classification performance. Oversampling methods are designed for both the input space and the feature space of the target domain, and comprehensive experiments on multiple real-world datasets demonstrate the effectiveness of the proposed algorithm.
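The batch-wise oversampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how synthetic samples are built, so the sketch assumes SMOTE-style linear interpolation between a sample and one of its k nearest same-class neighbors taken from earlier batches. The function names (`k_nearest`, `synthesize`, `oversample_batch`) and the parameter `n_major_extra` are hypothetical, labels are assumed to be 0 or 1, and only the input-space variant is shown.

```python
import math
import random


def k_nearest(x, pool, k):
    """Return the k points in pool closest to x (Euclidean distance)."""
    return sorted(pool, key=lambda p: math.dist(x, p))[:k]


def synthesize(seeds, pool, n_new, k, rng):
    """Create n_new points by interpolating seeds toward neighbors in pool.

    Assumption: SMOTE-style interpolation; the abstract does not specify it.
    Returns an empty list when there is nothing to interpolate from.
    """
    if not seeds or not pool or n_new <= 0:
        return []
    out = []
    for _ in range(n_new):
        x = rng.choice(seeds)
        neighbor = rng.choice(k_nearest(x, pool, k))
        t = rng.random()  # interpolation weight in [0, 1)
        out.append([a + t * (b - a) for a, b in zip(x, neighbor)])
    return out


def oversample_batch(batch, history, k=3, n_major_extra=1, seed=0):
    """Balance one batch of (features, label) pairs with labels in {0, 1}.

    history holds labeled samples from earlier batches; neighbors are
    searched there, within the same class (an assumption of this sketch).
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for x, y in batch:
        by_class[y].append(x)
    pool = {0: [], 1: []}
    for x, y in history:
        pool[y].append(x)

    # The class with fewer samples in the current batch is the minority.
    minor, major = sorted((0, 1), key=lambda c: len(by_class[c]))
    # Step 1: generate a small number of majority-class samples.
    new_major = synthesize(by_class[major], pool[major], n_major_extra, k, rng)
    # Step 2: generate minority-class samples to close the remaining gap.
    gap = len(by_class[major]) + len(new_major) - len(by_class[minor])
    new_minor = synthesize(by_class[minor], pool[minor], gap, k, rng)

    return (list(batch)
            + [(x, major) for x in new_major]
            + [(x, minor) for x in new_minor])


# Toy usage: three class-0 samples and one class-1 sample in the batch.
history = [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([1.0, 1.0], 1), ([1.2, 0.9], 1)]
batch = [([0.1, 0.1], 0), ([0.0, 0.2], 0), ([0.3, 0.0], 0), ([1.1, 1.0], 1)]
balanced = oversample_batch(batch, history, k=2, n_major_extra=1)
```

In this sketch the real samples come first in the returned list and the synthetic ones follow, so both can be passed together to whatever learner updates the target-domain function. If a class has no samples in the current batch or in the history, nothing is generated for it and the batch stays imbalanced.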