[1] QIAN Dong, WANG Bei, ZHANG Tao, et al. Classification algorithm based on Copula theory and Bayesian decision theory[J]. CAAI Transactions on Intelligent Systems, 2016, 11(1): 78-83. [doi:10.11992/tis.201509011]

Classification algorithm based on Copula theory and Bayesian decision theory

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
11
Issue:
2016, No. 1
Pages:
78-83
Column:
Publication date:
2016-02-25

Article Info

Title:
Classification algorithm based on Copula theory and Bayesian decision theory
Author(s):
QIAN Dong¹, WANG Bei¹, ZHANG Tao², WANG Xingyu¹
1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China;
2. Department of Automation, Tsinghua University, Beijing 100086, China
Keywords:
machine learning; Bayesian decision theory; Copula theory; kernel density estimation; physiological signals
CLC number:
TP391.4
DOI:
10.11992/tis.201509011
Abstract:
Traditional Bayesian decision classifiers are sensitive to how the class-conditional probability densities are estimated, and a poor estimate can lead to incorrect classifications. This paper therefore proposes an improved Bayesian decision classifier, the Bayesian-Copula Discriminant Classifier (BCDC). Rather than assuming a form for the class-conditional densities, the method constructs them by combining Copula theory with kernel density estimation. Kernel density estimation smooths the probability distribution of each feature, and the probability integral transform maps each feature's cumulative distribution to a uniform distribution. For each of the two classes, a Copula function then models the dependence structure among these marginal cumulative distributions. The Copula parameters are determined by maximum likelihood estimation, and the best-fitting Copula function for each class is selected with the Bayesian information criterion (BIC). The BCDC method was validated on experimental datasets of physiological signals. The results show that, compared with traditional probabilistic models, the proposed classifier achieves better classification accuracy and AUC and is more robust, which indicates that BCDC exploits the strengths of Copula theory and kernel density estimation to improve the accuracy and flexibility of the estimation.
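
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the paper: only two features per sample, a single parametric candidate (a bivariate Gaussian copula) compared against the independence copula, a rule-of-thumb kernel bandwidth, a grid search standing in for a proper maximum likelihood optimizer, and synthetic Gaussian data instead of physiological signals. All names (`KdeMarginal`, `ClassModel`, `classify`, `sample`, `demo`) are hypothetical.

```python
import math
import random
from statistics import NormalDist, stdev

PHI = NormalDist()  # standard normal: pdf, cdf, inv_cdf


class KdeMarginal:
    """Gaussian-kernel estimate of one feature's density and CDF."""

    def __init__(self, xs):
        self.xs = list(xs)
        # Rule-of-thumb bandwidth (an assumption, not taken from the paper).
        self.h = 1.06 * stdev(self.xs) * len(self.xs) ** -0.2

    def pdf(self, x):
        n = len(self.xs)
        return sum(PHI.pdf((x - xi) / self.h) for xi in self.xs) / (n * self.h)

    def cdf(self, x):
        u = sum(PHI.cdf((x - xi) / self.h) for xi in self.xs) / len(self.xs)
        return min(max(u, 1e-6), 1.0 - 1e-6)  # clip so inv_cdf stays finite


def copula_logpdf(a, b, rho):
    """Log density of a bivariate Gaussian copula at normal scores (a, b)."""
    q = 1.0 - rho * rho
    quad = rho * rho * (a * a + b * b) - 2.0 * rho * a * b
    return -0.5 * math.log(q) - quad / (2.0 * q)


class ClassModel:
    """KDE marginals plus a BIC-selected copula for one class (two features)."""

    def __init__(self, rows, prior):
        self.prior = prior
        self.marginals = [KdeMarginal(col) for col in zip(*rows)]
        scores = [self.scores(r) for r in rows]

        def loglik(rho):
            return sum(copula_logpdf(a, b, rho) for a, b in scores)

        # Maximum likelihood for rho, approximated by a grid search.
        best = max((i / 100 for i in range(-99, 100)), key=loglik)
        # BIC = k*ln(n) - 2*logL; the independence copula has k = 0, logL = 0.
        bic_gauss = math.log(len(rows)) - 2.0 * loglik(best)
        self.rho = best if bic_gauss < 0.0 else 0.0

    def scores(self, row):
        # Probability integral transform, then map the uniforms to normal scores.
        return tuple(PHI.inv_cdf(m.cdf(x)) for m, x in zip(self.marginals, row))

    def log_score(self, row):
        # Unnormalized log posterior: log prior + log marginals + log copula.
        log_marg = sum(math.log(max(m.pdf(x), 1e-300))
                       for m, x in zip(self.marginals, row))
        a, b = self.scores(row)
        return math.log(self.prior) + log_marg + copula_logpdf(a, b, self.rho)


def classify(models, row):
    """Bayes decision: pick the class with the largest log score."""
    return max(models, key=lambda label: models[label].log_score(row))


def sample(rng, n, mean, rho):
    """Synthetic correlated Gaussian pairs (illustrative data only)."""
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        out.append((mean + z1, mean + rho * z1 + math.sqrt(1 - rho * rho) * z2))
    return out


def demo(seed=0):
    """Train one model per class and return the models and test accuracy."""
    rng = random.Random(seed)
    train = {0: sample(rng, 150, 0.0, 0.8), 1: sample(rng, 150, 4.0, -0.5)}
    models = {k: ClassModel(v, prior=0.5) for k, v in train.items()}
    test = [(r, 0) for r in sample(rng, 100, 0.0, 0.8)]
    test += [(r, 1) for r in sample(rng, 100, 4.0, -0.5)]
    acc = sum(classify(models, r) == y for r, y in test) / len(test)
    return models, acc
```

Calling `demo()` returns the two fitted class models and the accuracy on the held-out synthetic points. Extending the candidate set (for example with Archimedean families) would only change the BIC comparison inside `ClassModel.__init__`; the marginal estimation and the decision rule would stay the same.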

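Memo:
Received: 2015-09-06; Revised: .
Funding: Science and Technology Innovation Action Plan of the Science and Technology Commission of Shanghai Municipality, industry-university-research-medical cooperation project in the biomedicine field (12DZ1940903).
About the authors:
QIAN Dong, male, born in 1990, master's student. His main research interests are machine learning and physiological signals.
WANG Bei, female, born in 1976, associate researcher. Her main research interests are intelligent information processing and pattern classification, and complex systems and their applications in artificial life science. She has participated in projects funded by the National Natural Science Foundation of China and the Shanghai Science and Technology Innovation Action Plan. She has published more than 50 academic papers, more than 30 of which are indexed by SCI or EI.
ZHANG Tao, male, born in 1969, professor and doctoral supervisor. His main research interests are control theory and applications, signal processing, and robot control. He has led or participated in multiple projects under the National 973 Program, the National 863 Program, and the National Natural Science Foundation of China. He has received the Natural Science Award of the Ministry of Education, the Military Science and Technology Progress Award, and the China Electronic Information Science and Technology Award. He has published more than 200 papers, of which more than 40 are indexed by SCI and more than 120 by EI.
Corresponding author: WANG Bei. E-mail: beiwang@ecust.edu.cn.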