[1] ZHANG Xing, CHEN Hao. A research review of high-dimensional data publishing based on a differential privacy model[J]. CAAI Transactions on Intelligent Systems, 2021, 16(6): 989-998. [doi:10.11992/tis.202104023]

A research review of high-dimensional data publishing based on a differential privacy model

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
16
Issue:
No. 6, 2021
Pages:
989-998
Column:
Review
Publication date:
2021-11-05

Article Info

Title:
A research review of high-dimensional data publishing based on a differential privacy model
Author(s):
ZHANG Xing (张兴), CHEN Hao (陈昊)
School of Electronics & Information Engineering, Liaoning University of Technology, Jinzhou 121001, China
Keywords:
big data publishing; privacy protection; data mining; high-dimensional data; feature dimension reduction; Bayesian network; rough set; random projection; differential privacy
CLC number:
TP309.2
DOI:
10.11992/tis.202104023
Abstract:
With the arrival of the big data era, the volume of information has grown explosively, and data dimensionality has grown geometrically along with it. How to fully mine the useful information in high-dimensional data while protecting user privacy has become both a research focus and a difficult problem in big data publishing. As a strong privacy protection model, differential privacy is increasingly applied to high-dimensional data publishing. This paper summarizes the application of differential privacy and related methods to high-dimensional data publishing. It focuses on the advantages and disadvantages of combining differential privacy with feature dimension reduction, feature extraction, Bayesian networks, tree models, and the recently proposed rough set and random projection methods. It then surveys and compares how each method is applied to high-dimensional data, and finally discusses future directions for differential privacy in high-dimensional data publishing.

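Memo

Received: 2021-04-12.
Funding: National Natural Science Foundation of China (61802161); Scientific Research Fund of the Education Department of Liaoning Province (JZL202015402, JZL202015404).
About the authors: ZHANG Xing, professor, whose main research interests are big data security and privacy protection, holds 10 granted or published invention patents and has published more than 60 academic papers. CHEN Hao, master's student, whose main research interests are big data security and privacy protection.
Corresponding author: ZHANG Xing. E-mail: 1123361380@qq.com
Last Update: 2021-12-25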