[1] YANG Zhengli, SHI Wen, CHEN Haixia, et al. The strategy of college enrollment predicted with big data[J]. CAAI Transactions on Intelligent Systems, 2019, 14(02): 323-329. [doi:10.11992/tis.201709011]

The strategy of college enrollment predicted with big data

CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume: 14
Issue: 2019, No. 02
Pages: 323-329
Column:
Publication date: 2019-03-05

Article Info

Title: The strategy of college enrollment predicted with big data
Author(s): YANG Zhengli (杨正理), SHI Wen (史文), CHEN Haixia (陈海霞), WANG Changpeng (王长鹏)
Affiliation: School of Mechanical and Electrical Engineering, Sanjiang University, Nanjing 210012, Jiangsu, China
Keywords: big data; machine learning; deep learning; learning algorithm; college enrollment; strategy prediction; random forest; cloud computing
CLC number: TP311
DOI: 10.11992/tis.201709011
Abstract:
The pool of graduating high school students keeps shrinking while colleges and universities continue to expand enrollment, admission channels keep diversifying, and competition among institutions for students grows fiercer. Under these conditions, an important problem for every institution is how to use massive, heterogeneous enrollment data to accurately target prospective students and carry out early recruitment publicity. Combining cloud computing technology, this paper analyzes historical college enrollment data with the parallel computing model MapReduce and the in-memory parallel computing framework Spark, and proposes a parallelized random forest model for predicting college enrollment strategy. The model shortens prediction time, improves prediction accuracy, and strengthens the capacity to process big data. Experimental results show that the parallelized random forest algorithm outperforms the commonly used decision tree prediction method in several respects across different datasets.
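The map-then-aggregate structure behind a parallelized random forest can be illustrated with a small sketch. This is not the authors' Spark/MapReduce implementation: it assumes a thread pool standing in for cluster workers, uses depth-1 trees (stumps) instead of full decision trees to stay short, and runs on a synthetic toy dataset. All names (`train_stump`, `train_forest`, `predict`) and the two features (an exam score and a count of campus visits) are hypothetical.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def train_stump(task):
    """Map step: fit one depth-1 tree on a bootstrap sample and a random feature subset."""
    rows, labels, seed, n_features = task
    rng = random.Random(seed)                      # per-tree seed keeps runs reproducible
    n = len(rows)
    idx = [rng.randrange(n) for _ in range(n)]     # bootstrap sample (with replacement)
    feats = rng.sample(range(len(rows[0])), n_features)
    best = None
    for f in feats:
        for t in sorted({rows[i][f] for i in idx}):
            left = [labels[i] for i in idx if rows[i][f] <= t]
            right = [labels[i] for i in idx if rows[i][f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:                               # no valid split: predict the majority class
        maj = Counter(labels[i] for i in idx).most_common(1)[0][0]
        return (0, float("inf"), maj, maj)
    return best[1:]                                # (feature, threshold, left label, right label)


def train_forest(rows, labels, n_trees=25, n_features=1, workers=4):
    """Train the trees independently; each task could run on a separate worker."""
    tasks = [(rows, labels, seed, n_features) for seed in range(n_trees)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(train_stump, tasks))


def predict(forest, row):
    """Reduce step: aggregate the per-tree predictions by majority vote."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]


# Synthetic toy data (not from the paper): (exam score, campus visits) -> 1 if enrolled.
rows = [(420, 0), (450, 1), (470, 0), (480, 1), (520, 2), (540, 3), (560, 2), (600, 4)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = train_forest(rows, labels)
print(predict(forest, (610, 5)), predict(forest, (400, 0)))
```

Because every tree depends only on its own seed and sample, training needs no coordination between workers, which is the property that frameworks such as Spark exploit when distributing the trees across a cluster.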

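Memo

Received: 2017-09-11.
Funding: General Program of Natural Science Research of Jiangsu Higher Education Institutions (17KJB470011).
Author biographies: YANG Zhengli, male, born in 1971, associate professor. His main research interests are complex systems and computational intelligence, and software engineering. He has participated in 2 provincial and ministerial-level projects and published more than 40 academic papers. SHI Wen, female, born in 1983, lecturer. Her main research interests are cloud computing and big data, and formal methods for computer software. She has participated in 2 provincial and ministerial-level projects and published more than 10 academic papers. CHEN Haixia, female, born in 1978, associate professor. Her main research interests are computational models for massive information processing, and automated reasoning. She has participated in 3 provincial and ministerial-level projects and published more than 20 academic papers.
Corresponding author: YANG Zhengli. E-mail: zhengli-yang@163.com
Last Update: 2019-04-25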