[1] YANG Zhengli, SHI Wen, CHEN Haixia, et al. The strategy of college enrollment predicted with big data[J]. CAAI Transactions on Intelligent Systems, 2019, 14(2): 323-329. [doi:10.11992/tis.201709011]
CAAI Transactions on Intelligent Systems [ISSN 1673-4785/CN 23-1538/TP] Volume:
14
Issue:
2019, No. 2
Pages:
323-329
Section:
Academic Papers: Machine Learning
Publication date:
2019-03-05
- Title:
-
The strategy of college enrollment predicted with big data
- Author(s):
-
YANG Zhengli (杨正理), SHI Wen (史文), CHEN Haixia (陈海霞), WANG Changpeng (王长鹏)
-
School of Mechanical and Electrical Engineering, SanJiang University (三江学院), Nanjing 210012, Jiangsu, China
- Keywords:
-
big data; machine learning; deep learning; learning algorithm; college enrollment; strategy prediction; random forest; cloud computing
- CLC number:
-
TP311
- DOI:
-
10.11992/tis.201709011
- Abstract:
-
With the pool of graduating high school students shrinking, college enrollment scales expanding, admission methods diversifying, and competition for students among institutions intensifying, colleges and universities need to use massive heterogeneous enrollment data to accurately target prospective students and carry out early-stage recruitment publicity. Combining cloud computing technology, the parallel computing model MapReduce and the in-memory parallel computing framework Spark are used to analyze historical college enrollment data, and a parallelized random forest model is proposed for predicting college enrollment strategy. The model shortens prediction time, improves prediction accuracy, and strengthens the capacity to process big data. Experimental results show that the parallelized random forest algorithm outperforms the commonly used decision tree prediction method in multiple respects across different datasets.
Last Update:
2019-04-25