[1] LIU Lu, JIA Caiyan. Web video clustering method based on an extended text model[J]. CAAI Transactions on Intelligent Systems, 2017, 12(06): 799-805. [doi:10.11992/tis.201706036]

Web video clustering method based on an extended text model
CAAI Transactions on Intelligent Systems [ISSN: 1673-4785 / CN: 23-1538/TP]

Volume:
12
Issue:
2017, No. 06
Pages:
799-805
Section:
Publication date:
2017-12-25

Article Info

Title:
Web video clustering method based on an extended text model
Author(s):
LIU Lu 1,2, JIA Caiyan 1,2
1. Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China;
2. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
Keywords:
web video clustering; co-click videos; relevant inquiry word; text clustering
CLC number:
TP391
DOI:
10.11992/tis.201706036
Abstract:
With the rise and rapid development of video-sharing websites, the number of videos on the Internet has grown explosively, and organizing and classifying them has become a prerequisite for using them effectively. Video clustering has gained popularity because it relies only on the intrinsic cluster structure of the video data and requires no manual intervention. Existing video clustering methods include those based on the visual similarity of key frames, those based on text clustering of video titles, and those based on multimodal fusion of text and visual features. Methods based on text clustering of titles are widely used in industry because they are simple and efficient, but they cluster poorly because video titles are short texts and therefore semantically sparse. To address this, this paper targets social media videos and proposes a video clustering method that fuses video-related text from multiple sources on social media platforms to overcome the semantic sparsity of short titles. Experimental results on different text clustering algorithms demonstrate the effectiveness of the multi-source text fusion method.
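
The semantic sparsity problem described in the abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that each video carries a title plus two auxiliary text sources suggested by the keywords (titles of co-click videos and relevant query words), and that fusion is plain concatenation. The field names, the toy records, and the concatenation rule are hypothetical and are not taken from the paper, whose actual fusion model may differ.

```python
from collections import Counter
from math import sqrt


def bag_of_words(text):
    """Count lowercased whitespace tokens into a term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extended_text(video):
    """Concatenate a title with its auxiliary texts (hypothetical field names)."""
    parts = [video["title"]] + video["co_click_titles"] + video["related_queries"]
    return " ".join(parts)


# Toy records invented for illustration; not data from the paper.
video_a = {
    "title": "messi hat trick",
    "co_click_titles": ["barcelona goals highlights"],
    "related_queries": ["football match"],
}
video_b = {
    "title": "ronaldo free kick",
    "co_click_titles": ["real madrid goals highlights"],
    "related_queries": ["football match"],
}

# Similarity from titles alone: the two short titles share no words.
title_only = cosine(bag_of_words(video_a["title"]), bag_of_words(video_b["title"]))

# Similarity after extending each title with its auxiliary text.
extended = cosine(
    bag_of_words(extended_text(video_a)),
    bag_of_words(extended_text(video_b)),
)
```

In this toy case the two titles have no tokens in common, so any clustering algorithm working on titles alone sees them as unrelated, while the extended texts overlap on the shared context words. The extended vectors could then be passed to whichever text clustering algorithm is being evaluated.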

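Memo:
Received: 2017-06-09; Revised: .
Funding: National Natural Science Foundation of China (61473030).
About the authors: LIU Lu, female, born in 1994, master's student; her main research interests are data mining and text clustering. JIA Caiyan, female, born in 1976, professor, doctoral supervisor, Ph.D., member of the Rough Set and Soft Computing Technical Committee of the Chinese Association for Artificial Intelligence; her main research interests are data mining, social computing, and bioinformatics. She has published more than 50 academic papers.
Corresponding author: JIA Caiyan. E-mail: cyjia@bjtu.edu.cn.
Last Update: 2018-01-03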