
[1]谭营,朱元春.反垃圾电子邮件方法研究进展[J].智能系统学报,2010,5(3):189-201.
　TAN Ying,ZHU Yuan-chun.Advances in antispam techniques[J].CAAI Transactions on Intelligent Systems,2010,5(3):189-201.


Advances in antispam techniques (反垃圾电子邮件方法研究进展)


CAAI Transactions on Intelligent Systems (《智能系统学报》) [ISSN 1673-4785/CN 23-1538/TP] Volume: 5 Issue: 2010, No. 3 Pages: 189-201 Column: Review Publication date: 2010-06-25

Title:: Advances in antispam techniques

Article ID:: 1673-4785(2010)03-0189-13

作者:: 谭营^1,2,朱元春^1,2; 1.北京大学机器感知与智能教育部重点实验室,北京 100871；
2.北京大学信息科学技术学院，北京100871

Author(s):: TAN Ying^1，2, ZHU Yuan-chun^1,2; 1.Key Laboratory of Machine Perception (MOE), Peking University, Beijing 100871, China;
2.School of Electronics Engineering and Computer Science, Peking University， Beijing 100871, China

Keywords:: antispam; feature extraction; intelligent detection technique; performance evaluation

CLC number:: TP393

Document code:: A

Abstract:: As the threat that spam poses to the Internet grows increasingly severe, antispam research has become a hot topic. This paper reviews the history, current state, and latest advances of antispam research. First, three types of email feature extraction methods are introduced and analyzed: text-based, image-based, and behavior-based approaches. On this basis, current antispam techniques are discussed in detail: legal measures, simple methods, and intelligent approaches. Next, performance evaluation criteria and standard data sets for antispam systems are introduced. Finally, the current state of antispam research is analyzed and summarized, and directions for future research are pointed out, including improving email feature extraction techniques, refining relevant legislation, and introducing new intelligent antispam approaches.



Memo

Received: 2009-11-20.
Funding: National "863" Program of China (2007AA01Z453); National Natural Science Foundation of China (60673020, 60875080).
Corresponding author: TAN Ying. E-mail: ytan@pku.edu.cn.
About the authors:

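TAN Ying (谭营), male, born in 1964, is a professor and doctoral supervisor with a Ph.D., and an IEEE Senior Member. He is an associate editor of IJSIR, of IES Journal B, Intelligent Devices and Systems, and of the Journal of Computer Science and Systems Biology; an editorial board member of the International Journal of KES; an editor of special issues for Springer and several major international journals; General Chair of ICSI2010; and Program Committee Chair of ISNN2008. His main research interests include computational intelligence, swarm intelligence, intelligent information processing, computer security, data mining, and pattern recognition. He has led more than 30 research projects, including projects under the National "863" Program and the National Natural Science Foundation of China. He received a Second Prize of the 2009 National Natural Science Award and has published more than 200 academic papers.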

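ZHU Yuan-chun (朱元春), male, born in 1985, is a Ph.D. candidate. His main research interests include swarm intelligence, artificial immune systems, intelligent information processing algorithms, computer security, and pattern recognition.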

Last Update: 2010-07-14
