Rule-based filtering is one of the commonly used approaches to spam filtering. A rule-based filter induces explicit rules from the regularities found in training samples and uses those rules to classify spam. To compare rule-based filtering methods, this paper briefly analyzes three commonly used rule-based algorithms, Ripper, C4.5 decision trees, and Adaboost, and compares their spam filtering performance experimentally on the open-source data mining platform WEKA. The experimental results show that all three algorithms achieve precision and recall above 80%; of the three, Adaboost performs best, with both precision and recall above 90%.
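For readers unfamiliar with the two metrics used in the comparison, precision (查准率) and recall (查全率), the following is a minimal sketch of how they are computed for the spam class. It is not the paper's evaluation code (the paper uses WEKA); the function name `precision_recall`, the label strings, and the toy labels are assumptions made purely for illustration.

```python
def precision_recall(true_labels, predicted_labels, positive="spam"):
    """Return (precision, recall) for the class named by `positive`.

    precision = TP / (TP + FP): share of messages flagged as spam that are spam.
    recall    = TP / (TP + FN): share of actual spam that was flagged.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive and truth == positive:
            tp += 1  # spam correctly flagged
        elif pred == positive:
            fp += 1  # legitimate mail wrongly flagged
        elif truth == positive:
            fn += 1  # spam that slipped through
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels, not data from the paper.
truth = ["spam", "spam", "spam", "ham", "ham", "spam"]
guess = ["spam", "spam", "ham", "ham", "spam", "spam"]
p, r = precision_recall(truth, guess)
print(f"precision={p:.2f} recall={r:.2f}")
```

In spam filtering, low precision means legitimate mail is being discarded, while low recall means spam is reaching the inbox, which is why the two figures are reported separately.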