To address the difficulty that client-side spam filters have in obtaining enough training samples, this paper proposes a spam filtering method based on small-sample learning, which exploits easily obtained unlabeled samples to improve filtering performance. The method trains an initial Naïve Bayes classifier on a small set of labeled e-mail instances and uses it to label the unlabeled e-mails. A new classifier is then trained on all of the data, and the EM algorithm repeats these two steps until convergence. Experimental results show that, given 5 to 20 labeled training e-mails, the method effectively improves spam filtering performance.
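The train, label, retrain loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. It assumes a multinomial Naïve Bayes model with Laplace smoothing, whitespace-tokenized messages, and probabilistic (soft) labels for the unlabeled e-mails in the E-step. The function names `train_nb`, `posterior` and `em_spam_filter`, the toy messages, and the convergence tolerance are all hypothetical.

```python
import math


def train_nb(docs, weights, vocab):
    """M-step: fit multinomial Naive Bayes from documents with soft class weights.

    docs is a list of token lists; weights[i] is [p_ham, p_spam] for docs[i].
    Every token in docs is assumed to be in vocab.
    """
    class_mass = [1.0, 1.0]  # Laplace-smoothed class totals (ham, spam)
    word_mass = [dict.fromkeys(vocab, 1.0) for _ in range(2)]  # smoothed counts
    for tokens, w in zip(docs, weights):
        for c in (0, 1):
            class_mass[c] += w[c]
            for t in tokens:
                word_mass[c][t] += w[c]
    log_prior = [math.log(m / sum(class_mass)) for m in class_mass]
    log_like = []
    for c in (0, 1):
        denom = sum(word_mass[c].values())
        log_like.append({t: math.log(v / denom) for t, v in word_mass[c].items()})
    return log_prior, log_like


def posterior(tokens, model):
    """Return [P(ham | tokens), P(spam | tokens)]; unseen tokens are ignored."""
    log_prior, log_like = model
    scores = [
        log_prior[c] + sum(log_like[c][t] for t in tokens if t in log_like[c])
        for c in (0, 1)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    return [e / sum(exps) for e in exps]


def em_spam_filter(labeled, unlabeled, max_iter=50, tol=1e-4):
    """labeled: list of (tokens, label), label 1 = spam, 0 = ham.
    unlabeled: list of token lists. Returns the final model."""
    l_docs = [d for d, _ in labeled]
    l_weights = [[1.0 - y, float(y)] for _, y in labeled]
    vocab = {t for d in l_docs + unlabeled for t in d}
    # Initial classifier trained on the small labeled set only.
    model = train_nb(l_docs, l_weights, vocab)
    prev = None
    for _ in range(max_iter):
        # E-step: soft-label the unlabeled e-mails with the current model.
        u_weights = [posterior(d, model) for d in unlabeled]
        # M-step: retrain on labeled plus soft-labeled data.
        model = train_nb(l_docs + unlabeled, l_weights + u_weights, vocab)
        # Stop when the spam posteriors of the unlabeled e-mails stop changing.
        if prev is not None:
            delta = max((abs(a[1] - b[1]) for a, b in zip(u_weights, prev)), default=0.0)
            if delta < tol:
                break
        prev = u_weights
    return model


# Toy usage with made-up messages (two labeled, four unlabeled).
labeled = [("cheap pills buy now".split(), 1), ("meeting agenda for monday".split(), 0)]
unlabeled = [s.split() for s in (
    "buy cheap watches now",
    "cheap watches discount",
    "monday project meeting notes",
    "project agenda notes attached",
)]
model = em_spam_filter(labeled, unlabeled)
print(posterior("discount watches".split(), model))
```

In this toy run the words "discount" and "watches" never occur in the labeled messages, so a classifier trained on the labeled set alone has no evidence about them. They become informative only through the unlabeled messages, which is the effect the method relies on. A hard-label variant, which assigns each unlabeled e-mail to its most probable class, would fit the same loop by rounding the E-step output.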