With the growth of the mobile short message service (SMS), spam messages have increased explosively, inconveniencing users in their daily lives and causing losses for telecom operators. This paper builds an SMS social network from the communication links between SMS users, analyzes users' relationships and behavior patterns within that network, and extracts social network features that discriminate between classes. On this basis, it proposes a new offline spam SMS filtering model. To address the time and space efficiency bottleneck that arises when processing massive SMS data, the model introduces a linear feature statistics algorithm, LFSA, and combines it with Gaussian kernel density estimation and a Bayesian classifier to classify and filter messages. We tested and analyzed the model on 3 billion SMS records provided by a provincial telecom operator. The experimental results show that the proposed model meets the operator's performance requirements, and it has been deployed and put into use.
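The abstract names the components of the pipeline but not their details, so the following is only a minimal sketch of how such pieces could fit together, not the paper's implementation. It assumes a hypothetical feature (the number of distinct recipients per sender), gathers it in a single pass over the records as a stand-in for the linear-time idea behind LFSA, estimates class-conditional densities with a one-dimensional Gaussian kernel, and applies Bayes' rule. All function names, the bandwidth, the prior, and the toy data are assumptions made for illustration.

```python
import math
from collections import defaultdict


def count_features(records):
    """One pass over (sender, receiver) pairs.

    Returns {sender: (messages_sent, distinct_recipients)}.
    This is an illustrative stand-in, not the paper's LFSA.
    """
    sent = defaultdict(int)
    recipients = defaultdict(set)
    for sender, receiver in records:
        sent[sender] += 1
        recipients[sender].add(receiver)
    return {s: (sent[s], len(recipients[s])) for s in sent}


def gaussian_kde(samples, bandwidth):
    """Build a 1-D Gaussian kernel density estimator from samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def posterior_spam(x, spam_density, ham_density, prior_spam):
    """Bayes' rule for one feature value x: P(spam | x)."""
    ps = spam_density(x) * prior_spam
    ph = ham_density(x) * (1.0 - prior_spam)
    total = ps + ph
    # Fall back to the prior if both densities underflow to zero.
    return ps / total if total > 0 else prior_spam


# Toy data (hypothetical): distinct-recipient counts for labeled senders.
spam_pdf = gaussian_kde([50, 60, 70], bandwidth=5.0)
ham_pdf = gaussian_kde([1, 2, 3], bandwidth=5.0)

features = count_features([("a", "x"), ("a", "y"), ("a", "y"), ("b", "x")])
p_high = posterior_spam(55, spam_pdf, ham_pdf, prior_spam=0.1)
p_low = posterior_spam(2, spam_pdf, ham_pdf, prior_spam=0.1)
```

In this toy setup a sender with 55 distinct recipients receives a posterior spam probability close to 1, and a sender with 2 receives one close to 0. A real system would use several features and would choose the bandwidth and prior from labeled data.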