Microblog platforms are flooded with advertising posts, which seriously degrade public opinion analysis. Based on an analysis of the characteristics of advertising microblogs, this paper proposes a method for identifying them. First, on top of traditional text features, three kinds of features are introduced: the number of microblogs posted during inactive periods, the microblog repetition degree, and the feature word pair weight. These features are combined with a support vector machine (SVM) model to classify microblog texts and identify the publishers of advertising microblogs. Second, the differences between advertising publishers and ordinary users are analyzed, the "topic" features of advertising publishers are extracted, and microblog texts are filtered on a per-user basis to identify advertising microblogs. In the experiments, the method achieves a precision of 87.6%, a recall of 97.2%, and an F-measure of 91.6%, showing that it can identify advertising microblogs efficiently and accurately.
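As a rough illustration of two of the user-level features named above, the following sketch computes an inactive-period post count and a repetition degree for one user's posts. The abstract does not define either feature, so everything here is an assumption: the inactive window (01:00 to 05:59), the character-bigram Jaccard similarity, the 0.8 threshold, the function names, and the sample posts are all hypothetical and not taken from the paper.

```python
from datetime import datetime

# Assumption: the "inactive period" is 01:00-05:59; the paper may define it differently.
INACTIVE_HOURS = range(1, 6)


def char_bigrams(text):
    """Return the set of character bigrams (no word segmentation needed for Chinese)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def inactive_post_count(posts):
    """Count posts whose timestamp falls inside the assumed inactive window."""
    return sum(1 for ts, _ in posts if ts.hour in INACTIVE_HOURS)


def repetition_degree(posts, threshold=0.8):
    """Fraction of posts that nearly duplicate at least one other post by the same user."""
    grams = [char_bigrams(text) for _, text in posts]
    if len(grams) < 2:
        return 0.0
    repeated = 0
    for i, g in enumerate(grams):
        if any(jaccard(g, h) >= threshold for j, h in enumerate(grams) if j != i):
            repeated += 1
    return repeated / len(grams)


def user_features(posts):
    """Build a small feature vector for one user from (timestamp, text) pairs."""
    return [inactive_post_count(posts), repetition_degree(posts)]


# Made-up sample data for illustration only.
posts = [
    (datetime(2014, 3, 1, 2, 15), "限时特价 全场五折 点击链接购买"),
    (datetime(2014, 3, 1, 3, 40), "限时特价 全场五折 点击链接购买!"),
    (datetime(2014, 3, 1, 14, 5), "今天天气不错,出去走走"),
]
print(user_features(posts))
```

In a fuller pipeline, vectors like these would presumably be concatenated with the traditional text features and the feature word pair weights and passed to an SVM classifier (for example, scikit-learn's `SVC`); that step is omitted here because the abstract gives no details about the kernel or parameters.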