To address the lack of attention to the semantic features of text when clustering personal Weibo (microblog) posts, this paper proposes a personal Weibo clustering method that incorporates semantic features. The method takes full account of the semantic features of Weibo texts, so that posts related in meaning are clustered more accurately. Its main points are as follows. First, a random walk algorithm generates the semantic labels of each word together with their probabilities; the walk graph is built from the semantic relation graph of HowNet (知网). Second, a permutation algorithm computes the similarity between the individual senses of the words in two posts, yielding a set of senses. Finally, cosine similarity is used to compute the semantic relatedness of two posts, and posts whose relatedness exceeds a similarity threshold are clustered together. To improve the efficiency of the algorithm, the similarity computation is segmented and optimized. Experiments show that the clustering results obtained with semantic features achieve a markedly higher F-measure than clustering methods based on word co-occurrence and word2vec.
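The final step (cosine similarity followed by threshold-based grouping) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each post has already been reduced to a sparse vector of semantic labels and weights, and it assumes that groups are merged transitively (single-link), which the abstract does not specify. The names `cosine`, `threshold_clusters`, and the sample `posts` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {label: weight} dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_u * norm_v)


def threshold_clusters(vectors, threshold):
    """Group post indices whose cosine similarity exceeds `threshold`.

    Assumption: groups are merged transitively (single-link) with a
    union-find structure; the abstract only states that posts above the
    threshold are clustered together.
    """
    parent = list(range(len(vectors)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) > threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical semantic-label vectors for four posts.
posts = [
    {"sport": 0.9, "team": 0.4},
    {"sport": 0.8, "match": 0.5},
    {"food": 1.0},
    {"food": 0.7, "cook": 0.6},
]
clusters = threshold_clusters(posts, threshold=0.5)
```

With these sample vectors the two sport-related posts end up in one group and the two food-related posts in another. The pairwise loop is quadratic in the number of posts, which is the kind of cost the segmentation and optimization mentioned in the abstract would aim to reduce.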