[Purpose/Significance] Linking entities across domains has long been a topic in entity resolution research. This paper aims to identify accounts on different social media platforms (cross-social-media) that belong to the same user. [Method/Process] Building on traditional approximate string matching, the paper proposes combining attribute values with the links and text content available in social media. For two accounts on different platforms, it computes three matching functions (attribute similarity, neighborhood similarity, and keyword similarity) to improve the precision of deciding whether the two accounts belong to the same person. Data from Facebook and Twitter serve as the experimental dataset, and different combinations of the matching functions are tested. [Results/Conclusions] The results show that combining all three matching functions matches more accounts to the same user while keeping precision high, at 0.923. The successful application of the method to Facebook and Twitter offers a new path for research on entity linking on other social media platforms and in other domains.
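The abstract does not give formulas for the three matching functions, so the following is only a minimal Python sketch of how such a combination might look, not the paper's implementation. Everything specific here is an assumption made for illustration: the account layout (`attributes`, `neighbors`, `keywords`), the use of `difflib.SequenceMatcher` for approximate string matching, Jaccard overlap for the neighborhood and keyword scores, the equal weights, and the 0.6 threshold. It also assumes that neighbor identifiers have already been normalized so that they are comparable across platforms.

```python
from difflib import SequenceMatcher


def attribute_similarity(attrs_a: dict, attrs_b: dict) -> float:
    """Mean approximate-string similarity over attributes filled in on both accounts."""
    shared = [k for k in attrs_a if attrs_a[k] and attrs_b.get(k)]
    if not shared:
        return 0.0
    scores = [
        SequenceMatcher(None, attrs_a[k].lower(), attrs_b[k].lower()).ratio()
        for k in shared
    ]
    return sum(scores) / len(scores)


def jaccard(x: set, y: set) -> float:
    """Set overlap |x & y| / |x | y|; defined as 0.0 when both sets are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def match_score(acc_a: dict, acc_b: dict, weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted sum of the three similarities (the weights are hypothetical)."""
    sims = (
        attribute_similarity(acc_a["attributes"], acc_b["attributes"]),
        jaccard(acc_a["neighbors"], acc_b["neighbors"]),  # neighborhood similarity
        jaccard(acc_a["keywords"], acc_b["keywords"]),    # keyword similarity
    )
    return sum(w * s for w, s in zip(weights, sims))


def same_user(acc_a: dict, acc_b: dict, threshold: float = 0.6) -> bool:
    """Declare a match when the combined score reaches an assumed threshold."""
    return match_score(acc_a, acc_b) >= threshold


# Made-up example accounts, one per platform.
facebook_account = {
    "attributes": {"name": "Alice Smith", "location": "Boston"},
    "neighbors": {"bob", "carol", "dave"},
    "keywords": {"python", "hiking", "jazz"},
}
twitter_account = {
    "attributes": {"name": "alice smith", "location": "Boston"},
    "neighbors": {"bob", "carol", "erin"},
    "keywords": {"python", "jazz", "coffee"},
}
print(match_score(facebook_account, twitter_account))
print(same_user(facebook_account, twitter_account))
```

Setting one of the weights to zero is a simple way to try out the different combinations of matching functions that the experiments compare.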