[Objective] To accurately identify relationships between authors whose research content is similar but who use different keywords, and to address the lack of semantic association in traditional co-occurrence analysis, this paper proposes a method for measuring the similarity of authors' research interests based on a keyword semantic network. [Methods] The word2vec model is used to represent author keywords as word vectors, that is, as low-dimensional real-valued distributions at the semantic level. The semantic relatedness between keywords is calculated to construct a keyword semantic network, and JS distance is used to measure similarity over the resulting author research interest matrix. [Results] The method can compute the relatedness of both co-occurring and non-co-occurring word pairs, and it effectively uncovers potential collaborative relationships between authors. [Limitations] The size and accuracy of the training corpus need further improvement, and the proposed measure considers only the potential collaboration between two authors. [Conclusions] The results offer a useful reference for improving measures of author collaboration that rely on traditional co-occurrence analysis.
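The two measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the research interest matrix is built, so the sketch assumes each author is summarized as a vector of non-negative weights over a shared keyword list, and it takes "JS distance" to mean the square root of the base-2 Jensen-Shannon divergence. The function names and the toy numbers are hypothetical; in practice the keyword vectors would come from a trained word2vec model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length keyword vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence, in [0, 1]."""
    p, q = normalize(p), normalize(q)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding


# Hypothetical interest weights of three authors over four shared keywords.
author_a = [3, 1, 0, 0]
author_b = [2, 2, 0, 0]
author_c = [0, 0, 1, 3]

print(js_distance(author_a, author_b))  # small: overlapping interests
print(js_distance(author_a, author_c))  # 1.0: no shared keywords
```

A smaller distance indicates more similar interest profiles. Authors with no keywords in common reach the maximum of 1 under this assumption, which is why the paper first relates keywords semantically rather than comparing raw keyword counts.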