Keyword Extraction Based on tf/idf for Chinese News Document

Source: Wuhan University Journal of Natural Sciences
Keyword extraction is an important research topic in information retrieval. This paper gives a specification of keywords in Chinese news documents, based on an analysis of the linguistic characteristics of news documents, and then proposes a new keyword extraction method based on tf/idf with multiple strategies. The approach selects uni-, bi-, and tri-grams as candidate keywords and then defines features according to their morphological characteristics and context information. Moreover, the paper proposes several strategies to amend incomplete words produced by word segmentation and to find unknown potential keywords in news documents. Experimental results show that the proposed method significantly outperforms the baseline method. We also applied it to retrospective event detection, where experimental results show that the accuracy and efficiency of news retrospective event detection can be significantly improved.
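To make the candidate-selection step concrete, the following is a minimal sketch of scoring uni-, bi-, and tri-gram candidates with a plain tf·idf weight. It is not the paper's method: the multi-strategy features (morphological and context information) and the repair of incomplete segmented words are not reproduced here. The function names `ngram_candidates` and `tfidf_keywords` are hypothetical, and the input is assumed to be documents that have already been split into word tokens by some segmenter.

```python
import math
from collections import Counter


def ngram_candidates(tokens, max_n=3):
    """Return all uni-, bi-, and tri-grams (as tuples) from a token list."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf_keywords(docs, doc_index, top_k=5):
    """Rank the n-gram candidates of one document by a plain tf*idf score.

    docs is a list of token lists (assumed to be pre-segmented);
    doc_index selects the document whose keywords are wanted.
    """
    candidates_per_doc = [ngram_candidates(d) for d in docs]

    # Document frequency: number of documents containing each candidate.
    df = Counter()
    for cands in candidates_per_doc:
        df.update(set(cands))

    # Term frequency within the selected document, normalised by its
    # total number of candidates.
    tf = Counter(candidates_per_doc[doc_index])
    total = sum(tf.values())
    n_docs = len(docs)

    scores = {
        gram: (count / total) * math.log(n_docs / df[gram])
        for gram, count in tf.items()
    }
    # Highest score first; ties are broken by the n-gram itself so the
    # ordering is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

With this unsmoothed idf, a candidate that occurs in every document scores zero, so it can never outrank a candidate specific to the document. A real system would likely add smoothing and length-aware weighting for the longer n-grams, which is an assumption beyond what the abstract states.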