[Objective] This paper addresses the problems of search-term selection in sci-tech novelty search: the selection is highly subjective, labor-intensive, non-standardized, and time-consuming. [Background] To make search-term extraction automatic, intelligent, and standardized, the paper proposes using the relevant corpus retrieved in real time during the novelty search as the source of domain knowledge. It also discusses the relationship between the composition of this corpus and keyword-extraction performance. [Methods] Search terms for sci-tech novelty search are extracted through a progressive, iterative process that combines keyword extraction with domain-feature expansion. [Results] Compared with the search terms used in actual novelty-search cases, extracting 10 search terms after two iterations of the method achieves a recall of 80%. [Conclusions] Iteratively extracting search terms from the dynamic relevant corpus formed by the documents retrieved during the novelty search helps identify most of the required search terms quickly and accurately, improving both the efficiency and the effectiveness of retrieval.
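The abstract names neither the keyword extractor nor the exact expansion rule. The Python sketch below therefore only illustrates the general shape of a progressive, iterative extraction loop and of the recall figure reported in the results. TF-IDF scoring, the overlap-based expansion rule, and every function name (`tfidf_top_k`, `expand_corpus`, `iterative_extract`, `recall`) are hypothetical stand-ins, not the authors' method. Documents are assumed to be pre-tokenized lists of candidate terms, and "two iterations" is read, as an assumption, as two extraction rounds.

```python
import math
from collections import Counter

# Assumption: documents are pre-tokenized lists of candidate terms
# (Chinese word segmentation is out of scope for this sketch).


def tfidf_top_k(corpus, background, k):
    """Rank terms of the relevant corpus by TF (in the corpus) x IDF (in a
    background collection). TF-IDF is a stand-in scorer, not the paper's."""
    tf = Counter(term for doc in corpus for term in doc)
    df = Counter(term for doc in background for term in set(doc))
    n_bg = len(background)
    scores = {t: c * math.log((1 + n_bg) / (1 + df[t])) for t, c in tf.items()}
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def expand_corpus(corpus, pool, terms, min_overlap=2):
    """Hypothetical expansion step: add pool documents that share at least
    `min_overlap` terms with the current extraction, mimicking newly
    retrieved documents enlarging the dynamic corpus."""
    term_set = set(terms)
    added = [doc for doc in pool
             if doc not in corpus and len(term_set & set(doc)) >= min_overlap]
    return corpus + added


def iterative_extract(seed_corpus, pool, background, k=10, iterations=2,
                      min_overlap=2):
    """Alternate extraction and corpus expansion; `iterations` counts
    extraction rounds (an assumed reading of "two iterations")."""
    corpus = list(seed_corpus)
    terms = []
    for round_no in range(iterations):
        terms = tfidf_top_k(corpus, background, k)
        if round_no < iterations - 1:  # no expansion after the final round
            corpus = expand_corpus(corpus, pool, terms, min_overlap)
    return terms


def recall(extracted, reference):
    """Share of the reference (actually used) search terms that were extracted."""
    ref = set(reference)
    if not ref:
        return 0.0
    return len(ref & set(extracted)) / len(ref)
```

In this sketch, a term that occurs only in a document pulled in by the expansion step can rise to the top in the second round, which is the effect an iterative scheme relies on. Recall is the usual fraction of reference terms recovered: if 8 of the 10 search terms used in a real case appear among the extracted terms, `recall` returns 0.8.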