Text about the aerospace domain is accumulating rapidly, and the pace is increasing as human space exploration advances. Reading and organizing this text by hand is inefficient, so information extraction techniques that pull information from aerospace text automatically are valuable. Named entity recognition (NER) is the foundation of automatic information extraction, so an efficient method for aerospace named entity recognition has real practical significance. This paper addresses the specific problem of named entity recognition in the aerospace domain: by manually constructing a training dataset, designing features, and training a conditional random field (CRF) model, it establishes an efficient method for aerospace named entity recognition. Compared with a method based on string matching, the proposed method achieves higher precision and recall, which supports its effectiveness.
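To make the comparison concrete, the following is a minimal sketch of what a string-matching baseline and its entity-level evaluation could look like. It is not the paper's implementation: the gazetteer, the entity type labels (`ROCKET`, `SITE`, `PROBE`), the sample sentence, and the function names are all hypothetical, and the scores are computed as precision and recall over exact span matches, which is an assumption about how the reported figures were obtained.

```python
from typing import Dict, Set, Tuple

# (start, end_exclusive, entity_type); the type labels are hypothetical.
Span = Tuple[int, int, str]


def match_entities(text: str, gazetteer: Dict[str, str]) -> Set[Span]:
    """Greedy longest-match lookup of gazetteer names in the text."""
    spans: Set[Span] = set()
    # Try longer names first so a full name wins over its own prefix.
    names = sorted(gazetteer, key=len, reverse=True)
    i = 0
    while i < len(text):
        for name in names:
            if text.startswith(name, i):
                spans.add((i, i + len(name), gazetteer[name]))
                i += len(name)
                break
        else:
            # No gazetteer entry starts here; move on one character.
            i += 1
    return spans


def precision_recall(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Entity-level precision and recall over exact span-and-type matches."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example data, invented for illustration only.
text = "长征五号在文昌发射嫦娥五号"
gazetteer = {"长征五号": "ROCKET", "长征": "ROCKET", "文昌": "SITE"}
gold = {(0, 4, "ROCKET"), (5, 7, "SITE"), (9, 13, "PROBE")}

predicted = match_entities(text, gazetteer)
precision, recall = precision_recall(predicted, gold)
```

In this toy case the baseline finds only the names already in its dictionary, so the entity missing from the gazetteer lowers recall. A trained sequence model such as a CRF can in principle use contextual features to label names it has not seen before, which is one plausible reason for the improvement the abstract reports.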