This paper reviews the current state of research on web information extraction systems. Existing systems are compared in terms of the extraction techniques they use and their degree of automation, and on that basis are divided into three categories: manual (non-automated), semi-automated, and fully automated. The strengths and weaknesses of the three categories are then evaluated with respect to labeling method, the type and features of the extraction rules, learning algorithm, degree of user involvement, applicability, and output interface. Finally, directions for further research on web information extraction systems are outlined.