Building on an analysis of existing open-source information acquisition systems, we combine several open-source frameworks with the open-source crawler Crawler4j to design and develop a distributed, targeted resource acquisition system. The system collects network information accurately and in real time, meeting the timeliness and accuracy requirements of network monitoring systems. This paper describes the system's architectural design and functional implementation, and details the methods and technical approach used for precise acquisition.
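Crawler4j lets a crawler subclass override `shouldVisit` to decide which discovered URLs to follow. Targeted acquisition usually rests on that kind of scope filter. The Python sketch below illustrates the general idea by restricting a breadth-first crawl to a configured scope. It makes several hypothetical assumptions for illustration that do not come from the paper: the target host, the URL path patterns, the skipped file extensions, the depth limit, and the injected `get_links` callback. The distributed aspect of the system is not modeled.

```python
from collections import deque
from urllib.parse import urlparse
import re

# Hypothetical scope configuration (illustrative only, not from the paper).
TARGET_HOSTS = {"news.example.com"}
TARGET_PATH = re.compile(r"^/(politics|economy)/")
SKIP_EXT = re.compile(r"\.(css|js|png|jpe?g|gif|zip|pdf)$", re.IGNORECASE)


def should_visit(url: str) -> bool:
    """Return True only for http(s) URLs inside the target scope."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return False
    if parts.hostname not in TARGET_HOSTS:
        return False
    if SKIP_EXT.search(parts.path):
        return False
    # Accept the site root or pages under the targeted sections.
    return bool(TARGET_PATH.match(parts.path)) or parts.path in ("", "/")


def crawl(seeds, get_links, max_depth=2):
    """Breadth-first crawl that follows only in-scope links.

    get_links(url) returns the outgoing URLs of a page; it is injected
    so the sketch runs without network access.
    """
    seen = set(seeds)
    frontier = deque((u, 0) for u in seeds)
    visited = []
    while frontier:
        url, depth = frontier.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not expand pages at the depth limit
        for link in get_links(url):
            if link not in seen and should_visit(link):
                seen.add(link)
                frontier.append((link, depth + 1))
    return visited
```

For example, starting from `https://news.example.com/` on a small in-memory link graph, the crawl follows `/politics/...` and `/economy/...` pages. It skips off-site links, out-of-scope sections such as `/sports/...`, and static assets such as stylesheets. Keeping the filter as a pure function makes it easy to test on its own, separately from fetching.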