Traditional approaches to the automatic discovery of Web resources rely on the content of Web pages. This paper instead explores automatic discovery of Web resources from the perspective of hyperlink analysis. Hyperlink analysis originated in social network analysis and scientific citation analysis; it examines only the relationships between pages and ignores the properties of the pages themselves. Our experiments show that, using hyperlinks alone and starting from example pages supplied by the user, we can automatically discover websites related to the resources of a given subject area. The technique reduces unnecessary crawling by Web crawlers, improves collection efficiency, lightens the network load, and plays an important role in building subject-specific resource collections.
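To make the idea of discovery from links alone concrete, the following is a minimal sketch and not the method evaluated in this paper, whose algorithm the abstract does not specify. It assumes a co-citation heuristic borrowed from citation analysis: a site is a candidate if some page links both to it and to a user-supplied seed site. The function names `host` and `related_sites`, the in-memory `out_links` mapping, and all example URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def host(url):
    """Return the lowercase host name of a URL, or "" if it has none."""
    return (urlsplit(url).hostname or "").lower()


def related_sites(out_links, seeds, top_n=5):
    """Rank hosts by the number of pages that link to them and to a seed host.

    out_links: dict mapping a page URL to the list of URLs it links to
               (assumed to have been collected already; hypothetical input).
    seeds:     example page URLs supplied by the user.
    Page content is never inspected; only link structure is used.
    """
    seed_hosts = {host(u) for u in seeds}
    scores = Counter()
    for page, targets in out_links.items():
        target_hosts = {host(t) for t in targets}
        target_hosts.discard("")
        if not (target_hosts & seed_hosts):
            continue  # the page cites no seed, so it provides no evidence
        # Credit every other host cited alongside a seed, except the page's own.
        for h in target_hosts - seed_hosts - {host(page)}:
            scores[h] += 1
    # Sort by descending score, then by host name, for a deterministic order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


links = {
    "http://hub-a.example/list.html": [
        "http://seed.example/index.html",
        "http://physics-one.example/",
        "http://physics-two.example/",
    ],
    "http://hub-b.example/links.html": [
        "http://seed.example/papers.html",
        "http://physics-one.example/lab",
    ],
    "http://news.example/today.html": [
        "http://shop.example/",
        "http://physics-two.example/",
    ],
}
print(related_sites(links, ["http://seed.example/index.html"]))
```

In this sketch a crawler would visit only the highest-ranked hosts rather than following every link, which is one way the reduction in unnecessary crawling described above could arise.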