This paper reviews the strengths and weaknesses of three classes of web page crawling algorithms: those based on content evaluation, those based on link-structure evaluation, and those based on reinforcement learning. It then describes a dictionary-based method for constructing a topic ontology, which helps speed up ontology construction. Finally, building on an analysis of the traditional crawling algorithms, it proposes a new ontology-based, topic-oriented web page crawling algorithm and demonstrates its superiority through experiments.