Existing text classification methods do not take the relationships between pages into account, which limits their classification efficiency. To address this, and inspired by the law of universal gravitation, this paper proposes a page classification method based on the law of universal gravitation and PageRank. The basic idea is to analyze the link relationships between pages and assign each page of unknown class to the class that exerts the greater influence on it. Building on this method, a page classification system based on the law of universal gravitation and PageRank is constructed. The system comprises modules for page preprocessing, page vector representation, page classification, and evaluation of classification results. Comparative experiments on real data sets demonstrate the effectiveness of the proposed method.
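The abstract does not give the exact formulation, so the following is only a minimal sketch of how such a gravity-style classifier might look, not the authors' implementation. It assumes that a page's PageRank score plays the role of mass, that the Euclidean distance between page vectors plays the role of distance, and that an unlabeled page goes to the class whose labeled link neighbours exert the largest total pull. The names `pagerank`, `distance`, `classify`, and the `eps` parameter are hypothetical and chosen for illustration.

```python
import math


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank
    return rank


def distance(u, v):
    """Euclidean distance between two sparse term-weight vectors (dicts)."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))


def classify(page, links, vectors, labels, eps=1e-9):
    """Assign `page` to the class whose labeled neighbours pull on it hardest.

    Assumption (not from the paper): force = m1 * m2 / d^2, with PageRank
    scores as masses and vector distance as d. `eps` avoids division by zero.
    """
    rank = pagerank(links)
    outgoing = set(links.get(page, []))
    incoming = {p for p, targets in links.items() if page in targets}
    pull = {}
    for other in sorted(outgoing | incoming):
        if other not in labels:
            continue  # only pages with a known class contribute
        d = distance(vectors[page], vectors[other])
        force = rank[page] * rank[other] / (d * d + eps)
        pull[labels[other]] = pull.get(labels[other], 0.0) + force
    return max(pull, key=pull.get) if pull else None
```

Under these assumptions, a page that links to two closely related, well-linked sports pages and is linked from one distant finance page would be assigned to the sports class. A page with no labeled neighbours yields `None`, which leaves the fallback policy open.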