With the rapid growth of online information resources, searching for and classifying topic-specific Web text has become an important problem in information processing. This thesis builds a Web text search and classification system for the chemical industry. Starting from the Web documents gathered by a crawler subsystem, the system applies a support vector machine (SVM) in a second-pass classification to pick out Chinese-language Web pages specific to the chemical industry, and then uses the vector space model to sort these domain pages into multiple subcategories. Compared with general-purpose search engines, the system is faster, returns more accurate results, and is capable of learning.
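The abstract does not specify the SVM kernel or solver, the term weighting, or how subcategories are represented. The Python below is only a minimal sketch of the two-stage idea under stated assumptions. It uses a linear SVM without an intercept, trained by Pegasos-style subgradient descent; this is a swapped-in solver, not necessarily the one the thesis used. It also uses TF-IDF vectors for the vector space model and one seed text per subcategory. The toy texts, subcategory names, and helpers (`tokenize`, `train_linear_svm`, `classify_page`, etc.) are hypothetical, and the tokenizer assumes pages are already word-segmented.

```python
import math
from collections import Counter

# Hypothetical tokenizer: assumes pages are already word-segmented
# (whitespace-separated). Real Chinese pages would need a segmenter first.
def tokenize(text):
    return text.lower().split()

def build_idf(docs):
    """Return a smoothed IDF function over a list of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return lambda term: math.log((1 + n) / (1 + df[term])) + 1.0

def tfidf(tokens, idf):
    """Vector space model: unit-length TF-IDF vector as a sparse dict."""
    vec = {t: c * idf(t) for t, c in Counter(tokens).items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec

def dot(a, b):
    """Sparse dot product (equals cosine similarity for unit vectors)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def train_linear_svm(samples, lam=0.01, epochs=20):
    """Linear SVM (no intercept) trained by Pegasos-style subgradient
    descent on the L2-regularised hinge loss; labels are +1 / -1."""
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in samples:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1  # hinge loss is active
            w = {k: (1 - eta * lam) * v for k, v in w.items()}  # L2 shrink
            if violated:  # step toward the misclassified/low-margin sample
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w

# Hypothetical toy data: stage 1 separates chemical-industry pages from
# other pages; stage 2 assigns a subcategory. All texts are invented.
chem = ["polymer reactor catalyst synthesis",
        "catalyst petrochemical refinery process",
        "polymer resin coating chemical",
        "reactor distillation chemical process"]
other = ["football match score team",
         "movie actor film review",
         "team league football season",
         "film music concert review"]
subcats = {"petrochemical": "refinery petrochemical crude distillation",
           "polymer": "polymer resin plastic coating",
           "catalysis": "catalyst reaction reactor synthesis"}

idf = build_idf([tokenize(d) for d in chem + other + list(subcats.values())])
samples = ([(tfidf(tokenize(d), idf), +1) for d in chem]
           + [(tfidf(tokenize(d), idf), -1) for d in other])
w = train_linear_svm(samples)
centroids = {name: tfidf(tokenize(text), idf) for name, text in subcats.items()}

def classify_page(text):
    """Return a subcategory name, or None if the page is off-topic."""
    x = tfidf(tokenize(text), idf)
    if dot(w, x) <= 0:  # stage 1: the SVM rejects non-chemical pages
        return None
    # stage 2: VSM, nearest subcategory by cosine similarity
    return max(centroids, key=lambda name: dot(x, centroids[name]))

print(classify_page("polymer resin coating plant"))  # polymer
print(classify_page("football team season review"))  # None
```

In this sketch, stage 1 acts as a filter, so the cheaper vector-space step only ever sees on-topic pages. Because the SVM has no intercept, a page containing no known terms scores exactly 0 and is rejected rather than forced into a subcategory.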