Analyzing information gathered from the Internet has become an important basis for decision-making. Efficiently retrieving the right target information from massive volumes of data is both a current priority and a difficult problem. Results returned by general-purpose search engines are only weakly related to a given topic and therefore cannot meet the needs of specific users. Building on an improved SVM parameter optimization algorithm, this paper proposes a classification method that combines a keyword filtering algorithm with a support vector machine (SVM) algorithm suited to large-scale data classification, and uses the resulting classification algorithm for financial-management-related topic information to build a financial management topic crawler system. Experimental results show that the topic crawler, based on keywords and the improved SVM, collects target information effectively and is well suited to related fields such as financial management public opinion management and financial management crisis management.
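The two-stage relevance check described above (a keyword filter followed by an SVM classifier) can be illustrated with a minimal sketch. The abstract does not give the paper's keyword list, feature representation, kernel, or improved parameter optimization procedure, so everything below is an assumption made for illustration: the `KEYWORDS` lexicon and `TRAIN` pages are hypothetical toy data, the features are plain word counts, and the SVM is a linear one trained with a Pegasos-style stochastic sub-gradient method using a fixed regularization value `lam` in place of any parameter search.

```python
import random

# Hypothetical keyword lexicon and toy training pages (not from the paper).
# Label +1 = on topic (financial management), -1 = off topic.
KEYWORDS = {"audit", "budget", "cash", "debt", "asset"}
TRAIN = [
    ("quarterly audit of corporate cash flow", 1),
    ("audit report on debt and budget", 1),
    ("budget audit finds asset shortfall", 1),
    ("win a cash prize in the lottery", -1),
    ("lottery winner takes cash jackpot", -1),
    ("lottery ticket sales and football scores", -1),
]

def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real pages need more)."""
    return text.lower().split()

def keyword_filter(text, keywords=KEYWORDS, min_hits=1):
    """Stage 1: cheap pre-filter that keeps pages with enough keyword hits."""
    hits = sum(1 for tok in tokenize(text) if tok in keywords)
    return hits >= min_hits

def build_vocab(texts):
    """Map every training token to a feature index."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def featurize(text, vocab):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style SGD on the L2-regularized hinge loss (no bias term)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def svm_score(text, w, vocab):
    """Signed decision value; positive means 'on topic'."""
    return sum(wj * xj for wj, xj in zip(w, featurize(text, vocab)))

def is_relevant(text, w, vocab):
    """Stage 1 then stage 2: a page must pass both to be kept."""
    return keyword_filter(text) and svm_score(text, w, vocab) > 0.0

def fit(train=TRAIN):
    """Build the vocabulary and train the SVM on labeled pages."""
    vocab = build_vocab(text for text, _ in train)
    X = [featurize(text, vocab) for text, _ in train]
    y = [label for _, label in train]
    return train_linear_svm(X, y), vocab
```

In a crawler loop, one would call `w, vocab = fit()` once and then `is_relevant(page_text, w, vocab)` for each fetched page, following links only from pages that pass. Running the keyword filter first avoids computing SVM features for pages that mention no topic keyword at all, while the SVM handles pages that contain an ambiguous keyword such as "cash" in an unrelated context.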