[Objective] Analyses of research project portfolios are often limited to simple counts of projects within a single funding agency. This study addresses that shortcoming and aims to improve, at the level of project content, the ability to reveal differences in how funding agencies allocate their funding. [Methods] For multi-source project data, we propose an analysis method based on text clustering with the K-means++ algorithm. The method uses the content of research projects to reveal the funding directions and priority areas of different funding agencies, and to compare how their funding differs across research directions. [Results] The method was validated, and a case study carried out, using information on projects funded by the US NSF and the EU FP. A text feature space built from single keywords gave better clustering results than one built from multiple keywords. Removing distracting information such as project background and future impact from proposal abstracts, and keeping only the substantive descriptions of research content and research methods, further improved the clustering performance of K-means++. [Limitations] Data cleaning cannot yet be fully automated, and presetting and tuning the clustering parameters still requires manual intervention. [Conclusions] The experiments and the case study show that the method is feasible. The results reflect differences in the funding agencies' portfolios fairly intuitively and can help research managers and decision-makers review the overall research landscape and anticipate directions of science and technology development.
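The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's feature weighting, tokenization, or parameters, so everything below is an assumption made for illustration: whitespace tokenization into single-word terms, TF-IDF weighting, a plain K-means++ seeding followed by Lloyd iterations, and the hypothetical helper names `tfidf_vectors`, `kmeans_pp`, and `funder_profile`. The project texts are invented toy data, not NSF or FP records.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """L2-normalised TF-IDF vectors over single-word terms (assumed weighting)."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenised for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenised for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp(vectors, k, seed=0, max_iter=50):
    """K-means with K-means++ seeding; returns one cluster label per vector."""
    rng = random.Random(seed)
    # Seeding: first centre uniformly at random, later centres with
    # probability proportional to squared distance from the nearest centre.
    centres = [rng.choice(vectors)]
    while len(centres) < k:
        d2 = [min(sq_dist(v, c) for c in centres) for v in vectors]
        if sum(d2) == 0:  # every point already coincides with a centre
            centres.append(rng.choice(vectors))
        else:
            centres.append(rng.choices(vectors, weights=d2, k=1)[0])
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centres[j])) for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def funder_profile(labels, funders):
    """Share of each funder's projects that falls in each cluster."""
    counts = Counter(zip(funders, labels))
    totals = Counter(funders)
    profile = {}
    for (funder, cluster), n in counts.items():
        profile.setdefault(funder, {})[cluster] = n / totals[funder]
    return profile


# Invented toy project descriptions, with background and impact text removed.
projects = [
    ("NSF", "graph neural network training for molecule property prediction"),
    ("NSF", "neural network compression for edge device inference"),
    ("FP", "offshore wind turbine blade fatigue monitoring"),
    ("FP", "wind farm grid integration and turbine control"),
    ("FP", "neural network surrogate models for turbine design"),
]
funders = [f for f, _ in projects]
labels = kmeans_pp(tfidf_vectors([text for _, text in projects]), k=2)
for funder, shares in sorted(funder_profile(labels, funders).items()):
    print(funder, {c: round(s, 2) for c, s in sorted(shares.items())})
```

Comparing the per-cluster shares of each funder is one simple way to read off differences in funding emphasis. On real data, the number of clusters and the text cleaning would need the manual tuning the abstract lists among its limitations.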