News Keyword Extraction Algorithm Based on Semantic Clustering and Word Graph Model

Source: Tsinghua Science and Technology
The internet delivers an abundance of news every day. Efficient algorithms for extracting keywords from text are therefore important for obtaining information quickly. However, the precision and recall of mature keyword extraction algorithms still need improvement. TextRank, which is derived from the PageRank algorithm, uses word graphs to spread the weight of words, but its keyword weight propagation focuses only on word frequency. To improve the performance of the algorithm, we propose Semantic Clustering TextRank (SCTR), a semantic clustering news keyword extraction algorithm based on TextRank. First, the word vectors generated by the Bidirectional Encoder Representations from Transformers (BERT) model are clustered with k-means to represent semantic clustering. Then, the clustering results are used to construct a TextRank weight transfer probability matrix. Finally, the word graph is computed iteratively and the keywords are extracted. The test target of this experiment is a Chinese news library. The results of the experiment conducted on this text set show that the SCTR algorithm achieves higher precision, recall, and F1 value than the traditional TextRank and Term Frequency-Inverse Document Frequency (TF-IDF) algorithms.
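The three steps described above (cluster word vectors, build a cluster-aware transfer matrix, iterate over the word graph) can be illustrated with a minimal sketch. It rests on several assumptions: the abstract does not say how the clustering results enter the transfer probability matrix, so the `same_cluster_boost` factor below is a hypothetical choice; the two-dimensional toy vectors stand in for BERT embeddings; and all function names are invented for illustration. This is not the authors' implementation.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # copy of k rows
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # leave an empty cluster's center as is
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def cooccurrence_graph(tokens, vocab, window=2):
    """Symmetric co-occurrence counts within a sliding window."""
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(vocab)))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            u, v = index[w], index[tokens[j]]
            if u != v:
                A[u, v] += 1
                A[v, u] += 1
    return A

def cluster_biased_textrank(A, labels, same_cluster_boost=2.0, d=0.85, iters=100):
    """TextRank iteration on a transfer matrix biased by cluster membership.

    Assumption: edges between words in the same cluster are multiplied by
    same_cluster_boost. The source abstract does not specify this rule.
    """
    same = labels[:, None] == labels[None, :]
    W = A * np.where(same, same_cluster_boost, 1.0)
    out = W.sum(axis=1, keepdims=True)
    out[out == 0] = 1.0           # avoid division by zero for isolated words
    P = W / out                   # P[i, j]: share of word i's weight sent to j
    n = len(A)
    score = np.full(n, 1.0 / n)
    for _ in range(iters):
        score = (1 - d) / n + d * P.T @ score
    return score

# Toy data: hypothetical 2-D "embeddings" in place of BERT word vectors.
vocab = ["market", "stocks", "bank", "rain", "storm"]
vectors = np.array([[1.0, 0.1], [0.9, 0.0], [1.0, 0.0], [0.0, 1.0], [0.1, 0.9]])
tokens = ["market", "stocks", "bank", "market", "stocks",
          "rain", "storm", "market", "bank"]

labels = kmeans(vectors, k=2)
A = cooccurrence_graph(tokens, vocab)
scores = cluster_biased_textrank(A, labels)
ranking = sorted(zip(vocab, scores), key=lambda pair: -pair[1])
```

Taking the first few entries of `ranking` yields the candidate keywords. Setting `same_cluster_boost=1.0` removes the cluster bias and reduces the sketch to weighted TextRank on the co-occurrence graph, which makes it easy to compare the two rankings on the same text.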