A Text Similarity Measurement Based on Semantic Fingerprint of Characteristic Phrases

Source: Chinese Journal of Electronics (English edition of 电子学报) | Citations: 0 | Uploaded by: countrygary
Text similarity measurement is the basis for assessing the degree of matching between two or more texts. Traditional large-scale similarity detection methods based on digital fingerprints offer high detection speed, but they are suited only to exact-match detection. We propose a Chinese text similarity measurement method based on the semantics of feature phrases. Natural language processing (NLP) techniques are used to preprocess the text; keywords are extracted with the term frequency-inverse document frequency (TF-IDF) model and further screened to obtain the feature words. The HowNet semantic dictionary is then used to determine the precise sense of each word and the semantic similarity between words. By substituting concepts for the feature words, we obtain feature phrases, from which we generate a semantic fingerprint and calculate similarity. Experimental results indicate that the proposed method outperforms the traditional digital fingerprinting method in similarity detection in terms of accuracy rate, recall rate, and F-value.
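The abstract outlines a pipeline (TF-IDF keyword extraction, feature-word screening, HowNet-based concept substitution, semantic fingerprinting, similarity calculation) without giving formulas. The sketch here is a minimal illustration under several assumptions that the abstract does not state: the input is already word-segmented, the smoothed IDF variant `log((1+N)/(1+df)) + 1` is used, feature words are simply the top-k TF-IDF terms, the hypothetical `CONCEPTS` dictionary stands in for HowNet (no word-sense disambiguation is attempted), and the fingerprint is a 64-bit weighted SimHash compared by normalized Hamming distance. SimHash is swapped in as a familiar fingerprint scheme; it is not necessarily the construction the authors use.

```python
import hashlib
import math
from collections import Counter

BITS = 64  # fingerprint width (assumed)

# Hypothetical stand-in for the HowNet dictionary: maps surface words to a
# shared concept label. Real HowNet lookups and word-sense disambiguation
# are not modeled here.
CONCEPTS = {
    "购买": "buy",
    "买": "buy",
    "电脑": "computer",
    "计算机": "computer",
}


def tfidf_keywords(doc, corpus, k=5):
    """Return the k highest-weighted TF-IDF terms of `doc` as {term: weight}."""
    n_docs = len(corpus)
    doc_sets = [set(d) for d in corpus]
    scores = {}
    for term, count in Counter(doc).items():
        df = sum(1 for s in doc_sets if term in s)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF (assumed variant)
        scores[term] = (count / len(doc)) * idf
    # Sort by descending weight, breaking ties by term for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:k])


def substitute_concepts(weights):
    """Map each feature word to its concept, summing the weights of synonyms."""
    merged = Counter()
    for word, w in weights.items():
        merged[CONCEPTS.get(word, word)] += w
    return merged


def simhash(weights):
    """Weighted SimHash fingerprint over concept features (assumed scheme)."""
    acc = [0.0] * BITS
    for feature, w in weights.items():
        # Stable 64-bit hash; Python's built-in hash() is salted per process.
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            acc[i] += w if (h >> i) & 1 else -w
    # Set bit i wherever the weighted vote for that bit is positive.
    return sum(1 << i for i in range(BITS) if acc[i] > 0)


def similarity(fp_a, fp_b):
    """1 minus the normalized Hamming distance between two fingerprints."""
    return 1 - bin(fp_a ^ fp_b).count("1") / BITS


# Toy, pre-segmented corpus (made-up example data).
corpus = [
    ["我", "购买", "电脑", "软件"],
    ["我", "买", "计算机", "软件"],
    ["今天", "天气", "很", "好"],
]
fps = [simhash(substitute_concepts(tfidf_keywords(d, corpus, k=4))) for d in corpus]
print(similarity(fps[0], fps[1]))  # 1.0: both sentences reduce to the same concepts
print(similarity(fps[0], fps[2]))
```

Because 购买/买 and 电脑/计算机 collapse onto the same concepts, the first two sentences receive identical fingerprints even though they share only two surface words, whereas a fingerprint built directly on the words would generally separate them. That is the intuition behind substituting concepts before fingerprinting.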
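The abstract reports detection quality as accuracy rate, recall rate, and F-value. A minimal sketch of these metrics over sets of document pairs is given here, assuming that "accuracy rate" denotes precision (the share of flagged pairs that are truly similar) and that "F-value" is the balanced F1 score unless `beta` is changed. The pair labels are made-up toy data, not results from the paper.

```python
def precision_recall_f(detected, relevant, beta=1.0):
    """Precision, recall and F-measure for detected similar-document pairs.

    detected: pairs the detector flagged as similar.
    relevant: pairs that are truly similar (ground truth).
    """
    detected, relevant = set(detected), set(relevant)
    hits = len(detected & relevant)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision == 0 and recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Toy data: 4 pairs flagged, 3 of them correct, 5 truly similar pairs overall.
detected = {("d1", "d2"), ("d1", "d3"), ("d2", "d4"), ("d3", "d5")}
relevant = {("d1", "d2"), ("d1", "d3"), ("d2", "d4"), ("d4", "d5"), ("d2", "d5")}
p, r, f = precision_recall_f(detected, relevant)
print(p, r, round(f, 4))  # 0.75 0.6 0.6667
```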