A Text Similarity Measurement Based on Semantic Fingerprint of Characteristic Phrases

Source: Chinese Journal of Electronics | Citations: 0 | Uploaded by: countrygary


【Authors】: PANG Shanchen, YAO Jiamin, LIU Ting, ZHAO Hua, CHEN Hongqi

【Affiliation】: College of Computer and Communication Engineering, China University of Petroleum, Qingdao 266580, China

【Published in】: Chinese Journal of Electronics

【Publication date】: 2020, Issue 2

【Keywords】: Term frequency-Inverse document frequency (TF-IDF) model; Semantic fingerprint; Sim


Excerpt from the paper

Text similarity measurements are the basis for measuring the degree of matching between two or more texts. Traditional large-scale similarity detection methods based on a digital fingerprint have the advantage of high detection speed, but they are suitable only for exact-match detection. We propose a method of Chinese text similarity measurement based on feature phrase semantics. Natural language processing (NLP) technology is used to pre-process the text, extract keywords with the Term frequency-Inverse document frequency (TF-IDF) model, and further screen out the feature words. We obtain the exact meaning of each word and the semantic similarities between words from the HowNet semantic dictionary. We then substitute concepts to obtain the feature phrases, generate a semantic fingerprint, and calculate similarity. The experimental results indicate that the proposed method is superior to the traditional and digital fingerprinting methods in similarity detection in terms of accuracy rate, recall rate, and F-value.
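The pipeline outlined in the abstract (TF-IDF keywords, concept substitution, fingerprint, similarity) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes pre-tokenized input, a smoothed IDF formula, a tiny hypothetical `concepts` dictionary standing in for a HowNet lookup, and a SimHash-style weighted bit vote as the fingerprint. All function names and parameters here are illustrative assumptions.

```python
import hashlib
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (smoothed IDF)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        tf = Counter(doc)
        total = max(len(doc), 1)  # avoid division by zero on empty documents
        result.append({
            t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return result


def top_keywords(weights, k):
    """Keep the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:k])


def substitute_concepts(weights, concepts):
    """Map each term to its concept and merge weights.

    `concepts` is a hypothetical stand-in for a semantic dictionary lookup.
    """
    merged = {}
    for term, w in weights.items():
        key = concepts.get(term, term)
        merged[key] = merged.get(key, 0.0) + w
    return merged


def fingerprint(weights, bits=64):
    """SimHash-style fingerprint: each term votes on every bit with its weight."""
    acc = [0.0] * bits
    for term, w in weights.items():
        digest = hashlib.md5(term.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            acc[i] += w if (h >> i) & 1 else -w
    return sum(1 << i for i in range(bits) if acc[i] > 0)


def similarity(fp_a, fp_b, bits=64):
    """1 minus the normalized Hamming distance between two fingerprints."""
    return 1.0 - bin(fp_a ^ fp_b).count("1") / bits


def semantic_fingerprints(docs, concepts, k=3):
    """Run the whole sketch: TF-IDF, keyword screening, substitution, hashing."""
    return [
        fingerprint(substitute_concepts(top_keywords(w, k), concepts))
        for w in tf_idf(docs)
    ]


# Toy data: "automobile" and "car" are treated as the same concept.
docs = [
    ["fast", "automobile", "engine"],
    ["fast", "car", "engine"],
    ["ripe", "banana", "fruit"],
]
concepts = {"automobile": "car"}
fps = semantic_fingerprints(docs, concepts)
print(similarity(fps[0], fps[1]), similarity(fps[0], fps[2]))
```

In this toy run the first two documents differ only by a synonym, so after concept substitution they yield the same fingerprint, whereas a plain hash of the raw text would treat them as different. A real system would need Chinese word segmentation and an actual semantic dictionary in place of the hand-written mapping.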

Other articles

The GAC Property of a Class of 1-Resilient Functions with High Nonlinearity

The absolute and sum-of-squares indicators are used to evaluate the Global avalanche characteristics (GAC) of Boolean functions in a global manner. The GAC prop

Journal

Boolean functions; Global avalanche characteristics; Resiliency; Nonlinearity; Strea

A Multi-queue Approach of Energy Efficient Task Scheduling for Sensor Hubs

With the emergence of the Internet of things (IoT), sensor hubs, which integrate data from different sensors, play an increasingly important role. Energy efficiency is one of

Journal

Internet of things (IoT); Sensor hub; Energy efficiency; Task scheduling; Queueing m

Review on the Technological Development and Application of UAV Systems

Unmanned aerial vehicles (UAVs) are products of the deep integration of aviation technology and Information technology (IT). The core factor of why the UAV industry

Journal

Unmanned aerial vehicle (UAV); Information technology (IT); Information network; Da

Verifying Diagnosability of Discrete Event System with Logical Formula

Diagnosability is an important property in the field of fault diagnosis. In this paper, a novel approach based on logical formula is proposed to verify diagnosabi

Journal

Fault diagnosis; Diagnosability; Discrete event system; Finite state machine; Conjun

Optimal Weighting Method for Reducing Digital Satellite TV Differential Timing Error

We illustrate the principle of Digital satellite TV differential timing (DSTVDT) and propose an optimal weighting method that reduces the timing error introduce

Journal

Digital satellite TV differential timing (DSTVDT); Satellite ephemeris error; Timi

The FMEDA Based DC Calculation for Railway Safety Computer

Journal

Optimized Fault Detection Algorithm Aided by BDS Baseband Signal for Train Positioning

Journal

Research on Electromagnetic Compatibility of Chinese High Speed Railway System

Journal

A 16-bit Hybrid ADC with Circular-Adder-Based Counting for 15μm Pitch 640×512 LWIR FPAs

A hybrid analog-to-digital converter (ADC) is presented for long-wave infrared focal plane arrays. A two-stage quantization structure is applied in the folding i

Journal

Infrared focal plane arrays (IRFPAs); Readout integrated circuit; Hybrid ADC; 3T me

A Novel Fast Dimension-Reducing Ranked Query Method with High Security for Encrypted Cloud Data

We propose a novel Fast dimension-reducing ranked query method (FDRQM) with high security for encrypted cloud data. We use Principal component analysis (PCA) alg

Journal

PCA; Encryption search; Dimension reduction matrix; Security search; FDRQM
