Excerpt from the paper
Text similarity measurement is the basis for quantifying the degree of matching between two or more texts. Traditional large-scale similarity detection methods based on digital fingerprints offer high detection speed, but they are suitable only for detecting exact matches. We propose a Chinese text similarity measurement method based on feature phrase semantics. Natural language processing (NLP) techniques are used to pre-process the text, keywords are extracted with the term frequency-inverse document frequency (TF-IDF) model, and the keywords are further screened to obtain feature words. We obtain the exact meaning of each word and the semantic similarities between words from the HowNet semantic dictionary. We then substitute concepts for the words to obtain feature phrases, generate a semantic fingerprint, and calculate similarity. The experimental results indicate that the proposed method outperforms the traditional digital fingerprinting method in similarity detection in terms of accuracy, recall, and F-value.
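The pipeline described above (TF-IDF keywords, concept substitution, semantic fingerprint, similarity) can be illustrated with a minimal sketch. Several parts are assumptions and are not taken from the paper: the input is assumed to be already segmented into tokens, the small `CONCEPTS` dictionary is a hypothetical stand-in for HowNet lookups, and the fingerprint is a SimHash-style bit vector compared by Hamming distance, which may differ from the fingerprinting scheme the authors actually use.

```python
import hashlib
import math
from collections import Counter

# Hypothetical toy concept map; a stand-in for HowNet, not its real API.
CONCEPTS = {"car": "vehicle", "automobile": "vehicle",
            "quick": "fast", "rapid": "fast"}


def tfidf_keywords(doc, corpus, top_k=2):
    """Return the top_k tokens of `doc`, ranked by TF-IDF over `corpus`."""
    tf = Counter(doc)
    n = len(corpus)

    def idf(term):
        df = sum(1 for d in corpus if term in d)
        return math.log((1 + n) / (1 + df)) + 1.0  # smoothed IDF

    scores = {t: (c / len(doc)) * idf(t) for t, c in tf.items()}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def to_concepts(words):
    """Replace each word by its concept when one is known."""
    return [CONCEPTS.get(w, w) for w in words]


def simhash(features, bits=64):
    """SimHash-style fingerprint (assumed scheme) over a list of features."""
    weights = [0] * bits
    for f in features:
        h = int.from_bytes(hashlib.md5(f.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def similarity(fp_a, fp_b, bits=64):
    """Similarity in [0, 1] as 1 minus the normalized Hamming distance."""
    return 1.0 - bin(fp_a ^ fp_b).count("1") / bits


corpus = [
    ["the", "car", "is", "quick"],
    ["the", "automobile", "is", "rapid"],
    ["the", "weather", "is", "nice"],
]
fp_a = simhash(to_concepts(tfidf_keywords(corpus[0], corpus)))
fp_b = simhash(to_concepts(tfidf_keywords(corpus[1], corpus)))
# 1.0 here, because both keyword sets map to the same concepts.
print(similarity(fp_a, fp_b))
```

The two example documents share no keywords, so a purely lexical fingerprint would generally not be expected to match them; after concept substitution their feature sets coincide and the fingerprints are identical. This is only meant to show why a semantic step can help beyond exact matching.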