This paper takes words and phrases with distinct domain characteristics as the objects of clustering. On the large-scale corpus of a classification system, it applies the feature-extraction methods of text categorization to cluster words by domain, thereby acquiring large-scale domain knowledge for use in text categorization and topic analysis.
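The abstract does not name the feature-extraction statistic used. The sketch below assumes the chi-square (CHI) statistic, a measure commonly used for feature selection in text categorization, and assumes a simple clustering rule: each word joins the category with which it has the strongest positive association. The function names, the `min_score` threshold, and the toy corpus are hypothetical illustrations, not the paper's method.

```python
from collections import Counter, defaultdict


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 word/category contingency table.

    a: documents in the category that contain the word
    b: documents outside the category that contain the word
    c: documents in the category that lack the word
    d: documents outside the category that lack the word
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def cluster_words_by_domain(docs, min_score=0.0):
    """Assign each word to its most strongly associated category (hypothetical rule).

    docs: list of (category, tokens) pairs from a categorized corpus.
    Returns {category: [(word, score), ...]} sorted by descending score.
    """
    n_docs = len(docs)
    cat_docs = Counter(cat for cat, _ in docs)   # documents per category
    word_docs = Counter()                        # document frequency per word
    word_cat_docs = defaultdict(Counter)         # document frequency per (word, category)
    for cat, tokens in docs:
        for w in set(tokens):
            word_docs[w] += 1
            word_cat_docs[w][cat] += 1

    clusters = defaultdict(list)
    for w, df in word_docs.items():
        best_cat, best_score = None, min_score
        for cat, n_cat in cat_docs.items():
            a = word_cat_docs[w][cat]
            b = df - a
            c = n_cat - a
            d = n_docs - n_cat - b
            # Skip negative or zero association: the word is not over-represented here.
            if a * d <= b * c:
                continue
            score = chi_square(a, b, c, d)
            if score > best_score:
                best_cat, best_score = cat, score
        if best_cat is not None:
            clusters[best_cat].append((w, best_score))

    for cat in clusters:
        clusters[cat].sort(key=lambda p: (-p[1], p[0]))
    return dict(clusters)


if __name__ == "__main__":
    example_docs = [
        ("sports", ["match", "goal", "team", "the"]),
        ("sports", ["team", "coach", "the"]),
        ("finance", ["stock", "market", "the"]),
        ("finance", ["market", "bank", "the"]),
    ]
    for cat, words in cluster_words_by_domain(example_docs).items():
        print(cat, words)
```

In this toy corpus, "the" occurs in every document, so it has no positive association with any category and is left unclustered, while "team" and "market" receive the highest scores in their respective domains.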