Cluster analysis is one of the most common techniques in data mining. The scale, dimensionality, and sparsity of the data each constrain cluster analysis in a different way. This paper proposes an effective clustering method for sparse data with high attribute dimensionality. It defines sparse similarity, the similarity of equivalence relations, and generalized equivalence relations. Initial equivalence classes are formed from the sparse similarity between objects and the principle of equivalence relations. The initial equivalence relations are then revised using the similarity of equivalence relations, which makes the final clustering result more reasonable. The clustering process does not depend on the order of the input samples. Effective compression of the high-dimensional sparse data improves the algorithm's efficiency when the dimensionality is high, so the method is suitable for cluster analysis of high-dimensional sparse data.
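The abstract does not give the paper's formal definitions, so the following is only a minimal sketch of the two-stage idea under stated assumptions. It assumes binary attributes stored in compressed form as sets of non-zero attribute indices, uses Jaccard similarity as a stand-in for the paper's sparse similarity, and uses the Jaccard similarity of merged attribute sets as a stand-in for the similarity of equivalence relations. The function names and both thresholds are hypothetical and do not come from the paper.

```python
from itertools import combinations


def sparse_similarity(a, b):
    """Jaccard similarity of two sets of non-zero attribute indices.

    Assumption: this stands in for the paper's sparse similarity,
    whose exact definition is not given in the abstract.
    """
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def initial_classes(objects, threshold):
    """Link every pair at or above the threshold, then take the
    transitive closure with union-find so the result is a partition
    (a set of equivalence classes)."""
    parent = list(range(len(objects)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(objects)), 2):
        if sparse_similarity(objects[i], objects[j]) >= threshold:
            parent[find(i)] = find(j)

    classes = {}
    for i in range(len(objects)):
        classes.setdefault(find(i), set()).add(i)
    # Sort by smallest member so the output has a canonical order.
    return sorted((frozenset(c) for c in classes.values()), key=min)


def refine_classes(objects, classes, class_threshold):
    """Merge classes whose combined attribute sets are similar enough.

    Assumption: comparing the union of each class's attributes is a
    hypothetical substitute for the paper's class-level similarity.
    """
    profiles = [set().union(*(objects[i] for i in c)) for c in classes]
    groups = initial_classes(profiles, class_threshold)
    return [frozenset().union(*(classes[k] for k in g)) for g in groups]


# Six objects, each listed by the indices of its non-zero attributes.
objects = [{0, 1, 2}, {1, 2, 3}, {0, 1, 2, 3}, {10, 11}, {10, 11, 12}, {12, 13}]

first = initial_classes(objects, threshold=0.5)
final = refine_classes(objects, first, class_threshold=0.25)

print([sorted(c) for c in first])  # [[0, 1, 2], [3, 4], [5]]
print([sorted(c) for c in final])  # [[0, 1, 2], [3, 4, 5]]
```

Because the classes are the connected components of the thresholded similarity relation, the partition does not depend on the order in which objects are visited. This matches the order-independence property claimed in the abstract, though the paper may achieve it differently.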