To address the high time complexity of subtractive clustering on large-scale datasets, a subtractive clustering method based on Nyström approximation of density values is proposed. It is particularly suited to subtractive clustering of large-scale datasets and can greatly reduce its time complexity. Building on Nyström approximation theory and on the way classical subtractive clustering computes sample density values, the method applies the Nyström approximation to the density-weight matrix among the unsampled samples, which yields approximate density values for all samples. It then reuses the density-revision rule of classical subtractive clustering to carry out the complete clustering process. The algorithm is evaluated on synthetic data, standard color images, and UCI datasets. The paper describes in detail how a small number of sampled points is used to approximate the density weights and density values of the majority of unsampled points, how the subsequent subtractive clustering proceeds, and reports clustering accuracy, running time, and speedup. Experimental results show that, compared with classical subtractive clustering, the proposed algorithm significantly reduces the time complexity on larger datasets without affecting the clustering results, greatly improving its real-time performance.
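For reference, the Nyström step can be sketched as follows. This assumes the usual Gaussian density weight of classical subtractive clustering with neighborhood radius r_a and the standard Nyström form for the unsampled block; the paper's exact notation and sampling scheme may differ. Order the samples so that the m sampled points come first, with A holding sampled–sampled weights, B sampled–unsampled weights, and C unsampled–unsampled weights:

```latex
\begin{aligned}
w_{ij} &= \exp\!\left(-\frac{\lVert x_i - x_j\rVert^2}{(r_a/2)^2}\right), \qquad
D_i = \sum_{j=1}^{n} w_{ij}, \qquad
W = \begin{bmatrix} A & B \\ B^{\top} & C \end{bmatrix},\\
C &\approx B^{\top} A^{+} B,\\
D_S &= A\,\mathbf{1}_m + B\,\mathbf{1}_{n-m},\\
D_U &= B^{\top}\mathbf{1}_m + C\,\mathbf{1}_{n-m} \approx B^{\top}\mathbf{1}_m + B^{\top}A^{+}\left(B\,\mathbf{1}_{n-m}\right),\\
D_i &\leftarrow D_i - D_{c}\exp\!\left(-\frac{\lVert x_i - x_{c}\rVert^2}{(r_b/2)^2}\right).
\end{aligned}
```

Evaluating the last product right to left never forms C, so the cost is O(nm) for the kernel blocks plus O(m^3) for the pseudo-inverse A^+, instead of the O(n^2) needed for the full matrix W. In the revision rule, x_c is the center just selected and r_b is the revision radius (classical subtractive clustering often takes r_b = 1.5 r_a).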
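A minimal NumPy sketch of the whole pipeline follows. It assumes Gaussian density weights with radius `ra`, uniform random sampling of `m` points, the standard Nyström form C ≈ BᵀA⁺B for the unsampled block, and a simplified stopping rule: a single ratio of the first peak density instead of the accept/reject thresholds of Chiu's original method. The function names, `stop_ratio`, and the default `rb = 1.5 * ra` are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np


def gaussian_weights(X, Y, r):
    """Density weights exp(-||x - y||^2 / (r/2)^2) between the rows of X and Y."""
    # squared distances via ||x||^2 + ||y||^2 - 2 x.y, clipped at 0 for round-off
    sq = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (r / 2.0) ** 2)


def nystrom_densities(X, m, ra, rng):
    """Approximate densities D = W @ 1 from m uniformly sampled points (assumed scheme).

    Only the blocks A (m x m) and B (m x (n - m)) are computed; the unsampled
    block C is replaced by B^T A^+ B but never materialized.
    """
    n = X.shape[0]
    perm = rng.permutation(n)
    s, u = perm[:m], perm[m:]
    A = gaussian_weights(X[s], X[s], ra)
    B = gaussian_weights(X[s], X[u], ra)
    A_pinv = np.linalg.pinv(A, rcond=1e-8, hermitian=True)  # A may be ill-conditioned
    D = np.empty(n)
    D[s] = A.sum(axis=1) + B.sum(axis=1)                   # exact for sampled points
    D[u] = B.sum(axis=0) + B.T @ (A_pinv @ B.sum(axis=1))  # B^T 1 + B^T A^+ (B 1)
    return D


def subtractive_clustering(X, D, ra, rb=None, stop_ratio=0.15):
    """Classical center selection with density revision, starting from densities D.

    Hypothetical simplification: stop once the highest remaining density falls
    below stop_ratio times the first peak.
    """
    rb = 1.5 * ra if rb is None else rb
    D = D.astype(float).copy()
    first_peak = D.max()
    centers = []
    while True:
        c = int(np.argmax(D))
        if D[c] < stop_ratio * first_peak:
            break
        centers.append(c)
        # subtract the new center's influence; D[c] itself drops to about 0
        D -= D[c] * gaussian_weights(X, X[c:c + 1], rb)[:, 0]
    return X[centers]


# usage on three well-separated synthetic Gaussian blobs
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [4.0, 4.0], [0.0, 4.0]])
X = np.vstack([rng.normal(mu, 0.3, size=(500, 2)) for mu in true_centers])
ra = 1.0
D_exact = gaussian_weights(X, X, ra).sum(axis=1)        # O(n^2) reference only
D_approx = nystrom_densities(X, m=100, ra=ra, rng=rng)  # O(nm + m^3)
centers = subtractive_clustering(X, D_approx, ra)
```

The exact densities are computed here only for comparison; on a large dataset one would call `nystrom_densities` alone, which never builds the n × n weight matrix. The revision step is unchanged from the classical method and costs O(n) per selected center.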