This paper addresses scalable classification by clustering in large databases. A cluster-based classification method first generates clusters with a clustering algorithm. To classify a new data point, it finds the k nearest clusters as the point's neighbors and assigns the point to the dominant class among those neighbors. Existing algorithms incorporate class information when making clustering decisions and produce pure clusters, each associated with only one class. We present hybrid cluster-based algorithms, which produce clusters by unsupervised clustering and allow each cluster to be associated with multiple classes. Experimental results show that the hybrid cluster-based algorithms outperform the pure ones in both classification accuracy and training speed.
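The hybrid scheme summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a clustering algorithm or a voting rule, so plain k-means, Euclidean distance, first-points seeding, and summing per-cluster class counts are all assumptions made here for illustration, as are the names `kmeans`, `train`, `classify`, and `nearest`.

```python
from collections import Counter
import math


def nearest(point, centroids, k=1):
    """Return the indices of the k centroids closest to point (Euclidean)."""
    order = sorted(range(len(centroids)),
                   key=lambda i: math.dist(point, centroids[i]))
    return order[:k]


def kmeans(points, n_clusters, n_iter=20):
    """Plain Lloyd-style k-means; class labels are never consulted here."""
    # Assumption: seed with the first n_clusters points to stay deterministic.
    centroids = [tuple(p) for p in points[:n_clusters]]
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)[0]].append(p)
        # Recompute each centroid as the mean of its members;
        # keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def train(points, labels, n_clusters):
    """Cluster without labels, then record a class histogram per cluster."""
    centroids = kmeans(points, n_clusters)
    histograms = [Counter() for _ in centroids]
    for p, y in zip(points, labels):
        histograms[nearest(p, centroids)[0]][y] += 1
    return centroids, histograms


def classify(point, centroids, histograms, k=1):
    """Assumed voting rule: sum class counts over the k nearest clusters."""
    votes = Counter()
    for i in nearest(point, centroids, k):
        votes.update(histograms[i])
    return votes.most_common(1)[0][0]


# Toy data: the cluster near the origin mixes classes "a" and "b".
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["a", "a", "b", "c", "c", "c"]
centroids, histograms = train(points, labels, n_clusters=2)
prediction = classify((0.2, 0.2), centroids, histograms, k=1)
```

On this toy data, the cluster near the origin holds two classes, which is the sense in which a cluster is hybrid rather than pure. Because clustering ignores labels, training reduces to one unsupervised pass plus a counting pass; whether that accounts for the reported speed advantage is not stated in the abstract.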