To address the low accuracy of traditional clustering algorithms for intrusion detection, this paper proposes a semi-supervised clustering algorithm for intrusion detection that draws on the idea of semi-supervised learning. First, the labeled portion of the sample data set is used to generate a seed set for initializing the clusters. The initial cluster center of each class is then obtained by computing the Euclidean distance between the labeled points in the sample data set and the mean of the labeled points in each cluster, which enables accurate identification of intrusion detection data. The algorithm effectively avoids the blindness and randomness of initial cluster center selection in traditional clustering algorithms and improves the detection rate. Experimental results show that, when processing intrusion detection data, the algorithm makes full use of a small amount of class label information for semi-supervised learning, and that it achieves better clustering results and higher detection accuracy than the traditional K-means algorithm.
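The seeding idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact procedure, so the code assumes that each initial center is simply the mean of the labeled seed points of its class, followed by ordinary K-means iterations with Euclidean distance. The function name `seeded_kmeans`, its parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def seeded_kmeans(data, seeds, max_iter=100):
    """K-means whose initial centers come from labeled seeds (assumed scheme).

    data:  list of feature vectors (labeled and unlabeled samples together)
    seeds: dict mapping a class label to its list of labeled vectors
    Returns (predicted label per sample, final centers in sorted-label order).
    """
    labels = sorted(seeds)
    # Assumption: initial center of each class = mean of its seed points.
    centers = [mean(seeds[label]) for label in labels]
    assign = None
    for _ in range(max_iter):
        # Assign every sample to its nearest center.
        new_assign = [
            min(range(len(centers)), key=lambda k: euclidean(x, centers[k]))
            for x in data
        ]
        if new_assign == assign:
            break  # assignments stopped changing, so the centers are stable
        assign = new_assign
        # Recompute each center from its members; keep it if the cluster is empty.
        for k in range(len(centers)):
            members = [x for x, a in zip(data, assign) if a == k]
            if members:
                centers[k] = mean(members)
    return [labels[a] for a in assign], centers


# Toy data: two well-separated groups, one labeled seed per class.
data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
seeds = {"normal": [[0.0, 0.1]], "attack": [[5.0, 5.1]]}
pred, centers = seeded_kmeans(data, seeds)
print(pred)
```

Because the centers start from labeled points rather than random samples, the result is deterministic for a given seed set and each cluster keeps the class label of its seeds, which is the property the abstract credits for avoiding blind and random initialization.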