When classifying multi-class imbalanced network traffic, machine-learning-based classifiers tend to favor the majority classes, which leads to low recall for the minority classes. To address this problem, a feature selection method based on statistical frequency is proposed. The method first uses the statistical frequencies of the samples to compute a feature selection coefficient that measures the discriminative power of each feature, then builds a feature selection matrix from these coefficients, and finally selects, for each class, the features most strongly correlated with it. In the experiments, classifying multi-class imbalanced network traffic with the features selected by this method achieved high overall accuracy, minority-class recall, and g-mean, which shows that the method can mitigate the adverse effects of the multi-class imbalance problem.
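The three evaluation measures named in the abstract can be illustrated with a small sketch. It assumes that g-mean is the geometric mean of the per-class recalls, which is the usual multi-class definition; the abstract does not spell out its formula. The function names and the toy traffic labels (`web`, `p2p`) are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import prod


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Return {class: recall} for every class that occurs in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition of g-mean)."""
    recalls = per_class_recall(y_true, y_pred)
    return prod(recalls.values()) ** (1.0 / len(recalls))


# Hypothetical imbalanced toy data: 8 majority-class flows, 2 minority-class flows.
y_true = ["web"] * 8 + ["p2p"] * 2

# A classifier that always predicts the majority class.
y_majority = ["web"] * 10

print(accuracy(y_true, y_majority))          # 0.8
print(per_class_recall(y_true, y_majority))  # {'web': 1.0, 'p2p': 0.0}
print(g_mean(y_true, y_majority))            # 0.0
```

The toy case shows why overall accuracy alone is misleading under class imbalance: always predicting the majority class still scores 0.8 accuracy, while the minority-class recall and the g-mean both fall to zero.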