K-nearest neighbors (KNN) is a widely used and effective classification algorithm with many applications in data classification and fault diagnosis, and it is also used to impute missing data. The traditional KNN imputation method computes the Euclidean distance between data points, sorts the distances to find the K nearest points, and computes the imputed value from the values of those K points. In this process, however, the distance calculation is affected by the distribution of each variable and by inconsistencies between the distributions of different variables, which in turn affects the imputation result. This paper proposes an order-based KNN method, KNNOI (KNN based on Order Imputation), which sorts each variable and uses the differences between the rank indices of the variables to compute the distance between data points, replacing the original distance calculation. The algorithm is applied to data imputation, the influence of the sorting method and the parameter choices on the imputation is studied, and the results are compared with the traditional KNN algorithm. The experimental results show that imputation with the order-based algorithm outperforms the traditional KNN method.
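The order-based distance described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so everything specific here is an assumption made for illustration: the function names `dense_ranks` and `knn_order_impute` are hypothetical, ties share a rank (dense ranking), the distance is the mean absolute rank difference over the columns both rows have observed, and the imputed value is the unweighted mean of the K nearest donors. It is not the paper's implementation.

```python
import numpy as np


def dense_ranks(X):
    """Rank the observed values of each column (1 = smallest); NaN stays NaN.

    Assumption: tied values share the same rank (dense ranking).
    """
    R = np.full(X.shape, np.nan)
    for j in range(X.shape[1]):
        obs = ~np.isnan(X[:, j])
        _, inv = np.unique(X[obs, j], return_inverse=True)
        R[obs, j] = inv.reshape(-1) + 1
    return R


def knn_order_impute(X, k=3):
    """Fill NaN cells using neighbours chosen by rank-difference distance.

    Assumed distance: mean absolute rank difference over the columns
    that both rows have observed.
    """
    X = np.asarray(X, dtype=float)
    R = dense_ranks(X)
    out = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        # Donors are rows that have an observed value in column j.
        donors = np.where(~np.isnan(X[:, j]))[0]
        shared = ~np.isnan(R[i]) & ~np.isnan(R[donors])
        diff = np.where(shared, np.abs(R[donors] - R[i]), 0.0)
        n_shared = shared.sum(axis=1)
        usable = n_shared > 0
        if not usable.any():
            continue  # no comparable donor: leave the cell missing
        dist = diff[usable].sum(axis=1) / n_shared[usable]
        nearest = donors[usable][np.argsort(dist, kind="stable")[:k]]
        # Impute with the mean of the donors' original (unranked) values.
        out[i, j] = X[nearest, j].mean()
    return out


X_demo = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [50.0, 40.0],   # outlier in column 0
    [2.2, np.nan],  # value to impute
])
filled = knn_order_impute(X_demo, k=2)
print(filled[4, 1])  # 25.0: mean of the donors with column-0 values 2.0 and 3.0
```

Because only ranks enter the distance, the outlier 50.0 in the first column is just one rank step beyond 3.0, so the scale of a variable does not dominate neighbour selection the way it can with Euclidean distance on raw values.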