In large-scale data analyses such as population censuses, environmental monitoring, and longitudinal studies in medical science, data are frequently missing or unusable, which creates many difficulties for data analysis and application. When judging the merits of a data analysis method, it is therefore especially important that its results remain stable when data are missing. To simulate the most common missing-data situations, this paper designs a method for constructing comparison sequences. Data are randomly removed from a complete signal in segments whose lengths follow a Gaussian or an exponential distribution, so that the signal contains missing values. Two entropy measures, base-scale entropy and approximate entropy, are then used to compare the complexity of the signals with missing data. The results show that two key parameters both change the complexity of the sequence: the proportion of removed data and the mean length of the missing data segments. By contrast, the particular distribution followed by the segment lengths has little effect on the results. Moreover, approximate entropy is highly sensitive to missing data and is therefore unsuitable for analyzing heart rate variability (HRV) signals with missing values. Base-scale entropy yields more stable results, which gives it a distinct advantage in the analysis of real signals.
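The abstract does not spell out how the comparison sequences are built or how the two entropies are computed. The Python/NumPy sketch below gives one plausible reading, and it rests on several assumptions. Gaps are cut from the complete signal until a target removal fraction is reached, and the surviving samples are concatenated. Gap lengths are drawn from an exponential distribution, or from a Gaussian distribution, around a chosen mean length. The Gaussian standard deviation of `mean_len / 4` is an assumption. Approximate entropy follows its standard definition: Chebyshev distance, self-matches counted, and a default tolerance of r = 0.2 × SD. Base-scale entropy follows a commonly used formulation. The base scale is the RMS of successive differences within each m-sample vector. Each sample is then coded with one of four symbols around the vector mean, using a threshold α·BS, and the result is the Shannon entropy of the symbol words in bits. The function names (`remove_segments`, `approximate_entropy`, `base_scale_entropy`) and the defaults (m = 2 for ApEn; m = 4 and α = 0.2 for BSE) are hypothetical illustrative choices, not parameters reported in the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def remove_segments(x, frac, mean_len, dist="exponential", rng=None):
    """Hypothetical gap generator: delete random segments from x until at
    least `frac` of the samples are gone, then close the gaps.

    Segment lengths are exponential with mean `mean_len`, or Gaussian with
    mean `mean_len` and an ASSUMED standard deviation of mean_len / 4.
    Lengths are rounded and clipped to >= 1; segments may overlap.
    """
    if not 0 <= frac < 1:
        raise ValueError("frac must be in [0, 1)")
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n = len(x)
    removed = np.zeros(n, dtype=bool)
    target = int(round(frac * n))
    while removed.sum() < target:
        if dist == "exponential":
            length = rng.exponential(mean_len)
        elif dist == "gaussian":
            length = rng.normal(mean_len, mean_len / 4)
        else:
            raise ValueError("dist must be 'exponential' or 'gaussian'")
        length = max(1, int(round(length)))
        start = int(rng.integers(0, n))
        removed[start:start + length] = True
    return x[~removed]  # surviving samples, concatenated in original order


def approximate_entropy(x, m=2, r=None):
    """ApEn with Chebyshev distance; self-matches are counted."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()  # common default tolerance, assumed here

    def phi(k):
        templates = sliding_window_view(x, k)  # shape (N - k + 1, k)
        # Pairwise Chebyshev distances between all k-sample templates.
        d = np.abs(templates[:, None, :] - templates[None, :, :]).max(axis=2)
        c = (d <= r).mean(axis=1)  # C_i^k(r); > 0 because of the self-match
        return np.log(c).mean()

    return float(phi(m) - phi(m + 1))


def base_scale_entropy(x, m=4, alpha=0.2):
    """Base-scale entropy in bits (a commonly used four-symbol formulation)."""
    x = np.asarray(x, dtype=float)
    vecs = sliding_window_view(x, m)  # m-dimensional vectors X(i)
    mu = vecs.mean(axis=1, keepdims=True)
    # Base scale BS(i): RMS of successive differences inside each vector.
    bs = np.sqrt((np.diff(vecs, axis=1) ** 2).sum(axis=1, keepdims=True) / (m - 1))
    a = alpha * bs
    # Symbols (labels are arbitrary and do not change the entropy):
    # 1 if x > mu + a; 0 if mu < x <= mu + a; 2 if mu - a < x <= mu; 3 otherwise.
    sym = np.select([vecs > mu + a, vecs > mu, vecs > mu - a], [1, 0, 2], default=3)
    words = sym @ (4 ** np.arange(m))  # encode each symbol word as an integer
    _, counts = np.unique(words, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

This sketch could be used to reproduce the kind of comparison described in the abstract. One would compute both entropies on a complete series and on `remove_segments(series, frac, mean_len, dist)` over a grid of `frac` and `mean_len` values, for both `dist` settings. Because segments may overlap, the realized mean gap length can differ slightly from `mean_len`. The pairwise distance matrix in `approximate_entropy` needs O(N²) memory, which is acceptable for a few thousand samples.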