To address the difficulty of reducing the dimensionality of nonlinear, high-dimensional medical data, this paper introduces Isomap, a new nonlinear dimensionality reduction method, and discusses its applicability to medical data processing from the standpoint of its algorithmic principles. Isomap is applied to two representative medical datasets (lung cancer gene expression data and breast cancer pathology data). Both are found to have an intrinsic dimensionality below 3, so they can be visualized in a low-dimensional projection space. The experiments then compare the Isomap projections with those of principal component analysis (PCA) and compute intra-class distances; the results show that Isomap outperforms the traditional linear dimensionality reduction technique. This demonstrates the potential of nonlinear dimensionality reduction for the analysis of high-dimensional medical data.
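The comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's code and does not use the paper's datasets: it assumes NumPy, a synthetic helical arc with two made-up classes, and a textbook form of Isomap (k-nearest-neighbour graph, Floyd-Warshall shortest paths, classical MDS). All function names (`isomap`, `pca`, `make_arc`, `normalised_intra_class`, and so on) are hypothetical, and dividing the intra-class distance by the overall mean pairwise distance is an assumption made here so that embeddings with different scales can be compared; the abstract does not say how the statistic was computed.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def isomap(X, n_neighbors=4, n_components=2):
    """Textbook Isomap: kNN graph -> geodesic distances -> classical MDS."""
    n = X.shape[0]
    D = pairwise_distances(X)
    # Keep only edges to each point's nearest neighbours (column 0 is the point itself).
    G = np.full((n, n), np.inf)
    nearest = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nearest.ravel()
    G[rows, cols] = D[rows, cols]
    G = np.minimum(G, G.T)  # make the graph undirected
    np.fill_diagonal(G, 0.0)
    # Floyd-Warshall: shortest path lengths approximate geodesic distances.
    for k in range(n):
        G = np.minimum(G, G[:, [k]] + G[[k], :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")
    # Classical MDS on the geodesic distances (double centring + eigendecomposition).
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


def pca(X, n_components=2):
    """Linear projection onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def mean_intra_class_distance(Y, labels):
    """Average pairwise distance between points that share a label."""
    total, count = 0.0, 0
    for c in np.unique(labels):
        D = pairwise_distances(Y[labels == c])
        upper = np.triu_indices_from(D, k=1)
        total += D[upper].sum()
        count += len(upper[0])
    return total / count


def normalised_intra_class(Y, labels):
    """Intra-class distance divided by the mean distance over all pairs (assumed normalisation)."""
    overall = mean_intra_class_distance(Y, np.zeros(len(Y), dtype=int))
    return mean_intra_class_distance(Y, labels) / overall


def make_arc(n=40):
    """Toy data: a helical arc in 3-D, split into two classes along its length."""
    theta = np.linspace(0.0, 1.5 * np.pi, n)
    X = np.column_stack([np.cos(theta), np.sin(theta), 0.1 * theta])
    labels = (theta > theta.mean()).astype(int)
    return X, labels, theta


def demo():
    X, labels, _ = make_arc()
    iso_score = normalised_intra_class(isomap(X), labels)
    pca_score = normalised_intra_class(pca(X), labels)
    return iso_score, pca_score


if __name__ == "__main__":
    iso_score, pca_score = demo()
    print(f"Isomap normalised intra-class distance: {iso_score:.3f}")
    print(f"PCA    normalised intra-class distance: {pca_score:.3f}")
```

On this toy curve a lower score means the classes are more compact relative to the overall spread of the embedding. Whether the same ordering holds on real gene expression or pathology data depends on the data and on the choice of `n_neighbors`, which would need to be tuned.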