Paper excerpt
This paper studies the classification of gastric cancer subtypes from microarray gene expression data. Microarray gene expression data have few samples, very high dimensionality, and a lot of noise, so dimensionality reduction is key to successful classification. The authors apply two dimensionality reduction methods, principal component analysis (PCA) and partial least squares (PLS), to gastric cancer subtype classification. They then use support vector machines (SVM) and k-nearest neighbors (KNN) as classifiers to assign subtypes on two gastric cancer datasets. Classification performance is slightly better than traditional medical diagnosis, and the highest accuracy reaches 100%. The results show that PCA and PLS can effectively extract discriminative features. They can also greatly reduce the dimensionality of gene expression data while keeping classification accuracy high.
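The sketch below shows the general pipeline the abstract describes: reduce dimensionality first, then classify. It is a minimal illustration under stated assumptions and is not the authors' implementation. The data are synthetic and come from a hypothetical `make_data` helper that produces 40 samples × 300 genes in two subtypes. Only PCA followed by KNN is shown, so PLS and SVM are omitted. PCA is computed by power iteration on the sample Gram matrix. All function names are hypothetical.

```python
import math
import random


def make_data(n_per_class, n_genes, n_informative, shift, seed):
    """Hypothetical stand-in for microarray data: few samples, many noisy genes.
    Subtype 1 is shifted by `shift` in its first `n_informative` genes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for j in range(n_informative):
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


def dot(a, b):
    return sum(s * t for s, t in zip(a, b))


def pca_fit(X, n_components, iters=300, seed=0):
    """PCA by power iteration on the n x n Gram matrix (cheap when genes >> samples).
    Returns the training mean and one unit-length gene loading vector per component."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - mean[j] for j in range(p)] for row in X]
    G = [[dot(Xc[i], Xc[m]) for m in range(n)] for i in range(n)]
    loadings = []
    for _ in range(n_components):
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random start (centred G maps all-ones to 0)
        for _ in range(iters):
            Gu = [dot(G[i], u) for i in range(n)]
            norm = math.sqrt(dot(Gu, Gu))
            u = [g / norm for g in Gu]
        lam = dot(u, [dot(G[i], u) for i in range(n)])  # Rayleigh quotient = eigenvalue
        # Map the sample-space eigenvector back to gene space: v is proportional to Xc^T u.
        v = [sum(Xc[i][j] * u[i] for i in range(n)) for j in range(p)]
        v_norm = math.sqrt(dot(v, v))
        loadings.append([vj / v_norm for vj in v])
        # Deflate so the next power iteration converges to the next component.
        G = [[G[i][m] - lam * u[i] * u[m] for m in range(n)] for i in range(n)]
    return mean, loadings


def pca_transform(X, mean, loadings):
    """Project samples, centred with the training mean, onto the loadings."""
    return [[dot([x - m for x, m in zip(row, mean)], v) for v in loadings] for row in X]


def knn_predict(train_Z, train_y, test_Z, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    preds = []
    for z in test_Z:
        nearest = sorted(range(len(train_Z)),
                         key=lambda i: sum((a - b) ** 2 for a, b in zip(z, train_Z[i])))[:k]
        votes = [train_y[i] for i in nearest]
        preds.append(max(set(votes), key=votes.count))
    return preds


X, y = make_data(n_per_class=20, n_genes=300, n_informative=50, shift=3.0, seed=1)
test_idx = [i for i in range(len(X)) if i % 4 == 0]  # 10 held-out samples, 5 per subtype
train_idx = [i for i in range(len(X)) if i % 4 != 0]
X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_te, y_te = [X[i] for i in test_idx], [y[i] for i in test_idx]

mean, loadings = pca_fit(X_tr, n_components=2)  # fitted on training samples only
Z_tr = pca_transform(X_tr, mean, loadings)  # 300 genes -> 2 features per sample
Z_te = pca_transform(X_te, mean, loadings)
pred = knn_predict(Z_tr, y_tr, Z_te, k=3)
accuracy = sum(pr == t for pr, t in zip(pred, y_te)) / len(y_te)
print(f"reduced from {len(X[0])} to {len(Z_tr[0])} features; test accuracy = {accuracy:.2f}")
```

The sketch works with the n × n Gram matrix instead of the p × p gene covariance matrix. This is a common shortcut when genes far outnumber samples. The projection is fitted on training samples only, so the held-out samples do not influence the reduced features. A PLS variant would replace `pca_fit` with a supervised projection whose directions maximize covariance with the subtype labels.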