Objective: To evaluate the dimensionality-reduction performance of the random survival forest method on large-scale sequencing data from a lung cancer follow-up study, and to provide a basis for building a prognostic prediction model. Methods: The random survival forest method was applied to 399 single nucleotide polymorphism (SNP) loci in 120 lung cancer patients to select a subset of SNPs with high importance scores and a low misclassification rate. A multivariable Cox proportional hazards model was then fitted to this subset, and its predictive performance was evaluated by cross-validation. Results: The random survival forest method selected 25 important SNPs. In a multivariable Cox proportional hazards model adjusted for clinical covariates (clinical stage, whether surgery was performed, and histopathological type), 4 loci were statistically significant. Cross-validation showed that the model had a mean accuracy of 83.63%. Conclusions: For high-dimensional association study data, first using the random survival forest method to remove noise and reduce dimensionality, and then carrying out further analysis, helps in the subsequent construction of prognostic prediction models.
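The abstract does not state how the error rate used to select the SNP subset was computed. As a minimal sketch, assuming the convention common in random survival forest software (prediction error reported as 1 minus Harrell's concordance index), the quantity can be computed as below. The function name `concordance_index` and the toy data are illustrative assumptions, not taken from the study.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable patient pairs in which the
    patient with the higher predicted risk has the shorter survival time.

    times       -- observed follow-up times
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output; higher means worse predicted prognosis
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the patient with the shorter observed time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b]:
            continue  # assumption of this sketch: tied times are skipped
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.2, 0.4, 0.1]

c = concordance_index(times, events, risk)
print(c)        # 0.8 (4 of 5 comparable pairs are ordered correctly)
print(1.0 - c)  # error rate under the 1 - C convention
```

Under this convention an error rate of 0.5 corresponds to random ordering and 0 to perfect ordering, so a SNP subset would be preferred when dropping the remaining loci does not raise the error rate. Whether the reported 83.63% accuracy is a concordance-type measure is not stated in the abstract.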