Objective To combine kernel principal component analysis (KPCA) with the logistic regression model and propose a KPCA-based logistic regression model for non-linear association analysis in gene mapping of complex diseases. Methods For association analysis under a case-control study design, KPCA was performed on the single nucleotide polymorphisms (SNPs) within a candidate gene region, and a logistic regression model was built with the kernel principal components as independent variables. Two gene regions, PTPN22 and RNF186, from the GAW16 rheumatoid arthritis data were analyzed to verify the validity and practicality of the KPCA-based logistic regression model. Results The analysis of the PTPN22 and RNF186 regions showed that the KPCA-based logistic regression model can detect both a region that single-locus tests can find (PTPN22) and a region that single-locus tests cannot find (RNF186). Conclusion The KPCA-based logistic regression model is an effective non-linear association analysis method and can identify more susceptibility regions.
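The two-step procedure described in the Methods (KPCA on the SNPs of a region, then logistic regression on the kernel principal components) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the kernel, the number of components retained, or the test statistic, so the RBF kernel, `gamma = 1/m`, two components, and the likelihood ratio test are all assumptions. The genotype data are simulated, and the function names `rbf_kernel`, `kernel_pc_scores`, and `fit_logistic` are hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix; the kernel choice is an assumption."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_pc_scores(K, n_components):
    """Scores of the samples on the leading kernel principal components."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # centre in feature space
    vals, vecs = np.linalg.eigh(Kc)              # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]  # keep the largest ones
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

def fit_logistic(Z, y, n_iter=25, ridge=1e-8):
    """Logistic regression by Newton-Raphson; returns (beta, log-likelihood)."""
    A = np.column_stack([np.ones(len(y)), Z])    # intercept + components
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))
        H = A.T @ (A * (p * (1.0 - p))[:, None]) + ridge * np.eye(A.shape[1])
        beta = beta + np.linalg.solve(H, A.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-(A @ beta)))
    return beta, float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Simulated case-control data (hypothetical): n individuals, m SNPs coded 0/1/2.
rng = np.random.default_rng(0)
n, m = 200, 8
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
# Assumed non-linear effect: risk rises only when SNP 0 and SNP 1 both carry
# at least one minor allele.
eta = -0.5 + 1.0 * (G[:, 0] * G[:, 1] > 0)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

# Step 1: kernel principal components of the candidate region.
Z = kernel_pc_scores(rbf_kernel(G, gamma=1.0 / m), n_components=2)

# Step 2: logistic regression on the components, tested against the
# intercept-only model with a likelihood ratio statistic.
beta, ll_full = fit_logistic(Z, y)
p0 = y.mean()
ll_null = float(np.sum(y * np.log(p0) + (1.0 - y) * np.log(1.0 - p0)))
lr_stat = 2.0 * (ll_full - ll_null)
# With exactly 2 components the reference chi-square distribution has 2 degrees
# of freedom, whose survival function is exp(-x/2); other component counts
# need a general chi-square tail probability.
p_value = float(np.exp(-max(lr_stat, 0.0) / 2.0))
print(f"LR statistic = {lr_stat:.3f}, region-level p-value = {p_value:.4f}")
```

Testing all retained components jointly yields one p-value per gene region rather than one per SNP, which is the sense in which a region-level test can flag a region that no single-locus test reaches.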