Objective To investigate how multifactor dimensionality reduction (MDR) can be used to screen for interactions between genetic and environmental risk factors for diabetes, to assess the advantages and disadvantages of MDR, and to compare it with conventional logistic regression. Methods A simulation study was conducted to determine the sample sizes for which MDR is suitable. In the case study, data from a case-control study of diabetes were analyzed separately with MDR and with logistic regression, and the two sets of results were compared to evaluate the two methods. Results The simulations showed that MDR had lower power than logistic regression for detecting first-order interactions, although the power of the two methods converged as the sample size increased. For higher-order interactions, MDR had higher power than logistic regression. For second-order interactions, MDR already performed well at a sample size of 200 and showed a slight advantage at a sample size of 300. In the case study, the best MDR model was the interaction between 174G/C and whether total cholesterol and total triglycerides were both elevated, with a prediction accuracy of 0.5822, a cross-validation consistency of 8/10, and a statistically significant permutation test (P = 0.021). Conclusion MDR performs well on small-sample, high-dimensional data and can effectively analyze gene-environment interactions in diabetes.
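The MDR statistics reported above (prediction accuracy, cross-validation consistency out of 10 folds, and a permutation-test P value) come from a cross-validated search that can be sketched roughly as follows. This is a minimal illustration of the commonly described MDR procedure, not the authors' software or data. Several details are assumptions made for illustration: the case:control-ratio threshold for labelling cells high-risk, the 10-fold split, treating cells unseen in training as low-risk, and reading "prediction accuracy" as the mean testing balanced accuracy. The function names (`fit_mdr`, `mdr_cv`, `permutation_p`, `simulate`) are hypothetical, and the data are simulated.

```python
import itertools
import random
from collections import Counter, defaultdict


def fit_mdr(X, y, combo):
    """Label each multi-factor cell of `combo` high-risk (1) or low-risk (0).

    High-risk if cases/controls >= T, T = overall case:control ratio of the
    training data (assumed threshold); written as ca >= T*co to avoid /0.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [controls, cases]
    for row, label in zip(X, y):
        counts[tuple(row[i] for i in combo)][label] += 1
    t = sum(y) / max(len(y) - sum(y), 1)
    return {cell: int(ca >= t * co) for cell, (co, ca) in counts.items()}


def predict(model, X, combo):
    # Cells never seen in training are treated as low-risk (an assumption).
    return [model.get(tuple(row[i] for i in combo), 0) for row in X]


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity."""
    pairs = list(zip(y_true, y_pred))
    pos, neg = sum(y_true), len(y_true) - sum(y_true)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return 0.5 * (tp / max(pos, 1) + tn / max(neg, 1))


def mdr_cv(X, y, order=2, k=10, seed=0):
    """Return (best combination, CV consistency out of k, mean testing
    balanced accuracy of that combination over the k folds)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    combos = list(itertools.combinations(range(len(X[0])), order))
    splits, chosen = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        splits.append((Xtr, ytr, [X[i] for i in fold], [y[i] for i in fold]))
        # Keep the combination with the best training balanced accuracy.
        chosen.append(max(combos, key=lambda c: balanced_accuracy(
            ytr, predict(fit_mdr(Xtr, ytr, c), Xtr, c))))
    # CV consistency: number of folds in which the same combination won.
    best, consistency = Counter(chosen).most_common(1)[0]
    accs = [balanced_accuracy(yte, predict(fit_mdr(Xtr, ytr, best), Xte, best))
            for Xtr, ytr, Xte, yte in splits]
    return best, consistency, sum(accs) / k


def permutation_p(X, y, observed_acc, n_perm=20, order=2, k=10, seed=1):
    """Shuffle case/control labels, rerun the whole search, and return
    (hits + 1) / (n_perm + 1). The smallest attainable P is 1/(n_perm + 1),
    so far more permutations (e.g. 1000) are needed in real analyses."""
    rng, y_perm, hits = random.Random(seed), list(y), 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        hits += int(mdr_cv(X, y_perm, order, k)[2] >= observed_acc)
    return (hits + 1) / (n_perm + 1)


def simulate(n=400, noise=0.1, seed=42):
    # Hypothetical toy data: factor 0 = binary genotype, factor 1 = binary
    # exposure, factors 2-4 = unrelated 3-level noise. Status is a pure 2-way
    # interaction (XOR) with `noise` label flips: no main effects.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        g, e = rng.randint(0, 1), rng.randint(0, 1)
        X.append([g, e] + [rng.randint(0, 2) for _ in range(3)])
        y.append((g ^ e) if rng.random() >= noise else 1 - (g ^ e))
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    combo, cvc, acc = mdr_cv(X, y)
    p = permutation_p(X, y, acc)
    print(f"best model {combo}, CV consistency {cvc}/10, "
          f"testing accuracy {acc:.4f}, permutation P = {p:.3f}")
```

On this deliberately strong toy signal, the search should select the pair (0, 1) in every fold. The permutation test reruns the entire cross-validated search on each shuffled dataset, so the null distribution also reflects the model-selection step and not just the final model. Note that the simulated interaction has no main effects, which is the kind of pattern a model containing only main effects cannot capture.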