Because the relationship between the anticancer activity of tetrazine derivatives and their structure may be nonlinear, this paper applies nonlinear manifold learning to extract features from the computed molecular descriptors of tetrazine derivatives, with the aim of improving the accuracy of the predictive model. For comparison, three approaches are used to screen the descriptors: stepwise regression (feature selection), principal component analysis (linear feature extraction), and manifold learning (nonlinear feature extraction). Quantitative structure-activity relationship (QSAR) models are then built using partial least squares and support vector regression. The results indicate that the descriptor data of the tetrazine derivatives lie on a nonlinear manifold and that the relationship between structure and activity is nonlinear; the best prediction result, obtained with the support vector regression model, reached 97.4%. QSAR models preprocessed with nonlinear feature extraction by manifold learning can therefore provide guidance for predicting the anticancer activity of this class of compounds.