A two-layer hybrid classifier is proposed for predicting the redox state of cysteines in proteins. The first layer, a global linear classifier, takes the amino acid composition (percentage content) of the protein as input; the second layer, a local SVM classifier, takes the local sequence around each cysteine as input. The dataset consists of 639 protein polypeptide chains from the April 2002 PISCES culled PDB database, containing 584 disulfide bonds and 2,904 cysteines in total. Under a rigorous jackknife test, the prediction accuracy for cysteine redox state reaches up to 84.1% at the cysteine level and 80.1% at the protein level. The results show that a two-layer hybrid classifier combining global protein information with local sequence context achieves high prediction accuracy. They also show that both the overall amino acid composition and the local sequence around cysteines carry information related to disulfide bond formation, suggesting that whether a cysteine forms a disulfide bond depends not only on the global structural information of the protein but also on local sequence information.
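The two kinds of input named in the abstract, global amino acid composition for the first layer and the local sequence around each cysteine for the second, can be illustrated with a minimal Python sketch. The abstract does not give the window size, the encoding, or how chain ends are handled, so the `half_width` parameter, the one-hot encoding, the `X` padding, and all function names below are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative feature extraction only; names and parameters are hypothetical.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition(seq):
    """Percentage of each standard amino acid in seq (first-layer input)."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in seq:
        if residue in counts:  # non-standard symbols are ignored
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


def cysteine_windows(seq, half_width=3, pad="X"):
    """(position, window) pairs for every cysteine (second-layer input).

    The window spans half_width residues on each side of the cysteine;
    chain ends are padded with `pad` (an assumed convention).
    """
    padded = pad * half_width + seq + pad * half_width
    windows = []
    for i, residue in enumerate(seq):
        if residue == "C":
            windows.append((i, padded[i:i + 2 * half_width + 1]))
    return windows


def one_hot(window):
    """Flat 0/1 vector, 20 entries per position; padding encodes as all zeros."""
    vec = []
    for residue in window:
        vec.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vec
```

In this sketch, `composition` yields one 20-dimensional vector per chain, while `cysteine_windows` followed by `one_hot` yields one vector per cysteine. A jackknife test would then hold out one sample at a time, train on the rest, and average the per-sample outcomes.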