Based on the detailed HP model, the 20 amino acids are first divided into four classes. A continuous coordinate space is then established and a chaos game representation (CGR) walk is performed on the protein sequence, so that each amino acid in the sequence is mapped to a point in a plane coordinate system. The mean coordinate values of the four classes of amino acids are then combined into an 8-dimensional feature vector that characterizes the protein sequence numerically, and the distance between two sequences is defined as the Euclidean distance between their vectors. Finally, using this mathematical description, similarity analysis and fuzzy clustering are carried out on the p53 protein sequences of 8 species, including human. The results agree with the established facts, which supports the soundness of this method for studying protein sequence similarity. This is of significance for the study of protein sequences.
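The pipeline described in the abstract (classify, CGR walk, per-class mean coordinates, Euclidean distance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not list the four classes or say how they are placed in the plane, so the grouping in `CLASSES` (a commonly used form of the detailed HP model), the corner assignment in `CORNERS`, the starting point (0.5, 0.5), the halfway-to-corner step, and the ordering of the 8 components are all assumed here. The function names are hypothetical.

```python
import math

# ASSUMPTION: a commonly used four-class grouping for the detailed HP model.
# The abstract does not state which residues belong to which class.
CLASSES = {
    "nonpolar": "AILMFPWV",
    "negative": "DE",
    "uncharged": "NCQGSTY",
    "positive": "RHK",
}

# ASSUMPTION: each class is assigned to one corner of the unit square.
CORNERS = {
    "nonpolar": (0.0, 0.0),
    "negative": (0.0, 1.0),
    "uncharged": (1.0, 1.0),
    "positive": (1.0, 0.0),
}

# Lookup from one-letter residue code to class name.
RESIDUE_CLASS = {aa: name for name, group in CLASSES.items() for aa in group}


def cgr_points(seq):
    """Map each residue to a point by the standard chaos game rule:
    move halfway from the current point toward the residue's class corner.
    Residues outside the 20 standard codes raise KeyError."""
    x, y = 0.5, 0.5  # ASSUMPTION: the walk starts at the centre of the square
    points = []
    for aa in seq.upper():
        name = RESIDUE_CLASS[aa]
        cx, cy = CORNERS[name]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((name, x, y))
    return points


def feature_vector(seq):
    """Return 8 numbers: the mean x and mean y of the points of each class,
    in the order the classes appear in CLASSES. A class that does not occur
    in the sequence contributes (0.0, 0.0), which is an assumption."""
    sums = {name: [0.0, 0.0, 0] for name in CLASSES}
    for name, x, y in cgr_points(seq):
        entry = sums[name]
        entry[0] += x
        entry[1] += y
        entry[2] += 1
    vec = []
    for name in CLASSES:
        sx, sy, n = sums[name]
        vec.extend((sx / n, sy / n) if n else (0.0, 0.0))
    return vec


def sequence_distance(seq_a, seq_b):
    """Euclidean distance between the feature vectors of two sequences."""
    return math.dist(feature_vector(seq_a), feature_vector(seq_b))
```

Calling `sequence_distance` on every pair of sequences would give a distance matrix that could serve as input to a clustering step; the fuzzy clustering procedure itself is not specified in the abstract and is not sketched here.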