It is well established that different sites within a protein evolve at different rates according to their role within the protein; identification of correlated mutations between sites can aid in tasks such as ab initio protein structure prediction, structure-function analysis, or sequence alignment. Mutual information is a standard measure of coevolution between two sites, but its application is limited by a low signal-to-noise ratio. In this work we report a preliminary study investigating whether larger sequence sets could circumvent this problem. We calculated mutual information arrays for two sets of drug-naive sequences of the HIV gp120 protein, one for the B subtype and one for the C subtype. Our results suggest that while larger sequence sets can improve the signal-to-noise ratio, the gain is offset by the high mutation rate of HIV, which makes it more difficult to achieve consistent alignments. Nevertheless, we were able to predict a number of coevolving sites that are supported by previous experimental studies, as well as a region close to the C terminus of the protein that is highly variable in the C subtype but highly conserved in the B subtype.
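For readers unfamiliar with the measure, the mutual information between two alignment columns i and j is conventionally defined as MI(i, j) = Σ p(a, b) log2[ p(a, b) / (p(a) p(b)) ], summed over the residue pairs (a, b) observed at those columns. The following is a minimal Python sketch of that textbook definition, not the pipeline used in this study. The function names `mutual_information` and `mi_matrix`, the toy alignment, and the choice to skip rows containing a gap are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j, gap="-"):
    """Mutual information (in bits) between two alignment columns.

    Assumption: rows with a gap in either column are skipped. Other gap
    treatments are possible and would change the values.
    """
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != gap and b != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)               # counts of residue pairs (a, b)
    count_i = Counter(a for a, _ in pairs)  # marginal counts at column i
    count_j = Counter(b for _, b in pairs)  # marginal counts at column j
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) / (p(a) p(b)) simplifies to c * n / (count_i[a] * count_j[b])
        mi += (c / n) * math.log2(c * n / (count_i[a] * count_j[b]))
    return mi


def mi_matrix(alignment):
    """Symmetric matrix of pairwise MI values for equal-length aligned sequences."""
    length = len(alignment[0])
    cols = [[seq[k] for seq in alignment] for k in range(length)]
    matrix = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for j in range(i + 1, length):
            matrix[i][j] = matrix[j][i] = mutual_information(cols[i], cols[j])
    return matrix


# Hypothetical toy alignment: columns 0 and 1 covary, column 2 is conserved.
toy_alignment = ["AKL", "AKL", "GEL", "GEL"]
toy_mi = mi_matrix(toy_alignment)
```

In this toy case the two covarying columns carry one bit of mutual information, while the conserved column carries none with either of them. With few sequences, chance co-occurrences inflate these estimates, which is the signal-to-noise limitation the abstract refers to.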