Extracting rich biological information from large volumes of DNA sequence data is of considerable theoretical and practical value. This work derives the theoretical relationship between the power spectrum and the signal-to-noise ratio (SNR) under two mappings, Z-curve and Voss, and computes the SNR under a real-number mapping quickly from the nucleotide frequency distribution. Four threshold-determination methods and three evaluation criteria are proposed and used to analyze gene thresholds for different species; among them, the cumulative-probability-distribution crossover threshold method performs best. The fixed-length sliding-window method is shown to lose prediction accuracy because of a "burr" (spurious spike) problem. To address this, a variable-step staircase smoothing sliding method is proposed; compared with the fixed-length sliding-window method, it improves prediction accuracy by 12.16%, reaching 83%. The coding regions of six unknown sequences are also predicted.
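The abstract does not give its formulas, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the commonly used definition of the SNR for the Voss (binary indicator) mapping: the total power at frequency index N/3 divided by the mean total power over indices 1 to N-1. The function names are hypothetical and the Z-curve mapping is not covered. The sketch also illustrates why nucleotide counts alone can shorten the computation: by Parseval's theorem the mean power in the denominator depends only on how often each base occurs, so only the single N/3 coefficient needs a Fourier sum.

```python
import cmath
from collections import Counter

BASES = "ACGT"

def voss_power_at(seq, k):
    """Total power P[k]: sum over the four bases of |U_b[k]|^2, where U_b is
    the DFT of the 0/1 indicator sequence of base b (Voss mapping)."""
    n = len(seq)
    total = 0.0
    for b in BASES:
        # Only positions holding base b contribute to the indicator's DFT.
        coeff = sum(cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, ch in enumerate(seq) if ch == b)
        total += abs(coeff) ** 2
    return total

def mean_power_from_counts(seq):
    """Mean of P[k] over k = 1..N-1, computed from base counts alone.
    Parseval: sum of P[k] over k = 0..N-1 equals N * (sum of counts),
    and P[0] equals the sum of squared counts."""
    n = len(seq)
    counts = Counter(seq)
    total = n * sum(counts[b] for b in BASES)
    dc = sum(counts[b] ** 2 for b in BASES)
    return (total - dc) / (n - 1)

def snr(seq):
    """Assumed SNR definition: P[N/3] divided by the mean power.
    Requires len(seq) to be a multiple of 3 so that N/3 is an integer index."""
    n = len(seq)
    if n == 0 or n % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    mean_power = mean_power_from_counts(seq)
    if mean_power == 0:
        raise ValueError("SNR is undefined for a single-base sequence")
    return voss_power_at(seq, n // 3) / mean_power
```

Under these assumed definitions, a perfectly period-3 toy sequence such as `"ATG" * 10` gives a ratio of 14.5, while a sequence without period-3 structure gives a value near 1. The thresholds discussed in the abstract would be applied to ratios of this kind, but the specific threshold values and the variable-step smoothing procedure are not stated in the abstract and are not reproduced here.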