For long DNA sequences, the large number of operations required by the discrete Fourier transform (DFT) reduces the efficiency of gene identification. To address this, we propose a formula for the signal-to-noise ratio under the Voss mapping that depends only on the frequencies with which the bases occur at three different positions. We also derive that the spectra under the Voss and Z-curve mappings are related by a scaling factor of 4, and simulation experiments confirm this result. Because threshold-based discrimination suffers from subjectivity and reliance on experience, we use simulation experiments to determine the thresholds for different classes of genes, and we evaluate the test results under different thresholds in terms of sensitivity, specificity, and accuracy. We further propose an optimal-threshold algorithm based on Bootstrap resampling and use it to predict the optimal thresholds for different classes of genes; for human and mouse genes the optimal threshold is 1.930. The effectiveness and feasibility of the algorithm are analyzed, and its accuracy reaches 92.8%.
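The two spectral claims in the abstract can be checked numerically with a minimal sketch. It is not the paper's own implementation and rests on several assumptions: the "three different positions" are taken to be the sequence indices modulo 3, the sequence length N is a multiple of 3, the signal is the total power at frequency k = N/3, and the noise is the mean power over all N frequencies (which equals N under the Voss mapping by Parseval's theorem). The Z-curve mapping is applied per base (not cumulatively), and all function names are hypothetical.

```python
import cmath

BASES = "ATGC"
# Assumed Z-curve signs per base for the (x, y, z) components:
# x = (A+G)-(C+T), y = (A+C)-(G+T), z = (A+T)-(G+C).
Z_SIGNS = {"A": (1, 1, 1), "G": (1, -1, -1), "C": (-1, 1, -1), "T": (-1, -1, 1)}


def voss_power_at(seq, k):
    """Total power at frequency k: sum of |U_b(k)|^2 over the four indicator sequences."""
    n = len(seq)
    total = 0.0
    for b in BASES:
        u = sum(cmath.exp(-2j * cmath.pi * i * k / n)
                for i, s in enumerate(seq) if s == b)
        total += abs(u) ** 2
    return total


def zcurve_power_at(seq, k):
    """Total power at frequency k of the three Z-curve component sequences."""
    n = len(seq)
    comps = [0j, 0j, 0j]
    for i, s in enumerate(seq):
        w = cmath.exp(-2j * cmath.pi * i * k / n)
        for j in range(3):
            comps[j] += Z_SIGNS[s][j] * w
    return sum(abs(c) ** 2 for c in comps)


def snr_from_dft(seq):
    """SNR at k = N/3 via the DFT; the mean Voss power over all N bins is N."""
    n = len(seq)
    return voss_power_at(seq, n // 3) / n


def snr_from_counts(seq):
    """Same SNR using only the counts of each base at positions 0, 1, 2 (mod 3)."""
    n = len(seq)
    power = 0
    for b in BASES:
        x, y, z = (sum(1 for i in range(p, n, 3) if seq[i] == b) for p in range(3))
        # |x + y*w + z*w^2|^2 with w = exp(-2*pi*i/3)
        power += x * x + y * y + z * z - x * y - y * z - z * x
    return power / n
```

For any sequence over A, T, G, C whose length is a multiple of 3, `snr_from_counts` should agree with `snr_from_dft` up to rounding, without evaluating any complex exponentials. For any nonzero frequency k, `zcurve_power_at` should be 4 times `voss_power_at`, because the four indicator spectra sum to zero there.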