Thesis excerpt
The genomes of closely related species retain much of their ancestral information and are therefore well conserved. Comparing the genome sequences of closely related species reveals large collinear (syntenic) regions. These regions carry abundant homology information that can be used to discover unknown genes and to improve the quality of genome annotation. In this study, we first annotated the wild and cultivated cucumber genomes with the same genome annotation platform. Second, we compared the two whole-genome sequences to obtain collinearity information and then used it to improve both annotations. As a result, 909 new genes were annotated in wild cucumber and 853 in cultivated cucumber. Using transcriptome data from wild and cultivated cucumber, we identified two kinds of error in wild cucumber. In 87 cases, a gene with a long open reading frame (ORF) had been mis-annotated as several short-ORF genes. In 40 cases, several short-ORF genes had been over-predicted as a single long-ORF gene. The corresponding counts in cultivated cucumber were 166 and 36 mis-annotations, respectively.
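The abstract does not say how split and merged gene models were detected. As a rough illustration only, the Python sketch below shows one plausible way to combine the two kinds of evidence it mentions: homologous gene pairs from collinear blocks, and transcripts aligned to the genome. The classification rule is an assumption, not the thesis's method. When one gene in genome A has two or more collinear homologs that sit next to each other in genome B, a single transcript spanning all of them suggests the B models are fragments of one long-ORF gene. Separate transcripts for each B model suggest instead that the A model was over-merged. The `Gene` class, the function names and the input formats are hypothetical.

```python
# Hypothetical sketch: input formats and the classification rule are assumptions,
# not the method used in the thesis.
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def gene_order(genes):
    """Map gene_id -> (chrom, rank), where rank is the gene's position along its chromosome."""
    by_chrom = defaultdict(list)
    for g in genes:
        by_chrom[g.chrom].append(g)
    order = {}
    for chrom, gs in by_chrom.items():
        for rank, g in enumerate(sorted(gs, key=lambda x: x.start)):
            order[g.gene_id] = (chrom, rank)
    return order


def covered(transcripts, chrom, lo, hi):
    """True if any transcript (chrom, start, end) fully spans the interval [lo, hi]."""
    return any(c == chrom and s <= lo and e >= hi for c, s, e in transcripts)


def classify_one_to_many(pairs, genes_b, transcripts_b):
    """Classify genome-A genes whose collinear homologs are >= 2 adjacent genes in genome B.

    pairs: (a_id, b_id) homolog pairs taken from collinear blocks (assumed input).
    transcripts_b: transcripts aligned to genome B as (chrom, start, end) (assumed input).
    """
    by_id = {g.gene_id: g for g in genes_b}
    order = gene_order(genes_b)
    hits = defaultdict(set)
    for a_id, b_id in pairs:
        hits[a_id].add(b_id)

    calls = {}
    for a_id, b_ids in hits.items():
        if len(b_ids) < 2:
            continue  # one-to-one homologs are not split/merge candidates
        chroms = {order[b][0] for b in b_ids}
        ranks = sorted(order[b][1] for b in b_ids)
        # Require all B genes on one chromosome with no other gene model in between.
        if len(chroms) != 1 or ranks[-1] - ranks[0] != len(ranks) - 1:
            continue
        group = [by_id[b] for b in b_ids]
        chrom = group[0].chrom
        lo = min(g.start for g in group)
        hi = max(g.end for g in group)
        if covered(transcripts_b, chrom, lo, hi):
            # One transcript spans the whole group: B models look like fragments of one gene.
            calls[a_id] = ("split_in_B", sorted(b_ids))
        elif all(covered(transcripts_b, chrom, g.start, g.end) for g in group):
            # Each B model has its own transcript: the A model may be over-merged.
            calls[a_id] = ("merged_in_A", sorted(b_ids))
        else:
            calls[a_id] = ("unresolved", sorted(b_ids))
    return calls


# Toy example with made-up coordinates.
genes_b = [
    Gene("B1", "chr1", 100, 400),
    Gene("B2", "chr1", 500, 900),
    Gene("B3", "chr1", 2000, 2600),
    Gene("B4", "chr1", 2700, 3100),
]
pairs = [("A1", "B1"), ("A1", "B2"), ("A2", "B3"), ("A2", "B4")]
transcripts_b = [
    ("chr1", 90, 950),     # spans both B1 and B2
    ("chr1", 1990, 2610),  # covers B3 only
    ("chr1", 2690, 3110),  # covers B4 only
]
calls = classify_one_to_many(pairs, genes_b, transcripts_b)
# calls["A1"] == ("split_in_B", ["B1", "B2"]); calls["A2"] == ("merged_in_A", ["B3", "B4"])
```

Running the same function with the two genomes swapped checks the opposite direction, for example wild against cultivated and then cultivated against wild. A real pipeline would also need to check strand, ORF integrity and transcript splice structure, all of which this sketch ignores.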