The transcriptome of Dendrobium officinale was sequenced on the Illumina HiSeq 2000 high-throughput sequencing platform, yielding 11 153 295 000 nt of data. Assembly of the reads produced 121 596 unigenes with an average length of 660 bp, for a total of 40.16 Mb of sequence. Comparison against bioinformatics databases showed that 52 345 of these unigenes had retrievable functional annotations. Against the GO database, the unigenes fell into 3 main categories and 57 branches, with many unigenes associated with cell, catalytic activity, cell part and organelle functions. Against the COG database, the unigenes were annotated to 25 classes of orthologous proteins, including transcription; replication, recombination and repair; and translation, ribosomal structure and biogenesis. With the KEGG database as a reference, the unigenes mapped to 128 metabolic pathway branches, such as lipid metabolism, amino acid metabolism and carbohydrate metabolism. A software search for SSR loci found 9 892 SSRs among the unigenes. Of the SSR repeat motif types, AG/CT was the most frequent, followed by AAG/CTT, CCG/CGG and AGG/CCT.
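The abstract does not name the SSR search software or its settings, so the following is only a minimal Python sketch of how such a search and the motif grouping (for example, counting AG, GA, CT and TC together as AG/CT) could work. The minimum repeat counts in `MIN_REPEATS` and the function names `find_ssrs`, `canonical_motif` and `motif_frequencies` are illustrative assumptions, not the study's parameters or tooling.

```python
import re
from collections import Counter

# ASSUMPTION: minimum repeat counts per motif length (2-6 nt). These are
# illustrative thresholds, not the settings used in the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 4, 6: 4}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _min_rotation(motif):
    """Return the alphabetically smallest rotation of a motif."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def canonical_motif(motif):
    """Label a motif together with its rotations and reverse complement.

    AG, GA, CT and TC all receive the label 'AG/CT'.
    """
    fwd = _min_rotation(motif)
    rev = _min_rotation(reverse_complement(motif))
    first, second = sorted((fwd, rev))
    return first if first == second else f"{first}/{second}"


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for each perfect SSR in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are copies of a shorter unit ("AA", "AGAG").
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def motif_frequencies(sequences):
    """Count SSRs per canonical motif class across many sequences."""
    counts = Counter()
    for seq in sequences:
        for _, motif, _ in find_ssrs(seq):
            counts[canonical_motif(motif)] += 1
    return counts
```

Calling `motif_frequencies` on a list of unigene sequences and then `most_common()` on the result would give a ranking of motif classes comparable in form to the one reported above. This sketch only detects perfect repeats and ignores mononucleotide and compound SSRs, which dedicated tools usually handle.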