Whole-Genome Data Analysis of Sanjiang Cattle

Source: Scientia Agricultura Sinica (中国农业科学)
【Objective】 To investigate the genetic diversity of the Sanjiang cattle population and to characterize its genetic variation at the genome level. 【Method】 Total genomic DNA was extracted from 50 individuals and mixed at equal concentration and equal volume to build a pooled DNA sample. The genomic DNA was randomly sheared with a Covaris S2, fragments of 500 bp were recovered by electrophoresis, and a DNA library was constructed and sequenced on an Illumina HiSeq 2000. The short reads were aligned to the bovine reference genome (UMD 3.1) with BWA to detect genomic variants in Sanjiang cattle.
The resequencing data were analyzed with SAMtools, Picard-tools, GATK, and Reseqtools, and SNPs and indels were annotated against the Ensembl, DAVID, and dbSNP databases. 【Result】 Whole-genome resequencing yielded 77.8 Gb of sequence data, with a sequencing depth of 25.32× and a coverage of 99.31%. Sequencing produced 778 403 444 reads and 77 840 344 400 bases, of which 673 670 505 reads and 67 341 451 555 bases aligned to the reference genome (UMD 3.1), giving mapping rates of 86.55% and 86.51%, respectively. Properly paired alignments accounted for 635 242 898 reads (81.61%) and 63 512 636 924 bases (81.59%). In total, 20 477 130 SNPs and 1 355 308 indels were identified, of which 2 147 988 SNPs (2.4%) and 90 180 indels (6.7%) were novel. Of the SNPs, 989 686 (4.83%) were homozygous and 19 487 444 (95.17%) were heterozygous, a homozygous-to-heterozygous ratio of 1:19.7. There were 14 800 438 transitions and 6 680 058 transversions, for a transition/transversion (TS/TV) ratio of 2.215. A total of 727 SNPs affected splice sites, 117 changed a start codon to a non-start codon, 530 introduced a premature stop codon, and 88 changed a stop codon to a non-stop codon. There were 57 621 non-synonymous and 83 797 synonymous substitutions, a non-synonymous/synonymous ratio of 0.69. The non-synonymous SNPs were distributed across 9 017 genes, 567 of which matched genes previously reported for important economic traits. The numbers of genes related to meat quality, disease resistance, milk production, growth, and reproduction were 471, 77, 21, 10, and 8, respectively, including genes with overlapping functions.
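The ratios quoted above can be recomputed directly from the reported counts. The following is a minimal sketch in Python, not the authors' pipeline: the variable names are illustrative, `is_transition` is a hypothetical helper showing how a substitution would be classed as a transition or a transversion, and the genome length used for the depth estimate is an assumed approximate size for UMD 3.1 that the abstract does not state.

```python
# Counts copied from the abstract; all names here are illustrative.
TOTAL_READS = 778_403_444
MAPPED_READS = 673_670_505
TOTAL_BASES = 77_840_344_400
MAPPED_BASES = 67_341_451_555
TRANSITIONS = 14_800_438
TRANSVERSIONS = 6_680_058
HOM_SNPS = 989_686
HET_SNPS = 19_487_444
NONSYN = 57_621
SYN = 83_797

# ASSUMPTION: approximate UMD 3.1 assembly length in bp (not given in the abstract).
ASSUMED_GENOME_BP = 2.66e9

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """Return True for A<->G or C<->T substitutions, False for transversions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def summarize() -> dict:
    """Recompute the derived statistics from the raw counts."""
    return {
        "mean_read_length": TOTAL_BASES / TOTAL_READS,
        "read_mapping_rate": MAPPED_READS / TOTAL_READS,
        "base_mapping_rate": MAPPED_BASES / TOTAL_BASES,
        "ts_tv": TRANSITIONS / TRANSVERSIONS,
        "het_per_hom": HET_SNPS / HOM_SNPS,
        "nonsyn_per_syn": NONSYN / SYN,
        # Depth taken as mapped bases over the assumed genome length.
        "approx_depth": MAPPED_BASES / ASSUMED_GENOME_BP,
    }


summary = summarize()

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions the recomputed mapping rates, TS/TV ratio, and non-synonymous/synonymous ratio agree with the reported values to within rounding. The depth estimate depends on the assumed genome length and on how the authors defined depth.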
Among the indels, 693 180 (51.15%) were deletions and 662 148 (48.85%) were insertions; 161 198 (11.89%) were homozygous and 1 194 110 (88.11%) were heterozygous. Most variants were located in intergenic and intronic regions. Genome-wide heterozygosity (H), nucleotide diversity (Pi), and theta W were 7.6 × 10⁻³, 0.0039, and 0.0040, respectively, indicating relatively rich genetic diversity. Tajima's D for the Sanjiang cattle population was −0.06832, presumably because of unbalanced selection within the population. 【Conclusion】 This study provides genomic data to support further analysis of the genetic mechanisms underlying economic traits and the conservation of genetic diversity in the Sanjiang cattle breed.
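For readers unfamiliar with the diversity statistics named above, the sketch below shows how nucleotide diversity (Pi), Watterson's theta (theta W), and Tajima's D relate to one another under the classic formulation for individually sequenced haplotypes. It is an illustration only: the study sequenced a DNA pool, so the authors' estimators were presumably adapted to pooled data, and the function name `diversity_stats` and the four-sequence toy alignment are hypothetical.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(seqs):
    """Pi, theta W (both per site) and Tajima's D for aligned, equal-length sequences."""
    n = len(seqs)
    length = len(seqs[0])
    if n < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two sequences of equal length")

    # S: number of segregating (polymorphic) alignment columns.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # k: mean number of pairwise differences between sequences.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from the standard Tajima's D variance formula.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    theta_w = seg_sites / a1
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    tajima_d = (k - theta_w) / sqrt(variance) if variance > 0 else float("nan")

    return {
        "segregating_sites": seg_sites,
        "pi_per_site": k / length,
        "theta_w_per_site": theta_w / length,
        "tajima_d": tajima_d,
    }


# Hypothetical toy alignment: four haplotypes, ten sites, two of them polymorphic.
toy = ["ATGCATGCAT", "ATGCATGCAT", "ATGAATGCAT", "ATGAATGTAT"]
stats = diversity_stats(toy)

if __name__ == "__main__":
    print(stats)
```

Tajima's D compares the two estimates of diversity: it is close to zero when Pi and theta W agree, as they nearly do in the reported values (0.0039 versus 0.0040), and it turns negative when Pi falls below theta W.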