【Objective】 To study the genetic diversity of the Sanjiang cattle population and to characterize its population genetic variation at the genome level.

【Method】 Total genomic DNA was extracted from 50 individuals and mixed at equal concentration and equal volume to construct a pooled DNA sample. The genomic DNA was randomly sheared with a Covaris S2, DNA fragments 500 bp in length were recovered by electrophoresis, and a DNA library was constructed. Sequencing was performed on an Illumina HiSeq 2000. Short reads were aligned to the bovine reference genome (UMD 3.1) with BWA to detect genomic variants in Sanjiang cattle. The resequencing data were analyzed with SAMtools, Picard-tools, GATK and Reseqtools, and SNPs and indels were annotated using the Ensembl, DAVID and dbSNP databases.

【Result】 Whole-genome resequencing yielded 77.8 Gb of sequence data, with a sequencing depth of 25.32× and a coverage of 99.31%. Sequencing produced 778 403 444 reads and 77 840 344 400 bases; 673 670 505 reads and 67 341 451 555 bases aligned to the reference genome (UMD 3.1), giving mapping rates of 86.55% and 86.51%, respectively. The number of reads aligned as pairs was 635 242 898 (81.61%), and the number of bases aligned as pairs was 63 512 636 924 (81.59%). A total of 20 477 130 SNPs and 1 355 308 indels were identified, of which 2 147 988 SNPs (2.4%) and 90 180 indels (6.7%) were newly discovered. Of all SNPs, 989 686 (4.83%) were homozygous and 19 487 444 (95.17%) were heterozygous, a homozygous/heterozygous SNP ratio of 1:19.7. There were 14 800 438 transitions and 6 680 058 transversions, giving a transition/transversion (Ts/Tv) ratio of 2.215. In addition, 727 SNPs were located at splice sites, 117 SNPs changed a start codon into a non-start codon, 530 SNPs introduced a premature stop codon, and 88 SNPs changed a stop codon into a non-stop codon.
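The summary ratios in the Result section follow directly from the reported counts, so they can be recomputed as a consistency check. The Python sketch below does that. Two things in it are assumptions rather than statements from the abstract: `GENOME_SIZE_BP` is a rough approximation of the UMD 3.1 assembly length, and mean depth is taken as mapped bases divided by that length (the abstract does not say how depth was computed). The helper `is_transition` and the toy SNP list are hypothetical illustrations of how each SNP would be classified before transitions and transversions are tallied.

```python
# Recompute the summary ratios reported in the abstract from its raw counts.
# ASSUMPTION: approximate UMD 3.1 assembly length; not a value from the abstract.
GENOME_SIZE_BP = 2.66e9

# Counts copied from the abstract.
total_reads = 778_403_444
total_bases = 77_840_344_400
mapped_reads = 673_670_505
mapped_bases = 67_341_451_555
paired_reads = 635_242_898
paired_bases = 63_512_636_924
hom_snps = 989_686
het_snps = 19_487_444
transitions = 14_800_438
transversions = 6_680_058

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


def is_transition(ref: str, alt: str) -> bool:
    """Hypothetical helper: a transition swaps purine<->purine or pyrimidine<->pyrimidine."""
    return ref != alt and ({ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES)


# Toy (ref, alt) pairs, purely illustrative, not data from the study.
toy_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("T", "C")]
toy_ts = sum(is_transition(ref, alt) for ref, alt in toy_snps)
toy_tv = len(toy_snps) - toy_ts

summary = {
    "read_mapping_rate_%": percent(mapped_reads, total_reads),
    "base_mapping_rate_%": percent(mapped_bases, total_bases),
    "paired_read_rate_%": percent(paired_reads, total_reads),
    "paired_base_rate_%": percent(paired_bases, total_bases),
    "homozygous_%": percent(hom_snps, hom_snps + het_snps),
    "het_per_hom": round(het_snps / hom_snps, 1),
    "ts_tv": transitions / transversions,
    # ASSUMPTION: mean depth = mapped bases / assumed genome length.
    "mean_depth_x": mapped_bases / GENOME_SIZE_BP,
}

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value:.4f}")
    print(f"toy transitions={toy_ts} transversions={toy_tv}")
```

The recomputed percentages agree with those in the abstract; Ts/Tv comes out at about 2.2156, and the depth figure depends entirely on the assumed genome length.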
A total of 57 621 non-synonymous and 83 797 synonymous mutations were detected, a non-synonymous/synonymous ratio of 0.69. The non-synonymous SNPs were distributed across 9 017 genes, of which 567 matched genes previously reported to be associated with important economic traits; the numbers of genes related to meat quality, disease resistance, milk production, growth traits and reproduction were 471, 77, 21, 10 and 8, respectively, with some genes counted under more than one trait. Among the indels, there were 693 180 deletions (51.15%) and 662 148 insertions (48.85%); 161 198 indels (11.89%) were homozygous and 1 194 110 (88.11%) were heterozygous. Most variants were located in intergenic and intronic regions. The genome-wide heterozygosity (H), nucleotide diversity (Pi) and Watterson's theta (θW) of Sanjiang cattle were 7.6 × 10⁻³, 0.0039 and 0.0040, respectively, indicating relatively rich genetic diversity. The Tajima's D of the Sanjiang cattle population was -0.06832, presumably due to unbalanced selection within the population.

【Conclusion】 This study provides genomic data to support further analysis of the genetic mechanisms underlying economic traits and the conservation of genetic diversity in the Sanjiang cattle breed.
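The diversity statistics in the Result section can be illustrated with the classical estimators for n aligned sequences. Nucleotide diversity (Pi) is the mean number of pairwise differences, Watterson's theta is the number of segregating sites S divided by a1 = sum of 1/i for i = 1 to n-1, and Tajima's D scales the difference Pi - θW by its expected standard deviation. The sketch below is the textbook Tajima (1989) formula, not the authors' pipeline. The study sequenced a DNA pool rather than individual haplotypes, pooled-sequencing estimators usually correct for pool size and read depth, and the abstract does not say which estimator was used. The toy alignments are hypothetical. Because the sign of D follows the sign of Pi - θW, the reported Pi (0.0039) lying below θW (0.0040) is consistent with the slightly negative D.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs: list[str]) -> int:
    """Count alignment columns that carry more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs: list[str]) -> float:
    """Pi per sequence: average number of differing sites over all pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / len(pairs)


def watterson_theta(n: int, s: int) -> float:
    """Watterson's estimator: S divided by the harmonic number a1."""
    a1 = sum(1 / i for i in range(1, n))
    return s / a1


def tajimas_d(seqs: list[str]) -> float:
    """Classical Tajima's D for n aligned sequences of equal length."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if n < 4 or s == 0:
        # For n < 4 the variance term is zero, so D is undefined.
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    pi = mean_pairwise_differences(seqs)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignments, not data from the study.
intermediate = ["AAAAA", "AAAAT", "AAATT", "AATTT"]  # variants at intermediate frequency
singletons = ["AAAA", "TAAA", "ATAA", "AATA"]  # every variant is a singleton

if __name__ == "__main__":
    for label, seqs in [("intermediate", intermediate), ("singletons", singletons)]:
        length = len(seqs[0])
        pi = mean_pairwise_differences(seqs)
        theta_w = watterson_theta(len(seqs), segregating_sites(seqs))
        # Dividing by alignment length gives per-site values, the scale used in the abstract.
        print(f"{label}: pi/site={pi / length:.4f} "
              f"thetaW/site={theta_w / length:.4f} D={tajimas_d(seqs):.4f}")
```

With these toy inputs, the alignment whose variants sit at intermediate frequency gives a positive D, while the alignment made up entirely of singletons gives a negative D, the same direction as the value reported for Sanjiang cattle.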