The identification and analysis of the tissue specificity of genes and gene expression have a direct and profound impact on our understanding of a wide array of significant problems. Despite this importance, our understanding of gene tissue specificity is still quite fragmented and incomplete. We developed a large-scale and extensible knowledge acquisition and representation system, the Specifictome, to acquire, organize and disseminate tissue specificity knowledge for human and other model organisms, such as mouse and rat, using novel statistical machine learning and Semantic Web technologies.

Background: An important part of understanding the human genome and biology is the study of tissue-specific gene expression and regulation, which is a complex phenomenon. It results from a large number of interacting factors and can be exhibited in a number of ways, including developmental stages, gene expression levels, a gene's structural characteristics (promoters, etc.) and gene regulatory and interaction networks. This inherent complexity makes tissue specificity research a very challenging problem. Several tissue-specific databases, such as TiGER, TiProD, TissueInfo, TissueDistributionDBs and TiSGeD, provide web services to extract tissue-specific (TS) and housekeeping (HK) genes and their genomic features using predefined threshold values. However, there is no widely accepted criterion for deciding, from expression patterns alone, whether a gene is specific to a tissue.

Methods: In this research, we 1) uncover the co-expression patterns of TS genes using constrained Bayesian mixture models, 2) discover and evaluate the significant sequence patterns of TS genes using Bayesian factor analysis, 3) build stable and reliable TS gene regulatory networks, and 4) develop a comprehensive knowledge atlas for the expression and regulation of TS genes using Semantic Web technologies.

Results: We have integrated three types of expression data sets (microarray, EST and SAGE) to identify TS and housekeeping genes, and used the PubMed literature to validate them. In addition, some significant DNA-binding motifs were discovered using our motif discovery pipeline. A prototype of the knowledge platform has been developed.

Conclusions: Tissue specificity of gene expression and regulation is a fundamental phenomenon in cell division and development. The knowledge and algorithms for gene tissue specificity prediction could be used in the evaluation of individual techniques and of the knowledge organization framework as a whole. The knowledge atlas platform can be used not only to acquire information about TS genes, but also as a portal for sharing knowledge within the bioinformatics research community.
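
To make the threshold-based extraction of TS and HK genes mentioned in the Background concrete, the following is a minimal sketch. It is not the method used in this work, which relies on constrained Bayesian mixture models. It assumes the widely used tau specificity index as a stand-in score. The function names and the cutoffs `ts_cutoff` and `hk_cutoff` are hypothetical, chosen only for illustration.

```python
from typing import List


def tau_index(expression: List[float]) -> float:
    """Tau specificity index for one gene across tissues.

    0.0 means uniform expression in all tissues; 1.0 means expression
    in a single tissue. Input values are non-negative expression levels.
    """
    if len(expression) < 2:
        raise ValueError("need expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        # Gene is not expressed anywhere; treat it as non-specific.
        return 0.0
    return sum(1.0 - x / peak for x in expression) / (len(expression) - 1)


def classify_gene(
    expression: List[float],
    ts_cutoff: float = 0.85,  # hypothetical cutoff, not from the paper
    hk_cutoff: float = 0.15,  # hypothetical cutoff, not from the paper
) -> str:
    """Label a gene as 'TS', 'HK' or 'intermediate' by fixed thresholds."""
    tau = tau_index(expression)
    if tau >= ts_cutoff:
        return "TS"
    # A housekeeping gene should be expressed in every tissue.
    if tau <= hk_cutoff and min(expression) > 0:
        return "HK"
    return "intermediate"
```

Different cutoff choices can move a gene between classes, which illustrates why fixed thresholds alone do not give a widely accepted criterion.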