【Affiliation】
Department of Health Statistics, Shanxi Medical University
【Abstract】
Objective. To detect genes containing both rare and common variants in next-generation sequencing data with longitudinally measured phenotypic traits, using penalized generalized estimating equations (PGEE) and the penalized quadratic inference function (PQIF).
Methods. We simulated longitudinal binary hypertension data derived from the GAW18 (Genetic Analysis Workshop 18) genetic data. Weighted sum statistics were constructed to collapse both common and rare variants within each gene and were then included in two models, PGEE and PQIF. We compared the two penalized methods in terms of true positive (TP) rate, false positive (FP) rate, and mean squared error (MSE). Both methods were also applied to the original GAW18 data.
Results. Compared with the unpenalized method, the penalized approaches yielded much smaller total MSEs. As the sample size increased, the estimation accuracy of both PGEE and PQIF improved significantly. Both methods selected true genes such as MAP4, TNN, NRF1, FLT3, and ZFP37 with high accuracy. PGEE had higher TP rates than PQIF, but also higher FP rates. Overall, PQIF performed better than PGEE, and it still performed relatively well when the working correlation structure was misspecified. However, PQIF is more sensitive to sample size and sometimes fails to converge when the sample size is very small.
Conclusion. PQIF has a lower false selection rate than PGEE; the higher true-positive selection rate of PGEE may be a consequence of its higher false selection rate. Although the difference between the two methods is not significant at large sample sizes, a conservative recommendation is to use PQIF when the sample size is small, to avoid false selections. In summary, penalized methods provide a promising and powerful tool for analyzing next-generation sequencing data involving longitudinal traits.
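The abstract does not specify how the weighted sum statistic is built. The following is a minimal Python sketch, assuming a Madsen–Browning-style weight computed from the minor-allele frequency of all subjects (the original Madsen–Browning test estimates frequencies in unaffected subjects only). The function name `weighted_sum_scores` is hypothetical. Because genotypes do not change between visits, each subject's score would simply be repeated across that subject's longitudinal records.

```python
import numpy as np


def weighted_sum_scores(genotypes):
    """Collapse all variants of one gene into a single score per subject.

    Hypothetical helper (assumed Madsen-Browning-style weighting).
    genotypes: (n_subjects, n_variants) minor-allele counts coded 0, 1, 2.
    Returns an array of shape (n_subjects,).
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    # Smoothed minor-allele frequency; the +1 / +2 keeps weights finite for monomorphic sites.
    q = (g.sum(axis=0) + 1.0) / (2.0 * n + 2.0)
    # Rarer variants get a smaller w, so each of their minor alleles counts more (1 / w).
    w = np.sqrt(n * q * (1.0 - q))
    return (g / w).sum(axis=1)
```

One gene-level score per subject then enters the longitudinal model as a covariate, so common and rare variants in the same gene share a single coefficient.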
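PQIF estimates are obtained by minimizing a quadratic inference function plus a sparsity penalty. The sketch below only evaluates that objective, Q(beta) + N * sum_j p_lam(|beta_j|), for a logistic marginal model. Several things here are assumptions, because the abstract names neither the penalty nor the basis: a balanced design (every subject observed at the same T visits), an exchangeable working correlation represented by the basis matrices M1 = I and M2 (ones off the diagonal), and the SCAD penalty with a = 3.7. All function names are hypothetical. Minimizing the objective over beta (for example through a local quadratic approximation of the penalty) is not shown. For simplicity the sketch also penalizes every coefficient, whereas an intercept would normally be left unpenalized.

```python
import numpy as np


def scad_penalty(beta, lam, a=3.7):
    """Elementwise SCAD penalty p_lam(|beta|) of Fan and Li (2001); assumed penalty."""
    b = np.abs(np.asarray(beta, dtype=float))
    linear = lam * b                                               # |beta| <= lam
    quadratic = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))  # lam < |beta| <= a*lam
    constant = np.full_like(b, lam**2 * (a + 1) / 2)               # |beta| > a*lam
    return np.where(b <= lam, linear, np.where(b <= a * lam, quadratic, constant))


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def penalized_qif(beta, X, y, lam, a=3.7):
    """Penalized QIF objective for a logistic marginal model (hypothetical helper).

    X: (N, T, p) covariates, e.g. gene-level weighted sum scores.
    y: (N, T) binary outcomes. Assumes a balanced design and an
    exchangeable working correlation expressed through two basis matrices.
    """
    N, T, p = X.shape
    M1 = np.eye(T)                      # basis matrix 1: identity
    M2 = np.ones((T, T)) - np.eye(T)    # basis matrix 2: exchangeable off-diagonal part
    g = np.empty((N, 2 * p))            # extended score vector g_i of each subject
    for i in range(N):
        mu = expit(X[i] @ beta)         # marginal means, shape (T,)
        sd = np.sqrt(mu * (1.0 - mu))   # diagonal of A_i^{1/2}
        r = (y[i] - mu) / sd            # A_i^{-1/2} (y_i - mu_i)
        Dt = X[i].T * sd                # D_i^T A_i^{-1/2} = X_i^T A_i^{1/2} under the logit link
        g[i] = np.concatenate([Dt @ M1 @ r, Dt @ M2 @ r])
    g_bar = g.mean(axis=0)
    C = g.T @ g / N                     # empirical second-moment matrix of g_i
    Q = N * g_bar @ np.linalg.solve(C, g_bar)
    return Q + N * scad_penalty(beta, lam, a).sum()
```

The QIF part avoids estimating the correlation parameter directly: the basis matrices let the data weight the two score components through C. This is one reason QIF-type estimators are often described as robust to a misspecified working correlation.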
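The three evaluation criteria can be computed from the coefficient estimates of the simulation replicates. The abstract does not give exact definitions, so the following sketch assumes these:

- A gene counts as selected when its estimate satisfies |beta_hat| > tol.
- The TP rate is the share of truly associated genes that are selected, pooled over replicates.
- The FP rate is the share of null genes that are selected, pooled over replicates.
- The total MSE is the sum over coefficients of each coefficient's mean squared error across replicates.

The function name `selection_metrics` is hypothetical.

```python
import numpy as np


def selection_metrics(beta_hats, beta_true, tol=1e-8):
    """TP rate, FP rate and total MSE over simulation replicates (assumed definitions).

    beta_hats: (n_reps, p) estimated coefficients, one row per replicate.
    beta_true: (p,) true coefficients; nonzero entries mark causal genes.
    """
    B = np.atleast_2d(np.asarray(beta_hats, dtype=float))
    beta_true = np.asarray(beta_true, dtype=float)
    selected = np.abs(B) > tol                       # (n_reps, p) selection indicators
    causal = beta_true != 0
    tp_rate = selected[:, causal].mean()             # share of causal genes selected
    fp_rate = selected[:, ~causal].mean()            # share of null genes selected
    total_mse = ((B - beta_true) ** 2).mean(axis=0).sum()
    return float(tp_rate), float(fp_rate), float(total_mse)
```

With true coefficients [1, 0, 0] and two replicates [0.9, 0, 0.1] and [1.1, 0, 0], this gives a TP rate of 1.0, an FP rate of 0.25 (one of four null estimates is nonzero), and a total MSE of 0.015.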