Genetic risk prediction in a small cohort of healthy adults in Atlanta

Published online by Cambridge University Press:  26 February 2013

JING ZHAO
Affiliation:
Center for Integrative Genomics, School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
DALIA ARAFAT
Affiliation:
Center for Integrative Genomics, School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
KENNETH L. BRIGHAM
Affiliation:
Center for Health Discovery and Well Being, Emory University Midtown Hospital, Atlanta, GA 30308, USA
GREG GIBSON*
Affiliation:
Center for Integrative Genomics, School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
*Corresponding author: School of Biology, Georgia Tech, 310 Ferst Drive, Atlanta, GA 30332, USA. Tel: (404) 385-2343. E-mail: greg.gibson@biology.gatech.edu

Summary

Compared with single markers, polygenic scores that evaluate the joint effects of multiple trait-associated variants are more effective in explaining the variance of traits and the risk of diseases. In total, 182 adults from the Emory-Georgia Tech Center for Health Discovery and Well Being (CHDWB) study were genotyped to investigate the contributions of common variants to three traits (height, BMI and serum triglycerides) and three diseases (coronary artery disease (CAD), type 2 diabetes (T2D) and asthma). Weighted and simple allelic sum polygenic scores were contrasted for their association with the quantitative traits, and with the Framingham risk scores for CAD and T2D. Although the cohort is two to three orders of magnitude smaller than typical discovery cohorts, we were able to detect significant associations and to explain up to 5% of the variance of the traits with the genetic risk scores, despite a strong influence of outliers. An unexpected finding was that CAD-associated single nucleotide polymorphisms (SNPs) explain a significant amount of the variation in total serum cholesterol. Forward step-wise sequential addition of SNPs into the regression model showed that the top-ranked SNPs explain a large proportion of the variance, while inclusion of gender and ethnicity also affects the performance of polygenic scores.
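The two score types contrasted in the Summary can be illustrated with a minimal sketch. It assumes each individual's genotype is coded as a list of trait-increasing allele counts (0, 1 or 2 per SNP) and that per-allele effect sizes come from published discovery studies; the function names and the use of a simple one-predictor regression for the variance explained are illustrative assumptions, not the authors' pipeline.

```python
from statistics import mean


def allelic_sum(genotypes):
    """Simple allelic sum: total count of trait-increasing alleles across SNPs."""
    return sum(genotypes)


def weighted_sum(genotypes, weights):
    """Weighted score: allele counts multiplied by per-allele effect sizes."""
    if len(genotypes) != len(weights):
        raise ValueError("genotypes and weights must have the same length")
    return sum(g * w for g, w in zip(genotypes, weights))


def r_squared(x, y):
    """Fraction of the variance in y explained by a linear regression on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant score or trait explains no variance
    return sxy ** 2 / (sxx * syy)
```

Computing one score per individual and passing the scores and trait values to `r_squared` gives the kind of "variance explained" figure quoted above, here without any covariates such as gender or ethnicity.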

Information

Type
Research Papers
Copyright
Copyright © Cambridge University Press 2013 

Table 1. Variance explained by genetic risk scores


Fig. 1. Left: The percentage of variance explained by the model when SNPs are added sequentially in the order of their effect sizes (height, BMI and cholesterol-CAD SNPs from top to bottom, (a), (c) and (e)). Right: The percentage of variance explained by models in which SNPs are added in random order (height, BMI and cholesterol-CAD SNPs from top to bottom, (b), (d) and (f)), averaged over 100 permutations. SA refers to models with just the sum of alleles score, while eg_SA refers to models that additionally fit ethnicity and gender as covariates with the sum of alleles score. WS refers to models with the sum of weighted allelic effects, while eg_WS and pre_WS refer to weighted allelic sum models that additionally include ethnicity and gender as covariates, taken from the CHDWB cohort or as population averages, respectively. LR refers to likelihood ratio models, with or without pre-test probability as a covariate. Variance explained (%) refers only to the genetic contribution in each model.
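The comparison in Fig. 1 between effect-size ordering and random ordering can be sketched as follows. This is a simplified illustration for the SA model only: it assumes that "adding a SNP" means adding its allele count to a running sum-of-alleles score and refitting a one-predictor regression, and it omits the covariates and the weighted and likelihood ratio variants. All function names, and the `seed` parameter, are hypothetical.

```python
import random
from statistics import mean


def r_squared(x, y):
    """Fraction of the variance in y explained by a linear regression on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant score or trait explains no variance
    return sxy ** 2 / (sxx * syy)


def ranked_order(effect_sizes):
    """SNP indices sorted from largest to smallest absolute effect size."""
    return sorted(range(len(effect_sizes)),
                  key=lambda i: abs(effect_sizes[i]), reverse=True)


def variance_curve(genotype_rows, phenotype, order):
    """Percentage of variance explained after adding each SNP in `order`."""
    scores = [0.0] * len(genotype_rows)  # running allelic sum per individual
    curve = []
    for snp in order:
        scores = [s + row[snp] for s, row in zip(scores, genotype_rows)]
        curve.append(100.0 * r_squared(scores, phenotype))
    return curve


def random_order_curve(genotype_rows, phenotype, n_snps, n_perm=100, seed=0):
    """Average variance curve over `n_perm` random SNP orderings."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    totals = [0.0] * n_snps
    for _ in range(n_perm):
        order = list(range(n_snps))
        rng.shuffle(order)
        for k, value in enumerate(variance_curve(genotype_rows, phenotype, order)):
            totals[k] += value
    return [t / n_perm for t in totals]
```

Once every SNP has been added, both curves end at the same value, because the full score does not depend on the order of addition; the curves differ only in how quickly they approach it.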


Fig. 2. Linear regression of observed height on the sum of height-increasing alleles in males. Red symbols, Caucasians; blue, American Indian; green, African Americans; purple, Asians. Asterisks, individuals who are taller than their genetic information would indicate. Red line, regression fit for all men. Green line, regression fit for men excluding those who are taller than expected.


Fig. 3. Linear regression of total triglyceride (TG) levels on the sum of increasing alleles. Dots, TG levels greater than 100 mg/ml; triangles, TG levels lower than 100 mg/ml. Red line, linear regression fit for all individuals (P = 0·0053, R² = 0·042). Blue line, linear regression fit for individuals with TG levels higher than 100 mg/ml (P = 0·5453, R² = 0·005). Green line, linear regression fit for individuals with lower TG levels (P = 0·0169, R² = 0·053).
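The three fits described in the Fig. 3 caption (all individuals, and the groups above and below the triglyceride threshold) can be sketched as below. The function names are hypothetical, the threshold of 100 is taken from the caption, and assigning a value exactly equal to the threshold to the lower group is an assumption, since the caption does not say. The sketch reports slope, intercept and R² only; it does not compute P values.

```python
from statistics import mean


def linear_fit(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, r2)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("predictor has no variance; slope is undefined")
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = 0.0 if syy == 0 else sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


def stratified_fits(scores, levels, threshold=100.0):
    """Fit all individuals, then those above and at-or-below the threshold."""
    pairs = list(zip(scores, levels))
    groups = {
        "all": pairs,
        "high": [p for p in pairs if p[1] > threshold],
        # Assumption: a level exactly equal to the threshold counts as "low".
        "low": [p for p in pairs if p[1] <= threshold],
    }
    fits = {}
    for name, group in groups.items():
        if len(group) >= 3:  # skip groups too small for a meaningful fit
            fits[name] = linear_fit([s for s, _ in group], [v for _, v in group])
    return fits
```

Comparing the R² of the "high" and "low" fits with that of the "all" fit shows whether an overall association is carried mainly by one subgroup, which is the pattern the caption reports.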


Fig. 4. (a) Regression of log10(Framingham Risk Score for heart disease) against genotypic log likelihood in males. Exclusion of five older Caucasian males, indicated with asterisks, elevates the regression from non-significant (P = 0·18, R² = 0·03) to nominally significant (P = 0·0065, R² = 0·13). (b) Logistic regression of Framingham risk status on the sum of CAD risk alleles in all study participants shows a significant association (P = 0·0221, R² = 0·027). Red symbols, Caucasians; blue, American Indian; green, African Americans; purple, Asians. Circles, females; triangles, males.