Genetically Adjusted Propensity Score Matching: A Comparison to Discordant MZ Twin Models

Published online by Cambridge University Press:  04 May 2022

Ian A. Silver*
Affiliation:
Law and Justice Department, Rowan University, Glassboro, NJ, USA; Corrections Institute, University of Cincinnati, Cincinnati, OH, USA
Hexuan Liu
Affiliation:
School of Criminal Justice, University of Cincinnati, Cincinnati, OH, USA
Joseph L. Nedelec
Affiliation:
School of Criminal Justice, University of Cincinnati, Cincinnati, OH, USA
*Author for correspondence: Ian A. Silver, Email: silveria@rowan.edu

Abstract

Discordant monozygotic (MZ) twin methodologies are considered one of the foremost statistical approaches for estimating the influence of environmental factors on phenotypic variance. Limitations of the discordant MZ twin approach, however, prevent the estimation of particular relationships and the adjustment of estimates for the confounding influence of gene-nonshared environment interactions. Recent advancements in molecular genetics can provide the opportunity to address these limitations. The current study reviews an alternative technique, genetically adjusted propensity scores (GAPS) matching, that integrates observed genetic and environmental information to adjust for the confounding of these factors in nonkin individuals. Simulations and a real data example were used to compare the GAPS matching approach to the discordant MZ twin method. Although the results of the simulated comparisons demonstrated that the discordant MZ twin approach remains the more robust statistical technique to adjust for shared environmental and genetic factors, GAPS matching could, under certain conditions, represent a viable alternative when MZ twin samples are unavailable. Overall, the findings suggest that GAPS matching can potentially provide an alternative to the discordant MZ twin approach when limited variation exists between identical twin pairs. Moreover, the ability to adjust for gene-nonshared environment interactions represents a potential advancement associated with the GAPS approach. The limitations of the approach, as well as of polygenic risk scores, are also discussed.

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press on behalf of International Society for Twin Studies

Fig. 1. Visual depiction of specified variation in the simulated treatment variable (X) without interactions. Note: A represents the genetic effect, pa represents the proportion of variation in X predicted by A, C represents the shared environment, pc represents the proportion of variation in X predicted by C, E represents the nonshared environment, pe represents the proportion of variation in X predicted by E, X represents a dichotomous treatment construct. The double-headed arrow represents the residual variation in the specification of X, where prv represents the proportion of residual variation.
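
The variance specification in Fig. 1 can be illustrated with a minimal simulation sketch. This is not the authors' simulation code (their scripts are in R and are not reproduced here); it assumes that A, C, E and the residual are each a single independent standard normal draw, that the proportions pa, pc, pe and prv apply to a latent liability score, and that X is dichotomized at zero. The function name `simulate_treatment` and the example proportions (borrowed from the Example 1 specification in the Fig. 7 caption) are illustrative choices.

```python
import math
import random


def simulate_treatment(n, pa, pc, pe, prv, seed=0):
    """Return (liability, x) lists for n simulated individuals.

    pa, pc, pe and prv are the proportions of variation attributed to
    A, C, E and the residual; they are expected to sum to 1.
    """
    if abs(pa + pc + pe + prv - 1.0) > 1e-9:
        raise ValueError("variance proportions must sum to 1")
    rng = random.Random(seed)
    liability, x = [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)  # genetic factor (A)
        c = rng.gauss(0.0, 1.0)  # shared environment (C)
        e = rng.gauss(0.0, 1.0)  # nonshared environment (E)
        r = rng.gauss(0.0, 1.0)  # residual variation
        # Weighting each independent component by the square root of its
        # proportion gives it that share of the liability variance.
        score = (math.sqrt(pa) * a + math.sqrt(pc) * c
                 + math.sqrt(pe) * e + math.sqrt(prv) * r)
        liability.append(score)
        # Assumption: the dichotomous X is a cut of the liability at zero.
        x.append(1 if score > 0 else 0)
    return liability, x


# Example 1 proportions: 45% A, 12.8% C, 38.2% E, 4% residual.
liability, x = simulate_treatment(20000, pa=0.45, pc=0.128, pe=0.382, prv=0.04)
mean = sum(liability) / len(liability)
variance = sum((v - mean) ** 2 for v in liability) / (len(liability) - 1)
print(f"liability variance: {variance:.3f}, share treated: {sum(x) / len(x):.3f}")
```

Because the four components are independent and the proportions sum to 1, the liability has unit variance in expectation, so each squared weight can be read directly as a proportion of variation.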

Fig. 2. Visual depiction of specified variation in the simulated treatment variable (X) without interactions

Fig. 3. Visual depiction of specified variation in the simulated treatment variable (X) with interactions

Fig. 4. Visual depiction of specified variation in the simulated dependent variable (Y) without interactions

Fig. 5. Visual depiction of specified variation in the simulated dependent variable (Y) with interactions

Fig. 6. Visual examples of the analytical matching approach. Note: Percentage values represent the amount of variation in X predicted by A, C and E used to predict values on X. For example, .25 of E when E predicted 38.2% of the variation in X indicates that 9.56% of the variation of X (contributed by the nonshared environment) is adjusted for in the matching process. A represents the genetic effect, C represents the shared environment, E represents the nonshared environment and X represents a dichotomous independent variable. Y represents the dependent variable, and Δ or b represents the difference in the outcomes between the treatment (X = 1) and control cases (X = 0). The matching procedure described does not fully emulate the discordant MZ twin approach, as identical twins would share the same genetic material. A predicted probability was calculated using a logistic regression with all, or a subset, of the components of A (x1-x4), C (x5-x8) or E (x9-x12; see Appendix A and R-scripts) serving as the independent variables and X serving as the dependent variable. The resulting predicted probability was then used to match participants.
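
The matching procedure described in the note to Fig. 6 can be sketched as follows. This is a hedged illustration, not the authors' R implementation: it assumes one observed covariate each for A, C and E (rather than x1-x12), fits the logistic regression with a simple gradient-ascent routine, applies the .05 caliper on the predicted-probability scale, and matches greedily 1:1 without replacement. The helper names `fit_logistic`, `predict` and `match_nearest`, and all simulation coefficients, are hypothetical.

```python
import math
import random


def predict(w, row):
    """Predicted probability of X = 1 from weights [intercept, b1, ...]."""
    z = w[0] + sum(wj * v for wj, v in zip(w[1:], row))
    z = max(-30.0, min(30.0, z))  # guard against overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=200):
    """Fit a logistic regression by batch gradient ascent (approximate)."""
    w = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, label in zip(rows, labels):
            err = label - predict(w, row)
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def match_nearest(scores, treated, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement."""
    controls = [i for i, t in enumerate(treated) if t == 0]
    used, pairs = set(), []
    for i, t in enumerate(treated):
        if t != 1:
            continue
        best, best_gap = None, caliper
        for j in controls:
            gap = abs(scores[i] - scores[j])
            if j not in used and gap <= best_gap:
                best, best_gap = j, gap
        if best is not None:  # treated cases with no control in range are dropped
            used.add(best)
            pairs.append((i, best))
    return pairs


# Hypothetical confounded data: A, C and E affect both X and Y; true slope is 1.00.
rng = random.Random(1)
rows, x, y = [], [], []
for _ in range(600):
    a, c, e = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    p_treat = 1.0 / (1.0 + math.exp(-(0.8 * a + 0.4 * c + 0.7 * e)))
    xi = 1 if rng.random() < p_treat else 0
    rows.append([a, c, e])
    x.append(xi)
    y.append(1.0 * xi + a + c + e + rng.gauss(0, 1))

scores = [predict(fit_logistic(rows, x), row) for row in rows[:0]]  # placeholder, replaced below
weights = fit_logistic(rows, x)
scores = [predict(weights, row) for row in rows]
pairs = match_nearest(scores, x)
# With a single binary regressor, the OLS slope of Y on X in the matched
# sample equals the difference in group means, i.e. the mean pair difference.
matched_slope = sum(y[i] - y[j] for i, j in pairs) / len(pairs)
print(f"matched pairs: {len(pairs)}, post-matching slope: {matched_slope:.2f}")
```

Dropping covariates from `rows` before fitting (for example, passing only A and a fraction of the E information) mimics the partial-adjustment scenarios the figures compare; how closely the estimate then approaches 1.00 depends on the simulated data.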

Fig. 7. (Example 1). Slope coefficients of Y regressed on X differentially adjusting for the confounding influence of genetic, shared environmental and nonshared environmental effects. Note: A = genetics, E = nonshared environment, C = shared environment. The true association between Y and X is 1.00 (Starting N = 10,000). For the current example, variation in X was specified as 45% genetic, 38.2% nonshared environment, 12.8% shared environment and 4% error. Genetic, shared environmental and nonshared environmental factors equally contributed to the variation in Y. All estimates were derived from a postmatching OLS model. Matching was completed using nearest neighbor matching with a caliper of .05. The orange area provides the specifications that performed further away from the true point estimate (1.00) than the approximation of discordant MZ twin methods, and the green area provides the specifications that were closer to the true point estimate (1.00) than the approximation of discordant MZ twin methods. The proportions on the y-axis represent the proportion of the variation in X contributed by the specified component (A, C or E) that is adjusted for by the model. For example, .25 * E indicates that 9.56% of the variation of X (contributed by the nonshared environment) is adjusted for in the model.

Table 1. Demonstration of the effects of missing heritability on Example 1 (simulation starting N = 10,000)

Fig. 8. (Example 1). Slope coefficients of Y regressed on X with low and high genetic contribution to the variation in X. Note: A = genetics, E = nonshared environment, C = shared environment. The true association between Y and X is 1.00 (Starting N = 10,000). For the current example, low genetic contribution: 10% genetic, 64.5% nonshared environment, 21.5% shared environment and 4% error. High genetic contribution: 80% genetic, 12% nonshared environment, 4% shared environment and 4% error. Genetic, shared environmental and nonshared environmental factors equally contributed to the variation in Y. All estimates were derived from a postmatching OLS model. Matching was completed using nearest neighbor matching with a caliper of .05. The orange area provides the specifications that performed further away from the true point estimate (1.00) than the approximation of discordant MZ twin methods, and the green area provides the specifications that were closer to the true point estimate (1.00) than the approximation of discordant MZ twin methods. The proportions on the y-axis represent the proportion of the variation in X contributed by the specified component (A, C or E) that is adjusted for by the model.

Fig. 9. (Example 2). Slope coefficients of Y regressed on X differentially adjusting for the confounding influence of genetic, shared environmental and nonshared environmental effects. Note: A = genetics, E = nonshared environment, C = shared environment. The true association between Y and X is 1.00 (Starting N = 10,000). For the current example, variation in X was specified as 36% genetic, 30.6% nonshared environment, 12.8% shared environment, 16.7% genetic*nonshared environment and 4% error. Genetic, shared environmental and nonshared environmental factors equally contributed to the variation in Y. All estimates were derived from a postmatching OLS model. Matching was completed using nearest neighbor matching with a caliper of .05. The orange area provides the specifications that performed further away from the true point estimate (1.00) than the approximation of discordant MZ twin methods, and the green area provides the specifications that were closer to the true point estimate (1.00) than the approximation of discordant MZ twin methods. The proportions on the y-axis represent the proportion of the variation in X contributed by the specified component (A, C or E) that is adjusted for by the model. For example, .25 * E indicates that 7.7% of the variation of X (attributed to the nonshared environment) is adjusted for in the model.
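
The ".25 * E" arithmetic in the note above is a simple product: the fraction of a component that enters the matching model times that component's share of the variation in X. A minimal sketch, assuming the Example 2 shares exactly as printed in the caption; the function name `adjusted_share` and the grid of fractions are illustrative and are not taken from the figure.

```python
def adjusted_share(fraction, component_pct):
    """Percent of the variation in X adjusted for when `fraction` of a
    component with share `component_pct` (in percent) is matched on."""
    return fraction * component_pct


# Example 2 shares of the variation in X, in percent, as given in the caption.
example2 = {"A": 36.0, "E": 30.6, "C": 12.8, "AxE": 16.7, "error": 4.0}

# Illustrative grid of fractions of the nonshared environment (E).
for fraction in (0.25, 0.50, 0.75, 1.00):
    share = adjusted_share(fraction, example2["E"])
    print(f"{fraction:.2f} * E adjusts for {share:.2f}% of the variation in X")

# The printed shares are rounded, so their total is only approximately 100.
total = sum(example2.values())
print(f"sum of printed shares: {total:.1f}%")
```

For .25 * E the product is 0.25 × 30.6 = 7.65, which rounds to the 7.7% quoted in the caption.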

Fig. 10. (Example 2). Slope coefficients of Y regressed on X with low and high genetic contribution to the variation in X. Note: A = genetics, E = nonshared environment, C = shared environment. The true association between Y and X is 1.00 (Starting N = 10,000). For the current example, low genetic contribution: 8% genetic, 51.6% nonshared environment, 21.5% shared environment, 14.9% genetic*nonshared environment, and 4% error. High genetic contribution: 64% genetic, 9.6% nonshared environment, 4% shared environment, 18.4% genetic*nonshared environment, and 4% error. Genetic, shared environmental, and nonshared environmental factors equally contributed to the variation in Y. All estimates were derived from a postmatching OLS model. Matching was completed using nearest neighbor matching with a caliper of .05. The orange area provides the specifications that performed further away from the true point estimate (1.00) than the approximation of discordant MZ twin methods, and the green area provides the specifications that were closer to the true point estimate (1.00) than the approximation of discordant MZ twin methods. The proportions on the y-axis represent the proportion of the variation in X contributed by the specified component (A, C, or E) that is adjusted for by the model.

Fig. 11. (Example 3). Slope coefficients of Y regressed on X differentially adjusting for the confounding influence of genetic, shared environmental and nonshared environmental effects. Note: A = genetics, E = nonshared environment, C = shared environment. The true association between Y and X is 1.00 (Starting N = 10,000). For the current example, variation in X was specified as 27% genetic, 30.6% nonshared environment, 10.2% shared environment, 16.7% genetic*nonshared environment, 11.6% genetic*shared environment and 4% error. Genetic, shared environmental and nonshared environmental factors equally contributed to the variation in Y. All estimates were derived from a postmatching OLS model. Matching was completed using nearest neighbor matching with a caliper of .05. The orange area provides the specifications that performed further away from the true point estimate (1.00) than the approximation of discordant MZ twin methods, and the green area provides the specifications that were closer to the true point estimate (1.00) than the approximation of discordant MZ twin methods. The proportions on the y-axis represent the proportion of the variation in X contributed by the specified component (A, C or E) that is adjusted for by the model. For example, .25 * E indicates that 7.7% of the variation of X (attributed to the nonshared environment) is adjusted for in the model.

Fig. 12. (Example 3). Slope coefficients of Y regressed on X with low and high genetic contribution to the variation in X. Note: A = genetics, E = nonshared environment, C = shared environment. The true association between Y and X is 1.00 (Starting N = 10,000). For the current example, low genetic contribution: 6% genetic, 51.6% nonshared environment, 17.2% shared environment, 14.9% genetic*nonshared environment, 6.3% genetic*shared environment and 4% error. High genetic contribution: 48% genetic, 9.6% nonshared environment, 3.2% shared environment, 18.4% genetic*nonshared environment, 16.8% genetic*shared environment and 4% error. Genetic, shared environmental and nonshared environmental factors equally contributed to the variation in Y. All estimates were derived from a postmatching OLS model. Matching was completed using nearest neighbor matching with a caliper of .05. The orange area provides the specifications that performed further away from the true point estimate (1.00) than the approximation of discordant MZ twin methods, and the green area provides the specifications that were closer to the true point estimate (1.00) than the approximation of discordant MZ twin methods. The proportions on the y-axis represent the proportion of the variation in X contributed by the specified component (A, C or E) that is adjusted for by the model.

Table 2. Estimating the association between completing 4 years of college and personal earnings using the full Add Health sample, the GAPS matched sample and a sibling comparison model

Supplementary material: PDF

Silver et al. supplementary material