Exploring causality from observational data: An example assessing whether religiosity promotes cooperation

Published online by Cambridge University Press:  27 June 2023

Daniel Major-Smith*
Affiliation:
Centre for Academic Child Health, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol BS8 2BN, UK

Abstract

Causal inference from observational data is notoriously difficult, and relies upon many unverifiable assumptions, including no confounding or selection bias. Here, we demonstrate how to apply a range of sensitivity analyses to examine whether a causal interpretation from observational data may be justified. These methods include: testing different confounding structures (as the assumed confounding model may be incorrect), exploring potential residual confounding and assessing the impact of selection bias due to missing data. We aim to answer the causal question ‘Does religiosity promote cooperative behaviour?’ as a motivating example of how these methods can be applied. We use data from the parental generation of a large-scale (n = approximately 14,000) prospective UK birth cohort (the Avon Longitudinal Study of Parents and Children), which has detailed information on religiosity and potential confounding variables, while cooperation was measured via self-reported history of blood donation. In this study, there was no association between religious belief or affiliation and blood donation. Religious attendance was positively associated with blood donation, but could plausibly be explained by unmeasured confounding. In this population, evidence that religiosity causes blood donation is suggestive, but rather weak. These analyses illustrate how sensitivity analyses can aid causal inference from observational research.

Information

Type
Registered Report
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press

Figure 1. Directed acyclic graph describing different causal structures and how they can be encoded in such causal graphs. The arrows represent the direction of causality; for instance, the arrow from ‘Exposure’ to ‘Outcome’ indicates that the exposure causes the outcome. A confounder is a variable which causes both the exposure and the outcome, as indicated by arrows from ‘Confounder’ to both ‘Exposure’ and ‘Outcome’; as information can ‘flow’ from the exposure to the outcome via confounders, it is necessary to adjust for all confounders – which blocks these back-door paths – in order to obtain an unbiased causal estimate of the exposure–outcome association. A mediator is a variable which is caused by the exposure (arrow from ‘Exposure’ to ‘Mediator’), which in turn causes the outcome (arrow from ‘Mediator’ to ‘Outcome’); as mediators are part of the pathway by which the exposure causes the outcome, adjusting for a mediator will result in a biased estimate of the exposure–outcome association. A collider is a variable which is caused by both the exposure and the outcome (arrows from both ‘Exposure’ and ‘Outcome’ to ‘Collider’). It is not necessary to adjust for a collider, as colliders ‘block’ the flow of information between other variables; adjusting for a collider, however, opens these pathways, potentially resulting in biased associations. Using directed acyclic graphs to represent the assumed causal structure of the data can help identify whether covariates are confounders, mediators or colliders, and therefore which variables to statistically adjust for to return an unbiased causal estimate, given the assumptions embedded in the causal graph.
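
The three structures described in this caption can be illustrated with a small simulation. The Python sketch below is not taken from the study: the linear data-generating model, the coefficients and all variable names are assumptions chosen purely for illustration. It simulates one confounder, one mediator and one collider, and compares the exposure coefficient under different adjustment sets.

```python
import numpy as np


def ols_slope(y, *predictors):
    """Return the OLS coefficient on the first predictor (intercept included)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


rng = np.random.default_rng(0)
n = 100_000

# Hypothetical effect sizes (assumptions, not estimates from the paper).
direct_effect = 0.3            # Exposure -> Outcome
exposure_to_mediator = 0.5     # Exposure -> Mediator
mediator_to_outcome = 0.4      # Mediator -> Outcome
total_effect = direct_effect + exposure_to_mediator * mediator_to_outcome  # 0.5

# Data generated according to the assumed causal graph.
confounder = rng.normal(size=n)
exposure = confounder + rng.normal(size=n)
mediator = exposure_to_mediator * exposure + rng.normal(size=n)
outcome = (direct_effect * exposure + mediator_to_outcome * mediator
           + confounder + rng.normal(size=n))
collider = exposure + outcome + rng.normal(size=n)

# Exposure coefficient under four adjustment sets.
unadjusted = ols_slope(outcome, exposure)
adjusted = ols_slope(outcome, exposure, confounder)
with_mediator = ols_slope(outcome, exposure, confounder, mediator)
with_collider = ols_slope(outcome, exposure, confounder, collider)

print(f"true total effect:            {total_effect:.2f}")
print(f"unadjusted:                   {unadjusted:.2f}")
print(f"confounder adjusted:          {adjusted:.2f}")
print(f"confounder + mediator:        {with_mediator:.2f}")
print(f"confounder + collider:        {with_collider:.2f}")
```

Under these assumed coefficients, only the confounder-adjusted model should recover the total effect. The unadjusted estimate is inflated by the open back-door path, adjusting for the mediator leaves only the direct effect, and adjusting for the collider distorts the estimate by opening a non-causal path.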

Figure 2. Directed acyclic graph showing potential reciprocal causation between religiosity and marital status, and how marital status can be both a confounder (at time 1) and a mediator (at time 3) of the religiosity–cooperation association. The ‘_U’ after MaritalStatus_t1 means this variable is unobserved (as in our real-world data example), resulting in an inability to obtain an unbiased causal estimate of the association between Religiosity_t2 and Cooperation.

Table 1. Summary of religious/spiritual beliefs and behaviours exposure variables used in this study. Sample sizes are 13,477 for mothers and 13,424 for partners.

Table 2. Details of covariates and whether they are assumed to be either confounders, or both confounders and mediators. For each causal path with the exposure (religiosity) and the outcome (blood donation/cooperation), we have coded as ‘no’, ‘unlikely’, ‘possibly’ or ‘yes’ depending on certainty of effect (these are qualitative judgment calls based on expert knowledge, existing literature, logical deduction or simply best guesses if no additional information was available). See the ‘Confounders’ sub-section of the Methods for additional justification and relevant literature regarding these decisions. Note that in some cases, due to a lack of data from the partners, we are using the mother's data as a proxy.

Figure 3. Causal graph encoding the assumptions made in Table 2 regarding confounding variables. Note that, for simplicity, hypothesised causal relations between covariates have not been displayed here. Presumed confounders are above the horizontal exposure–outcome arrow, while possible confounders/mediators with bidirectional causation between the exposure and covariate are below the horizontal arrow (and have bidirectional arrows between themselves and the exposure). The node ‘U’ denotes potential residual/unmeasured confounding.

Table 3. Details of auxiliary variables used for multiple imputation to impute missing data. All imputation models included the exposures, outcomes and confounders, in addition to the auxiliary variables detailed below. Note also that for partners, references to ‘mother’ refer to the study mother (i.e. the partner's partner), and not the partner's mother. As the partners have considerably more missing data, and there are a number of auxiliary variables that may provide information about this missing data, the list of auxiliary variables is much longer for partners compared with mothers.

Figure 4. Assumed directed acyclic graph for missing data. Missingness markers for individual variables have been indicated with a prefix ‘M_’, while ‘M_Model’ denotes overall missingness in the complete-case analysis model (with a box around it, indicating that the complete-case analysis is conditional on this). For simplicity, all covariates in the substantive analysis model from Table 2 have been grouped together, as have the mediating variables of alcohol use, smoking and depression. Missing exposure data which may depend on the exposure (religiosity), and missing outcome data which may depend on the outcome (cooperation/blood donation), have been represented via dashed arrows. If this causal graph excluding the dashed arrows represents the true causal structure of the missing data, then multiple imputation with the mediators as auxiliary variables in the imputation model (in addition to the exposures, outcomes and covariates, plus other auxiliary variables (not displayed here) to predict missing data in Table 3) ought to meet the Missing-At-Random assumption, meaning that analyses using the imputed data may not be biased. However, if this causal graph including the dashed arrows represents the true causal structure of the missing data, then multiple imputation using all variables mentioned above will not meet the Missing-At-Random assumption, meaning that analyses using data imputed via standard multiple imputation will still be biased. This is because both the exposure and the outcome are associated with their own missingness. In this scenario, we can use further sensitivity analyses to explore how different Missing-Not-At-Random assumptions regarding the missing data of the exposure and outcome impact our conclusions.
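
The contrast between the two missingness scenarios in this caption can be sketched with a toy simulation. The Python example below is an illustration under stated assumptions, not the study's analysis: the linear model, the logistic missingness mechanisms and all names and coefficients are hypothetical. It compares a complete-case regression when outcome missingness depends only on the exposure with one where missingness depends on the outcome itself (the dashed-arrow scenario). It does not implement multiple imputation.

```python
import numpy as np


def slope(x, y):
    """Slope from a simple linear regression of y on x."""
    return np.polyfit(x, y, 1)[0]


def sigmoid(z):
    """Logistic function, used here to turn a score into a probability."""
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(1)
n = 200_000
true_slope = 0.5  # hypothetical exposure -> outcome effect

exposure = rng.normal(size=n)
outcome = true_slope * exposure + rng.normal(size=n)

# Scenario A: the outcome is observed with a probability that depends
# only on the (fully observed) exposure.
observed_a = rng.random(n) < sigmoid(2.0 * exposure)

# Scenario B: the outcome is observed with a probability that depends
# on the outcome itself (Missing-Not-At-Random).
observed_b = rng.random(n) < sigmoid(2.0 * outcome)

slope_full = slope(exposure, outcome)
slope_a = slope(exposure[observed_a], outcome[observed_a])
slope_b = slope(exposure[observed_b], outcome[observed_b])

print(f"full data:                       {slope_full:.3f}")
print(f"complete cases, scenario A:      {slope_a:.3f}")
print(f"complete cases, scenario B:      {slope_b:.3f}")
```

In this sketch the complete-case slope should stay close to the true value in scenario A but be attenuated in scenario B, because selecting on the outcome distorts the exposure–outcome relationship among the observed cases. This is the kind of situation in which sensitivity analyses under explicit Missing-Not-At-Random assumptions become informative.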

Table 4. Summary of analyses. Where appropriate, all quantitative bias analyses (E-value, generalised sensitivity analysis, multiple imputation and Not-At-Random multiple imputation) were repeated using both ‘confounders only’ and ‘confounders and/or mediators’ adjusted models (see Table 2). All analyses were repeated for mothers and partners.
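
Of the quantitative bias analyses listed above, the E-value has a simple closed form on the risk-ratio scale: for RR ≥ 1 it is RR + sqrt(RR × (RR − 1)), with the reciprocal taken first when RR < 1. The Python sketch below implements that standard formula. The function names and the example input are hypothetical, and the square-root conversion from an odds ratio is a commonly used approximation for common outcomes, included here as an assumption and not as the study's exact procedure.

```python
import math


def e_value(risk_ratio):
    """E-value for a point estimate on the risk-ratio scale."""
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    # The formula is defined for RR >= 1; protective effects are inverted first.
    rr = risk_ratio if risk_ratio >= 1 else 1.0 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1.0))


def approx_rr_from_odds_ratio(odds_ratio):
    """Square-root approximation of a risk ratio from an odds ratio
    when the outcome is common (an assumption of this sketch)."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.sqrt(odds_ratio)


# Hypothetical example value, not a result from the study.
example_odds_ratio = 1.5
example_e_value = e_value(approx_rr_from_odds_ratio(example_odds_ratio))
print(f"E-value for a hypothetical OR of {example_odds_ratio}: {example_e_value:.2f}")
```

The returned value is the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both the exposure and the outcome to fully explain away the observed estimate.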

Figure 5. Results of the complete-case (black) and multiple imputation (red) analyses for each religiosity exposure with blood donation as the outcome for mothers. The ‘confounders only’ scenario adjusts only for assumed confounders, while the ‘confounders and/or mediators’ scenario adjusts for both assumed confounders and variables which may be both confounders and mediators (see Table 2 and Figure 3). For differences in the probabilities of donating blood based on these models, see Figure S1.

Supplementary material: PDF

Major-Smith supplementary material

Tables S1-S14 and Figures S1-S12
