
Testing of Reverse Causality Using Semi-Supervised Machine Learning

Published online by Cambridge University Press:  07 April 2025

Nan Zhang*, Department of Management, Warrington College of Business, University of Florida, Gainesville, FL, USA
Heng Xu, Department of Management, Warrington College of Business, University of Florida, Gainesville, FL, USA
Manuel J. Vaulont, Management and Organizational Development Group, D’Amore-McKim School of Business, Northeastern University, Boston, MA, USA
Zhen Zhang, Department of Management, Strategy and Entrepreneurship, Edwin L. Cox School of Business, Southern Methodist University, Dallas, TX, USA

*Corresponding author: Nan Zhang; Email: zhang.nan@ufl.edu

Abstract

Two potential obstacles stand between the observation of a statistical correlation and the design (and deployment) of an effective intervention: omitted variable bias and reverse causality. Whereas the former has received ample attention in the methodological literature, the latter has received comparatively little. Many existing methods for reverse causality testing begin by postulating a structural model, which may suffer from widely recognized issues such as the difficulty of properly setting temporal lags, which are critical to model validity. In this article, we draw on advances in machine learning, specifically the recently established link between causal direction and the effectiveness of semi-supervised learning algorithms, to develop a novel method for reverse causality testing that circumvents many of the assumptions required by traditional methods. We carried out mathematical analysis and simulation studies to demonstrate the effectiveness of our method. We also performed tests on a real-world dataset to show how our method may be used to identify causal relationships in practice.

Type: Theory and Methods

This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
© The Author(s), 2025. Published by Cambridge University Press on behalf of Psychometric Society

Figure 1 Illustrative example for $Y \to X$. Note: Both panels depict the probability density function of $P(X)$ when $X = \alpha Y + \epsilon$, where $Y$ follows a Bernoulli distribution with $p = 0.5$ and $\epsilon \sim N(0,1)$. In either case, $P(X)$ follows a Gaussian mixture distribution with two equal-weight components, shown as red dashed lines. The difference between the means of the two Gaussian components always equals $\alpha$, so the functional relationship between $X$ and $Y$ can be precisely inferred from $P(X)$.
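The claim in the Figure 1 caption can be checked numerically. The sketch below is an illustration in Python with NumPy, not code from the article: it simulates $X = \alpha Y + \epsilon$ under the caption's settings and fits a two-component Gaussian mixture to the unlabeled $X$ values by expectation-maximization (EM), never looking at $Y$. The function name `estimate_alpha`, the choice $\alpha = 3$, and fixing the mixture weights and variances at their known values (0.5 and 1) are assumptions made to keep the example short.

```python
import numpy as np

def estimate_alpha(x, n_iter=200):
    """Fit 0.5*N(mu0, 1) + 0.5*N(mu1, 1) to unlabeled x by EM; return mu1 - mu0.

    Assumption: weights and variances are fixed at their known values
    (0.5 and 1), matching the Figure 1 setup; only the two means are estimated.
    """
    mu0, mu1 = np.percentile(x, 25), np.percentile(x, 75)  # spread-out starting means
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 1.
        # With equal weights and unit variances the log-odds is linear in x.
        log_odds = (mu1 - mu0) * x - 0.5 * (mu1**2 - mu0**2)
        r1 = 1.0 / (1.0 + np.exp(-log_odds))
        # M-step: responsibility-weighted means of the two components.
        mu0 = np.sum((1.0 - r1) * x) / np.sum(1.0 - r1)
        mu1 = np.sum(r1 * x) / np.sum(r1)
    return mu1 - mu0

rng = np.random.default_rng(0)
alpha = 3.0
y = rng.binomial(1, 0.5, size=20_000)               # Y ~ Bernoulli(0.5); never passed to EM
x = alpha * y + rng.normal(0.0, 1.0, size=y.size)   # X = alpha * Y + eps, eps ~ N(0, 1)
print(f"true alpha: {alpha}, estimated from P(X) only: {estimate_alpha(x):.2f}")
```

Because the labels never enter the fit, whatever accuracy the returned mean difference has comes from the marginal $P(X)$ alone, which is the sense in which the distribution of the effect carries information about the mechanism linking it to its cause.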


Table 1 Pseudocode for reverse causality testing with continuous variables


Table 2 Type I error rate and statistical power of our method in the main simulation study


Table 3 Type I error rate of RI-CLPM and our method under the STARTS model


Figure 2 Relationship between the effectiveness of self-training and the representativeness of the labeled set. Note: WFC, work-family conflict; JSA, job satisfaction. Source: Swiss Household Panel (SHP).
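Self-training, the semi-supervised algorithm named in Figure 2, fits a model to the labeled data, assigns pseudo-labels to the unlabeled data, and refits on both, repeating until the model settles. The minimal sketch below is an illustration of that generic loop, not the authors' testing procedure. It reuses the Figure 1 setup and predicts the cause $Y$ from the effect $X$, starting from a deliberately unrepresentative labeled set drawn only from the upper tail of each class. The classifier (a threshold midway between the two class means, which assumes $\alpha > 0$), the helper names `fit_threshold` and `self_train`, and all numeric settings are hypothetical choices made for brevity.

```python
import numpy as np

def fit_threshold(x, y):
    """Hypothetical 1-D classifier: threshold midway between the two class means.

    Assumes alpha > 0, so class 1 (Y = 1) lies to the right of class 0.
    """
    return 0.5 * (x[y == 0].mean() + x[y == 1].mean())

def self_train(x_lab, y_lab, x_unlab, n_iter=50):
    """Plain self-training: fit on labels, pseudo-label the unlabeled set, refit on both."""
    t = fit_threshold(x_lab, y_lab)
    for _ in range(n_iter):
        pseudo = (x_unlab > t).astype(int)          # hard pseudo-labels from current model
        x_all = np.concatenate([x_lab, x_unlab])
        y_all = np.concatenate([y_lab, pseudo])
        t = fit_threshold(x_all, y_all)             # refit on labeled + pseudo-labeled data
    return t

rng = np.random.default_rng(1)
alpha = 3.0
y = rng.binomial(1, 0.5, size=5_000)                # cause Y ~ Bernoulli(0.5)
x = alpha * y + rng.normal(0.0, 1.0, size=y.size)   # effect X = alpha * Y + eps

# Unrepresentative labeled set: 10 points from the upper tail of each class.
idx0 = np.flatnonzero((y == 0) & (x > 0.5))[:10]
idx1 = np.flatnonzero((y == 1) & (x > 3.5))[:10]
lab_idx = np.concatenate([idx0, idx1])
unlab_mask = np.ones(y.size, dtype=bool)
unlab_mask[lab_idx] = False                         # everything else is treated as unlabeled

t_labeled_only = fit_threshold(x[lab_idx], y[lab_idx])
t_self_trained = self_train(x[lab_idx], y[lab_idx], x[unlab_mask])
print(f"labeled only: {t_labeled_only:.2f}  self-trained: {t_self_trained:.2f}  "
      f"ideal: {alpha / 2:.2f}")
```

In this anticausal direction the unlabeled $X$ values reveal the two-component structure of $P(X)$, so the pseudo-labeling loop pulls the biased threshold back toward $\alpha/2$. Hard pseudo-labels and a midpoint classifier keep the loop short; any model with a fit-and-predict cycle could be substituted.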