
SELF-Tree: An Interpretable Model for Multivariate Causal Direction Heterogeneity Analysis

Published online by Cambridge University Press: 10 December 2025

Zhifei Li
Affiliation:
Collaborative Innovation Center of Assessment for Basic Education Quality, Beijing Normal University, China
Hongbo Wen*
Affiliation:
Collaborative Innovation Center of Assessment for Basic Education Quality, Beijing Normal University, China
*Corresponding author: Hongbo Wen; Email: whb@bnu.edu.cn

Abstract

Identifying causal directions among variables through data-driven approaches is an active research area. Researchers now focus on detecting heterogeneity in causal direction among multiple variables (more than two) when covariates give rise to that heterogeneity. This study combines the structural equation likelihood function (SELF) method with recursive partitioning to obtain an interpretable model of causal direction heterogeneity in multivariable settings. In a simulation study, we evaluated how well the SELF-Tree model identifies heterogeneous causal directions under different conditions. We then demonstrated its application to real data using a public drug consumption dataset. The SELF-Tree model offers researchers a new way to understand heterogeneity in the causal directions among variables.

Information

Type
Theory and Methods
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Psychometric Society

Figure 1 An example of a directed acyclic graph.

Figure 2 Causal direction heterogeneity of variables under different covariate values.

Figure 3 The tree structure in the simulation study. The left panel shows the Moderate structure, and the right panel shows the Extreme structure. The percentage in each leaf node indicates the theoretical proportion of the total sample that falls in that node. DAG1, DAG2, and DAG3 label the causal graph structures under different conditions.

Figure 4 Identification results for the tree structure.

Figure 5 Identification results for the split points of the covariates. Under the Extreme condition, the tree structure may not be accurately identified when the sample size is 500 with 10 or 15 variables, or when the sample size is 1,000 with 15 variables. The split point identification results for these conditions are therefore not presented in this figure.

Figure 6 The impact of the number of variables on the identification of heterogeneous DAGs. Each 3 × 3 heatmap’s number label $i$ corresponds to the DAG structure simulation scenario in Figure 3. The columns represent true DAGs, and rows represent the obtained DAGs based on the SELF-Tree model. Greater color contrast between the diagonal and off-diagonal areas of the heatmap indicates better recognition of heterogeneous causal directions.

Figure 7 The impact of indegree centrality on the identification of heterogeneous DAGs.

Figure 8 The impact of sample size on the identification of heterogeneous DAGs.

Figure 9 Identification results for the tree structure with spurious covariates.

Figure 10 Causal direction identification for drug consumption among all participants. Abbreviations: Amphet: amphetamine; amyl: amyl nitrite; benzos: benzodiazepine; caff: caffeine; choc: chocolate; coke: cocaine; legalh: legal high; LSD: lysergic acid diethylamide; meth: methadone; VSA: volatile substance abuse (e.g., solvents, petrol, etc.). These abbreviations apply to subsequent content as well.

Figure 11 Identification results for the tree structure in the drug consumption data. Abbreviations: Nscore: neuroticism; ss: sensation-seeking.

Figure 12 Heterogeneous causal directions identified in each leaf node of the SELF-Tree model.

Table 1 Differences between the heterogeneous DAGs and the DAG based on the overall data

Figure A1 Descriptive statistics for the total sample.

Figure A2 Descriptive statistics for Node 4 on the heterogeneous drug consumption patterns. Here, “user” denotes participants who used the drug within the last 10 years, while “non-user” denotes participants who never used the drug or last used it more than a decade ago. This definition applies to subsequent mentions of user and non-user.

Figure A3 Descriptive statistics for Node 5 on the heterogeneous drug consumption patterns.

Figure A4 Descriptive statistics for Node 6 on the heterogeneous drug consumption patterns.

Figure A5 Descriptive statistics for Node 7 on the heterogeneous drug consumption patterns.

Supplementary material: File

Li and Wen supplementary material