Generalizing Trimming Bounds for Endogenously Missing Outcome Data Using Random Forests

Published online by Cambridge University Press: 26 June 2025

Cyrus Samii*
Affiliation:
Department of Politics, New York University, New York, USA
Ye Wang
Affiliation:
Department of Political Science, University of North Carolina at Chapel Hill, Chapel Hill, USA
Junlong Aaron Zhou
Affiliation:
Independent Researcher, Mountain View, USA
*Corresponding author: Cyrus Samii; Email: cds2083@nyu.edu

Abstract

We present a method for narrowing nonparametric bounds on treatment effects by adjusting for potentially large numbers of covariates, using generalized random forests. In many experimental or quasi-experimental studies, outcomes of interest are only observed for subjects who select (or are selected) to engage in the activity generating the outcome. Outcome data are thus endogenously missing for units who do not engage, and random or conditionally random treatment assignment before such choices is insufficient to identify treatment effects. Nonparametric partial identification bounds address endogenous missingness without having to make disputable parametric assumptions. Basic bounding approaches often yield bounds that are wide and minimally informative. Our approach can tighten such bounds while permitting agnosticism about the data-generating process and honest inference. A simulation study and replication exercise demonstrate the benefits.
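
For readers unfamiliar with the basic bounding approach that the abstract takes as its starting point, the following is a minimal sketch of unconditional trimming bounds in the style of Lee (2009). It is not the covariate-tightened, forest-based estimator proposed in the article. The function name `basic_trimming_bounds` and its arguments are illustrative assumptions, and the sketch assumes a randomly assigned binary treatment and monotone selection (treatment shifts response in one direction only).

```python
import numpy as np


def basic_trimming_bounds(y, d, s):
    """Sketch of basic trimming bounds on the effect for always-responders.

    y: outcomes (values where s == 0 are ignored, so they may be NaN)
    d: binary treatment indicator (1 = treated, 0 = control)
    s: response indicator (1 = outcome observed, 0 = missing)
    Returns (lower_bound, upper_bound). Hypothetical helper, not the
    authors' implementation.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)
    s = np.asarray(s, dtype=int)

    # Response rates in each arm.
    p1 = s[d == 1].mean()
    p0 = s[d == 0].mean()

    # Observed outcomes in each arm, sorted for trimming.
    y1 = np.sort(y[(d == 1) & (s == 1)])
    y0 = np.sort(y[(d == 0) & (s == 1)])
    if len(y1) == 0 or len(y0) == 0:
        raise ValueError("each arm needs at least one observed outcome")

    def trimmed_means(values, share_kept):
        # Means of the lowest and of the highest `share_kept` fraction.
        keep = max(1, int(np.floor(share_kept * len(values) + 1e-9)))
        return values[:keep].mean(), values[-keep:].mean()

    if p1 >= p0:
        # Excess responders are in the treated arm, so trim treated outcomes.
        low1, high1 = trimmed_means(y1, p0 / p1)
        return low1 - y0.mean(), high1 - y0.mean()

    # Otherwise trim the control arm instead.
    low0, high0 = trimmed_means(y0, p1 / p0)
    return y1.mean() - high0, y1.mean() - low0
```

With four treated units all observed at outcomes 1, 2, 3, 4 and four control units of which two are observed at 2 and 3, half of the treated outcomes are trimmed and the sketch returns bounds of -1.0 and 1.0. The width of that interval relative to the outcome range illustrates why unconditional bounds can be minimally informative.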

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of The Society for Political Methodology

Figure 1 Endogenous missingness. Note: The figure shows a directed acyclic graph (DAG) representing an endogenous missingness mechanism. White nodes represent observable variables and black nodes represent unobservable ones. An arrow from one node to another (a “path”) marks the causal relationship from the former to the latter. The heavier arrows indicate the relationships that generate the bias from conditioning on $S_i$.

Table 1 Principal strata.

Table 2 Empirical examples.

Figure 2 Basic trimming bounds in Lee (2009).

Figure 3 Covariate-tightened trimming bounds in simulated data. Note: The plot on the left shows the averages of the covariate-tightened trimming bound estimates (CTB, in squares), the basic trimming bounds (TB, in triangles), the ATE estimate using a causal forest on the non-missing sample (CF, in circles), the ATE estimate using OLS on the non-missing sample (OLS, in circles), and the ATE estimate using OLS re-weighted by the inverse probability of attrition (IPW, in circles). Lower bound estimates are in blue and upper bound estimates are in red. The segments represent 95% confidence intervals. The red and blue dotted lines mark the CTBs. The black dotted line represents the true ATE and the black dash-dotted line represents the true ATE for the always-responders. The plot on the right shows the estimates of the conditional covariate-tightened trimming bounds across 19 evaluation points on $X_1$, holding $X_2$ through $X_{10}$ at their sample averages.

Figure 4 Asymptotic performance of the method. Note: The left panel shows how the MSEs of the lower bound estimate (in blue) and the upper bound estimate (in red) vary with sample size. The right panel shows how the coverage rate varies with sample size. The gray line marks the nominal coverage level, 95%.

Figure 5 Covariate-tightened trimming bounds in Santoro and Broockman (2022). Note: In both plots, the outcome variable is warmth toward outpartisan voters. The plot on the left shows the estimated covariate-tightened trimming bounds (CTB, in squares), the basic trimming bounds under monotonicity (TB (mono.), in triangles) and under conditional monotonicity (TB (cond. mono.), in triangles), and the ATE estimate using OLS on the non-missing sample (OLS, in circles). Lower bound estimates are in blue and upper bound estimates are in red. Segments represent the 95% confidence intervals for the estimates. Black segments between the red and blue ones represent the 95% confidence intervals for the ATE for always-responders, using the Imbens–Manski approach. The plot on the right shows the estimates of the conditional covariate-tightened trimming bounds across nine evaluation points of age, with other demographic attributes fixed at their sample mean or mode.

Supplementary material: File

Samii et al. supplementary material

File 853.8 KB