
Bayesian Sensitivity Analysis for Unmeasured Confounding in Causal Panel Data Models

Published online by Cambridge University Press:  22 December 2025

Licheng Liu*
Affiliation:
Political Science, University of Michigan, USA
Teppei Yamamoto
Affiliation:
Waseda University, Japan
Corresponding author: Licheng Liu; Email: lichengl@umich.edu

Abstract

Despite the recent methodological advancements in causal panel data analysis, concerns remain about unobserved unit-specific time-varying confounders that cannot be addressed by unit or time fixed effects or their interactions. We develop a Bayesian sensitivity analysis (BSA) method to address the concern. Our proposed method is built upon a general framework combining Rubin’s Bayesian framework for model-based causal inference (Rubin [1978], The Annals of Statistics 6(1), 34–58) with parametric BSA (McCandless, Gustafson, and Levy [2007], Statistics in Medicine 26(11), 2331–2347). We assess the sensitivity of the causal effect estimate from a linear factor model to the possible existence of unobserved unit-specific time-varying confounding, using the coefficients of the treatment variable and observed confounders in the model for the unobserved confounding as sensitivity parameters. We utilize priors on these coefficients to constrain the hypothetical severity of unobserved confounding. Our proposed approach allows researchers to benchmark the assumed strength of confounding on observed confounders more systematically than conventional frequentist sensitivity analysis techniques. Moreover, to cope with convergence issues typically encountered in nonidentified Bayesian models, we develop an efficient Markov chain Monte Carlo algorithm exploiting transparent parameterization (Gustafson [2005], Statistical Science 20(2), 111–140). We illustrate our proposed method in a Monte Carlo simulation study as well as an empirical example on the effect of war on inheritance tax rates.

Information

Type
Article
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of The Society for Political Methodology

1 Introduction

Researchers are often concerned about unobserved confounding when analyzing time-series cross-section data. To address such confounding, scholars routinely use linear outcome models with additive two-way fixed effects (TWFEs) to take advantage of the panel data structure. However, the underlying assumptions for identifying causal effects with a linear model with additive fixed effects, often heuristically understood as "parallel trends" (PTs) of the outcomes in the absence of treatment, are known to be rather restrictive (e.g., Manski and Pepper 2018).

Recent model-based approaches relax the PT assumption by replacing the TWFE with an interactive factor term (e.g., Abadie, Diamond, and Hainmueller 2010, 2015). The factor term, also known as interactive fixed effects (IFEs; Bai 2009), allows common shocks to have heterogeneous influences on different units and incorporates the TWFE as a special case. In addition, counterfactual-prediction-based estimators, like the synthetic control (SC) estimator, allow the treatment effect to be heterogeneous. Such causal panel data models (Athey et al. 2021) are gaining popularity in empirical social science research. Still, there may be an unobserved confounder U that varies both across units and time periods in a way that cannot be fully captured by either the TWFE or the IFE.

To address this problem, we develop a fully Bayesian sensitivity analysis (BSA) method. Our method is based on a general framework for causal BSA which combines Rubin's Bayesian framework for model-based causal inference (Rubin 1978) with parametric BSA (McCandless, Gustafson, and Levy 2007). That is, we adopt a strategy similar to McCandless et al. (2007) by assuming that there is an unobserved continuous confounder U and investigating how its existence affects the treatment effect estimate under a given prior belief about the magnitude of confounding by U.

The proposed methodology fits within the literature that assesses the robustness of treatment effect estimation in the presence of potential violations of PT. Rambachan and Roth (2023) propose a sensitivity analysis framework that treats observed pre-trends as informative about possible violations of PT. Ye et al. (2024) assume that the outcome trajectories of treated units are bounded by those of the set of control units. Both approaches yield partial identification of causal effects. The proposed BSA framework assesses robustness by explicitly modeling the presence of an unobserved time-varying confounder. Built on a counterfactual prediction approach that accommodates heterogeneous treatment effects, it can be viewed as an extension of the omitted variable bias (OVB) framework in Cinelli and Hazlett (2020) to panel data settings. Recent developments in more interpretable sensitivity analysis (e.g., Cinelli and Hazlett 2020; VanderWeele and Ding 2017) may be adapted to causal panel settings, although doing so often involves additional structural assumptions, such as treatment effect homogeneity.

We make two important contributions in this article. First, conceptually, our proposed method addresses a key challenge that is common to most existing non-BSA techniques for unobserved confounding: the specification of "plausible" sensitivity parameter values. While there have been many successful attempts to address this in a frequentist setting—for instance, researchers can assess how large an influence unobserved confounding would need to have to explain away the estimated treatment effects (e.g., Altonji, Elder, and Taber 2005; Imbens 2003)—existing techniques typically leave such specification up to the discretion of the researcher, providing little systematic guidance as to how to set those parameter values. BSA, on the other hand, provides a principled solution to this problem via a formally justified procedure to incorporate the researcher's prior belief about those values. That is, in BSA, a researcher would begin with a prior distribution on the sensitivity parameter values and the results would be summarized using the posterior distribution of the sensitivity parameter, e.g., by averaging the causal effect estimates over the posterior on the sensitivity parameter.

In many BSA methods, however, this advantage remains only theoretical, as they also provide little guidance on how to specify such priors. In contrast, our proposed approach follows Gustafson et al. (2010) and borrows information from the observed covariates to make an informed choice about the sensitivity parameter prior. Specifically, we assign priors to a subset of sensitivity parameters with restrictions to guarantee that the dispersion of the unobserved confounder is comparable to that of the observed confounders. Then, we assign exchangeable priors to the coefficients of both unobserved and observed confounders. We view this as a Bayesian analog of "benchmarking" sensitivity parameters against observed pre-treatment confounders, as commonly done in frequentist approaches to make the results of sensitivity analysis more interpretable (Cinelli and Hazlett 2020; Imbens 2003).

Second, computationally, BSA often runs into convergence issues because the model with sensitivity parameters is by design nonidentified. To cope with this problem, we develop an efficient Markov chain Monte Carlo (MCMC) algorithm by utilizing the transparent parameterization proposed by Gustafson (2005). Moreover, the shrinkage prior assigned to the coefficient of the unobserved confounder induces substantial indirect learning of the coefficient given data, making results based on BSA significantly different from those based on Monte Carlo sensitivity analysis (MCSA; Greenland 2003), a procedure that does not involve the Bayesian updating of sensitivity parameters.

The rest of this article is organized as follows: in Section 2, we introduce our general framework for causal BSA, as well as the details of our proposed methodology; in Section 3, we discuss computational issues; in Section 4, we conduct Monte Carlo studies; and in Section 5, we apply the proposed method to a real-world example. The last section concludes.

2 Methodology

2.1 The General Framework

Researchers often rely on unconfoundedness assumptions to infer causality from observational data, where they do not directly control the treatment assignment mechanism. A canonical example of such an assumption, often called conditional ignorability, posits that the treatment assignment is independent of the potential outcomes given a set of observed pre-treatment confounders, i.e.,

(1) $$ \begin{align} \{ Y_{i}(0), Y_{i}(1) \} \perp\!\!\!\perp D_i \mid X_i, \end{align} $$

where $D_i \in \{0,1\}$ is the treatment assigned to unit i, $ Y_{i}(d) $ is unit i’s potential outcome under treatment $d \in \{0,1\}$ , and $X_i$ is a vector of observed pre-treatment covariates. In many real-world applications, however, this assumption may not hold. A typical example is observational studies with non-randomized treatment assignment, where researchers cannot rule out the possibility of unobserved confounders correlated with both the treatment and potential outcomes. Another example, which we focus on in this article, is longitudinal data where researchers collect repeated observations for each unit. In such data, unobserved unit-level heterogeneity in the potential outcomes and time-varying common shocks to the outcomes may also affect the adoption of treatment.

A common approach to addressing the violation of an unconfoundedness assumption is sensitivity analysis. Specifically, instead of the relationship (1), it is assumed that there is an unobserved confounder $ U_i $ such that

(2) $$ \begin{align} \{ Y_{i}(0), Y_{i}(1) \} \perp\!\!\!\perp D_i \mid X_i, U_i, \end{align} $$

which is also known as "latent ignorability" (Rubin 1976). Various sensitivity analysis methods have been developed on the basis of assumptions similar to (2) (e.g., Imbens 2003; Oster 2019; VanderWeele and Ding 2017). In this article, we propose a fully Bayesian approach, which directly builds upon the Bayesian causal inference framework of Rubin (1978) (see also Imbens and Rubin 1997; Rubin et al. 2008). Specifically, we consider causal inference to be the problem of imputing missing potential outcomes ($Y_{mis}$) given observed data (D, X, and $Y_{obs}$) as well as the maintained assumptions. Under latent ignorability, we have

(3) $$ \begin{align} \begin{aligned} &Pr(Y_{mis} | D, X, Y_{obs}) \\& \quad =\frac{Pr(D, X, Y(0), Y(1))}{Pr(D, X, Y_{obs})} \\& \quad \propto Pr(D, X, Y(0), Y(1)) \\& \quad =\int Pr(D, X, U, Y(0), Y(1)) dU \\& \quad =\int Pr(D | X, U, Y(0), Y(1)) Pr(X, U, Y(0), Y(1)) dU \\& \quad = \int Pr(D | X, U) Pr(X, U, Y(0), Y(1)) dU, \end{aligned} \end{align} $$

where the last equality holds because of (2). Note that $ Pr(D | X, U) $ is the treatment assignment mechanism and $ Pr(X, U, Y(0), Y(1)) $ is what Rubin et al. (2008) call the "science." Thus, the posterior predictive distribution of the missing outcome $Pr(Y_{mis} | D, X, Y_{obs})$ is proportional to the treatment assignment mechanism multiplied by the underlying science, marginalizing over the unobserved confounder. Many existing sensitivity analysis techniques implicitly utilize this general relationship with additional parametric assumptions for $ Pr(D | X, U) $ and $ Pr(X, U, Y(0), Y(1))$ (e.g., Imbens 2003).

Here, we pursue an alternative factorization which establishes an explicit connection between Rubin's Bayesian causal inference framework and the Bayesian, non-counterfactual framework for sensitivity analysis (McCandless et al. 2007). That is, we have

(4) $$ \begin{align} &Pr(Y_{mis} | D, X, Y_{obs}) \nonumber\\& \quad \propto \int Pr(D | X, U) Pr(X, U, Y(0), Y(1)) dU \nonumber\\& \quad = \int \frac{Pr(D, X, U) }{Pr(X, U) } Pr(X, U, Y(0), Y(1)) dU \\& \quad = \int Pr(U|D, X) Pr(D, X) \frac{Pr(X, U, Y(0), Y(1))}{Pr(X, U)} dU \nonumber\\& \quad \propto \int Pr(U|D, X) Pr(Y(0), Y(1)|X, U) dU, \nonumber \end{align} $$

suggesting an alternative procedure for inference based on the imputation of the latent variable U as a function of D and X, as well as a model for the potential outcomes. Indeed, the standard BSA framework utilizes this imputation-based strategy for inference (Gustafson et al. 2010; McCandless et al. 2007), first imputing U using a parametric model for $Pr(U|D, X)$, and then including the imputed values of U in an outcome model $f(Y_{obs}|D, U, X)$. That is, a model for the observed data $(Y_{obs}, D, X)$ can be obtained by integrating U out:

(5) $$ \begin{align} \begin{aligned} Pr(Y_{obs}, D, X) &\propto Pr(Y_{obs}| D, X) \\&=\int Pr(Y_{obs}, U|D, X) dU \\&\propto \int Pr(U|D, X) Pr(Y_{obs}|D, X, U) dU. \end{aligned} \end{align} $$

Comparing (5) to the posterior predictive distribution (4) reveals that the standard BSA framework can be used for sensitivity analysis with respect to causal quantities involving missing potential outcomes. That is, a latent ignorability assumption, such as (2), implies sufficient constraints on the joint potential outcome distribution conditional on X and U so that a causal quantity of interest is identified by the distribution of the observed outcome $Y_{obs}$ conditional on D, $X,$ and U.

2.2 Setup and Notation

Now, we set up our notational framework. Suppose that we observe a panel data set that consists of N units and spans T periods, with $i = 1, 2, \ldots , N$ and $t = 1, 2, \ldots , T$ indexing unit and time, respectively. We are interested in the effect of a binary treatment indicator, $D_{it} \in \{0, 1 \}$, on some outcome variable $Y_{it} \in \mathbb {R}$. For notational convenience, we assume that the panel data set is balanced, i.e., there are no missing observations. We adopt the potential outcomes framework (Rubin 1974) to define the causal quantities of interest. We denote by $\boldsymbol {D}_i = (D_{i1}, D_{i2}, \ldots , D_{iT}) $ the full treatment assignment vector for unit i and $\boldsymbol {D}_i^t = (D_{i1}, D_{i2}, \ldots , D_{it})$ the treatment assignment vector up to time t for unit i. $Y_{it} (\boldsymbol {d}_i) $ and $Y_{it}(\boldsymbol {d}^{\prime }_i)$ are two different potential outcomes for unit i in period t given treatment assignment vectors $\boldsymbol {d}_i = (d_{i1}, d_{i2}, \ldots , d_{iT}) $ and $\boldsymbol {d}^{\prime }_i = (d^{\prime }_{i1}, d^{\prime }_{i2}, \ldots , d^{\prime }_{iT}) $. We make the following assumptions.

Assumption 1 No anticipation of future treatments

Treatment assignment in the future does not affect current potential outcomes, i.e.,

(6) $$ \begin{align} Y_{it} (\boldsymbol{d}_i) = Y_{it} (\boldsymbol{d}^{\prime}_i) \quad \text{if} \quad \boldsymbol{d}_i^t = \boldsymbol{d}_i^{'t} \end{align} $$

for any $i\in \{1,\ldots ,N\}$ and $t \in \{1, \ldots , T\}$ .

Assumption 2 No carryover effects

Past treatment assignments do not affect current potential outcomes, i.e.,

(7) $$ \begin{align} Y_{it} (\boldsymbol{d}_i^t) = Y_{it} (\boldsymbol{d}_i^{'t}) \quad \text{if} \quad d_{it} = d^{\prime}_{it} \end{align} $$

for any $i\in \{1,\ldots ,N\}$ and $t\in \{1,\ldots ,T\}$ . Footnote 1

Under these assumptions, we can re-write the potential outcome as $ Y_{it} (\boldsymbol {d}_i) = Y_{it} (d_{it}) $ , and there are two potential outcomes $Y_{it}(1)$ and $Y_{it}(0)$ for each unit in each period. The individual treatment effect is defined as the difference in potential outcomes:

(8) $$ \begin{align} \delta_{it} = Y_{it}(1) - Y_{it}(0). \end{align} $$

We focus on the identification and estimation of the average treatment effect on the treated (ATT), defined as $ ATT = \frac {\sum _{i,t} D_{it} \delta _{it} }{\sum _{i,t} D_{it} } $ . The results also apply to other convex combinations of individual effects, like ATT in a specific period relative to the occurrence of the treatment.

We consider three types of unobserved confounders: unit-specific time-invariant confounders $\gamma _i$ , time-specific unit-invariant confounders $f_t$ , and confounders that vary both across units and time $U_{it}$ . Below, we will develop a method for sensitivity analysis based on the following latent ignorability assumption with respect to these unobserved confounders.

Assumption 3 Panel latent ignorability

(9) $$ \begin{align} \{ Y_{it}(0), Y_{it}(1) \} \perp\!\!\!\perp D_{it} \mid X_{it}, \gamma_i, f_t, U_{it}, \end{align} $$

for $i\in \{1,\ldots ,N\}$ and $t\in \{1,\ldots ,T\}$ , where $X_{it}$ is a vector of observed confounders.

2.3 Causal Panel Data Models

The panel latent ignorability assumption implies that we need to control for observed confounders $X_{it}$ as well as the unobserved confounders $\gamma _i$, $f_t$, and $U_{it}$ to identify the effect of $D_{it}$ on $Y_{it}$. Allowing $U_{it}$ to be time-varying enables the model to capture both unobserved time-varying confounders and the time-varying influence of time-invariant confounders, making it a more flexible specification. Suppose the observed confounder $X_{it}$ is a vector of length P, i.e., $ X_{it} = (X_{it,1}, X_{it,2}, \ldots , X_{it,P}) $, where each element $X_{it,p}$ is normalized to have zero mean and unit variance. The existing literature has proposed both design-based approaches, such as doubly robust estimators (Arkhangelsky and Imbens 2022), and model-based approaches to counterfactual outcome prediction. The latter is widely adopted in empirical studies, where scholars assume a parametric model for the potential outcomes. A general functional form for the observed outcome model is written as:

(10) $$ \begin{align} Y_{it} = g(D_{it}, X_{it}, \gamma_i, f_t, U_{it}, \varepsilon_{it}), \end{align} $$

where $\varepsilon _{it}$ is an error term with zero mean. In empirical studies, scholars add more constraints on the functional form to identify causal parameters despite the presence of unobserved confounding. For example, Pang, Liu, and Xu (2022) assume that the observed terms in $g(\cdot )$ are additive and that unobserved confounding can be represented by functions of $\gamma _i$ and $f_t$, such as the TWFE and the latent factor term or IFE in the SC methods. With a statistical model for potential outcomes, scholars can predict the counterfactual outcomes under control for the treated observations and then average the differences between the observed outcomes and the predicted counterfactuals as an estimate of the ATT. Such counterfactual prediction-based approaches are called "causal panel data models" in Athey et al. (2021).

Here, we focus on a linear model incorporating $\gamma _i$ and $f_t$ by way of a two-way interaction term or a latent factor term:

(11) $$ \begin{align} \begin{gathered} Y_{it} = \delta_{it} D_{it} + \beta_0 + X_{it}' \beta + \gamma_i^{\prime}f_t + \varepsilon_{it}, \\ \gamma_i | \boldsymbol{\Omega}_0 \sim \mathcal{N}(0, \boldsymbol{\Omega}_0), \quad f_t \sim \mathcal{N}(0, I_r), \quad \varepsilon_{it} \sim \mathcal{N}(0, \sigma^2), \end{gathered} \end{align} $$

where $\beta _0$ is the global intercept and $\beta = (\beta _1, \beta _2, \ldots , \beta _P) $ is the vector of slope coefficients for $X_{it}$. In this setup, $\gamma _i$ can be regarded as an ( $r \times 1$ ) vector of factor loadings and $f_t$ an ( $r \times 1$ ) vector of common factors. The number of factors r is unknown to the researcher a priori. The functional form above implies that $ Y_{it}(0) = \beta _0 + X_{it}' \beta + \gamma _i^{\prime }f_t + \varepsilon _{it} $, i.e., we assume a latent factor model (LFM) for the counterfactual outcomes under control. From a Bayesian perspective, Model (11) is a static variant of the state space model (Klinenberg 2022; Pang et al. 2022). Following the literature on Bayesian factor analysis, we assume that the factor loadings are drawn from an i.i.d. multivariate normal distribution $\mathcal {N}(0, \boldsymbol {\Omega }_0)$, where $ \boldsymbol {\Omega }_0 = Diag\{\omega _1^2, \ldots , \omega _r^2 \} $ is an ( $r \times r$ ) diagonal matrix, and that the common factors are drawn from an i.i.d. multivariate normal distribution $\mathcal {N}(0, I_r)$, where $I_r$ is the identity matrix. The error term $\varepsilon _{it}$ is drawn from an i.i.d. normal distribution with mean 0 and variance $\sigma ^2$. Model (11) entails the assumption that panel latent ignorability (9) holds without conditioning on $U_{it}$, i.e., unobserved confounding that varies both over units and time can be fully captured by the r-dimensional factor term $\gamma _i'f_t$.
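To make the structure of Model (11) concrete, the following minimal sketch simulates untreated outcomes from a two-factor LFM and illustrates the counterfactual-prediction logic behind the ATT; the dimensions, parameter values, and variable names are illustrative assumptions, not quantities from this article.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, r, P = 50, 40, 2, 2                   # illustrative sizes (assumed, not from the article)
beta0, beta, sigma = 1.0, np.array([1.0, -0.5]), 1.0

X = rng.standard_normal((N, T, P))          # standardized observed confounders
gamma = rng.standard_normal((N, r))         # factor loadings, gamma_i ~ N(0, I_r)
f = rng.standard_normal((T, r))             # common factors,  f_t ~ N(0, I_r)

# Untreated potential outcomes: Y_it(0) = beta0 + X_it' beta + gamma_i' f_t + eps_it
Y0 = beta0 + X @ beta + gamma @ f.T + sigma * rng.standard_normal((N, T))

# Hypothetical treatment: first 10 units treated in the last 10 periods, constant effect 2
D = np.zeros((N, T), dtype=int)
D[:10, -10:] = 1
Y = Y0 + 2.0 * D

# Counterfactual-prediction logic: given an estimate Yhat_it(0) for treated cells, the ATT is
# the average of Y_it - Yhat_it(0) over treated (i, t); here the true Y0 stands in for the
# model-based prediction.
print((Y - Y0)[D == 1].mean())              # equals 2 by construction
```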

It is worth noting that there is an extensive literature on estimating Model (11) using both frequentist and Bayesian approaches.Footnote 2 While the Bayesian approach offers greater flexibility in estimation, it relies on additional distributional assumptions for the factor term and the error term. In the frequentist approach, the factor term $\gamma _i^{\prime }f_t$, also known as "interactive fixed effects" (Bai 2009), can be used to address endogeneity for the identification of the coefficients $\beta$ when some covariates in $X_{it}$ are endogenous. Additionally, instead of assuming that the error terms are drawn from an independent standard normal distribution, they can be serially correlated within a unit.

In this article, we focus on BSA for treatment effects and thus assume independence between the factor term and the observed covariates, with i.i.d. error terms. However, Model (11) can be extended to incorporate endogenous covariates by directly modeling the relationship between $X_{it}$ and $\gamma _i^{\prime }f_t$. For the error terms, while we assume that $\varepsilon _{it}$ are i.i.d., the factor term, which includes unit random effects as a special case, provides a way to account for within-unit correlation (Cameron and Miller 2015). More complex structures for the error term, such as stochastic volatility (Belmonte, Koop, and Korobilis 2014), can also be incorporated.

Determining the number of factors is a frequently discussed issue in the literature on estimating LFMs like (11). The prior variances for the factor loadings, $\{ \omega _1^2, \ldots , \omega _r^2 \}$, which control the effective number of factors, play a key role in the Bayesian analysis of factor models. For example, if $\omega _1^2 = 0$, then the factor loading that corresponds to the first factor equals 0 for each unit, so that factor has no effect on the potential outcomes and is essentially excluded from the model. However, a well-known challenge in specifying a prior for a variance parameter is that the conjugate inverse gamma distribution performs poorly for selecting the number of factors. Since the support of the inverse gamma distribution includes only positive real numbers, even a conventional vague inverse gamma prior exerts a substantial influence on the posterior of $\omega ^2$ when its true value is close to 0 (Frühwirth-Schnatter and Wagner 2010). There are two main approaches to this problem, both based on Bayesian shrinkage: spike-and-slab priors and continuous shrinkage priors. In this article, we adopt the continuous shrinkage approach. We follow Frühwirth-Schnatter and Wagner (2010) and re-parameterize the factor loadings as follows:

(12) $$ \begin{align} \begin{gathered} Y_{it} = \beta_0 + \delta_{it} D_{it} + X_{it}^{\prime}\beta + (\omega_{\gamma} \cdot \tilde{\gamma}_i)^{\prime} f_t + \varepsilon_{it}, \\ \tilde{\gamma}_i \sim \mathcal{N}(0, I_r), \quad f_t \sim \mathcal{N}(0, I_r), \quad \varepsilon_{it} \sim \mathcal{N}(0, \sigma^2), \end{gathered} \end{align} $$

where the factor loadings $ \gamma _i $ in Model (11) are re-parameterized as the product of $ \omega _{\gamma } $ and $\tilde {\gamma }_i$. In Model (12), $\tilde {\gamma }_i$ is a vector of "standardized" factor loadings with entries drawn i.i.d. from the standard normal distribution, and $\omega _\gamma$ is now interpreted as a vector of coefficients controlling the influence of each factor. If the $k^{th}$ element of $\omega _\gamma$ equals $0$, then the $k^{th}$ factor is excluded from the model. The re-parameterization allows the use of a prior distribution for $\omega _\gamma$ that contains 0 in the interior of its support, enabling model selection by shrinkage.Footnote 3
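As a brief illustration of the re-parameterization (12), with made-up values: each loading vector is the element-wise product of a global scale vector and a standardized loading, so setting one entry of $\omega_\gamma$ to zero removes the corresponding factor from the model.

```python
import numpy as np

rng = np.random.default_rng(1)
N, r = 50, 3                                   # r chosen generously; redundant factors can be shrunk away
omega_gamma = np.array([1.5, 0.8, 0.0])        # zero entry: the third factor is effectively excluded
gamma_tilde = rng.standard_normal((N, r))      # "standardized" loadings, gamma_tilde_i ~ N(0, I_r)

gamma = omega_gamma * gamma_tilde              # gamma_i = omega_gamma * gamma_tilde_i, as in (12)
print(gamma.std(axis=0))                       # third column is identically zero
```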

2.4 Bayesian Sensitivity Analysis

Although the factor term in Model (11) is highly flexible, it may not fully capture unobserved confounding that varies across units and time. To address this common concern, we develop a sensitivity analysis method for treatment effect estimation under Assumption 3, which permits general confounding that varies across units and time. As shown in the general setting above in (4) and (5), Assumption 3 enables imputation-based sensitivity analysis for causal effects by jointly modeling $Y_{it}$ conditional on ( $D_{it}$, $X_{it}$, $\gamma _i$, $f_t$, $U_{it}$ ) and $U_{it}$ conditional on ( $D_{it}, X_{it}, \gamma _i, f_t$ ). Specifically, we expand Model (12) to incorporate $U_{it}$:Footnote 4

(13) $$ \begin{align} \begin{gathered} Y_{it} = \beta_0 + \delta_{it} D_{it} + X_{it}^{\prime}\beta + \beta_u U_{it} + (\omega_{\gamma} \cdot \tilde{\gamma}_i)^{\prime} f_t + \varepsilon_{it}, \\ U_{it} = \lambda_0 + \lambda_d D_{it} + X_{it}^\prime \lambda_x + \lambda_f (\omega_{\gamma} \cdot \tilde{\gamma}_i)^{\prime} f_t + \nu_{it}, \\ \tilde{\gamma}_i \sim \mathcal{N}(0, I_r), \quad f_t \sim \mathcal{N}(0, I_r), \\ \varepsilon_{it} \sim \mathcal{N}(0, \sigma^2), \quad \nu_{it} \sim \mathcal{N}(0, c_1^2), \end{gathered} \end{align} $$

where the coefficient $\beta _u$ measures the effect of $U_{it}$ on the outcome of interest. We follow common practice (e.g., Gustafson et al. 2010) and model $U_{it}$ as a linear function of the treatment indicator $D_{it}$, the observed confounders $X_{it}$, and the factor term $(\omega _{\gamma } \cdot \tilde {\gamma }_i)^{\prime } f_t$, with the associated coefficients denoted by $\boldsymbol \lambda \equiv [\lambda _0, \lambda _d, \lambda _x^\prime , \lambda _f]^\prime$. The variance of the normal error term $\nu _{it}$ equals $c_1^2$, a fixed constant specified by the researcher. We call Model (13) "Bayesian sensitivity analysis for the linear factor model" (BSA-LFM).

$\boldsymbol \lambda$ and $\beta _u$ are our sensitivity parameters and deserve several additional remarks. First, as discussed in the next section, we specify the prior for each element of $\boldsymbol \lambda$ to be symmetric and centered at zero, indicating that the grand mean of the unobserved confounder is $0$ a priori. Second, existing methods for sensitivity analysis often assume that the unobserved confounder is determined only by the treatment indicator to simplify the analysis (e.g., Lin, Psaty, and Kronmal 1998). However, the assumption that $U_{it}$ is independent of the observed covariates and the factor term conditional on treatment is often unrealistic and has been challenged by VanderWeele (2008).Footnote 5 In addition, the common correlations of the outcome and the unobserved confounder imply additional learning (Gustafson et al. 2010). That is, while $\beta _u$ and $\lambda _d$ are the only two sensitivity parameters that affect the estimation of the treatment effect,Footnote 6 allowing non-zero correlation of $U_{it}$ with the observed confounders as well as the factor term can help narrow the inference. If at least one element of $\lambda _x$ or $\lambda _f$ is not equal to 0, we can achieve substantive updating of $\beta _u$ given the data. Furthermore, allowing for correlation between observed covariates and unobserved confounders implies that some covariates may be endogenous (Diegert, Masten, and Poirier 2022), which is a weaker condition than treating all covariates as exogenous.

Finally, it is important to note that we must assume some correlation between $\lambda _0$ and $\lambda _d$ . Since Model (13) allows for arbitrary heterogeneous treatment effects, like most counterfactual imputation methods, only observations under control ( $D_{it} = 0$ ) are used for model fitting. For these observations, Model (13) reduces to:

(14) $$ \begin{align} \begin{gathered} Y_{it|D_{it} = 0} = \beta_0 + X_{it}^{\prime}\beta + \beta_u U_{it} + (\omega_{\gamma} \cdot \tilde{\gamma}_i)^{\prime} f_t + \varepsilon_{it}, \\ U_{it|D_{it} = 0} = \lambda_0 + X_{it}^\prime \lambda_x + \lambda_f (\omega_{\gamma} \cdot \tilde{\gamma}_i)^{\prime} f_t + \nu_{it}. \end{gathered} \end{align} $$

Thus, we can only infer $\lambda _0$ directly from the data and indirectly estimate $\lambda _d$ through its assumed correlation with $\lambda _0$ . Details of the prior specifications are provided below.

2.5 Prior Specification

Prior specification plays a key role in BSA. Because data contain no direct information about unobserved confounders, posteriors of sensitivity parameters ( $\boldsymbol \lambda $ and $\beta _u$ ) are dominated by priors assigned to them. In fact, priors for the sensitivity parameters encode the researcher’s a priori belief about the direction and magnitude of unobserved confounding. Such beliefs are typically incorporated in frequentist sensitivity analysis only partially and informally by fixing the sensitivity parameters to particular values. In contrast, our Bayesian approach provides a more formal, principled way to represent researcher’s belief about confounding in the form of a probability distribution.

From a practical standpoint, the choice of a prior distribution for sensitivity parameters entails a bias–variance trade-off. An overly diffuse prior would result in a credible interval that has low power to detect a non-zero treatment effect, while a prior with high precision risks bias that never disappears asymptotically. Here, we adopt the approach proposed by McCandless, Gustafson, and Levy (2008), which makes use of observed confounders to benchmark our prior on the effects of unobserved confounders. That is, we assume that the confounding effects of $U_{it}$ and $X_{it, j}$ ( $j \in \{1, \ldots , p\}$ ) are comparable a priori after standardization, in the sense that $\beta _u$ and $\beta$ have exchangeable prior distributions. Formally, we have

(15)

where the second line in (15) is by de Finetti’s theorem, and the first line adds several constraints on the prior distributions of sensitivity parameters $\boldsymbol \lambda $ as well as the variance of $\nu _{it}$ . We assign conjugate normal priors to $\boldsymbol {\lambda }$ with constraints on prior variancesFootnote 7 and exchangeable priors to $\beta _u$ and $\beta $ . More specifically, our prior on $\boldsymbol {\lambda }$ is specified as follows:

  • $\lambda _0$ and $\lambda _d$ :

    (16) $$ \begin{align} \begin{gathered} (\lambda_0, \lambda_d)' \sim \mathcal{N}(0, \Lambda), \\ \Lambda = \begin{pmatrix} c_2^2 & -c_3^2/2\\ -c_3^2/2 & c_3^2 \end{pmatrix}, \end{gathered} \end{align} $$
    where the hyperparameters are constrained such that $0<c_2\leq 1$ , $c_3> 0,$ and $ 2c_2> c_3 $ to guarantee positive definiteness of the covariance matrix.
  • $\lambda _x$ :

    (17) $$ \begin{align} \lambda_x \sim \mathcal{N}\left(0, \frac{c_4^2}{p} I_p\right). \end{align} $$
  • $\lambda _f$ :

    (18) $$ \begin{align} \lambda_f \sim \mathcal{N} \left(0, \frac{c_5^2(k_1-1)}{2rk_2}\right), \end{align} $$

where $k_1$ and $k_2$ are hyperparameters for $\omega _\gamma $ defined below. Taken together, our prior specification for $\boldsymbol {\lambda }$ implies the following proposition.Footnote 8

Proposition 1 Prior variance of unobserved confounder

Priors (16)–(18) imply that the prior variance of $U_{it}$ is independent of treatment status $D_{it}$ and given by,

(19) $$ \begin{align} Var(U_{it}|D_{it}) = Var(U_{it}) = c_1^2 + c_2^2 + c_4^2 + c_5^2. \end{align} $$

A formal proof of Proposition 1 can be found in Appendix 1 of the Supplementary Material. Since we standardize the observed and unobserved confounders to have unit variance, Proposition 1 implies the following constraint on the hyperparameters:

(20) $$ \begin{align} c_1^2 + c_2^2 + c_4^2 + c_5^2 = 1. \end{align} $$
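As a heuristic sketch of the calculation behind Proposition 1 (the formal proof is in Appendix 1 of the Supplementary Material), treat the components of $X_{it}$, the factor term, and the elements of $\boldsymbol{\lambda}$ as mutually independent, average over the priors on $\omega_\gamma$, $\tilde{\gamma}_i$, and $f_t$, and work term by term through the equation for $U_{it}$ in (13):

$$ \begin{aligned} Var(\lambda_0 + \lambda_d D_{it} \mid D_{it} = d) &= c_2^2 + d^2 c_3^2 - d\, c_3^2 = c_2^2 \quad \text{(since } d \in \{0,1\}\text{)}, \\ Var(X_{it}^{\prime} \lambda_x) &= \sum_{j=1}^{p} \frac{c_4^2}{p}\, Var(X_{it,j}) = c_4^2, \\ Var\big(\lambda_f (\omega_{\gamma} \cdot \tilde{\gamma}_i)^{\prime} f_t\big) &= \frac{c_5^2(k_1-1)}{2rk_2} \cdot r\, E[\omega_{\gamma_j}^2] = \frac{c_5^2(k_1-1)}{2rk_2} \cdot r \cdot \frac{2k_2}{k_1-1} = c_5^2, \\ Var(\nu_{it}) &= c_1^2, \end{aligned} $$

where $E[\omega_{\gamma_j}^2] = 2k_2/(k_1-1)$ follows from the hierarchical prior (23) given below; summing the four independent pieces yields (19).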

Now, we turn to the parameters in the outcome model and the prior variances of the factor loadings. There are several sensible options for the exchangeable priors assigned to $\beta$ and $\beta _u$. Of particular note, while an independent, weakly informative normal prior is perhaps the most intuitive choice, Gustafson et al. (2010) use the i.i.d. t distribution with a certain number of degrees of freedom, noting that the latter often yields substantially narrower inferences than a normal independence prior with the same marginal variability. This prior has a hierarchical form: the priors for $ \beta _0, \beta _1, \ldots , \beta _p, \beta _u $ are i.i.d. $N(0, \tau ^2 S)$ given S, and the hyperparameter S is drawn from an inverted-gamma distribution, $S \sim Inv-Gamma(d/2, d/2)$.

Here, we propose a more flexible specification, allowing each $\beta _k$ and $\beta _u$ to have heterogeneous prior variances drawn from a multi-level model. Specifically, we use the Laplacian prior (Park and Casella 2008), a continuous shrinkage prior that has heavier tails than the t distribution and places more density near $0$, so as to "let the data speak for itself." A small sketch of sampling from this hierarchical prior follows the list of priors below.

  • $ \beta _j $ for $ j \in \{0, 1, \ldots , p \} $ :Footnote 9

    (21) $$ \begin{align} \begin{gathered} \beta_j | \tau_{\beta_j}^2 \sim \mathcal{N}(0, \tau_{\beta_j}^2), \\ \tau_{\beta_j}^2 | \xi_{\beta} \sim Exp(\frac{\xi_{\beta}^2}{2}), \\ \xi_{\beta}^2 \sim Gamma(a_1, a_2). \end{gathered} \end{align} $$
  • $ \beta _u $ :

    (22) $$ \begin{align} \begin{gathered} \beta_u | \tau_{\beta_u}^2 \sim \mathcal{N}(0, \tau_{\beta_u}^2), \\ \tau_{\beta_u}^2 | \xi_{\beta} \sim Exp(\frac{\xi_{\beta}^2}{2}), \\ \xi_{\beta}^2 \sim Gamma(a_1, a_2). \end{gathered} \end{align} $$

We also assign the Laplacian prior to $\omega _\gamma $ for the purpose of factor number selection.

  • $ \omega _{\gamma _j} $ for $ j \in \{1, \ldots , r \} $ :

    (23) $$ \begin{align} \begin{gathered} \omega_{\gamma_j} | \tau_{\gamma_j}^2 \sim N(0, \tau_{\gamma_j}^2), \\ \tau_{\gamma_j}^2 | \xi_{\gamma} \sim Exp(\frac{\xi_{\gamma}^2}{2}), \\ \xi_{\gamma}^2 \sim Gamma(k_1, k_2). \end{gathered} \end{align} $$

Finally, we use the conventional inverted-gamma prior for the error variance $\sigma ^2$ .

  • $ \sigma ^2 $ :

    (24) $$ \begin{align} \sigma^{2} \sim Inv-Gamma(e_1, e_2). \end{align} $$
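The sketch below (with arbitrary illustrative hyperparameter values, not the defaults in Table 1) draws coefficients by composition from the hierarchy in (21)–(23); marginally over $\tau^2$ (for fixed $\xi$) the draws follow a Laplace distribution, which places more mass near zero and in the tails than a normal distribution with matched variance.

```python
import numpy as np

rng = np.random.default_rng(2)
a1, a2 = 2.0, 1.0                        # illustrative Gamma(a1, a2) hyperparameters (assumed)
S = 100_000

# Sample by composition: xi^2 ~ Gamma(a1, a2), tau^2 | xi ~ Exp(rate = xi^2/2), beta | tau^2 ~ N(0, tau^2)
xi2 = rng.gamma(shape=a1, scale=1.0 / a2, size=S)     # numpy uses the scale (1/rate) parameterization
tau2 = rng.exponential(scale=2.0 / xi2)               # Exp with rate xi^2/2  <=>  scale 2/xi^2
beta = rng.normal(0.0, np.sqrt(tau2))

# Compare mass near zero and in the tails against a normal with the same overall variance
ref = rng.normal(0.0, beta.std(), size=S)
print(np.mean(np.abs(beta) < 0.1), np.mean(np.abs(ref) < 0.1))                        # more mass near 0
print(np.mean(np.abs(beta) > 4 * beta.std()), np.mean(np.abs(ref) > 4 * beta.std()))  # heavier tails
```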

Taken together, our model for sensitivity analysis requires 11 hyperparameter values to be set by the researcher, subject to the constraints as specified above. Appropriately setting hyperparameter values is often a challenging task in implementing Bayesian methods in practice, particularly for complex models. Fortunately, sensible default values that approximate a lack of prior knowledge are available for most of the hyperparameters in our model, including those related to the error variance, coefficients in the outcome model, and factor selection. This substantially simplifies the task of specifying priors in applied research.

Table 1 summarizes such default values for these hyperparameters, along with their substantive interpretations. For the prior variances of the sensitivity parameters $\lambda $ s, we impose constraints to keep their scales small while allowing $\beta _u$ to be potentially large. In practice, selecting these values may require some prior knowledge. For instance, if researchers believe that $U_{it}$ is informative given $X_{it}$ or the factor terms, they may choose relatively large values for $c_4^2$ or $c_5^2$ . Otherwise, they may favor a larger random component, reflected in a larger $c_1^2$ .

Table 1 Suggested default specifications for the hyperparameters.

3 Computation

Suppose the true data-generating process (DGP) is Model (13) but we fit the LFM in Model (12) without the unobserved confounder $U_{it}$. Then, the bias of the ATT estimate is the product $\beta _u \lambda _d$. Frequentist sensitivity analysis usually sets a range of values for $\beta _u$ and $\lambda _d$ and investigates how large their values must be to explain away the estimated treatment effect (Imbens 2003; Rosenbaum 2002). By contrast, BSA assigns priors to all parameters and makes inferences based on the posterior distribution. Model (13) implies a joint distribution of the observed outcome $Y_{it}$ and the unobserved confounder $U_{it}$, and thus it seems appealing to apply an MCMC algorithm to the joint posterior distribution of the parameters as well as $U_{it}$. However, since none of the elements of the parameter vector $\theta = \{ \beta _0, \beta , \beta _u, \Gamma , F, \lambda _0, \lambda _d, \lambda _x, \lambda _f, \sigma ^2 \}$ is known, the model is nonidentified. For example, define $ \tilde {\theta } = \{ \beta _0, \beta , \beta _u/2, \Gamma , F, 2\lambda _0, 2\lambda _d, 2\lambda _x, 2\lambda _f, \sigma ^2 \} $, an alternative parameter vector. Then, $f(Y|\theta ) = f(Y|\tilde {\theta })$.
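Before turning to the computational consequences, the role of the bias product $\beta_u \lambda_d$ can be seen in a stripped-down numerical illustration that drops the factor term and uses made-up parameter values (a sketch, not the simulation design of Section 4): substituting the confounder equation into the outcome equation, the regression that omits $U_{it}$ recovers $\delta + \beta_u \lambda_d$ rather than $\delta$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
delta, beta, beta_u = 1.0, 2.0, 1.5            # illustrative values (assumed)
lam0, lam_d, lam_x = 0.3, -0.5, 0.1

X = rng.standard_normal(n)
D = rng.binomial(1, 0.3, size=n)
U = lam0 + lam_d * D + lam_x * X + 0.5 * rng.standard_normal(n)   # confounder equation
Y = delta * D + beta * X + beta_u * U + rng.standard_normal(n)    # outcome equation, no factor term

# OLS of Y on (1, D, X), omitting U: the coefficient on D converges to delta + beta_u * lam_d
Z = np.column_stack([np.ones(n), D, X])
coef = np.linalg.lstsq(Z, Y, rcond=None)[0]
print(coef[1], delta + beta_u * lam_d)          # both approximately 0.25
```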

The non-identifiability of such a model leads to practical issues in simulation-based model fitting, such as MCMC. For example, Gelfand and Sahu (1999) argue that with flat priors, the trajectories of the Markov chain for some parameters tend to drift to extreme values, causing poor mixing and slow convergence of the Markov chains. In this article, we address this computational issue by adopting a "transparent parameterization" of the original nonidentified model (13) to improve the efficiency of the MCMC sampler, as recommended by Gustafson (2005).

3.1 Transparent Parameterization of the Nonidentified Model

We begin by briefly summarizing the general idea of transparent parameterization proposed by Gustafson (2005). Suppose the original parameter vector in the nonidentified model is $\theta$. We reparameterize $\theta$ into a new parameter vector $\phi \equiv \{ \phi _I, \phi _N \} $ such that $ f(\text {data}|\phi ) = f(\text {data}|\phi _I) $. Priors assigned to $\phi$ can be factorized as $ f(\phi ) = f(\phi _N|\phi _I)f(\phi _I) $, and the corresponding posteriors are:

(25) $$ \begin{align} \begin{gathered} f(\phi_I|\text{data}) \propto f(\text{data}|\phi_I) f(\phi_I), \\ f(\phi_N|\text{data}) = \int f(\phi_N|\phi_I) f(\phi_I|\text{data}) d\phi_I. \end{gathered} \end{align} $$

The posteriors (25) imply qualitatively different asymptotic behaviors for $\phi _I$ and $\phi _N$. That is, direct Bayesian learning occurs for $\phi _I$: under regularity conditions, $f(\phi _I|\text {data})$ converges to a point mass at the true value of $\phi _I$ as the sample size grows. Only indirect Bayesian learning occurs for $\phi _N$, whose posterior converges to the conditional distribution $f(\phi _N|\phi _I)$ evaluated at the true value of $\phi _I$. This, in turn, implies that $\phi _I$ and $\phi _N$ will also exhibit distinct behaviors in an MCMC sampler.

Gustafson et al. (2010) suggest that transparent parameterization can be exploited to build a computationally efficient posterior sampler, making careful jumps in the directions of parameters that are informed by the data while moving quickly through the directions of parameters about which the data carry little information. We adopt a similar strategy for our proposed BSA-LFM. Specifically, we substitute the second line of Model (13), which describes the DGP of $U_{it}$, into the first line for the potential outcome, essentially integrating out the unobserved confounder:

(26) $$ \begin{align} Y_{it} = (\delta_{it} + \beta_u \lambda_d) D_{it} + (\beta_0 + \beta_u \lambda_0) + X_{it}^{\prime}(\beta + \beta_u \lambda_x) + (1 + \beta_u \lambda_f)(\omega_{\gamma} \cdot \tilde{\gamma}_i)^{\prime} f_t + \beta_u \nu_{it} + \varepsilon_{it}. \end{align} $$

Define $\tilde {\delta }_{it} = \delta _{it} + \beta _u \lambda _d$ , $\tilde {\beta }_0 = \beta _0 + \beta _u \lambda _0$ , $\tilde {\beta } = \beta + \beta _u \lambda _x$ , $\tilde {\omega }_{\gamma } = (1 + \beta _u \lambda _f)\omega _{\gamma }$ , and $\tilde {\epsilon }_{it} = \beta _u \nu _{it} + \varepsilon _{it}$ . Model (13) can then be re-written as:

(27) $$ \begin{align} \begin{gathered} Y_{it} = \tilde{\delta}_{it} D_{it} + \tilde{\beta}_0 + X_{it}^{\prime}\tilde{\beta}+ (\tilde{\omega}_{\gamma} \cdot \tilde{\gamma}_i)^{\prime} f_t + \tilde{\epsilon}_{it}, \\ \tilde{\epsilon}_{it} \sim \mathcal{N} (0, \tilde{\sigma}^2), \\ \tilde{\sigma}^2 = \beta_u^2 c_1^2 + \sigma^2. \end{gathered} \end{align} $$
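The variance inflation in (27) is simply the usual effect of integrating a normal latent variable out of a normal model; the following quick check (with arbitrary values) verifies, for a single observation, that marginalizing $U$ out of the joint model yields a normal density with mean shifted by $\beta_u$ times the conditional mean of $U$ and variance $\sigma^2 + \beta_u^2 c_1^2$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
beta_u, c1, sigma = 1.5, 0.5, 1.0     # illustrative values (assumed)
mu, m, y = 0.2, -0.4, 0.7             # mu: systematic part of Y given U = 0; m: conditional mean of U

# Monte Carlo integration over U ~ N(m, c1^2)
u = rng.normal(m, c1, size=2_000_000)
mc = norm.pdf(y, loc=mu + beta_u * u, scale=sigma).mean()

# Closed form implied by (26)-(27)
closed = norm.pdf(y, loc=mu + beta_u * m, scale=np.sqrt(sigma**2 + beta_u**2 * c1**2))
print(mc, closed)                     # agree up to Monte Carlo error
```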

The original parameter vector is $\theta = \{ \beta _0, \beta , \boldsymbol {\omega }, \Gamma , F, \sigma ^2, \beta _u, \lambda _0, \lambda _d, \lambda _x, \lambda _f \}$, and the re-parameterized vector is:

(28) $$ \begin{align} \phi = \left\{ \tilde{\beta}_0, \tilde{\beta}, \tilde{\boldsymbol{\omega}}, \Gamma, F, \tilde{\sigma}^2, \beta_u, \lambda_0, \lambda_d, \lambda_x, \lambda_f \right\}, \end{align} $$

where $\phi _I = \left \{\tilde {\beta }_0, \tilde {\beta }, \tilde {\boldsymbol {\omega }}, \Gamma , F, \tilde {\sigma }^2\right \}$ and $\phi _N = \left \{\beta _u, \lambda _0, \lambda _d, \lambda _x, \lambda _f\right \}$ . One can verify that the data likelihood only depends on $ \phi _I $ :Footnote 10

(29) $$ \begin{align} f(Y^{obs} | \phi ) &\propto (\sqrt{\sigma^2 + \beta_u^2 c_1^2}) ^{-(NT - \sum_{i,t} D_{it})} \nonumber\\& \quad \exp [-\frac{1}{2(\sigma^2 + \beta_u^2 c_1^2)} \sum_{D_{it} = 0} (Y_{it} - (\beta_0 + \beta_u \lambda_0) - X_{it}^{\prime}(\beta + \beta_u \lambda_x) - (\omega_{\gamma} (1 + \beta_u \lambda_f) \cdot \tilde{\gamma}_i)^{\prime} f_t)^2 ] \nonumber\\&= \tilde{\sigma}^{-(NT - \sum_{i,t} D_{it})} \exp [-\frac{1}{2\tilde{\sigma}^2} \sum_{D_{it} = 0} (Y_{it} - \tilde{\beta}_0 - X_{it}^{\prime}\tilde{\beta} - (\tilde{\omega}_{\gamma} \cdot \tilde{\gamma}_i)^{\prime} f_t)^2 ]\nonumber\\&\propto f(Y^{obs} | \phi_I ). \nonumber \end{align} $$

Now, we derive the posteriors of the parameters in the re-parameterized model (27). Denote the one-to-one mapping of the transparent re-parameterization by $ \phi = g(\theta ) $. The prior distributions assigned to the original parameter vector $\theta$ imply the following prior for $\phi$:

(30) $$ \begin{align} f_\phi(\phi) = f_\theta(g^{-1}(\phi))|J|, \end{align} $$

where $|J|$ is the determinant of the Jacobian of the inverse transformation $ \theta = g^{-1}(\phi ) $. Thus, the posterior distribution of $\phi$ can be written as:

(31) $$ \begin{align} \begin{aligned} \pi(\phi | Y^{obs}) &\propto f(Y^{obs} | \phi ) f_\phi(\phi) \\& = f(Y^{obs} | \phi ) f_\theta(g^{-1}(\phi))|J| \\& = f(Y^{obs} | \phi ) |1 + \beta_u\lambda_f|^{-r} \\&\quad \prod_{j = 0}^{p} \pi(\tilde{\beta}_j - \beta_u \lambda_x | \tau_{\beta_j}^2) \pi(\tau_{\beta_j}^2|\xi_\beta^2) \prod_{k = 1}^{r}\pi(\frac{\tilde{\omega}_{\gamma_k}}{1 + \beta_u\lambda_f} | \tau_{\gamma_k}^2) \pi(\tau_{\gamma_k}^2|\xi_\gamma^2) \pi(\xi_\gamma^2) \\&\quad \pi(\beta_u | \tau_{\beta_u}^2) \pi(\tau_{\beta_u}^2|\xi_\beta^2) \pi(\tilde{\Gamma})\pi(F) \pi(\tilde{\sigma}^{2} - \beta_u^2 c_1^2) \pi(\lambda_0, \lambda_d) \pi(\lambda_x) \pi(\lambda_f), \end{aligned} \end{align} $$

where $\pi (\cdot )$ denotes the prior distribution of the corresponding original parameter. We can then recover the posteriors of the original parameter vector $\theta$ from the posteriors (31) of $\phi$. Since prior distributions are assigned to $\theta$, the implied priors are not conjugate for some parameters in $\phi$. Therefore, we develop a Metropolis–Hastings step within a Gibbs sampling procedure to simulate from the posterior, summarized as Algorithm 1. Details of the MCMC algorithm can be found in Appendix 2 of the Supplementary Material.

Remark. In Step 10 of Algorithm 1, we estimate the individual treatment effect for observations under treatment ( $D_{it} = 1$ ) using $\delta _{it} = \tilde {\delta }_{it} - \beta _u \cdot \lambda _d$ , where $\tilde {\delta }_{it} = Y_{it} - Y_{it}(0)$ , with $Y_{it}(0)$ representing the imputed counterfactual outcome under control. This step demonstrates the “frequentist analogy” of the proposed BSA. To see this, suppose the treatment effects are homogeneous ( $\delta _{it} \equiv \delta $ ). In that case, the identified parameter is $\tilde {\delta } = \delta + \beta _u \cdot \lambda _d$ . Researchers can obtain an interval estimate for $\delta $ by fixing the sensitivity parameters at specific values and varying these values. The BSA approach, rather than selecting specific values for the sensitivity parameters, marginalizes the treatment effect estimate over the posterior distributions of $\beta _u$ and $\lambda _d$ .
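In code, the post-processing described in the Remark amounts to the following sketch; the array names and shapes are hypothetical placeholders for the output of the sampler in Appendix 2 of the Supplementary Material.

```python
import numpy as np

def att_draws(Y_obs, D, Y0_imputed, beta_u, lambda_d):
    """One ATT draw per MCMC iteration, following the Remark after Algorithm 1.

    Y_obs, D         : (N, T) arrays of observed outcomes and treatment indicators
    Y0_imputed       : (S, N, T) array of imputed counterfactuals Y_it(0), one layer per draw
    beta_u, lambda_d : (S,) arrays of posterior draws of the sensitivity parameters
    """
    treated = (D == 1)
    delta_tilde = Y_obs[None, :, :] - Y0_imputed        # delta_tilde_it = Y_it - Yhat_it(0)
    att_tilde = delta_tilde[:, treated].mean(axis=1)    # average over treated (i, t), per draw
    return att_tilde - beta_u * lambda_d                # delta_it = delta_tilde_it - beta_u * lambda_d

# Posterior summaries, e.g. a 95% credible interval:
# np.quantile(att_draws(Y_obs, D, Y0_imputed, beta_u, lambda_d), [0.025, 0.5, 0.975])
```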

4 Monte Carlo Studies

In this section, we conduct a series of Monte Carlo studies to investigate the behavior of the proposed BSA (BSA-LFM) in terms of key frequentist performance metrics. We focus on the $ATT$ ( $ \sum _{i,t} D_{it} \delta _{it} / \sum _{i,t} D_{it} $ ) as the quantity of interest and investigate the coverage rate of the posterior credible intervals, as well as their average lengths. In particular, we are interested in how well the proposed method performs against alternatives in terms of the aforementioned bias–variance trade-off. Specifically, our analysis focuses on whether we can (1) improve the coverage rate for the true ATT compared to the LFM (DM-LFM; Pang et al. 2022) and (2) obtain narrower credible intervals while maintaining the same false positive rate (measured by the coverage rate when the true ATT is 0) using the proposed shrinkage priors when there exists an unobserved confounder.

We generate simulated datasets with varying sample sizes according to the following DGP:

(32) $$ \begin{align} \begin{aligned} Y_{it} &= 3 + \delta_{it} D_{it} + X_{1it} + 3 X_{2it} + \alpha_i + \xi_t + \gamma_i' f_t + U_{it} + \varepsilon_{it}, \\U_{it} &= 0.3 - 0.5 D_{it} + 0.1 \sum_{k=1}^{5} X_{kit} + 0.1 (\alpha_i + \xi_t + \gamma_i' f_t) + \nu_{it}. \end{aligned} \end{align} $$

The DGP (32) assumes that there are three time-varying confounders, $X_1$, $X_2$, and U, that determine the value of the potential outcomes. The first two are observed while the last one is unobserved. Their corresponding coefficients are 1, 3, and 1. $\alpha _i$ and $\xi _t$ are additive two-way random effects, and $\gamma _i'f_t$ is the latent factor term with two factors. All of them are drawn from i.i.d. standard normal distributions, i.e., $\alpha _i \sim N(0, 1) $, $\xi _t \sim N(0, 1) $, $ \gamma _{ki} \sim N(0, 1), $ and $f_{kt} \sim N(0, 1)$ for $k \in \{1, 2 \}$. We also assume that five covariates are observed: $X_1$, $X_2$, $X_3$, $X_4$, and $X_5$, and only the first two have non-zero effects on the potential outcomes. The observed covariates $X_1$ to $X_5$ are also generated from i.i.d. standard normal distributions. For the unobserved confounder U, we assume that it is correlated with all observed covariates $X_1$ to $X_5$, the treatment indicator D, and the additive two-way random effects as well as the latent factor term. $\varepsilon _{it}$ and $\nu _{it}$ are normal errors with mean 0 and variances 1 and 0.25, respectively. Therefore, $X_3$, $X_4$, and $X_5$ implicitly determine the value of the potential outcomes through U. The values of the sensitivity parameters (each element of $\lambda _x$ and $\lambda _f$) are all set to 0.1. We keep them small so that they are not on the margin of their corresponding prior distributions. Finally, we set the value of the treatment effect $\delta _{it}$. For the coverage rate and the length of credible intervals, it suffices to study the case where the $ATT$ equals 0. For the power analysis, we set a variety of values for the ATT. We assume that individual treatment effects are heterogeneous; however, for a given value of the ATT, there is no variation across simulated samples.

The existing literature finds that the performance of factor models for causal inference depends on the number of pre-treatment periods as well as the number of control units (Pang et al. 2022; Samartsidis et al. 2020; Xu 2017). For the design of the Monte Carlo studies, we consider several combinations of the number of units ( $N \in \{50, 100\} $ ) and time periods ( $T \in \{30, 50, 90\} $ ). We set the proportion of treated units to 20%; their treatment status switches from 0 to 1 in the post-treatment periods. The treatment status of the remaining control units is always equal to 0. We regard the last ten periods as post-treatment periods. For each combination of N and T, we generate 1,000 simulated data samples using the DGP described above.
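For concreteness, the following sketch generates one simulated dataset under DGP (32) and the design just described; it is our own illustrative implementation (with a constant treatment effect equal to the target ATT for simplicity), not the authors' replication code.

```python
import numpy as np

def simulate(N=50, T=30, att=0.0, seed=0):
    """One panel dataset from DGP (32): 20% treated units, last 10 periods post-treatment."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((N, T, 5))                # X1..X5; only X1 and X2 enter the outcome directly
    alpha = rng.standard_normal(N)[:, None]           # unit random effects
    xi = rng.standard_normal(T)[None, :]              # time random effects
    gamma = rng.standard_normal((N, 2))               # factor loadings (two factors)
    f = rng.standard_normal((T, 2))                   # common factors
    unobs = alpha + xi + gamma @ f.T                  # alpha_i + xi_t + gamma_i' f_t

    D = np.zeros((N, T))
    D[: int(0.2 * N), -10:] = 1                       # 20% treated in the last 10 periods

    nu = 0.5 * rng.standard_normal((N, T))            # Var(nu_it) = 0.25
    U = 0.3 - 0.5 * D + 0.1 * X.sum(axis=2) + 0.1 * unobs + nu
    eps = rng.standard_normal((N, T))                 # Var(eps_it) = 1
    Y = 3 + att * D + X[:, :, 0] + 3 * X[:, :, 1] + unobs + U + eps
    return Y, D, X

Y, D, X = simulate(N=100, T=50, att=0.0)
```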

For model fitting, we include all observed covariates ( $X_1$ to $X_5$ ) in the regression models and regard the true number of factors as known.Footnote 11 We compare estimation results based on four different model specifications: (1) DM-LFM (w/ U), the ideal "oracle" case where the unobserved confounder U is observed; (2) DM-LFM (w/o U), the naïve case where we ignore the unobserved U; (3) BSA-LFM (w/ Shrinkage Prior), BSA with the proposed shrinkage prior for $\beta _u$; and (4) BSA-LFM (w/ Normal Prior), BSA with a normal prior for $\beta _u$. For the shrinkage prior, we assign a diffuse prior $\xi ^2_{\beta _u} \sim Gamma(0.001, 0.001) $ to the hyperparameter, while for the normal prior, we assign $\beta _u \sim N(0, 10)$. The results are displayed in Tables 2 and 3.

Table 2 Coverage rate of 95% credible intervals.

Table 3 Average length of 95% credible intervals.

Table 2 displays the coverage rates of the nominal 95% credible intervals. As expected, DM-LFM (w/ U) has a coverage rate close to the nominal value for different numbers of units. DM-LFM (w/o U), however, produces a coverage rate substantially smaller than the nominal value, and the problem of under-coverage worsens as the number of units increases. BSA-LFM with either the shrinkage or the normal prior has a coverage rate of 100%, indicating that both are always conservative. Table 3 reports the average lengths of the nominal 95% credible intervals. For both the ideal case and the naïve case, the average length of the CI decreases in proportion to the inverse square root of the sample size. For BSA-LFM, the average length of the CI decreases only slightly as the number of units increases.Footnote 12

The conservativeness of the credible intervals and their asymptotic behavior arise from the two sources of uncertainty contributing to these intervals. While sampling variability decreases as the sample size increases, the "scenario uncertainty" about the extent of unobserved confounding, which is reflected in the specification of the sensitivity parameters, remains unchanged (Gustafson et al. 2010). If substantive prior knowledge about the sensitivity parameter values is available, it is possible to reduce the width of the credible interval by changing the priors assigned to the sensitivity parameters, e.g., by reducing the prior variance of $\lambda _d$.

We also conduct a power analysis of BSA-LFM with varying sample sizes and magnitudes of the treatment effect and compare its performance under different priors with that of DM-LFM. Figure 1 presents the results. We find that the power curves of the ideal and naïve cases look very similar for different numbers of units, except that the curve is symmetric about 0 in the ideal case while shifted to the right in the naïve case due to the negative bias. We have two notable findings about BSA-LFM. First, with the shrinkage prior, the power curve looks more symmetric about 0 even though the posterior median indicates a negative bias. Second, for a given number of units, as the magnitude of the true ATT increases, the percentage of credible intervals excluding 0 becomes larger. Compared to the normal prior, BSA-LFM with the shrinkage prior may produce a larger percentage of credible intervals excluding 0 for a given combination of the number of units and the treatment effect, which is consistent with the evidence that, on average, BSA-LFM with the shrinkage prior produces narrower credible intervals than with the normal prior. Combined with the results above, we may conclude that BSA-LFM with the shrinkage prior can be more efficient at detecting a non-zero ATT than with the normal prior, while maintaining the same false positive rate, when some observed covariates have zero effect on the potential outcomes.

Figure 1 Percentage of nominal 95% credible intervals excluding zero.

5 Empirical Application

To illustrate its applicability, we apply our model to the empirical analysis of Scheve and Stasavage (2012), which studies the causal effect of war and democracy on the taxation of inherited wealth, an important debate in the field of political economy. The authors implemented a difference-in-differences (DID) model specification to analyze an original dataset that includes 19 countries and spans 180 years (1821–2000). They found that mass warfare has a significant positive effect on the top marginal rate of the inheritance tax, while democracy does not have a significant effect. In this replication, we follow the article and regard mass warfare as a binary treatment indicator which equals 1 if a country engaged in any interstate war and 2% of its total population was serving in the military. Democracy, measured by the expansion of suffrage, is regarded as an observed time-varying confounder. We consider the following model specification:Footnote 13

(33) $$ \begin{align} Y_{it} = \delta_{it} D_{it} + X_{it}'\beta + \beta_u U_{it} + \alpha_i + \xi_t + \gamma_i' f_t + \varepsilon_{it}, \end{align} $$

where $X_{it}$ is a vector of (lagged) time-varying confounders that includes a binary indicator of universal male suffrage, a binary indicator of a left executive, and GDP per capita. The lagged outcome variable is also included as a control. The original statistical model only adds TWFEs to control for unobserved confounders. In some alternative model specifications, Scheve and Stasavage (2012) control for country-specific time trends to approximate common shocks with unit-heterogeneous effects. Here, we add a latent factor term with ten factors and shrinkage priors for factor selection.

As noted by the original authors, the treatment effect of mass warfare in their baseline model could be biased due to the existence of unobserved time-varying confounders. For example, it is possible that a country’s belief about its ability to finance wars by taxing the rich and selection into war are positively correlated, biasing the estimate of the treatment effect in the positive direction. To address this concern, we apply our proposed sensitivity analysis. We regard $U_{it}$ as an unobserved continuous confounder, and investigate how it affects the magnitude and significance of the causal effects. For comparison, we also fit a naïve statistical model which does not include $U_{it}$ but otherwise is exactly the same. The estimation results are displayed in Figure 2.

Figure 2 The effect of war on taxation.

Note: On the left panel, dashed lines indicate posterior 95% credible intervals for ATT estimates based on the naïve model (blue) and the proposed BSA (red).

As expected, the point estimates (posterior means) are very close between our proposed model (“w/ BSA”) and the naïve model (“w/o BSA”), both for the time-varying treatment effects (right plot, red and blue solid lines) and the overall average effects (left plot, solid circles). However, the resulting 95% credible interval is wider when we account for the existence of $U_{it}$ , especially for the first two post-treatment periods (red and blue dotted lines). For the average treatment effect, while the 95% credible interval based on the naïve model does not cover 0, the interval based on the proposed model does, indicating that the estimated effect becomes insignificant once we account for the uncertainty due to the possible existence of an unobserved confounder.
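The comparison of the two overall estimates reduces to summarizing two sets of posterior ATT draws. A minimal sketch, in Python, is shown below; the draw arrays are placeholders standing in for the MCMC output of the naïve and BSA models.

```python
# A minimal sketch of summarizing overall ATT estimates and 95% credible
# intervals from posterior draws, as in the left panel of Figure 2.
# The draws below are hypothetical placeholders for real MCMC output.
import numpy as np

rng = np.random.default_rng(2)
att_naive = rng.normal(loc=6.0, scale=2.5, size=4000)   # hypothetical draws
att_bsa = rng.normal(loc=6.0, scale=4.0, size=4000)     # wider under BSA

def summarize(draws, label):
    lo, hi = np.quantile(draws, [0.025, 0.975])
    covers_zero = lo <= 0 <= hi
    print(f"{label}: mean={draws.mean():.2f}, 95% CI=({lo:.2f}, {hi:.2f}), "
          f"covers 0: {covers_zero}")

summarize(att_naive, "w/o BSA")
summarize(att_bsa, "w/ BSA")
```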

In some cases, researchers may be more interested in hypothetical scenarios corresponding to particular values of the sensitivity parameters. In the taxation example, if the ability to tax the rich to finance war is the unobserved confounder, then $\beta _u$ should be positive. Posteriors of the sensitivity parameters offer useful, principled guidelines for this type of analysis by providing probabilistic benchmarks for such specific values.Footnote 14 To examine the posterior dependence between the treatment effect and the sensitivity parameters, we fix the values of $\beta _u$ and $\lambda _d$ at different deciles of their posterior distributions and re-estimate the treatment effect. The results are shown in Figure 3. We find that when the value of $\beta _u$ or $\lambda _d$ is closer to 0, the corresponding 95% CI is narrower, indicating less uncertainty when the confounding strength of $U_{it}$ is weaker. Indeed, the 95% CI may exclude 0 when either $\beta _u$ or $\lambda _d$ is close to 0. Moreover, the point estimate of the treatment effect increases as $\beta _u$ changes from negative to positive, while the value of $\lambda _d$ barely affects the point estimate.

Figure 3 Sensitivity analysis conditional on posterior deciles of $\beta _u$ and $\lambda _d$ .

Note: In each plot, the red dotted line indicates zero, while the blue dotted line represents the naïve estimate of the ATT, i.e., the posterior mean of the ATT without Bayesian sensitivity analysis. Each tick on the x-axis represents a decile of the corresponding sensitivity parameter’s posterior distribution. The histogram on the bottom represents the posterior marginal density of the sensitivity parameter.
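A minimal sketch, in Python, of the decile-conditioning exercise behind Figure 3 follows. Here the conditional posterior is approximated by selecting stored joint draws near each decile; re-running the sampler with the sensitivity parameter held fixed is an equally valid (if slower) alternative. All arrays and numerical values are hypothetical placeholders for actual MCMC output.

```python
# Fix a sensitivity parameter at each posterior decile and summarize the
# treatment effect conditional on that value, using joint posterior draws.
import numpy as np

rng = np.random.default_rng(3)
n_draws = 20000
beta_u_draws = rng.normal(0.0, 0.5, n_draws)                          # sensitivity parameter
att_draws = 6.0 + 2.0 * beta_u_draws + rng.normal(0, 1.5, n_draws)    # joint ATT draws

deciles = np.quantile(beta_u_draws, np.linspace(0.1, 0.9, 9))
bandwidth = 0.05 * beta_u_draws.std()

for d in deciles:
    mask = np.abs(beta_u_draws - d) < bandwidth          # draws near the decile
    cond = att_draws[mask]
    lo, hi = np.quantile(cond, [0.025, 0.975])
    print(f"beta_u ~ {d:+.2f}: ATT mean={cond.mean():.2f}, "
          f"95% CI=({lo:.2f}, {hi:.2f})")
```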

Finally, researchers may also be interested in assessing the robustness of their findings under extreme or “worst-case” scenarios, as is common in conventional frequentist sensitivity analysis. For instance, one may ask how large the sensitivity parameters must be to explain away the estimated ATT. To illustrate this, we present a contour plot (Figure 4) that shows how the simultaneous variation of $\beta _u$ and $\lambda _d$ corresponds to different values of the ATT estimate. In parallel, the posterior distributions of the sensitivity parameters can help researchers assess how plausible a given scenario on the sensitivity contour is in light of the observed data.

Figure 4 Sensitivity contour of the ATT estimate.

Note: In this plot, the value at (0, 0) corresponds to the ATT estimate without applying BSA. Other values represent the ATT estimates obtained under different combinations of sensitivity parameter values. The ranges of $\beta _u$ and $\lambda _d$ shown in the plot correspond to their 95% posterior credible intervals.
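A contour like Figure 4 can be built by evaluating the ATT estimate over a grid of sensitivity parameter values. The sketch below, in Python, is illustrative only: the function att_given() is a hypothetical stand-in for refitting (or re-weighting) the model with the two sensitivity parameters held fixed, and the grid ranges and slope are assumed values.

```python
# A minimal sketch of a sensitivity contour: evaluate the ATT estimate over a
# grid of (beta_u, lambda_d) values and draw contour lines.
import numpy as np
import matplotlib.pyplot as plt

def att_given(beta_u, lambda_d, att_naive=6.0, slope=3.0):
    # Placeholder bias adjustment; the true mapping comes from the fitted model.
    return att_naive - slope * beta_u * lambda_d

beta_u_grid = np.linspace(-0.5, 0.5, 50)      # e.g., the 95% posterior range
lambda_d_grid = np.linspace(-0.5, 0.5, 50)
B, L = np.meshgrid(beta_u_grid, lambda_d_grid)
ATT = att_given(B, L)

cs = plt.contour(B, L, ATT, levels=10)
plt.clabel(cs, inline=True, fontsize=8)
plt.xlabel(r"$\beta_u$")
plt.ylabel(r"$\lambda_d$")
plt.title("ATT estimate under different sensitivity parameter values")
plt.show()
```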

6 Conclusion

In this article, we propose a novel BSA method for treatment effect estimation with causal panel data models, particularly the linear factor model based on counterfactual imputation. The method assesses the sensitivity of the causal effect estimate to the possible existence of unit-specific, time-varying unobserved confounding. Through appropriate prior specification, our approach is essentially a Bayesian analog of “benchmarking” sensitivity parameters against observed pre-treatment confounders, making the results of the sensitivity analysis more interpretable.

Another advantage of our proposed method is its computational efficiency. When there exists an unobserved time-varying confounder, the augmented linear factor model with sensitivity parameters is by design nonidentified. We utilize transparent parameterization (Gustafson 2005) to develop an efficient MCMC algorithm. Furthermore, the shrinkage prior assigned to the sensitivity parameter induces substantial indirect learning, making the model more efficient at detecting a nonzero treatment effect than a diffuse normal prior, as shown in the Monte Carlo studies.

Our proposed method also has several limitations. First, although the posterior distributions of the sensitivity parameters provide insight into how plausible different degrees of violation of the no-unobserved-confounding assumption may be, specifying priors is not necessarily easier than defining a range of plausible sensitivity values. As such, researchers must rely on substantive knowledge when selecting priors for the sensitivity parameters. Second, the method relies on parametric assumptions for the outcome model and the unobserved confounder. While the use of linear models reflects a pragmatic modeling choice in practice, such assumptions may limit flexibility in capturing complex data generating processes. Finally, although our proposed method is quite flexible in summarizing the influence of the unobserved confounder, the Bayesian approach is inherently more computationally intensive than typical frequentist approaches, which may constrain its practical applicability to relatively small datasets. Future research aimed at overcoming these limitations is worth pursuing.

Acknowledgments

We thank Xun Pang, Anton Strezhnev, Ye Wang, Yiqing Xu, Soichiro Yamauchi, members of the Kim Research Group at MIT, and participants at the 2022 APSA Annual Meeting, the 2023 Asian Political Methodology Meeting, the 2023 Annual Meeting of the Society for Political Methodology, and seminars at Peking University School of International Studies and the University of Tokyo. We are especially grateful to Editors-in-Chief Daniel Hopkins and Brandon Stewart and two anonymous reviewers for their incredibly helpful comments. All remaining errors are our own.

Funding Statement

This research is supported by the Political Methodology Lab at MIT.

Data Availability Statement

Replication data and code for this article (Liu and Yamamoto 2025) have been published in the Political Analysis Harvard Dataverse at https://doi.org/10.7910/DVN/HL7ZYO.

Supplementary Material

For supplementary material accompanying this paper, please visit https://doi.org/10.1017/pan.2025.10029.

Footnotes

Edited by: Daniel J. Hopkins and Brandon M. Stewart

1 This assumption is also called no dynamics in Arkhangelsky and Imbens (2022).

2 Bayesian methods for causal inference have been increasingly developed and applied in political science research (Gill 2012). For example, Hollenbach, Montgomery, and Crespo-Tenorio (2019) propose a model-based approach, and Keele and Quinn (2017) develop a sensitivity analysis method for binary outcome data. Bayesian approaches have also been applied to quantify the uncertainty of Rosenbaum bounds in sensitivity analysis (Mebane and Poast 2013).

3 The factor terms are unidentified because of the label-switching problem, such that $\omega _{\gamma } \cdot \tilde {\gamma }_i = (-\omega _{\gamma }) \cdot (-\tilde {\gamma }_i)$ . Following the literature, we introduce a step of random permutation in our sampler to address this problem.

4 An alternative specification of $U_{it}$ assumes that it is orthogonal to the observed covariates and the latent factor term conditional on $D_{it}$ ; that is, $U_{it} = \lambda _0 + \lambda _d D_{it} + \nu _{it}$ . As a reviewer noted, this specification reduces the number of sensitivity parameters, thereby simplifying the model. This approach aligns with the MCSA (Greenland 2003), which draws inference based on the prior distribution of sensitivity parameters. However, as Gustafson and McCandless (2018) show, compared to MCSA, the proposed BSA, which links $U_{it}$ to the observed covariates and latent factors, enables inference based on the posterior distribution of sensitivity parameters (e.g., $\beta _u$ ) and may yield more precise uncertainty estimates.

5 As noted by VanderWeele (2008), if neither the observed covariates nor the factor term affects treatment assignment, it is reasonable to assume $\lambda _0 = \lambda _x = \lambda _f = 0$ . This conventional practice is included as a special case within our proposed model.

6 As one reviewer helpfully pointed out, re-parameterizing the model of U to express D as a function of U makes it clearer that $\lambda _d$ , rather than the other $\lambda $ terms or $\beta _u$ , is the essential sensitivity parameter. Suppose D is continuous. Then, the model of U can be equivalently written as: $D_{it} = -\frac {\lambda _0}{\lambda _d}+\frac {1}{\lambda _d} U_{i t}-\left \{\frac {\lambda _x}{\lambda _d} X_{i t}+\frac {\lambda _f}{\lambda _d}\left (\omega _w \cdot \tilde {\gamma }_i\right ) f_t+v_{i t}^{\prime }\right \}$ , where $v_{i t}^{\prime }=-\frac {1}{\lambda _d} v_{i t}$ . This re-parameterization demonstrates that $1 / \lambda _d$ is the key sensitivity parameter.

7 As one reviewer pointed out, another common choice is to assign uniform priors to the sensitivity parameters, reflecting vague prior knowledge. In this case, it is necessary to specify lower and upper bounds to satisfy the constraint on the variance of $U_{it}$ .

8 If the researcher chooses $k_1 \leq 1$ , another prior specification for $\lambda _f$ is $\lambda _f \sim \mathcal {N} \left (0, \frac {c_5^2(k_1+1)}{2rk_2}\right )$ . See the Supplementary Material for more details.

9 The gamma distribution is parameterized so that its mean equals $\frac{a_1}{a_2}$ .

10 For the counterfactual imputation-based approach, only observations under the control condition are used for model fitting.

11 Readers may refer to Pang, Liu, and Xu (2022) for details on how to select the number of factors with shrinkage priors.

12 For BSA-LFM with and without shrinkage priors, the coverage rates of the 95% credible intervals are both 100%. Therefore, we conduct additional Monte Carlo studies comparing the coverage rates of 50% credible intervals. The results are reported in Appendix 3.2 of the Supplementary Material.

13 Note that in our model, both the treatment indicator and the observed confounders are contemporaneous with the outcome variable. For consistency and comparison with the original findings in Scheve and Stasavage (2012), we lag the treatment and observed confounders by one year.

14 Due to limited space, we provide in Appendix 3.1 of the Supplementary Material a simulated example of BSA conditional on a specific value of one sensitivity parameter ( $\beta _u$ or $\lambda _d$ ), together with the conditional posterior of the other sensitivity parameter.

References

Abadie, A., Diamond, A., and Hainmueller, J. 2010. “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105 (490): 493–505.
Abadie, A., Diamond, A., and Hainmueller, J. 2015. “Comparative Politics and the Synthetic Control Method.” American Journal of Political Science 59 (2): 495–510.
Altonji, J. G., Elder, T. E., and Taber, C. R. 2005. “Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools.” Journal of Political Economy 113 (1): 151–184.
Arkhangelsky, D., and Imbens, G. W. 2022. “Doubly Robust Identification for Causal Panel Data Models.” The Econometrics Journal 25 (3): 649–674.
Athey, S., Bayati, M., Doudchenko, N., Imbens, G., and Khosravi, K. 2021. “Matrix Completion Methods for Causal Panel Data Models.” Journal of the American Statistical Association 116 (536): 1716–1730.
Bai, J. 2009. “Panel Data Models with Interactive Fixed Effects.” Econometrica 77 (4): 1229–1279.
Belmonte, M. A. G., Koop, G., and Korobilis, D. 2014. “Hierarchical Shrinkage in Time-Varying Parameter Models.” Journal of Forecasting 33 (1): 80–94.
Cameron, A. C., and Miller, D. L. 2015. “A Practitioner’s Guide to Cluster-Robust Inference.” Journal of Human Resources 50 (2): 317–372.
Cinelli, C., and Hazlett, C. 2020. “Making Sense of Sensitivity: Extending Omitted Variable Bias.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82 (1): 39–67.
Diegert, P., Masten, M. A., and Poirier, A. 2022. “Assessing Omitted Variable Bias When the Controls Are Endogenous.” Preprint, arXiv:2206.02303.
Frühwirth-Schnatter, S., and Wagner, H. 2010. “Stochastic Model Specification Search for Gaussian and Partial Non-Gaussian State Space Models.” Journal of Econometrics 154 (1): 85–100.
Gelfand, A. E., and Sahu, S. K. 1999. “Identifiability, Improper Priors, and Gibbs Sampling for Generalized Linear Models.” Journal of the American Statistical Association 94 (445): 247–253.
Gill, J. 2012. “Bayesian Methods in Political Science: Introduction to the Virtual Issue.” Political Analysis 20 (V3): 1–9.
Greenland, S. 2003. “The Impact of Prior Distributions for Uncontrolled Confounding and Response Bias: A Case Study of the Relation of Wire Codes and Magnetic Fields to Childhood Leukemia.” Journal of the American Statistical Association 98 (461): 47–54.
Gustafson, P., McCandless, L. C., Levy, A. R., and Richardson, S. 2010. “Simplified Bayesian Sensitivity Analysis for Mismeasured and Unobserved Confounders.” Biometrics 66 (4): 1129–1137.
Gustafson, P. 2005. “On Model Expansion, Model Contraction, Identifiability and Prior Information: Two Illustrative Scenarios Involving Mismeasured Variables.” Statistical Science 20 (2): 111–140.
Gustafson, P., and McCandless, L. C. 2018. “When Is a Sensitivity Parameter Exactly That?” Statistical Science 33 (1): 86–95.
Hollenbach, F. M., Montgomery, J. M., and Crespo-Tenorio, A. 2019. “Bayesian Versus Maximum Likelihood Estimation of Treatment Effects in Bivariate Probit Instrumental Variable Models.” Political Science Research and Methods 7 (3): 651–659.
Imbens, G. W. 2003. “Sensitivity to Exogeneity Assumptions in Program Evaluation.” American Economic Review 93 (2): 126–132.
Imbens, G. W., and Rubin, D. B. 1997. “Estimating Outcome Distributions for Compliers in Instrumental Variables Models.” The Review of Economic Studies 64 (4): 555–574.
Keele, L., and Quinn, K. M. 2017. “Bayesian Sensitivity Analysis for Causal Effects from 2 × 2 Tables in the Presence of Unmeasured Confounding with Application to Presidential Campaign Visits.” Annals of Applied Statistics 11 (4): 1974–1997.
Klinenberg, D. 2022. “Synthetic Control with Time Varying Coefficients: A State Space Approach with Bayesian Shrinkage.” Journal of Business & Economic Statistics 41 (4): 1065–1076.
Lin, D. Y., Psaty, B. M., and Kronmal, R. A. 1998. “Assessing the Sensitivity of Regression Results to Unmeasured Confounders in Observational Studies.” Biometrics 54 (3): 948–963.
Liu, L., and Yamamoto, T. 2025. “Replication Data for: Bayesian Sensitivity Analysis for Unmeasured Confounding in Causal Panel Data Models.” Harvard Dataverse, V1. https://doi.org/10.7910/DVN/HL7ZYO
Manski, C. F., and Pepper, J. V. 2018. “How Do Right-to-Carry Laws Affect Crime Rates? Coping with Ambiguity Using Bounded-Variation Assumptions.” Review of Economics and Statistics 100 (2): 232–244.
McCandless, L. C., Gustafson, P., and Levy, A. 2007. “Bayesian Sensitivity Analysis for Unmeasured Confounding in Observational Studies.” Statistics in Medicine 26 (11): 2331–2347.
McCandless, L. C., Gustafson, P., and Levy, A. R. 2008. “A Sensitivity Analysis Using Information about Measured Confounders Yielded Improved Uncertainty Assessments for Unmeasured Confounding.” Journal of Clinical Epidemiology 61 (3): 247–255.
Mebane, W. R., Jr., and Poast, P. 2013. “Causal Inference without Ignorability: Identification with Nonrandom Assignment and Missing Treatment Data.” Political Analysis 21 (2): 233–251.
Oster, E. 2019. “Unobservable Selection and Coefficient Stability: Theory and Evidence.” Journal of Business & Economic Statistics 37 (2): 187–204.
Pang, X., Liu, L., and Xu, Y. 2022. “A Bayesian Alternative to Synthetic Control for Comparative Case Studies.” Political Analysis 30 (2): 269–288.
Park, T., and Casella, G. 2008. “The Bayesian Lasso.” Journal of the American Statistical Association 103 (482): 681–686.
Rambachan, A., and Roth, J. 2023. “A More Credible Approach to Parallel Trends.” Review of Economic Studies 90 (5): 2555–2591.
Rosenbaum, P. R. 2002. Observational Studies. New York, NY: Springer New York.
Rubin, D., Wang, X., Yin, L., and Zell, E. 2008. “Bayesian Causal Inference: Approaches to Estimating the Effect of Treating Hospital Type on Cancer Survival in Sweden Using Principal Stratification.” In Handbook of Applied Bayesian Analysis, edited by Anthony O’Hagan and Mike West, 679–708. Oxford University Press.
Rubin, D. B. 1974. “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies.” Journal of Educational Psychology 66 (5): 688.
Rubin, D. B. 1976. “Inference and Missing Data.” Biometrika 63 (3): 581–592.
Rubin, D. B. 1978. “Bayesian Inference for Causal Effects: The Role of Randomization.” The Annals of Statistics 6 (1): 34–58.
Samartsidis, P., Seaman, S. R., Montagna, S., Charlett, A., Hickman, M., and De Angelis, D. 2020. “A Bayesian Multivariate Factor Analysis Model for Evaluating an Intervention by Using Observational Time Series Data on Multiple Outcomes.” Journal of the Royal Statistical Society Series A: Statistics in Society 183 (4): 1437–1459.
Scheve, K., and Stasavage, D. 2012. “Democracy, War, and Wealth: Lessons from Two Centuries of Inheritance Taxation.” American Political Science Review 106 (1): 81–102.
VanderWeele, T. J. 2008. “Sensitivity Analysis: Distributional Assumptions and Confounding Assumptions.” Biometrics 64 (2): 645–649.
VanderWeele, T. J., and Ding, P. 2017. “Sensitivity Analysis in Observational Research: Introducing the E-Value.” Annals of Internal Medicine 167 (4): 268–274.
Xu, Y. 2017. “Generalized Synthetic Control Method: Causal Inference with Interactive Fixed Effects Models.” Political Analysis 25 (1): 57–76.
Ye, T., Keele, L., Hasegawa, R., and Small, D. S. 2024. “A Negative Correlation Strategy for Bracketing in Difference-in-Differences.” Journal of the American Statistical Association 119 (547): 2256–2268.