Comparative Causal Mediation and Relaxing the Assumption of No Mediator–Outcome Confounding: An Application to International Law and Audience Costs

Experiments often include multiple treatments, with the primary goal to compare the causal effects of those treatments. This study focuses on comparing the causal anatomies of multiple treatments through the use of causal mediation analysis. It proposes a novel set of comparative causal mediation (CCM) estimands that compare the mediation effects of different treatments via a common mediator. Further, it derives the properties of a set of estimators for the CCM estimands and shows these estimators to be consistent (or conservative) under assumptions that do not require the absence of unobserved confounding of the mediator–outcome relationship, which is a strong and nonrefutable assumption that must typically be made for consistent estimation of individual causal mediation effects. To illustrate the method, the study presents an original application investigating whether and how the international legal status of a foreign policy commitment can increase the domestic political “audience costs” that democratic governments suffer for violating such a commitment. The results provide novel evidence that international legalization can enhance audience costs via multiple causal channels, including by amplifying the perceived immorality of violating the commitment.


Introduction
Causal mediation analysis aims to open the "black box of causality," offering the opportunity to explore how and why certain treatment effects occur in addition to simply detecting the existence of those effects. Estimation of causal mediation effects, which are effects transmitted via intermediary variables called mediators, is often implemented in experimental research. In the most commonly used "single-experiment design," the treatment variable is randomized and the mediator(s) observed.
Another common practice in experimental research is the design of experiments featuring multiple treatment arms. As knowledge and empirical results have accumulated in various academic subfields and in specific program evaluation contexts, experimental research questions have evolved in ways that require evaluating multiple related treatments. Instead of simply testing the effects of single treatments, the empirical and theoretical differences between the effects of multiple treatments are often of primary interest. Across scientific, social scientific, and policy/program evaluation contexts, richer insights can be gained from comparing different treatments' causal anatomies, that is, the ensemble of causal mechanisms that endow each treatment with its effect.
This study focuses on comparing the causal anatomies of multiple treatments through the use of causal mediation analysis. It proposes a novel set of comparative causal mediation (CCM) estimands that compare the mediation effects of different treatments via a common mediator. Specifically, these estimands take the form of ratios between mediation effects. In addition, the value of this approach is enhanced by the fact that, as this study shows, these CCM estimands can be estimated under fewer threats to internal validity than individual causal mediation effects. Specifically, consistent estimation of individual causal mediation effects requires the strong and nonrefutable assumption of no unobserved confounding of the mediator-outcome relationship. In contrast, this study derives the properties of a set of estimators for the CCM estimands and shows these estimators to be consistent (or conservative) under assumptions that do not require the absence of unobserved confounding of the mediator-outcome relationship. The estimators are easy to understand and implement, thereby providing researchers with a simple, reliable, and systematic method of comparing, discovering, and testing the causal mechanism differences between multiple treatments.
Related Literature
Estimation of causal mediation effects has traditionally been implemented using the parametric structural equation modeling (SEM) framework (Baron and Kenny). More recent years have seen important advances in the formalization, generalization, and estimation of causal mediation effects within the potential outcomes framework (Robins and Greenland; Albert; Imai, Keele, and Tingley; Imai, Jo, and Stuart; Imai et al.) and both parametric and nonparametric SEM frameworks (Pearl; VanderWeele). The parametric SEM framework has been critiqued in particular for its inflexibility and reliance on functional form assumptions, with researchers instead advocating for more generalized, nonparametric formulations of causal mediation effects (Imai, Keele, and Tingley; Imai et al.; Pearl). See Shpitser and VanderWeele and VanderWeele for a discussion of the connection between the nonparametric SEM and potential outcomes approaches to causal mediation analysis.
This study employs the potential outcomes formalization of causal mediation effects presented by Imai, Keele, and Tingley and by Imai, Keele, and Yamamoto. In addition, to formulate the methods, this study adapts the semiparametric model introduced by Imai and Yamamoto, which presents a convenient and interpretable statistical structure yet also avoids the rigidity of the traditional parametric SEM framework by allowing for unit-specific parameters. This flexibility also allows the causal mediation effects as defined using potential outcomes notation to be easily expressed within the model. For other semiparametric modeling approaches to causal mediation analysis, see Glynn and Tchetgen and Shpitser.
This study also follows much of the methodological literature on causal mediation preceding it in terms of the key assumptions that are employed. A version of the assumption of no interaction between treatment and mediator, which was introduced and formalized to identify mediation effects in earlier work on causal mediation (Robins and Greenland; Robins), is employed for some of the results in this study. However, as emphasized by Robins and by Imai, Tingley, and Yamamoto, the no-interaction assumption must generally hold at the individual level for existing causal mediation methods, whereas this assumption must simply hold on average in the comparative context introduced in this study. Following previous work (Imai, Keele, and Yamamoto; Kraemer et al.; Imai and Yamamoto), this study also presents results when the no-interaction assumption is relaxed. In addition, the assumption of no covariance between (individual-level) causal parameters is employed in this study. As has been highlighted by Hong, this assumption is routinely employed (or implied by other assumptions) in existing approaches to causal mediation analysis.
While continuing to utilize certain assumptions, a key contribution of this study is in allowing for a relaxation of the assumption of no unobserved confounding of the mediator-outcome relationship. Loeys et al. make a similar contribution of highlighting how certain causal mediation quantities of interest can still be identified when relaxing this assumption. Specifically, Loeys et al. show how an "index for moderated mediation," which measures the extent to which a causal mediation effect varies by the level of other variables (moderators), can be identified under certain conditions without the assumption of no unobserved mediator-outcome confounding. In contrast to the present study, however, the structural framework used by Loeys et al. employs constant effects rather than unit-specific parameters.
It is worth explicitly noting that the method presented in this study does not apply to comparing the effects of a single treatment transmitted via different mediators. In contrast to the method presented in this study, trying to compare the effects transmitted via multiple mediators would compound the threat to internal validity, as the problem of confounding is likely to affect each mediator to a different degree and in ways that cannot be measured or tested. As a separate issue, there is also a possibility of causal connections between the mediators, further threatening clean identification and obscuring what is even being measured. Guidance on how to handle these issues, which are not covered in this study, can be found in Imai and Yamamoto and Daniel et al.
In addition, another related line of research has focused on identification and estimation of "controlled direct effects," which refer to the direct effect of a treatment when fixing the mediator at a common value for all units, in contrast to "natural direct effects," which fix the mediator at unit-specific potential values for each unit under a particular treatment level, such as under nonexposure (e.g. Robins; Pearl; VanderWeele; Acharya, Blackwell, and Sen). Controlled and natural direct effects are not considered in this study. Guidance on the difference between these two types of direct effects, their relationship with causal mediation effects, and how to identify and estimate average controlled direct effects can be found in Acharya, Blackwell, and Sen.
Outline
The remainder of this study is organized as follows. The next section provides motivation and explains the value, in both theoretical and policy contexts, of comparing the causal mediation effects of multiple treatments. The study then formally introduces the new CCM estimands and presents an estimation strategy, describing the assumptions and methods under which the CCM estimands can be estimated consistently. Simulations then illustrate the properties of the estimators. A subsequent section describes how these properties change under a relaxed set of assumptions, namely how the CCM estimands can be estimated conservatively but no longer consistently. To illustrate the CCM method, the study then presents an original application, investigating the effect of international legality on the domestic political costs that democratic governments suffer for violating foreign policy commitments. The final section concludes.

Motivation for Comparing Causal Mediation Effects
In experimental research contexts involving multiple related treatments, theories on why one treatment should have a larger effect than another are linked to the presumed mechanism(s) through which each treatment propagates its effect. As a prelude to the application presented later in this study, consider the recent accumulation of experimental evidence in the political science literature on "audience costs" (for a brief review, see Hyde). These many studies have differed greatly, however, not only in terms of their foreign policy contexts (e.g. security scenarios, international economic scenarios, etc.) but also in terms of the specific nature of the foreign policy commitment (e.g. informal, legal, etc.). One may then wonder whether and why the nature of such a commitment might affect the strength of audience costs. For instance, a legalized foreign policy commitment could gain audience cost strength over an informal commitment via various mechanisms, such as a heightened sense among citizens that violating legalized commitments is immoral, or a belief that violating legalized commitments is more likely to lead to international retaliation.
Another example exists in the literature on party cues in American politics, which includes a wealth of experimental studies that investigate party cue effects on voter attitudes and behavior (e.g. Kam; Arceneaux). As these studies have highlighted, there are various types of party cues, and there is some experimental evidence that out-party cues may, in fact, be more influential than in-party cues (Aaroe; Arceneaux and Kolodny; Slothuus and de Vreese; Goren, Federico, and Kittilson; Nicholson). There may be various mechanisms by which out-party cue effects can exceed those of in-party cues: for instance, the possibility that out-party cues elicit stronger emotional reactions than in-party cues, or the possibility that out-party cues may actually be more informative than in-party cues. Such possibilities could be tested by rigorously comparing the mechanisms underlying each set of party cues.
Comparing the causal anatomies of related treatments also offers great value in the policy and program evaluation context, where multiple related treatments are often investigated in individual studies. Because of constraints on resources, as well as logistical and administrative realities, the execution of experimental studies is often restricted to short periods of time and small subsets of locations. Ideally, however, the effectiveness of any preferred policy intervention should be generalizable across time and different localities. One important means of assessing generalizability is to develop a comprehensive understanding of the mechanisms underlying different treatments.
For instance, consider an experimental study on job training programs, aimed at finding employment for lower-income adults. Imagine the study is implemented in a handful of towns and involves two training programs (i.e. two treatments and a control condition of no training). A preliminary analysis of the results may reveal that both programs have roughly equal-sized effects on employment, and a superficial interpretation of these results would then be that the two programs are interchangeable. However, to enable more efficient policy targeting, it would be useful to investigate the causal mechanism differences between the two job training programs, as it is possible they achieved their positive effects on employment via different channels. One program may have achieved its primary effect by increasing the job search motivation of its participants, while the other may have achieved its primary effect by helping its participants to develop specific skills. If equipped with such knowledge, policymakers would be in a much better position to make optimal decisions on which job training program to introduce in different localities, depending upon local economic conditions.

Comparative Causal Mediation (CCM) Estimands
As a frame of reference, consider the single-treatment experimental setting. Let $T$ denote a binary treatment variable, $Y$ an outcome variable, and $M$ an intermediary variable that is affected by $T$ and that affects $Y$. Causal mediation effects refer to the average effect of $T$ on $Y$ transmitted via the mediator $M$. This is often termed the natural indirect effect or, in the potential outcomes approach, the average causal mediation effect (ACME). Following the potential outcomes approach to causal mediation analysis presented by Imai, Keele, and Tingley and by Imai, Keele, and Yamamoto, let $Y(t, m)$ denote the potential outcome for $Y$ given that the treatment $T$ and the mediator $M$ equal $t$ and $m$ respectively, and let $M(t)$ denote the potential value for $M$ given that $T$ equals $t$. The ACME is defined formally as $\kappa(t) = E[Y(t, M(1)) - Y(t, M(0))]$. Note that the ACME is a function of $t$, though in the case of no interaction between the treatment and mediator, the value of the ACME is the same for $t = 0, 1$.
This study deals with a context in which there are multiple related treatments and the researcher is interested in comparing the extent to which those different treatments transmit their effects via a common mediator. For simplicity and conceptual clarity, consider a three-level experimental design that involves a true control condition and two different mutually exclusive treatments. The two treatments may be qualitatively different or one may be a scaled-up version of the other. Furthermore, there is a single mediator of interest. It may be the case that multiple mediators have been measured in the experiment, but the estimands of interest will be applied within the context of a single mediator at a time.
Let $T_1$ and $T_2$ denote two mutually exclusive binary treatments and $M$ denote a common mediator. Now define the potential outcomes $Y(t_1, t_2, m)$ and $M(t_1, t_2)$. In the control condition $t_1 = t_2 = 0$, in the first treatment condition $t_1 = 1$ and $t_2 = 0$, and in the second treatment condition $t_1 = 0$ and $t_2 = 1$. This allows for defining a separate $ACME_j$ and $ATE_j$ for each treatment $T_j$ as follows:
$$ACME_1 = \kappa_1(t_1) = E[Y(t_1, 0, M(1, 0)) - Y(t_1, 0, M(0, 0))]$$
$$ACME_2 = \kappa_2(t_2) = E[Y(0, t_2, M(0, 1)) - Y(0, t_2, M(0, 0))]$$
$$ATE_1 = \tau_1 = E[Y(1, 0, M(1, 0)) - Y(0, 0, M(0, 0))]$$
$$ATE_2 = \tau_2 = E[Y(0, 1, M(0, 1)) - Y(0, 0, M(0, 0))]$$
Note that all effects (ACMEs and ATEs) are referenced against the pure control condition. As will be shown, in spite of the strong assumptions required for the identification of any single ACME, a weaker set of assumptions, which notably does not contain the usual assumption of no unobserved confounding of the mediator-outcome relationship, will allow for consistent or conservative estimation of the following two CCM estimands of interest.

Definition. Define the estimands of interest as follows:
$$\text{Estimand 1}: \quad \frac{ACME_2}{ACME_1} = \frac{\kappa_2(t_2)}{\kappa_1(t_1)}$$
$$\text{Estimand 2}: \quad \frac{ACME_2 / ATE_2}{ACME_1 / ATE_1} = \frac{\kappa_2(t_2) / \tau_2}{\kappa_1(t_1) / \tau_1}$$
The first estimand measures the extent to which one treatment has a stronger causal mediation effect transmitted via the mediator of interest relative to the other treatment. In contrast, the second estimand measures the extent to which one treatment has a greater proportion of its total effect transmitted through the mediator of interest relative to the other treatment, which allows for testing the extent to which the mediator is more important to the overall causal anatomy of one treatment. For additional discussion on the types of research questions and hypotheses each estimand is better suited to address, see Appendix H.
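As a small worked illustration of how the two estimands differ, the sketch below computes both from population-level ACME and ATE values. The function name and the numerical values are hypothetical and chosen only for illustration; they are not results from this study.

```python
def ccm_estimands(acme1, acme2, ate1, ate2):
    """Return (estimand 1, estimand 2) from the ACME and ATE of each treatment."""
    # Every quantity below appears in a denominator at some point,
    # so the ratios are undefined if any of them is zero.
    for value in (acme1, acme2, ate1, ate2):
        if value == 0:
            raise ValueError("all effects must be nonzero")
    estimand1 = acme2 / acme1                    # ratio of mediation effects
    estimand2 = (acme2 / ate2) / (acme1 / ate1)  # ratio of proportions mediated
    return estimand1, estimand2


# Hypothetical values: treatment 2 has both the larger ACME and the larger ATE.
e1, e2 = ccm_estimands(acme1=0.1, acme2=0.3, ate1=0.5, ate2=0.6)
print(round(e1, 6), round(e2, 6))
```

With these hypothetical inputs, treatment 2 transmits three times as much effect through the mediator as treatment 1, but the mediated share of its total effect is only two and a half times as large, because its total effect is also larger.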

Estimation of Comparative Causal Mediation

Model
Consider a simple random sample of $N$ observations. Let $Y_i(t_1, t_2, m)$ and $M_i(t_1, t_2)$ denote the potential outcomes for unit $i$. Let $T_{1i}$ ($T_{2i}$) denote the first (second) treatment indicator, which equals one if unit $i$ receives the first (second) treatment and zero otherwise. The observed mediator $M_i$ equals $M_i(T_{1i}, T_{2i})$, and the observed outcome $Y_i$ equals $Y_i(T_{1i}, T_{2i}, M_i(T_{1i}, T_{2i}))$. Note that given the mutual exclusivity of the two binary treatments, $Y_i(1, 1, m)$ and $M_i(1, 1)$ do not exist.
Adapting the semiparametric model introduced by Imai and Yamamoto, the potential outcomes are modeled using structural equations for the mediator and the outcome. The model shares some basic notational similarities with the parametric structural equation models often used to describe causal mediation, though a key difference is that the equations here allow for unit-specific parameters. The relationships implicitly assume that the potential outcomes are linear in $m$, but are otherwise flexible given mutually exclusive, binary treatments and the unit-specific parameters. In the case of a binary mediator, the relationships become fully flexible and nonparametric. This semiparametric setup highlights the relationship between the ACME as defined under the potential outcomes approach and the natural indirect effect as defined by structural equation models of causal mediation: $\kappa_j(t_j) = E[\alpha_{ji}(\beta_i + \gamma_{ji} t_j)]$. In the classic SEM framework (Baron and Kenny), constant effects and no interaction between treatment and mediator are assumed. Applying those assumptions to the two-treatment context here yields $E[\alpha_{ji}(\beta_i + \gamma_{ji} t_j)] = \alpha_j \beta$, where $j = 1, 2$ denotes the treatment, which is indeed the classic product-of-coefficients result in the SEM framework. However, this study will not assume constant effects, and a no-interaction assumption will be introduced but then relaxed later.
In addition, define the reduced-form version of the potential outcome $Y_i(t_1, t_2, M_i(t_1, t_2)) = Y_i(t_1, t_2) = \chi_i + \tau_{1i} t_1 + \tau_{2i} t_2$, which is fully flexible given mutually exclusive, binary treatments.
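To make the notation concrete, the structural model described above and the estimating equations obtained by replacing unit-specific parameters with their means can be sketched as follows. This is a reconstruction from the symbols used in the surrounding text rather than a reproduction of the original displays; in particular, the mediator intercept $\pi_i$ and the reduced-form error $\rho_i$ are labels assumed here for illustration.

```latex
\begin{align*}
% Unit-level structural equations
M_i(t_1, t_2) &= \pi_i + \alpha_{1i} t_1 + \alpha_{2i} t_2 \\
Y_i(t_1, t_2, m) &= \lambda_i + \delta_{1i} t_1 + \delta_{2i} t_2
                    + \beta_i m + \gamma_{1i} t_1 m + \gamma_{2i} t_2 m \\[6pt]
% Estimating equations: means as coefficients, heterogeneity in the errors
M_i &= \pi + \alpha_1 T_{1i} + \alpha_2 T_{2i} + \eta_i \\
Y_i &= \lambda + \delta_1 T_{1i} + \delta_2 T_{2i} + \beta M_i
       + \gamma_1 T_{1i} M_i + \gamma_2 T_{2i} M_i + \iota_i \\
Y_i &= \chi + \tau_1 T_{1i} + \tau_2 T_{2i} + \rho_i
\end{align*}
```

Substituting the mediator equation into the outcome equation at $(t_1, t_2) = (1, 0)$ and $(0, 0)$ gives a unit-level mediated contrast of $\alpha_{1i}(\beta_i + \gamma_{1i} t_1)$, which matches the expression $E[\alpha_{ji}(\beta_i + \gamma_{ji} t_j)]$ used in the text.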
The average treatment effects (ATEs) can thus be expressed as the means of the unit-specific reduced-form coefficients: $ATE_j = \tau_j = E[\tau_{ji}]$ for $j = 1, 2$. Now, following Imai and Yamamoto, the unit-specific parameters can be decomposed into their means and deviations. That is, each parameter is written as its population mean plus a mean-zero unit-specific deviation. This yields a set of estimating equations in which the individual-level heterogeneity is subsumed into the error terms.
The equivalency of the product of coefficients to the natural indirect effect is specific to the linear SEM formulation, though it has also been shown elsewhere to be a special case that nests within more general frameworks of causal mediation (Jo; Pearl). This includes the potential outcomes framework, where it has previously been shown that the ACME is equivalent to $\alpha\beta$ under certain conditions (Imai, Keele, and Yamamoto). The reduced-form presentation is also employed in the single-treatment context by Glynn. As shown in the single-treatment context (e.g. Imai, Keele, and Yamamoto), the ATEs can also be equivalently defined with reference to the full potential outcomes $Y_i(t_1, t_2, m)$ and $M_i(t_1, t_2)$ as $\tau_1 = E[Y_i(1, 0, M_i(1, 0)) - Y_i(0, 0, M_i(0, 0))]$ and $\tau_2 = E[Y_i(0, 1, M_i(0, 1)) - Y_i(0, 0, M_i(0, 0))]$.

Assumptions
The first identification assumption, which has already been implicit in the potential outcomes notation used up to this point, is the stable unit treatment value assumption (SUTVA).

Assumption. Stable unit treatment value assumption (SUTVA).
To be explicit, the linearity assumption is also reiterated.
Assumption. Linear relationships between the potential outcomes and the mediator.
As already described above, while the assumption of linearity seems demanding, it is made trivial by the use of a binary mediator. Given a binary mediator and the two mutually exclusive binary treatments, the potential outcome model described above is fully saturated and hence "inherently linear" (Angrist and Pischke). This is why it need not be stated nor assumed that the potential values of the mediator are linear in the treatments. This also helps to justify the exclusion of covariates from the model. In contrast to the case of estimating a single causal mediation effect, the CCM estimands can be estimated consistently without covariate adjustment, as will be shown shortly; furthermore, inclusion of covariates would invalidate the full saturation, and hence linearity, of the model. The next assumption is that the two treatments, in addition to being mutually exclusive, have been completely randomized:
Assumption. Complete randomization of mutually exclusive treatments. Let $N_1$ denote the number of units assigned to treatment 1, $N_2$ the number assigned to treatment 2, and $N - N_1 - N_2$ the number assigned to the control condition (neither treatment 1 nor treatment 2). Then for any unit $i$, the probability of assignment to treatment 1 is $N_1 / N$, to treatment 2 is $N_2 / N$, and to the control condition is $(N - N_1 - N_2) / N$, independently of that unit's potential outcomes.
The next assumption is that there are no treatment-mediator interactions in expectation.
Assumption. No expected interaction between the treatments and mediator: $E[\gamma_{1i}] = E[\gamma_{2i}] = 0$.
In other words, this assumption means that the estimating equation for the outcome becomes $Y_i = \lambda + \delta_1 T_{1i} + \delta_2 T_{2i} + \beta M_i + \iota_i$. The no-interaction assumption was introduced and formalized to identify the ACME in earlier literature on causal mediation analysis (Robins and Greenland; Robins), and it has since been commonly employed to identify the ACME in the single-treatment context. However, as emphasized by Robins and by Imai, Tingley, and Yamamoto, the no-interaction assumption must generally hold at the individual level in the standard single-treatment context. In contrast, here the assumption must simply hold on average. Nonetheless, compared to the preceding assumptions, the no-interaction assumption is more stringent and cannot be guaranteed by design. For this reason, this assumption will be relaxed later ($\gamma_1$ and $\gamma_2$ will be allowed to be nonzero), and diagnostics will be presented to allow for an empirical assessment of the assumption.
The last assumption pertains to the covariances between the individual-level parameters.
Assumption. No covariance between individual-level treatment and mediator parameters.
This type of no-covariance assumption is also made, implicitly or explicitly, in other approaches to causal mediation (Hong). For instance, in the classic SEM formulation, the parameters are assumed to be constant structural effects, which means they do not vary across units and therefore have zero covariance across units. In addition, in the potential outcomes approach to causal mediation as applied to a linear structural form, a conditional version of this assumption is implied by sequential ignorability. See Hong for a comprehensive overview of the no-covariance assumption as used in the various statistical approaches to causal mediation analysis. It is worth noting that a conditional version of this assumption is not necessarily any weaker or more plausible than an unconditional version, as there is no empirical or theoretical basis for expecting that any existing covariance between $\alpha_{ji}$ and $\beta_i$ will be attenuated within conditioning strata of the population. This is in contrast to omitted variable bias, which should generally be expected to shrink with stratification.

Consistent Estimation
Notably, the method presented here dispenses with the assumption of no confounding of the relationship between the mediator and outcome, which is a strong and nonrefutable assumption that is the most often criticized component of causal mediation analysis (e.g. Gerber and Green; Bullock, Green, and Ha; Glynn; Bullock and Ha). This assumption is required regardless of the statistical framework used for the identification and estimation of causal mediation effects, though its formal basis takes different forms depending on the statistical framework. In the SEM approach, this takes the form of recursivity or no correlation between the errors of the different equations, while in the potential outcomes framework, the unconfoundedness of the mediator-outcome relationship is implied by the "sequential ignorability" assumption. Notably, methods of sensitivity analysis have been developed to systematically assess the impact of violations of this assumption (e.g. Imai, Keele, and Yamamoto). However, while such analyses allow for evaluation of the sensitivity of causal mediation estimates, they do not enable the recovery of consistent or unbiased estimates.
In the formulation here, such an assumption would take the form of $E[\iota_i \mid T_{1i}, T_{2i}, M_i] = 0$. Because the mediator has not been randomized, however, this assumption is difficult to justify and impossible to test; hence, this assumption will not be made. With the assumptions that are made, described above, it can be shown that estimation of $\beta$ via linear least squares regression results in the bias term $E[\eta_i \iota_i] / \mathrm{var}(\eta_i)$, where $\eta_i$ and $\iota_i$ denote the error terms of the mediator and outcome estimating equations, respectively. In contrast, $\alpha_j$ can be estimated consistently and without bias for both $j = 1, 2$. The key implication of these results is that, if comparing two treatments and their mediated effects via the same mediator, then a common bias afflicts both ACME estimates. By corollary, this means the unavoidable mediation bias does not prevent us from comparing the causal mediation anatomies of two different treatments, as long as we are doing so in terms of the same mediator.
As Imai, Keele, and Yamamoto note, the sequential ignorability assumption implies a set of assumptions developed by Pearl, which includes the independence between the potential values of the outcome and the potential values of the mediator. In the linear structural form, $\alpha_i$ is a function of the potential values of the mediator, while $\beta_i$ is a function of the potential values of the outcome. The independence between the potential values of the outcome and the potential values of the mediator implies the independence between these functions, thus implying independence between $\alpha_i$ and $\beta_i$.
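The common-bias argument above can be illustrated with a small simulation. The sketch below is an assumption-laden toy example, not the study's own simulation design: it uses constant effects, a continuous mediator, a single unobserved confounder `u`, and hypothetical parameter values ($\alpha_1 = 0.5$, $\alpha_2 = 1.0$, $\beta = 1.0$). The slope on the mediator is computed as the pooled within-arm covariance over variance, which is what least squares of the outcome on both treatment dummies and the mediator returns.

```python
import random


def simulate(n_per_arm=20000, seed=0):
    """Three-arm experiment with an unobserved mediator-outcome confounder u."""
    rng = random.Random(seed)
    # (alpha_j, direct effect) per arm; the true beta is 1.0 in every arm.
    design = {"control": (0.0, 0.0), "t1": (0.5, 0.3), "t2": (1.0, 0.2)}
    arms = {}
    for arm, (alpha, delta) in design.items():
        rows = []
        for _ in range(n_per_arm):
            u = rng.gauss(0, 1)                              # unobserved confounder
            m = alpha + u + rng.gauss(0, 1)                  # mediator
            y = delta + 1.0 * m + 2.0 * u + rng.gauss(0, 1)  # outcome
            rows.append((m, y))
        arms[arm] = rows
    return arms


def estimate(arms):
    m_bar = {a: sum(m for m, _ in r) / len(r) for a, r in arms.items()}
    y_bar = {a: sum(y for _, y in r) / len(r) for a, r in arms.items()}
    alpha1_hat = m_bar["t1"] - m_bar["control"]
    alpha2_hat = m_bar["t2"] - m_bar["control"]
    # Slope on M from least squares of Y on both treatment dummies and M.
    # With saturated dummies this equals the pooled within-arm cov / var.
    sxy = sum((m - m_bar[a]) * (y - y_bar[a]) for a, r in arms.items() for m, y in r)
    sxx = sum((m - m_bar[a]) ** 2 for a, r in arms.items() for m, _ in r)
    return alpha1_hat, alpha2_hat, sxy / sxx


alpha1_hat, alpha2_hat, beta_hat = estimate(simulate())
print("beta_hat (true 1.0):", round(beta_hat, 2))
print("naive ACMEs (true 0.5, 1.0):",
      round(alpha1_hat * beta_hat, 2), round(alpha2_hat * beta_hat, 2))
print("ratio of ACMEs (true 2.0):", round(alpha2_hat / alpha1_hat, 2))
```

In this setup the mediator error has variance 2 and covariance 2 with the outcome error, so the probability limit of `beta_hat` is 2.0 rather than 1.0 and each naive ACME estimate is roughly doubled. Because the same biased slope multiplies both $\hat{\alpha}_1$ and $\hat{\alpha}_2$, it cancels in the ratio.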

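Proposition. Call $\hat{\tau}^N_2$, $\hat{\tau}^N_1$, $\hat{\alpha}^N_2$, $\hat{\alpha}^N_1$, and $\hat{\beta}^N$ the linear least squares regression estimators of the parameters from the estimating equations above, given a simple random sample of size $N$ from a larger population. Given the assumptions above, the following estimators converge in probability to the estimands of interest under the usual generalized linear regression regularity conditions:
$$\frac{\hat{\alpha}^N_2 \hat{\beta}^N}{\hat{\alpha}^N_1 \hat{\beta}^N} = \frac{\hat{\alpha}^N_2}{\hat{\alpha}^N_1} \xrightarrow{p} \frac{ACME_2}{ACME_1}$$
$$\frac{\hat{\alpha}^N_2 \hat{\beta}^N / \hat{\tau}^N_2}{\hat{\alpha}^N_1 \hat{\beta}^N / \hat{\tau}^N_1} = \frac{\hat{\alpha}^N_2 / \hat{\tau}^N_2}{\hat{\alpha}^N_1 / \hat{\tau}^N_1} \xrightarrow{p} \frac{ACME_2 / ATE_2}{ACME_1 / ATE_1}$$
In sum, the CCM estimands can be estimated consistently through the simple use of linear least squares regression estimators.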
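A minimal sketch of the two estimators follows. It relies on the fact that, with two mutually exclusive treatment dummies and no covariates, the least squares coefficients on the dummies are simply differences in arm means relative to control. The function name, the `(arm, m, y)` record layout, and the toy data are hypothetical and used only for illustration.

```python
from statistics import mean


def ccm_estimates(rows):
    """rows: iterable of (arm, m, y) with arm in {"control", "t1", "t2"}."""
    m_by_arm = {"control": [], "t1": [], "t2": []}
    y_by_arm = {"control": [], "t1": [], "t2": []}
    for arm, m, y in rows:
        m_by_arm[arm].append(m)
        y_by_arm[arm].append(y)
    # Treatment effects on the mediator (alpha_j) and on the outcome (tau_j).
    alpha1 = mean(m_by_arm["t1"]) - mean(m_by_arm["control"])
    alpha2 = mean(m_by_arm["t2"]) - mean(m_by_arm["control"])
    tau1 = mean(y_by_arm["t1"]) - mean(y_by_arm["control"])
    tau2 = mean(y_by_arm["t2"]) - mean(y_by_arm["control"])
    est1 = alpha2 / alpha1                    # estimator for ACME_2 / ACME_1
    est2 = (alpha2 / tau2) / (alpha1 / tau1)  # estimator for the ratio of proportions mediated
    return est1, est2


# Toy data with a binary mediator (four units per arm).
toy = (
    [("control", m, y) for m, y in [(0, 1), (0, 2), (1, 3), (1, 2)]]
    + [("t1", m, y) for m, y in [(1, 3), (1, 3), (1, 4), (0, 2)]]
    + [("t2", m, y) for m, y in [(1, 4), (1, 5), (1, 3), (1, 4)]]
)
est1, est2 = ccm_estimates(toy)
print(est1, est2)
```

Note that the slope on the mediator never needs to be computed, since it cancels from both ratios. A sample this small is far below anything the scope conditions would support; it is used here only so the arithmetic can be checked by hand.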

Scope Conditions and Issues in Ratio Estimation
A number of issues have long been noted with the use and interpretation of ratio estimators, and the estimators proposed here are no exception. In particular, their ratio form has important implications for the scope conditions under which they are useful and reliable, their small-sample tendencies, uncertainty estimation, and statistical power. These issues are discussed below.

Scope Conditions
In addition to the obvious precondition of an experimental design featuring multiple treatments, there are other key scope conditions that dictate when the CCM methods will be usable or useful. First, each estimand is only useful when both the numerator and denominator can be estimated as having the same sign and with sufficient statistical precision. This is, first and foremost, a conceptual precondition, as the estimands are conceptually meaningful and interpretable only when the ACMEs for both treatments are presumed to be nonzero in the same direction. In addition, this is also an important statistical consideration. Indeed, it has long been known that ratio estimators exhibit finite-sample distributional behavior that is difficult to formally characterize (except under special conditions) and has important implications for their central tendencies and dispersion (e.g. Fieller). Given their ratio form, the CCM estimators presented in this study share the same fundamental problem of potentially "dividing by zero" as that of weak instruments in instrumental variables (IV) estimation (Nelson and Startz). Research over the past two decades to develop best practices for detecting weak instruments is thus informative here (see Andrews, Stock, and Sun for an overview). Earlier research on the matter provided the rule-of-thumb recommendation, which continues to be widely used, that IV estimates for a single endogenous regressor be considered reliable only when tests of the first-stage regression yield an F statistic greater than 10 (Staiger and Stock; Stock, Wright, and Yogo), and more recent research has highlighted that this simple decision rule provides relatively reliable guidance in the single-instrument case (Stock and Yogo; Olea and Pflueger; Andrews, Stock, and Sun). Given that single-instrument IV estimation is a simple ratio estimator itself, this rule of thumb thus provides useful scope conditions for the CCM estimators as well.
To implement this decision rule, first note that each of the two CCM estimators can be re-expressed as a ratio with a single denominator, denoted $\hat{\theta}_d$, and consider the estimator unreliable if the corresponding F-type statistic for $\hat{\theta}_d$ is less than 10.
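A minimal sketch of such a check is given below. It assumes the statistic is the squared denominator estimate divided by its estimated variance, which is the single-instrument analogue of the first-stage F (there the F statistic equals the squared t statistic). That form, the function name, and the toy data are assumptions for illustration, and only the denominator of the first estimator, $\hat{\alpha}_1$, is covered; the denominator of the second estimator is a product of estimates and would need a delta-method or bootstrap variance instead.

```python
from statistics import mean, variance


def diff_in_means_f(treated, control):
    """Squared difference in means over its estimated (unequal-variance) variance."""
    estimate = mean(treated) - mean(control)
    est_var = variance(treated) / len(treated) + variance(control) / len(control)
    return estimate ** 2 / est_var


# Toy mediator values for treatment 1 and control (hypothetical data).
m_t1 = [1, 1, 1, 0]
m_control = [0, 0, 1, 1]

f_stat = diff_in_means_f(m_t1, m_control)
reliable = f_stat >= 10  # rule of thumb borrowed from the weak-instrument literature
print(round(f_stat, 3), reliable)
```

With only four units per arm the statistic falls far below 10, so under this rule the ratio estimator would be flagged as unreliable.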
Finally, the estimands are also likely to be most useful when the two treatments themselves have nonzero treatment effects of the same sign as the ACMEs, and where one treatment does not clearly dominate the other. This is again a matter of both conceptual clarity and statistical properties. Conceptually, there may be limited theoretical or practical insights to be gained from comparing the mediation effects if the effect of one treatment is orders of magnitude larger than that of the other. This should generally not be the case, however, in the context of comparing closely related treatments, which is the motivating context for the CCM methods. In addition, note that the treatment effect estimate $\hat{\tau}^N_2$ is a component of the denominator in the second estimator and hence covered by the decision rule presented above.

Finite-Sample Adjustments
Even in the case where the scope conditions above are met, the CCM estimators are not exactly centered on the true estimand in finite samples due to their ratio form. This divergence becomes negligible as the sample size grows, and in smaller samples, finite-sample adjustments can be made. One simple and well-established method of deriving finite-sample corrections for estimators of functions, such as ratio estimators, involves Taylor series expansions (e.g. Cochran; Withers; Lehmann and Casella). In this vein, Appendix B presents adjusted estimators for both CCM estimands that include finite-sample corrections derived using Taylor series expansion. Simulations, presented below, compare the adjusted estimators with the simple estimators in small samples.
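As a reminder of where such corrections come from, a second-order Taylor expansion of a generic ratio of two estimators gives the approximation below. Here $\hat{A}$ and $\hat{B}$ are generic placeholders (assumed notation) with means $a$ and $b \neq 0$; the specific adjusted estimators in Appendix B are not reproduced here and may differ in form.

```latex
\begin{align*}
% Second-order approximation to the mean of a ratio
E\!\left[\frac{\hat{A}}{\hat{B}}\right]
  &\approx \frac{a}{b}
   - \frac{\mathrm{Cov}(\hat{A}, \hat{B})}{b^{2}}
   + \frac{a \,\mathrm{Var}(\hat{B})}{b^{3}} \\[6pt]
% Plug-in adjusted estimator: subtract the estimated bias
\widehat{\left(\frac{a}{b}\right)}_{\mathrm{adj}}
  &= \frac{\hat{A}}{\hat{B}}
   + \frac{\widehat{\mathrm{Cov}}(\hat{A}, \hat{B})}{\hat{B}^{2}}
   - \frac{\hat{A} \,\widehat{\mathrm{Var}}(\hat{B})}{\hat{B}^{3}}
\end{align*}
```

Both bias terms scale with the sampling variance of the estimators, so they shrink at rate $1/N$, and they grow quickly as the denominator approaches zero, which is one way to see why the scope conditions matter.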

Uncertainty Estimation
Because the estimators employ ratios in which the distribution of the denominator may have positive probability density at zero, these estimators do not necessarily have finite-sample moments. This pathological problem is characteristic of ratio estimators in general, and it theoretically complicates the calculation of confidence intervals for those estimators. The existence of probability density at the point where the denominator equals zero creates a singularity in the distribution of a ratio estimator, which can result in an unbounded confidence interval. Yet traditional methods for constructing confidence sets do not necessarily take this property into account, and it has been shown that "any method which cannot generate unbounded confidence limits for a ratio leads to arbitrary large deviations from the intended confidence level" (von Luxburg and Franz ; Gleser and Hwang ; Koschat ; Hwang ). This issue has been studied extensively, with exact solutions derived in some special cases (e.g. Fieller ) and approximation techniques based on the bootstrap developed for more general cases (Hwang ; von Luxburg and Franz ). However, it has also been shown that in spite of the mathematical problems with ratio estimators, the use of standard methods for the practical estimation of confidence intervals can yield approximately correct coverage under the reasonable condition that the confidence interval is actually bounded at the desired α level, which is met when the 1 − α confidence interval of the denominator does not contain zero (Franz ). This should be met by the scope conditions presented above, which will provide for estimator denominators that are sufficiently bounded away from zero and hence allow for the use of standard methods of confidence-interval construction, such as the Delta Method and bootstrap techniques.
As is generally the case, a sufficiently large sample size is also necessary for analytic methods that rely on the central limit theorem, and for bootstrap methods to adequately approximate the population distribution.
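As an illustration of the bounded-interval condition, the sketch below computes a percentile bootstrap interval for the ACME ratio together with the corresponding interval for its denominator, and reports whether the denominator interval excludes zero. Resampling within arms (a stratified bootstrap), the `arm` coding, and the function name are assumptions made for this example.

```python
import numpy as np

def bootstrap_acme_ratio_ci(m, arm, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap CI for alpha_2 / alpha_1 plus a denominator check.

    m: mediator values; arm: 0 = control, 1 = first treatment, 2 = second treatment.
    Returns (ratio_ci, denominator_ci, denominator_excludes_zero).
    """
    rng = np.random.default_rng(seed)
    groups = [m[arm == j] for j in (0, 1, 2)]
    # Resample within each arm and store the bootstrap arm means column-wise.
    means = np.column_stack([
        rng.choice(g, size=(n_boot, len(g)), replace=True).mean(axis=1)
        for g in groups
    ])
    num = means[:, 2] - means[:, 0]   # bootstrap draws of alpha_2
    den = means[:, 1] - means[:, 0]   # bootstrap draws of alpha_1
    lo, hi = (1 - level) / 2, 1 - (1 - level) / 2
    den_ci = np.quantile(den, [lo, hi])
    ratio_ci = np.quantile(num / den, [lo, hi])
    denominator_excludes_zero = bool(den_ci[0] > 0 or den_ci[1] < 0)
    return ratio_ci, den_ci, denominator_excludes_zero
```

When the returned flag is false, the ratio interval should be treated with caution, since the conditions for approximately correct coverage described above are not met.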

Power
As observed by researchers of causal mediation analysis, there is a relative dearth of general methods to compute power and sample size requirements for causal mediation estimators (Fairchild and McDaniel ; VanderWeele , chapter ). One exception is a study by Fritz and MacKinnon ( ), which provides a table of basic power and sample size requirements based on simulations. However, given the limited number of specifications considered, these results do not allow researchers to compute power or sample size requirements for their own specific scenarios. In the CCM context, there is additional complexity in computing power given the ratio functional form and the additional parameters to estimate.
One recommended method of proceeding with a power analysis in the context of complex causal mediation models is to employ customized Monte Carlo simulations (Thoemmes, MacKinnon, and Reiser ; Zhang ; Fairchild and McDaniel ). In particular, Zhang ( ) presents a simulation-based method using bootstrap inference that can be adapted to the CCM estimators by simulating the model equations ( )-( ). Under the no-interaction assumption, only equations ( ) and ( ) would need to be simulated given that $\hat{\beta}^N$ drops out of the estimators. As is generally the case in power analyses, implementation would require hypothesized parameter values and variance estimates, in this case the variance of the error terms, which could be obtained from previous or pilot studies. The power to reject the null hypothesis that either estimand equals 1 at a specific level of confidence could then be computed for a given sample size, or the required sample size could be determined to achieve a desired level of power. See Zhang ( ) for systematic instructions on implementation.
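A simulation-based power calculation of this kind might be sketched as follows for the ACME ratio under the no-interaction assumption. The example simulates only a mediator equation with normal errors, uses a within-arm percentile bootstrap, and rejects when the interval excludes 1. These choices, along with the parameter names, are assumptions for illustration and may differ from the procedure in Zhang.

```python
import numpy as np

def simulate_power(alpha1, alpha2, sigma, n_per_arm,
                   n_sims=200, n_boot=500, level=0.95, seed=0):
    """Monte Carlo power to reject H0: alpha_2 / alpha_1 = 1.

    alpha1, alpha2: hypothesized effects of each treatment on the mediator.
    sigma: hypothesized standard deviation of the mediator error term.
    """
    rng = np.random.default_rng(seed)
    lo, hi = (1 - level) / 2, 1 - (1 - level) / 2
    rejections = 0
    for _ in range(n_sims):
        # Simulate the mediator in the control arm and the two treatment arms.
        arms = [mu + sigma * rng.standard_normal(n_per_arm)
                for mu in (0.0, alpha1, alpha2)]
        # Percentile bootstrap of the ratio, resampling within each arm.
        means = np.column_stack([
            rng.choice(g, size=(n_boot, n_per_arm)).mean(axis=1) for g in arms
        ])
        ratios = (means[:, 2] - means[:, 0]) / (means[:, 1] - means[:, 0])
        ci = np.quantile(ratios, [lo, hi])
        rejections += not (ci[0] <= 1.0 <= ci[1])
    return rejections / n_sims
```

Calling the function over a grid of `n_per_arm` values gives a rough power curve from which a required sample size can be read off.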

Simulations
To illustrate the properties of the CCM method, this section presents a simulation. Simulated causal mediation data were generated according to the following model, with the output of the first equation feeding into the second equation:
$T_1$ and $T_2$ are indicator variables that were generated such that an equal number of units were randomly assigned to (a) neither treatment, (b) $T_1$, and (c) $T_2$, with no units assigned to both $T_1$ and $T_2$. The rest of the variables and parameters were generated as follows:
As indicated, the parameters were generated to vary independently across units, yielding heterogeneous effects with zero covariance between $\alpha_j$ and $\beta$ for $j = 1, 2$. Further, the data were also generated with no interaction between $T_j$ and $M$ for $j = 1, 2$. Along with the linear form and the exogeneity of $T_j$ for $j = 1, 2$, all assumptions established above are met by the data-generating process. Once the data were generated, the mean values of the parameters $\alpha_1$, $\alpha_2$, and $\beta$, as well as $\tau_1$ and $\tau_2$, were estimated by linear least squares regression according to equations ( )-( ) with $\gamma_1$ and $\gamma_2$ assumed to be zero. Thus $X$ was omitted from the estimation process, simulating unobserved confounding.
In the results presented in Figure , the model was simulated times with a total of units per simulation ( assigned to each of the two treatments and assigned to neither treatment). Each panel in the plot displays the point estimates from each simulation for a different estimand, along with % confidence intervals constructed via the nonparametric percentile bootstrap. The solid lines denote confidence intervals that cover the true value, whereas the dashed lines denote lack of coverage. The panels in the top row correspond to the traditional causal mediation estimands: $\mathrm{ACME}_1$, $\mathrm{ACME}_2$, and the proportion mediated for each treatment. The panels in the bottom row correspond to the CCM estimands, with both simple and small-sample adjusted estimators presented.
The panels note the coverage of the confidence intervals, the true value of the estimand, and the mean estimate over all simulations. As can be seen, Figure clearly shows how the traditional ACME estimators (top row) are biased and exhibit confidence-interval undercoverage given the presence of unmeasured confounders ($X$). The top left two panels show that the estimators of $\mathrm{ACME}_1$ and $\mathrm{ACME}_2$ are biased upward by approximately . and , resulting in only % and % coverage of the % confidence intervals. The story is the same for the top right two panels, which show the estimates of the proportions mediated for each treatment.
In contrast to the clear bias of the traditional causal mediation estimators, the bottom row shows that the CCM estimators are properly centered and exhibit good coverage. The bottom left two panels present the estimators of the ACME ratio, the first being the simple estimator and the second being the small-sample adjusted estimator. As can be seen, both perform well in recovering a mean estimate close to the true estimand value and good confidence-interval coverage (subject to simulation error). In addition, the small-sample adjustments slightly improve the mean estimates, but in doing so they also substantially inflate the variance and increase the number of confidence intervals that blow up below zero from 3 to 18. The results are the same in the bottom right two panels, which show the simple and adjusted estimators for the ratio of proportions mediated. Again, the small-sample adjustments slightly improve the mean estimates at the cost of inflated variance, and an increase in the number of confidence intervals that blow up below zero from 4 to 8.
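The qualitative pattern described in this section can be reproduced with a much simpler data-generating process than the one used in the study. The sketch below assumes constant (not heterogeneous) effects, standard normal errors, and an unobserved confounder `x` that enters both the mediator and outcome equations with coefficient 1. All parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_once(n_per_arm=10000, alpha=(1.0, 2.0), beta=1.0, seed=0):
    """Compare a traditional ACME estimate with the ACME ratio when X is omitted."""
    rng = np.random.default_rng(seed)
    arm = np.repeat([0, 1, 2], n_per_arm)
    t1, t2 = (arm == 1).astype(float), (arm == 2).astype(float)
    x = rng.standard_normal(arm.size)  # unobserved mediator-outcome confounder
    m = alpha[0] * t1 + alpha[1] * t2 + x + rng.standard_normal(arm.size)
    y = 0.5 * t1 + 0.5 * t2 + beta * m + x + rng.standard_normal(arm.size)

    ones = np.ones(arm.size)
    # Mediator regression: M on (1, T1, T2); X is deliberately omitted.
    a = np.linalg.lstsq(np.column_stack([ones, t1, t2]), m, rcond=None)[0]
    # Outcome regression: Y on (1, T1, T2, M); X is again omitted, biasing beta.
    b = np.linalg.lstsq(np.column_stack([ones, t1, t2, m]), y, rcond=None)[0]
    acme1_hat, acme2_hat = a[1] * b[3], a[2] * b[3]
    return {
        "acme1_hat": acme1_hat, "acme1_true": alpha[0] * beta,
        "ratio_hat": acme2_hat / acme1_hat, "ratio_true": alpha[1] / alpha[0],
    }
```

In this setup the biased estimate of $\beta$ inflates each individual ACME estimate, but it cancels in the ratio, so the ratio estimate stays close to its true value.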

Relaxing the No-Interaction Assumption
Setup
Following Imai and Yamamoto ( ), the semiparametric model presented earlier, equations ( )-( ), can proceed without the no-interaction assumption and hence allow for treatment-mediator interactions, which has been referred to by some scholars as a version of moderated mediation (James and Brett ; Preacher ). In this case, of interest are functions of the ACMEs for subsamples, namely for the treated units, $\kappa_j(1)$, and for the control units, $\kappa_j(0)$. The same results as presented above (assuming no interactions) continue to apply in this case with regard to the ACMEs for the control units, $\kappa_1(0)$ and $\kappa_2(0)$. However, the CCM estimands are likely to be of greater theoretical and practical interest in terms of the ACMEs for the treated units. In this case, the estimands of interest are as follows: $\kappa_2(1) / \kappa_1(1)$ and $(\kappa_2(1) / \tau_2) / (\kappa_1(1) / \tau_1)$.

Proposition. Without loss of generality, assume that both the numerator and denominator of the estimator are positive, and that the estimator is greater than 1 (i.e. the numerator is larger than the denominator). Call $\hat{\tau}^N_2$, $\hat{\tau}^N_1$, $\hat{\alpha}^N_2$, $\hat{\alpha}^N_1$, $\hat{\beta}^N$, $\hat{\gamma}^N_2$, $\hat{\gamma}^N_1$ the linear least squares regression estimators of the parameters from equations ( ), ( ), and ( ) given a simple random sample of size $N$ from a larger population. Let $\hat{\omega}^N_1 = \hat{\beta}^N + \hat{\gamma}^N_1$ and $\hat{\omega}^N_2 = \hat{\beta}^N + \hat{\gamma}^N_2$. Further call $\xi_1$ and $\xi_2$ the asymptotic bias components of $\hat{\omega}^N_1$ and $\hat{\omega}^N_2$, respectively (i.e. $\operatorname{plim}_{N \to \infty} \hat{\omega}^N_1 - \omega_1 = \xi_1$ and $\operatorname{plim}_{N \to \infty} \hat{\omega}^N_2 - \omega_2 = \xi_2$). Make assumptions , , , and . Then, given $\omega_2 \xi_1 > \omega_1 \xi_2$, the following holds:
The result is that, given the conditions described in the proposition, the bias attenuates the estimates of the two CCM estimands. Since these results were presented without loss of generality in the context where the estimands are greater than 1, this means that the attenuated estimates will be conservative. In other words, the estimates will be biased in favor of the null hypothesis that the estimands equal 1. Note that while the no-interaction assumption was relaxed, the proposition introduces the following additional condition: $\omega_2 \xi_1 > \omega_1 \xi_2$. As shown in Appendix C, this condition can be partially assessed empirically.
Loeys et al. ( ) describe specific conditions under which $\hat{\gamma}_2$ and $\hat{\gamma}_1$ are unbiased estimators even when $\hat{\beta}$ is not.
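The role of the condition $\omega_2 \xi_1 > \omega_1 \xi_2$ can be seen in a short sketch for the ratio of ACMEs among the treated. The sketch assumes that, under the linear model, the ACME for the treated can be written as $\kappa_j(1) = \alpha_j \omega_j$, that $\hat{\alpha}^N_1$ and $\hat{\alpha}^N_2$ are consistent because the treatments are randomized, and that $\alpha_2 / \alpha_1$, $\omega_1$, and $\omega_1 + \xi_1$ are all positive. It is an informal illustration, not a substitute for the formal proof.

```latex
\operatorname*{plim}_{N \to \infty}
  \frac{\hat{\alpha}^N_2 \, \hat{\omega}^N_2}{\hat{\alpha}^N_1 \, \hat{\omega}^N_1}
  = \frac{\alpha_2 (\omega_2 + \xi_2)}{\alpha_1 (\omega_1 + \xi_1)}
  < \frac{\alpha_2 \, \omega_2}{\alpha_1 \, \omega_1}
\;\Longleftrightarrow\;
  \omega_1 (\omega_2 + \xi_2) < \omega_2 (\omega_1 + \xi_1)
\;\Longleftrightarrow\;
  \omega_1 \xi_2 < \omega_2 \xi_1 .
```

Under these assumptions, the condition is exactly what is needed for the probability limit of the estimator to fall below the true ratio, which is the sense in which the estimate is attenuated.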
Additional Notes
Similar to the case in which the no-interaction assumption is maintained, finite-sample adjustments can be derived for the CCM estimators when relaxing the no-interaction assumption. Appendix B presents these finite-sample adjustments. In addition, Appendix D presents simulation results when the no-interaction assumption has been relaxed.

Application: International Law and Audience Costs
Background
Does international law affect state behavior? There is a longstanding scholarly debate on this question, with some political scientists and legal scholars viewing international law as largely epiphenomenal to state interests and power (e.g. Downs, Rocke, and Barsoom ; Goldsmith and Posner ), and others seeing international law as having a real impact on state decision making (e.g. Goldstein ). Among the latter group, many scholars have identified domestic political processes and institutions as an important conduit through which national governments can be induced to honor their international legal obligations, even in cases where those governments did not intend to comply in the first place (Simmons ; Trachtman ; Hathaway ; Moravcsik ; Dai ; Abbott and Snidal ; Risse-Kappen, Ropp, and Sikkink ). The electoral compliance mechanism, in which governments are incentivized to maintain compliance with international legal agreements under the threat of electoral punishment for violations, is one possible domestic source of compliance.
In a number of recent studies using survey experiments, political scientists have accumulated evidence that voters in the United States and elsewhere are indeed inclined to punish elected officials who renege on previous foreign policy commitments (Tomz ; McGillivray and Smith ; Chaudoin ; Chilton ; Hyde ). The political costs that a government incurs as a result of constituents disapproving of violations of policy commitments, which may manifest in the form of electoral power in democracies or via the threat of protest and dissent in nondemocracies, are generally referred to as domestic "audience costs" (Fearon ; Morrow ; Tomz ; Weeks ; Jensen ). The types of foreign policy commitments that have been investigated in this literature vary widely. This includes commitments targeted at a purely domestic audience, such as promises by national leaders to their constituents not to engage in certain behavior or activities. This also includes commitments directed at other countries, such as threats made against aggressor countries and promises to aid allies in the event of conflict. Finally, this also includes legally formalized international commitments, such as agreements codified in treaties.
The application presented here focuses on the legal dimension of foreign policy commitments and its relationship with audience costs. An important gap remains in the relevant scholarship: while studies have shown that public disapproval of a foreign policy decision tends to increase when that policy decision requires reneging on international legal commitments, these studies have not isolated the role of legality per se in generating that disapproval. Instead, the design of these studies has masked the extent to which such disapproval is attributable to the baseline breaking of the commitment (i.e. the audience costs for not honoring a policy pledge in general) versus the additional legal status of the commitment. In other words, does the dimension of international legality actually enhance audience costs, and if it does, to what extent and why is that the case?
Indeed, in scholarship on public attitudes toward international commitments, much of the international relations literature tends to abstract away the distinctive nature of legality and treat international legal commitments as generic international commitments. The implication of such a framing is that legality should not affect the prospect for audience costs. Yet there are, of course, reasons to believe that voters will respond more negatively to home government violations of foreign policy commitments when those violations also entail breaking international law. Voters may view legal commitments as uniquely serious and solemn forms of commitment, the violation of which is considered particularly objectionable, in which case legality should increase the prospect for audience costs. While this has been suggested in the literature (Lipson ; Abbott and Snidal ; Simmons and Hopkins ), it has not been explicitly tested.

Study Design
In order to address this gap in the literature, the author designed and implemented a novel survey experiment embedded in an online survey administered in August , with U.S.-based respondents recruited via Amazon Mechanical Turk. The experiment revolved around a security scenario in which the U.S. government decided to take military action against ISIS forces in Iraq. Appendix E provides the survey instrument text and variable coding rules. Appendix F provides sample demographic distributions and balance statistics across treatment conditions. Tests of the relationship between the treatment assignment and demographic covariates fail to reject the null hypothesis of independence at the . significance level, indicating good balance.
The scenario involved a U.S. military operation in Iraq to capture ISIS militants who were threatening rocket attacks on neighboring countries but were hiding in a civilian zone. Respondents were told that in order to avoid collateral damage, the U.S. military deployed commandos in a covert operation, in which the commandos used an ostensibly nonlethal incapacitating chemical gas to neutralize the ISIS militants. The incapacitating gas was featured in the scenario in order to exploit real-world ambiguity surrounding the international legality of chemical incapacitants in unconventional operations, as well as ambiguity surrounding the lethality of these chemical agents. Because of this ambiguity and the technical nature of the legal categorization of chemical incapacitants, survey respondents should not be expected to identify such agents as clearly illegal, in contrast to well-known chemical warfare agents. At the same time, it is also plausible and hence reasonable to convince respondents that these chemical incapacitants are illegal under the Chemical Weapons Convention. As a result, it was possible to effectively intervene upon respondents' knowledge of the legal status of these chemical incapacitants.
There were two primary goals of the research. The first goal was to disentangle the dimension of (il)legality from the baseline violation of a foreign policy commitment more explicitly than have previous studies, thereby creating a more valid design to answer the research question: Does the international legal status of a foreign policy commitment increase the potential for domestic audience costs if that commitment is violated? To achieve this goal, the experimental design featured two mutually exclusive treatment conditions in addition to a control condition. In the control condition, respondents were simply told about the U.S. government's decision to use military force employing chemical incapacitants. In the first "informal" treatment condition, respondents were additionally told that this decision constituted a violation of the U.S. government's previous foreign policy commitment, but they were not given any information about international legality. In the second "legal" treatment condition, respondents were told that this decision constituted a violation of the U.S. government's international legal commitment. There were two outcome variables of interest. The first measured the extent to which respondents (dis)approved of the policy decision to use chemical incapacitants, and the second measured the extent to which respondents would be likely to vote for a U.S. Senator who supported the policy decision. Both variables were measured in the survey on a five-point scale. To allow for easier interpretation, the analysis presented here employs dichotomized versions of these variables: whether or not the respondent disapproved, which will be called Disapproval, and whether or not the respondent would be less likely to vote for a supportive U.S. Senator, which will be called Punishment.
This research was approved by the Institutional Review Board at Stanford University (Protocol ).
While the illegality of chemical incapacitants is probably the most widely accepted position among arms control legal experts, some experts have argued otherwise in terms of the use of chemical incapacitants under certain conditions. For an overview of the debate, see Ballard ( ).
The second research goal was to identify and better understand the contours of public opinion that determine the extent to which legalization does (or does not) amplify audience costs. In addition to measuring Disapproval and Punishment, respondents' perceptions of the (im)morality of the decision to use chemical incapacitants were also measured and investigated as a mediator. Normative or moral aversion represents one possible mechanism that could lead violations of international commitments, whether legalized or not, to result in public disapproval. Previous research has highlighted and tested a variety of possible mechanisms, including morality, whereby international law may affect public opinion (Chilton ; Chilton and Versteeg ). The application presented here focuses specifically on the morality mechanism because perceptions of immorality represent one of the earliest theoretical reasons noted by international relations scholars of international law that voters would more strongly disapprove of violations of legalized foreign policy commitments versus similar nonlegalized commitments (Abbott and Snidal ). In addition, Appendix G presents additional analysis that probes into a second possible mechanism: concerns that other countries would follow suit in developing or using chemical incapacitants and hence harm U.S. security in the long run. Other possible mechanisms that could also be active in the international security context but were not tested include fear of more immediate international retaliation or enforcement, beliefs about the efficacy of prohibited actions or behaviors, and concerns about impact on national reputation.
To test the morality mechanism, a mediator variable was constructed by asking respondents about the degree to which they believed the policy decision to use chemical incapacitants was morally right or wrong. Similar to the dependent variables, this mediator was measured on a five-point scale, and it is dichotomized to facilitate interpretation in the analysis. The binary version of the mediator captures whether or not each respondent believed the policy decision to be immoral, which will be called Perceived Immorality. This enables estimation of the portion of each treatment effect, $\mathrm{ATE}_1$ (informal) and $\mathrm{ATE}_2$ (legal), that is transmitted via Perceived Immorality, that is, estimation of $\mathrm{ACME}_1$ and $\mathrm{ACME}_2$.
As described above, the problem with traditional causal mediation analysis is that, even with pretreatment covariates included as controls, those mediated effects are likely to be biased and inconsistent. However, under the assumptions stated earlier, the CCM estimands can be estimated consistently (or conservatively). The first estimand, $\mathrm{ACME}_2 / \mathrm{ACME}_1$, measures the extent to which the morality mediator transmits a stronger effect for the legal treatment than for the informal treatment. The second estimand, $(\mathrm{ACME}_2 / \mathrm{ATE}_2) / (\mathrm{ACME}_1 / \mathrm{ATE}_1)$, measures the extent to which the morality mediator comprises a larger proportion of the total effect of (i.e. is more important for) the legal treatment, compared to the informal treatment.
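For a design like this one, with binary mediator and outcome variables and three arms, the two estimands could be computed from simple differences in means when the no-interaction assumption is maintained, since the mediator-outcome coefficient cancels in both ratios. The sketch below is illustrative only. The coding of `arm` (0 = control, 1 = informal, 2 = legal) and the function name are assumptions, and no covariate adjustment is included.

```python
import numpy as np

def ccm_point_estimates(m, y, arm):
    """Simple CCM point estimates under the no-interaction assumption.

    m: 0/1 mediator (e.g. Perceived Immorality); y: 0/1 outcome (e.g. Disapproval);
    arm: 0 = control, 1 = informal treatment, 2 = legal treatment.
    Returns (ratio of ACMEs, ratio of proportions mediated).
    """
    def effect(v, j):
        # Difference in means between treatment arm j and the control arm.
        return v[arm == j].mean() - v[arm == 0].mean()

    alpha1, alpha2 = effect(m, 1), effect(m, 2)  # treatment effects on the mediator
    tau1, tau2 = effect(y, 1), effect(y, 2)      # total treatment effects (ATEs)
    acme_ratio = alpha2 / alpha1
    prop_ratio = (alpha2 / tau2) / (alpha1 / tau1)
    return acme_ratio, prop_ratio
```

A ratio of ACMEs above 1 with a ratio of proportions mediated near 1 would indicate that the mediated effect is larger for the legal treatment without the mediator accounting for a larger share of that treatment's total effect.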
The decision was made to focus on punishment of senators rather than the president under the assumption that this would decrease the amount of partisan priming respondents were exposed to, thereby allowing for better and less contaminated measurement of their attitudes toward the scenario.

Results
The results of the survey experiment provide statistically and substantively strong evidence that the legal treatment does indeed cause a larger increase in the probability of Disapproval and Punishment than the informal treatment, as shown by Table , providing support for the theory that legalization enhances audience costs. Specifically, the legal treatment had an estimated . -percentage-point larger effect on the probability of Disapproval and a . -percentage-point larger effect on the probability of Punishment than the informal treatment.
More importantly in the context of this study, however, the results of the CCM analysis also provide support for the theory that this enhancement of audience costs by legalization is, at least in part, due to an increase in Perceived Immorality. Table shows the results of the CCM analysis.
The assumption of no interaction between the treatments and mediator was tested in the case of both dependent variables. The test failed to reject the null hypothesis of no interactions in the case of the Disapproval dependent variable, and hence the no-interaction assumption was maintained in that case.
However, the test rejected the null hypothesis of no interactions in the case of the Punishment dependent variable, which is why the causal mediation estimates in the Punishment case involve the ACMEs for the treated (ACMETs), that is, $\kappa_1(1)$ and $\kappa_2(1)$. Furthermore, additional tests provide support for the conditions necessary for the CCM estimators to be conservative given the interactions between the treatments and mediator. Specifically, the tests provide evidence that $\omega_2 \xi_1 > \omega_1 \xi_2$.
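The exact form of the interaction test is not specified here. One common choice, shown as an assumption in the sketch below, is a joint Wald test that both treatment-mediator interaction coefficients are zero in a linear outcome regression with classical (homoskedastic) standard errors, compared against the 5% chi-square critical value with 2 degrees of freedom. The second function implements the partial check described in the study's note on Appendix C, comparing $\hat{\omega}_j$ times the within-arm mediator variance across the two treatment arms. Function names and the `arm` coding are hypothetical.

```python
import numpy as np

CHI2_2DF_95 = 5.991  # 95th percentile of the chi-square distribution with 2 df

def fit_interaction_model(y, m, arm):
    """OLS of Y on (1, T1, T2, M, T1*M, T2*M); returns design, coefficients, covariance."""
    t1, t2 = (arm == 1).astype(float), (arm == 2).astype(float)
    X = np.column_stack([np.ones(len(y)), t1, t2, m, t1 * m, t2 * m])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)  # classical OLS covariance (assumption)
    return X, coef, cov

def interaction_wald_test(y, m, arm):
    """Wald statistic for H0: gamma_1 = gamma_2 = 0, and whether it rejects at 5%."""
    _, coef, cov = fit_interaction_model(y, m, arm)
    g = coef[4:6]
    wald = float(g @ np.linalg.solve(cov[4:6, 4:6], g))
    return wald, wald > CHI2_2DF_95

def partial_condition_check(y, m, arm):
    """Check omega_2 * Var(M | arm 2) > omega_1 * Var(M | arm 1), with omega_j = beta + gamma_j."""
    _, coef, _ = fit_interaction_model(y, m, arm)
    omega1, omega2 = coef[3] + coef[4], coef[3] + coef[5]
    return bool(omega2 * m[arm == 2].var(ddof=1) > omega1 * m[arm == 1].var(ddof=1))
```

With binary outcomes, heteroskedasticity-robust standard errors may be preferable to the classical covariance used in this sketch.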
Table presents the causal mediation results, including the estimates of each treatment's mediation effect transmitted via the morality mechanism as well as the CCM effects. Note that the individual $\widehat{\mathrm{ACME}}$ estimates should not be interpreted at face value themselves as they are used specifically as inputs for the CCM estimators and are likely to be individually biased and inconsistent. In contrast, under the assumptions presented in this study, the CCM estimates (presented in bold) can be interpreted. Given the large sample size, these estimates were obtained using the simple estimators, and the % confidence intervals were computed via the nonparametric percentile bootstrap. As can be seen, the estimates of the ratio of mediation effects, $\widehat{\mathrm{ACME}}_2 / \widehat{\mathrm{ACME}}_1$, are statistically (and substantively) distinguishable from 1 for both dependent variables. These estimates can be interpreted as meaning that the effect on Disapproval (Punishment) mediated via Perceived Immorality is about % ( %) larger for the legal treatment than for the informal treatment. In contrast, the estimates of the ratio of proportions mediated, $(\widehat{\mathrm{ACME}}_2 / \widehat{\mathrm{ATE}}_2) / (\widehat{\mathrm{ACME}}_1 / \widehat{\mathrm{ATE}}_1)$, are not statistically distinguishable from 1 for either dependent variable. This means that while Perceived Immorality transmitted a larger effect for the legal treatment than the informal treatment, it did not necessarily constitute a larger proportion of the overall ATE for the legal treatment.
In combination, these results suggest that Perceived Immorality is an important factor that leads to a scaling up of the audience costs effect given legalization. Yet it appears that other mediation channels also help scale up that effect such that while the mediation channel via Perceived Immorality expands, it does not increase as a proportion of the total effect. Appendix G presents the results when analyzing the variables on their raw five-point scale. While on a different scale, the results remain substantively and statistically unchanged.

Discussion
In addition to illustrating the CCM methods, the results of this application also contribute to the literature on audience costs. As described above, the results add to the recent accumulation of experimental evidence that reneging on foreign policy commitments can indeed substantially decrease approval of the policy decision in question. The ATEs estimated in this application, of approximately to percentage points greater disapproval, are substantively large and consistent in magnitude with the higher end of effects detected in previous experimental research on audience costs.
In addition, this application makes a more novel contribution in specifically distinguishing between audience costs effects when the violated commitment is legalized versus not legalized. The roughly -to -percentage-point boost attributable to legalization in this application provides new evidence on the extent to which legalization enhances audience costs. Furthermore, the CCM results provide support for the theory that international legalization enhances audience costs specifically by amplifying the perceived immorality of violating the commitment. However, the results also suggest that this is not the only mechanism by which legalization enhances audience costs. In fact, additional evidence presented in Appendix G shows that another important mediation channel that contributes to these results is the fear of concrete international consequences or harm. In the scenario, this takes the form of concerns that other countries would follow suit in developing and potentially using similar weapons in the future, thus harming U.S. security in the long run.
As explained in Appendix C, this is tested partially by verifying that $\hat{\omega}_2 \mathrm{Var}(M_i \mid T_{1i} = 0, T_{2i} = 1) > \hat{\omega}_1 \mathrm{Var}(M_i \mid T_{1i} = 1, T_{2i} = 0)$.
The finite-sample adjusted estimates are virtually identical, as should be expected given the sample size. For instance, the adjusted estimate of $\widehat{\mathrm{ACME}}_2 / \widehat{\mathrm{ACME}}_1$ for the Disapproval dependent variable is . , and the adjusted estimate of $\widehat{\mathrm{ACMET}}_2 / \widehat{\mathrm{ACMET}}_1$ for the Punishment dependent variable is . .
These results correspond to the case of "proportionate scaling up" presented in Table H in Appendix H.
For instance, the seminal experimental study by Tomz ( ) estimated audience cost effects between -and -percentage-point increases in disapproval in the context of security commitments and escalation management. Follow-up research in this area (e.g. Levendusky and Horowitz ) has also estimated effects of up to approximately percentage points. Other experimental research on audience costs in areas of international legal and regulatory cooperation (e.g. Chaudoin ; Chilton ) has detected smaller effects of roughly -percentage-point increases in disapproval.
In sum, legalization appears to have the potential to add to the domestic sources of credible commitment via multiple channels. However, the evidence presented here pertains to a specific international security context. Whether these findings would hold in other policy areas would be useful to explore in future research. For instance, in contexts where normative considerations are less salient, the morality channel may play a smaller role. The same argument could be made for the international consequences channel in contexts where the possibility of other countries reciprocating or retaliating is less of a concern. In such cases, would legalization continue to enhance audience costs, and if so, via what channels?

Conclusions
This study has introduced a novel set of causal mediation estimands which compare the causal mediation effects of multiple treatments. It has shown that these estimands can be estimated consistently or conservatively under weaker assumptions than can any single ACME. In particular, the usual assumption of no confounding of the mediator-outcome relationship, which is required for consistent estimation of a single ACME, is not necessary in the CCM context presented in this study.
Of course, the usefulness of these CCM methods is limited to experimental designs that feature multiple treatments, which are less common than single-treatment designs in many research settings. However, with the gradual accumulation of knowledge and empirical results in various academic subfields and program evaluation contexts, experimental research questions will increasingly evolve to require evaluating multiple treatments (that is, investigating the relative strengths and comparing the causal anatomies of distinct but conceptually or administratively related treatments) rather than simply testing the effects of single treatments. The method of CCM analysis presented in this study provides a new tool for researchers who are interested in comparing, discovering, and testing the causal mechanism differences between multiple treatments, and would like to do so under the weakest possible set of assumptions.