## 1. Introduction

Political scientists employ maximum likelihood (ML) to estimate a variety of statistical models. ML estimators have desirable and widely understood properties. But for many research questions, the model coefficient estimates do not directly interest the researcher. Following King *et al.* (2000), researchers often use the coefficient estimates to compute substantively meaningful quantities of interest, such as predicted probabilities, expected counts, marginal effects, and first differences. The literature offers two methods to compute point estimates for these quantities of interest.

Researchers estimate quantities of interest either by simulating quantities of interest and then averaging (e.g., King *et al.*, 2000) or by directly transforming coefficients into quantities of interest (e.g., Herron, 1999). In practice, researchers’ choice between these two approaches seems idiosyncratic rather than principled, depending on their preferred software package rather than any statistical criteria. Further, the methodological literature has not distinguished or compared the two approaches to estimating quantities of interest.

How does the simulation approach compare to directly transforming coefficients? Which should researchers prefer? Or are the two approaches interchangeable, as the literature seems to imply? Rainey (2017) shows that directly transforming coefficients into quantities of interest creates “transformation-induced” bias. I show that when researchers use the average of simulations to estimate quantities of interest, they replicate the logic of transformation-induced bias and add an additional, unnecessary bias to their estimates. I refer to this additional bias as “simulation-induced” bias. I show that simulation-induced bias occurs *in addition to* transformation-induced bias and is approximately the same size and direction. While this bias is usually small relative to the standard error, methodologists should not recommend methods that add unnecessary bias to point estimates. Instead, we should recommend methods that better adhere to the usual justifications.

## 2. The current practice

### 2.1. The plug-in estimator

First, researchers can directly transform the coefficient estimates into quantities of interest. The invariance property of ML estimators allows a researcher to find the ML estimate of a function of a parameter (i.e., a quantity of interest) by first using ML to estimate the model parameter and then applying the function to that estimate (or “plugging in the estimate”) (King, 1998, 75–76; Casella and Berger, 2002, 320–321). I refer to this approach as the “plug-in” estimator. Importantly, the plug-in estimator *remains an ML estimator*. Thus, it retains all of the (desirable) properties of ML estimators (e.g., consistency, asymptotic efficiency).

As a concrete example, suppose a researcher uses ML to estimate a statistical model in which *y* _{i} ~ *f*(*θ* _{i}), where *i* ∈ {1, …, *N*} and *f* represents a probability distribution. The parameter *θ* _{i} connects to a design matrix *X* of *k* explanatory variables and a column of ones by a link function *g*( ⋅ ), so that *g*(*θ* _{i}) = *X* _{i}*β*, where $\beta \in {\opf R}^{k + 1}$ represents a vector of parameters with length *k* + 1. The researcher uses ML to compute estimates $\hat {\beta }^{{\rm mle}}$ for the parameter vector *β*. I denote the function that transforms model coefficients into quantities of interest as *τ*( ⋅ ). For example, if the researcher uses a logit model and focuses on the predicted probability for a specific observation *X* _{c}, then $\tau ( \beta ) = {\rm logit}^{-1}( X_c \beta ) = 1/( 1 + {\rm e}^{-X_c\beta })$. The researcher can use the invariance property to quickly obtain an ML estimate of the predicted probability: $\hat {\tau }^{{\rm mle}} = \tau ( \hat {\beta }^{{\rm mle}}) = {\rm logit}^{-1} ( X_c \hat {\beta }^{{\rm mle}} ) = 1/( 1 + {\rm e}^{-X_c \hat {\beta }^{{\rm mle}}})$.
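The plug-in calculation is easy to sketch in code. The snippet below is a minimal illustration, not the original paper's analysis: the simulated data, the logit fit via `scipy.optimize.minimize`, and the case `X_c` are all hypothetical choices for this sketch.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical data: an intercept and one explanatory variable.
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

def nll(beta):
    """Negative log-likelihood of the logit model."""
    xb = X @ beta
    return -np.sum(y * xb - np.logaddexp(0, xb))

beta_hat = minimize(nll, np.zeros(2), method="BFGS").x  # ML estimates

# By the invariance property, plugging the ML estimates into tau(.)
# yields the ML estimate of the quantity of interest: here, the
# predicted probability for a chosen case X_c.
X_c = np.array([1.0, 1.0])
tau_hat_mle = 1 / (1 + np.exp(-X_c @ beta_hat))
```

Because `tau_hat_mle` is a function of the ML estimates, it is itself an ML estimate of the predicted probability.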

### 2.2. The average-of-simulations estimator

Second, researchers can use the average of simulated quantities of interest as the point estimator. King *et al.* (2000) suggest the following approach:

1. *Fit the model.* Use ML to estimate the model coefficients $\hat {\beta }^{{\rm mle}}$ and their covariance $\hat {V} ( \hat {\beta }^{{\rm mle}} )$.

2. *Simulate the coefficients.* Simulate a large number *M* of coefficient vectors $\tilde {\beta }^{( i) }$, for *i* ∈ {1, 2, …, *M*}, using $\tilde {\beta }^{( i) } \sim {\rm MVN} [ \hat {\beta }^{{\rm mle}},\; \hat {V} ( \hat {\beta }^{{\rm mle}}) ]$, where MVN represents the multivariate normal distribution.

3. *Convert the simulated coefficients into simulated quantities of interest.* Compute $\tilde {\tau }^{( i) } = \tau ( \tilde {\beta }^{( i) })$ for *i* ∈ {1, 2, …, *M*}. Most quantities of interest depend on the values of the explanatory variables. In this situation, researchers either focus on a specific observation (typically some kind of “average case”) or average across all sample observations (Hanmer and Kalkan, 2013). In any case, the transformation *τ*( ⋅ ) includes this choice.

4. *Average the simulations of the quantity of interest.* Estimate the quantity of interest using the average of the simulations, so that $\hat {\tau }^{{\rm avg}} = {1}/{M} \sum _{i = 1}^{M} \tilde {\tau }^{( i) }$.
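The four steps above can be sketched as follows. This is a minimal illustration under hypothetical assumptions: the data, the logit model fit via `scipy`, the use of the BFGS inverse Hessian as the estimated covariance, and the case `X_c` all stand in for whatever model and quantity of interest a researcher actually uses.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical data for a logit model (intercept plus one covariate).
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([-0.5, 1.0])))))

def nll(beta):
    """Negative log-likelihood of the logit model."""
    xb = X @ beta
    return -np.sum(y * xb - np.logaddexp(0, xb))

# 1. Fit the model: ML estimates and an estimated covariance matrix
#    (here, BFGS's approximate inverse Hessian of the NLL).
fit = minimize(nll, np.zeros(2), method="BFGS")
beta_hat, V_hat = fit.x, fit.hess_inv

# 2. Simulate the coefficients from MVN(beta_hat, V_hat).
M = 10_000
beta_tilde = rng.multivariate_normal(beta_hat, V_hat, size=M)

# 3. Convert each simulated coefficient vector into the quantity of
#    interest: the predicted probability for a chosen case X_c.
X_c = np.array([1.0, 1.0])
tau_tilde = 1 / (1 + np.exp(-(beta_tilde @ X_c)))

# 4. Average the simulations to obtain the point estimate.
tau_hat_avg = tau_tilde.mean()
```

In practice researchers would take the covariance matrix from their estimation routine rather than the optimizer's BFGS approximation; the structure of the algorithm is the same.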

I refer to this as the “average-of-simulations” estimator. But what are the properties of this estimator?

While the estimates it provides are sometimes similar to well-behaved plug-in estimates, King *et al.* (2000) develop their method informally. Much of their theoretical argument happens quickly when they write “we draw many plausible sets of parameters from their posterior or sampling distribution” (349). One might justify their method from a frequentist perspective by first thinking of it as “informal” Bayesian posterior simulation (Gelman and Hill, 2007). This is helpful because the theory and practice of simulating from a posterior distribution are well-developed and widely understood. The Bernstein–von Mises theorem (Van der Vaart, 2000, 140–146) guarantees, under relatively weak assumptions, that posterior simulations are asymptotically equivalent to the simulation procedure suggested by King, Tomz, and Wittenberg. And because the point estimator $\hat {\beta }^{{\rm mle}}$ (and any function of $\hat {\beta }^{{\rm mle}}$) is consistent, the mean of the simulations (and the mean of functions of the simulations) is consistent as well. Therefore, one can defend $\hat {\tau }^{{\rm avg}}$ on the grounds that it is a consistent estimator of *τ*.

However, the small-sample properties of the average-of-simulations estimator remain poorly understood. Methodologists and applied researchers seem to assume that $\hat {\tau }^{{\rm avg}}$ is interchangeable with $\hat {\tau }^{{\rm mle}}$. But below, I show that the average-of-simulations algorithm compounds the transformation-induced bias described by Rainey (2017) and adds a similar bias to the estimate that I refer to as “simulation-induced bias.”

## 3. A theory of simulation-induced bias

Before developing the theory of simulation-induced bias, I review the concept of transformation-induced bias from Rainey (2017). As Rainey (2017) shows, transforming unbiased model coefficient estimates can introduce bias into estimates of quantities of interest. Rainey (2017, 404) decomposes the bias in the estimate of the quantity of interest, which he refers to as total *τ*-bias, into two components: transformation-induced *τ*-bias and coefficient-induced *τ*-bias. Rainey (2017) defines these as

$$\text{transformation-induced } \tau\text{-bias} = {\rm E} [ \tau ( \hat {\beta }^{\rm mle} ) ] - \tau [ {\rm E} ( \hat {\beta }^{\rm mle} ) ] \quad (1)$$

$$\text{coefficient-induced } \tau\text{-bias} = \tau [ {\rm E} ( \hat {\beta }^{\rm mle} ) ] - \tau ( \beta ) .$$

Transformation-induced *τ*-bias behaves systematically. The shape of the transformation *τ*( ⋅ ) determines the direction of the bias. In general, any strictly convex (concave) *τ*( ⋅ ) creates upward (downward) transformation-induced *τ*-bias. The direction and magnitude of the coefficient-induced *τ*-bias depend on the choice of *τ*( ⋅ ) and the bias in the coefficient estimates, but an unbiased estimator $\hat {\beta }^{\rm mle}$ implies the absence of coefficient-induced *τ*-bias.

But Rainey (2017) does not consider the average-of-simulations estimator. This raises the question: Does the average-of-simulations estimator $\hat {\tau }^{{\rm avg}}$ suffer the same transformation-induced bias as the plug-in estimator $\hat {\tau }^{\rm mle}$? I now turn to the average-of-simulations estimator and develop the idea of “simulation-induced bias.”

If the transformation of coefficient estimates into an estimate of the quantity of interest is always convex (or always concave), then Jensen's inequality allows the simple statement relating $\hat {\tau }^{\rm avg}$ and $\hat {\tau }^{{\rm mle}}$ given in Lemma 1.

Lemma 1: Suppose a nondegenerate ML estimator $\hat {\beta }^{\rm mle}$. Then any strictly convex (concave) *τ*( ⋅ ) guarantees that $\hat {\tau }^{{\rm avg}}$ is strictly greater (less) than $\hat {\tau }^{\rm mle}$.

This result is intuitive. Since I simulate using a multivariate normal distribution, $\tilde {\beta }$ has a symmetric distribution. But the distribution of $\tau ( \tilde {\beta })$ is *not* symmetric. If $\tilde {\beta }$ happens to fall below the mode $\hat {\beta }^{\rm mle}$, then *τ*( ⋅ ) pulls $\tau ( \tilde {\beta })$ in toward $\hat {\tau }^{\rm mle}$. If $\tilde {\beta }$ happens to fall above the mode $\hat {\beta }^{\rm mle}$, then *τ*( ⋅ ) pushes $\tau ( \tilde {\beta })$ away from $\hat {\tau }^{\rm mle}$. This creates a right-skewed distribution for $\tau ( \tilde {\beta })$, which pushes the average $\hat {\tau }^{\rm avg}$ above $\hat {\tau }^{\rm mle}$. See Appendix A for the proof.
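A quick numerical check makes this intuition concrete. The values below are hypothetical, not from the paper: a scalar estimate with a standard error, pushed through the strictly convex transformation *τ*(*β*) = e^{β}.

```python
import numpy as np

rng = np.random.default_rng(2)

beta_hat, se = 1.0, 0.5                    # hypothetical ML estimate and SE
beta_tilde = rng.normal(beta_hat, se, size=1_000_000)  # symmetric draws

tau_hat_mle = np.exp(beta_hat)             # plug-in estimate
tau_hat_avg = np.exp(beta_tilde).mean()    # average of simulations

# The draws of beta are symmetric around beta_hat, but exp() skews
# them right, so the average of tau(beta_tilde) exceeds tau(beta_hat).
# Analytically, E[exp(N(m, s^2))] = exp(m + s^2/2) > exp(m) for s > 0.
```

Here the gap is roughly e^{1}(e^{0.125} − 1), about 0.36: exactly the Jensen's inequality gap that Lemma 1 describes.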

For a convex transformation, Lemma 1 shows that $\hat {\tau }^{\rm avg}$ is always larger than $\hat {\tau }^{\rm mle}$. I refer to the expectation of this difference between $\hat {\tau }^{\rm avg}$ and $\hat {\tau }^{\rm mle}$ as “simulation-induced bias,” so that

$$\text{simulation-induced } \tau\text{-bias} = {\rm E} ( \hat {\tau }^{\rm avg} ) - {\rm E} ( \hat {\tau }^{\rm mle} ) .$$

Theorem 1 compares the sum of simulation- and transformation-induced *τ*-bias in $\hat {\tau }^{\rm avg}$ to the transformation-induced *τ*-bias in $\hat {\tau }^{\rm mle}$.

Theorem 1: Suppose a nondegenerate ML estimator $\hat {\beta }^{\rm mle}$. Then for any strictly convex or concave *τ*( ⋅ ), the sum of the simulation-induced and transformation-induced *τ*-bias in $\hat {\tau }^{{\rm avg}}$ is strictly greater in magnitude than the transformation-induced *τ*-bias in $\hat {\tau }^{{\rm mle}}$.

Regardless of the direction of simulation-induced and transformation-induced *τ*-bias, Theorem 1 shows that the magnitude of the combination in $\hat {\tau }^{{\rm avg}}$ is *always* larger than the transformation-induced bias alone in $\hat {\tau }^{{\rm mle}}$ for strictly convex or concave *τ*( ⋅ ). The proof follows directly from Jensen's inequality, but see Appendix A for the details.

Theorem 1 shows that $\hat {\tau }^{\rm avg}$ compounds transformation-induced *τ*-bias with simulation-induced *τ*-bias. But is this bias substantively important? An analytical approximation provides a helpful guideline.

I approximate the simulation-induced *τ*-bias in $\hat {\tau }^{\rm avg}$ as

$${\rm E} ( \hat {\tau }^{\rm avg} ) - {\rm E} ( \hat {\tau }^{\rm mle} ) \approx {\rm E} \left\{ \frac{1}{2} {\rm tr} \left[ H ( \hat {\beta }^{\rm mle} ) \, \hat {V} ( \hat {\beta }^{{\rm mle}} ) \right] \right\} , \quad (2)$$

where the remaining expectation occurs with respect to $\hat {\beta }^{\rm mle}$, $H( \hat {\beta }^{\rm mle} )$ represents the Hessian matrix of second derivatives of *τ*( ⋅ ) at the point $\hat {\beta }^{\rm mle}$, and, conveniently, $\hat {V} ( \hat {\beta }^{{\rm mle}})$ represents the estimated covariance matrix for $\hat {\beta }^{\rm mle}$.

This approximation appears similar to the approximation for the transformation-induced *τ*-bias, which (adjusting notation slightly) Rainey (2017, 405, Equation (1)) computes as

$${\rm E} ( \hat {\tau }^{\rm mle} ) - \tau [ {\rm E} ( \hat {\beta }^{\rm mle} ) ] \approx \frac{1}{2} {\rm tr} \left\{ H [ {\rm E} ( \hat {\beta }^{\rm mle} ) ] \, V ( \hat {\beta }^{{\rm mle}} ) \right\} , \quad (3)$$

where $H[ {\rm E} ( \hat {\beta }^{\rm mle} ) ]$ represents the Hessian matrix of second derivatives of *τ*( ⋅ ) at the point ${\rm E} ( \hat {\beta }^{\rm mle} )$ and $V ( \hat {\beta }^{{\rm mle}} )$ represents the covariance matrix of the sampling distribution of $\hat {\beta }^{\rm mle}$.

Comparing Equations (2) and (3) yet again amounts to comparing the *expectation of a function* to the *function of the expectation*. Therefore, Equations (2) and (3) are not exactly equal. But, as a rough guideline, one should expect them to be similar. And to the extent that the two are similar, the additional simulation-induced *τ*-bias in $\hat {\tau }^{\rm avg}$ is about the same as the transformation-induced *τ*-bias in $\hat {\tau }^{\rm mle}$.

Because of the similarity between Equations (2) and (3), the simulation-induced *τ*-bias becomes large under the conditions identified by Rainey (2017) as leading to large transformation-induced *τ*-bias: when the non-linearity in the transformation *τ*( ⋅ ) is severe and when the standard errors of $\hat {\beta }^{\rm mle}$ are large. While the transformation-induced *τ*-bias vanishes as the number of observations grows large, it can be substantively meaningful for the sample sizes commonly encountered in social science research. In standard modeling situations, Rainey (2017) demonstrates that transformation-induced bias can (1) be larger than the bias in the estimates of the model coefficients and (2) shrink to zero more slowly as the sample size increases. By extension, the same claims hold for simulation-induced bias (which, again, appears *in addition to* transformation-induced bias).
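The guideline can be checked in a simple scalar case. The setup below is hypothetical: a scalar $\hat {\beta } \sim {\rm N}(\beta,\; s^2)$ with the convex transformation *τ*(*b*) = e^{b}, for which the exact transformation-induced bias is available in closed form.

```python
import numpy as np

# Hypothetical scalar setup: beta_hat ~ N(beta, s^2), tau(b) = exp(b).
beta, s = 0.0, 0.2

# Exact transformation-induced bias, E[tau(beta_hat)] - tau(E[beta_hat]),
# using the lognormal mean E[exp(N(m, s^2))] = exp(m + s^2 / 2).
exact = np.exp(beta + s**2 / 2) - np.exp(beta)

# The second-order approximation (1/2) tr(H V): the second derivative
# of tau at beta is exp(beta), and V = s^2.
approx = 0.5 * np.exp(beta) * s**2
```

The two quantities agree closely when *s* is small, and both grow with *s*²: larger standard errors mean larger bias, exactly as the paragraph above describes.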

## 4. The intuition of simulation-induced bias

To develop the intuition for the theoretical results above, I examine a stylized example with simulations, an alternative analytical approach, and an empirical example.

### 4.1. Using a drastic, convex transformation: *τ*(*μ*) = *μ* ^{2}

To develop an intuition for the simulation-induced *τ*-bias in $\hat {\tau }^{\rm avg}$, consider the simple (unrealistic, but heuristically useful) scenario in which *y* _{i} ~ N(0, 1), for *i* ∈ {1, 2, …, *n* = 100}, and the researcher wishes to estimate *μ* ^{2}. Suppose that the researcher knows that the variance equals one but does not know that the mean *μ* equals zero. The researcher uses the unbiased ML estimator $\hat {\mu }^{\rm mle} = {\sum _{i = 1}^n y_i}/{n}$ of *μ*, but ultimately cares about the quantity of interest *τ*(*μ*) = *μ* ^{2}. The researcher can use the plug-in estimator $\hat {\tau }^{\rm mle} = ( \hat {\mu }^{\rm mle} ) ^2$ of *τ*(*μ*). Alternatively, the researcher can use the average-of-simulations estimator, estimating *τ*(*μ*) as $\hat {\tau }^{\rm avg} = {1}/{M} \sum _{i = 1}^M \tau ( \tilde {\mu }^{( i) } )$, where $\tilde {\mu }^{( i) } \sim {\rm N} ( \hat {\mu }^{\rm mle},\; {1}/{\sqrt {n}})$ for *i* ∈ {1, 2, …, *M*}.

The true value of the quantity of interest is *τ*(0) = 0^{2} = 0. However, the ML estimator $\hat {\tau }^{\rm mle} = ( \hat {\mu }^{\rm mle} ) ^2$ equals zero if and only if $\hat {\mu }^{\rm mle} = 0$. Otherwise, $\hat {\tau }^{\rm mle} > 0$. Since $\hat {\mu }^{\rm mle}$ almost surely differs from zero, $\hat {\tau }^{\rm mle}$ is biased upward.

Moreover, even if $\hat {\mu }^{\rm mle} = 0$, $\tilde {\mu }^{( i) }$ almost surely differs from zero. If $\tilde {\mu }^{( i) } \neq 0$, then $( \tilde {\mu }^{( i) } ) ^2 > 0$. Thus, $\hat {\tau }^{\rm avg}$ is almost surely larger than the true value *τ*(*μ*) = 0 even when $\hat {\mu }^{\rm mle} = 0$.

I illustrate this fact clearly by repeatedly simulating *y* and computing $\hat {\tau }^{\rm mle}$ and $\hat {\tau }^{\rm avg}$. Figure 1 shows the first four of 10,000 total simulations. The figure shows how the unbiased estimate $\hat {\mu }^{\rm mle}$ is translated into $\hat {\tau }^{\rm mle}$ and $\hat {\tau }^{\rm avg}$.

First, to find $\hat {\tau }^{\rm avg}$, I complete three steps: (1) simulate $\tilde {\mu }^{( i) } \sim {\rm N} ( \hat {\mu }^{\rm mle},\; {1}/{10})$ for *i* ∈ {1, 2, …, *M* = 1,000}, (2) calculate $\tilde {\tau }^{( i) } = \tau ( \tilde {\mu }^{( i) } )$, and (3) calculate $\hat {\tau }^{\rm avg} = {1}/{M} \sum _{i = 1}^M \tilde {\tau }^{( i) }$. The rug plot along the horizontal axis shows the distribution of $\tilde {\mu }$. The hollow points in Figure 1 show the transformation of each point $\tilde {\mu }^{( i) }$ into $\tilde {\tau }^{( i) }$. The rug plot along the vertical axis shows the distribution of $\tilde {\tau }$. Focus on the top-left panel of Figure 1. Notice that $\hat {\mu }^{\rm mle}$ estimates the true value *μ* = 0 quite well. However, after simulating $\tilde {\mu }$ and transforming $\tilde {\mu }$ into $\tilde {\tau }$, the $\tilde {\tau }$s fall far from the true value *τ*(0) = 0. The dashed orange line shows the average of $\tilde {\tau }$. Notice that although $\hat {\mu }^{\rm mle}$ is unusually close to the truth *μ* = 0 in this sample, $\hat {\tau }^{\rm avg}$ is much too large.

Second, to find $\hat {\tau }^{\rm mle}$, I transform $\hat {\mu }^{\rm mle}$ directly using $\hat {\tau }^{\rm mle} = ( \hat {\mu }^{\rm mle}) ^2$. The solid green lines show this transformation. The convex transformation *τ*( ⋅ ) has the effect of lengthening the right tail of the distribution of $\tilde {\tau }$, pulling the average well above the mode. This provides the basic intuition for Lemma 1.

The remaining panels of Figure 1 repeat this process with three more random samples. Each sample presents a similar story—the convex transformation stretches the distribution of $\tilde {\tau }$ to the right, which pulls $\hat {\tau }^{\rm avg}$ above $\hat {\tau }^{\rm mle}$.

I repeat this process a total of 10,000 times to produce 10,000 estimates of $\hat {\mu }^{\rm mle}$, $\hat {\tau }^{\rm mle}$, and $\hat {\tau }^{\rm avg}$. Figure 2 shows the density plots for the 10,000 estimates (i.e., the sampling distributions of $\hat {\mu }^{\rm mle}$, $\hat {\tau }^{\rm mle}$, and $\hat {\tau }^{\rm avg}$). As I show analytically, $\hat {\mu }^{\rm mle}$ is unbiased with a standard error of ${\sigma }/{\sqrt {n}} = {1}/{\sqrt {100}} = {1}/{10}$. Both $\hat {\tau }^{\rm mle}$ and $\hat {\tau }^{\rm avg}$ are biased upward, but, as Theorem 1 suggests, $\hat {\tau }^{\rm avg}$ has more bias than $\hat {\tau }^{\rm mle}$. And as the approximation suggests, $\hat {\tau }^{\rm avg}$ has about twice the bias of $\hat {\tau }^{\rm mle}$. Indeed, the exact bias of each estimator is easy to compute for this simple example. Appendix B shows that the biases are 1/*n* = 1/100 for $\hat {\tau }^{\rm mle}$ and 2/*n* = 2/100 for $\hat {\tau }^{\rm avg}$ in this example.
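This sampling experiment is straightforward to replicate. The sketch below (using the same *n* = 100 and *M* = 1,000, with 10,000 replications) recovers the analytical biases of 1/*n* and 2/*n*; it is a vectorized re-implementation of the procedure described above, not the paper's original code.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stylized example: y_i ~ N(0, 1), n = 100, quantity of interest
# tau(mu) = mu^2 with true value tau(0) = 0.
n, M, reps = 100, 1_000, 10_000

mu_hat = rng.normal(0, 1, size=(reps, n)).mean(axis=1)  # ML estimates
tau_mle = mu_hat**2                                      # plug-in estimates

# Average-of-simulations estimates: for each sample, average tau over
# M draws of mu_tilde ~ N(mu_hat, sd = 1/sqrt(n)).
mu_tilde = rng.normal(mu_hat[:, None], 1 / np.sqrt(n), size=(reps, M))
tau_avg = (mu_tilde**2).mean(axis=1)

bias_mle = tau_mle.mean()  # approaches 1/n = 0.01
bias_avg = tau_avg.mean()  # approaches 2/n = 0.02
```

The two Monte Carlo averages land near 0.01 and 0.02, matching the exact biases reported in Appendix B.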

### 4.2. Using the law of iterated expectations

One can also develop the argument analytically via the law of iterated expectations. It helps to alter the notation slightly, making two implicit dependencies explicit. I explain each change below and use the alternate, more expansive notation only in this section.

The law of iterated expectations states that ${\rm E} _Y ( {\rm E} _{X \mid Y}( X \mid Y) ) = {\rm E} _X( X)$, where *X* and *Y* represent random variables. The three expectations occur with respect to three different distributions: ${\rm E} _Y$ denotes the expectation with respect to the marginal distribution of *Y*, ${\rm E} _{X \mid Y}$ denotes the expectation with respect to the conditional distribution of *X*| *Y*, and ${\rm E} _X$ denotes the expectation with respect to the marginal distribution of *X*.

Outside of this section, the reader should understand that the distribution of $\tilde {\beta }$ depends on $\hat {\beta }^{\rm mle}$ and could be written as $\tilde {\beta } \mid \hat {\beta }^{\rm mle}$. To remain consistent with previous work, especially Herron (1999) and King *et al.* (2000), I use $\tilde {\beta }$ to represent $\tilde {\beta } \mid \hat {\beta }^{\rm mle}$. The definition of $\tilde {\beta }$ makes this usage clear. In this section only, I use $\tilde {\beta } \mid \hat {\beta }^{\rm mle}$ to represent the conditional distribution of $\tilde {\beta }$ and use $\tilde {\beta }$ to represent the unconditional distribution of $\tilde {\beta }$. Intuitively, one might imagine (1) generating a data set *y*, (2) estimating $\hat {\beta }^{\rm mle}$, and (3) simulating $\tilde {\beta } \mid \hat {\beta }^{\rm mle}$. If I perform steps (1) and (2) just once, but step (3) repeatedly, I generate a sample from the conditional distribution $\tilde {\beta } \mid \hat {\beta }^{\rm mle}$. If I perform steps (1), (2), and (3) repeatedly, then I generate a sample from the unconditional distribution $\tilde {\beta }$. The unconditional distribution helps us understand the nature of the simulation-induced *τ*-bias.

Applying the law of iterated expectations, I obtain ${\rm E} _{\tilde {\beta }} ( \tilde {\beta } ) = {\rm E} _{\hat {\beta }^{\rm mle}}( {\rm E} _{\tilde {\beta } \mid \hat {\beta }^{\rm mle}} ( \tilde {\beta } \mid \hat {\beta }^{\rm mle}) )$. The three identities below connect the three key quantities from Theorem 1 to three versions of ${\rm E} _{\hat {\beta }^{\rm mle}}( {\rm E} _{\tilde {\beta } \mid \hat {\beta }^{\rm mle}} ( \tilde {\beta } \mid \hat {\beta }^{\rm mle}) )$, with the transformation *τ*( ⋅ ) applied at different points.

$$\tau [ {\rm E} _{\hat {\beta }^{\rm mle}} ( {\rm E} _{\tilde {\beta } \mid \hat {\beta }^{\rm mle}} ( \tilde {\beta } \mid \hat {\beta }^{\rm mle} ) ) ] = \tau [ {\rm E} ( \hat {\beta }^{\rm mle} ) ] \quad (4)$$

$${\rm E} _{\hat {\beta }^{\rm mle}} [ \tau ( {\rm E} _{\tilde {\beta } \mid \hat {\beta }^{\rm mle}} ( \tilde {\beta } \mid \hat {\beta }^{\rm mle} ) ) ] = {\rm E} [ \tau ( \hat {\beta }^{\rm mle} ) ] = {\rm E} ( \hat {\tau }^{\rm mle} ) \quad (5)$$

$${\rm E} _{\hat {\beta }^{\rm mle}} [ {\rm E} _{\tilde {\beta } \mid \hat {\beta }^{\rm mle}} ( \tau ( \tilde {\beta } ) \mid \hat {\beta }^{\rm mle} ) ] = {\rm E} ( \hat {\tau }^{\rm avg} ) \quad (6)$$

If I subtract Equation (4) from Equation (5), I obtain the transformation-induced *τ*-bias in $\hat {\tau }^{\rm mle}$ (see Equation (1) for the definition of transformation-induced *τ*-bias). To move from Equation (4) to (5), I must swap *τ*( ⋅ ) with an expectation once. This implies that, if *τ*( ⋅ ) is convex, Equation (5) must be greater than Equation (4). This, in turn, implies that the bias is positive.

To obtain the *τ*-bias in $\hat {\tau }^{\rm avg}$, I must subtract Equation (4) from Equation (6). But to move from Equation (4) to (6), I must swap *τ*( ⋅ ) with an expectation *twice*. Again, if *τ*( ⋅ ) is convex, then Equation (6) must be greater than Equation (4). However, because one should expect $\hat {\beta }^{\rm mle}$ and $\tilde {\beta } \mid \hat {\beta }^{\rm mle}$ to have similar distributions, one should expect the additional swap to roughly double the bias in $\hat {\tau }^{\rm avg}$ compared to $\hat {\tau }^{\rm mle}$. This additional swap creates the additional, simulation-induced *τ*-bias.
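These identities can be illustrated numerically with the *μ*² example from the previous subsection: repeating steps (1)–(3) many times samples the unconditional distribution of $\tilde {\mu }$, and the three placements of *τ*( ⋅ ) = ( ⋅ )² give three different values. The code is a sketch I constructed for illustration, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

# mu-squared example with mu = 0 and n = 100, so mu_hat ~ N(0, sd=1/sqrt(n))
# and, conditionally, mu_tilde ~ N(mu_hat, sd=1/sqrt(n)).
n, reps = 100, 1_000_000
mu_hat = rng.normal(0.0, 1 / np.sqrt(n), size=reps)  # step (2), repeated
mu_tilde = rng.normal(mu_hat, 1 / np.sqrt(n))        # step (3), repeated

tau = np.square
outer = tau(mu_tilde.mean())      # tau outside both expectations: ~ tau(0) = 0
one_swap = tau(mu_hat).mean()     # tau swapped inside once:  ~ 1/n = 0.01
two_swaps = tau(mu_tilde).mean()  # tau swapped inside twice: ~ 2/n = 0.02
```

Each swap of *τ*( ⋅ ) with an expectation adds roughly 1/*n*, which is why the average-of-simulations estimator carries about twice the bias of the plug-in estimator here.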

### 4.3. Using a re-analysis of Holland (2015)

Holland (2015) presents a nuanced theory that describes the conditions under which politicians choose to enforce laws and supports the theoretical argument with a rich variety of evidence. In particular, she elaborates on the *electoral* incentives of politicians to enforce laws. I borrow three Poisson regressions and hypotheses about a single explanatory variable to illustrate how the plug-in estimates can differ from the average-of-simulations estimate.

Holland writes:

> My first hypothesis is that enforcement operations drop off with the fraction of poor residents in an electoral district. So district poverty should be a negative and significant predictor of enforcement, but only in politically decentralized cities [Lima and Santiago]. Poverty should have no relationship with enforcement in politically centralized cities [Bogota] once one controls for the number of vendors.

I use Holland's hypothesis and data to illustrate the behavior of the average-of-simulations and plug-in estimators. I refit Model 1 from Table 2 in Holland (2015) for each city. I then use each model to compute the percent increase in the enforcement operations for each district in the city if the percent of the district in the lower class dropped by half. For example, in the Villa Maria El Triunfo district in Lima, 84 percent of the district is in the lower class. If this dropped to 42 percent, then the average-of-simulations estimate suggests that the number of enforcement operations would increase by about 284 percent (from about 5 to 20). The plug-in estimate, on the other hand, suggests an increase of 264 percent (from about 5 to 17). The plug-in estimate, then, is about 7 percent smaller than the average-of-simulations estimate—a small, but noticeable shrinkage.
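Under a Poisson model with a log link, this quantity of interest reduces to a simple calculation. The sketch below uses a purely hypothetical coefficient and district value, not Holland's estimates or data, to show the shape of the plug-in computation.

```python
import numpy as np

# Hypothetical Poisson coefficient on percent lower class and a
# hypothetical district's observed value (NOT Holland's estimates).
beta_poverty = -0.03
poverty = 84.0

# With a log link, expected counts are exp(X @ beta), so halving the
# poverty variable multiplies the expected count by
# exp(beta_poverty * (poverty / 2 - poverty)); all other covariates
# cancel out of the ratio.
ratio = np.exp(beta_poverty * (poverty / 2 - poverty))
pct_increase = 100 * (ratio - 1)  # plug-in percent change in operations
```

The average-of-simulations version would instead draw many coefficient vectors, compute this percent change for each, and average; because exp is convex, that average sits above the plug-in value, which is the shrinkage documented below.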

Figure 3 shows how the estimates change (usually shrink) for all districts when I switch from the average-of-simulations estimates to plug-in estimates. Table 1 presents the details for the labeled cases in Figure 3. In Bogota, the estimate shrinks by 11 percent in Santa Fe and 16 percent in Usme. In Lima, the estimate shrinks by 5 percent in Chacalacayo and 7 percent in Villa Maria El Triunfo. The shrinkage is much larger in Santiago, where the standard errors for the coefficient estimates are much larger. The estimate shrinks by about 47 percent in San Ramon and 53 percent in La Pintana. The median shrinkage is 6 percent in Bogota, 2 percent in Lima, and 37 percent in Santiago. For many districts in Santiago, the average-of-simulations estimate is about *twice* the plug-in estimate. These estimates clearly show that the average of simulations and the ML estimate can differ meaningfully in actual analyses.

^{a} Quantity of interest: percent change in enforcement operations when the percent in the lower class drops by half.

^{b} Enforcement operations when the percent in the lower class equals its observed value.

^{c} Enforcement operations when the percent in the lower class equals half its observed value.

^{d} Shrinkage in the quantity of interest due to switching from the average of simulations to the ML estimator.

## 5. Conclusion

Many social scientists turn to King *et al.* (2000) for advice on interpreting, summarizing, and presenting empirical results. The authors improved empirical research by highlighting the importance of substantively meaningful quantities of interest. I agree with King and Zeng's (2006) summary of the literature: “[w]hether such effects are calculated via analytical derivation or what is now the more common approach of statistical simulation, political scientists have made much progress in learning how to make sophisticated methods speak directly to their substantive research questions” (132).

Researchers estimate quantities of interest either by averaging simulated quantities of interest (e.g., CLARIFY in Stata, Zelig in R, or clarify in R) or using the invariance property of ML estimators (e.g., margins in Stata and R). In practice, researchers’ choice between these two estimators seems idiosyncratic rather than principled, depending on their preferred software package rather than any statistical criteria. The methodological literature recommends both, but has not distinguished or compared the two approaches to estimating quantities of interest.

When researchers use the average of simulations (King *et al.*, 2000) to estimate quantities of interest, they replicate the logic of transformation-induced bias (Rainey, 2017) and add simulation-induced bias to the estimates. This additional bias is roughly the same magnitude and direction as transformation-induced bias and occurs *in addition to* transformation-induced bias. While the additional bias is usually small relative to the standard error, methodologists should not recommend methods that add unnecessary bias to point estimates. Instead, we should recommend methods that better adhere to the usual evaluative standards. Even if we recommend using simulation to estimate standard errors, we should not recommend averaging simulations to obtain the point estimate. Instead, researchers should directly transform ML estimates of coefficients to obtain ML estimates of the quantities of interest. The resulting point estimates inherit the desirable properties of ML estimators and avoid unnecessary simulation-induced bias.

## Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/psrm.2023.8. All data and computer code necessary to reproduce the results are available on Dataverse at https://doi.org/10.7910/DVN/5YKD4S.

## Acknowledgments

I thank Bill Berry, Christopher Gandrud, Michael Hanmer, John Holbein, Gary King, Justin Kirkland, Thomas Leeper, Matt Pietryka, Arthur Spirling, Michael Tomz, Jason Wittenberg, and Chris Wlezien for helpful comments. Holger Kern initiated the conversation that led to the paper and provided helpful feedback throughout its development. I thank an anonymous reviewer for pointing out the analytical results in Appendix B. I also thank audiences at Florida State University and the 2018 Texas Methods Meeting for productive discussions. All remaining errors are my own.