Causal Inference without Balance Checking: Coarsened Exact Matching

Stefano M. Iacus, Gary King, and Giuseppe Porro

Abstract

We discuss a method for improving causal inferences called “Coarsened Exact Matching” (CEM), and the new “Monotonic Imbalance Bounding” (MIB) class of matching methods from which CEM is derived. We summarize what is known about CEM and MIB, derive and illustrate several new desirable statistical properties of CEM, and then propose a variety of useful extensions. We show that CEM possesses a wide range of statistical properties not available in most other matching methods but is at the same time exceptionally easy to comprehend and use. We focus on the connection between theoretical properties and practical applications. We also make available easy-to-use open source software for R, Stata, and SPSS that implements all our suggestions.

Corresponding author

e-mail: king@harvard.edu (corresponding author)

Footnotes

Edited by Jonathan N. Katz

Authors' note: Open source R, Stata, and SPSS software to implement the methods described herein (called CEM) is available at http://gking.harvard.edu/cem; the CEM algorithm is also available via a standard interface offered in the R package MatchIt. Thanks to Erich Battistin, Nathaniel Beck, Matt Blackwell, Andy Eggers, Adam Glynn, Justin Grimmer, Jens Hainmueller, Ben Hansen, Kosuke Imai, Guido Imbens, Fabrizia Mealli, Walter Mebane, Clayton Nall, Enrico Rettore, Jamie Robins, Don Rubin, Jas Sekhon, Jeff Smith, Kevin Quinn, and Chris Winship for helpful comments. All information necessary to replicate the results in this paper appears in Iacus, King, and Porro (2011b).

References

Abadie, Alberto, and Gardeazabal, Javier. 2003. The economic costs of conflict: A case study of the Basque Country. American Economic Review 93: 113–32.
Abadie, Alberto, and Imbens, Guido W. 2007. Bias-corrected matching estimators for average treatment effects. Unpublished manuscript. http://ksghome.harvard.edu/aabadie/research.html.
Austin, Peter C., and Mamdani, Muhammad M. 2006. A comparison of propensity score methods: A case-study estimating the effectiveness of post-AMI statin use. Statistics in Medicine 25: 2084–106.
Battistin, Erich, and Chesher, Andrew. 2004. The impact of measurement error on evaluation methods based on strong ignorability. Working paper, Institute for Fiscal Studies, London.
Carpenter, Daniel Paul. 2002. Groups, the media, agency waiting costs, and FDA drug approval. American Journal of Political Science 46: 490–505.
Cochran, William G., and Rubin, Donald B. 1973. Controlling bias in observational studies: A review. Sankhya: The Indian Journal of Statistics, Series A 35, Part 4:417–66.
Crump, Richard K., Hotz, V. Joseph, Imbens, Guido W., and Mitnik, Oscar. 2009. Dealing with limited overlap in estimation of average treatment effects. Biometrika 96: 187.
Dehejia, Rajeev H., and Wahba, Sadek. 1999. Causal effects in nonexperimental studies: Re-evaluating the evaluation of training programs. Journal of the American Statistical Association 94: 1053–62.
Dehejia, Rajeev H., and Wahba, Sadek. 2002. Propensity score matching methods for non-experimental causal studies. Review of Economics and Statistics 84: 151–61.
Diamond, Alexis, and Sekhon, Jasjeet. 2005. Genetic matching for estimating causal effects: A new method of achieving balance in observational studies. Working paper, http://jsekhon.fas.harvard.edu/ (accessed 2005).
Freedman, David, and Diaconis, Persi. 1981. On the histogram as a density estimator: L2 theory. Probability Theory and Related Fields 57: 453–76.
Galdo, Jose, Smith, Jeffrey, and Black, Dan. 2008. Bandwidth selection and the estimation of treatment effects with unbalanced data. Working paper, University of Michigan.
Girosi, Federico, and King, Gary. 2008. Demographic forecasting. Princeton, NJ: Princeton University Press. Unpublished manuscript. http://gking.harvard.edu/files/smooth/ (accessed 2008).
Hansen, Ben. 2008. The prognostic analogy of the propensity score. Biometrika 95: 481–88.
Heckman, James, Ichimura, H., and Todd, P. 1997. Matching as an econometric evaluation estimator: Evidence from evaluating a job training program. Review of Economic Studies 64: 605–54.
Ho, Daniel, Imai, Kosuke, King, Gary, and Stuart, Elizabeth. 2007. Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis 15: 199–236. http://gking.harvard.edu/files/abs/matchp-abs.shtml (accessed 2007).
Iacus, Stefano M., King, Gary, and Porro, Giuseppe. 2009. CEM: Coarsened Exact Matching Software. Journal of Statistical Software 30(9), http://gking.harvard.edu/cem.
Iacus, Stefano M., King, Gary, and Porro, Giuseppe. 2011. Multivariate matching methods that are Monotonic Imbalance Bounding. Journal of the American Statistical Association. http://gking.harvard.edu/files/abs/cem-math-abs.shtml.
Iacus, Stefano M., King, Gary, and Porro, Giuseppe. 2011b. Replication data for: Causal inference without balance checking: Coarsened Exact Matching. Murray Research Archive [distributor] V1 [version]. http://hdl.handle.net/1902.1/15601.
Iacus, Stefano M., and Porro, Giuseppe. 2007. Missing data imputation, matching and other applications of random recursive partitioning. Computational Statistics and Data Analysis 52: 773–89.
Iacus, Stefano M., and Porro, Giuseppe. 2008. Invariant and metric free proximities for data matching: An R package. Journal of Statistical Software 25(11): 1–22.
Iacus, Stefano M., and Porro, Giuseppe. 2009. Random recursive partitioning: A matching method for the estimation of the average treatment effect. Journal of Applied Econometrics 24: 163–85.
Imai, Kosuke, King, Gary, and Nall, Clayton. 2009. The essential role of pair matching in cluster-randomized experiments, with application to the Mexican universal health insurance evaluation. Statistical Science 24(1): 29–53. http://gking.harvard.edu/files/abs/cluster-abs.shtml.
Imai, Kosuke, King, Gary, and Stuart, Elizabeth. 2008. Misunderstandings among experimentalists and observationalists about causal inference. Journal of the Royal Statistical Society, Series A 171, 2: 481–502. http://gking.harvard.edu/files/abs/matchse-abs.shtml (accessed 2008).
Imai, Kosuke, and van Dyk, D. A. 2004. Causal inference with general treatment regimes: Generalizing the propensity score. Journal of the American Statistical Association 99: 854–66.
Imbens, Guido W. 2000. The role of the propensity score in estimating dose-response functions. Biometrika 87: 706–10.
Imbens, Guido W. 2003. Sensitivity to exogeneity assumptions in program evaluation. American Economic Review 96: 126–32.
Imbens, Guido W. 2004. Nonparametric estimation of average treatment effects under exogeneity: A review. Review of Economics and Statistics 86: 4–29.
Imbens, Guido W., and Angrist, Joshua D. 1994. Identification and estimation of local average treatment effects. Econometrica 62: 467–75.
King, Gary, Honaker, James, Joseph, Anne, and Scheve, Kenneth. 2001. Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Review 95: 49–69. http://gking.harvard.edu/files/abs/evil-abs.shtml (accessed 2001).
King, Gary, Nielsen, Richard, Coberley, Carter, Pope, James, and Wells, Aaron. 2011. Comparative effectiveness of matching methods for causal inference.
King, Gary, and Zeng, Langche. 2006. The dangers of extreme counterfactuals. Political Analysis 14: 131–59. http://gking.harvard.edu/files/abs/counterft-abs.shtml.
King, Gary, and Zeng, Langche. 2007. When can history be our guide? The pitfalls of counterfactual inference. International Studies Quarterly 51: 183–210. http://gking.harvard.edu/files/abs/counterf-abs.shtml.
Lalonde, Robert. 1986. Evaluating the econometric evaluations of training programs. American Economic Review 76: 604–20.
Lu, Bo, Zanuto, Elaine, Hornik, Robert, and Rosenbaum, Paul R. 2001. Matching with doses in an observational study of a media campaign against drug abuse. Journal of the American Statistical Association 96: 1245–53.
Manski, Charles F. 1995. Identification problems in the social sciences. Cambridge, MA: Harvard University Press.
Mielke, Paul W., and Berry, Kenneth J. 2007. Permutation methods: A distance function approach. New York: Springer.
Morgan, Stephen L., and Winship, Christopher. 2007. Counterfactuals and causal inference: Methods and principles for social research. Cambridge: Cambridge University Press.
Rosenbaum, Paul R., Ross, Richard N., and Silber, Jeffrey H. 2007. Minimum distance matched sampling with fine balance in an observational study of treatment for ovarian cancer. Journal of the American Statistical Association 102: 75–83.
Rubin, Donald B. 1976. Inference and missing data. Biometrika 63: 581–92.
Rubin, Donald B. 1987. Multiple imputation for nonresponse in surveys. New York: John Wiley.
Rubin, Donald B. 2001. Using propensity scores to help design observational studies: Application to the tobacco litigation. Health Services & Outcomes Research Methodology 2: 169–88.
Rubin, Donald B. 2006. Matched sampling for causal effects. Cambridge, UK: Cambridge University Press.
Scott, David W. 1992. Multivariate density estimation. Theory, practice and visualization. New York: John Wiley & Sons, Inc.
Shimazaki, Hideaki, and Shinomoto, Shigeru. 2007. A method for selecting the bin size of a time histogram. Neural Computation 19: 1503–27.
Smith, Jeffrey A., and Todd, Petra E. 2005. Does matching overcome LaLonde's critique of nonexperimental estimators? Journal of Econometrics 125: 305–53.
Washington, Ebonya L. 2008. Female socialization: How daughters affect their legislator fathers' voting on women's issues. American Economic Review 98: 311–32.