
Practical and Effective Approaches to Dealing With Clustered Data*

Justin Esarey and Andrew Menger
Abstract

Cluster-robust standard errors (as implemented by the eponymous cluster option in Stata) can produce misleading inferences when the number of clusters G is small, even if the model is consistent and there are many observations in each cluster. Nevertheless, political scientists commonly employ this method in data sets with few clusters. The contributions of this paper are: (a) developing new and easy-to-use Stata and R packages that implement alternative uncertainty measures robust to small G, and (b) explaining and providing evidence for the advantages of these alternatives, especially cluster-adjusted t-statistics based on Ibragimov and Müller (2010). To illustrate these advantages, we reanalyze recent work where results are based on cluster-robust standard errors.
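The abstract contrasts two ways of testing a regression slope when G is small. As a rough illustration only, the pure-Python sketch below works with a bivariate model y = a + b·x. It computes (1) a cluster-robust standard error scaled by the G/(G - 1)·(N - 1)/(N - K) finite-sample factor that Stata's cluster option applies after regress, and (2) an Ibragimov and Müller style statistic that estimates the slope separately within each cluster and runs a one-sample t-test on the G estimates. The function names (`ols_fit`, `cluster_robust_slope`, `im_cluster_t`) and the simulated data are hypothetical; this is not the interface of the authors' Stata or R packages.

```python
import math
import random
import statistics


def ols_fit(x, y):
    """Intercept a and slope b of y = a + b*x by ordinary least squares."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def cluster_robust_slope(x, y, g):
    """Slope and its cluster-robust SE with a Stata-style (CR1) correction.

    For y = a + b*x, the sandwich variance of b reduces to
    sum_g (sum_{i in g} (x_i - xbar) * u_i)^2 / Sxx^2, then is scaled by
    G/(G-1) * (N-1)/(N-K) with K = 2 coefficients.
    """
    a, b = ols_fit(x, y)
    xbar = statistics.fmean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    scores = {}  # per-cluster sum of centered x times OLS residual
    for xi, yi, gi in zip(x, y, g):
        resid = yi - a - b * xi
        scores[gi] = scores.get(gi, 0.0) + (xi - xbar) * resid
    n, k, n_clusters = len(x), 2, len(scores)
    meat = sum(s ** 2 for s in scores.values())
    correction = n_clusters / (n_clusters - 1) * (n - 1) / (n - k)
    return b, math.sqrt(correction * meat) / sxx


def im_cluster_t(x, y, g):
    """Ibragimov-Mueller style test of H0: b = 0.

    Fits the slope separately in each cluster (each cluster needs at least
    two distinct x values) and runs a one-sample t-test on the G slopes.
    Returns the t-statistic and its degrees of freedom, G - 1.
    """
    groups = {}
    for xi, yi, gi in zip(x, y, g):
        xs, ys = groups.setdefault(gi, ([], []))
        xs.append(xi)
        ys.append(yi)
    slopes = [ols_fit(xs, ys)[1] for xs, ys in groups.values()]
    n_clusters = len(slopes)
    se = statistics.stdev(slopes) / math.sqrt(n_clusters)
    return statistics.fmean(slopes) / se, n_clusters - 1


if __name__ == "__main__":
    # Hypothetical simulated data: G = 6 clusters of 50 observations,
    # true slope 0, with cluster-level shocks in both x and y.
    rng = random.Random(42)
    x, y, g = [], [], []
    for cluster in range(6):
        x_shock, y_shock = rng.gauss(0, 1), rng.gauss(0, 1)
        for _ in range(50):
            x.append(x_shock + rng.gauss(0, 1))
            y.append(y_shock + rng.gauss(0, 1))
            g.append(cluster)
    b, se = cluster_robust_slope(x, y, g)
    t_im, df = im_cluster_t(x, y, g)
    print(f"CR1: b = {b:.3f}, se = {se:.3f}, t = {b / se:.3f}")
    print(f"IM:  t = {t_im:.3f} on {df} df")
```

Because the Ibragimov and Müller statistic uses only the G cluster-level estimates, it is compared with a Student t distribution on G - 1 degrees of freedom (for example, via a t quantile function from a statistics library). The trade-off in this sketch is that x must vary within every cluster so that each cluster-specific slope can be estimated.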

Footnotes
* Justin Esarey is an Assistant Professor of Political Science, Rice University, 6100 Main St, MS-24, Houston, TX 77005 (justin@justinesarey.com). Andrew Menger is a Ph.D. Candidate in the Department of Political Science, Rice University, 6100 Main St, MS-24, Houston, TX 77005 (Andrew.M.Menger@rice.edu). The authors thank Ulrich Müller, Carlisle Rainey, Jonathan Kropko, Matthew Webb, Neal Beck, Jens Hainmueller, Shuai Jin, Jens Grosser, Ernesto Reuben, our anonymous reviewers, and participants at the 2015 Annual Meeting of the Midwest Political Science Association, the 2015 Annual Meeting of the Society for Political Methodology, and the 2016 Annual Meeting of the Southern Political Science Association for helpful comments and suggestions on earlier drafts of this paper. To view supplementary material for this article, please visit https://doi.org/10.1017/psrm.2017.42

References
Anderson, Theodore W. 2003. An Introduction to Multivariate Statistical Analysis 3rd ed. New York, NY: Wiley.
Angrist, Joshua D., and Pischke, Jorn-Steffen. 2009. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton, NJ: Princeton University Press.
Arellano, Manuel. 1987. ‘Computing Robust Standard Errors for Within-Groups Estimators’. Oxford Bulletin of Economics and Statistics 49(4):431–434.
Bafumi, Joseph, and Gelman, Andrew. 2006. ‘Fitting Multilevel Models When Predictors and Group Effects Correlate’. Available at http://goo.gl/usvQsn, accessed 21 December 2017.
Bakirov, Nail K., and Szekely, Gabor J. 2006. ‘Student’s t-Test for Gaussian Scale Mixtures’. Journal of Mathematical Sciences 139(3):6497–6505.
Bates, Douglas, Maechler, Martin, Bolker, Ben, and Walker, Steven. 2014. ‘lme4: Linear Mixed Effects Models Using Eigen and S4’. R package version 1.1-7. Available at http://CRAN.R-project.org/package=lme4, accessed 21 December 2017.
Beck, Nathaniel, and Katz, Jonathan N. 1995. ‘What To Do (And Not To Do) With Time-Series Cross-Section Data’. American Political Science Review 89(3):634–647.
Beck, Nathaniel L., Katz, Jonathan N., and Mignozzetti, Umberto G. 2014. ‘Of Nickell Bias and its Cures: Comment on Gaibulloev, Sandler, and Sul’. Political Analysis 22(2):274–278.
Bertrand, Marianne, Duflo, Esther, and Mullainathan, Sendhil. 2004. ‘How Much Should We Trust Differences-In-Differences Estimates?’. The Quarterly Journal of Economics 119(1):249–275.
Brambor, Thomas, Clark, William Roberts, and Golder, Matthew. 2006. ‘Understanding Interaction Models: Improving Empirical Analyses’. Political Analysis 14(1):63–82.
Cameron, A. Colin, and Miller, Douglas L. 2015. ‘A Practitioner’s Guide to Cluster-Robust Inference’. Journal of Human Resources 50(2):317–372.
Cameron, A. Colin, Gelbach, Jonah B., and Miller, Douglas L. 2008. ‘Bootstrap-Based Improvements for Inference With Clustered Errors’. Review of Economics and Statistics 90(3):414–427.
Cameron, A. Colin, and Trivedi, Pravin K. 2005. Microeconometrics: Methods and Applications. Cambridge, UK: Cambridge University Press.
Canay, Ivan A., Romano, Joseph P., and Shaikh, Azeem M. 2014. ‘Randomization Tests Under an Approximate Symmetry Assumption’. Working Paper (version: December 19, 2014). Available at https://goo.gl/TUEQee, accessed 29 January 2017.
Clark, Tom S., and Linzer, Drew A. 2015. ‘Should I Use Fixed or Random Effects?’. Political Science Research and Methods 3(2):399–408.
Croissant, Yves. 2015. ‘Package “mlogit”.’ CRAN. Available at http://cran.r-project.org/web/packages/mlogit/mlogit.pdf, accessed 21 December 2017.
Croissant, Yves, and Millo, Giovanni. 2008. ‘Panel Data Econometrics in R: The plm Package’. Journal of Statistical Software 27(2):1–43.
Donald, Stephen G., and Lang, Kevin. 2007. ‘Inference With Difference-in-Differences and Other Panel Data’. The Review of Economics and Statistics 89(2):221–233.
Donner, Allan. 1998. ‘Some Aspects of the Design and Analysis of Cluster Randomization Trials’. Journal of the Royal Statistical Society: Series C (Applied Statistics) 47(1):95–113.
Efron, Bradley. 1979. ‘Bootstrap Methods: Another Look at the Jackknife’. Annals of Statistics 7(1):1–26.
Field, Chris A., and Welsh, Alan H. 2007. ‘Bootstrapping Clustered Data’. Journal of the Royal Statistical Society: Series B 69(3):369–390.
Gaibulloev, Khusrav, Sandler, Todd, and Sul, Donggyu. 2014. ‘Dynamic Panel Analysis Under Cross-Sectional Dependence’. Political Analysis 22:258–273.
Green, Donald P., and Vavreck, Lynn. 2008. ‘Analysis of Cluster-Randomized Experiments: A Comparison of Alternative Estimation Approaches’. Political Analysis 16(2):138–152.
Grosser, Jens, Reuben, Ernesto, and Tymula, Agnieszka. 2013. ‘Political Quid Pro Quo Agreements: An Experimental Study’. American Journal of Political Science 57:582–597.
Hainmueller, Jens, Hiscox, Michael, and Sequeira, Sandra. 2015. ‘Consumer Demand for the Fair Trade Label: Evidence from a Multistore Field Experiment’. Review of Economics and Statistics 97(2):242–256.
Hansen, Christian B. 2007. ‘Asymptotic Properties of a Robust Variance Matrix Estimator for Panel Data When T is Large’. Journal of Econometrics 141(2):597–620.
Harden, Jeffrey J. 2011. ‘A Bootstrap Method for Conducting Statistical Inference With Clustered Data’. State Politics & Policy Quarterly 11(2):223–246.
Hardin, James W., and Hilbe, Joseph M. 2003. Generalized Estimating Equations. Boca Raton, FL: Chapman & Hall/CRC.
Horowitz, Joel L. 1997. ‘Bootstrap Methods in Econometrics: Theory and Numerical Performance’. In David M. Kreps and Kenneth F. Wallis (eds), Advances in Economics and Econometrics: Theory and Applications: Seventh World Congress, 189–222. Cambridge, UK: Cambridge University Press.
Hu, Feifang, and Kalbfleisch, John D. 2000. ‘The Estimating Function Bootstrap’. Canadian Journal of Statistics 28(3):449–481.
Ibragimov, Rustam, and Müller, Ulrich K. 2010. ‘t-Statistic Based Correlation and Heterogeneity Robust Inference’. Journal of Business & Economic Statistics 28(4):453–468.
Imbens, Guido W., and Kolesar, Michal. 2012. ‘Robust Standard Errors in Small Samples: Some Practical Advice’. 98(4):701–712.
Judge, George G., Hill, R. Carter, Griffiths, William E., Lutkepohl, Helmut, and Lee, Tsoung-Chao. 1988. Introduction to the Theory and Practice of Econometrics. New York, NY: Wiley.
Kezdi, Gabor. 2004. ‘Robust Standard Error Estimation in Fixed-Effects Panel Models’. Hungarian Statistical Review 9:95–116.
King, Gary, and Roberts, Margaret E. 2014. ‘How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About it’. Political Analysis 23:159–179.
Klar, Neil, and Donner, Allan. 2001. ‘Current and Future Challenges in the Design and Analysis of Cluster Randomization Trials’. Statistics in Medicine 20(24):3729–3740.
Lacina, Bethany. 2014. ‘How Governments Shape the Risk of Civil Violence: India’s Federal Reorganization, 1950–56’. American Journal of Political Science 58(3):720–738.
Liang, Kung-Yee, and Zeger, Scott L. 1986. ‘Longitudinal Data Analysis Using Generalized Linear Models’. Biometrika 73(1):13–22.
Liang, Kung-Yee, and Zeger, Scott L. 1993. ‘Regression Analysis for Correlated Data’. Annual Review of Public Health 14(1):43–68.
Liu, Regina Y. 1988. ‘Bootstrap Procedures Under Some Non-I.I.D. Models’. The Annals of Statistics 16(4):1696–1708.
Liu, Regina Y., and Singh, Kesar. 1987. ‘On a Partial Correction by the Bootstrap’. The Annals of Statistics 15(4):1713–1718.
MacKinnon, James G. 2015. ‘Wild Cluster Bootstrap Confidence Intervals’. L’Actualité économique 91(1–2):11–33.
MacKinnon, James G., and Webb, Matthew D. 2017. ‘Wild Bootstrap Inference for Wildly Different Cluster Sizes’. Journal of Applied Econometrics 32(2):233–254.
Mancl, Lloyd A., and DeRouen, Timothy A. 2001. ‘A Covariance Estimator for GEE With Improved Small-Sample Properties’. Biometrics 57(1):126–134.
Moulton, Brent R. 1986. ‘Random Group Effects and the Precision of Regression Estimates’. Journal of Econometrics 32(3):385–397.
Moulton, Brent R. 1990. ‘An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units’. The Review of Economics and Statistics 72(2):334–338.
Nickell, Stephen. 1981. ‘Biases in Dynamic Models With Fixed Effects’. Econometrica 49:1417–1426.
Rogers, William. 1993. ‘Regression Standard Errors in Clustered Samples’. Stata Technical Bulletin 13:19–23.
van der Vaart, Aad W. 1998. Asymptotic Statistics. Cambridge, UK: Cambridge University Press.
White, Halbert. 1980. ‘A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity’. Econometrica 48(4):817–838.
Williams, Rick L. 2000. ‘A Note on Robust Variance Estimation for Cluster-Correlated Data’. Biometrics 56(2):645–646.
Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.
Wu, C. F. Jeff. 1986. ‘Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis’. The Annals of Statistics 14(4):1261–1295.
Political Science Research and Methods (ISSN: 2049-8470; EISSN: 2049-8489)
Supplementary materials
Esarey and Menger Dataset (dataset)
Esarey and Menger supplementary material 1 (PDF, 595 KB)