Practical and Effective Approaches to Dealing With Clustered Data*

  • Justin Esarey and Andrew Menger

Cluster-robust standard errors (as implemented by the eponymous cluster option in Stata) can produce misleading inferences when the number of clusters G is small, even if the model is consistent and there are many observations in each cluster. Nevertheless, political scientists commonly employ this method in data sets with few clusters. The contributions of this paper are: (a) developing new and easy-to-use Stata and R packages that implement alternative uncertainty measures robust to small G, and (b) explaining and providing evidence for the advantages of these alternatives, especially cluster-adjusted t-statistics based on Ibragimov and Müller. To illustrate these advantages, we reanalyze recent work where results are based on cluster-robust standard errors.
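
As a rough illustration of the idea behind cluster-adjusted t-statistics in the spirit of Ibragimov and Müller, the sketch below estimates the same bivariate regression separately within each cluster and then applies a one-sample t-test to the G cluster-level slope estimates, with G - 1 degrees of freedom. This is not the authors' Stata or R package: the function names `ols_slope` and `cluster_adjusted_t` and the simulated data are hypothetical, and the sketch assumes a single regressor that varies within every cluster.

```python
import math
import random
import statistics


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)  # must be > 0: x varies in-cluster
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def cluster_adjusted_t(clusters, null_value=0.0):
    """clusters maps a cluster id to a pair (x values, y values).

    Returns (mean of cluster slopes, t-statistic, degrees of freedom).
    The t-statistic treats the G cluster-level slopes as the sample.
    """
    slopes = [ols_slope(x, y) for x, y in clusters.values()]
    g = len(slopes)  # needs at least 2 clusters
    mean_slope = statistics.fmean(slopes)
    se = statistics.stdev(slopes) / math.sqrt(g)
    return mean_slope, (mean_slope - null_value) / se, g - 1


# Hypothetical simulated data: 6 clusters, 50 observations each,
# true slope 0.5, plus a cluster-level shock shared within each cluster.
rng = random.Random(0)
data = {}
for c in range(6):
    shock = rng.gauss(0, 1)
    xs = [rng.gauss(0, 1) for _ in range(50)]
    ys = [0.5 * xi + shock + rng.gauss(0, 1) for xi in xs]
    data[c] = (xs, ys)

mean_slope, t_stat, df = cluster_adjusted_t(data)
print(f"mean slope={mean_slope:.3f}, t={t_stat:.3f}, df={df}")
```

The resulting statistic would be compared against a Student's t distribution with the reported degrees of freedom. Because the reference distribution depends on the number of clusters rather than the number of observations, the test stays conservative when G is small.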

Justin Esarey is an Assistant Professor of Political Science, Rice University, 6100 Main St, MS-24, Houston, TX 77005. Andrew Menger is a Ph.D. Candidate, Department of Political Science, Rice University, 6100 Main St, MS-24, Houston, TX 77005. The authors thank Ulrich Müller, Carlisle Rainey, Jonathan Kropko, Matthew Webb, Neal Beck, Jens Hainmueller, Shuai Jin, Jens Grosser, Ernesto Reuben, our anonymous reviewers, and participants at the 2015 Annual Meeting of the Midwest Political Science Association, the 2015 Annual Meeting of the Society for Political Methodology, and the 2016 Annual Meeting of the Southern Political Science Association for helpful comments and suggestions on earlier drafts of this paper. To view supplementary material for this article, please visit

Anderson Theodore W. 2003. An Introduction to Multivariate Statistical Analysis 3rd ed. New York, NY: Wiley.
Angrist Joshua D., and Pischke Jorn-Steffen. 2009. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton, NJ: Princeton University Press.
Arellano Manuel. 1987. ‘Computing Robust Standard Errors for Within-Groups Estimators’. Oxford Bulletin of Economics and Statistics 49(4):431–434.
Bafumi Joseph, and Gelman Andrew. 2006. ‘Fitting Multilevel Models When Predictors and Group Effects Correlate’. Available at, accessed 21 December 2017.
Bakirov Nail K., and Szekely Gabor J.. 2006. ‘Student’s t-Test for Gaussian Scale Mixtures’. Journal of Mathematical Sciences 139(3):6497–6505.
Bates Douglas, Maechler Martin, Bolker Ben, and Walker Steven. 2014. ‘lme4: Linear Mixed Effects Models Using Eigen and S4’. R package version 1.1-7. Available at, accessed 21 December 2017.
Beck Nathaniel, and Katz Jonathan N.. 1995. ‘What To Do (And Not To Do) With Time-Series Cross-Section Data’. American Political Science Review 89(3):634–647.
Beck Nathaniel L., Katz Jonathan N., and Mignozzetti Umberto G.. 2014. ‘Of Nickell Bias and its Cures: Comment on Gaibulloev, Sandler, and Sul’. Political Analysis 22(2):274–278.
Bertrand Marianne, Duflo Esther, and Mullainathan Sendhil. 2004. ‘How Much Should We Trust Differences-In-Differences Estimates?’. The Quarterly Journal of Economics 119(1):249–275.
Brambor Thomas, Clark William Roberts, and Golder Matthew. 2006. ‘Understanding Interaction Models: Improving Empirical Analyses’. Political Analysis 14(1):63–82.
Cameron A. Colin, and Miller Douglas L.. 2015. ‘A Practitioner’s Guide to Cluster-Robust Inference’. Journal of Human Resources 50(2):317–372.
Cameron A. Colin, Gelbach Jonah B., and Miller Douglas L.. 2008. ‘Bootstrap-Based Improvements for Inference With Clustered Errors’. Review of Economics and Statistics 90(3):414–427.
Cameron A. Colin, and Trivedi Pravin K.. 2005. Microeconometrics: Methods and Applications. Cambridge, UK: Cambridge University Press.
Canay Ivan A., Romano Joseph P., and Shaikh Azeem M.. 2014. ‘Randomization Tests Under an Approximate Symmetry Assumption’. Working Paper (version: December 19, 2014). Available at, accessed 29 January 2017.
Clark Tom S., and Linzer Drew A.. 2015. ‘Should I Use Fixed or Random Effects?’. Political Science Research and Methods 3(2):399–408.
Croissant Yves. 2015. ‘Package “mlogit”.’ CRAN. Available at, accessed 21 December 2017.
Croissant Yves, and Millo Giovanni. 2008. ‘Panel Data Econometrics in R: The plm Package’. Journal of Statistical Software 27(2):1–43.
Donald Stephen G., and Lang Kevin. 2007. ‘Inference With Difference-in-Differences and Other Panel Data’. The Review of Economics and Statistics 89(2):221–233.
Donner Allan. 1998. ‘Some Aspects of the Design and Analysis of Cluster Randomization Trials’. Journal of the Royal Statistical Society: Series C (Applied Statistics) 47(1):95–113.
Efron Bradley. 1979. ‘Bootstrap Methods: Another Look at the Jackknife’. Annals of Statistics 7(1):1–26.
Field Chris A., and Welsh Alan H.. 2007. ‘Bootstrapping Clustered Data’. Journal of the Royal Statistical Society: Series B 69(3):369–390.
Gaibulloev Khusrav, Sandler Todd, and Sul Donggyu. 2014. ‘Dynamic Panel Analysis Under Cross-Sectional Dependence’. Political Analysis 22:258–273.
Green Donald P., and Vavreck Lynn. 2008. ‘Analysis of Cluster-Randomized Experiments: A Comparison of Alternative Estimation Approaches’. Political Analysis 16(2):138–152.
Grosser Jens, Reuben Ernesto, and Tymula Agnieszka. 2013. ‘Political Quid Pro Quo Agreements: An Experimental Study’. American Journal of Political Science 57:582–597.
Hainmueller Jens, Hiscox Michael, and Sequeira Sandra. 2015. ‘Consumer Demand for the Fair Trade Label: Evidence from a Multistore Field Experiment’. Review of Economics and Statistics 97(2):242–256.
Hansen Christian B. 2007. ‘Asymptotic Properties of a Robust Variance Matrix Estimator for Panel Data When T is Large’. Journal of Econometrics 141(2):597–620.
Harden Jeffrey J. 2011. ‘A Bootstrap Method for Conducting Statistical Inference With Clustered Data’. State Politics & Policy Quarterly 11(2):223–246.
Hardin James W., and Hilbe Joseph M.. 2003. Generalized Estimating Equations. Boca Raton, FL: Chapman & Hall/CRC.
Horowitz Joel L. 1997. ‘Bootstrap Methods in Econometrics: Theory and Numerical Performance’. In David M. Kreps and Kenneth F. Wallis (eds), Advances in Economics and Econometrics: Theory and Applications: Seventh World Congress, 189–222. Cambridge, UK: Cambridge University Press.
Hu Feifang, and Kalbfleisch John D.. 2000. ‘The Estimating Function Bootstrap’. Canadian Journal of Statistics 28(3):449–481.
Ibragimov Rustam, and Müller Ulrich K.. 2010. ‘t-Statistic Based Correlation and Heterogeneity Robust Inference’. Journal of Business & Economic Statistics 28(4):453–468.
Imbens Guido W., and Kolesar Michal. 2012. ‘Robust Standard Errors in Small Samples: Some Practical Advice’. Review of Economics and Statistics 98(4):701–712.
Judge George G., Hill R. Carter, Griffiths William E., Lutkepohl Helmut, and Lee Tsoung-Chao. 1988. Introduction to the Theory and Practice of Econometrics. New York, NY: Wiley.
Kezdi Gabor. 2004. ‘Robust Standard Error Estimation in Fixed-Effects Panel Models’. Hungarian Statistical Review 9:95–116.
King Gary, and Roberts Margaret E.. 2014. ‘How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About it’. Political Analysis 23:159–179.
Klar Neil, and Donner Allan. 2001. ‘Current and Future Challenges in the Design and Analysis of Cluster Randomization Trials’. Statistics in Medicine 20(24):3729–3740.
Lacina Bethany. 2014. ‘How Governments Shape the Risk of Civil Violence: India’s Federal Reorganization, 1950–56’. American Journal of Political Science 58(3):720–738.
Liang Kung-Yee, and Zeger Scott L.. 1986. ‘Longitudinal Data Analysis Using Generalized Linear Models’. Biometrika 73(1):13–22.
Liang Kung-Yee, and Zeger Scott L.. 1993. ‘Regression Analysis for Correlated Data’. Annual Review of Public Health 14(1):43–68.
Liu Regina Y. 1988. ‘Bootstrap Procedures Under Some Non-I.I.D. Models’. The Annals of Statistics 16(4):1696–1708.
Liu Regina Y., and Singh Kesar. 1987. ‘On a Partial Correction by the Bootstrap’. The Annals of Statistics 15(4):1713–1718.
MacKinnon James G. 2015. ‘Wild Cluster Bootstrap Confidence Intervals’. L’Actualité économique 91(1-2):11–33.
MacKinnon James G., and Webb Matthew D.. 2017. ‘Wild Bootstrap Inference for Wildly Different Cluster Sizes’. Journal of Applied Econometrics 32(2):233–254.
Mancl Lloyd A., and DeRouen Timothy A.. 2001. ‘A Covariance Estimator for GEE With Improved Small-Sample Properties’. Biometrics 57(1):126–134.
Moulton Brent R. 1986. ‘Random Group Effects and the Precision of Regression Estimates’. Journal of Econometrics 32(3):385–397.
Moulton Brent R. 1990. ‘An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units’. The Review of Economics and Statistics 72(2):334–338.
Nickell Stephen. 1981. ‘Biases in Dynamic Models With Fixed Effects’. Econometrica 49:1417–1426.
Rogers William. 1993. ‘Regression Standard Errors in Clustered Samples’. Stata Technical Bulletin 13:19–23.
van der Vaart Aad W. 1998. Asymptotic Statistics. Cambridge, UK: Cambridge University Press.
White Halbert. 1980. ‘A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity’. Econometrica 48(4):817–838.
Williams Rick L. 2000. ‘A Note on Robust Variance Estimation for Cluster-Correlated Data’. Biometrics 56(2):645–646.
Wooldridge Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.
Wu C. F. Jeff. 1986. ‘Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis’. The Annals of Statistics 14(4):1261–1295.
Political Science Research and Methods
  • ISSN: 2049-8470
  • EISSN: 2049-8489
  • URL: /core/journals/political-science-research-and-methods