
Practical and Effective Approaches to Dealing With Clustered Data

Published online by Cambridge University Press: 19 January 2018

Abstract

Cluster-robust standard errors (as implemented by the eponymous cluster option in Stata) can produce misleading inferences when the number of clusters G is small, even if the model is consistent and there are many observations in each cluster. Nevertheless, political scientists commonly employ this method in data sets with few clusters. The contributions of this paper are: (a) developing new and easy-to-use Stata and R packages that implement alternative uncertainty measures robust to small G, and (b) explaining and providing evidence for the advantages of these alternatives, especially cluster-adjusted t-statistics based on Ibragimov and Müller (2010). To illustrate these advantages, we reanalyze recent work whose results are based on cluster-robust standard errors.
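The cluster-adjusted t-statistic idea can be sketched in a few lines. This is a minimal illustration, not the authors' Stata or R packages: it assumes a single regressor with an intercept, estimates the slope separately within each of the G clusters by OLS, and then runs a one-sample t-test on the G cluster-level estimates with G - 1 degrees of freedom. The function names `ols_slope` and `im_cluster_t` are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs with an intercept."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        # The model must be identified inside every cluster.
        raise ValueError("x has no within-cluster variation")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def im_cluster_t(x, y, cluster, beta_null=0.0):
    """Cluster-adjusted t-statistic in the spirit of Ibragimov and Mueller (2010).

    Sketch only: fits the slope separately in each cluster, then applies a
    one-sample t-test to the G cluster estimates.
    Returns (t_stat, degrees_of_freedom, cluster_estimates).
    """
    # Group observations by cluster label, preserving first-seen order.
    groups = defaultdict(lambda: ([], []))
    for xi, yi, gi in zip(x, y, cluster):
        groups[gi][0].append(xi)
        groups[gi][1].append(yi)

    estimates = [ols_slope(xs, ys) for xs, ys in groups.values()]
    n_clusters = len(estimates)
    if n_clusters < 2:
        raise ValueError("need at least two clusters")

    mean_b = statistics.fmean(estimates)
    sd_b = statistics.stdev(estimates)  # sample SD with divisor G - 1
    t_stat = math.sqrt(n_clusters) * (mean_b - beta_null) / sd_b
    return t_stat, n_clusters - 1, estimates
```

For example, with three clusters whose within-cluster slopes are exactly 2, 3 and 4, the function returns t of about 5.196 on 2 degrees of freedom. Ibragimov and Müller (2010) argue that comparing such a statistic with Student t critical values on G - 1 degrees of freedom gives a valid, if conservative, test at conventional significance levels even when clusters are heterogeneous; the price is that the model must be estimable within each cluster.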

Type
Original Articles
Copyright
© The European Political Science Association 2018 


Footnotes

* Justin Esarey is an Assistant Professor of Political Science, Rice University, 6100 Main St, MS-24, Houston, TX 77005 (justin@justinesarey.com). Andrew Menger is a Ph.D. Candidate, Department of Political Science, Rice University, 6100 Main St, MS-24, Houston, TX 77005 (Andrew.M.Menger@rice.edu). The authors thank Ulrich Müller, Carlisle Rainey, Jonathan Kropko, Matthew Webb, Neal Beck, Jens Hainmueller, Shuai Jin, Jens Grosser, Ernesto Reuben, our anonymous reviewers, and participants at the 2015 Annual Meeting of the Midwest Political Science Association, the 2015 Annual Meeting of the Society for Political Methodology, and the 2016 Annual Meeting of the Southern Political Science Association for helpful comments and suggestions on earlier drafts of this paper. To view supplementary material for this article, please visit https://doi.org/10.1017/psrm.2017.42

References

Anderson, Theodore W. 2003. An Introduction to Multivariate Statistical Analysis, 3rd ed. New York, NY: Wiley.
Angrist, Joshua D., and Pischke, Jörn-Steffen. 2009. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton, NJ: Princeton University Press.
Arellano, Manuel. 1987. ‘Computing Robust Standard Errors for Within-Groups Estimators’. Oxford Bulletin of Economics and Statistics 49(4):431–434.
Bafumi, Joseph, and Gelman, Andrew. 2006. ‘Fitting Multilevel Models When Predictors and Group Effects Correlate’. Available at http://goo.gl/usvQsn, accessed 21 December 2017.
Bakirov, Nail K., and Szekely, Gabor J. 2006. ‘Student’s t-Test for Gaussian Scale Mixtures’. Journal of Mathematical Sciences 139(3):6497–6505.
Bates, Douglas, Maechler, Martin, Bolker, Ben, and Walker, Steven. 2014. ‘lme4: Linear Mixed Effects Models Using Eigen and S4’. R package version 1.1-7. Available at http://CRAN.R-project.org/package=lme4, accessed 21 December 2017.
Beck, Nathaniel, and Katz, Jonathan N. 1995. ‘What To Do (And Not To Do) With Time-Series Cross-Section Data’. American Political Science Review 89(3):634–647.
Beck, Nathaniel L., Katz, Jonathan N., and Mignozzetti, Umberto G. 2014. ‘Of Nickell Bias and its Cures: Comment on Gaibulloev, Sandler, and Sul’. Political Analysis 22(2):274–278.
Bertrand, Marianne, Duflo, Esther, and Mullainathan, Sendhil. 2004. ‘How Much Should We Trust Differences-In-Differences Estimates?’. The Quarterly Journal of Economics 119(1):249–275.
Brambor, Thomas, Clark, William Roberts, and Golder, Matthew. 2006. ‘Understanding Interaction Models: Improving Empirical Analyses’. Political Analysis 14(1):63–82.
Cameron, A. Colin, and Miller, Douglas L. 2015. ‘A Practitioner’s Guide to Cluster-Robust Inference’. Journal of Human Resources 50(2):317–372.
Cameron, A. Colin, Gelbach, Jonah B., and Miller, Douglas L. 2008. ‘Bootstrap-Based Improvements for Inference With Clustered Errors’. Review of Economics and Statistics 90(3):414–427.
Cameron, A. Colin, and Trivedi, Pravin K. 2005. Microeconometrics: Methods and Applications. Cambridge, UK: Cambridge University Press.
Canay, Ivan A., Romano, Joseph P., and Shaikh, Azeem M. 2014. ‘Randomization Tests Under an Approximate Symmetry Assumption’. Working Paper (version: December 19, 2014). Available at https://goo.gl/TUEQee, accessed 29 January 2017.
Clark, Tom S., and Linzer, Drew A. 2015. ‘Should I Use Fixed or Random Effects?’. Political Science Research and Methods 3(2):399–408.
Croissant, Yves. 2015. ‘Package “mlogit”’. CRAN. Available at http://cran.r-project.org/web/packages/mlogit/mlogit.pdf, accessed 21 December 2017.
Croissant, Yves, and Millo, Giovanni. 2008. ‘Panel Data Econometrics in R: The plm Package’. Journal of Statistical Software 27(2):1–43.
Donald, Stephen G., and Lang, Kevin. 2007. ‘Inference With Difference-in-Differences and Other Panel Data’. The Review of Economics and Statistics 89(2):221–233.
Donner, Allan. 1998. ‘Some Aspects of the Design and Analysis of Cluster Randomization Trials’. Journal of the Royal Statistical Society: Series C (Applied Statistics) 47(1):95–113.
Efron, Bradley. 1979. ‘Bootstrap Methods: Another Look at the Jackknife’. Annals of Statistics 7(1):1–26.
Field, Chris A., and Welsh, Alan H. 2007. ‘Bootstrapping Clustered Data’. Journal of the Royal Statistical Society: Series B 69(3):369–390.
Gaibulloev, Khusrav, Sandler, Todd, and Sul, Donggyu. 2014. ‘Dynamic Panel Analysis Under Cross-Sectional Dependence’. Political Analysis 22:258–273.
Green, Donald P., and Vavreck, Lynn. 2008. ‘Analysis of Cluster-Randomized Experiments: A Comparison of Alternative Estimation Approaches’. Political Analysis 16(2):138–152.
Grosser, Jens, Reuben, Ernesto, and Tymula, Agnieszka. 2013. ‘Political Quid Pro Quo Agreements: An Experimental Study’. American Journal of Political Science 57:582–597.
Hainmueller, Jens, Hiscox, Michael, and Sequeira, Sandra. 2015. ‘Consumer Demand for the Fair Trade Label: Evidence from a Multistore Field Experiment’. Review of Economics and Statistics 97(2):242–256.
Hansen, Christian B. 2007. ‘Asymptotic Properties of a Robust Variance Matrix Estimator for Panel Data When T is Large’. Journal of Econometrics 141(2):597–620.
Harden, Jeffrey J. 2011. ‘A Bootstrap Method for Conducting Statistical Inference With Clustered Data’. State Politics & Policy Quarterly 11(2):223–246.
Hardin, James W., and Hilbe, Joseph M. 2003. Generalized Estimating Equations. Boca Raton, FL: Chapman & Hall/CRC.
Horowitz, Joel L. 1997. ‘Bootstrap Methods in Econometrics: Theory and Numerical Performance’. In David M. Kreps and Kenneth F. Wallis (eds), Advances in Economics and Econometrics: Theory and Applications: Seventh World Congress, 189–222. Cambridge, UK: Cambridge University Press.
Hu, Feifang, and Kalbfleisch, John D. 2000. ‘The Estimating Function Bootstrap’. Canadian Journal of Statistics 28(3):449–481.
Ibragimov, Rustam, and Müller, Ulrich K. 2010. ‘t-Statistic Based Correlation and Heterogeneity Robust Inference’. Journal of Business & Economic Statistics 28(4):453–468.
Imbens, Guido W., and Kolesar, Michal. 2012. ‘Robust Standard Errors in Small Samples: Some Practical Advice’. 98(4):701–712.
Judge, George G., Hill, R. Carter, Griffiths, William E., Lutkepohl, Helmut, and Lee, Tsoung-Chao. 1988. Introduction to the Theory and Practice of Econometrics. New York, NY: Wiley.
Kezdi, Gabor. 2004. ‘Robust Standard Error Estimation in Fixed-Effects Panel Models’. Hungarian Statistical Review 9:95–116.
King, Gary, and Roberts, Margaret E. 2014. ‘How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About it’. Political Analysis 23:159–179.
Klar, Neil, and Donner, Allan. 2001. ‘Current and Future Challenges in the Design and Analysis of Cluster Randomization Trials’. Statistics in Medicine 20(24):3729–3740.
Lacina, Bethany. 2014. ‘How Governments Shape the Risk of Civil Violence: India’s Federal Reorganization, 1950–56’. American Journal of Political Science 58(3):720–738.
Liang, Kung-Yee, and Zeger, Scott L. 1986. ‘Longitudinal Data Analysis Using Generalized Linear Models’. Biometrika 73(1):13–22.
Liang, Kung-Yee, and Zeger, Scott L. 1993. ‘Regression Analysis for Correlated Data’. Annual Review of Public Health 14(1):43–68.
Liu, Regina Y. 1988. ‘Bootstrap Procedures Under Some Non-I.I.D. Models’. The Annals of Statistics 16(4):1696–1708.
Liu, Regina Y., and Singh, Kesar. 1987. ‘On a Partial Correction by the Bootstrap’. The Annals of Statistics 15(4):1713–1718.
MacKinnon, James G. 2015. ‘Wild Cluster Bootstrap Confidence Intervals’. L’Actualité économique 91(1-2):11–33.
MacKinnon, James G., and Webb, Matthew D. 2017. ‘Wild Bootstrap Inference for Wildly Different Cluster Sizes’. Journal of Applied Econometrics 32(2):233–254.
Mancl, Lloyd A., and DeRouen, Timothy A. 2001. ‘A Covariance Estimator for GEE With Improved Small-Sample Properties’. Biometrics 57(1):126–134.
Moulton, Brent R. 1986. ‘Random Group Effects and the Precision of Regression Estimates’. Journal of Econometrics 32(3):385–397.
Moulton, Brent R. 1990. ‘An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units’. The Review of Economics and Statistics 72(2):334–338.
Nickell, Stephen. 1981. ‘Biases in Dynamic Models With Fixed Effects’. Econometrica 49:1417–1426.
Rogers, William. 1993. ‘Regression Standard Errors in Clustered Samples’. Stata Technical Bulletin 13:19–23.
van der Vaart, Aad W. 1998. Asymptotic Statistics. Cambridge, UK: Cambridge University Press.
White, Halbert. 1980. ‘A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity’. Econometrica 48(4):817–838.
Williams, Rick L. 2000. ‘A Note on Robust Variance Estimation for Cluster-Correlated Data’. Biometrics 56(2):645–646.
Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.
Wu, C. F. Jeff. 1986. ‘Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis’. The Annals of Statistics 14(4):1261–1295.

Supplementary material

Esarey and Menger Dataset
Esarey and Menger supplementary material (PDF, 595 KB)