Reliable Inference in Highly Stratified Contingency Tables: Using Latent Class Models as Density Estimators

Published online by Cambridge University Press: 04 January 2017

Drew A. Linzer*
Department of Political Science, Emory University, 327 Tarbutton Hall, 1555 Dickey Drive, Atlanta, GA 30322 e-mail:


Contingency tables are among the most basic and useful techniques available for analyzing categorical data, but they produce highly imprecise estimates in small samples or for population subgroups that arise following repeated stratification. I demonstrate that preprocessing an observed set of categorical variables using a latent class model can greatly improve the quality of table-based inferences. As a density estimator, the latent class model closely approximates the underlying joint distribution of the variables of interest, which enables reliable estimation of conditional probabilities and marginal effects, even among subgroups containing fewer than 40 observations. Though here focused on applications to public opinion, the procedure has a wide range of potential uses. I illustrate the benefits of the latent class model-based approach for greatly improved accuracy in estimating and forecasting vote preferences within small demographic subgroups using survey data from the 2004 and 2008 U.S. presidential election campaigns.
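The core idea of the abstract, fitting a latent class model and then reading conditional probabilities off its implied joint distribution instead of off sparse raw table cells, can be sketched as follows. Under the standard latent class model, the probability of a response pattern y is P(y) = sum over classes r of p_r times the product over variables j of pi[j][r][y_j], where p_r is the share of class r and pi[j][r][k] is the probability of category k on variable j within class r. This is a minimal illustration, not the author's code or any package's implementation: it assumes plain EM with a single random start, a fixed iteration count, and a number of classes chosen in advance (no model selection). The function names `fit_lca`, `lca_prob`, and `lca_conditional` and the toy data are hypothetical.

```python
import random
from math import prod


def fit_lca(data, n_cats, n_classes, n_iter=200, seed=0):
    """Fit a latent class model to categorical data by EM (single random start).

    data      -- list of tuples; data[i][j] is the 0-based category of variable j
    n_cats    -- number of categories of each variable
    n_classes -- number of latent classes (assumed fixed in advance)
    Returns (class_probs, item_probs), where item_probs[j][r][k] = P(Y_j = k | class r).
    """
    rng = random.Random(seed)
    n_vars = len(n_cats)
    class_probs = [1.0 / n_classes] * n_classes
    # Random starting values for the class-conditional response probabilities.
    item_probs = []
    for j in range(n_vars):
        per_class = []
        for _ in range(n_classes):
            w = [rng.random() + 0.5 for _ in range(n_cats[j])]
            total = sum(w)
            per_class.append([x / total for x in w])
        item_probs.append(per_class)

    for _ in range(n_iter):
        # E-step: posterior probability that each observation belongs to each class.
        post = []
        for y in data:
            joint = [class_probs[r] * prod(item_probs[j][r][y[j]] for j in range(n_vars))
                     for r in range(n_classes)]
            total = sum(joint)
            post.append([p / total for p in joint])
        # M-step: re-estimate class shares and class-conditional response probabilities.
        weights = [sum(h[r] for h in post) for r in range(n_classes)]
        class_probs = [w / len(data) for w in weights]
        for j in range(n_vars):
            for r in range(n_classes):
                counts = [0.0] * n_cats[j]
                for y, h in zip(data, post):
                    counts[y[j]] += h[r]
                item_probs[j][r] = [c / weights[r] for c in counts]
    return class_probs, item_probs


def lca_prob(class_probs, item_probs, assignment):
    """Model-implied probability that Y_j = k for every (j, k) in `assignment`.

    Variables missing from `assignment` are marginalized out: under local
    independence their within-class factors sum to one and simply drop out.
    """
    return sum(class_probs[r] * prod(item_probs[j][r][k] for j, k in assignment.items())
               for r in range(len(class_probs)))


def lca_conditional(class_probs, item_probs, target, n_target_cats, given):
    """Smoothed P(Y_target = k | given) for every category k of the target variable."""
    joint = [lca_prob(class_probs, item_probs, {**given, target: k})
             for k in range(n_target_cats)]
    total = sum(joint)
    return [p / total for p in joint]


if __name__ == "__main__":
    # Hypothetical toy survey: (vote, age group, region), all coded 0/1.
    data = [(0, 0, 0), (0, 0, 1), (1, 1, 1), (1, 1, 0), (0, 0, 0),
            (1, 1, 1), (0, 1, 0), (1, 0, 1)] * 5
    cp, ip = fit_lca(data, n_cats=[2, 2, 2], n_classes=2)
    # Model-based vote distribution within the subgroup age group 1, region 0.
    print(lca_conditional(cp, ip, target=0, n_target_cats=2, given={1: 1, 2: 0}))
```

Because every cell probability is a sum over classes of products of item-response probabilities, the estimate for a thinly populated subgroup draws on respondents in other subgroups who share the same latent classes. The number of classes, which the sketch takes as given, roughly governs how much smoothing occurs: fewer classes smooth more, while many classes move the estimates back toward the raw table.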

Regular Articles
Copyright © The Author 2011. Published by Oxford University Press on behalf of the Society for Political Methodology 
