
Reliable Inference in Highly Stratified Contingency Tables: Using Latent Class Models as Density Estimators

  • Drew A. Linzer


Contingency tables are among the most basic and useful techniques available for analyzing categorical data, but they produce highly imprecise estimates in small samples or for population subgroups that arise following repeated stratification. I demonstrate that preprocessing an observed set of categorical variables using a latent class model can greatly improve the quality of table-based inferences. As a density estimator, the latent class model closely approximates the underlying joint distribution of the variables of interest, which enables reliable estimation of conditional probabilities and marginal effects, even among subgroups containing fewer than 40 observations. Though focused here on applications to public opinion, the procedure has a wide range of potential uses. I illustrate how the latent class model-based approach greatly improves accuracy in estimating and forecasting vote preferences within small demographic subgroups, using survey data from the 2004 and 2008 U.S. presidential election campaigns.
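
The idea of using a latent class model as a density estimator can be illustrated with a minimal sketch. This is not the author's implementation: it assumes a basic latent class model with local independence, fitted by a plain EM loop, and every name in it (`fit_latent_class`, `joint_prob`, `conditional_prob`), the simulated survey, and the choice of two classes are hypothetical and chosen only for illustration. The fitted model gives a smoothed joint distribution, from which a conditional probability for a sparse subgroup can be read off.

```python
import math
import random


def fit_latent_class(data, n_cats, n_classes, n_iter=200, seed=0):
    """Fit a basic latent class model to categorical data by EM.

    data: list of tuples of category codes, one tuple per respondent.
    n_cats: number of categories of each variable.
    Returns (weights, probs), where probs[r][j][k] = P(X_j = k | class r).
    """
    rng = random.Random(seed)
    weights = [1.0 / n_classes] * n_classes
    probs = []
    for _ in range(n_classes):
        per_class = []
        for k in n_cats:
            raw = [rng.random() + 0.5 for _ in range(k)]
            per_class.append([v / sum(raw) for v in raw])
        probs.append(per_class)

    eps = 1e-10  # tiny pseudo-count guarding against empty classes (illustrative)
    for _ in range(n_iter):
        # E-step: posterior class membership for each respondent.
        post = []
        for row in data:
            lik = [weights[r] * math.prod(probs[r][j][x] for j, x in enumerate(row))
                   for r in range(n_classes)]
            total = sum(lik)
            post.append([v / total for v in lik])
        # M-step: update class shares and class-conditional response probabilities.
        for r in range(n_classes):
            n_r = sum(p[r] for p in post)
            weights[r] = n_r / len(data)
            for j, k in enumerate(n_cats):
                counts = [eps] * k
                for row, p in zip(data, post):
                    counts[row[j]] += p[r]
                probs[r][j] = [c / (n_r + k * eps) for c in counts]
    return weights, probs


def joint_prob(cell, weights, probs):
    """Model-implied probability of one cell of the full contingency table."""
    return sum(w * math.prod(p[j][x] for j, x in enumerate(cell))
               for w, p in zip(weights, probs))


def conditional_prob(target, value, given, weights, probs):
    """P(X_target = value | X_j = v for each j, v in given), from the fitted model."""
    num = den = 0.0
    for w, p in zip(weights, probs):
        g = w * math.prod(p[j][v] for j, v in given.items())
        den += g
        num += g * p[target][value]
    return num / den


# Simulated survey (hypothetical): vote, group, and age band for 300 respondents.
rng = random.Random(1)
n_cats = [2, 3, 4]
true_probs = [
    [[0.8, 0.2], [0.6, 0.3, 0.1], [0.4, 0.3, 0.2, 0.1]],
    [[0.3, 0.7], [0.1, 0.3, 0.6], [0.1, 0.2, 0.3, 0.4]],
]
data = []
for _ in range(300):
    cls = 0 if rng.random() < 0.6 else 1
    data.append(tuple(rng.choices(range(k), weights=true_probs[cls][j])[0]
                      for j, k in enumerate(n_cats)))

weights, probs = fit_latent_class(data, n_cats, n_classes=2)

# Smoothed estimate of P(vote = 1 | group = 2, age band = 3), a small subgroup.
smoothed = conditional_prob(0, 1, {1: 2, 2: 3}, weights, probs)
subgroup_n = sum(1 for row in data if row[1] == 2 and row[2] == 3)
print(f"subgroup n = {subgroup_n}, smoothed P(vote=1 | subgroup) = {smoothed:.3f}")
```

Because the estimate for the subgroup draws on every respondent through the fitted class structure, it does not depend solely on the handful of observations in that one cell of the raw table. In practice the number of classes would need to be chosen with some model selection criterion rather than fixed at two as in this sketch.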


