Reliable Inference in Highly Stratified Contingency Tables: Using Latent Class Models as Density Estimators

Drew A. Linzer

doi:10.1093/pan/mpr006

Published online by Cambridge University Press: 04 January 2017

Author affiliation: Drew A. Linzer, Department of Political Science, Emory University, 327 Tarbutton Hall, 1555 Dickey Drive, Atlanta, GA 30322. E-mail: dlinzer@emory.edu

Abstract

Contingency tables are among the most basic and useful techniques available for analyzing categorical data, but they produce highly imprecise estimates in small samples or for population subgroups that arise following repeated stratification. I demonstrate that preprocessing an observed set of categorical variables using a latent class model can greatly improve the quality of table-based inferences. As a density estimator, the latent class model closely approximates the underlying joint distribution of the variables of interest, which enables reliable estimation of conditional probabilities and marginal effects, even among subgroups containing fewer than 40 observations. Though here focused on applications to public opinion, the procedure has a wide range of potential uses. I illustrate the benefits of the latent class model-based approach for greatly improved accuracy in estimating and forecasting vote preferences within small demographic subgroups using survey data from the 2004 and 2008 U.S. presidential election campaigns.
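The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: it assumes a basic unrestricted latent class model fitted by EM, and the function names (`fit_latent_class`, `class_terms`, `cond_prob`), the tiny pseudo-count, and the simulated toy data are all hypothetical choices made for illustration. The fitted model gives a smoothed joint distribution, from which a conditional probability for a sparse cell is read off as a ratio of two model-based probabilities rather than a ratio of raw counts.

```python
import random


def class_terms(values, weights, probs):
    """Return w_r * prod_j P(Y_j = values[j] | class r) for each class r.

    `values` maps variable index -> category. Variables left out are
    marginalized automatically, because their terms sum to one.
    """
    terms = []
    for w, per_var in zip(weights, probs):
        p = w
        for j, k in values.items():
            p *= per_var[j][k]
        terms.append(p)
    return terms


def fit_latent_class(data, n_levels, n_classes, n_iter=100, seed=0):
    """Fit a latent class model by EM (illustrative, no convergence check).

    data[i][j] is the 0-based category of variable j for respondent i.
    Returns (weights, probs) with probs[r][j][k] = P(Y_j = k | class r).
    """
    rng = random.Random(seed)
    weights = [1.0 / n_classes] * n_classes
    probs = []
    for _ in range(n_classes):
        per_var = []
        for levels in n_levels:
            raw = [rng.random() + 0.5 for _ in range(levels)]
            per_var.append([v / sum(raw) for v in raw])
        probs.append(per_var)

    n = len(data)
    for _ in range(n_iter):
        # E-step: posterior class membership probabilities per respondent.
        post = []
        for obs in data:
            terms = class_terms(dict(enumerate(obs)), weights, probs)
            total = sum(terms)
            post.append([t / total for t in terms])
        # M-step: re-estimate class shares and class-conditional probabilities.
        for r in range(n_classes):
            weights[r] = sum(row[r] for row in post) / n
            for j, levels in enumerate(n_levels):
                counts = [1e-6] * levels  # assumed pseudo-count, keeps values positive
                for obs, row in zip(data, post):
                    counts[obs[j]] += row[r]
                total = sum(counts)
                probs[r][j] = [c / total for c in counts]
    return weights, probs


def cond_prob(target, given, weights, probs):
    """Model-based P(target | given); both map variable index -> category."""
    num = sum(class_terms({**given, **target}, weights, probs))
    den = sum(class_terms(given, weights, probs))
    return num / den


# Hypothetical toy data: three binary items generated from two hidden groups.
rng = random.Random(1)
true_probs = [(0.9, 0.8, 0.7), (0.2, 0.3, 0.1)]  # P(item = 1 | group)
data = []
for _ in range(400):
    group = 0 if rng.random() < 0.6 else 1
    data.append(tuple(int(rng.random() < p) for p in true_probs[group]))

weights, probs = fit_latent_class(data, n_levels=[2, 2, 2], n_classes=2)

# Smoothed estimate of P(Y_2 = 1 | Y_0 = 1, Y_1 = 0), a thinly populated cell.
estimate = cond_prob({2: 1}, {0: 1, 1: 0}, weights, probs)
```

In practice the number of classes would be chosen by a model-selection criterion and the EM run from several starting values; this sketch fixes both for brevity.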

Information

Type: Regular Articles
Information: Political Analysis, Volume 19, Issue 2, Spring 2011, pp. 173–187

DOI: https://doi.org/10.1093/pan/mpr006
Copyright: Copyright © The Author 2011. Published by Oxford University Press on behalf of the Society for Political Methodology
