Ignoramus, Ignorabimus? On Uncertainty in Ecological Inference

  • Martin Elff, Thomas Gschwend and Ron J. Johnston

Models of ecological inference (EI) have to rely on crucial assumptions about the individual-level data-generating process, which cannot be tested because these data are unavailable. These assumptions may nevertheless be violated by the unknown data, which can seriously bias estimates and predictions. The amount of bias, however, cannot be assessed without information that is unavailable in typical applications of EI. We therefore construct a model, building on the Principle of Maximum Entropy, that at least approximately accounts for the additional nonsampling error that may result from possible bias incurred by an EI procedure. By means of a systematic simulation experiment, we examine the performance of prediction intervals based on this second-stage Maximum Entropy model. The results of this simulation study suggest that these prediction intervals are at least approximately correct if all possible configurations of the unknown data are taken into account. Finally, we apply our method to a real-world example in which the true values are known, so that the performance of our method can be assessed: the prediction of district-level percentages of split-ticket voting in the 1996 General Election of New Zealand. It turns out that in 95.5% of the New Zealand voting districts, the actual percentage of split-ticket votes lies inside the 95% prediction intervals constructed by our method.
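The coverage figure reported in the abstract is the share of districts whose true value falls inside its prediction interval. A minimal sketch of that check follows. It is not the authors' replication code: the function name `empirical_coverage` and the five toy districts are hypothetical, chosen only to show the arithmetic.

```python
def empirical_coverage(true_values, lower, upper):
    """Return the share of units whose true value lies inside [lower, upper]."""
    if not (len(true_values) == len(lower) == len(upper)):
        raise ValueError("all inputs must have the same length")
    if not true_values:
        raise ValueError("inputs must not be empty")
    # Count the units whose interval contains the true value (bounds inclusive).
    hits = sum(lo <= y <= hi for y, lo, hi in zip(true_values, lower, upper))
    return hits / len(true_values)


# Hypothetical toy data: split-ticket percentages for five made-up districts,
# with made-up lower and upper prediction bounds. Not taken from the paper.
true_pct = [31.0, 42.5, 28.0, 36.2, 39.9]
lower_pct = [25.0, 35.0, 29.0, 30.0, 33.0]
upper_pct = [38.0, 48.0, 37.0, 41.0, 45.0]

coverage = empirical_coverage(true_pct, lower_pct, upper_pct)
print(f"{coverage:.1%}")  # four of the five intervals contain the true value
```

For well-calibrated 95% intervals, this share should come out close to 0.95 when computed over many districts.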

Authors' note: We thank three anonymous reviewers for helpful comments and suggestions on earlier versions of this paper. An appendix giving some technical background information concerning our proposed method, as well as data, R code, and C code to replicate analyses presented in this paper, are available from the Political Analysis Web site. Later versions of the code will be packaged into an R library and made publicly available on CRAN and on the corresponding author's Web site.

Political Analysis
  • ISSN: 1047-1987
  • EISSN: 1476-4989
  • URL: /core/journals/political-analysis