
Improving Data Analysis in Political Science

  • Edward R. Tufte

Students of politics use statistical and quantitative techniques to: summarize a large body of numbers into a small collection of typical values; confirm (and perhaps sanctify) the results of the analysis by using tests of statistical significance that help protect against sampling and measurement error; discover what's going on in their data and expose some new relationships; and inform their audience what's going on in the data.


1 For similar categories, see Tukey, John W. and Wilk, M. B., “Data Analysis: Techniques and Approaches,” Proceedings of the Symposium on Information Processing in Sight Sensory Systems (Pasadena, California Institute of Technology, November 1965). This paper is also reprinted in Tufte, Edward R., ed., The Quantitative Analysis of Social Problems (Reading, Mass. 1969).

2 An important exception is Wallis, W. Allen and Roberts, Harry V., Statistics: A New Approach (Glencoe, Ill. 1956).

3 J. David Singer, ed. (New York 1968).

4 See Kaplan, Abraham, The Conduct of Inquiry (San Francisco 1964), chap. 1.

5 Kish, Leslie, “Some Statistical Problems in Research Design,” American Sociological Review, 24 (June 1959), 336. Another good discussion of significance tests is Kruskal, William H., “Tests of Significance,” International Encyclopedia of the Social Sciences (New York 1968), vol. 14, 238–50.

6 Edwards, Ward, Lindman, Harold, and Savage, Leonard J., “Bayesian Statistical Inference for Psychological Research,” Psychological Review, 70 (May 1963), 217.

7 Kish, 336.

8 Mosteller, Frederick and Hammel, E. A., book review, Journal of the American Statistical Association, 58 (September 1963), 836.

9 For a recent statement by S. S. Stevens, see his “Measurement, Statistics, and the Schemapiric View,” Science, 161 (August 30, 1968), 849–56.

10 For a discussion of the problem of “being arbitrary,” see Nunnally, J. C., Psychometric Theory (New York 1967), chap. 1.

11 One common practice is to convert the numerical values of variables into ordered ranks before computing measures of association. Such a transformation, presumably made because it somehow seems statistically more conservative (it is not), may throw away useful information in the data and may also discourage efforts at multivariate analysis. Another alternative is to employ some of the nonmetric multivariate methods.

12 The discussion here is necessarily rather brief. For more information, see Abelson, Robert P. and Tukey, John W., “Efficient Conversion of Non-Metric Information into Metric Information,” Proceedings of the Social Statistics Section of the American Statistical Association (Washington 1959), 226–30 (also in Tufte); Abelson and Tukey, “Efficient Utilization of Nonnumerical Information in Quantitative Analysis: General Theory and the Case of Simple Order,” Annals of Mathematical Statistics, 34 (December 1963), 1347–69; and Shepard, Roger N., “Metric Structures in Ordinal Data,” Journal of Mathematical Psychology, 3 (1966), 287–315.

13 Tukey, John, “Causation, Regression, and Path Analysis,” in Kempthorne, Oscar and others, eds., Statistics and Mathematics in Biology (Ames, Iowa 1954), 38.

14 See Forbes, Hugh Donald and Tufte, Edward R., “A Note of Caution in Causal Modelling,” American Political Science Review, LXII (December 1968), 1258–64, and further discussion at 1269–71.

15 See Wallis and Roberts, 546–56, on the hazards of ratios. The problem was discussed by Pearson, Karl, “Mathematical Contributions to the Theory of Evolution—On a Form of Spurious Correlation Which May Arise When Indices Are Used in the Measurement of Organs,” Proceedings of the Royal Society of London, LX (1897), 489–98. See also Kuh, Edwin and Meyer, John R., “Correlation and Regression Estimates When the Data are Ratios,” Econometrica, 23 (October 1955), 400–16; and Briggs, F.E.A., “The Influence of Errors on the Correlation of Ratios,” Econometrica, 30 (January 1962), 162–77.
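The hazard Pearson described can be illustrated with a small simulation. The sketch below is not taken from any of the works cited; the function names, the uniform distributions, and the ranges are assumptions chosen only for illustration. Three variables are drawn independently, so the raw correlation of x and y should be near zero, yet dividing both by the same denominator z induces a strong correlation between the two ratios.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ratio_correlation_demo(n=5000, seed=0):
    """Return (r of x and y, r of x/z and y/z) for independent x, y, z.

    The ranges are hypothetical: the shared denominator z is given much
    more relative spread than the numerators so the effect is easy to see.
    """
    rng = random.Random(seed)
    x = [rng.uniform(90, 110) for _ in range(n)]
    y = [rng.uniform(90, 110) for _ in range(n)]
    z = [rng.uniform(50, 150) for _ in range(n)]
    r_raw = pearson_r(x, y)
    r_ratio = pearson_r(
        [a / c for a, c in zip(x, z)],
        [b / c for b, c in zip(y, z)],
    )
    return r_raw, r_ratio


if __name__ == "__main__":
    r_raw, r_ratio = ratio_correlation_demo()
    print(f"r(x, y)     = {r_raw:.3f}")
    print(f"r(x/z, y/z) = {r_ratio:.3f}")
```

The size of the induced correlation depends on how variable the denominator is relative to the numerators; with a nearly constant z the effect largely disappears.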

16 Tukey, 35–66. Blalock, Hubert M. Jr., makes a similar argument in his “Causal Inferences, Closed Populations, and Measures of Association,” American Political Science Review, LXI (March 1967), 130–36. Blalock's application of the argument to the Miller-Stokes data, however, is a most inappropriate example. For some useful applications and contrasts between standardized and unstandardized regression coefficients, see Alker, Hayward R. Jr. and Russett, Bruce, “Multifactor Explanations of Social Change,” in Russett and others, World Handbook of Political and Social Indicators (New Haven 1964), 311–21.

17 Hayward Alker's “The Long Road to International Relations Theory: Problems of Statistical Nonadditivity,” World Politics, XVIII (July 1966), at 646–47, has a useful discussion of this point.

18 There are a number of other areas in which current packaged programs are deficient for the needs of social scientists. Two examples here serve to show that we must be careful even though the result came out of the computer. First, Longley, in a test of commonly used regression programs, found many inaccuracies in the output, including even the wrong sign attached to some coefficients! In this analysis of difficult but real test data (with highly collinear variables), several well-known programs proved accurate to only one or two digits in their estimates of regression coefficients. See Longley, James W., “An Appraisal of Least Squares Programs from the Point of View of the User,” Journal of the American Statistical Association, 62 (September 1967), 819–41. Second, many cross-tabulation programs have contributed to the frequent misuse of the chi-square test in the analysis of contingency tables. The test is not appropriate for ordered metrics. Of course, it is not entirely the fault of programs when their users dutifully report whatever the printout says.
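One way to see why the chi-square test suits ordered categories poorly, offered here as an illustration rather than as the author's own argument: the statistic is unchanged when the columns of the table are rearranged, so it makes no use of the ordering. The sketch below assumes a hypothetical 3 x 3 table of counts; the function names are invented for the example.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def permute_columns(table, order):
    """Return a copy of the table with its columns rearranged."""
    return [[row[j] for j in order] for row in table]


if __name__ == "__main__":
    # Hypothetical counts with a clear trend along ordered categories.
    ordered = [
        [20, 10, 5],
        [10, 10, 10],
        [5, 10, 20],
    ]
    # Scrambling the column order destroys the trend but not the statistic.
    scrambled = permute_columns(ordered, [1, 2, 0])
    print(chi_square(ordered), chi_square(scrambled))
```

Because the two tables yield the same statistic, a trend across ordered categories and an arbitrary scattering of the same cell counts are indistinguishable to the test.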

19 Tukey and Wilk, 12.

20 For converting nonlinear models into linear fit problems, see the useful book by Draper, N. R. and Smith, H., Applied Regression Analysis (New York 1966), chap. 5. The best place to learn about transformations is in the informative and straightforward essay by Kruskal, Joseph B., “Transformations of Data,” International Encyclopedia of the Social Sciences (New York 1968), vol. 16, 182–53.

21 See Alker and Russett, 311–13; also J. Johnston, Econometric Methods (New York 1963), 44–52.

22 See J. B. Kruskal and the references cited there. Another useful discussion is Hovland, Carl I., Lumsdaine, Arthur A., and Sheffield, Fred D., “A Baseline for Measurement of Percentage Change,” in Lazarsfeld, Paul and Rosenberg, Morris, eds., The Language of Social Research (Glencoe, Ill. 1955), 77–82.

23 Johnston, J., 201–07; Blalock, Hubert M. Jr., “Correlated Independent Variables: The Problem of Multicollinearity,” Social Forces, 42 (December 1963), 233–37; and Farrar, Donald E. and Glauber, Robert R., “Multicollinearity in Regression Analysis: The Problem Revisited,” Review of Economics and Statistics, 49 (February 1967), 92–107.

24 See Putnam, Robert D., “Toward Explaining Military Intervention in Latin American Politics,” World Politics, XX (October 1967), 94–95. The finding that economic development is positively correlated with military intervention after the effect of social mobilization is removed is unfortunately not testable because of the high instability of the partial correlation due to multicollinearity. Another example of the problem is discussed in Forbes and Tufte, 1262–64.
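The instability can be sketched with the standard first-order partial correlation formula. The correlations used below are hypothetical and are not Putnam's figures; the function name is likewise an assumption. When the two explanatory variables are correlated at 0.95, a shift of only 0.05 in one zero-order correlation reverses the sign of the partial, whereas at 0.20 the same shift barely matters.

```python
import math


def partial_r(r_y1, r_y2, r_12):
    """Correlation of y with variable 1, holding variable 2 constant.

    r_y1, r_y2: zero-order correlations of y with variables 1 and 2.
    r_12: correlation between the two explanatory variables.
    """
    numerator = r_y1 - r_y2 * r_12
    denominator = math.sqrt((1 - r_y2 ** 2) * (1 - r_12 ** 2))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical values: y correlates -0.50 with variable 2, and the
    # correlation of y with variable 1 is either -0.45 or -0.50.
    for r_12 in (0.95, 0.20):
        a = partial_r(-0.45, -0.50, r_12)
        b = partial_r(-0.50, -0.50, r_12)
        print(f"r_12 = {r_12:.2f}: partial r = {a:+.3f} versus {b:+.3f}")
```

The denominator shrinks as r_12 approaches 1, so small sampling or measurement fluctuations in the zero-order correlations are magnified in the partial.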

25 Johnston, 207. See Farrar and Glauber for discussion of some modest palliatives.

* I wish to thank Hayward Alker, Jr., Stanley Kelley, Jr., Gerald Kramer, John McCarthy, Susanne Mueller, Walter Murphy, Dennis Thompson, and John Tukey for their advice and criticism. I also thank Joseph Verbalis, who constructed the figures.

World Politics
  • ISSN: 0043-8871
  • EISSN: 1086-3338
  • URL: /core/journals/world-politics