
Recognizing Sample-Selection Bias in Historical Data

Published online by Cambridge University Press: 06 July 2020

Ariell Zimran*
Vanderbilt University and the National Bureau of Economic Research


Recent research has ignited a debate in social science history over whether and how to draw conclusions for whole populations from sources that describe only select subsets of these populations. The idiosyncratic availability and survival of historical sources create a threat of sample-selection bias, an error that arises when there are systematic differences between the observed sample and the population of interest. This danger is common in studying trends in health as measured by average stature: scholars can often observe these trends only for soldiers and other similar groups, but whether these patterns are representative of those of the broader population is unclear. This article illustrates which simple patterns in a potentially selected sample can be used to recognize the presence of sample-selection bias in a source and to understand how such bias might affect conclusions drawn from this source. Applying this intuition to the use of military data to describe stature in the antebellum United States, I present several simple empirical exercises based on these patterns. Finally, I use the results of these exercises to describe how sample-selection bias might affect the use of these data in testing for differences in average stature between the Northeast and the Midwest.

Special Issue Article
© The Author(s), 2020. Published by Cambridge University Press on behalf of the Social Science History Association


