Recognizing Sample-Selection Bias in Historical Data

  • Ariell Zimran


Recent research has ignited a debate in social science history over whether and how to draw conclusions for whole populations from sources that describe only select subsets of these populations. The idiosyncratic availability and survival of historical sources create a threat of sample-selection bias—an error that arises when there are systematic differences between the observed sample and the population of interest. This danger is common in studying trends in health as measured by average stature—scholars can often observe these trends only for soldiers and other similar groups; but whether these patterns are representative of those of the broader population is unclear. This article illustrates what simple patterns in a potentially selected sample can be used to recognize the presence of sample-selection bias in a source, and to understand how such bias might affect conclusions drawn from this source. Applying this intuition to the use of military data to describe stature in the antebellum United States, I present several simple empirical exercises based on these patterns. Finally, I use the results of these exercises to describe how sample-selection bias might affect the use of these data in testing for differences in average stature between the Northeast and the Midwest.
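
As a rough illustration of the concern described above, the following Python sketch simulates two hypothetical regions with identical true mean stature but different strengths of height-based selection into the observed sample. The function name `simulate_region`, the logistic inclusion rule, and all numeric values are assumptions made purely for illustration; none of them is taken from the article or its data.

```python
import math
import random
import statistics


def simulate_region(n, mean_height, sd_height, selection_slope, rng):
    """Draw a population and a height-selected sample from it.

    selection_slope is a hypothetical parameter: with a negative slope,
    taller individuals are less likely to appear in the observed sample.
    """
    population = [rng.gauss(mean_height, sd_height) for _ in range(n)]
    sample = []
    for h in population:
        # Logistic inclusion probability centred on the population mean.
        p_observed = 1.0 / (1.0 + math.exp(-selection_slope * (h - mean_height)))
        if rng.random() < p_observed:
            sample.append(h)
    return population, sample


rng = random.Random(0)
# Both regions share the same true mean stature (68 inches, an invented value);
# only the strength of selection into the sample differs.
pop_a, sample_a = simulate_region(50_000, 68.0, 2.5, -0.1, rng)
pop_b, sample_b = simulate_region(50_000, 68.0, 2.5, -0.6, rng)

true_gap = statistics.mean(pop_a) - statistics.mean(pop_b)
observed_gap = statistics.mean(sample_a) - statistics.mean(sample_b)
print(f"true gap:     {true_gap:+.2f} in")
print(f"observed gap: {observed_gap:+.2f} in")
```

Under these assumptions, the observed samples suggest a regional gap in stature even though the simulated populations have essentially the same mean, which is the kind of spurious conclusion that sample-selection bias can produce.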


