
A simple way to improve multivariate analyses of paleoecological data sets

Published online by Cambridge University Press: 24 February 2015

John Alroy*
Department of Biological Sciences, Macquarie University, New South Wales 2109, Australia.


Multivariate methods such as cluster analysis and ordination are basic to paleoecology, but the messy nature of fossil occurrence data often makes it difficult to recover clear patterns. A recently described faunal similarity index based on the Forbes coefficient improves results when its complement is employed as a distance metric. This index involves adding terms to the Forbes equation and ignoring one of the counts it employs (that of species found in neither of the samples under consideration). Analyses of simulated data matrices demonstrate its advantages. These matrices include large and small samples from two partially overlapping species pools. In a cluster analysis, the widely used Dice coefficient and the Euclidean distance metric both create groupings that reflect sample size, the Simpson index suggests large differences that do not exist, and the corrected Forbes index creates groupings based strictly on true faunal overlap. In a principal coordinates analysis (PCoA) the Forbes index almost removes the sample-size signal but other approaches create a second axis strongly dominated by sample size. Meanwhile, species lists of late Pleistocene mammals from the United States capture biogeographic signals that standard ordination methods do recover, but the adjusted Forbes coefficient spaces the points out more sensibly. Finally, when biome-scale lists for living mammals are added to the data set and extinct species are removed, correspondence analysis misleadingly separates out the biome lists, and PCoA based on the Dice coefficient places them to the edge of the cloud of fossil assemblage data points. PCoA based on the Forbes index places them in more reasonable positions. Thus, only the adjusted Forbes index is able to recover true biological patterns. These results suggest that the index may be useful in analyzing not only paleontological data sets but any data set that includes species lists having highly variable lengths.
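The sample-size effect described in the abstract can be illustrated with a minimal sketch of the underlying counts: a (species found in both samples), b (found only in the first), and c (found only in the second). The function names below are hypothetical, and the Forbes variant shown here only drops the count of species found in neither sample. The additional terms of the adjusted index are not given in the abstract and are not reproduced, so this is not the author's full index.

```python
def overlap_counts(x, y):
    """Return (a, b, c): species shared, only in x, only in y."""
    x, y = set(x), set(y)
    return len(x & y), len(x - y), len(y - x)

def dice(x, y):
    # Dice similarity: 2a / (2a + b + c). Assumes non-empty lists.
    a, b, c = overlap_counts(x, y)
    return 2 * a / (2 * a + b + c)

def simpson(x, y):
    # Simpson similarity: a / (a + min(b, c)). Assumes non-empty lists.
    a, b, c = overlap_counts(x, y)
    return a / (a + min(b, c))

def forbes_no_d(x, y):
    # Classic Forbes coefficient: a * n / ((a + b) * (a + c)).
    # Assumption for this sketch: n = a + b + c, i.e. species found in
    # neither sample are ignored. No further correction terms are applied.
    a, b, c = overlap_counts(x, y)
    n = a + b + c
    return a * n / ((a + b) * (a + c))

def distance(similarity, x, y):
    # Complement of a similarity index, used as a distance.
    return 1.0 - similarity(x, y)

# Hypothetical data: a large and a small sample from the same species pool,
# with the small list fully nested inside the large one.
large = set(range(40))
small = set(range(10))

print(dice(large, small))         # 0.4
print(simpson(large, small))      # 1.0
print(forbes_no_d(large, small))  # 1.0
print(distance(dice, large, small))
```

In this toy case the Dice value falls well below 1 purely because the two lists differ in length, while the Forbes form without the neither-sample count stays at 1 for fully nested lists. The example does not reproduce the paper's simulations or the Simpson behavior reported there.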

Copyright © 2015 The Paleontological Society. All rights reserved. 


