A statistical method for the identification and aggregation of regional linguistic variation

Published online by Cambridge University Press: 05 August 2011

Jack Grieve
University of Leuven
Dirk Speelman
University of Leuven
Dirk Geeraerts
University of Leuven


This paper introduces a method for the analysis of regional linguistic variation. The method identifies individual and common patterns of spatial clustering in a set of linguistic variables measured across a set of locations by combining three statistical techniques: spatial autocorrelation, factor analysis, and cluster analysis. To demonstrate its application, the method is used to analyze regional variation in the values of 40 continuously measured, high-frequency lexical alternation variables in a 26-million-word corpus of letters to the editor representing 206 cities from across the United States.
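The abstract names spatial autocorrelation as the first of the three techniques but does not say which statistic or spatial weighting is used. The following is a minimal sketch, assuming the local Getis-Ord Gi* statistic of Ord and Getis (1995), great-circle distances from the Haversine formula, and a binary distance band in which each city also counts as its own neighbour. It shows how one variable's values at a set of locations can be turned into per-location z-scores: large positive scores mark clusters of high values, large negative scores clusters of low values. The function names, the 6,371 km Earth radius and the band width are hypothetical choices for illustration, not settings reported by the authors; the factor-analysis and cluster-analysis steps are not shown.

```python
import math

# Assumed mean Earth radius in km (illustrative choice, not from the paper).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def getis_ord_gi_star(values, coords, band_km):
    """Return a Gi* z-score for each location (hypothetical helper).

    values:  one linguistic variable measured at n locations.
    coords:  (lat, lon) pairs, one per location.
    band_km: assumed distance band; locations within it get weight 1,
             and each location is its own neighbour (the "*" in Gi*).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two locations")
    mean = sum(values) / n
    # Guard against tiny negative variances caused by floating-point rounding.
    variance = max(0.0, sum(v * v for v in values) / n - mean * mean)
    s = math.sqrt(variance)
    scores = []
    for i in range(n):
        weights = [
            1.0 if haversine_km(*coords[i], *coords[j]) <= band_km else 0.0
            for j in range(n)
        ]
        w_sum = sum(weights)
        w_sq_sum = sum(w * w for w in weights)
        # Observed local sum minus its expectation under no spatial clustering.
        numerator = sum(w * v for w, v in zip(weights, values)) - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        # Undefined cases (constant variable, or every location a neighbour) score 0.
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores


if __name__ == "__main__":
    # Two nearby pairs of points: one pair with high values, one with low values.
    coords = [(40.0, -75.0), (40.1, -75.1), (30.0, -95.0), (30.1, -95.1)]
    values = [10, 12, 1, 2]
    print([round(z, 3) for z in getis_ord_gi_star(values, coords, band_km=50.0)])
```

In this toy example the high-valued pair receives matching positive scores and the low-valued pair mirror-image negative scores. In a full analysis, one such score map per variable could then serve as input to the factor and cluster analyses the abstract describes.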

Research Article
Copyright © Cambridge University Press 2011

Supplementary material: Grieve supplementary material (PDF, 42.2 MB)