A statistical method for the identification and aggregation of regional linguistic variation

Jack Grieve, Dirk Speelman and Dirk Geeraerts
Abstract

This paper introduces a method for the analysis of regional linguistic variation. The method identifies individual and common patterns of spatial clustering in a set of linguistic variables measured over a set of locations based on a combination of three statistical techniques: spatial autocorrelation, factor analysis, and cluster analysis. To demonstrate how to apply this method, it is used to analyze regional variation in the values of 40 continuously measured, high-frequency lexical alternation variables in a 26-million-word corpus of letters to the editor representing 206 cities from across the United States.
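As a rough illustration of the first of the three techniques, spatial autocorrelation, the sketch below computes global Moran's I for one variable measured over a set of locations. The abstract does not say which autocorrelation statistic or spatial weighting scheme the method uses, so both choices here are assumptions: the statistic is global Moran's I, and the weights are binary, linking two locations when their planar distance falls within a cutoff. The function names `binary_weights` and `morans_i` and the toy coordinates are hypothetical.

```python
import math


def binary_weights(coords, cutoff):
    """Build a binary spatial weights matrix (assumed scheme, not from the paper).

    Two distinct locations are neighbours (weight 1.0) when their planar
    Euclidean distance is at most `cutoff`; otherwise the weight is 0.0.
    """
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a location is never its own neighbour
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            if math.hypot(dx, dy) <= cutoff:
                weights[i][j] = 1.0
    return weights


def morans_i(values, weights):
    """Global Moran's I: I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2).

    z_i is the deviation of location i from the mean and S0 is the sum of
    all weights. Positive values indicate that neighbouring locations tend
    to have similar values (spatial clustering).
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    denominator = sum(d * d for d in z)
    if s0 == 0 or denominator == 0:
        raise ValueError("Moran's I is undefined without neighbours or variance")
    numerator = sum(
        weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * (numerator / denominator)


# Toy example: four locations on a line, each linked to its adjacent locations.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
w = binary_weights(coords, cutoff=1.0)
smooth = morans_i([1.0, 2.0, 3.0, 4.0], w)       # values change gradually
alternating = morans_i([1.0, 4.0, 1.0, 4.0], w)  # neighbours always differ
print(smooth, alternating)
```

In this toy case the gradually changing variable yields a positive value (about 0.33) and the alternating variable yields -1.0. In a setting like the one described in the abstract, a statistic of this kind would be computed once per linguistic variable before common patterns are sought across variables with factor analysis and cluster analysis.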

Language Variation and Change
  • ISSN: 0954-3945
  • EISSN: 1469-8021
  • URL: /core/journals/language-variation-and-change
Supplementary materials

Grieve supplementary material: Maps.pdf (PDF, 42.2 MB)
