A statistical method for the identification and aggregation of regional linguistic variation

Published online by Cambridge University Press: 05 August 2011

Jack Grieve
Affiliation:
University of Leuven
Dirk Speelman
Affiliation:
University of Leuven
Dirk Geeraerts
Affiliation:
University of Leuven

Abstract

This paper introduces a method for the analysis of regional linguistic variation. Combining three statistical techniques (spatial autocorrelation, factor analysis, and cluster analysis), the method identifies individual and common patterns of spatial clustering in a set of linguistic variables measured over a set of locations. To demonstrate its application, the method is used to analyze regional variation in the values of 40 continuously measured, high-frequency lexical alternation variables in a 26-million-word corpus of letters to the editor representing 206 cities from across the United States.
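
As a rough illustration of the first of the three techniques, the sketch below computes global Moran's I, one standard measure of spatial autocorrelation, for a single variable measured at a handful of locations. The abstract does not say which autocorrelation statistic or weighting scheme is used, so the choice of Moran's I, the binary neighbour weights, the function name `morans_i`, and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for one variable measured at n locations.

    values  : length-n sequence of measurements (one per location)
    weights : n x n spatial weights matrix with a zero diagonal
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                    # deviations from the overall mean
    num = (w * np.outer(z, z)).sum()    # weighted cross-products between locations
    den = (z ** 2).sum()                # total squared deviation
    return (len(x) / w.sum()) * (num / den)

# Hypothetical toy data: four locations on a line, each a neighbour of the
# locations directly beside it (binary weights are an assumption).
weights = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
clustered = [1.0, 1.0, 5.0, 5.0]    # similar values sit next to each other
alternating = [1.0, 5.0, 1.0, 5.0]  # neighbouring values always differ

print(morans_i(clustered, weights))
print(morans_i(alternating, weights))
```

In this toy setting the spatially clustered variable yields a positive value and the alternating variable a negative one, which is the kind of contrast an autocorrelation screen would rely on before any aggregation of variables by factor or cluster analysis.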

Type
Research Article
Copyright
Copyright © Cambridge University Press 2011

Grieve supplementary material

Maps.pdf

PDF 42 MB
