Book contents
- Frontmatter
- Contents
- From the Editors
- Notes on Contributors
- 1 Introduction: Language Variation Studies and Computational Humanities
- 2 Panel Discussion on Computing and the Humanities
- 3 Making Sense of Strange Sounds: (Mutual) Intelligibility of Related Language Varieties. A Review
- 4 Phonetic and Lexical Predictors of Intelligibility
- 5 Linguistic Determinants of the Intelligibility of Swedish Words among Danes
- 6 Mutual Intelligibility of Standard and Regional Dutch Language Varieties
- 7 The Dutch-German Border: Relating Linguistic, Geographic and Social Distances
- 8 The Space of Tuscan Dialectal Variation: A Correlation Study
- 9 Recognising Groups among Dialects
- 10 Comparison of Component Models in Analysing the Distribution of Dialectal Features
- 11 Factor Analysis of Vowel Pronunciation in Swedish Dialects
- 12 Representing Tone in Levenshtein Distance
- 13 The Role of Concept Characteristics in Lexical Dialectometry
- 14 What Role does Dialect Knowledge Play in the Perception of Linguistic Distances?
- 15 Quantifying Dialect Similarity by Comparison of the Lexical Distribution of Phonemes
- 16 Corpus-based Dialectometry: Aggregate Morphosyntactic Variability in British English Dialects
10 - Comparison of Component Models in Analysing the Distribution of Dialectal Features
Published online by Cambridge University Press: 12 September 2012
Summary
Abstract: Component models such as factor analysis can be used to analyse spatial distributions of a large number of different features – for instance the isogloss data in a dialect atlas, or the distributions of ethnological or archaeological phenomena – with the goal of finding dialects or similar cultural aggregates. However, there are several such methods, and it is not obvious how their differences affect their usability for computational dialectology. We attempt to tackle this question by comparing five such methods using two different dialectological data sets. There are some fundamental differences between these methods, and some of these have implications that affect the dialectological interpretation of the results.
INTRODUCTION
Languages are traditionally subdivided into geographically distinct dialects, although any such division is just a coarse approximation of a more fine-grained variation. This underlying variation is usually visualised in the form of maps, where the distribution of various features is shown as isoglosses. It is possible to view dialectal regions, in this paper also called simply dialects, as combinations of the distribution areas of these features, where the features have been weighted in such a way that the differences between the resulting dialects are as sharp as possible. Ideally, dialect borders are drawn where several isoglosses overlap.
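The idea of a dialect as a weighted combination of feature distributions can be illustrated with a minimal sketch. It assumes a small, invented locations-by-features matrix and uses plain principal component analysis (computed through a singular value decomposition) as a stand-in component model; it is not necessarily one of the five methods compared in the chapter, and the function name `first_component` and the toy data are hypothetical.

```python
import numpy as np

def first_component(X):
    """Return (location_scores, feature_weights) for the first principal component.

    X is a locations-by-features matrix; 1 means the feature is attested
    at that location, 0 means it is not.
    """
    Xc = X - X.mean(axis=0)              # centre each feature column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    weights = Vt[0]                      # one weight per feature
    scores = Xc @ weights                # one score per location
    return scores, weights

# Hypothetical toy data: 6 locations (rows) x 4 features (columns).
# Features 0 and 1 share a distribution area, feature 3 is their complement,
# and feature 2 only partly follows the same division.
X = np.array([
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
], dtype=float)

scores, weights = first_component(X)

# The sign of each location's score splits the locations into two candidate
# dialect areas; the overall sign of a component is arbitrary.
groups = scores > 0
print("feature weights:", np.round(weights, 3))
print("location scores:", np.round(scores, 3))
print("group membership:", groups)
```

In this toy case the features whose distribution areas coincide receive the largest weights (in absolute value), and the locations separate into two groups along the line where those distribution areas meet, which loosely mirrors the notion of a border drawn where several isoglosses overlap.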
As more and more dialectological data becomes available in electronic form, it is increasingly attractive to apply computational methods to this problem. One way to do this is to use clustering methods (e.g. Kaufman and Rousseeuw, 1990), especially as such methods have already been used in dialectometric studies (e.g. Heeringa and Nerbonne, 2002; Moisl and Jones, 2005).
- Type: Chapter
- Information: Computing and Language Variation, International Journal of Humanities and Arts Computing, Volume 2, pp. 173-188
- Publisher: Edinburgh University Press
- Print publication year: 2009