10 - Comparison of Component Models in Analysing the Distribution of Dialectal Features

Published online by Cambridge University Press: 12 September 2012

Antti Leino and Saara Hyvönen

Edited by John Nerbonne, Charlotte Gooskens, Sebastian Kürschner and Renée van Bezooijen

Affiliations:
Antti Leino: linguistic research institute
Saara Hyvönen: Helsinki University
John Nerbonne: University of Groningen
Charlotte Gooskens: University of Groningen
Sebastian Kürschner: Friedrich-Alexander-Universität Erlangen-Nürnberg
Renée van Bezooijen: University of Groningen

Summary

Abstract

Component models such as factor analysis can be used to analyse spatial distributions of a large number of different features – for instance the isogloss data in a dialect atlas, or the distributions of ethnological or archaeological phenomena – with the goal of finding dialects or similar cultural aggregates. However, there are several such methods, and it is not obvious how their differences affect their usability for computational dialectology. We attempt to tackle this question by comparing five such methods using two different dialectological data sets. There are some fundamental differences between these methods, and some of these have implications that affect the dialectological interpretation of the results.

INTRODUCTION

Languages are traditionally subdivided into geographically distinct dialects, although any such division is just a coarse approximation of a more fine-grained variation. This underlying variation is usually visualised in the form of maps, where the distribution of various features is shown as isoglosses. It is possible to view dialectal regions, in this paper also called simply dialects, as combinations of the distribution areas of these features, where the features have been weighted in such a way that the differences between the resulting dialects are as sharp as possible. Ideally, dialect borders are drawn where several isoglosses overlap.
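The idea of a dialect as a weighted combination of feature distributions can be illustrated with a minimal sketch. It uses principal component analysis computed with NumPy's singular value decomposition as one example of a component model; this is not necessarily one of the methods compared in the chapter. The data matrix, the function name `first_component` and the variable names are all hypothetical and exist only for illustration.

```python
import numpy as np

# Hypothetical toy data (not from the chapter): rows are localities,
# columns are dialect features (1 = attested, 0 = not attested).
presence = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=float)


def first_component(matrix):
    """Return (locality scores, feature weights) for the first principal component."""
    # Centre each feature column so the component describes variation
    # around the mean distribution of that feature.
    centred = matrix - matrix.mean(axis=0)
    # Singular value decomposition: rows of vt are unit-length feature weightings.
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    weights = vt[0]            # one weight per feature
    scores = u[:, 0] * s[0]    # one score per locality
    return scores, weights


scores, weights = first_component(presence)

# The overall sign of a component is arbitrary; only the contrast matters.
for locality, score in enumerate(scores):
    print(f"locality {locality}: score {score:+.2f}")
```

In this toy matrix the first three localities receive scores of one sign and the last three scores of the opposite sign, so the component separates two groups of localities, and the feature weights show which features drive that separation.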

As more and more dialectological data becomes available in electronic form, it is increasingly attractive to apply computational methods to this problem. One way to do this is to use clustering methods (e.g. Kaufman and Rousseeuw, 1990), especially as such methods have already been used in dialectometric studies (e.g. Heeringa and Nerbonne, 2002; Moisl and Jones, 2005).

Information

Type: Chapter
Information: Computing and Language Variation, International Journal of Humanities and Arts Computing Volume 2, pp. 173-188

Publisher: Edinburgh University Press

Print publication year: 2009
