
Linguistic Distances in Dialectometric Intensity Estimation

Published online by Cambridge University Press: 04 July 2014

Simon Pickl*
Affiliation:
University of Salzburg
Aaron Spettl
Affiliation:
Ulm University
Simon Pröll
Affiliation:
University of Augsburg
Stephan Elspaß
Affiliation:
University of Salzburg
Werner König
Affiliation:
University of Augsburg
Volker Schmidt
Affiliation:
Ulm University
* Address for correspondence: Simon Pickl, Fachbereich Germanistik, Universität Salzburg, Erzabt-Klotz-Str. 1, 5020 Salzburg, Austria, +43 (0)662 8044 4359. Email simon.pickl@sbg.ac.at

Abstract

Dialectometric intensity estimation, as introduced in Rumpf et al. (2009) and Pickl and Rumpf (2011, 2012), is a method for the unsupervised generation of maps that visualize geolinguistic data at the level of linguistic variables. It also extracts spatial information for subsequent statistical analysis. However, because intensity estimation involves geographically conditioned smoothing, the method can lead to undesirable results. Geolinguistically relevant structures such as rivers, political borders or enclaves are not taken into account, and their manifestations in the distributions of linguistic variants are therefore blurred. A possible solution to this problem, suggested and put to the test in this paper, is to use linguistic distances rather than geographical (Euclidean) distances in the estimation. This methodological adjustment leads to maps that render geolinguistic distributions more faithfully, especially in areas that are deemed critical for the interpretation of the resulting maps and for subsequent statistical analyses of the results.
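
To illustrate the idea of swapping the distance measure, the following is a minimal sketch, not the authors' implementation. It assumes a simple normalized kernel smoother with a Gaussian kernel; the function names, the toy distance matrices, and the bandwidth value are hypothetical and chosen only for illustration. Four locations lie on a line, with a barrier (for example a river) between the second and third; the geographical distances ignore the barrier, while the assumed linguistic distances reflect it.

```python
import math


def gaussian_kernel(u):
    """Gaussian kernel (an assumption; the paper's exact kernels are not given here)."""
    return math.exp(-0.5 * u * u)


def estimate_intensities(distances, observed, bandwidth):
    """Smooth the observed variants with kernel weights derived from `distances`.

    distances: square matrix (list of lists) of pairwise distances,
               geographical or linguistic.
    observed:  the variant recorded at each location.
    bandwidth: smoothing parameter h, in the units of `distances`.
    Returns, per location, a dict mapping each variant to a share in [0, 1].
    """
    variants = sorted(set(observed))
    n = len(observed)
    result = []
    for i in range(n):
        weights = [gaussian_kernel(distances[i][j] / bandwidth) for j in range(n)]
        total = sum(weights)
        result.append({
            v: sum(w for w, o in zip(weights, observed) if o == v) / total
            for v in variants
        })
    return result


# Hypothetical toy data: variant "A" west of the barrier, "B" east of it.
observed = ["A", "A", "B", "B"]

# Geographical distances: evenly spaced locations, barrier ignored.
geo = [[0, 1, 2, 3],
       [1, 0, 1, 2],
       [2, 1, 0, 1],
       [3, 2, 1, 0]]

# Assumed linguistic distances: large between locations on opposite sides.
ling = [[0, 1, 6, 7],
        [1, 0, 5, 6],
        [6, 5, 0, 1],
        [7, 6, 1, 0]]

geo_est = estimate_intensities(geo, observed, bandwidth=1.0)
ling_est = estimate_intensities(ling, observed, bandwidth=1.0)

# Share of variant "A" at the location just west of the barrier.
print(round(geo_est[1]["A"], 3), round(ling_est[1]["A"], 3))
```

In this toy setting, smoothing with geographical distances mixes a noticeable share of variant "B" into the location just west of the barrier, whereas smoothing with the assumed linguistic distances leaves the boundary almost intact. This is the kind of blurring the abstract describes.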

Information

Type
Articles
Copyright
Copyright © Cambridge University Press 2014 

Map 1 Map 63 from vol. 8 of the SBS, showing the distribution of the variants for ‘woodlouse’. (a) Point-symbol map; (b) Legend.


Map 2 Map 63 from vol. 8 of the SBS, intensity estimation, geographical distance, Level 2, K3, h=22 km (CL).


Map 3 Shape of the river Lech in the area under investigation in Voronoi rendering.


Figure 1 Scatter plot of geographical vs. lexical distances and logarithmic regression curve. In the visualization on the right, the location pairs that are separated by the river Lech are highlighted.


Figure 2 Scatter plots of geographical vs. linguistic (from top to bottom: morphological, phonetic, all) distances and logarithmic regression curves. In the visualizations on the right, the location pairs that are separated by the river Lech are highlighted.


Map 4 Map 63 from vol. 8 of the SBS, intensity estimation, lexical distance, Level 2, K3, h=0.55 (CL).


Figure 3 Average scores for lexical and geographical distances in an application of intensity estimation to 736 lexical maps across all locations and all variables (lower is better).


Map 5 Average scores for linguistic (blue) and geographical (red) distances in an application of intensity estimation to 736 lexical maps across all locations. The respective lower value (linguistic or geographical) was colour-coded at the individual locations, the colour intensity being higher for scores with a larger advantage over the respective other one. (a) KGauß, LCV; (b) KGauß, CL; (c) K3, LCV; (d) K3, CL.


Figure 4 Average scores for linguistic distances (dlex etc.) in an application of intensity estimation to the respective test corpus (lex etc.) across all locations and all variables (lower is better).


Figure 5 Average scores for linguistic distances (dlex etc.) in an application of intensity estimation to the lexical subcorpus (lex) across all locations and all variables (lower is better).


Figure 6 Average scores for linguistic distances (dlex etc.) in an application of intensity estimation to all test corpora (lex etc.) across all locations and all variables (lower is better).