Published online by Cambridge University Press: 16 August 2018
Researchers in dialectometry have begun to explore measurements based on fundamentally quantitative metrics, often sourced from dialect corpora, as an alternative to the traditional signals derived from dialect atlases. This change of data type amplifies an existing issue in the classical paradigm, namely that locations may vary in coverage and that this affects the distance measurements: pairs involving a location with lower coverage suffer from greater noise and therefore imprecision. We propose a method for increasing robustness using generalized additive modeling, a statistical technique that allows leveraging the spatial arrangement of the data. The technique is applied to data from the British English dialect corpus FRED; the results are evaluated regarding their interpretability and according to several quantitative metrics. We conclude that data availability is an influential covariate in corpus-based dialectometry and beyond, and recommend that researchers be aware of this issue and of methods to alleviate it.
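The coverage problem described above can be illustrated with a small simulation. The sketch below is not the generalized additive modeling approach proposed here, and it does not use FRED data. It only assumes, for illustration, that two locations share the same underlying feature rate, so that any measured distance between them is pure sampling noise. The function names, the feature rate, and the corpus sizes are all hypothetical.

```python
import random
import statistics


def observed_rate(true_rate, n_tokens, rng):
    """Relative frequency of a feature in a simulated corpus of n_tokens words."""
    hits = sum(1 for _ in range(n_tokens) if rng.random() < true_rate)
    return hits / n_tokens


def mean_spurious_distance(n_a, n_b, true_rate=0.01, trials=100, seed=0):
    """Average absolute frequency difference between two simulated locations.

    Both locations share the same true_rate (an assumption of this sketch),
    so the true distance is zero and the returned value reflects noise only.
    """
    rng = random.Random(seed)
    distances = [
        abs(observed_rate(true_rate, n_a, rng) - observed_rate(true_rate, n_b, rng))
        for _ in range(trials)
    ]
    return statistics.mean(distances)


if __name__ == "__main__":
    # Hypothetical corpus sizes: two well-covered locations versus a pair
    # in which one location has only a small amount of text.
    well_covered = mean_spurious_distance(5000, 5000)
    poorly_covered = mean_spurious_distance(5000, 250)
    print(f"both locations well covered: {well_covered:.5f}")
    print(f"one location poorly covered: {poorly_covered:.5f}")
```

Under these assumptions, the pair that includes the small corpus shows a larger average distance even though the two locations do not differ, which is the kind of imprecision that a coverage-aware method aims to reduce.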