
A large-scale corpus study of phonological opacity in Uyghur

Published online by Cambridge University Press: 19 December 2025

Connor Mayer*
Affiliation:
Department of Language Science, University of California, Irvine, USA

Abstract

This article examines a case of phonological opacity in Uyghur resulting from an interaction between backness harmony and a vowel reduction process that converts harmonic vowels into transparent vowels. A large-scale corpus study shows that although opaque harmony with the underlying form of a reduced vowel is the dominant pattern, cases of surface-apparent harmony also occur. The rate of surface-apparent harmony varies across roots and is correlated with a number of factors, including root frequency. These data pose problems for standard accounts of opacity, which do not predict such variation. I propose an analysis where variation emerges from conflict between a paradigm uniformity constraint mandating that the harmonising behaviour of a root remains consistent, and surface phonotactic constraints. This is implemented in a parallel model by scaling constraint violations according to certainty in a root’s harmonic class. This aligns with past work suggesting some opacity is driven by paradigm uniformity.

Information

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Table 1 Harmonising segments in Uyghur

Table 2 Summary of corpora

Figure 1 Suffix harmony choice in tokens where the final root vowel raises, broken down by root class. Token counts are overlaid on each category.

Figure 2 Histograms showing the distribution of rates of back suffix application in BF and FB roots. Note that for raised BF roots, a back suffix constitutes surface-apparent harmony and a front suffix constitutes opaque harmony, while for raised FB roots, it is the opposite.

Figure 3 Suffix choice in raised BF roots broken down by root-final derivational suffix. ‘Other BF’ refers to BF roots that do not end in one of the three derivational suffixes. Token counts are overlaid on each category. The tokens of included here all have a preceding B vowel, as in ‘park’.

Table 3 Results from a mixed-effects logistic regression model whose coefficients were estimated using Bayesian inference. The 95% credible interval shows the central range in which 95% of the sample values occur. Credible intervals that do not contain zero are interpreted as a meaningful directional effect, and are marked with *.

Table 4 Mean log-likelihood (LL) for the training and test sets across each of the ten folds and the optimal value of $\sigma$. The lexical–surface model (in bold) obtains the best performance on the held-out test folds with the fewest constraints.

Figure B1 Plot of samples from the posterior for each model parameter. The dark areas are the mean values, and the shaded areas are the 95% credible intervals.

Figure B2 Output of DHARMa model checks run on the Bayesian logistic regression model.

Figure B3 Histogram comparing simulated model residual standard deviations against the residual standard deviation from the fitted model.

Table D1 Coefficients from the mixed-effects logistic regression model for approximating when it is fitted to the entire data set