Articulation of vowel length contrasts in Australian English

Acoustic studies have shown that in Australian English (AusE), vowel length contrasts are realised through temporal, spectral and dynamic characteristics. However, relatively little is known about the articulatory differences between long and short vowels in this variety. This study investigates the articulatory properties of three long–short vowel pairs in AusE: /iː–ɪ/ beat – bit, /ɐː–ɐ/ cart – cut and /oː–ɔ/ port – pot, using electromagnetic articulography. Our findings show that short vowel gestures had shorter durations and more centralised articulatory targets than their long equivalents. Short vowel gestures also had proportionately shorter periods of articulatory stability and proportionately longer articulatory transitions to following consonants than long vowels. Long–short vowel pairs varied in the relationship between their acoustic duration and the similarity of their articulatory targets: /iː–ɪ/ had more similar acoustic durations and less similar articulatory targets, while /ɐː–ɐ/ were distinguished by greater differences in acoustic duration and more similar articulatory targets. These data suggest that the articulation of vowel length contrasts in AusE may be realised through a complex interaction of temporal, spatial and dynamic kinematic cues.


Introduction
The acoustic characterisation of vowel length contrasts in Australian English (AusE) has been clearly documented. Vowel length contrasts in this variety are realised through temporal (Bernard 1967, Cochrane 1970, Fletcher & McVeigh 1993, Cox 2006, Cox & Palethorpe 2011, Cox, Palethorpe & Miles 2015), spectral (Bernard 1970, Cox 2006, Elvin, Williams & Escudero 2016) and dynamic characteristics (Harrington & Cassidy 1994, Watson & Harrington 1999, Cox 2006, Elvin et al. 2016). Less is known about the articulatory characteristics of vowel length contrasts in AusE. Fletcher, Harrington & Hajek (1994) compared jaw displacement in the long-short vowel pair /ɐː–ɐ/ (barb – bub) in /bVb/ syllables for three speakers, and found that /ɐː/ was consistently characterised by a lower target jaw position than /ɐ/. Blackwood Ximenes, Shaw & Carignan (2017) examined the articulation of a subset of AusE vowels produced in /sVd/ context by four speakers, and found that the average tongue dorsum position of /ɪ/ was lower and more retracted than that of its long equivalent /iː/. The present study expands on previous articulographic work by examining the lingual articulation of multiple long-short vowel pairs in AusE, allowing us to characterise the kinematic properties that underlie the realisation of this contrast.

Phonetic correlates of vowel length contrast
The primary cue to vowel length contrasts in languages such as AusE is vowel duration. Long vowels are prototypically produced with a greater acoustic duration than short vowels (Lehiste 1970, Lindau 1978). The acoustic duration of vowels is commonly measured from the onset to the offset of vowel voicing (House 1961, Lehiste & Peterson 1961, Bell-Berti & Harris 1981, Hertrich & Ackermann 1997). This measure depends on the duration of the laryngeal activity associated with vowel articulation (Bell-Berti & Harris 1981, Hertrich & Ackermann 1997). However, the durations of the supralaryngeal articulatory movements of the lips, jaw and tongue have been relatively understudied. Hertrich & Ackermann (1997) examined the duration of lip-opening gestures associated with German vowels, finding that, on average, the lip-opening movements of short vowels were approximately 80% of the duration of those of long vowels, while the acoustic duration of short vowels was 60% of that of long vowels. These results demonstrate that while the length-related durational contrast is specified across multiple articulators (e.g. lips, larynx), it appears to be specified differently across these articulators. It remains an open question whether such differences between the acoustic and articulatory characteristics of vowel duration occur in other languages.
In Dutch, English, German and Swedish, long/tense and short/lax vowels¹ often also differ with regard to their position in the vowel space, with the acoustic and articulatory targets of short vowels produced closer to the centre of the vowel space than those of their long equivalents (Lindblom 1963, Hadding-Koch & Abramson 1964, Lindau 1978, Nooteboom & Doodeman 1980, Jessen 1993, Hoole & Mooshammer 2002, Cox 2006, Harrington, Hoole & Reubold 2012, Elvin et al. 2016). Early accounts of vowel quality differences between long and short vowels proposed a physiological explanation, whereby the centralisation of short vowel targets was attributed to biomechanical limitations on achieving the same phonological target as their long equivalents in a shorter time span (Lindblom 1963). In this undershoot account, the primary determinant of centralisation is vowel duration: the shorter the vowel, the more centralised its target (Lindblom 1963). However, vowel quality may be manipulated independently of vowel duration in the realisation of vowel length contrasts. In German unstressed syllables, short (lax) vowels are centralised but not shorter in duration than long vowels (Mooshammer & Fuchs 2002, Mooshammer & Geng 2008). Furthermore, listeners appear to use both vowel quality differences and durational differences as cues to vowel length contrasts (Delattre 1962, Hadding-Koch & Abramson 1964, Mooshammer & Fuchs 2002, Gussenhoven 2007, Mády & Reichel 2007, Mooshammer & Geng 2008, Lehnert-LeHouillier 2010, Meister, Werner & Meister 2011, Tomaschek, Truckenbrodt & Hertrich 2015). Vowel quality differences and durational differences appear to be in a trading relationship in some languages: listeners rely less on durational cues to vowel length when presented with stimuli in which long-short vowel quality differences are exaggerated, and rely more on durational cues when quality differences are minimised (Delattre 1962, Hadding-Koch & Abramson 1964, Lehnert-LeHouillier 2010).
Vowel length contrasts are also characterised by differences in formant dynamics. The proportionate durations of three acoustic components (the acoustic onglide, the acoustic steady-state or target, and the acoustic offglide) have been shown to differ between long and short vowels. In American English (Lehiste & Peterson 1961), Canadian English (Nearey & Assmann 1986), German (Strange & Bohn 1998) and AusE (Bernard 1967, Watson & Harrington 1999, Cox 2006), short vowels have proportionately shorter acoustic steady-states and proportionately longer acoustic offglides than their long counterparts. These differences have also been observed in articulation, with short vowels in German and Slovak exhibiting proportionately shorter articulatory steady states than their long equivalents (Kroos et al. 1997, Hoole & Mooshammer 2002, Beňuš 2011). In German, short vowels also exhibit proportionately longer release intervals (the articulatory transition to following tautosyllabic consonants) than long vowels (Kroos et al. 1997, Hoole & Mooshammer 2002). Little is known about the dynamic articulatory properties of AusE vowels, or whether AusE exhibits vowel-length-dependent patterns of articulation similar to those found in German.

Duration
On average, AusE short vowels are 60% of the duration of their long equivalents in voiced coda contexts (Cox 2006, Elvin et al. 2016). This contrast is larger than the tense/lax contrast of General American English, in which lax vowels are approximately 75% of the duration of their tense equivalents (Peterson & Lehiste 1960, House 1961). Relative durational differences are consistent across various phonetic contexts (Elvin et al. 2016). However, as in other English dialects, the absolute duration of AusE vowels is affected by vowel height (House 1961, Chen 1970, Cochrane 1970, Klatt 1976, Cox 2006, Elvin et al. 2016). In citation-form /hVd/ context, /ɪ/ has a shorter absolute duration (∼140 ms) than both the short open vowel /ɐ/ (∼160 ms) and the short mid-open vowel /ɔ/ (∼170 ms) (Cox 2006). No previous studies of AusE have examined the duration of lingual activity associated with vowels, so it is not known how these durational contrasts manifest in the articulatory domain.

Target quality
Although duration is the primary cue to vowel length in AusE (Harrington & Cassidy 1994, Watson & Harrington 1999, Cox 2006), long and short vowel pairs also differ in target quality, with some vowel pairs intrinsically more spectrally and spatially differentiated than others.
/iː–ɪ/ also share similar acoustic vowel targets. Cox (2006) found no significant difference in the mean F1 and F2 values of /iː/ (F1 = 391 Hz, F2 = 2729 Hz) and /ɪ/ (F1 = 402 Hz, F2 = 2697 Hz) produced by adolescent females in the 1990s. However, studies based on more recent acoustic data suggest that, for young AusE speakers, /ɪ/ is marginally lower and more retracted than /iː/ (Cox, Palethorpe & Bentink 2014). These acoustic results are supported by recent articulatory work showing that /ɪ/ is produced with a significantly more retracted and lowered tongue dorsum than /iː/ (Blackwood Ximenes et al. 2017).
Unlike /ɐː–ɐ/ and /iː–ɪ/, /oː/ and /ɔ/ can be differentiated through their target formant values alone, independent of durational information (Bernard 1970, Watson & Harrington 1999, Cox 2006, Elvin et al. 2016, Cox & Fletcher 2017). The primary difference between /oː/ and /ɔ/ is in target F1 (/oː/ = 494 Hz, /ɔ/ = 708 Hz), although the pair also differs in F2 (/oː/ = 954 Hz, /ɔ/ = 1182 Hz; Cox 2006). Early articulatory analysis of /oː/ and /ɔ/ showed a clear differentiation of target tongue position for this pair (Bernard 1970); however, recent articulatory analyses highlight that the tongue dorsum positions at the targets of /oː/ and /ɔ/ overlap to a much larger degree than their target F1 and F2 values would suggest: /oː/ is articulated with a similar tongue dorsum height and a slightly more retracted posture than /ɔ/ (Blackwood Ximenes, Shaw & Carignan 2016, 2017; Ratko et al. 2016). Differences in lip rounding may therefore also contribute to the F1 and F2 differences between /oː/ and /ɔ/. Blackwood Ximenes et al. (2017) observed that long /oː/ had a greater degree of lip protrusion than short /ɔ/ in three out of four recorded participants. More research is needed to determine whether such lip rounding differences are also present in other samples of AusE speakers.
Collectively, this work suggests that different long-short vowel pairs may vary in the extent to which vowel length contrast is expressed by temporal (duration) or spectral/spatial (target formant values or target tongue position) information (Watson & Harrington 1999, Cox 2006).
This study will focus on the articulation of three long-short vowel pairs: /iː–ɪ/, /ɐː–ɐ/ and /oː–ɔ/.³ These pairs are distributed across three peripheral areas of the AusE vowel space (Figure 1). /iː–ɪ/ beat – bit are considered to contrast primarily in vowel length, although /iː/ additionally has an onglide that contributes to the contrast (Cox 2006, Cox & Palethorpe 2007). /ɐː–ɐ/ cart – cut contrast primarily in length. The third pair, /oː–ɔ/ port – pot, is distinguishable by acoustic height in addition to length (Cox 2006), but has a high degree of lingual articulatory similarity (Blackwood Ximenes et al. 2016, 2017; Ratko et al. 2016).

Aims and predictions
The aim of this paper is to provide an empirical investigation of the lingual articulation of vowel length contrasts in AusE. The present study builds upon a largely acoustic description of AusE vowels. The few prior articulatory studies of AusE vowels have either not focused on length contrasts (Bernard 1967, Blackwood Ximenes et al. 2017) or have examined only a single long-short vowel pair (Fletcher et al. 1994). We make the following predictions:
1. Durational differences in the lingual gestures (gesture onset to gesture offset) of long and short vowels should follow patterns similar to acoustic duration differences, with short vowel gestures having a shorter duration than long vowel gestures (Cox 2006, Elvin et al. 2016). However, the magnitude of durational differences between long and short vowels should be reduced in the articulatory domain, as has been found for German (Hertrich & Ackermann 1997).
2. Although all long-short vowel pairs should exhibit similar articulatory targets, the degree of similarity is predicted to differ by vowel pair. The low vowel pair /ɐː–ɐ/ will have the most similar articulatory targets, whereas /iː–ɪ/ and /oː–ɔ/ will have less similar pairwise articulatory targets (Bernard 1970, Cox 2006, Elvin et al. 2016).
3. There will be a trading relationship between acoustic duration and spatial and kinematic differences in the realisation of vowel length contrast (Delattre 1962, Hadding-Koch & Abramson 1964, Lehnert-LeHouillier 2010). That is, the long-short pair with the most similar articulatory targets will exhibit the largest pairwise difference in acoustic duration, whereas the long-short pair with the least similar articulatory targets will exhibit the smallest difference in acoustic duration. This contrasts with Lindblom's (1963) target undershoot account, which predicts that the vowel pair with the least similar articulatory targets would exhibit the largest durational differences.
4. /oː/ will be produced with more lip rounding than /ɔ/.
5. In line with acoustic studies (Watson & Harrington 1999, Cox 2006, Elvin et al. 2016), long and short vowel gestures will be characterised by different dynamic articulatory patterns. Short vowels should exhibit a proportionately shorter period of articulatory stability around their mid-points and proportionately longer articulatory transitions to following consonants. However, the long vowel /iː/ should be characterised by the lengthy phonological onglide characteristic of AusE (Cox 2006).

Participants
Participants were seven monolingual speakers of AusE (four females). Average age was 20.4 years (s.d. = 2.82). All participants were born in Australia and had at least one Australian-born parent. All reported no history of speech or hearing disorders. All received primary and secondary education within New South Wales, and were residents of the Greater Sydney region at the time of recording.

Experiment materials
Vowel pairs /iː–ɪ/, /ɐː–ɐ/ and /oː–ɔ/ were elicited in two symmetrical consonant contexts: /pVp/ and /tVt/ (Table 1). Consonant context conditions the duration, quality and formant dynamics of vowels (Stevens & House 1963, Klatt 1976, Jenkins, Strange & Edman 1983, Strange, Jenkins & Johnson 1983, Sussman et al. 1997, Strange & Bohn 1998, Hillenbrand, Clark & Nearey 2001, Strange et al. 2007, Pycha & Dahan 2016). We therefore included two consonant contexts to better determine which intrinsic differences between long and short vowels are maintained across consonant contexts and which are contingent upon the identity of the surrounding consonants. The experiment was designed to carefully control for the effects of phonetic context on vowel articulation, which necessitated a combination of words and non-words. Studies have shown that participants may hyperarticulate novel or unfamiliar words (Umeda 1975, Klatt 1976, Fowler & Housom 1987). To minimise potential influences of lexical status and familiarity on articulation, all participants undertook two practice sessions prior to recording.
A carrier phrase was used to create an antagonistic tongue dorsum position before and after the target item. /iː–ɪ/ were presented within the carrier phrase Star CVC heart /stɐ CVC hɐːt/, and /ɐː–ɐ/ and /oː–ɔ/ within the carrier phrase See CVC heat /siː CVC hiːt/, with focus on the target word. Prior to recording, all participants were familiarised with the elicitation materials and instructed to read them aloud. If a participant pronounced a target word incorrectly or was unsure how to pronounce it, they were shown a written word that rhymed with the desired pronunciation. Participants then read each phrase from orthographic presentation on a computer screen in a sound-attenuated room. Presentation was self-paced.
The 12 target words (Table 1) were divided into two blocks: block one consisted of target words containing /iː/ and /ɪ/, and block two of target words containing /ɐː/, /ɐ/, /oː/ and /ɔ/. Target words were randomised within blocks. Ten repetitions of each word within its carrier phrase (120 items) were elicited from each participant. Participant W1 terminated the experiment early, resulting in only eight repetitions for that participant.

Data acquisition
Articulatory data were recorded using a Northern Digital Inc. Wave Electromagnetic Articulography (EMA) system (Northern Digital Inc. 2016) at a sampling rate of 100 Hz. The placement of sensors is shown in Figure 2. Three lingual sensors were placed at the (1) tongue tip (∼6 mm from the anatomical tongue tip), (2) tongue body (∼22 mm from the tongue tip) and (3) tongue dorsum (∼40 mm from the tongue tip). Sensors were also placed on the (4) upper lip, (5) lower lip and (6) lower gum line, to track jaw height. Reference sensors were placed on the (7) nasion and on the protrusions of the (8) left and (9) right mastoid processes. Speech audio was recorded using a RØDE NT1-A shotgun microphone at a sampling rate of 22050 Hz.

Data processing
Articulatory sensor signals were corrected for head movement and rotated to a common coordinate system defined with respect to the rear of the upper incisors using the three reference sensors.For the analysis presented in this study we used data from the tongue dorsum (TD) sensor (Sensor 3 in Figure 2).The TD sensor was chosen as it exhibited the greatest displacement during vowel gesture production for all participants and vowel pairs (see Appendix Table A1).Articulatory signals were low-pass filtered and conditioned using a DCT-based discretised smoothing spline (Garcia 2010) and synchronised with the audio data.
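The DCT-based smoothing step can be illustrated with a minimal sketch of the penalised least-squares smoother described by Garcia (2010). It assumes evenly sampled data and a fixed, hand-chosen smoothing parameter `s`; Garcia's method can also select `s` automatically (e.g. by generalised cross-validation), which this sketch omits, and the value used in the present study is not stated here. The function names `dct2_matrix` and `dct_smooth` are hypothetical.

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def dct_smooth(y, s):
    """Smooth an evenly sampled 1-D signal (e.g. one TD coordinate at 100 Hz)
    with a fixed smoothing parameter s >= 0, after Garcia (2010)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    c = dct2_matrix(n)
    # Eigenvalues of the second-difference operator with reflective boundaries
    lam = -2.0 + 2.0 * np.cos(np.arange(n) * np.pi / n)
    # Per-coefficient shrinkage: high-frequency DCT components are damped most
    gamma = 1.0 / (1.0 + s * lam ** 2)
    return c.T @ (gamma * (c @ y))
```

Each sensor coordinate (e.g. TD_x and TD_y) would be smoothed separately, with larger `s` giving a smoother trajectory. The explicit matrix costs O(n²), which is adequate for token-length signals; a fast DCT routine would be preferable for long recordings.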

Acoustic segmentation
Two acoustic landmarks were identified for each vowel: acoustic onset (Figure 3: (A)) and acoustic offset (Figure 3: (B)). In each recording, RMS energy was calculated in 20 ms Hamming-windowed intervals with 75% overlap over a 1.5-second interval centred on the target vowel. Working outwards from the RMS energy peak, we located the first points before and after the peak at which signal energy fell below 0.5% of maximum RMS energy (Tiede 2005). These estimates were superimposed on time-aligned waveforms and short-time spectrograms plotted up to 10000 Hz, then inspected and, where necessary, manually adjusted by a trained phonetician (approximately 5% of tokens). Vowel acoustic duration (AcDur) was calculated as the difference between the acoustic limits (B − A).
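The automatic onset/offset estimate described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the MVIEW code: the hop size is derived from the 75% overlap (about 5 ms for 20 ms windows), RMS is computed on Hamming-weighted frames, the 0.5% threshold is applied to frame RMS values, and landmark times are frame centres. The function names are hypothetical.

```python
import numpy as np

def rms_track(signal, fs, win_ms=20.0, overlap=0.75):
    """Frame-wise RMS of Hamming-weighted frames; returns frame-centre times (s) and RMS."""
    win = int(round(fs * win_ms / 1000.0))
    hop = max(1, int(round(win * (1.0 - overlap))))
    w = np.hamming(win)
    starts = np.arange(0, len(signal) - win + 1, hop)
    rms = np.array([np.sqrt(np.mean((signal[s:s + win] * w) ** 2)) for s in starts])
    return (starts + win / 2.0) / fs, rms

def acoustic_limits(signal, fs, threshold=0.005):
    """Walk outwards from the RMS peak; return the times of the outermost frames
    whose RMS is still at or above threshold * peak RMS (estimates of A and B)."""
    times, rms = rms_track(np.asarray(signal, float), fs)
    peak = int(np.argmax(rms))
    floor = threshold * rms[peak]
    on = peak
    while on > 0 and rms[on - 1] >= floor:
        on -= 1
    off = peak
    while off < len(rms) - 1 and rms[off + 1] >= floor:
        off += 1
    return times[on], times[off]
```

For a 1.5-second excerpt sampled at 22050 Hz, `on, off = acoustic_limits(excerpt, 22050)` would give candidate A and B values, and `(off - on) * 1000` an AcDur estimate in ms, before any manual correction.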

Articulatory analysis
Acoustic and articulatory landmarks are illustrated for tokens of parp and pup in Figure 3. The top panel shows the acoustic waveform, the middle panel the velocity of the tongue dorsum (TD) sensor and the bottom panel the TD trajectory. For simplicity, velocity and displacement are shown only in the vertical dimension; however, measurements were based on the tangential velocity of the TD sensor in both the horizontal (TD_x) and vertical (TD_y) dimensions. A trained phonetician located a lingual vowel gesture in each target word using the findgest algorithm in the MATLAB-based software package MVIEW (Tiede 2005). The findgest algorithm uses the tangential velocity of a given sensor to locate several gesture landmarks relative to the two velocity peaks of the gesture, P1 and P2 (Figure 3). Gestural onset (GONS) was the point before P1 where velocity dropped to 20% of P1 velocity; nucleus onset (NONS) was the point after P1 where velocity dropped to 20% of P1 velocity; nucleus offset (NOFFS) was the point before P2 where velocity dropped to 20% of P2 velocity; and gestural offset (GOFFS) was the point after P2 where velocity dropped to 15% of P2 velocity. Vowel gesture duration (GDur) spanned from vowel gesture onset (GONS) to vowel gesture offset (GOFFS).
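A minimal sketch of the velocity-threshold logic described above follows. It assumes that the sample indices of the two velocity peaks P1 and P2 have already been identified (for example, by the phonetician), and that MAXC lies at the velocity minimum between them. It approximates, but is not, the MVIEW findgest implementation; the function names are hypothetical.

```python
import numpy as np

def tangential_velocity(x, y, fs=100.0):
    """Speed of a sensor in the midsagittal plane (position units per second)."""
    dt = 1.0 / fs
    return np.hypot(np.gradient(x, dt), np.gradient(y, dt))

def gesture_landmarks(vel, p1, p2, inner=0.20, goffs_frac=0.15):
    """Velocity-threshold landmarks around peaks p1 (towards the vowel target)
    and p2 (away from it). All values are sample indices."""
    t1, t2 = inner * vel[p1], inner * vel[p2]
    gons = p1
    while gons > 0 and vel[gons] > t1:          # GONS: before P1, 20% of P1
        gons -= 1
    nons = p1
    while nons < p2 and vel[nons] > t1:         # NONS: after P1, 20% of P1
        nons += 1
    noffs = p2
    while noffs > nons and vel[noffs] > t2:     # NOFFS: before P2, 20% of P2
        noffs -= 1
    goffs = p2
    while goffs < len(vel) - 1 and vel[goffs] > goffs_frac * vel[p2]:  # GOFFS: 15% of P2
        goffs += 1
    maxc = p1 + int(np.argmin(vel[p1:p2 + 1]))  # assumption: MAXC at the velocity minimum
    return {"GONS": gons, "NONS": nons, "MAXC": maxc, "NOFFS": noffs, "GOFFS": goffs}

def gesture_duration_ms(lm, fs=100.0):
    """GDur: GONS to GOFFS, in milliseconds."""
    return (lm["GOFFS"] - lm["GONS"]) * 1000.0 / fs
```

With the 100 Hz EMA data, each landmark index corresponds to a 10 ms step, so GDur resolution is limited to 10 ms unless the velocity signal is interpolated.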
The choice of the current tripartite division of vowel gestures is informed by theories of gestural grammar (Browman & Goldstein 1990, Chitoran et al. 2002, Gafos 2002, Davidson 2004).Several studies have shown that linguistic grammars have access to and utilise the internal temporal structures of vowel and consonant gestures (see Gafos 2002 for review).
Acoustic studies of vowels also utilise a tripartite division, particularly in reference to vowel length contrasts, where the durations of these three sub-vocalic intervals are important to differentiating vowel length in many languages, including AusE (Cochrane 1970, Watson & Harrington 1999, Cox 2006).
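Given the gesture landmarks, the tripartite division can be expressed as proportions of GDur. This sketch assumes that the formation interval spans GONS to NONS, the gesture nucleus NONS to NOFFS and the release interval NOFFS to GOFFS; the function name is hypothetical.

```python
def interval_proportions(lm):
    """FI, GN and RI as proportions of GDur, from landmark sample indices."""
    gdur = lm["GOFFS"] - lm["GONS"]
    return {
        "FI": (lm["NONS"] - lm["GONS"]) / gdur,    # formation interval
        "GN": (lm["NOFFS"] - lm["NONS"]) / gdur,   # gesture nucleus
        "RI": (lm["GOFFS"] - lm["NOFFS"]) / gdur,  # release interval
    }
```

Under this division the three proportions always sum to 1, so FI%, GN% and RI% are not independent of one another.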
We also report Euclidean distances between the articulatory targets (TargDiff) of the three long-short vowel pairs. For each participant, Euclidean distances were calculated between the centroid of the articulatory targets of each of the three long vowels (MAXC in Figure 3; [CentroidTD_xl, CentroidTD_yl]) and the individual tokens of their short equivalents [TD_xsi, TD_ysi]:

$$\mathrm{TargDiff}_i = \sqrt{\left(TD_{xs_i} - \mathrm{Centroid}TD_{xl}\right)^2 + \left(TD_{ys_i} - \mathrm{Centroid}TD_{yl}\right)^2}$$

A challenge of analysing articulatory data across participants is that differences in tongue shape, vocal tract size and sensor placement lead to cross-participant differences in constriction location that may not be linguistically meaningful. For example, a TD sensor retracted to 30 mm behind the front teeth (maxillary occlusal plane) may correspond to a front vowel for one participant and a back vowel for another, depending on the size and shape of each participant's vocal tract (Blackwood Ximenes et al. 2017). To compare across participants, Euclidean distance measures were normalised through z-scoring (TargDiffz), as outlined by Lobanov (1971). Lobanov's (1971) method was originally applied to vowel formants; however, it has recently been applied to the normalisation of EMA sensor positions (Shaw et al. 2016, Blackwood Ximenes et al. 2017).

Lindblom's (1963) target undershoot account would predict that long-short pairs with larger durational differences would also exhibit larger vowel quality differences. We therefore also included the difference between the time to target attainment of long and short vowels as a variable in our models examining vowel quality across the three long-short pairs. Difference in time to target attainment (ms; TimeTargDiff) was calculated for each participant between the average time to target of each of the three long vowels [AverageTimetoTarg_l] (GONS to MAXC in Figure 3) and the time to target of individual tokens of their short equivalents [TimetoTarg_si].
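The distance and normalisation steps above can be sketched as follows, assuming per-participant arrays of TD target coordinates at MAXC (rows are tokens, columns are TD_x and TD_y). Two details are assumptions rather than statements from the study: z-scores use the sample standard deviation, and TimeTargDiff is taken as the long-vowel mean minus each short token (the sign convention is not stated). The function names are hypothetical.

```python
import numpy as np

def targ_diff(long_xy, short_xy):
    """Euclidean distance from the long-vowel target centroid to each
    short-vowel target token (columns: TD_x, TD_y)."""
    centroid = np.asarray(long_xy, float).mean(axis=0)
    return np.linalg.norm(np.asarray(short_xy, float) - centroid, axis=1)

def zscore(values):
    """Lobanov-style z-score, applied within one participant."""
    v = np.asarray(values, float)
    return (v - v.mean()) / v.std(ddof=1)

def time_targ_diff(long_times_ms, short_times_ms):
    """Mean long-vowel time to target (GONS to MAXC) minus each short token's."""
    return np.mean(long_times_ms) - np.asarray(short_times_ms, float)
```

For example, `zscore(targ_diff(long_xy, short_xy))` would yield TargDiffz values for one participant and one vowel pair under these assumptions.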
Lip protrusion was used as a measure of lip rounding in the present study, in line with previous work by Blackwood Ximenes et al. (2017).Degree of lip protrusion of /o˘/ and /ç/ was calculated based on the average horizontal position of the UL and LL sensors (Figure 2 above) measured at the target of the lingual gesture of the vowel (MAXC; Figure 3).This average horizontal position was z-transformed by participant.
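A sketch of the lip protrusion measure follows. It assumes that UL and LL horizontal positions have already been sampled at MAXC for each token, and that larger x values are more anterior in the common coordinate system (this sign convention is an assumption). The function name is hypothetical.

```python
import numpy as np

def lip_protrusion_z(ul_x, ll_x, participant):
    """Mean of UL and LL horizontal position per token, z-scored by participant."""
    prot = (np.asarray(ul_x, float) + np.asarray(ll_x, float)) / 2.0
    participant = np.asarray(participant)
    z = np.empty_like(prot)
    for p in np.unique(participant):
        mask = participant == p
        # Sample standard deviation within each participant (assumption)
        z[mask] = (prot[mask] - prot[mask].mean()) / prot[mask].std(ddof=1)
    return z
```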

Data exclusion
A total of 816 target words were elicited ((12 target words × 10 repetitions × 6 participants) + (12 target words × 8 repetitions × 1 participant)). In six coronal-context tokens there were more than two velocity peaks on the TD sensor trajectory between the maximum constrictions of the onset and coda consonants; these tokens were excluded from further analysis. Seven further items were excluded due to mispronunciation and sensor tracking errors, leaving a total of 803 analysed target words.
To find the optimal model for each dependent variable, we used a top-down, stepwise model-building strategy in which each model was compared with a model one order less complex using log-likelihood ratio tests. Final models included only main effects and interactions that significantly improved model fit (p < .050). Participant differences were modelled using random intercepts for participant and repetition. Where a full random-effects structure resulted in model convergence issues or a singular fit, the random effect with the lowest variance was removed, in line with recommendations by Barr et al. (2013) and Bates et al. (2015). The random components of the models were not of further interest and are not reported.
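Each model comparison reduces to a likelihood-ratio test. The following is a minimal standard-library sketch, assuming that the log-likelihoods of a fuller and a reduced model (differing by `df` parameters and fitted by maximum likelihood) are already available from the fitting software. The chi-square survival function uses closed forms that are valid only for integer degrees of freedom. The function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(df/2, y) = exp(-y) * sum_{i < df/2} y^i / i!
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: start from Q(1/2, y) = erfc(sqrt(y)) and recur in unit steps
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total

def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Return the LR statistic and its p-value for nested ML-fitted models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)
```

A term would be retained when the returned p-value falls below .050, mirroring the retention criterion described above.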
P-values for main effects were obtained through maximum likelihood tests with Satterthwaite approximations to degrees of freedom (Kuznetsova et al. 2017). Because the variable vowel pair had three levels (/iː–ɪ/, /ɐː–ɐ/ and /oː–ɔ/), we also conducted pairwise comparisons of least-squares means (with Holm-Bonferroni correction) using the emmeans package (Lenth 2019). This allowed us to compare levels of the main effect of vowel pair and to examine the interactions between vowel length and vowel pair and between consonant context and vowel pair. For pairwise analysis, factors were coded as follows: vowel length, LONG = 0; consonant context, LABIAL = 0; vowel pair, /iː–ɪ/ = 0, except in the comparison between /ɐː–ɐ/ and /oː–ɔ/, for which /ɐː–ɐ/ = 0. Full summaries of all linear mixed effects models are provided in Appendix Tables A2–A8.
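For reference, the Holm-Bonferroni step-down adjustment applied to the pairwise comparisons can be sketched in a few lines. This illustrates the correction itself, not the emmeans interface, and the function name is hypothetical.

```python
def holm_adjust(pvals):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p by (m - k), cap at 1, enforce monotonicity
        running = max(running, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running
    return adjusted
```

With three vowel pairs there are three pairwise comparisons, so the smallest raw p-value is multiplied by 3, the next by 2 and the largest by 1.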
Euclidean distance measures are an incomplete measure of vowel target similarity, as they do not take into account differences in the distributions of the different vowels. Two vowel pairs may exhibit a similar distance between their centroids but, owing to different distributions of individual token values, have vastly different degrees of overlap (Warren 2017). To overcome this issue, Pillai-Bartlett scores have been used to examine spectral overlap in ongoing vowel mergers in the acoustic literature (Hay, Warren & Drager 2006, Hall-Lew 2010, Nance 2011, Wong 2012, Havenhill 2015). The Pillai-Bartlett score is one of the test statistics of a MANOVA. The higher the Pillai-Bartlett score, the greater the difference between the two analysed distributions with respect to the dependent variables of the MANOVA (Hay et al. 2006, Hall-Lew 2010). Three MANOVA models were constructed (one for each vowel pair), with z-transformed TD fronting (TD_xz) and TD height (TD_yz) as dependent variables, using the following equation:
Finally, because speech rate was not actively controlled during this experiment, we wished to determine whether differences in speech rate contributed to the observed differences in the measured variables. We measured the interval from the onset of the target word to the onset of the target word in the following trial (token-to-token duration) as an approximation of speech rate. Token-to-token duration was a poor predictor in all the models analysed in this study, and was therefore removed from all models during model selection. Appendix Figure A1 shows the correlation between token-to-token duration and the dependent variables analysed in this study.
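Because the MANOVA equation is not reproduced here, the following sketch assumes the simplest case: a one-way MANOVA with vowel (long vs short) as the only factor and z-transformed TD_x and TD_y as dependent variables. Under that assumption the Pillai-Bartlett trace is V = tr(H(H + E)⁻¹), where H and E are the between-group and within-group sums-of-squares-and-cross-products matrices. Adding consonant context as a factor would change H and E. The function name is hypothetical.

```python
import numpy as np

def pillai_trace(xy, groups):
    """Pillai-Bartlett trace for a one-way MANOVA (rows: tokens, columns: DVs)."""
    X = np.asarray(xy, float)
    g = np.asarray(groups)
    grand = X.mean(axis=0)
    p = X.shape[1]
    H = np.zeros((p, p))  # between-group SSCP
    E = np.zeros((p, p))  # within-group SSCP
    for level in np.unique(g):
        Xg = X[g == level]
        d = (Xg.mean(axis=0) - grand)[:, None]
        H += len(Xg) * (d @ d.T)
        R = Xg - Xg.mean(axis=0)
        E += R.T @ R
    return float(np.trace(H @ np.linalg.inv(H + E)))
```

Values near 0 indicate heavily overlapping long and short distributions; values near 1 indicate well-separated ones.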

Acoustic duration
We first wished to confirm that participants in the present study produced short vowels with a shorter acoustic duration than their long equivalents, in line with previous studies of vowel length in AusE (Cox 2006, Elvin et al. 2016). A linear mixed effects model was constructed using the method described in Section 2.7 with the following equation: A full model summary is provided in Appendix Table A2.
Table 2. … gesture durations (ms, GDur, Figure 3) and proportionate durations of formation intervals, gesture nuclei and release intervals for all vowels, averaged across participants. Standard deviations in parentheses. Formation interval (FI), gesture nucleus (GN) and release interval (RI) durations are expressed as a proportion of total vowel gesture duration (GDur).
Figure 4 (Colour online) Grand mean acoustic (left) and gesture durations (right) of /iː–ɪ/, /ɐː–ɐ/ and /oː–ɔ/ in labial (/pVp/) and coronal (/tVt/) consonant contexts. Mean durations (ms) were calculated from all vowels produced by all participants in each consonant context. Acoustic duration = acoustic onset to acoustic offset (AcDur, see Figure 3); gesture duration = vowel gesture onset to vowel gesture offset (GDur, see Figure 3).
Our results regarding vowel length are therefore congruent with prior acoustic studies of vowel length in AusE.

Gesture duration
Our first prediction was that lingual gestures of short vowels should be shorter than those of long vowels (Cox 2006, Elvin et al. 2016). To test this, a linear mixed effects model was constructed using the method described in Section 2.7 with the following equation: A full model summary is provided in Appendix Table A3. The mean duration of short vowel gestures was 90% of the mean duration of long vowel gestures (F(786) = 59.3, p < .001; Table 2, Figure 4). The magnitude of the difference between long and short vowels differed across labial and coronal contexts (F(787) = 7.2, p = .007): the difference in gesture duration between long and short vowels was smaller in the coronal than in the labial context (β = 28 ms, t(787) = 2.7, p = .048).
Vowel gesture durations were shorter in the coronal than in the labial context (F(787) = 405.7, p < .001). The gesture durations of all vowel pairs were shorter in the coronal than in the labial context, but the magnitude of this effect differed across vowel pairs (F(787) = 8.7, p < .001). Consonant context had the largest effect on the gesture duration of /oː–ɔ/, which shortened to a greater extent in the coronal context than that of both /iː–ɪ/ (β = −48.2 ms, t(787) = 3.7, p = .001) and /ɐː–ɐ/ (β = −38.2 ms, t(787) = 3.0, p = .019). /iː–ɪ/ and /ɐː–ɐ/ shortened to a similar extent in the coronal (compared to the labial) context (p = .544).

Articulatory targets
Our second prediction posited that /ɐː–ɐ/ would exhibit the most similar articulatory targets of the three vowel pairs, whereas /iː–ɪ/ and /oː–ɔ/ would have less similar pairwise articulatory targets. Our third prediction posited that vowel duration and vowel quality would exhibit a trading relationship in AusE, such that the vowel pairs with the largest acoustic duration difference would exhibit the smallest difference in target quality, and vice versa. To test these predictions, the similarity of target tongue dorsum positions was compared for the three long-short vowel pairs produced in labial (/pVp/) and coronal (/tVt/) contexts (Table 3, Figure 5), using the method illustrated in Section 2.6. The z-transformed absolute Euclidean distance between the targets of long and short vowel pairs (TargDiffz) was modelled using the method described in Section 2.7, with the following equation: The difference between long and short vowel time to target (TimeTargDiff) did not improve model fit (p = .536), so it was not included in the present model. A full model summary is provided in Appendix Table A4.
Overlap between the distributions of long and short vowel targets was also compared using Pillai-Bartlett scores, which are shown in Table 3. Lower Pillai-Bartlett scores indicate more overlap between two distributions. /ɐː–ɐ/ exhibited the lowest Pillai-Bartlett score (0.24) of the three vowel pairs, while /iː–ɪ/ (0.47) and /oː–ɔ/ (0.48) exhibited similar scores.

Lip rounding differences between /oː/ and /ɔ/
Our fourth prediction posited that lip rounding should be greater for /oː/ than for /ɔ/. To investigate this, we compared lip protrusion of /oː/ and /ɔ/. Lip protrusion was calculated as the average horizontal position of the UL and LL sensors at the lingual target of the two vowels, z-transformed by participant. Differences in lip protrusion between /oː/ and /ɔ/ were modelled using the method described in Section 2.7 with the following equation: A full model summary is provided in Appendix Table A5. Overall, /oː/ was produced with more lip protrusion than /ɔ/ (F(198) = 143.4, p < .001; Figure 6), suggesting greater lip rounding for /oː/ than for /ɔ/. Z-transformed lip protrusion was also greater for coronal than for labial tokens (F(198) = 10.3, p = .002).

Interval durations
Our final prediction was that, in line with acoustic studies (Watson & Harrington 1999, Cox 2006, Elvin et al. 2016), long and short vowel gestures would be characterised by different dynamic articulatory patterns. Short vowels should exhibit a proportionately shorter period of articulatory stability around their mid-points and proportionately longer articulatory transitions to following consonants. However, the long vowel /iː/ should be characterised by a lengthy phonological onglide, as is characteristic of AusE (Cox 2006). To determine whether this was the case, proportionate durations of three intervals within each vowel gesture were compared across the three long-short vowel pairs using linear mixed effects models constructed using the method described in Section 2.7. Full model summaries are provided in Appendix Tables A6-A8. Absolute and proportionate durations of the three sub-gestural intervals are presented in Table 2 and Figures 7 and 8. However, statistical analyses were undertaken only for the proportionate formation interval, gesture nucleus and release interval durations (FI%, GN% and RI%, respectively).

Figure 7 compares tongue dorsum (TD) displacement throughout the vowel gesture for each pair of vowels. For each vowel, TD displacement with respect to the dorsal location at (i) vowel gesture onset (GONS) is tracked at three gesture landmarks: (ii) nucleus onset (NONS), (iii) nucleus offset (NOFFS) and (iv) gesture offset (GOFFS). At each landmark, displacement is calculated as the mean Euclidean distance in the midsagittal plane between the TD xy position and the TD xy position at GONS. Timing of landmarks is expressed as a proportion of total vowel gesture duration (GDur).
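The interval proportions and landmark displacements can be sketched as follows. The sketch assumes that the formation interval runs from GONS to NONS, the gesture nucleus from NONS to NOFFS and the release interval from NOFFS to GOFFS; that landmark times are already known (they are determined as in Figure 3, which is not reproduced here); and that TD positions are read from the sample nearest each landmark, for a single token rather than averaged. `Landmarks`, `interval_proportions` and `td_displacement` are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class Landmarks:
    """Landmark times (ms) of one vowel gesture (hypothetical container)."""
    gons: float   # gesture onset
    nons: float   # nucleus onset
    noffs: float  # nucleus offset
    goffs: float  # gesture offset

def interval_proportions(lm):
    """GDur (ms) plus FI%, GN% and RI% as percentages of GDur."""
    gdur = lm.goffs - lm.gons
    if gdur <= 0:
        raise ValueError("gesture offset must follow gesture onset")
    return {
        "GDur": gdur,
        "FI%": 100 * (lm.nons - lm.gons) / gdur,    # formation interval
        "GN%": 100 * (lm.noffs - lm.nons) / gdur,   # gesture nucleus
        "RI%": 100 * (lm.goffs - lm.noffs) / gdur,  # release interval
    }

def td_displacement(times, td_xy, lm):
    """Euclidean TD displacement from its GONS position at NONS, NOFFS and GOFFS.

    times: sample times (ms); td_xy: matching (x, y) TD positions.
    Returns {landmark: (time as % of GDur, displacement)}, using the
    sample nearest to each landmark time.
    """
    def nearest(t):
        i = min(range(len(times)), key=lambda k: abs(times[k] - t))
        return td_xy[i]

    x0, y0 = nearest(lm.gons)
    gdur = lm.goffs - lm.gons
    result = {}
    for name, t in (("NONS", lm.nons), ("NOFFS", lm.noffs), ("GOFFS", lm.goffs)):
        x, y = nearest(t)
        result[name] = (100 * (t - lm.gons) / gdur, math.hypot(x - x0, y - y0))
    return result
```

By construction FI%, GN% and RI% sum to 100, so a proportionately longer release interval in short vowels must be offset by a shorter formation interval or gesture nucleus.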

Formation interval
A linear mixed effects model was constructed using the method described in Section 2.7 with the following equation: A full model summary is provided in Appendix Table A6. As shown in Figure 8, vowel length conditioned FI% of the three vowel pairs differently across the two consonant contexts (F(796) = 6.0, p = .003).
Durations are expressed as a proportion of the entire vowel gesture duration (GDur). Intervals were determined as shown in Figure 3.

Release interval
A linear mixed effects model was constructed using the method described in Section 2.7 with the following equation: A full model summary is provided in Appendix Table A8.

Summary of main findings
This study compared the lingual articulatory properties of three long-short vowel pairs, /iː–ɪ/, /ɐː–ɐ/ and /oː–ɔ/, in two symmetrical consonant contexts, /pVp/ and /tVt/. The main findings of this study are:
• the acoustic duration of short vowels was 62% of the acoustic duration of long vowels
• the gesture duration (measured as GDur) of short vowels was 90% of that of long vowels
• /ɐː–ɐ/ had the greatest pairwise difference in acoustic duration, while /iː–ɪ/ had the smallest pairwise difference in acoustic duration
• the difference between long and short vowel gesture durations was larger in the labial than in the coronal context
• /ɐː/ and /ɐ/ were produced with the most similar mean articulatory targets and the most overlapping long-short distributions; /oː/ and /ɔ/ were produced with the least similar articulatory targets
• contrasts between long and short vowel FI% and GN% were reduced in coronal compared to labial contexts
• FI% was longer for /iː/ than for /ɪ/, but longer for /ɐ/ and /ɔ/ than for /ɐː/ and /oː/ in the labial context
• GN% was shorter for short vowels than for their long equivalents
• the pairwise difference in GN% was smallest for /iː–ɪ/, and equivalent for /ɐː–ɐ/ and /oː–ɔ/
• RI% was longer for short vowels than for their long equivalents

Discussion
The aim of this study was to investigate the lingual articulation of vowel length contrasts in AusE, building on previous, largely acoustic descriptions of AusE vowel contrasts. These data provide an articulatory characterisation of some key aspects of vowel length contrasts in AusE, revealing new insights into AusE production kinematics.

Gesture duration
We first explored the impact of contrastive vowel length on vowel gesture durations. Our first prediction was that, in line with acoustic durations, short vowel gestures would be shorter than long vowel gestures, but that the duration difference between long and short vowels would be reduced in the articulatory domain. Our results confirmed this prediction. On average, short vowels were 62% of the acoustic duration of long vowels, in line with previous acoustic studies of AusE vowel length (Cox 2006, Elvin et al. 2016), while short vowel gestures were 90% of the duration of long vowel gestures. Hertrich & Ackermann (1997) have speculated that such a discrepancy between acoustic and gesture durations indicates that phonological vowel length contrast is produced not as a difference in the duration of laryngeal activity (vowel voicing) or of supralaryngeal (lips, tongue, jaw) movement alone, but rather as a vowel-length-dependent difference in the coordination of laryngeal and supralaryngeal gestures.
The gesture durations of coronal context vowels were 77% of the duration of labial context vowels. This result is congruent with findings that coronal consonants constrain the production of following vowels to a greater degree than labial consonants (Recasens, Pallarès & Fontdevila 1997, Sussman et al. 1997, Löfqvist 1999, Fowler & Brancazio 2000, Recasens 2002, Harrington et al. 2011, Harrington, Kleber & Reubold 2011). Because the lips and tongue dorsum are relatively independent, vowel gestures in labial contexts can begin earlier than those in coronal contexts, which can result in longer gesture durations, as observed in this study. There was also a smaller difference between the durations of long and short vowel gestures in the coronal than in the labial context. Across the two consonant contexts, the duration of short vowel gestures was less conditioned by consonant context than the duration of long vowel gestures, resulting in a reduction of contrast in the coronal context. This finding is consistent with general observations that the duration of short vowels is more stable than the duration of long vowels across different speech rates, prominences and phonetic contexts (Klatt 1973, Port 1981, Gopal 1990, Fletcher et al. 1994, Hoole, Mooshammer & Tillmann 1994, Hoole & Mooshammer 2002, Jong & Zawaydeh 2002, Mooshammer & Fuchs 2002, Hirata 2004, White & Mády 2008, Nakai et al. 2009, Beňuš 2011, Cox & Palethorpe 2011, Cox et al. 2015, Peters 2015, Penney et al. 2018).
However, the shorter gesture duration of coronal context vowels contrasts with our acoustic duration results, in which coronal context vowels exhibited a longer acoustic duration than labial context vowels. While longer acoustic vowel durations in coronal contexts have also been observed in other acoustic studies of English (House & Fairbanks 1953, Lehiste & Peterson 1961, Port 1981), the discrepancy between acoustic and gesture durations once again suggests that the relationship between acoustic and articulatory landmarks in vowel production is sensitive to factors such as vowel length and consonant context.

Articulatory target similarity
In Section 3.2, we compared the articulatory targets of long and short vowel pairs in AusE. Acoustic studies of AusE have shown that long-short vowel pairs differ in their degree of spectral similarity, with /ɐː–ɐ/ the least spectrally differentiated and /oː–ɔ/ the most spectrally differentiated (Bernard 1970, Harrington & Cassidy 1994, Watson & Harrington 1999, Cox 2006, Cox et al. 2014, Elvin et al. 2016, Cox & Fletcher 2017). Our second prediction was that although all long-short vowel pairs should be produced with similar articulatory targets, the degree of similarity should differ by vowel pair. Of the three long-short vowel pairs, /ɐː–ɐ/ was predicted to exhibit the most similar articulatory targets, while /oː–ɔ/ was predicted to be realised with the least similar articulatory targets. This was indeed the case: /ɐː–ɐ/ had the shortest Euclidean distance between vowel targets and the most overlapping distributions of the three vowel pairs (Figure 5), whereas /oː–ɔ/ had the largest Euclidean distance between long and short targets and the least overlapping distributions. However, Euclidean distance values were also highly variable for /oː–ɔ/ (Figure 5), which may indicate participant-specific strategies for the production of this pair, with some participants producing the pair with more articulatorily distinct targets than others (Appendix Figure A2).
There was a larger difference in long and short vowel target quality in the coronal than in the labial context (Figure 7). Prior studies have suggested that short vowels may be more coarticulated with following consonants than their long equivalents (Hoole & Mooshammer 2002), which would lead short vowels to exhibit more target quality variation across consonant contexts than their long equivalents. Future studies should examine interactions between consonant context and the articulation of AusE vowels in more detail.
We also observed that /oː/ exhibited greater lip protrusion than /ɔ/, suggesting that /oː/ is more rounded than /ɔ/. This is congruent with Blackwood Ximenes et al.'s (2017) observations of lip rounding differences between /oː/ and /ɔ/ in three speakers of AusE. Blackwood Ximenes et al. (2017) have suggested that differences in lip rounding between /oː–ɔ/ may also contribute to F1 and F2 differences between the pair, raising and retracting /oː/ in the acoustic space relative to /ɔ/ independently of lingual adjustments. This may also be the case here; however, the tongue dorsum position of /oː/ was still higher and more retracted than that of /ɔ/. There was also variation across participants in the degree of lip protrusion difference: M3 and W3 produced /oː/ and /ɔ/ with overlapping lip protrusion values. This once again highlights potential speaker-specific strategies in the production of these vowels. More research is needed, however, to determine whether overlapping lip protrusion and/or tongue dorsum postures for /oː/ and /ɔ/ are reflected in overlapping F2 values in these speakers.

Trade-offs between acoustic duration and articulatory target
In languages such as Japanese, Swedish and Thai, acoustic vowel duration and spectral quality have a trading relationship as cues to vowel length (Delattre 1962, Hadding-Koch & Abramson 1964, Lehnert-LeHouillier 2010); that is, the more differentiated the acoustic targets of a long-short vowel pair, the less listeners rely on durational cues, and vice versa (Hadding-Koch & Abramson 1964). In line with these studies, our third prediction was that the vowel pair with the largest pairwise difference in duration would have the most similar articulatory targets, and vice versa. Our results partially support this prediction. /ɐː–ɐ/ had the largest pairwise difference in acoustic duration and the most similar articulatory targets of the three vowel pairs. This is consistent with prior studies showing that the vowels in this pair differ primarily in acoustic duration and have largely overlapping acoustic targets (Bernard 1970, Cochrane 1970, Harrington & Cassidy 1994, Watson & Harrington 1999, Cox 2006, Elvin et al. 2016). However, while /iː–ɪ/ had the smallest pairwise difference in acoustic duration, it was /oː–ɔ/, rather than /iː–ɪ/, that had the least similar articulatory targets. This result is not unexpected for two reasons. First, /oː–ɔ/ can be differentiated by acoustic target quality alone, independent of durational information (Watson & Harrington 1999). Second, while duration is important for differentiating /iː/ and /ɪ/ in AusE, dynamic formant differences (namely the prolonged acoustic onglide of /iː/) further differentiate /iː/ from /ɪ/ (Harrington & Cassidy 1994, Harrington et al. 1997, Watson & Harrington 1999, Cox 2006, Cox et al. 2014, Cox et al. 2015). The previously observed trade-off between acoustic duration and spectral quality as cues to vowel length (Delattre 1962, Hadding-Koch & Abramson 1964, Lehnert-LeHouillier 2010) may instead be a trade-off between durational and non-durational cues to vowel length contrast, with the dynamic differences between /iː/ and /ɪ/ contributing to this trading relationship.
Our findings also challenge a purely physiological account of vowel quality differences between long and short vowels, such as that proposed in Lindblom's (1963) target undershoot model. First, in a target undershoot account, we would expect the vowel pair with the largest durational difference to exhibit the largest vowel quality difference. However, this was not the case for either acoustic duration or gesture duration. As mentioned above, /ɐː–ɐ/ had the largest difference in acoustic duration and the smallest pairwise difference in vowel quality. In terms of gesture duration, the difference between long and short vowels was similar across the three vowel pairs. Second, in a target undershoot account, we would predict that the difference in time to gestural target would predict differences in target quality (TargDiffz). Our results do not support this account: difference in time to target was not a significant predictor of vowel quality differences across our three vowel pairs.

Kinematic differences
Finally, we examined dynamic kinematic differences between long and short vowel gestures. Previous acoustic production studies have found differences in the formant dynamics of long and short AusE vowels, suggesting differences in articulatory kinematics (Bernard 1970, Cochrane 1970, Watson & Harrington 1999, Cox 2006). We predicted that short vowel gestures would have a proportionately shorter period of articulatory stability around their mid-points and proportionately longer articulatory transitions to following consonants than their long equivalents. In our comparison of these intervals in long and short vowel gestures, short vowel gestures indeed had proportionately shorter gesture nuclei and proportionately longer release intervals than long vowel gestures (Figures 7 and 8). These data provide the first articulatory evidence supporting acoustic studies that have found AusE short vowels to have proportionately shorter acoustic steady states and proportionately longer acoustic offglides than long vowels (Cox 2006).
We also posited that /iː/ would exhibit a prolonged phonological onglide, as is characteristic of AusE (Harrington et al. 1997, Cox 2006, Cox et al. 2014). Our results generally confirmed this, with /iː/ exhibiting the longest proportionate formation interval of the long vowels. However, the proportionate formation interval of /iː/ was only significantly longer than that of /ɪ/ in the labial context. The shorter proportionate formation interval of /iː/ in the coronal context may be due to the articulatory requirements that coda /t/ places on the /iː/ gesture (Recasens et al. 1997, Sussman et al. 1997, Recasens 2002). Several studies have noted that high front vowels in syllables containing coronal consonants exhibit more retracted acoustic and articulatory targets than those produced in other consonantal contexts (Stevens & House 1963; Schouten & Pols 1979a, b; Sussman et al. 1997; Strange & Bohn 1998; Hoole 1999; Nearey 2013). In English, coronal consonants exhibit higher coarticulatory resistance than surrounding vowels, with the targets of surrounding vowels compromised to reach the articulatory goal of the coronal consonant (Sussman et al. 1997, Hoole 1999). In the production of coda /t/, the tongue dorsum must be sufficiently retracted for the tongue tip to be raised for alveolar closure (Hoole 1999).
As shown in Figure 5, /iː/ and /ɪ/ are sometimes produced with a more retracted TD posture in the coronal context, supporting these observations. In the production of /iː/ in the coronal context, the proportionately later achievement of the vowel target due to the prolonged onglide is antagonistic to the retracted position required for the coronal coda. Speakers may therefore shorten the onglide in /t/-final syllables to allow earlier tongue dorsum retraction for coda /t/ closure. As no such constraint is placed on /iː/ in labial-final syllables, the phonological onglide is retained. More research into the production of /iː/ in non-symmetrical consonant contexts may further illuminate the relative contribution of onset-vowel and vowel-coda organisation to the production of onglide in AusE.
/ɐː/ had a shorter proportionate formation interval than /ɐ/ in both the labial and coronal contexts, while /oː/ had a shorter proportionate formation interval than /ɔ/ in the labial context (Figures 7 and 8). This result is largely inconsistent with previous acoustic studies of vowel length in not only AusE but also American English and German, which have found no significant difference in proportionate acoustic onglide between long and short vowels (Lehiste & Peterson 1961, Strange & Bohn 1998, Cox 2006). However, Lehiste & Peterson (1961) descriptively reported that short/lax vowels in American English (excluding /ɪ/) had proportionately longer acoustic onglides than their long equivalents. The proportionately longer formation intervals of short /ɐ/ and /ɔ/ reported here are consistent with general observations that vowels of shorter duration exhibit proportionately longer transitions from and to surrounding phonemes (Gay 1981, Soli 1982, Van Summers 1987). The discrepancy between prior acoustic and current articulatory results may arise, as discussed above, from potential differences in laryngeal-supralaryngeal coordination in long and short vowel gestures (Hertrich & Ackermann 1997). If a larger proportion of short than of long vowels is concealed by aspiration of the preceding consonant, this may mask articulatory differences in onglide in the acoustic domain.
We also found that the pairwise difference in proportionate gesture nucleus duration differed by vowel pair, with /iː–ɪ/ exhibiting the smallest pairwise difference of the three vowel pairs. This appears to result from the shorter proportionate gesture nucleus duration of /iː/ compared to the other two long vowels (Table 2), driven by the presence of onglide in /iː/. Coronal context vowels also exhibited a proportionately shorter gesture nucleus duration than labial context vowels. The shortened gesture nucleus in the coronal context may once again be due to the relatively greater coarticulatory influence of /t/ on vowels (Recasens et al. 1997, Recasens 2002).
The effect of consonant context on proportionate release interval duration also differed across vowel pairs. Release intervals were longer in the coronal than in the labial context for /iː–ɪ/, but were shorter in the coronal than in the labial context for /ɐː, ɐ, oː, ɔ/. These patterns appear to be in a trading relationship with formation interval duration, although the exact mechanism behind this requires further investigation.

Future directions
There are some limitations to this study. First, we did not investigate articulatory control mechanisms, such as stiffness and velocity, that may underlie durational and vowel quality differences. This is primarily because speech rate was not actively controlled in this study. Speech rate also conditions the durational, spatial and kinematic properties of vowels (Ostry & Munhall 1985, Kroos et al. 1997, Hoole & Mooshammer 2002, Beňuš 2011). In particular, changes in duration due to variation in speech rate may be implemented through adjustments in gestural stiffness (the ratio of peak velocity to displacement) or through adjustments in velocity alone (Gay 1981, Byrd & Tan 1996, Shaiman 2001). These mechanisms are not mutually exclusive and may also interact with the implementation of vowel length (Kroos et al. 1997, Hoole & Mooshammer 2002). Future speech-rate-controlled studies should investigate differences in stiffness, velocity and intergestural overlap between long and short vowels, and how these can be understood within mass-spring implementations of Task-Dynamics (Saltzman & Kelso 1987, Saltzman & Munhall 1989, Hawkins 1992, Turk & Shattuck-Hufnagel 2020).
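As an illustration of the stiffness measure mentioned above (which was not computed in this study), the following is a minimal sketch assuming a single movement sampled as TD (x, y) positions at a fixed rate. Peak tangential velocity is estimated by forward differences and divided by the straight-line displacement between the first and last samples, a common operationalisation. The function name and the finite-difference choice are assumptions; smoothed or central-difference velocities are frequent alternatives.

```python
import math

def stiffness(td_xy, fs):
    """Peak tangential velocity divided by displacement for one movement.

    td_xy: (x, y) positions (mm) sampled at fs Hz from movement onset to offset.
    Returns a value in 1/s. Velocity uses simple forward differences (no smoothing).
    """
    if len(td_xy) < 2:
        raise ValueError("need at least two samples")
    speeds = [math.hypot(x1 - x0, y1 - y0) * fs
              for (x0, y0), (x1, y1) in zip(td_xy, td_xy[1:])]
    (xs, ys), (xe, ye) = td_xy[0], td_xy[-1]
    displacement = math.hypot(xe - xs, ye - ys)
    if displacement == 0:
        raise ValueError("zero displacement: stiffness undefined")
    return max(speeds) / displacement
```

Because stiffness is a ratio, a movement that covers the same distance with a higher peak velocity yields a higher value, which is how a rate-related shortening implemented through stiffness would differ from one implemented through velocity alone.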
We also did not investigate differences in the intergestural organisation of syllables containing long vs. short vowels in AusE. In German, research suggests that short vowels are more overlapped with following coda consonants than long vowels (Hertrich & Ackermann 1997, Kroos et al. 1997, Hoole & Mooshammer 2002). This may also be the case in AusE, but further investigation is required to confirm this.
Acoustic target and dynamic acoustic data were not directly compared to articulatory target and articulatory kinematic data in this study. We did, however, find discrepancies between relative acoustic and relative gestural duration measures, with short vowels at ∼62% of the acoustic duration of long vowels but ∼90% of their gestural duration. This is similar to the findings of prior studies investigating this relationship in the German vowel length contrast (Hertrich & Ackermann 1997). It suggests that the timing relationship between the larynx and the supralaryngeal articulators in vowel production may differ between long and short vowels; however, this requires further empirical examination.
We examined only the lingual articulation of vowels, and only using data from a single lingual sensor. Differences in vowel identity arise from differences in overall vocal tract shape (Stevens & House 1955, Chiba & Kajiyama 1958, Lindblom & Sundberg 1971, Fant 1980), which depends on the coordinated placement of the tongue with respect to the jaw and lips (Lindblom & Sundberg 1971, Hoole & Mooshammer 2002). More detailed articulatory characterisation of vowels should examine the entire vocal tract. Future studies would benefit from real-time MRI (rtMRI), which offers high spatial and temporal resolution imaging of the vocal tract (Zhu et al. 2013, Lingala et al. 2017).
Finally, perceptual studies are also needed to examine how listeners use duration and target quality to cue long versus short vowels. Investigation of participant-specific trading relationships between duration and vowel quality in the production of vowel length contrasts may also provide further insight into the representation and implementation of vowel length.

Conclusions
This study has systematically examined articulatory differences between long and short vowels in AusE. Long vowels were characterised by different temporal, spatial and dynamic kinematic properties compared to their short equivalents. Our results suggest that vowel duration and vowel quality may be actively and independently controlled to realise vowel length contrasts in AusE. Our results also highlight discrepancies between acoustic and articulatory measures of vowel duration, raising questions about the relationship between these two ways of measuring durational contrast. These data reveal the importance of studying vowel production in both the acoustic and articulatory domains to more fully understand the representation and implementation of vowel contrasts.

Figure 1 (Colour online) Schematic illustrating the distribution of AusE monophthongs in the acoustic vowel space. Overlaid blue boxes indicate vowel pairs examined in this study. Based on Cox & Palethorpe (2007).

Figure 2 (Colour online) Configuration of EMA sensors. Left: Midsagittal view of sensor locations. Horizontal dashed line = occlusal plane; vertical dashed line = maxillary occlusal plane. Right: Location of the lingual sensors.

Figure 6 (Colour online) By-participant lip protrusion at target of /oː/ and /ɔ/ in labial and coronal contexts. Averaged across UL and LL sensors and repetitions. Lip protrusion z-transformed by participant. Greater lip protrusion indicates a greater degree of rounding.

Figure A1 (Colour online) Correlation between token-to-token duration and dependent variables analysed in this study. Token-to-token duration used as an approximation for global speech rate. Left to right: AcDur = acoustic duration (ms), GDur = gesture duration (ms), TargDiffz = z-transformed Euclidean distance between long and short vowel targets, LPz = z-transformed lip protrusion for /oː–ɔ/, FI% = proportionate formation interval duration, GN% = proportionate gesture nucleus duration, RI% = proportionate release interval duration. Correlation coefficient (r) provided for each variable.

Table 1
Orthographic and phonemic representations of target words.

Table 3
Mean Euclidean distances (TargDiff) between articulatory targets (MAXC, 3) of the three long-short vowel pairs, and Pillai-Bartlett scores. TargDiff (mm) calculated from all vowels produced by all participants in each consonant context (lab = labial, cor = coronal). TargDiffz (z-transformed) are Euclidean distances z-transformed by participant. Pillai-Bartlett scores represent the degree of overlap between two distributions; lower values indicate more overlap. All values averaged across participants. Standard deviations in parentheses.

Table A4
Results of the mixed model analysis to test z-transformed Euclidean distance between long and short vowel targets (TARGDIFFz).

Table A6
Results of the mixed model analysis to test proportionate formation interval duration (FI%).

Table A7
Results of the mixed model analysis to test proportionate gesture nucleus duration (GN%).

Table A8
Results of the mixed model analysis to test proportionate release interval duration (RI%).