Voice quality and speaking rate in Icelandic rhetorical questions

Abstract
In this paper, we show that Icelandic uses the phonetic parameters of speaking rate, duration and voice quality (VQ) to distinguish between information-seeking questions (ISQs) and rhetorical questions (RQs). Specifically, durations are longer (speaking rate is slower) and nonmodal VQs are used more in RQs than in ISQs. Our findings for temporal parameters fit in with previous studies on the prosody of RQs in various languages. With respect to VQ, Icelandic differs, for example, from German and English in the location of breathy voice in the utterance (utterance-initial in German and English, utterance-final in Icelandic). We interpret the utterance-final position of breathiness in Icelandic RQs as a potential compensating strategy for the lack of phonological cues, i.e. boundary tones.


Introduction
This paper deals with phonetic differences between information-seeking questions (ISQs) and rhetorical questions (RQs) in Icelandic, specifically voice quality (VQ) and speaking rate/global duration, focusing on polar and wh-questions. The prosody of the two illocution types (ISQs, RQs) has recently been compared for several languages, among them English (Dehé & Braun 2020b), German (Braun et al. 2019), Standard Chinese (Zahner et al. 2021), French (Beyssade & Delais-Roussarie, to appear), Estonian (Asu, Sahkai & Lippus 2020), Italian (Sorianello 2018, 2019), Cantonese (Lo, Kiss & Tulling 2019), Japanese (Miura & Hara 1995) and Icelandic (Dehé, Braun & Wochner 2018, Dehé & Braun 2020a); see Dehé et al. (2022) for an overview. These studies show that speakers make use of the same prosodic parameters to indicate rhetorical meaning across languages: f0, constituent duration/speaking rate, and VQ. The ways in which they are used vary across languages, with most variation for f0 modification. Some of the f0-related variation follows from prosodic typology (e.g. intonation languages vs. tone languages) and language-specific pitch accent inventories. Regarding VQ, non-modal VQ often signals rhetorical meaning. For example, breathy VQ occurs in sentence-initial position in German polar and wh-RQs (Braun et al. 2019) and English wh-RQs (Dehé & Braun 2020b). In Chinese, glottal VQ occurs more frequently in RQs than in ISQs in both initial and final positions (Zahner et al. 2021). In German, VQ also distinguishes between questions and statements in general (e.g. more breathy voice in declarative questions than in declarative statements, Niebuhr et al. 2010). In several African languages, breathiness has been associated with questionhood (Rialland 2009 for utterance-final breathiness in polar questions in languages of the Gur family). Constituent durations are generally longer, or speaking rate slower, in RQs than in ISQs across languages. A faster speaking rate has also been observed in declarative questions than in string-identical statements (van Heuven & van Zanten 2005 for Manado Malay, Orkney English and Dutch, Niebuhr et al. 2010 for German). This is potentially relevant because RQs have been argued to be assertion-like (e.g. Han 2002), in that eventually, all discourse participants are committed to the propositional content of the utterance, and RQs are thus closer in meaning to statements than to questions.
The online version of this article has been updated since its original publication. A notice detailing the changes has been published at: https://doi.org/10.1017/S033258652200004X
For Icelandic specifically, Dehé & Braun (2020a) show that ISQs and RQs differ in nuclear pitch accent types and in the type and frequency of prenuclear accents. The default boundary tone is low (L%) across question and illocution types. Regarding duration, the first word of the utterance (finite verb in polar questions, wh-word in wh-questions) and the nuclear syllable (first syllable of object noun) are longer in RQs than in ISQs (Dehé et al. 2018). VQ and global durational parameters have not yet been included in the prosodic comparison of RQs and ISQs in Icelandic. The present paper addresses these research gaps, showing that Icelandic exploits both VQ and speaking rate/duration to distinguish between RQs and ISQs.

Methodology
The current study is a post hoc analysis of Dehé & Braun's (2020a) data. While Dehé & Braun (2020a) focus on the intonation of ISQs vs. RQs, the present paper investigates VQ and speaking rate/duration. The data were elicited in a production experiment mimicking dialogue situations; materials consisted of 21 pairs of polar and 21 pairs of wh-interrogatives (e.g. (1)). All wh-questions started with the wh-pronoun hver 'who'; the subject in all polar questions was einhver 'anybody'.
Data of 17 native speakers of Icelandic were analysed (average age 26.9 years; age range 20-32 years; 11 female, six male). Overall, 645 target interrogatives were analysed, 313 polar (156 ISQs, 157 RQs) and 332 wh-questions (166 ISQs, 166 RQs), exactly the same utterances as in Dehé & Braun (2020a). They were annotated in Praat (Boersma & Weenink 2018). Following Braun et al. (2019), VQ was annotated on a perceptual basis, at four positions. VQ was annotated by the second author, with 7% of the data also annotated by a research assistant. Interrater reliability (Cohen's Kappa, Cohen 1960) showed substantial agreement (90%, κ = 0.71) (Landis & Koch 1977). In wh-questions, the four positions were (i) the sentence-initial wh-word (hver /kʰvεːr/), (ii) the initial, stressed syllable of the finite verb (e.g. /pɔr/ in borðar /ˈpɔr.ðar/ 'eats'), (iii) the initial, stressed syllable of the object noun (e.g. /liː/ in límónur /ˈliː.mo͡u.ˌnʏr/ 'limes'), and (iv) the offset of the utterance (e.g. last syllable of límónur). In polar questions, the four positions were (i) the stressed syllable of the sentence-initial verb (e.g. /pɔr/ in borðar), (ii) the initial, stressed syllable of the subject einhver (/e͡in/ in /ˈe͡in.kʰvεr/ 'anybody'), (iii) the stressed syllable of the object noun, and (iv) the offset of the sentence. Three types of VQ were perceptually classified: modal (neutral mode of phonation, Laver 1980), breathy (audible friction of the air) and glottalized (low frequency irregular vocal fold vibrations, Braun et al. 2021).
Speaking rate was operationalized as the number of syllables per second. The assumed number of syllables of each target sentence was divided by its actual duration; i.e. any segmental reductions or deletions were disregarded. However, reductions were minimal given the short length of the utterances and the laboratory setting.
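The two calculations described above, speaking rate in syllables per second and Cohen's Kappa for interrater agreement, can be illustrated with a minimal Python sketch. This is not the authors' script: the function names, the example labels and the example sentence values are hypothetical and serve only to make the definitions concrete.

```python
from collections import Counter


def speaking_rate(n_assumed_syllables: int, duration_s: float) -> float:
    """Syllables per second, based on the assumed (unreduced) syllable count."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return n_assumed_syllables / duration_s


def cohens_kappa(labels_a, labels_b) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Proportion of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical sentence: 8 assumed syllables produced in 2.0 seconds.
rate = speaking_rate(8, 2.0)  # 4.0 syllables per second

# Hypothetical VQ labels from two annotators for four positions.
annotator_1 = ["modal", "modal", "breathy", "glottalized"]
annotator_2 = ["modal", "breathy", "breathy", "glottalized"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Because the syllable count is taken from the assumed form of the sentence rather than from the realized segments, a reduced production and a full production of the same duration receive the same rate, which matches the operationalization stated above.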
For statistical analysis, we used a series of linear mixed effects regression models (lmers) with illocution type (ISQ, RQ) and question type (wh, polar) as fixed factors and participants and items as crossed random factors (random intercepts). Random slopes were added and retained if they improved the model fit (Bates et al. 2015, Matuschek et al. 2017), as indicated by the anova() function in R (R Core Team 2013). For the analysis of VQ (categorical variable), we used generalized linear mixed models (glmers). For the analysis of a specific VQ, the relevant type of VQ was coded as 1, the other two as 0. The effect of the fixed factors was calculated for these modified dependent variables (Agresti 2002). Model fitting followed the same procedure as for lmers. P-values were calculated using the Satterthwaite approximation in the R package lmerTest and adjusted (p_adj) by means of the Benjamini-Hochberg correction (Benjamini & Hochberg 1995).
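Two of the steps above lend themselves to a short illustration: the one-versus-rest coding of the VQ variable and the Benjamini-Hochberg adjustment of p-values. The following Python sketch is only an illustration under stated assumptions; the analysis reported here was carried out in R, and the function names and example values below are hypothetical. The mixed models themselves are not reproduced.

```python
def one_vs_rest(labels, target):
    """Code the VQ of interest as 1 and the other two VQs as 0."""
    return [1 if label == target else 0 for label in labels]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical VQ labels at one position, recoded for a model of breathy VQ.
vq = ["modal", "breathy", "glottalized", "breathy"]
breathy_coded = one_vs_rest(vq, "breathy")  # [0, 1, 0, 1]

# Hypothetical raw p-values from a family of four tests.
p_adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each p-value is scaled by the number of tests divided by its rank, and the running minimum ensures that an adjusted value never exceeds the adjusted value of a larger raw p-value.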

Voice quality
The results for VQ are plotted in Figure 1 (top: wh-questions; bottom: polar questions). In the first three positions, there were no interactions between illocution type and question type (p > .5); we therefore report main effects. Breathy VQ occurred more often in RQs than in ISQs in all positions, but in the first three positions, observations for breathy VQ were too few to calculate statistical models.
We report the results for the four positions in turn. In sentence-initial position (verb in polar, wh-word in wh-questions), RQs were more often realized with glottalized VQ than ISQs, although the difference was not statistically significant (p > .1, p_adj = .4). There was a main effect of question type; polar questions were significantly more often realized with glottal VQ than wh-questions (β = 0.91, SE = 0.25, z = 3.68, p < .001, p_adj < .01). For modal VQ, there were main effects of both illocution type and question type. RQs were less often realized with initial modal VQ than ISQs (β = −0.74, SE = 0.23, z = −3.21, p = p_adj < .01), and polar questions were less often realized with modal VQ than wh-questions (β = −0.66, SE = 0.23, z = −2.81, p < .001, p_adj < .05).
In the third position (first syllable of object noun), there were no main effects of illocution type on glottalized VQ (p = p_adj > .1) or modal VQ (p = p_adj > .1). Overall, there was a higher occurrence of glottalized voice in RQs as compared to ISQs, and more ISQs than RQs were realized with modal VQ.
Finally, at the offset of the utterance, all three VQs occurred with frequency high enough to allow for statistical analysis. For breathy VQ, no interaction between illocution type and question type was observed (p = p_adj > .7) and there was no effect of question type (p = p_adj < .3). There was a main effect of illocution type: RQs showed significantly more occurrences of breathiness than ISQs (β = 1.88, SE = 0.22, z = 8.65, p = p_adj < .001). There was an interaction between illocution type and question type for glottalized VQ (β = 0.63, SE = 0.22, z = 2.9, p < .01, p_adj < .05). In polar questions, ISQs were more often realized with final glottalized VQ than RQs. Conversely, glottalized VQ was more frequent in wh-RQs than in wh-ISQs. There were no main effects (p > .3). An interaction between illocution type and question type was also observed for modal VQ (β = 1.09, SE = 0.29, z = 3.8, p = p_adj < .001), suggesting a stronger difference for modal VQ in wh-questions than in polar questions in this position. A main effect of illocution type was observed such that RQs were significantly less often realized with modal VQ than ISQs (β = −3.04, SE = 0.13, z = −5.6, p = p_adj < .001).
Note that two non-modal VQs may occur on different syllables of the object noun. Specifically, of all items with breathy VQ at the offset, 17.2% were glottalized on the first syllable of the noun (78.5% modal, 4.3% breathy), with generally more glottal VQ in RQs (see above).
Figures 2 and 3 illustrate sentence-final breathy VQ and sentence-final glottalized VQ, respectively, in wh-RQs. Figure 4 shows sentence-final modal VQ in a wh-ISQ.

Discussion
The analysis reveals that RQs in Icelandic differ from ISQs in terms of VQ and speaking rate/duration. First, RQs are generally longer than ISQs, as well as realized with a slower speaking rate. This is in line with results for other languages (see Section 1 above), suggesting that temporal cues are used cross-linguistically to distinguish between RQs and ISQs in prosody. A reviewer suggests that RQs may come with paralinguistic attitudes such as anger, exasperation or scornfulness, of which the slower speaking rate would be a correlate, rather than of the rhetorical meaning. Neitsch (2018) compared RQs with and without strong speaker attitudes, showing that both types of RQs exhibit the same prosodic parameters, among them longer durations than ISQs, with stronger magnitude for RQs with strong speaker attitudes.
Second, like German and English, Icelandic makes use of breathy VQ in the production of RQs. However, there are also differences between the languages. In German, both polar and wh-RQs often have breathy voice in sentence-initial position (Braun et al. 2019). In English, only wh-questions show differences in VQ (breathy voice in initial position in wh-RQs; Dehé & Braun 2020b). In Icelandic RQs, initial breathiness is rare. Instead, breathy voice mainly occurs in utterance-final position. We interpret this positional difference between German and English on the one hand and Icelandic on the other as an interaction between VQ and intonation. In German and English, boundary tones distinguish between utterance types (e.g. questions vs. statements; see von Essen 1964, Brinkmann & Benzmüller 1999 for German; Bartels 1999 for English). This is not the case in Icelandic, where the default boundary tone for all utterance types is L%, including both polar and wh-questions (Árnason 2005, 2011; Dehé & Braun 2020a). Moreover, in German and English, polar RQs are distinguished from polar ISQs by means of boundary tones (high rising boundary tone in ISQs vs. high plateau in RQs; Braun et al. 2019 for German, Dehé & Braun 2020b for English), which is not the case in Icelandic (Dehé & Braun 2020a). In German, boundary tones also distinguish between wh-ISQs and wh-RQs (mandatory fall in wh-RQs, high number of rising movements in wh-ISQs, Braun et al. 2019). While this is not the case in English (L% in both RQs and ISQs), constituent duration steps in, with longer duration of the final object in wh-RQs than ISQs (Dehé & Braun 2020b). In Icelandic, despite the general availability of the high boundary tone (H%) to mark special aspects of meaning (Árnason 2005, 2011), this cue is not used for the expression of rhetorical meaning. Both ISQs and RQs end in L% (Dehé & Braun 2020a). It is therefore conceivable that Icelandic speakers exploit the manipulation of VQ in final position as a compensation strategy. As an apparent general boundary marker, used in both ISQs and RQs to a considerable extent, glottalized voice is not the preferred option to signal rhetorical meaning. Instead, breathy voice marks the offset of RQs. Breathiness has also been found at the terminus of polar questions in African languages of the Gur family. Interestingly, polar questions in those languages typically end in a fall, too (Rialland 2009). This further supports the assumption that final VQ may replace boundary tones as a cue to illocution type. Future research will show whether Icelandic also makes use of VQ as a cue to pragmatic meaning in other utterance types (e.g. exclamatives, see Wochner 2021 for German) or in statements with specific pragmatic connotations (e.g. emphasis, see Niebuhr et al. 2010, or obviousness, see Wochner 2021). Generally speaking, Icelandic fits in with previous studies showing that non-modal VQ is cross-linguistically used as a cue to rhetorical meaning in questions, although the particular ways in which non-modal VQ is used are language-specific.

Figure 1. VQ at four positions in wh-questions (top) and polar questions (bottom).