The acoustic characteristics of implosive and plosive bilabials in Shimaore

Implosive consonants have drawn the attention of researchers over time, partially due to their relative rarity in the world’s languages, and partially due to their unique ingressive air flow. This sound category has complex and varying features from both an articulatory and an acoustic perspective. This study explores the sound category by analyzing the acoustic features of a language whose implosives have yet to be examined acoustically: Shimaore, a Bantu-Sabaki language spoken in Mayotte. Specifically, it compares the bilabial plosive /b/ and the bilabial implosive /ɓ/ in terms of Voice Onset Time (VOT), fundamental frequency (f0), amplitude, and voice quality via H1*–H2*, harmonics-to-noise ratio (HNR), and cepstral peak prominence (CPP). Analyses show that VOT is shorter for implosives than for plosives. At oral closure and vowel onset, amplitude and f0 are higher for implosives than for plosives. H1*–H2* values read alongside HNR and CPP values suggest that implosives in Shimaore have glottal constriction. Some individual differences are explored to address the question of variation in implosives. Implications for sociophonetic studies in Mayotte as well as general implications for implosives are discussed.


Introduction
Implosive consonants have drawn the attention of researchers over time, partially due to their relative rarity in the world's languages, and partially due to their unique acoustic properties. These consonants have been described as 'problematic' (McLaughlin 2005: 201), due to challenges in describing the articulatory and acoustic properties involved. In fact, this category of sounds consists of several subgroups with various distinguishing properties, including at times a lack of ingressive air flow, once thought to be a criterial dimension (Ladefoged 1971). There appears to be an array of variation across languages for implosives (Lindau 1984). Considering these observations, studies are needed that look at how these sounds are realized in specific languages and dialects.
One language whose implosives have been understudied is Shimaore, a Bantu-Sabaki language variety spoken on the island of Mayotte in the Mozambique Channel. In descending order of frequency, Shimaore has bilabial /ɓ/, alveolar /ɗ/ and palatal [ʄ] voiced implosives, the first two in phonemic contrast with their plosive counterparts /b/ and /d/, respectively. Only a handful of contrastive pairs exist, such as /bibi/ 'insect' and /ɓiɓi/ 'young girl', and all are bilabial. For the alveolar sounds, /dipe/ is 'bread' (from the French phrase du pain) whereas /ɗago/ means 'city' and 'home'. While rare, the palatal implosive [ʄ] has been documented as a sound of the language (Blanchy 1996, Association SHIME 2016), even though other language documentation texts only mention bilabial and alveolar implosives (Rombi 1983, Johansen Alnet 2009). That is, the palatal is mostly observed with the verb 'to eat' and derivations of it, including [uʄa], [ula], or [u(a]. What is more, documentation of these sounds varies, and it appears that realization may differ by speaker, village, and context. It is not clear what role, if any, contact with French and Kibushi (the island's other principal languages, which lack implosives) plays in variation in sound production.
A close acoustic analysis of implosive sounds is needed, particularly for bilabials, which are the most common not only in Shimaore but also in many other Bantu languages (Greenberg 1970, Maddieson 2003). Indeed, bilabial implosives most often contrast with plosives. The purpose of this study is to acoustically describe the bilabial implosives in Shimaore and their plosive counterparts. The study is guided by two questions: How do voiced bilabial implosives and voiced bilabial plosives differ in terms of VOT, fundamental frequency, amplitude and voice quality, via H1*–H2*, HNR and CPP values? What are some qualitative individual differences between the two sound categories, /ɓ/ and /b/? The first question is to better understand which acoustic properties are the most informative for implosives in Shimaore. The second question is asked in order to explore possible variation.

An overview on implosives
Implosives are rare in the world's sound systems. Estimates of the share of languages that have them vary from 10% (Ladefoged & Maddieson 1996) to 13% (Maddieson 2013), to 20% (Clements & Osu 2002) and to 23% (Cun 2009). A general understanding is that implosives are 'stops that are produced with a greater than average amount of the lowering of the larynx during the time that the oral closure for the stop is maintained' (Ladefoged & Maddieson 1996: 82). Note that the 'greater than average' caveat comes from the fact that for voiced stops in some languages, such as Maidu, Thai and Zulu, the larynx lowers slightly during sustained vocal fold vibration (Ladefoged & Maddieson 1996: 51, 78). Besides larynx lowering, ingressive airflow is a common property of implosives, as in Sindhi, but it is possible to have little to no ingressive airflow, as has been observed in Hausa (Nihalani 1986). Ingressive airflow is due to the negative pressure that is created during oral closure when the larynx lowers (see Cun 2009 for a more recent analysis of airflow). In fact, Catford (1939) describes implosives as 'suction stops'. Researchers agree that implosives do not have egressive airflow (Lex 1994, as cited in Clements & Osu 2002: 304), which may be why they are sometimes called 'nonexplosive stops' (Clements & Osu 2002).
While these definitions may suffice for many languages, they are also problematic (Clements & Osu 2002) because they are based on Catford's (1939) foundational argument that implosives have four distinguishing components: glottal closure, larynx lowering, rarefaction, and implosive release. While glottal closure is observed for implosives, implosives with modal voice also exist (Ladefoged & Maddieson 1996). As discussed above, larynx lowering and rarefaction are not always found in implosives. For this reason, Clements & Osu (2002: 10) argue that implosives should be considered nonobstruent stops, as the property 'distinguishing implosives from plosives is the ABSENCE OF AIR PRESSURE BUILDUP IN THE ORAL CAVITY' (emphasis added). Ashby (1990) argues that clarification needs to be made regarding implosives as an auditory class, which could help move beyond the problem of variation in sound production. Nevertheless, given these considerations, it is reasonable to define implosives as nonexplosive stops that tend to involve larynx lowering.
There are various other aspects to take into consideration. For one, the most common implosive is the bilabial, and implosives become rarer the more posterior the place of articulation is (Greenberg 1970, Ladefoged & Maddieson 1996). This can be seen for example in Xhosa, which only has /ɓ/. As already mentioned, the bilabial is the most common implosive in Shimaore. In addition, voiced implosives tend to increase in amplitude during oral closure (Lindau 1984, Demolin 1995) and to have shorter negative VOT than obstruent stops (Hussain 2018). Hainan Min (Wenchang Hainanese) has implosives that increase in amplitude during oral closure (Cun 2009). Voiceless implosives (with full glottal closure) are observed in some languages such as Owerri Igbo (Ladefoged, Kemeny & Brackenridge 1976) and Seerer-Siin (McLaughlin 2005). Furthermore, Ladefoged & Maddieson (1996: 89) hypothesize that auditory characteristics of implosives may be due to effects on formant frequencies brought on by larynx lowering.
Furthermore, laryngeal settings (voice quality) for implosives vary, as some languages have modal voicing (Xhosa) and others creaky (Hausa), such that the degree of glottal constriction varies. Indeed, creaky phonation has been noted to co-occur with implosives in several languages (Ladefoged & Maddieson 1996, Gordon & Ladefoged 2001). However, it is argued that most Bantu languages do not have creaky phonation or glottal constriction (Maddieson & Sands 2019). In addition, it has been claimed that some implosive sounds do not include larynx lowering but create cavity space by cheek expansion, as in the case of Gyeli (Grimm 2019). The use of cheek expansion has also been noted in a Hendo unexploded palatal implosive (Demolin, Ngonga-Ke-Mbembe & Soquet 2002). Some languages contrast these sounds according to free variation such as with Bajele (Renaud 1976), the following vowel type, as seen in Mpiemo (Thornell & Nagano-Madsen 2004), or word-initial position, as seen in Bekwel (Cheucle 2014, as cited in Grimm 2019: 143).
Thus, these sounds vary across languages around the world in terms of their phonetic and phonemic characteristics, even if they tend to be voiced (McLaughlin 2005). Defining and identifying implosives can therefore prove to be challenging at times due to these various articulatory and acoustic realizations (Grimm 2019). In fact, implosive sounds of languages spoken in the same region can have varying differences in acoustic properties (Lindau 1984). We can see how distinguishing between plosives and implosives in this sense can be at times difficult, and that there is indeed 'a gradient between one form of voiced plosive and what may be called a true implosive' (Ladefoged & Maddieson 1996: 82). Given the possibility that implosives may exist on a gradient, one goal of this study is to better understand where along this gradient certain sound pairs exist and to understand their phonetic properties via acoustic analyses. Specifically, Voice Onset Time, fundamental frequency, amplitude and spectral tilt and noise measures are analyzed in order to better understand this sound category for bilabials in Shimaore.

Acoustic measurements for implosives
Various measures exist for identifying implosives and plosives. Recent research stresses the need to analyze multiple acoustic properties, as contrasts appear to be distinguished by a combination of parameters, for example, not just Voice Onset Time (VOT), since fundamental frequency and spectral tilt of the following vowel may differ according to consonant type (Kirby 2018, Cho, Whalen & Docherty 2019). In fact, all three phonetic properties have been used to effectively analyze stops in various places of articulation, including bilabial and dental, in Burushaski (Hussain 2021). Since we consider implosives to be nonexplosive stops, it may be that a combination of correlates is employed for contrast discrimination in bilabial (im)plosives. For spectral tilt, in the case of implosives, looking at laryngeal settings via H1*–H2* alongside harmonics-to-noise ratio (HNR) and cepstral peak prominence (CPP) (itself a measure of HNR) may prove useful for analyzing voice quality and establishing the co-occurrence of creaky voice or the presence of glottal constriction. HNR below 500 Hz was used because, as a measure of low-frequency noise, it is most closely related to irregular voicing. Each measure, in addition to amplitude as related to implosives, is discussed below.

Voice Onset Time (VOT)
First, VOT is a measurement used with stop consonants for measuring the time between the release of the stop and the start of voicing. Voiced consonants have negative VOT since the vocal folds start vibrating before the release. This measurement is an acoustic staple for identifying contrast (Lisker & Abramson 1964, Cho et al. 2019) in laryngeal settings, including implosives (Hussain 2018). Implosives tend to have short −VOT, with a continuous build-up of amplitude, whereas plosives have longer −VOT and a decrease in amplitude before the release (Demolin 1995, Ladefoged & Maddieson 1996, Hussain 2018). This is the case for Sindhi and other Indo-Iranian languages (Nihalani 1986, Hussain 2018). One small-scale study found that Swahili implosives have shorter VOT measures than plosives (Coburn & Hjortnaes 2019). However, this may not be the case in other languages. For Mpiemo, plosives tend to have shorter VOT than implosives, but this could be due to speaker variation and speaking rate (Nagano-Madsen & Thornell 2012). Qualitative analyses of Zulu suggest shorter VOT for the plosive /b/ than for the implosive /ɓ/ (Naidoo 2010). In addition, speech rate may factor into VOT, with Zulu implosives having the shortest VOT when occurring in rapid speech (Midtlyng 2011). It is yet to be understood how VOT functions for implosives and plosives in Shimaore.
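To make the sign convention concrete, the following is a minimal Python sketch of how VOT could be derived from two annotated time points. The function name `vot_ms` and the boundary times are hypothetical and purely illustrative; they are not taken from the study's data or scripts.

```python
def vot_ms(voicing_onset_s: float, release_s: float) -> float:
    """Return VOT in milliseconds: voicing onset time minus stop release time.

    A negative value means voicing starts before the release (prevoicing),
    which is the expected pattern for voiced plosives and voiced implosives.
    """
    return (voicing_onset_s - release_s) * 1000.0


# Hypothetical boundary times in seconds (illustrative only).
tokens = [
    {"label": "implosive", "voicing_onset": 0.512, "release": 0.570},
    {"label": "plosive", "voicing_onset": 0.430, "release": 0.536},
]

for tok in tokens:
    value = vot_ms(tok["voicing_onset"], tok["release"])
    print(tok["label"], round(value, 1), "ms")
```

Under this convention, a shorter negative VOT corresponds to a value closer to zero, so the implosive token in this made-up example has the shorter prevoicing.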

Fundamental frequency (f0) and CF0
Besides VOT, fundamental frequency is an important measure for looking at implosive consonant dimensions, both during oral closure and in the following vowel. Focus has been placed on the latter as it is thought to be a perceptual cue for the preceding consonant (Wright & Shryock 1993). This is also called Consonant Induced f0 (CF0) since the fundamental frequency of the following vowel is influenced by the preceding consonant (Kirby 2018, Cho et al. 2019). In addition, during oral closure, implosives may have higher f0 compared to voiced plosives due to the rapid lowering of the larynx (Ohala 1976, as cited by Demolin 1995: 382; Hombert, Ohala & Ewan 1979). For the following vowel, Painter (1978) observed that implosives tend to result in overall higher pitch throughout the vowel duration. Furthermore, for SiSwati, Wright & Shryock (1993) found vowel f0 following a voiced bilabial implosive to be higher than when following a voiced sonorant. It has been argued that vowels following implosives have pitch similar to voiceless stops (Hombert 1978). CF0 also appears to be higher for vowels following implosives versus voiced plosives for Mpiemo (Nagano-Madsen & Thornell 2012). It may also be true that f0 values during the oral closure of implosives remain higher than their plosive counterparts. This is yet to be confirmed for Shimaore.

Spectral tilt and amplitude
Measures beyond VOT and f0 are important when considering implosives. Spectral tilt is a relative term used to describe measurements identifying different phonation types and degree of glottal constriction, such as breathy, modal and creaky voice (Jackson et al. 1986). It is defined as 'the degree to which intensity drops off as frequency increases' (Gordon & Ladefoged 2001: 15). That is, in its essence, it is used to measure 'whether higher-frequency parts of a spectrum show much lower amplitude than lower frequency parts, only slightly lower amplitude or even greater amplitude' (Thomas 2011: 231). It is an amplitude ratio of the lowest-frequency harmonic (f0, H1) to other ones including H2 (second harmonic amplitude), A1 (first formant amplitude), and A2 (second formant amplitude) (Jackson et al. 1985, Thomas 2011, Khan 2012). For implosives and the potential co-occurrence of creaky voice, H1–H2 (the amplitude of the first harmonic, H1, minus the amplitude of the second harmonic, H2) is the most interesting. This value tends to be lower for creaky voice with higher glottal constriction, and higher for breathy voice, with modal voice falling somewhere in between the two phonation types (Garellek 2019). As a reminder, the phonation scale in order of increasing glottal constriction is breathy, slack, modal, creaky and glottalized. Certainly, creaky phonation is a term that in fact encompasses a wide range of phenomena, such as vocal fry, laryngealization, glottalization, but in general refers to a sound with higher than modal glottal constriction (Garellek 2019, Davidson 2020). Jessen & Roux (2002) found lower H1*–H2* of the following vowel for implosives, suggesting that they are not breathy voiced. This may be expected since breathy voice involves egressive airflow.
In addition, HNR measures allow for further clarification of phonation when compared alongside spectral tilt measures, as more glottal constriction results in low HNR values due to the presence of noise (Garellek 2019). This same conclusion can be found for using CPP values in which low values correspond to non-modal phonation or more noise (Seyfarth & Garellek 2018). The interest in this study is to determine if implosives have a more constricted laryngeal setting and voice quality than plosives. In addition, some studies look at amplitude in and of itself for addressing degree of implosiveness (Naidoo 2010, Nagano-Madsen & Thornell 2012, Coburn & Hjortnaes 2019, Grimm 2019). Amplitude during oral closure and the following vowel may function as a distinguishing acoustic property for implosives, but little to nothing is known about spectral tilt or amplitude concerning implosives in Shimaore.
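As an illustration of the H1–H2 measure defined in this section, the sketch below computes an uncorrected H1–H2 value for a synthetic two-harmonic signal in Python. It assumes that f0 is known and that the analysis window spans a whole number of periods; the function names are hypothetical. The corrected H1*–H2* measure additionally compensates for formant influence, which this sketch does not attempt.

```python
import cmath
import math


def harmonic_amplitude_db(signal, sample_rate, freq_hz):
    """Amplitude in dB (re 1.0) of the component at freq_hz, via a single-bin DFT."""
    n = len(signal)
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate)
        for i, x in enumerate(signal)
    )
    amplitude = 2.0 * abs(acc) / n  # peak amplitude of the sinusoid at freq_hz
    return 20.0 * math.log10(amplitude)


def h1_minus_h2(signal, sample_rate, f0_hz):
    """Uncorrected H1-H2 in dB: first-harmonic minus second-harmonic amplitude."""
    h1 = harmonic_amplitude_db(signal, sample_rate, f0_hz)
    h2 = harmonic_amplitude_db(signal, sample_rate, 2.0 * f0_hz)
    return h1 - h2


def synth(f0_hz, a1, a2, sample_rate=8000, n_samples=800):
    """Synthetic signal with two harmonics of peak amplitudes a1 and a2."""
    return [
        a1 * math.sin(2 * math.pi * f0_hz * i / sample_rate)
        + a2 * math.sin(2 * math.pi * 2 * f0_hz * i / sample_rate)
        for i in range(n_samples)
    ]


# A weaker second harmonic gives a larger H1-H2 (steeper tilt, breathier end);
# a relatively stronger second harmonic gives a smaller H1-H2 (more constricted end).
steep = h1_minus_h2(synth(100.0, 1.0, 0.25), 8000, 100.0)
shallow = h1_minus_h2(synth(100.0, 1.0, 0.5), 8000, 100.0)
print(round(steep, 2), round(shallow, 2))
```

Real speech analysis would estimate harmonic amplitudes from a windowed spectrum around a tracked f0 rather than from exact bins, so this is best read as a demonstration of the arithmetic only.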

Shimaore and Mayotte's linguistic landscape
As for the language in question, Shimaore is a Northeast Coast Bantu language of the Sabaki variety (Guthrie 1948 classification G44d) belonging to the Comorian language group, of which there are four, one for each island of the Comorian Archipelago (Nurse & Hinnebusch 1993, Patin, Mohamed-Soyir & Kisseberth 2019), as illustrated in Figure 1. The language group subdivides into the western group of Shingazidja and Shimwali and the eastern group of Shindzuani and Shimaore (Ahmed-Chamanga 2017). The two groups are, for the most part, not mutually intelligible. Inhabitants of Mayotte, the furthest island from the African coast, speak Shimaore. Most recent estimates suggest 41% of the inhabitants speak Shimaore and 31% speak a variety of the three Comorian dialects Shingazidja, Shimwali and Shindzuani (Insee Mayotte Infos 2014), this last dialect having the largest percentage due to high immigration rates from Anjouan, the Shindzuani-speaking island located only 60 kilometers from Mayotte. Shindzuani and Shimaore are mutually intelligible. Modest estimates are around 190,000-220,000 speakers of Shimaore or a Shindzuani-influenced Shimaore in Mayotte. That is, when referring to the Bantu language spoken in Mayotte, this can be in reference to a variety that is influenced by Shindzuani. As seen in Figure 2, certain villages, such as those in the North and East of the island, speak a variety closer to Shindzuani, whereas western and southern villages speak a variety often described as the 'pure' Shimaore due to its isolation from contact with other island varieties. Language variation is often village specific, as can be seen in the northwest of the island with Mtsamboro and Hamjago, which are separated by only a few minutes on foot, but which speak two mutually unintelligible languages, Shimaore and Kibushi Kisakalava, respectively. Dense urban areas in the northeast part of the island, such as Mamoudzou, prove to be complex in terms of variation and language contact, such that identifying them just as influenced by Shindzuani may be limiting. Furthermore, calling Shimaore its own language versus a Comorian dialect is as much a political choice as a linguistic one, but this discussion goes beyond the scope of this study.
The Shimaore (and arguably Shindzuani) consonants and vowels are provided in Table 1. In parentheses are sounds from Arabic, which some argue are part of the Shimaore/Shindzuani sound system (Ahmed-Chamanga 2017), but which speakers rarely use. The inventory follows Rombi's (1983) and the present author's observations. Shimaore has prenasalized consonants for the plosives, affricates and implosives as well as the labial-dental fricatives. It has been observed that implosives become plosives in post-nasal environments (Nurse & Hinnebusch 1993). In addition, variation appears to be present in the realization of prenasalized consonants, some of which may be due to regional or dialectal differences. For example, /ntsi/ and /tsi/ are both acceptable for saying 'earth' or 'country', with some locals arguing that /ntsi/ belongs to the dialects originating from Anjouan (Nurse & Hinnebusch 1993). Stress in Shimaore is penultimate (Rombi 1983, Philippson 1988, Ahmed-Chamanga 2017), while tones are absent (Rombi 1983).
Concerning (im)plosives, Bantu terms in Shimaore retain their implosive sounds. Like Swahili, Shimaore contains many loanwords from Arabic, adapted to the local phonology, such that the voiced bilabial implosive is privileged over the voiced bilabial plosive (Rombi 1983). For example, the word for 'love' is mahaɓa, from the Arabic word mahabbah. That is, bilabial plosives from Arabic loanwords become implosive whereas those from French loanwords remain plosive (as well as loanwords from other languages not having implosives such as Malagasy), presumably because they were introduced into the language more recently. This appears to be in contrast to Shingazidja, where Arabic loanwords maintain their plosive quality (Patin et al. 2019). That is, literature on Shimaore (Rombi 1983, Johansen Alnet 2009) and Shingazidja (Rombi 1989, Patin et al. 2019) and observations on the status of implosives show that their realization varies. At the very least, further inquiry into these sounds is needed, particularly prior to conducting research that investigates sociophonetic variation. Considering Swahili, a recent study comparing voiced stops showed individual variation, such that some speakers produced stops as implosives, whereas others produced plosives, thus implicating the existence of variation for these sounds (Coburn & Hjortnaes 2019).
Among other languages in Mayotte, Kibushi is spoken by about 15% of the inhabitants (Jamet 2016). It is an Austronesian language closely related to Sakalava Malagasy. There are two main varieties, Kibushi Kisakalava and Kibushi Kiantalautsi, the latter being spoken by only a few thousand people. The French language has an increasing presence in Mayotte, a French department since 2011 whose schooling and administrative activities occur exclusively in French from an official standpoint. Younger generations of Maore (people from Mayotte) and migrants speak French, although fifteen years ago, more than a third of inhabitants over 14 years of age reported not speaking any French (Insee 2007). Even if more recent statistics are not available, with the departmentalization and the influence of the national education system, this number is arguably in decline and may continue to decrease in the coming generations. Furthermore, studies suggest that locals prioritize passing on French to their children, as they believe it is the language of the future for the island (Laroussi 2015; Mori 2018, 2021). Thus, the influence of French on the local languages is a strong possibility. Neither French nor Kibushi has implosive sounds natively, although some varieties of Kibushi have innovated them, presumably as a result of contact with Shimaore.

Participants
The analyzed recordings come from 28 speakers (seven male, 21 female) aged 16-47 years (mean age = 25 years, standard deviation = 8). All participants were bilingual French/Shimaore speakers, and we communicated in French during the experiments. Due to the voluntary participation and the nature of the fieldwork, sex and age were not controlled for.

Materials
Words examined in this study come from a larger wordlist of 50 words in Shimaore with corresponding images (see Appendix for the wordlist and sample images). Drawings were used due to the oral nature of the language and difficulties encountered when asking participants to read words in Shimaore using the Latin alphabet. When possible, images from the International Picture Naming Project were used (see Bates et al. 2000). The recordings of 15 words with /b/ or /ɓ/ were analyzed, as shown in Table 2. Bilabials were chosen because they most commonly exhibit the contrast between implosives and non-implosives. In order to control for factors such as stress and phonetic environment, only consonants in word-initial, stressed position were analyzed. For words containing two instances of a target consonant, such as bibi 'insect', only the first consonant was analyzed. Note that speakers produce implosives in stressed intervocalic positions, such as maɓawa 'wings'. However, (im)plosives sometimes undergo devoicing when in an unstressed, word-final position with the close back vowel /u/, regardless of loanword origin, such as with latabu from French la table 'the table' and dahaɓu from Arabic dahab 'gold'. Loanword status is included as this may be involved in variation, as discussed in Section 1.1 above, though it is beyond the scope of the paper to discuss if these words are recent loanwords or fully integrated vocabulary. Note that six out of 15 words on the list are loanwords from French, all of which contain the non-implosive /b/. As mentioned above, this is not uncommon for the Shimaore vocabulary.

Procedure
Using a modified sociolinguistic interview method (image list, cartoon strips, storytelling), participants were recorded in a 1 m² soundproof booth on the Centre universitaire de formation et de recherche de Mayotte university campus using an Apex 435B condenser microphone, Presonus audiobox and Clarin's SpeechRecorder software (Draxler & Jänsch 2014) with a sampling rate of 44,100 Hz. Each word was pronounced twice except for participants 93F22, 94F21, 95F22, 96F20, and 98F22, who pronounced the word only once (for naming, 'F' means female and the following number is the age of the participant).

Annotation and analysis
Recordings were annotated in TextGrids in Praat (Boersma & Weenink 2019). Voiced (im)plosives were segmented from the onset of vocal fold vibration (voiced closure) to the start of the release burst (see e.g. Figure 3 in Hussain 2018). For example, for the word /ˈɓiɓi/, five boundaries were placed for segmentation: the first at the onset of the vocal fold vibration, the second at the start of the release burst, the third at the end of /i/, the fourth at the release burst of the second /ɓ/, and the fifth at the end of the second vowel (see for example Figure 6). VOT and f0 were analyzed using Praat. PraatSauce (Kirby 2020) was used to analyze HNR (0-500 Hz), CPP and H1*–H2* (values discussed are the corrected values, indicated by *). In order to adjust for individual differences, including male versus female voices, frequency was converted from hertz to semitones for each participant, using each participant's f0 averages (12 * log(zz/100) / log(2), where zz is the f0 value in hertz). Statistical analyses, including means and standard deviations (SD), and figures were produced in R (R Core Team 2017). Smoothing Spline Analysis of Variance (SSANOVA; Gu 2002) was used for looking at f0, amplitude, and H1*–H2* using the 'gss' package in R (Gu 2021). Modeling Gaussian regression, the SSANOVA (ssanova) model is similar to linear regression functions (lm) in R. Because data subsets were non-normally distributed, nonparametric methods were used, specifically the Kruskal-Wallis χ² test (kruskal.test). Spectrograms and waveforms were made using the phonTools package (Barreda 2015) in R.
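The hertz-to-semitone conversion quoted above can be sketched in Python as follows. This is an illustrative reimplementation, not the study's R code; the function name `hz_to_semitones` and the `reference_hz` parameter are assumptions, with the default of 100 Hz taken from the formula as printed.

```python
import math


def hz_to_semitones(f0_hz: float, reference_hz: float = 100.0) -> float:
    """Convert a frequency in Hz to semitones relative to reference_hz.

    Implements 12 * log(f0 / reference) / log(2), i.e. 12 * log2(f0 / reference).
    """
    return 12.0 * math.log(f0_hz / reference_hz) / math.log(2.0)


# One octave above the reference is 12 semitones; the reference itself is 0.
print(hz_to_semitones(200.0))
print(hz_to_semitones(100.0))
```

Because the scale is logarithmic, equal semitone differences correspond to equal frequency ratios, which is what makes f0 movements more comparable across lower and higher voices.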

VOT
Overall, VOT values are significantly different for the two consonant types (Kruskal-Wallis χ² = 4253.3, df = 1, p < .001), with implosives having an average VOT of −57.9 ms (SD 22.67) and plosives having an average VOT of −105.55 ms (SD 33.32). Plosive VOT values were found to be just under twice as long as the implosive VOT values. Figure 3 shows boxplots of VOT by word with visible intra- and inter-word variation. We can see that durations are similar for certain words, such as bibiro 'bottle' and ɓa 'kick'. Some words have a wider range of duration values, such as ɓao 'plank/board' and bibi 'insect', whereas others have a narrower range, such as ɓaɓa 'father'. VOT differences are consistent with previous research on VOT in implosives (Ladefoged & Maddieson 1996). Figures 4-7 show spectrograms and waveforms for two participants' production of ɓiɓi 'young girl' and bibi 'insect', chosen for their typicality of what was observed. Note that the implosive /ɓ/ is indicated with 'bb' in graphs and figures. Figures 4 and 5 are for participant 06M23 and Figures 6 and 7 are for participant 12F34. As can be seen, VOT is much shorter for word-initial implosives than for plosives.
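For readers unfamiliar with the Kruskal-Wallis rank sum test used for this comparison, the following is a minimal Python sketch of the statistic with the usual tie correction. It is not the study's R call (kruskal.test), the VOT values are invented for illustration, and the p-value shortcut shown is valid only for df = 1, as in a two-group comparison.

```python
import math
from collections import Counter


def kruskal_wallis_h(*groups):
    """Return (H, df) for the Kruskal-Wallis rank sum test with tie correction."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j

    # H = 12 / (N (N + 1)) * sum(R_k^2 / n_k) - 3 (N + 1)
    total = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

    # Tie correction; undefined if every observation is identical.
    ties = Counter(pooled)
    correction = 1.0 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    return h / correction, len(groups) - 1


def p_value_df1(h):
    """Chi-square upper-tail probability for df = 1 only."""
    return math.erfc(math.sqrt(h / 2.0))


# Hypothetical VOT values in ms (illustrative only, not the study's data).
implosive_vot = [-41.2, -55.0, -60.3, -48.7, -72.1]
plosive_vot = [-98.4, -110.9, -87.5, -120.2, -101.6]

h, df = kruskal_wallis_h(implosive_vot, plosive_vot)
print(round(h, 3), df, round(p_value_df1(h), 4))
```

Because the test works on ranks rather than raw values, it does not require the normally distributed data that a parametric comparison would assume.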

Amplitude during oral closure and vowel onset
As for amplitude, Figure 8 shows a smoothing spline ANOVA with 95% Bayesian confidence intervals (indicated by the dotted lines) for amplitude during the last 100 ms of oral closure and the first 100 ms of the following vowel. For implosives, amplitude continuously increases during oral closure and continues to increase until about 25 ms after the release. For plosives, amplitude begins to increase about 20 ms before the release and continues to rise until around 50 ms into the vowel. At the release, average amplitude for implosives is 56.69 dB (SD 10.62), and average amplitude for plosives is 51.95 dB (SD 11.00). This difference of about 5 dB is statistically significant (Kruskal-Wallis χ² = 410.58, df = 1, p < .01). Figures 4 and 6 show this amplitude increase during oral closure of implosives and Figures 5 and 7 demonstrate steady or diminishing amplitude for plosives.

F0 during oral closure and vowel onset
As for f0, Figure 9 shows a smoothing spline ANOVA for pitch in semitones during the last 100 ms of oral closure and the first 100 ms of the following vowel. A noticeable difference is observed during oral closure, in that f0 is higher for implosives than for plosives. For implosives, f0 decreases on release and starts to stabilize around 20 ms into the vowel. For plosives, pitch is low during oral closure but starts to rise sharply 25 ms before release, after which it continues to rise, but at a slower pace. The difference in pitch for the first 20 ms of the vowel is statistically significant (Kruskal-Wallis rank sum test χ² = 42.22, df = 1, p < .01). These findings seem to support theories that the downward movement of the larynx accompanied by glottal constriction during oral closure of implosives results in a higher pitch during closure (Ohala 1976, Hombert, Ohala & Ewan 1979).

Voice quality via H1*–H2*, HNR and CPP
For H1*–H2* values, averages for the first 20 ms of the vowel are significantly different per consonant type: the implosive mean is 8.10 dB (SD 6.60) and the plosive mean is 11.15 dB (SD 7.11) (Kruskal-Wallis rank sum test χ² = 137.087, df = 3, p < .01). Figure 10 shows a SSANOVA of the first 100 ms of the vowel. As we can see, H1*–H2* values for the two consonant types run parallel to each other much of the time, with plosive scores remaining higher. Table 3 shows the corresponding harmonics-to-noise ratio (HNR) and cepstral peak prominence (CPP) values. As can be seen, the difference between the two sound groups in terms of noise is minimal. Considering these two phonation parameters (H1*–H2* and the HNR/CPP values), it appears that the implosives do have glottal constriction (Garellek 2019). While the constriction is not strong enough to induce irregularity in the noise measures (HNR and CPP), it is strong enough to affect spectral tilt, specifically H1*–H2* values. Thus, the bilabial implosive in Shimaore appears to move away from the category of modal phonation, a finding which does not support Maddieson & Sands' (2019: 28) claim that Bantu implosives tend to lack glottal constriction, having vocal folds in 'the normal position for voicing'. That is, the implosive bilabial has more glottal constriction than the plosive, but not to the point of qualifying as creaky phonation.
Table note a: HNR-05 = harmonics-to-noise ratio at 0-500 Hz (Hillenbrand et al. 1994).

Individual differences
Looking at individual differences can shed light not only on variation but also on the question of implosives existing on a gradient with plosives. Figures 11, 12 and 13 show VOT, amplitude and f0 values per participant, respectively. For some participants, VOT values for the two consonants differ considerably, such as participants 06M23 and 93F22, whereas for others the values are closer together, such as participants 30F21 and 05M21. In addition, some have varying VOT lengths (participant 99F23) whereas VOT values for others fall within a short range (21F23). A general observation of Figure 12 is that amplitude values vary among participants, but that many have higher amplitudes for implosives from at least 10 ms before release.
In addition, male speakers tend to have higher amplitude values than female speakers. For f0, we can see in Figure 13 that f0 during oral closure is higher for implosives for many but not all participants. Some, like 30F21 and 99F23, have similar f0 curves during closure. While it is beyond the scope of the paper to analyze each participant, it is worthwhile to look at a couple of them. First, there is participant 06M23, whose production of (im)plosives parallels the overall findings in Section 3.1 above. Indeed, this person produces implosives in ways that reflect the averages in Section 3.1. VOT duration is markedly different, where implosive VOT is −41.96 ms (SD = 17.82) and plosive VOT is −112.25 ms (SD = 32.15). Amplitude values are similar to the overall average, in that there is an increase during oral closure and during the first 20 ms of the vowel. At the release, the implosive amplitude average is 71.55 dB (SD = 3.88) and for plosives, it is 68.30 dB (SD = 3.07), a difference that is slightly less than the overall averages. As in Figure 9, the participant's f0 curve shows higher values during closure and at the beginning of the vowel, before lowering.
In contrast, participant 30F21 stands out because some of her readings differ from those discussed in Section 3.1. First, VOT is only minimally different: implosive average = −67.43 ms (SD = 22.03) and plosive average = −70.48 ms (SD = 26.27). Compared to the VOT averages in Section 3.1.1, we see that the participant tends to have much shorter VOT for plosives and somewhat longer VOT for implosives. Concerning amplitude, 30F21's SSANOVA curves resemble those of Section 3.1.2, except that amplitude increases much more for plosives around 25 ms into the vowel. At the release, the implosive amplitude average is 49.73 dB (SD = 2.70) and the plosive amplitude average is 45.24 dB (SD = 3.79), which is similar to the averages in Section 3.1.2. For f0, as can be seen in Figure 13, the main difference is during oral closure, where the implosive has only a slightly higher pitch than the plosive. This is a departure from Section 3.1.3, where f0 is elevated throughout oral closure. For this participant, it is possible that the larynx is quickly lowered just before release, which could explain the lower f0 values. This participant's differences and the individual variation suggest that bilabial implosives are realized in various ways in Shimaore, with variation in VOT, amplitude and f0. It is possible that individuals vary in the degree and timing of larynx lowering, which may influence the degree of oral pressure, oral closure f0, amplitude and VOT.
In addition, Figures 14 and 15 show another observation concerning consonant release quality. Some participants produced bursts at the release of the implosive. A clear indicator of burst-like activity is seen with participant 17F20 in Figure 14, whereas 02F46, as seen in Figure 15, does not appear to have a burst. In Figure 14, there is a visible depression in the waveform as well as a transient corresponding to the release in the spectrogram. Some studies suggest that there should be no visible burst for implosives (Clements & Osu 2002), but that does not always appear to be the case for these data, which is consistent with observations of bursts in Zulu implosives (Naidoo 2010) and Hainan Min (Cun 2009).
Finally, it is worth considering which acoustic dimension may be the most salient for each participant in distinguishing implosives on the plosive-implosive cline. Looking individually at these correlates may reveal variation among speakers and give insight into the distinguishing role of the measurements analyzed above. This was done by calculating standardized differences via Cohen's d values between the implosive and plosive measurements for each dimension, allowing for a standardized comparison (see Seyfarth & Garellek 2018 for a thorough explanation). In conformity with the measurements in Section 3.1, f0, amplitude and spectral tilt measures were taken from the last 20 ms of oral closure through the first 20 ms of the vowel, for a total of 40 ms. Figure 16 illustrates the standardized differences for the five acoustic dimensions analyzed. As can be seen, VOT stands out as the property that separates these sound categories (average value = 2), followed by amplitude (average value = .82) and f0 (average value = .66). Looking at individual differences, it can be observed that for participant 18M16 amplitude and f0 are the more salient dimensions for distinguishing the sound pairs, particularly amplitude. This is in contrast with participant 30F21, for whom the acoustic dimensions show little contrast between the two sounds. Notably, for participant 02F46, besides VOT, voice quality measures (H1*–H2* and CPP) appear to be more salient than amplitude or f0. For others, such as participants 28F22 and 29F21, VOT stands out as the most distinguishing measurement for differentiating between bilabial implosives and bilabial plosives.
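To make the standardized comparison concrete, the following is a minimal sketch of a Cohen's d calculation in Python. It is not the authors' script: the function names are hypothetical, the pooled-SD formulation is one common variant, and the second function assumes equal token counts per category because the text reports only means and SDs. The VOT values are the ones given above for participants 06M23 and 30F21.

```python
import math
from statistics import mean, stdev


def cohens_d(implosive, plosive):
    """Cohen's d (implosive minus plosive) using the pooled sample SD."""
    n1, n2 = len(implosive), len(plosive)
    pooled_var = ((n1 - 1) * stdev(implosive) ** 2
                  + (n2 - 1) * stdev(plosive) ** 2) / (n1 + n2 - 2)
    return (mean(implosive) - mean(plosive)) / math.sqrt(pooled_var)


def cohens_d_from_summary(mean_imp, sd_imp, mean_plo, sd_plo):
    """Cohen's d from means and SDs, ASSUMING equal token counts per category."""
    pooled_sd = math.sqrt((sd_imp ** 2 + sd_plo ** 2) / 2)
    return (mean_imp - mean_plo) / pooled_sd


# VOT means and SDs (ms) reported in the text for two participants.
d_06M23 = cohens_d_from_summary(-41.96, 17.82, -112.25, 32.15)
d_30F21 = cohens_d_from_summary(-67.43, 22.03, -70.48, 26.27)
print(f"06M23 VOT d = {d_06M23:.2f}")
print(f"30F21 VOT d = {d_30F21:.2f}")
```

Under the equal-count assumption, this gives a VOT difference of roughly 2.7 for 06M23 and roughly 0.13 for 30F21, which matches the qualitative contrast described above. The values plotted in Figure 16 may differ, since the actual token counts and pooling details are not given here.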

General discussion
This study sought to explore bilabial implosives in an understudied Bantu language, Shimaore, via various acoustic analyses. First, /ɓ/ has significantly shorter VOT than /b/. For the implosive, amplitude and f0 are also higher during oral closure and at the beginning of the vowel. For voice quality, H1*–H2* is higher in plosives than in implosives, but there is little difference between the sound pairs for HNR or CPP values, which suggests the presence of glottal constriction. Individual differences were also observed regarding these acoustic correlates. Standardized differences between correlates show that VOT is the principal factor for distinguishing the two sounds, but that amplitude and f0 are also factors to consider.
In terms of how these results compare to other languages, for VOT and amplitude during oral closure, Shimaore implosives appear to behave similarly to those in Hausa (Lindau 1984) and Sindhi (Nihalani 2006), with short VOT and gradual amplitude increase before release (Ladefoged & Maddieson 1996). Second, high f0 values during oral closure and at the beginning of the vowel are like those in Bantu Mpieme (Nagano-Madsen & Thornell 2012). These high f0 values during oral closure support the theory that larynx lowering induces higher pitch (Ohala 1976, Hombert, Ohala & Ewan 1979). Thus, it appears that, for some participants, pitch may be used in sound pair discrimination (Kirby 2018, Cho et al. 2019). Regarding phonation, results do not support the idea that bilabial implosives in Shimaore are accompanied by creaky voice, as seen in other languages (Ladefoged & Maddieson 1996, Gordon & Ladefoged 2001). That is, while H1*–H2* values are higher for implosives, which suggests non-modal phonation, HNR and CPP readings do not suggest creaky voice. It appears that Shimaore implosives have a phonation somewhere between modal and creaky voice when considering the glottal states. That is, the spectral tilt and noise values support the argument that bilabial implosives have more glottal constriction than plosives in Shimaore.
One question raised in this study concerns the argument that implosiveness should be considered on a gradient. If these sounds exist on a cline, then it is important to know how individuals distinguish these sound pairs. Whereas VOT stands out as the most important correlate, it appears that several acoustic dimensions factor into creating this distinction. Where VOT is less distinguishing, amplitude and f0 contribute to sound differentiation. That is, implosives with longer VOT values may still be perceived as implosives via higher amplitude and f0 values during the oral closure and the beginning of the following vowel. Considering that f0 and amplitude differences are signs of larynx lowering, it appears that acoustic correlates of larynx lowering are important for distinguishing implosives from plosives in Shimaore. Likewise, for some participants, smaller differences in f0 and amplitude values between the two sounds suggest a realization of implosives that would put them closer to plosives on the implosive-plosive gradient in terms of larynx lowering. Further studies are needed to address the argument that implosives and plosives exist on a gradient, including how this might affect IPA conventions and the need for diacritic marks to indicate the degree of implosiveness.
Furthermore, while glottal constriction and voice quality appear to be less salient dimensions, they may still be important for some individuals, such as participants 02F46 and 95F22. Indeed, it is important to consider individual differences. For example, there is the possibility that some speakers may produce implosives that are much closer to plosives, as was the case for participant 30F21. This individual may in fact be in the process of losing this distinction, and the most salient dimension of loss may be the lack of distinction for VOT. It is speculative at best to discuss why this individual produces these sounds so similarly, and further research is needed to understand the phenomenon. Nevertheless, the language contact factor comes to mind.
That is, since the departmentalization of the island, the French language has become more and more dominant in Maore society, with children spending time in classrooms learning French from the age of three. Many Maore parents privilege French in the household, as they associate the language with academic and economic success. Indeed, it is the language used in administrative, educational and many economic contexts. This shift in language use could be a factor in the realization of implosives in Shimaore. Studies have shown a delay in developing implosive sounds in child language acquisition and in second language learning (Lewis & Roux 1999, Cissé, Demolin & Vallée 2011, Alerechi 2019). If learned late in childhood, it is possible that the realization of implosives may change in terms of larynx lowering, glottal constriction and/or VOT. Concerning plosives, studies have also shown that language contact situations lead to variation in VOT, for example with /b/ in English (Scobbie 2006). This may be observed with Shimaore and the differentiation between plosives and implosives in terms of VOT.
Finally, these findings raise many sociophonetic questions, specifically concerning variation and language contact. Indeed, Cho et al. (2019) stress the need to better understand sociophonetic aspects of sounds. Beyond effects of language learning and language contact, it is possible that implosives and their markedness are implicated in language use and identity (Eckert 2012, Hay & Drager 2007). For example, speakers may associate the use of plosives rather than implosives in Shimaore with being more French than Maore. Likewise, implosive use may index an identity limited in its connections to French, such that speakers may employ the sound when aligning themselves with a non-French local identity. Research is needed to explore variation in the realization of implosives on several levels, including phonetic context, age, gender, village and SES (socioeconomic status), among other factors. Analyses of sound production in informal situations are needed, particularly considering that f0 production may vary by context, such as reading versus colloquial speech (Kirby 2014). Time and sociophonetic research may tell how these sounds change, remain stable and index meaning on the island. This is important not only for bilabial implosives, but also for alveolar implosives, for which there is little to no research on their realization in various contexts. In addition, which factors provoke the use of the palatal implosive is also unknown for the moment. Further studies are needed to understand the sociophonetic context of all implosives in the Bantu languages spoken in Mayotte.

Figure 1 (Colour online) Map of the Comorian Archipelago in the Mozambique Channel.

Figure 2 (Colour online) Approximate distribution of the local languages and their varieties in Mayotte.

Figure 9 F0 in semitones from 100 ms before to 100 ms after consonant release.

Figure 16 Standardized differences between implosives and plosives per acoustic dimension for each participant.

Table 1 Consonants and vowels of Shimaore, with marginal consonant phonemes shown in parentheses. Grey areas indicate articulations judged impossible.

Table 2 Word list.
a In the figures /ɓ/ = bb and /b/ = b.
b 'No' is for words presumed to have Bantu origins.