Advances in real-time magnetic resonance imaging of the vocal tract for speech science and technology research

Asterios Toutios and Shrikanth S. Narayanan
Abstract

Real-time magnetic resonance imaging (rtMRI) of the moving vocal tract during running speech production is an important emerging tool for speech production research, providing dynamic information about a speaker's upper airway from the entire midsagittal plane or any other scan plane of interest. There have been several advances in the development of speech rtMRI and corresponding analysis tools, and in their application to domains such as phonetics and phonological theory, articulatory modeling, and speaker characterization. An important recent development has been the open release of a database that includes speech rtMRI data from five male and five female speakers of American English, each producing 460 phonetically balanced sentences. The purpose of the present paper is to give an overview of, and outlook on, advances in rtMRI as a tool for speech research and technology development.
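As a concrete illustration of working with such data, the sketch below loads one utterance's rtMRI video together with its synchronized audio track using standard Python tooling. The directory layout, file names, and fallback frame rate used here are assumptions made for illustration only, not the specification of the released database.

    # Hypothetical sketch: read one rtMRI utterance (image frames plus synchronized
    # audio) from a corpus assumed to ship per-utterance .avi and .wav files.
    # Paths and the fallback frame rate below are illustrative assumptions.
    import numpy as np
    import imageio.v2 as imageio          # requires imageio and imageio-ffmpeg
    from scipy.io import wavfile

    def load_rtmri_utterance(video_path, audio_path):
        """Return (frames, fps, audio, sample_rate) for a single utterance."""
        reader = imageio.get_reader(video_path)
        fps = reader.get_meta_data().get("fps", 23.18)  # fallback value is an assumption
        frames = np.stack([np.asarray(f) for f in reader], axis=0)  # (T, H, W[, C])
        reader.close()
        sample_rate, audio = wavfile.read(audio_path)
        return frames, fps, audio, sample_rate

    if __name__ == "__main__":
        # Hypothetical file names; substitute the paths used by the actual corpus.
        frames, fps, audio, sr = load_rtmri_utterance("speaker_f1/utt_0001.avi",
                                                      "speaker_f1/utt_0001.wav")
        print("%d frames at %.2f fps; %.2f s of audio at %d Hz"
              % (frames.shape[0], fps, audio.shape[0] / sr, sr))

From such per-utterance frame stacks, the midsagittal vocal-tract region can then be cropped, segmented, or aligned against the audio for further analysis.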

Copyright
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Corresponding author
Corresponding author: A. Toutios, email: toutios@usc.edu