
Advances in real-time magnetic resonance imaging of the vocal tract for speech science and technology research

Published online by Cambridge University Press:  31 March 2016

Asterios Toutios*
Affiliation:
Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California (USC), 3740 McClintock Avenue, Los Angeles, CA 90089, USA
Shrikanth S. Narayanan
Affiliation:
Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California (USC), 3740 McClintock Avenue, Los Angeles, CA 90089, USA
*Corresponding author: A. Toutios. Email: toutios@usc.edu

Abstract

Real-time magnetic resonance imaging (rtMRI) of the moving vocal tract during running speech production is an important emerging tool for speech production research, providing dynamic information about a speaker's upper airway from the entire midsagittal plane or any other scan plane of interest. There have been several advances in the development of speech rtMRI and corresponding analysis tools, and in their application to domains such as phonetics and phonological theory, articulatory modeling, and speaker characterization. An important recent development has been the open release of a database that includes speech rtMRI data from five male and five female speakers of American English, each producing 460 phonetically balanced sentences. The purpose of the present paper is to give an overview of, and an outlook on, these advances in rtMRI as a tool for speech research and technology development.

Information

Type
Overview Paper
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Authors, 2016

Fig. 1. Example rtMRI frames from the ten speakers in the USC-TIMIT database (top row, male; bottom row, female).


Fig. 2. Example rtMRI sequence from the USC-TIMIT database. A male subject utters the sentence “Bright sunshine shimmers on the ocean” (one of the 460 MOCHA-TIMIT sentences included for each subject). Note that the frames are zoomed in compared to those in Fig. 1. The phonetic labels are the result of automatic alignment. The symbol “sp” stands for “space” and “sil” for “silence”.


Fig. 3. Spectrograms of the audio, recorded concurrently with the rtMRI data, for the utterance “This was easy for us” spoken by a female subject before (top) and after (bottom) de-noising.


Table 1. Technical details of four extensively used rtMRI sequences


Fig. 4. GUI allowing for audition, labeling, tissue segmentation, and acoustic analysis of the rtMRI data, displaying an example of parametric segmentation.


Fig. 5. Example of region segmentation (white outlines) of articulators in rtMRI data. The word uttered by the female subject is “critical”. The symbol “s” stands for “space”.