Abstract
Visual feedback of articulators using Electromagnetic Articulography (EMA) has been shown to aid the acquisition of non-native speech sounds. However, physical EMA sensors are expensive and invasive, making them impractical for providing real-world pronunciation feedback. Our work focuses on using neural Acoustic-to-Articulatory Inversion (AAI) models to map speech directly to EMA sensor positions. Self-Supervised Learning (SSL) speech models, such as HuBERT, produce representations of speech that have been shown to significantly improve performance on AAI tasks. Probing experiments have indicated that certain layers and training iterations of SSL models yield representations that give better inversion performance than others. In this work, we build on these probing results to create an AAI model that improves upon a state-of-the-art baseline inversion model, and we evaluate the model's suitability for second-language pronunciation training.