18 Automatically calculated lexical and sentential context features of connected speech predict cognitive impairment

Published online by Cambridge University Press:  21 December 2023

Graham Flick*
Affiliation:
New York University, New York, New York, USA. IBM Research, Yorktown Heights, New York, USA
Rachel Ostrand
Affiliation:
IBM Research, Yorktown Heights, New York, USA
*Correspondence: Graham Flick, New York University & IBM Research, graham.flick@nyu.edu

Abstract

Objective:

Early detection is critical to the effective management of Alzheimer’s Disease (AD) and other forms of dementia. Existing screening assessments are often costly, require substantial expertise to administer, and may be insensitive to mild changes in cognition. A promising alternative is to automatically measure features of connected speech (cf. Ostrand & Gunstad, 2021, Journal Geriatric Psych & Neurol) to predict impairment. Here, we built on prior work examining how well speech features predicted cognitive impairment. Unique to the current work, we attempted to capture more holistic effects of cognitive impairment by examining linguistic features that measure sentential or discourse context properties of speech, including the context in which filler words (e.g., um) occur and the predictability of individual words within their sentence context, computed from a large computational language model (GPT-2).

Participants and Methods:

Participants completed the Cookie Theft picture description task, with data available in the DementiaBank corpus (Becker et al., 1994, Arch Neurol). Descriptions that contained at least 50 words (N = 214) were submitted to an automatic feature calculation pipeline written in Python to calculate various part-of-speech counts, lexical diversity metrics, and mean lexical frequency, as well as multiple metrics related to lexical surprisal (i.e., how surprising a word is given its context).
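As a rough illustration of the kind of per-transcript features described above, the following Python sketch applies the 50-word inclusion threshold and computes two simple lexical measures. The tokenizer, the filler-word list, and the choice of type-token ratio as the diversity metric are assumptions made for illustration; the abstract does not specify which metrics or tools the pipeline used.

```python
import re

# Hypothetical filler-word list; the abstract only gives "um" as an example.
FILLERS = {"um", "uh", "er"}
MIN_WORDS = 50  # inclusion threshold stated in the abstract


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_features(text):
    """Return simple lexical features, or None if the description is too short."""
    words = tokenize(text)
    if len(words) < MIN_WORDS:
        return None
    return {
        "n_words": len(words),
        # Type-token ratio: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / len(words),
        # Proportion of tokens that are filler words.
        "filler_rate": sum(w in FILLERS for w in words) / len(words),
    }
```

Part-of-speech counts and lexical frequency would additionally need a tagger and a frequency norm, which are left out here because the abstract does not name them.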

Surprisal of individual words was computed using the pre-trained GPT-2 transformer language model (Radford et al., 2019, Comput. Sci.) by computing each word’s probability given the previous 12 words. Multiple linear regression was performed using 17 linguistic features jointly as predictors and Mini-Mental State Examination (MMSE) score as the outcome variable. Simple regressions were also calculated between each feature and MMSE score to examine how well each linguistic feature individually predicted MMSE score.
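Surprisal is conventionally defined as the negative log probability of a word given its context, here -log2 P(w_i | w_{i-12}, ..., w_{i-1}). The sketch below shows that computation with a 12-word window and the two per-transcript summaries reported in the results (median and interquartile range). It is a minimal illustration, not the authors' pipeline: `cond_prob` and `toy_model` are hypothetical stand-ins for GPT-2, and the use of base-2 logarithms is an assumption. GPT-2 itself scores subword tokens, and the abstract does not say how token probabilities were combined into word probabilities.

```python
import math
import statistics

CONTEXT_WORDS = 12  # context window stated in the abstract


def word_surprisals(words, cond_prob):
    """Surprisal (in bits) of each word given up to the 12 preceding words.

    `cond_prob(word, context)` is a hypothetical callable standing in for a
    language model such as GPT-2; it must return P(word | context) in (0, 1].
    """
    surprisals = []
    for i, word in enumerate(words):
        context = words[max(0, i - CONTEXT_WORDS):i]
        surprisals.append(-math.log2(cond_prob(word, context)))
    return surprisals


def summarize(surprisals):
    """Median and interquartile range of a transcript's word surprisals."""
    q1, _, q3 = statistics.quantiles(surprisals, n=4)
    return {"median": statistics.median(surprisals), "iqr": q3 - q1}


def toy_model(word, context):
    """Toy stand-in model: a word repeated from its context is more predictable."""
    return 0.5 if word in context else 0.125


values = word_surprisals("the boy takes the cookie".split(), toy_model)
summary = summarize(values)
```

With the toy model, the repeated word "the" receives a surprisal of 1 bit and every other word 3 bits, so less predictable word choices raise the transcript's median surprisal.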

Results:

A multiple linear regression model containing all linguistic features plus demographic information (age, sex, education) significantly predicted MMSE scores (Adjusted R² = 0.41, F(20, 193) = 8.37, p < .001), and explained significantly more variance in MMSE scores than did demographic variables alone (F(17, 193) = 6.85, p < .001). Individual predictors that were significantly correlated with MMSE score included: how unexpected an individual’s word choices were, given the preceding context (median surprisal: r = -0.33, p < .001; interquartile range: r = 0.18, p < .02), mean lexical frequency (r = -0.50, p < .001), and usage of definite articles (r = 0.31, p < .001), nouns (r = 0.26, p < .001), and empty words (r = -0.25, p < .001).
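The reported degrees of freedom and overall F statistic can be checked against the sample size and predictor count with a few lines of arithmetic. The sketch below assumes an ordinary least-squares model with an intercept, which the abstract does not state explicitly; the function names are illustrative.

```python
def residual_df(n, k):
    """Residual degrees of freedom for a linear model with k predictors and an intercept."""
    return n - k - 1


def r2_from_adjusted(adj_r2, n, k):
    """Invert adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - adj_r2) * residual_df(n, k) / (n - 1)


def overall_f(r2, n, k):
    """Overall F statistic: (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1 - r2) / residual_df(n, k))


n = 214          # descriptions with at least 50 words
k_full = 17 + 3  # 17 linguistic features plus age, sex, education

r2 = r2_from_adjusted(0.41, n, k_full)
f_full = overall_f(r2, n, k_full)

print(residual_df(n, k_full))  # 193, the denominator df in both reported F tests
print(round(f_full, 2))        # about 8.4, close to the reported 8.37
```

The small gap between 8.40 and 8.37 is consistent with the adjusted R² having been rounded to two decimals. The numerator of 17 in the model-comparison test matches the 17 linguistic features added to the demographics-only model.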

Conclusions:

Participants with lower MMSE scores, indicating greater impairment, used more frequent, yet more surprising, words, and produced more empty words and fewer definite articles and nouns. These results suggest that measures of semantic specificity and coherence of speech could be meaningful predictors of cognitive decline, and can be computed automatically from speech transcriptions. The results also provide novel evidence that computational approaches to estimating lexical predictability may have value in predicting the degree of decline, motivating future work in other speech elicitation tasks and differing clinical groups.

Type
Poster Session 08: Assessment | Psychometrics | Noncredible Presentations | Forensic
Copyright
Copyright © INS. Published by Cambridge University Press, 2023