Acoustic speech markers for schizophrenia-spectrum disorders: a diagnostic and symptom-recognition tool

J. N. de Boer; A. E. Voppel; S. G. Brederoo; H. G. Schnack; K. P. Truong; F. N. K. Wijnen; I. E. C. Sommer

doi:10.1017/S0033291721002804

Acoustic speech markers for schizophrenia-spectrum disorders: a diagnostic and symptom-recognition tool

Published online by Cambridge University Press: 04 August 2021

and

J. N. de Boer*: Affiliation:
Department of Biomedical Sciences of Cells and Systems and Department of Psychiatry, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands Department of Psychiatry, University Medical Center Utrecht, Utrecht University & University Medical Center Utrecht Brain Center, Utrecht, the Netherlands
A. E. Voppel: Affiliation:
Department of Biomedical Sciences of Cells and Systems and Department of Psychiatry, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
S. G. Brederoo: Affiliation:
Department of Biomedical Sciences of Cells and Systems and Department of Psychiatry, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
H. G. Schnack: Affiliation:
Department of Psychiatry, University Medical Center Utrecht, Utrecht University & University Medical Center Utrecht Brain Center, Utrecht, the Netherlands Utrecht Institute of Linguistics OTS, Utrecht University, Utrecht, the Netherlands
K. P. Truong: Affiliation:
Department of Human Media Interaction, University of Twente, Enschede, the Netherlands
F. N. K. Wijnen: Affiliation:
Utrecht Institute of Linguistics OTS, Utrecht University, Utrecht, the Netherlands
I. E. C. Sommer: Affiliation:
Department of Biomedical Sciences of Cells and Systems and Department of Psychiatry, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
*: Author for correspondence: J. N. de Boer, E-mail: j.n.deboer-18@umcutrecht.nl

Article contents

Abstract
Background
Methods
Results
Conclusions
Introduction
Methods
Procedure
Results
Discussion
Data
Author contributions
Financial support
Conflict of interest
References

Rights & Permissions

Abstract

Background

Clinicians routinely use impressions of speech as an element of mental status examination. In schizophrenia-spectrum disorders, descriptions of speech are used to assess the severity of psychotic symptoms. In the current study, we assessed the diagnostic value of acoustic speech parameters in schizophrenia-spectrum disorders, as well as its value in recognizing positive and negative symptoms.

Methods

Speech was obtained from 142 patients with a schizophrenia-spectrum disorder and 142 matched controls during a semi-structured interview on neutral topics. Patients were categorized as having predominantly positive or negative symptoms using the Positive and Negative Syndrome Scale (PANSS). Acoustic parameters were extracted with OpenSMILE, employing the extended Geneva Acoustic Minimalistic Parameter Set, which includes standardized analyses of pitch (F0), speech quality and pauses. Speech parameters were fed into a random forest algorithm with leave-ten-out cross-validation to assess their value for a schizophrenia-spectrum diagnosis, and PANSS subtype recognition.

Results

The machine-learning speech classifier attained an accuracy of 86.2% in classifying patients with a schizophrenia-spectrum disorder and controls on speech parameters alone. Patients with predominantly positive v. negative symptoms could be classified with an accuracy of 74.2%.

Conclusions

Our results show that automatically extracted speech parameters can be used to accurately classify patients with a schizophrenia-spectrum disorder and healthy controls, as well as differentiate between patients with predominantly positive v. negatives symptoms. Thus, the field of speech technology has provided a standardized, powerful tool that has high potential for clinical applications in diagnosis and differentiation, given its ease of comparison and replication across samples.

Keywords

Acoustic biomarker language psychosis speech

Type: Original Article
Information: Psychological Medicine , Volume 53 , Issue 4 , March 2023 , pp. 1302 - 1312

DOI: https://doi.org/10.1017/S0033291721002804 [Opens in a new window]
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re- use, distribution and reproduction, provided the original article is properly cited.
Copyright: Copyright © The Author(s), 2021. Published by Cambridge University Press

Introduction

In clinical practice, impressions of speech are routinely employed as an element of mental status examination and are a primary source of information in the diagnostic process. Especially in schizophrenia-spectrum disorders, descriptions of speech are used to assess specific symptoms such as alogia (e.g. poverty of speech) and blunted affect (e.g. diminished vocal intonation), as well as positive symptoms such as excitement (e.g. pressured speech; Alpert, Shaw, Pouget, & Lim, Reference Alpert, Shaw, Pouget and Lim2002). Although there is no consensus on the best sub-categorization of the different symptoms of schizophrenia-spectrum disorders yet, an important distinction is that between positive and negative symptoms. Being able to determine at different moments in time whether a schizophrenia-spectrum patient experiences predominantly positive or negative symptoms is of great clinical importance because these symptom subtypes require different treatment (Fusar-Poli et al., Reference Fusar-Poli, Papanastasiou, Stahl, Rocchetti, Carpenter, Shergill and McGuire2015). Moreover, persistent negative symptoms are strongly related to functional outcomes (Lin et al., Reference Lin, Huang, Chang, Chen, Lin, Tsai and Lane2013; Milev, Ho, Arndt, & Andreasen, Reference Milev, Ho, Arndt and Andreasen2005), further highlighting the need for a reliable distinction between subtypes. From a scientific perspective, reliable identification of relevant subgroups of patients is highly important because the neurobiological underpinnings of their disorders are likely different (Cuthbert & Insel, Reference Cuthbert and Insel2013; Insel, Reference Insel2014). Consider a patient with grandiose delusions, hallucinations, pressured speech and derailment, v. a patient who presents with social withdrawal, alogia and catatonia: although both patients could be classified as having schizophrenia, the processes underlying these clinically non-overlapping symptom-collections might be entirely different.

Traditionally, clinical rating scales, such as the Positive and Negative Syndrome Scale (PANSS; Kay, Fiszbein, & Opfer, Reference Kay, Fiszbein and Opfer1987) are used to assess both positive and negative symptoms in schizophrenia. However, previous research has stressed the dramatic disparity between speech-based objective measurements of individual characteristics (e.g. flat affect) and scale-based subjective assessments assessed by clinicians; symptom rating scales are, at best, only modestly related to objective measurements of the behavior they intend to reflect (Cohen, Mitchell, & Elvevåg, Reference Cohen, Mitchell and Elvevåg2014). Moreover, some PANSS items are highly influenced by experience and preconceptions of the rater, which results in low inter-rater reliability (Kølbæk et al., Reference Kølbæk, Blicher, Buus, Feller, Holm, Dines and Correll2018). In daily practice, these subjective assessments are not performed scale-based, but ad hoc, leading to even higher inter-observer variation. Especially, the assessment of negative symptoms remains problematic, since consensus-based instruments diverge on what should be considered true negative symptoms and what should be considered general/other illness components; such as, for example, cognition (Marder & Galderisi, Reference Marder and Galderisi2017). Developing objective measurements to complement clinical ratings is thus fundamental to overcome these serious validity concerns with clinical assessments of positive and negative symptoms, and enable development towards measurement-based care (Insel, Reference Insel2017).

Recent advances in natural language processing have paved the way towards automated speech analyses as a biomarker for psychosis (Corcoran et al., Reference Corcoran, Mittal, Bearden, Gur, Hitczenko, Bilgrami and Wolff2020; de Boer et al., Reference de Boer, Voppel, Begemann, Schnack, Wijnen and Sommer2018; de Boer, Brederoo, Voppel, & Sommer, Reference de Boer, Brederoo, Voppel and Sommer2020; Hitczenko, Mittal, & Goldrick, Reference Hitczenko, Mittal and Goldrick2020; Tan & Rossell, Reference Tan and Rossell2020). Most studies in this field analyze language content, including methods of measuring discourse coherence, syntactic complexity and referential coherence (Bedi et al., Reference Bedi, Carrillo, Cecchi, Slezak, Sigman, Mota and Corcoran2015; Corcoran et al., Reference Corcoran, Carrillo, Fernández-Slezak, Bedi, Klim, Javitt and Cecchi2018; Holmlund et al., Reference Holmlund, Chandler, Foltz, Cohen, Cheng, Bernstein and Elvevåg2020; Mota, Furtado, Maia, Copelli, & Ribeiro, Reference Mota, Furtado, Maia, Copelli and Ribeiro2014; Palaniyappan et al., Reference Palaniyappan, Mota, Oowise, Balain, Copelli, Ribeiro and Liddle2019; Rezaii, Walker, & Wolff, Reference Rezaii, Walker and Wolff2019). A source of information that is less commonly used is the acoustic aspects of speech. Articulation and other components involved in speech production can be measured in the acoustic signal, for example by decomposing sound waves into formants; i.e. acoustic resonance frequencies that indicate the position and movement of the articulatory organs when speaking. F1 (first formant) indicates jaw/mouth opening and tongue height, while F2 corresponds to tongue positioning (front/back) and lip rounding. Furthermore, F0, or fundamental frequency, is an acoustic parameter of pitch. Such acoustic speech features (F0 and F2 variability) have been associated with specific negative symptoms (Bernardini et al., Reference Bernardini, Lunden, Covington, Broussard, Halpern, Alolayan and Compton2016; Covington et al., Reference Covington, Lunden, Cristofaro, Wan, Bailey, Broussard and Compton2012) and have also been utilized as differentiators between individuals with schizophrenia-spectrum disorders and healthy individuals in small samples (Martínez-Sánchez et al., Reference Martínez-Sánchez, Muela-Martínez, Cortés-Soto, García Meilán, Vera Ferrándiz, Egea Caparrós and Pujante Valverde2015; Tahir et al., Reference Tahir, Yang, Chakraborty, Thalmann, Thalmann, Maniam and Dauwels2019), attaining overall classification accuracies varying from 81% to 94%. Although these studies are promising, the larger literature also reports studies that failed to find a difference in acoustic features between schizophrenia-spectrum patients and healthy individuals (Cohen, Mitchell, Docherty, & Horan, Reference Cohen, Mitchell, Docherty and Horan2016; Meaux, Mitchell, & Cohen, Reference Meaux, Mitchell and Cohen2018).

These inconsistent findings may be due to the fact that several studies took different approaches to computing and standardizing acoustic parameters (Eyben et al., Reference Eyben, Scherer, Schuller, Sundberg, Andre, Busso and Truong2016). Individual research groups develop and employ their own set of features, and the studies reported above used acoustic feature sets that overlap only partially. Furthermore, studies often include non-acoustic features (e.g. the usage of determiners in speech) in their analyses. Therefore, the results reported in the literature cannot always be meaningfully compared (de Boer et al., Reference de Boer, Brederoo, Voppel and Sommer2020). Indeed, a recent meta-analysis found the existing literature to be highly diverse and unsystematic (Parola, Simonsen, Bliksted, & Fusaroli, Reference Parola, Simonsen, Bliksted and Fusaroli2020). Furthermore, the rapid developments in natural language processing have led to a proliferation in features, often amounting in up to tens of thousands of acoustic parameters (Marmar et al., Reference Marmar, Brown, Qian, Laska, Siegel, Li and Vergyri2019). While this abundance in features allows capturing many acoustic characteristics, it comes at the cost of overfitting and preventing clear interpretations of the underlying mechanisms (de Boer et al., Reference de Boer, Brederoo, Voppel and Sommer2020). We analyzed the speech of patients with a schizophrenia-spectrum disorder and healthy controls using the extended Geneva Acoustic Minimalistic Parameter Set (eGeMAPS) for acoustic analyses (Eyben et al., Reference Eyben, Scherer, Schuller, Sundberg, Andre, Busso and Truong2016). This parameter set was developed to provide a baseline for affective speech processing, in order to ensure easy replication and to improve comparability of parameters.

Overall, some recent studies have shown promising results in identifying schizophrenia-spectrum disorders from healthy controls utilizing automated analysis of acoustic speech patterns (Martínez-Sánchez et al., Reference Martínez-Sánchez, Muela-Martínez, Cortés-Soto, García Meilán, Vera Ferrándiz, Egea Caparrós and Pujante Valverde2015; Tahir et al., Reference Tahir, Yang, Chakraborty, Thalmann, Thalmann, Maniam and Dauwels2019). Yet, inconsistencies remain and replications in large samples using interpretable and standardized speech parameters are required (Meaux et al., Reference Meaux, Mitchell and Cohen2018; Parola et al., Reference Parola, Simonsen, Bliksted and Fusaroli2020). Thus, the first aim of the current study is to replicate the diagnostic potential of standardized speech parameters in a large sample of patients with a schizophrenia-spectrum disorder. The second aim is to explore the value of acoustic analyses in differentiating between patients who at that time experience predominantly positive v. negative psychotic symptoms. Third, we aim to explore the relation between acoustic features and cognitive and psychotic symptoms, as well as the relation with antipsychotic medication. Based on previous study from our group, we specifically expect increased pauses to be related to the use of typical antipsychotics (de Boer, Voppel, Brederoo, Wijnen, & Sommer, Reference de Boer, Voppel, Brederoo, Wijnen and Sommer2020).

Methods

Participants

Data from 142 Dutch patients with a schizophrenia-spectrum disorder and 142 healthy age- and sex-matched controls from the same community were collected between 2015 and 2020. All procedures were approved by the Ethical Committee of the University Medical Center Utrecht. Psychiatric diagnoses were confirmed by the Structured Clinical Interview for DSM-IV (SCID; First, Reference First2014), the Comprehensive Assessment of Symptoms and History (CASH; Andreasen, Flaum, & Arndt, Reference Andreasen, Flaum and Arndt1992) or the Mini-International Interview (MINI; Sheehan et al., Reference Sheehan, Lecrubier, Sheehan, Amorim, Janavs, Weiller and Dunbar1998), depending on the study the participants originally enrolled in. Healthy controls were screened for the absence of previous or current mental illness. Symptom severity and cognitive functioning were rated by consensus rating of two trained researchers, blinded to phonetic analysis, with the PANSS (Kay et al., Reference Kay, Fiszbein and Opfer1987) and Brief Assessment of Cognition (BACS; Keefe et al., Reference Keefe, Goldberg, Harvey, Gold, Poe and Coughenour2004). The patients were categorized as having predominant negative v. positive symptoms based on PANSS scores. Since there is no consensus in the literature on cutoff scores for such subcategorizations, a median split was used. Patients were therefore categorized as having predominantly negative symptoms if they had a greater score on the negative than on the positive subscale of the PANSS, and a negative subscale score ⩾ the median, being 12. The patients with predominantly positive symptoms consisted of patients with PANSS positive subscale scores that were greater than their negative subscale scores, and a positive subscale score ⩾ the median, being 9. Of the 142 schizophrenia-spectrum cases, 44 had predominantly positive symptoms and 45 had predominantly negative symptoms, the remaining patients had an equal combination of positive and negative symptoms.

Patients were divided into two categories based on different dopamine binding profiles, namely patients with (1) low dopamine D2 receptor (D2R) occupancy drugs (i.e. quetiapine, paliperidone, olanzapine and clozapine) or (2) high D2R occupancy drugs (i.e. aripiprazole, risperidone, flupentixol, amisulpride and haloperidol), following a previous report by our group (de Boer et al., Reference de Boer, Voppel, Brederoo, Wijnen and Sommer2020). Antipsychotic drug dosages were recalculated into chlorpromazine equivalents to evaluate the effect of dosage between the drugs (Leucht et al., Reference Leucht, Samara, Heres, Patel, Woods and Davis2014).

Procedure

Speech recording

Semi-structured interviews varying from 5 to 30 min in length (average 11 min) were conducted using a set of neutral open-ended questions. For an elaborate description of this methodology, see previous reports by our group (de Boer, van Hoogdalem, et al., Reference de Boer, van Hoogdalem, Mandl, Brummelman, Voppel, Begemann and Sommer2020; de Boer, Voppel, et al., Reference de Boer, Voppel, Brederoo, Wijnen and Sommer2020; Voppel, de Boer, Brederoo, Schnack, & Sommer, Reference Voppel, de Boer, Brederoo, Schnack and Sommer2021). AKG-C544l cardioid microphones were used to record the participant's and interviewer's speech onto two different channels. Speech was digitally recorded onto a Tascam DR40 solid state recorder, at a sampling rating of 44.1 kHz with 16-bit quantization. Head-worn microphones were used to keep the distance between the mouth and the microphone as constant as possible (2 cm), since this has a considerable effect on recorded vocal loudness. Microphones were placed at a 45° angle from the participant's mouth in order to prevent aerodynamic noise.

Speech pre-processing

To remove crosstalk (i.e. speech from the interviewer on the participant's audio channel), the following steps were taken: (1) the ‘annotate to text grid silences’ function in PRAAT (Boersma & Weenink, Reference Boersma and Weenink2013) was used on the interviewer's channel (settings: minimum pitch 100 Hz, time step 0.0, silence threshold −30.0 dB, minimum silence duration 1.0 s, minimum sounding duration 0.1 s); (2) all resulting speech segments in which the interviewer was silent were selected on the participants channel and (3) the resulting speech segments were concatenated into a new audio file containing only segments of speech of the participant.

Acoustic parameters

The eGeMAPS parameter set was extracted from the participant's speech using OpenSMILE (Eyben, Weninger, Gross, & Schuller, Reference Eyben, Weninger, Gross and Schuller2013). eGeMAPS provides arithmetic means and coefficients of variation [standard deviation (s.d.) normalized by the arithmetic mean] for each parameter. A total of 88 parameters were computed at the speaker level and were used for feature selection and classification model building. For a full overview of the parameters, see online Supplementary Table S1. The parameters can be divided into the following types (Eyben et al., Reference Eyben, Weninger, Gross and Schuller2013): temporal (e.g. speech rate; six parameters), frequency (e.g. fundamental frequency; 24 parameters), spectral [e.g. Mel-frequency cepstral coefficients (MFCCs), relative energy in different frequency bands; 43 parameters] and energy/amplitude (e.g. intensity; 14 parameters).

Cognitive functioning

Cognition was assessed in all patients using the BACS (Keefe et al., Reference Keefe, Goldberg, Harvey, Gold, Poe and Coughenour2004) which consists of the following tasks: (1) List learning – Verbal memory; (2) Digit sequencing - Working memory; (3) Token motor task - Motor speed; (4) Category Instances and Controlled Oral Word Association Test - Verbal fluency; (5) Symbol coding - Attention and information processing speed and (6) Tower of London - Executive function. Individual BACS scores were converted into standardized Z-scores that are corrected for age and gender based on previously published norm scores (Keefe et al., Reference Keefe, Harvey, Goldberg, Gold, Walker, Kennel and Hawkins2008).

Statistical analysis

For categorical variables, a χ² test, and for continuous variables analysis of variance (ANOVAs) were used to assess differences between groups in demographic characteristics. Following previous research (Marmar et al., Reference Marmar, Brown, Qian, Laska, Siegel, Li and Vergyri2019), random forest algorithms were used to build machine-learning classifiers using acoustic speech parameters to differentiate between schizophrenia-spectrum patients and healthy controls, and to classify symptom patterns (i.e. predominantly positive v. predominantly negative symptoms). In this study, random forest classifiers were built using the 88 speech markers, using leave-ten-out cross-validation to divide the data into test and train sets. Random forests are based on multiple classification trees. The nodes in these decision trees are based on a binary ‘split’ of a predictor, aimed at minimizing misclassification in a training subset of the data. The process continues recursively until a tree is formed that does not improve from further splits or nodes. The probability of being a member of a certain target class, is the fraction of that class in the terminal node into which they fall. Once trees are iteratively built, per-case classification is performed on the testing subset. The resulting probability estimates are used to generate a receiver operator curve (ROC) and its area under the curve (AUC). From the trained trees, the importance of a specific predictor (Gini-importance score) is estimated by setting the predictor to a random number, and comparing the difference in predictive power (AUCs) to the final model, measuring how much worse the model becomes when replacing the predictor with random data. Gini-importance scores are average scores over all recursive trees, and can only be interpreted in relation to each other. Bivariate Pearson correlations were performed to analyze associations between acoustic variables and clinical characteristics in the patients for the top 10 features in both models. Correlation analyses were corrected for multiple comparison using false-discovery rate (FDR). Alpha was set to 0.05 for all analyses.

Results

The schizophrenia-spectrum patients and controls did not differ significantly in age, sex or parental educational levels (see Table 1). The patients overall received less education than the controls, which is to be expected given that the first psychosis often occurs during educational years. The schizophrenia-spectrum patients with predominantly positive symptoms had a longer illness duration than the patients with predominantly negative symptoms. These subgroups did not differ significantly on age, sex and parental educational levels. The patients with predominantly positive symptoms had a longer illness duration than the patients with predominantly negative symptoms (Table 1).

Table 1. Demographics

HC: healthy controls; SSD: schizophrenia-spectrum disorder patients.

Psychotic symptom subgroup analyses were performed in those patients that met criteria for either predominantly positive or negative symptoms. These subgroups are a subset (n = 89) of the full schizophrenia-spectrum disorder sample (n = 142).

** Indicates p value <0.01.

Diagnostic classifier

The trained 10-fold cross-validated classifier had an accuracy of 86.2% with an AUC of 0.92 in distinguishing schizophrenia-spectrum patients from healthy controls. Sensitivity was 85.1% and specificity 87.2% (see Table 2).

Table 2. Classification of performance metrics

Legend: SSD: schizophrenia-spectrum disorder patients, HC: healthy controls, sens: sensitivity, spec: specificity, CI: confidence interval, NPV: negative predictive value, PPV: positive predictive value, AUC: area under the curve.

Table 3 lists the ten acoustic parameters with the highest importance scores in the final model, as well as the means and s.d.s of these parameters in the schizophrenia-spectrum patients and controls. For a full list of Gini-importance scores, see online Supplementary Table S1. Three of the top parameters in the model were temporal (voiced segment rate, unvoiced segment length and s.d.), indicating short, fragmented speech segments and longer pauses in the schizophrenia-spectrum patients compared to controls. Patients with a schizophrenia-spectrum disorder were further classified using spectral parameters between groups, namely a reduced mean spectral slope of voiced and unvoiced regions (indicating a more tensed voice in the patients) and reduced spectral flux variation (i.e. less difference between spectra measured between two consecutive time windows, which may indicate more monotone speech). Patients were further characterized by reduced variation of F2 bandwidth (i.e. vowel bandwidth, can indicate breathiness). Moreover, the energy of the first and third formant as compared to pitch (F0) differed from healthy controls, indicating differences in intonation characteristics and voice quality. The patients showed a larger variation in loudness compared to the controls.

Table 3. Top ten acoustic parameters in the diagnostic classifier (schizophrenia-spectrum disorder v. healthy control)

s.d., standard deviation; UV, unvoiced; V, voiced.

Ranking based on average Gini-importance scores.

** indicates p value <0.01.

Psychotic symptoms classifier

The trained 10-fold cross-validated model had an accuracy of 74.2% with an AUC–ROC of 0.76 in distinguishing patients with predominantly negative from those with predominantly positive symptoms. Sensitivity was 65.9% and specificity 80.0% (see Table 2). Table 4 lists the top ten acoustic parameters in the trained model, as well as the means and s.d.s of these parameters in the predominant positive and negative symptom groups. For a full list of Gini-importance scores, see online Supplementary Table S2. Patients with positive symptoms had a smaller spectral slope indicating a smaller difference in energy between low and high frequencies. Other spectral parameters did not differ between groups. Seven of the top ten parameters were frequency parameters (i.e. relating to the way vowels are pronounced); patients with positive symptoms had a smaller F1 and F2 bandwidth (i.e. vowel bandwidth, indicating breathiness), lower mean F1 frequencies, less jitter variation (i.e. roughness or voice quality) and less variation in F3 frequency (i.e. vowel frequency).

Table 4. Top ten acoustic parameters in the psychotic symptoms classifier (positive v. negative symptoms)

Ranking based on average Gini-importance scores. s.d.: standard deviation; UV: unvoiced; V: voiced. ** indicates p value <0.01, * indicates p value <0.05.

Associations between acoustic variables and clinical characteristics

Psychotic and cognitive symptoms

Pearson correlations for the top ten parameters from both classifiers with PANSS positive, negative and general and composite BACS scores are presented in Table 5. Acoustic variables correlated most strongly with negative and to a lesser degree general PANSS scores. After FDR correction, no significant associations with cognitive functioning were found.

Table 5. Associations between top acoustic parameters and psychotic and cognitive symptoms

Legend: Reported values are Pearson's r values. ** indicates p value <0.01, * indicates p value <0.05. Bold values remained significant after FDR correction for multiple comparisons.

Antipsychotic medication

Pearson correlations between chlorpromazine equivalents and all 88 acoustic parameters revealed only one weak association; slope V500–1500 (s.d.) was correlated with chlorpromazine equivalent dose (r = −0.207, p = 0.019). This association did not remain significant after FDR correction. A multivariate analysis of variance (MANOVA) analysis comparing patients with low DR2 occupancy drugs with patients using high DR2 occupancy drugs, and chlorpromazine equivalent dose as a covariate, revealed no significant overall group effect of medication type (F _{(1, 85)} = 1.191, p = 0.269) or dose (F _{(1, 85)} = 1.260, p = 0.206) on the acoustic parameters. An ANOVA analysis comparing the two medication groups on pause duration (mean unvoiced segment length) specifically, with chlorpromazine equivalent dose as a covariate, revealed a significant main effect of drug type (F _{(1, 126)} = 4.532, p = 0.035, partial η ² = 0.035). Antipsychotic dose showed no significant effect on pause duration (p = 0.642). Patients with high DR2 occupancy drugs showed a greater pause duration (mean = 0.28) than the patients with low DR2 occupancy drugs (mean = 0.24).

Discussion

This study confirms previous research in showing that patients with a schizophrenia-spectrum disorder can be reliably distinguished from healthy control participants by using automated speech-based techniques, in a large sample. We extend the current literature by further showing that patients with a schizophrenia-spectrum disorder can be divided into those with predominantly positive v. negative symptoms based on their speech acoustics.

By using standardized acoustic measures, high sensitivity, specificity and accuracy could be achieved in differentiating patients from controls. The model assigns higher probability to a schizophrenia-spectrum diagnosis when speech is fragmented (i.e. frequent short-speech segments), has a decreased spectral slope (i.e. less difference in energy between high- and low-speech frequencies) and longer pauses. We further demonstrated that speech parameters can be used to identify subjects with positive and negative symptoms in schizophrenia-spectrum disorders. This model assigns higher probability to positive symptoms when there is less variation in jitter (i.e. roughness of the voice), as well as differences in vowel quality (e.g. breathiness and rounding of vowels, indicated by reduced variation in F3 formant frequencies, a lower F1 formant frequency and a smaller F1 and F2 formant bandwidth (i.e. reach of the formants, affected by jaw/mouth opening as well as tongue and lip positioning). This indicates that speech-markers do not only differentiate between ‘normal’ and ‘disturbed’ speech, but also (reliably) reflect positive and negative symptoms.

Our results showed that temporal (i.e. timing related) parameters are highly important in classifying patients and controls, as three out of six temporal parameters were included in the top ten parameters. This is an interesting finding since temporal parameters comprise only 6.8% of the acoustic parameters used. These findings are in correspondence with a recent meta-analysis indicating that temporal aspects such as speech rate and pausing patterns are often disturbed in schizophrenia-spectrum disorders (Cohen et al., Reference Cohen, Mitchell and Elvevåg2014; Parola et al., Reference Parola, Simonsen, Bliksted and Fusaroli2020). Contrastingly, temporal parameters were not among the top ten parameters characterizing positive v. negative psychotic symptoms. Timing-related parameters are not specific to a single-mental process. In this study, we did not find an association between overall cognitive functioning and the timing related parameters in our models. However, a more detailed exploration of this relation could reveal associations between acoustic-timing and specific aspects of cognitive functioning. Reduced speech rate can, for example, be the result of slower word-retrieval, slower processing speed, slower articulation or increased distractibility. Recent work has shown that increased pausing in schizophrenia-spectrum disorders is mostly observed clause-initial, suggesting that these patients require more time to formulate a sentence (Çokal et al., Reference Çokal, Zimmerer, Turkington, Ferrier, Varley, Watson and Hinzen2019). Moreover, in a previous report by our group we found that patients with increased pausing specifically had difficulties with memory-related tasks (Oomen et al., Reference Oomen, de Boer, Brederoo, Voppel, Brand, Wijnen and Sommer2021). We found that several of the acoustic parameters in our models were associated with negative and to a lesser degree general PANSS scores, including loudness, formant bandwidth and amplitude. These findings suggest that patients with high negative PANSS scores have a more breathy voice and reduced voice quality. Moreover, voiced segments per s was negatively associated with positive PANSS scores, suggesting that patients with high positive PANSS have less fragmented speech. A thorough exploration of these associations is beyond the scope of the current study; further research is needed to explore how acoustic aspects are related to individual PANSS items.

Our results revealed no effect of antipsychotic medication dose on the acoustic parameters, which is in accordance with our previous work (de Boer et al., Reference de Boer, Voppel, Brederoo, Wijnen and Sommer2020). Furthermore, although we did not find an overall effect of drug type on the acoustic variables, we replicated our previous work in showing that the use of high DR2 drugs was associated with increased pausing (de Boer et al., Reference de Boer, Voppel, Brederoo, Wijnen and Sommer2020).

Research on spectral profile parameters suggests they are mostly associated with vocal valence, such as angry speech, as well as the control over emotions (Eyben et al., Reference Eyben, Scherer, Schuller, Sundberg, Andre, Busso and Truong2016; Goudbeek & Scherer, Reference Goudbeek and Scherer2010). Spectral slope and pitch variation have also been associated with emotional stress (Simantiraki, Giannakakis, Pampouchidou, & Tsiknakis, Reference Simantiraki, Giannakakis, Pampouchidou and Tsiknakis2016). Frequency related parameters have been associated with arousal, alertness and engagement in the conversation (Goudbeek & Scherer, Reference Goudbeek and Scherer2010). Applying these interpretations to our findings suggests that patients with positive v. negative symptoms show differences in terms of arousal, alertness and engagement (frequency related parameters), and to a lesser degree in vocal valence (e.g. angry/happy). However, it should be noted that there is little research on the interpretation of acoustic parameters in relation to psychopathology. Most research has been conducted in healthy individuals. Interpreting the different parameters in relation to schizophrenia-spectrum disorders or psychotic symptoms therefore remains speculative for now.

Of note, there is some circularity in this line of research. On the one hand PANSS scores are – for some part – based on a clinical interpretation of speech (e.g. pauses; Alpert, Pouget, & Silva, Reference Alpert, Pouget and Silva1995), on the other hand acoustic speech measurements are validated by their association with negative symptoms. Clinicians describe a persons' speech as being ‘flat’, ‘monotonous’ or ‘aprosodic’, which is then traced back to smaller F0 range and decreased variability in F2 (Compton et al., Reference Compton, Lunden, Cleary, Pauselli, Alolayan, Halpern and Covington2018). Speech analysis should be used routinely to understand the changes in speech during psychosis, and should be favored over less precise clinical descriptions of spoken language. Phonetic parameters have (relatively) clear underlying biological mechanisms (Eyben et al., Reference Eyben, Scherer, Schuller, Sundberg, Andre, Busso and Truong2016), while we do not know what a clinician hears when he describes a patient's speech as ‘monotonous’. For example, there is evidence that clinicians are influenced by speaking patterns in assessing the severity of all negative symptoms; when pause length is manipulated in spoken language, clinicians tend to rate other (non-speech) negative symptoms as more severe (Alpert et al., Reference Alpert, Pouget and Silva1995).

This line of research is still in the proof of concept phase and will face a number of challenges before it can be clinically implemented (Dukart, Weis, Genon, & Eickhoff, Reference Dukart, Weis, Genon and Eickhoff2021). For example, biases in machine learning (Schnack, Reference Schnack, Mechelli and Vieira2020), privacy issues, generalizability and technical pitfalls should be dealt with in the first place (Jacobson et al., Reference Jacobson, Bentley, Walton, Wang, Fortgang, Millner and Coppersmith2020; Starke, De Clercq, Borgwardt, & Elger, Reference Starke, De Clercq, Borgwardt and Elger2020). After these critical hurdles are taken, we envision acoustic analysis to have several valuable clinical implementations. For example, speech analyses could be used in primary care facilities as an initial screening instrument. Currently, diagnostic accuracy in primary care is low as the personnel is not specifically trained in psychiatric problems. Individuals with a high likelihood of a schizophrenia-spectrum disorder could for example be prioritized for a referral to a mental health care professional. Moreover, we expect that accuracy of models like these will significantly improve from the combination of different types of analyses, such as a combination of acoustic, semantic and syntactic analyses (de Boer et al., Reference de Boer, Brederoo, Voppel and Sommer2020). With sufficient levels of accuracy, possible clinical implementations could include relapse prediction, prediction of treatment efficacy, prognosis, early diagnosis and differential diagnosis.

Although the temporal stability of speech anomalies in schizophrenia-spectrum disorders has scarcely been studied (Cohen et al., Reference Cohen, Fedechko, Schwartz, Le, Foltz, Bernstein and Elvevåg2019), phonetic research has shown that some acoustic properties such as formant trajectories are highly stable over time (Hasan, Jamil, & Rahman, Reference Hasan, Jamil and Rahman2004; Ingram, Prandolini, & Ong, Reference Ingram, Prandolini and Ong2013; Nolan & Grigoras, Reference Nolan and Grigoras2005), making them suitable even for speaker identification. If acoustic speech properties are indeed highly personal and stable, deviations from such a stable personal pattern may very well be a cue for relapse into psychosis in an individual person. Further longitudinal studies are required to ascertain how speech disturbances in schizophrenia-spectrum disorders develop over time (Arevian et al., Reference Arevian, Bone, Malandrakis, Martinez, Wells, Miklowitz and Narayanan2020; Cohen et al., Reference Cohen, Fedechko, Schwartz, Le, Foltz, Bernstein and Elvevåg2019).

There were a number of limitations in the study. First, although conservative leave-ten-out cross-validation was employed, there remains a risk of overfitting. Replication in an independent dataset is therefore required. Secondly, although several variables that can influence the results have been explored, not all factors could be controlled for. For example, we had no data on smoking behavior, length or weight, which are known to influence some aspects of speech. Moreover, the recordings were made in different rooms, which may have affected some of the acoustic aspects of speech. Therefore, future research should assess the specificity of the classifier by controlling for these factors and by testing its performance in differential diagnosis of psychiatric disorders. Finally, it should be noted that we averaged the acoustic parameters over the interview duration (mean duration 10.8 min), which precludes dynamic differences over time. Further research should therefore also incorporate dynamic aspects of speech acoustics, such as convergence between speech partners, to fully acknowledge the within-subject variety of speech acoustics as well.

In conclusion, these findings show that standardized acoustic speech parameters can be used to accurately classify patients with a schizophrenia-spectrum disorder and healthy controls, as well as differentiate between patients with predominantly positive v. negatives symptoms. Our findings support the usefulness of computational tools to characterize complex human behavior such as speech. We strongly encourage the use of standardized open-source software packages such as eGeMAPS since they ensure easy comparison and replication across samples and even cross-linguistically.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/S0033291721002804.

Data

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available as they contain information that could compromise research participant privacy or consent.

Acknowledgements

We are grateful to all participants and research interns for their help with data collection and preparation.

Author contributions

Janna de Boer: conceptualization, Methodology, Formal analysis, Investigation, Writing – Original draft. Alban Voppel: Methodology, Software, Investigation. Sanne Brederoo: Writing – Review and Editing. Hugo Schnack: Writing - Review & Editing, Supervision. Khiet Truong: Writing - Review & Editing, Supervision, Frank Wijnen: Writing - Review & Editing, Supervision. Iris Sommer: Writing - Review & Editing, Supervision, Funding acquisition

Financial support

I. S. received a TOP grant from The Netherlands Organization for Health Research and Innovation (ZonMW, project: 91213009).

Conflict of interest

I. S. is a consultant to Gabather, received research support from Janssen Pharmaceuticals Inc. and Sunovion Pharmaceuticals Inc.

References

Alpert, M., Pouget, E. R., & Silva, R. (1995). Cues to the assessment of affects and moods: Speech fluency and pausing. Psychopharmacology Bulletin, 31(2), 421–424.Google Scholar

Alpert, M., Shaw, R. J., Pouget, E. R., & Lim, K. O. (2002). A comparison of clinical ratings with vocal acoustic measures of flat affect and alogia. Journal of Psychiatric Research, 36(5), 347–353.CrossRef Google Scholar PubMed

Andreasen, N. C., Flaum, M., & Arndt, S. (1992). The comprehensive assessment of symptoms and history (CASH): An instrument for assessing diagnosis and psychopathology. Archives of General Psychiatry, 49(8), 615–623.CrossRef Google Scholar PubMed

Arevian, A. C., Bone, D., Malandrakis, N., Martinez, V. R., Wells, K. B., Miklowitz, D. J., & Narayanan, S. (2020). Clinical state tracking in serious mental illness through computational analysis of speech. PLoS One, 15(1), e0225695.CrossRef Google Scholar PubMed

Bedi, G., Carrillo, F., Cecchi, G. A., Slezak, D. F. F., Sigman, M., Mota, N. B. N. B., … Corcoran, C. M. (2015). Automated analysis of free speech predicts psychosis onset in high-risk youths. NPJ Schizophrenia, 1(May), 15030.CrossRef Google Scholar PubMed

Bernardini, F., Lunden, A., Covington, M., Broussard, B., Halpern, B., Alolayan, Y., … Compton, M. T. (2016). Associations of acoustically measured tongue/jaw movements and portion of time speaking with negative symptom severity in patients with schizophrenia in Italy and the United States. Psychiatry Research, 239, 253–258. doi: 10.1016/j.psychres.2016.03.037CrossRef Google Scholar PubMed

Boersma, P., & Weenink, D. (2013). Praat software. University of Amsterdam: Amsterdam.Google Scholar

Cohen, A. S., Fedechko, T. L., Schwartz, E. K., Le, T. P., Foltz, P. W., Bernstein, J., … Elvevåg, B. (2019). Ambulatory vocal acoustics, temporal dynamics, and serious mental illness. Journal of Abnormal Psychology, 128(2), 97.CrossRef Google Scholar

Cohen, A. S., Mitchell, K. R., Docherty, N. M., & Horan, W. P. (2016). Vocal expression in schizophrenia: Less than meets the ear. Journal of Abnormal Psychology, 125(2), 299–309. doi: 10.1037/abn0000136CrossRef Google Scholar PubMed

Cohen, A. S., Mitchell, K. R., & Elvevåg, B. (2014). What do we really know about blunted vocal affect and alogia? A meta-analysis of objective assessments. Schizophrenia Research, 159(2–3), 533–538. doi: 10.1016/j.schres.2014.09.013CrossRef Google Scholar PubMed

Çokal, D., Zimmerer, V., Turkington, D., Ferrier, N., Varley, R., Watson, S., … Hinzen, W. (2019). Disturbing the rhythm of thought: Speech pausing patterns in schizophrenia, with and without formal thought disorder. PLoS One, 14(5), e0217404.CrossRef Google Scholar PubMed

Compton, M. T., Lunden, A., Cleary, S. D., Pauselli, L., Alolayan, Y., Halpern, B., … Covington, M. A. (2018). The aprosody of schizophrenia: Computationally derived acoustic phonetic underpinnings of monotone speech. Schizophrenia Research, 197, 392–399.CrossRef Google Scholar PubMed

Corcoran, C. M., Carrillo, F., Fernández-Slezak, D., Bedi, G., Klim, C., Javitt, D. C., … Cecchi, G. A. (2018). Prediction of psychosis across protocols and risk cohorts using automated language analysis. World Psychiatry, 17(1), 67–75. doi: 10.1002/wps.20491CrossRef Google Scholar PubMed

Corcoran, C. M., Mittal, V. A., Bearden, C. E., Gur, R. E., Hitczenko, K., Bilgrami, Z., … Wolff, P. (2020). Language as a biomarker for psychosis: A natural language processing approach. Schizophrenia Research, 226, 158–166.CrossRef Google Scholar PubMed

Covington, M. A., Lunden, S. L. A., Cristofaro, S. L., Wan, C. R., Bailey, C. T., Broussard, B., … Compton, M. T. (2012). Phonetic measures of reduced tongue movement correlate with negative symptom severity in hospitalized patients with first-episode schizophrenia-spectrum disorders. Schizophrenia Research, 142(1–3), 93–95.CrossRef Google Scholar

Cuthbert, B. N., & Insel, T. R. (2013). Toward the future of psychiatric diagnosis: The seven pillars of RDoC. BMC Medicine, 11(1), 126.CrossRef Google Scholar PubMed

de Boer, J. N., Brederoo, S. G., Voppel, A. E., & Sommer, I. E. C. (2020). Anomalies in language as a biomarker for schizophrenia. Current Opinion in Psychiatry, 33(3), 212–218. doi: 10.1097/YCO.0000000000000595.CrossRef Google Scholar PubMed

de Boer, J., van Hoogdalem, M., Mandl, R., Brummelman, J., Voppel, A., Begemann, M., … Sommer, I. (2020). Language in schizophrenia: Relation with diagnosis, symptomatology and white matter tracts. NPJ Schizophrenia, 6(1), 1–10. doi: 10.1038/s41537-020-0099-3CrossRef Google Scholar PubMed

de Boer, J., Voppel, A. E., Begemann, M. J. H., Schnack, H. G., Wijnen, F., & Sommer, I. E. C. (2018). Clinical use of semantic space models in psychiatry and neurology: A systematic review and meta-analysis. Neuroscience & Biobehavioral Reviews, 93, 85–92.CrossRef Google Scholar

de Boer, J., Voppel, A. E., Brederoo, S. G., Wijnen, F. N. K., & Sommer, I. E. C. (2020). Language disturbances in schizophrenia: The relation with antipsychotic medication. npj Schizophrenia, 6(1), 1–9. Retrieved from doi:10.1038/s41537-020-00114-3CrossRef Google Scholar PubMed

Dukart, J., Weis, S., Genon, S., & Eickhoff, S. B. (2021). Towards increasing the clinical applicability of machine learning biomarkers in psychiatry. Nature Human Behaviour, 5(4), 431–432.CrossRef Google Scholar PubMed

Eyben, F., Scherer, K. R., Schuller, B. W., Sundberg, J., Andre, E., Busso, C., … Truong, K. P. (2016). The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Transactions on Affective Computing, 7(2), 190–202. doi: 10.1109/TAFFC.2015.2457417CrossRef Google Scholar

Eyben, F., Weninger, F., Gross, F., & Schuller, B. (2013). Recent developments in OpenSMILE, the Munich open-source multimedia feature extractor. Proceedings of the 21st ACM International Conference on Multimedia, 835–838. ACM.CrossRef Google Scholar

First, M. B. (2014). Structured clinical interview for the DSM (SCID). The Encyclopedia of Clinical Psychology, 1–6.Google Scholar

Fusar-Poli, P., Papanastasiou, E., Stahl, D., Rocchetti, M., Carpenter, W., Shergill, S., & McGuire, P. (2015). Treatments of negative symptoms in schizophrenia: Meta-analysis of 168 randomized placebo-controlled trials. Schizophrenia Bulletin, 41(4), 892–899.CrossRef Google Scholar PubMed

Goudbeek, M., & Scherer, K. (2010). Beyond arousal: Valence and potency/control cues in the vocal expression of emotion. The Journal of the Acoustical Society of America, 128(3), 1322–1336.CrossRef Google Scholar PubMed

Hasan, M. R., Jamil, M., & Rahman, M. (2004). Speaker identification using Mel frequency Cepstral coefficients. Variations, 1(4), 565–568.Google Scholar

Hitczenko, K., Mittal, V. A., & Goldrick, M. (2020). Understanding language abnormalities and associated clinical markers in psychosis: The promise of computational methods. Schizophrenia Bulletin, 47(2), 344–362.CrossRef Google Scholar

Holmlund, T. B., Chandler, C., Foltz, P. W., Cohen, A. S., Cheng, J., Bernstein, J. C., … Elvevåg, B. (2020). Applying speech technologies to assess verbal memory in patients with serious mental illness. npj Digital Medicine, 3(1), 1–8.CrossRef Google Scholar PubMed

Ingram, J. C. L., Prandolini, R., & Ong, S. (2013). Formant trajectories as indices of phonetic variation for speaker identification. International Journal of Speech Language and the Law, 3(1), 129–145.CrossRef Google Scholar

Insel, T. R. (2014). The NIMH research domain criteria (RDoC) project: Precision medicine for psychiatry. American Journal of Psychiatry, 171(4), 395–397.CrossRef Google Scholar PubMed

Insel, T. R. (2017). Digital phenotyping: Technology for a new science of behavior. Jama, 318(13), 1215–1216.CrossRef Google Scholar PubMed

Jacobson, N. C., Bentley, K. H., Walton, A., Wang, S. B., Fortgang, R. G., Millner, A. J., … Coppersmith, D. D. L. (2020). Ethical dilemmas posed by mobile health and machine learning in psychiatry research. Bulletin of the World Health Organization, 98(4), 270.CrossRef Google Scholar PubMed

Kay, S. R., Fiszbein, A., & Opfer, L. A. (1987). The Positive and Negative Syndrome Scale (PANSS) for schizophrenia. Schizophrenia Bulletin, 13(2), 261.CrossRef Google Scholar PubMed

Keefe, R. S. E., Goldberg, T. E., Harvey, P. D., Gold, J. M., Poe, M. P., & Coughenour, L. (2004). The brief assessment of cognition in schizophrenia: Reliability, sensitivity, and comparison with a standard neurocognitive battery. Schizophrenia Research, 68(2–3), 283–297.CrossRef Google Scholar PubMed

Keefe, R. S. E., Harvey, P. D., Goldberg, T. E., Gold, J. M., Walker, T. M., Kennel, C., & Hawkins, K. (2008). Norms and standardization of the brief assessment of cognition in schizophrenia (BACS). Schizophrenia Research, 102(1–3), 108–115.CrossRef Google Scholar PubMed

Kølbæk, P., Blicher, A. B., Buus, C. W., Feller, S. G., Holm, T., Dines, D., … Correll, C. U. (2018). Inter-rater reliability of ratings on the six-item Positive and Negative Syndrome Scale (PANSS-6) obtained using the Simplified Negative and Positive Symptoms Interview (SNAPSI). Nordic Journal of Psychiatry, 72(6), 431–436.CrossRef Google Scholar PubMed

Leucht, S., Samara, M., Heres, S., Patel, M. X., Woods, S. W., & Davis, J. M. (2014). Dose equivalents for second-generation antipsychotics: The minimum effective dose method. Schizophrenia Bulletin, 40(2), 314–326.CrossRef Google Scholar PubMed

Lin, C.-H., Huang, C.-L., Chang, Y.-C., Chen, P.-W., Lin, C.-Y., Tsai, G. E., & Lane, H.-Y. (2013). Clinical symptoms, mainly negative symptoms, mediate the influence of neurocognition and social cognition on functional outcome of schizophrenia. Schizophrenia Research, 146(1–3), 231–237.CrossRef Google Scholar PubMed

Marder, S. R., & Galderisi, S. (2017). The current conceptualization of negative symptoms in schizophrenia. World Psychiatry, 16(1), 14–24.CrossRef Google Scholar PubMed

Marmar, C. R., Brown, A. D., Qian, M., Laska, E., Siegel, C., Li, M., … Vergyri, D. (2019). Speech-based markers for posttraumatic stress disorder in US veterans. Depression and Anxiety, 36(7), 607–616.CrossRef Google Scholar PubMed

Martínez-Sánchez, F., Muela-Martínez, J. A., Cortés-Soto, P., García Meilán, J. J. O. S., Vera Ferrándiz, J. A. N., Egea Caparrós, A., & Pujante Valverde, I. M. A. (2015). Can the acoustic analysis of expressive prosody discriminate schizophrenia? The Spanish Journal of Psychology, 18(2015), E86.CrossRef Google Scholar PubMed

Meaux, L. T., Mitchell, K. R., & Cohen, A. S. (2018). Blunted vocal affect and expression is not associated with schizophrenia: A computerized acoustic analysis of speech under ambiguous conditions. Comprehensive Psychiatry, 83, 84–88. doi: 10.1016/j.comppsych.2018.03.009CrossRef Google Scholar

Milev, P., Ho, B.-C., Arndt, S., & Andreasen, N. C. (2005). Predictive values of neurocognition and negative symptoms on functional outcome in schizophrenia: A longitudinal first-episode study with 7-year follow-up. American Journal of Psychiatry, 162(3), 495–506.CrossRef Google Scholar PubMed

Mota, N. B., Furtado, R., Maia, P. P. C., Copelli, M., & Ribeiro, S. (2014). Graph analysis of dream reports is especially informative about psychosis. Scientific Reports, 4, 3691.CrossRef Google Scholar PubMed

Nolan, F., & Grigoras, C. (2005). A case for formant analysis in forensic speaker identification. International Journal of Speech, Language and the Law, 12(2), 143–173.CrossRef Google Scholar

Oomen, P., de Boer, J. N., Brederoo, S. G., Voppel, A. E., Brand, B. A., Wijnen, F. N. K., … Sommer, I. E. C. (2021). Characterizing speech heterogeneity in schizophrenia-spectrum disorders. Journal of Abnormal Psychology, In press.Google Scholar

Palaniyappan, L., Mota, N. B., Oowise, S., Balain, V., Copelli, M., Ribeiro, S., & Liddle, P. F. (2019). Speech structure links the neural and socio-behavioural correlates of psychotic disorders. Progress in Neuro-Psychopharmacology and Biological Psychiatry, 88(April 2018), 112–120. doi: 10.1016/j.pnpbp.2018.07.007CrossRef Google Scholar PubMed

Parola, A., Simonsen, A., Bliksted, V., & Fusaroli, R. (2020). Voice patterns in schizophrenia: A systematic review and Bayesian meta-analysis. Schizophrenia Research, 216, 24–40. doi: 10.1093/schbul/sby016.414 LKCrossRef Google Scholar PubMed

Rezaii, N., Walker, E., & Wolff, P. (2019). A machine learning approach to predicting psychosis using semantic density and latent content analysis. npj Schizophrenia, 5(1), 1–12.CrossRef Google Scholar PubMed

Schnack, H. (2020). Bias, noise, and interpretability in machine learning: From measurements to features. In Mechelli, A. & Vieira, S. (Eds.), Machine learning: Methods and applications to brain disorders (pp. 307–328). Cambridge, Massachusetts: Elsevier.CrossRef Google Scholar

Sheehan, D. V., Lecrubier, Y., Sheehan, K. H., Amorim, P., Janavs, J., Weiller, E., … Dunbar, G. C. (1998). The mini-international neuropsychiatric interview (MINI): The development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. Journal of Clinical Psychiatry, 59(Suppl 20), 22–33.Google Scholar PubMed

Simantiraki, O., Giannakakis, G., Pampouchidou, A., & Tsiknakis, M. (2016). Stress Detection from Speech Using Spectral Slope Measurements.Google Scholar

Starke, G., De Clercq, E., Borgwardt, S., & Elger, B. S. (2020). Computing schizophrenia: Ethical challenges for machine learning in psychiatry. Psychological Medicine, 1–7.Google Scholar PubMed

Tahir, Y., Yang, Z., Chakraborty, D., Thalmann, N., Thalmann, D., Maniam, Y., … Dauwels, J. (2019). Non-verbal speech cues as objective measures for negative symptoms in patients with schizophrenia. PLoS One, 14(4), 1–17.CrossRef Google Scholar PubMed

Tan, E. J., & Rossell, S. L. (2020). Questioning the status of aberrant speech patterns as psychiatric symptoms. The British Journal of Psychiatry, 217(3), 469–470.CrossRef Google Scholar PubMed

Voppel, A., de Boer, J., Brederoo, S., Schnack, H., & Sommer, I. (2021). Quantified language connectedness in schizophrenia-spectrum disorders. Psychiatry Research, 304, 114130.CrossRef Google Scholar PubMed

Table 1. Demographics

Table 2. Classification of performance metrics

Table 3. Top ten acoustic parameters in the diagnostic classifier (schizophrenia-spectrum disorder v. healthy control)

Table 4. Top ten acoustic parameters in the psychotic symptoms classifier (positive v. negative symptoms)

Table 5. Associations between top acoustic parameters and psychotic and cognitive symptoms

de Boer et al. supplementary material

File 19.6 KB

Article contents

Acoustic speech markers for schizophrenia-spectrum disorders: a diagnostic and symptom-recognition tool

Abstract

Keywords

Introduction

Methods

Participants

Procedure

Speech recording

Speech pre-processing

Acoustic parameters

Cognitive functioning

Statistical analysis

Results

Diagnostic classifier

Psychotic symptoms classifier

Associations between acoustic variables and clinical characteristics

Psychotic and cognitive symptoms

Antipsychotic medication

Discussion

Supplementary material

Data

Acknowledgements

Author contributions

Financial support

Conflict of interest

References

de Boer et al. supplementary material

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests