
When artificial intelligence speaks: psychologically adverse effects of the shift from text- to voice-based chatbots

Published online by Cambridge University Press:  13 February 2026

Marc Augustin*
Affiliation:
Protestant University of Applied Sciences Bochum , Germany
Søren Dinesen Østergaard
Affiliation:
Department of Clinical Medicine, Aarhus University, Aarhus, Denmark Department of Affective Disorders, Aarhus University Hospital - Psychiatry, Aarhus, Denmark Institute for Mental and Physical Health and Clinical Translation (IMPACT), Deakin University, Australia
*
Corresponding author: Marc Augustin; Email: marc.augustin@evh-bochum.de

Type
Editorial
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial licence (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original article is properly cited. The written permission of Cambridge University Press or the rights holder(s) must be obtained prior to any commercial use.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of Scandinavian College of Neuropsychopharmacology

Worldwide, around 800 million people use ChatGPT every week, making it the most prominent chatbot based on generative artificial intelligence (AI chatbot) (Bellan, Reference Bellan2025). While AI chatbots and their underlying technology arguably have a positive potential, there are reports about psychologically adverse effects, such as AI-associated delusions, mania, or suicidality, that affect a fraction of users (Augustin, Reference Augustin2025; Hill, Reference Hill2025; Morrin et al., Reference Morrin, Nicholls, Levin, Yiend, Iyengar, DelGuidice, Bhattacharyya, MacCabe, Tognin, Twumasi, Alderson-Day and Pollak2025; Olsen et al., Reference Olsen, Reinecke-Tellefsen and Østergaard2025; Østergaard, Reference Østergaard2025a, Reference Østergaard2025b; Pierre et al., Reference Pierre, Gaeta, Raghavan and Sarma2025). Regarding AI-associated delusions or mania, OpenAI – the tech company behind ChatGPT – estimated in a press release from October 2025, ‘[…] that around 0.07% of users active in a given week indicate possible signs of mental health emergencies related to psychosis or mania’ (OpenAI, 2025). Indicators of ‘potential suicidal planning or intent’ were present in 0.15% of weekly active users (OpenAI, 2025). Assuming that these estimates are valid (OpenAI has provided very little context about how the estimation was carried out), with 800 million weekly users this amounts to more than half a million individuals worldwide who have chat interactions that show possible signs of psychosis or mania, and around 1.2 million users with potential suicidal planning or intent.
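These headline figures follow directly from the cited percentages. As a back-of-the-envelope check, taking OpenAI's reported rates at face value (the variable names below are illustrative, not from any source):

```python
# Sanity check of the prevalence figures cited above, applying
# OpenAI's reported weekly rates to the reported user base.
weekly_active_users = 800_000_000

psychosis_mania_rate = 0.0007  # 0.07% of weekly active users
suicidality_rate = 0.0015      # 0.15% of weekly active users

psychosis_mania_users = round(weekly_active_users * psychosis_mania_rate)
suicidality_users = round(weekly_active_users * suicidality_rate)

print(f"Possible psychosis/mania signs:      {psychosis_mania_users:,}")  # 560,000
print(f"Potential suicidal planning/intent:  {suicidality_users:,}")      # 1,200,000
```

The first figure (560,000) is what the text rounds to "half a million"; the second matches the stated 1.2 million.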

Notably, all of this happened with AI chatbots producing mostly text output, a mode of communication that might be considered more distant and cognitively demanding than hearing direct spoken language. The advanced voice mode, which made ChatGPT sound ‘more natural and expressive’, was released to all free users in July 2025, expanding access that had previously been available only to paying subscribers (ChatGPT – Release Notes, 2026). First reports about AI-associated delusions and mania preceded that release by months (Morrin et al., Reference Morrin, Nicholls, Levin, Yiend, Iyengar, DelGuidice, Bhattacharyya, MacCabe, Tognin, Twumasi, Alderson-Day and Pollak2025). While, to our knowledge, no public data is available on the proportion of text versus voice use in AI chatbot interactions, OpenAI recently reported an increase in the use of dictation and conversational features in its ChatGPT app over the past year (Mims, Reference Mims2026). Also, the company is developing a new device focused on dialogue (Mims, Reference Mims2026). In a similar way, Meta offers smart glasses with microphones and ear speakers, and Apple plans to extend AirPod capabilities to enable voice-based AI chatbot interaction (Mims, Reference Mims2026). Therefore, the primary mode of communication with AI chatbots will likely soon shift from text to voice. Here, we hypothesize that this shift in modality will affect phenomena such as AI chatbot-associated delusions, mania, and suicidal ideation. Specifically, as voice is more immersive and may further blur perceptual boundaries between humans and AI chatbots, we propose that the shift from text to voice is likely to accentuate these severe psychological adverse effects.

While humans are hardwired to hear spoken language, reading text is an acquired skill. Auditory input, such as hearing a mother’s voice, is the primary mode of language processing in an infant. For the first few years, language processing in children is entirely focused on spoken language. Such input is directly processed by brain areas that are poised to adapt to spoken language. Human auditory dorsal and ventral fiber tracts have evolutionary roots in other primate species, such as marmosets and macaques, a finding that supports the notion that language-processing brain areas can be considered ‘old’ from an evolutionary perspective (Aboitiz, Reference Aboitiz2018; Zhang et al., Reference Zhang, Shen, Bibic and Wang2024). Reading written language, on the other hand, typically begins much later, when children start attending school. Compared to how the brain processes spoken language, reading text is a much ‘younger’ brain skill, dating back only several thousand years to the invention of writing systems. According to the cortical recycling hypothesis, neural plasticity for culturally newer processes, such as reading, builds upon ‘older’ neural circuits that evolved for the processing of other stimuli, such as moving limbs or handling tools (Kubota et al., Reference Kubota, Grill-Spector and Nordt2024). While earlier findings supported the concept that hearing and reading language activate different brain networks (Buchweitz et al., Reference Buchweitz, Mason, Tomitch and Just2009), a more recent study showed that both spoken and written language activate the same regions overall (Deniz et al., Reference Deniz, Nunez-Elizalde, Huth and Gallant2019).

Despite the overlap in brain network processing, reading, as an acquired skill, likely provides greater distance from what is communicated than spoken language does. Indeed, text lacks paralinguistic information that voice can convey, such as tone, pitch, rhythm, and emphasis. Such features may explain why voice-based reviews, compared to text-based ones, are more effective at altering consumer behavior (Flavián et al., Reference Flavián, Akdim and Casaló2023). In a similar consumer setting, voice-based assistants were perceived as more efficient, satisfying, and enjoyable while requiring less cognitive effort than text-based chatbots (Rzepka et al., Reference Rzepka, Berger and Hess2022). Notably, a randomized controlled study coauthored by OpenAI showed that psychosocial outcomes differed depending on whether ChatGPT interactions were based on text, a neutral voice, or an engaging voice (Fang et al., Reference Fang, Liu, Danry, Lee, Chan, Pataranutaporn, Maes, Phang, Lampe, Ahmad and Agarwal2025). Participants spent more time with the voice-based ChatGPT than with the text-based version, suggesting greater engagement. Voice mode initially appeared to be associated with better outcomes, such as reduced loneliness. However, longer engagement was overall linked to more negative outcomes, leading the authors to conclude: ‘This implies that as people spend more time daily with the AI, the positive effects associated with voice modalities might diminish or become negative. The neutral voice modality in particular potentially leads to less socialization with real people and more problematic use of AI chatbots compared to using text’ (Fang et al., Reference Fang, Liu, Danry, Lee, Chan, Pataranutaporn, Maes, Phang, Lampe, Ahmad and Agarwal2025).

The alignment between the evolutionary age and significance of speech processing in the human brain and the early findings on the ‘superiority’ of voice-based over text-based digital assistants/AI chatbots with regard to user engagement is the root of our concern. Specifically, we find it more likely than not that this leap in anthropomorphism will translate into AI chatbots being associated with even higher risks of psychologically adverse events than is currently the case. Already, text-based AI chatbots seem to be remarkably ‘effective’ in eliciting and maintaining delusions and mania (Augustin, Reference Augustin2025; Morrin et al., Reference Morrin, Nicholls, Levin, Yiend, Iyengar, DelGuidice, Bhattacharyya, MacCabe, Tognin, Twumasi, Alderson-Day and Pollak2025; Østergaard, Reference Østergaard2025a), but the voice-based versions will probably go even further in this regard, for a variety of reasons: Human-like conversations are more salient and, thus, more likely to be taken at face value with cognitive guards down. The emotional impact likely increases when one is spoken to. The interaction is faster, with speech recognition almost three times as fast as typing in English (Ruan et al., Reference Ruan, Wobbrock, Liou, Ng and Landay2018). It is also more seamless, as reading and typing take time, so bypassing them removes natural breaks for reflection and push-back, which are likely essential for maintaining reality testing.

In conclusion, we argue that the change from text to voice mode risks intensifying adverse effects of AI chatbots such as delusions, mania, and suicidal ideation – or presents other mental health risks related to increased immersion (e.g., addictive use of AI chatbots). Spoken language, even when generated by AI chatbots, activates evolutionarily older neural pathways, conveys richer paralinguistic information, and appears to foster greater engagement. Indeed, early evidence suggests that voice-based AI chatbot interactions are associated with increased usage time and potentially more negative psychosocial outcomes. As AI chatbots become increasingly voice-enabled, clinicians, researchers, and technology developers should monitor whether this modality shift alters the incidence, severity, or phenomenology of AI-associated mental health crises. Also, experimental studies, with appropriate safety precautions, directly comparing the psychological consequences of text- and voice-based AI chatbot use in vulnerable populations are needed.

Data availability statement

This editorial did not involve data.

Acknowledgements

None.

Author contributions

MA conceived and drafted this editorial, which was subsequently revised for important intellectual content by SDØ.

Funding statement

There was no specific funding for this editorial. Outside this work, SDØ reports funding from Independent Research Fund Denmark (grant numbers: 7016-00048B and 2096-00055A), the Lundbeck Foundation (grant numbers: R358-2020-2341 and R344-2020-1073), the Danish Cancer Society (grant number: R283-A16461), the Central Denmark Region Fund for Strengthening of Health Science (grant number: 1-36-72-4-20), and the Danish Agency for Digitization Investment Fund for New Technologies (grant number: 2020-6720). These funders played no role in relation to this editorial.

Competing interests

MA holds minority shares in Vivam GmbH and provides unpaid consultancy services to the same company. He has contributed to the patent application WO2025109222A1. He owns/has owned stocks with stock tickers ABEA, ABNB, AFX, ALV, MO, T, AMV0, BAS, BAYN, BNTX, BTI, CSCO, KO, CON, DHL, DTE, EQNR, IBM, LNTH, MC, MCD, MRNA, MUV2, NSRGY, NEE, NVO, PANW, P911, PYPL, PEP, RED, RIO, SRPT, SESG, SHEL, SBUX, TTE, VZ, VOW3, VNA. SDØ received the 2020 Lundbeck Foundation Young Investigator Prize. SDØ owns/has owned units of mutual funds with stock tickers DKIGI, IAIMWC, SPIC25 KL, and WEKAFKI, and owns/has owned units of exchange-traded funds with stock tickers BATE, TRET, QDV5, QDVH, QDVE, SADM, IQQH, USPY, EXH2, 2B76, IS4S, OM3X, EUNL, and SXRV.

References

Aboitiz, F (2018) A brain for speech: Evolutionary continuity in primate and human auditory-vocal processing. Frontiers in Neuroscience 12, 174.
Augustin, M (2025) AI-associated psychosis: Evidence from first cases. Der Nervenarzt 96(7), 699.
Bellan, R (2025) Sam Altman Says ChatGPT has Hit 800M Weekly Active Users. TechCrunch. Available at https://techcrunch.com/2025/10/06/sam-altman-says-chatgpt-has-hit-800m-weekly-active-users/ (accessed 26 January 2026).
Buchweitz, A, Mason, RA, Tomitch, LMB and Just, MA (2009) Brain activation for reading and listening comprehension: An fMRI study of modality effects and individual differences in language comprehension. Psychology & Neuroscience 2, 111–123.
ChatGPT – Release Notes (2026) OpenAI Help Center. Available at https://help.openai.com/en/articles/6825453-chatgpt-release-notes (accessed 26 January 2026).
Deniz, F, Nunez-Elizalde, AO, Huth, AG and Gallant, JL (2019) The representation of semantic information across human cerebral cortex during listening versus reading is invariant to stimulus modality. Journal of Neuroscience 39(39), 7722–7736.
Fang, CM, Liu, AR, Danry, V, Lee, E, Chan, SWT, Pataranutaporn, P, Maes, P, Phang, J, Lampe, M, Ahmad, L and Agarwal, S (2025) How AI and Human Behaviors Shape Psychosocial Effects of Extended Chatbot Use: A Longitudinal Randomized Controlled Study. Available at https://arxiv.org/html/2503.17473v1 (accessed 26 January 2026).
Flavián, C, Akdim, K and Casaló, LV (2023) Effects of voice assistant recommendations on consumer behavior. Psychology & Marketing 40(2), 328–346.
Hill, K (2025) A Teen was Suicidal. ChatGPT was the Friend he Confided in. The New York Times. Available at https://www.nytimes.com/2025/08/26/technology/chatgpt-openai-suicide.html (accessed 27 January 2026).
Kubota, E, Grill-Spector, K and Nordt, M (2024) Rethinking cortical recycling in ventral temporal cortex. Trends in Cognitive Sciences 28(1), 8–17.
Mims, C (2026) Our Gadgets Finally Speak Human, and Tech will Never be the Same. Wall Street Journal. Available at https://www.wsj.com/tech/ai/voice-technology-ai-hardware-4d39f6d2 (accessed 25 January 2026).
Morrin, H, Nicholls, L, Levin, M, Yiend, J, Iyengar, U, DelGuidice, F, Bhattacharyya, S, MacCabe, J, Tognin, S, Twumasi, R, Alderson-Day, B and Pollak, T (2025) Delusions by Design? How Everyday AIs Might be Fuelling Psychosis (and what can be done about it). Available at https://doi.org/10.31234/osf.io/cmy7n_v6 (accessed 27 January 2026).
Olsen, SG, Reinecke-Tellefsen, CJ and Østergaard, SD (2025) Potentially Harmful Consequences of Artificial Intelligence (AI) Chatbot Use among Patients with Mental Illness: Early Data from a Large Psychiatric Service System. https://doi.org/10.1101/2025.11.19.25340580.
OpenAI (2025) Strengthening ChatGPT’s Responses in Sensitive Conversations. Available at https://openai.com/index/strengthening-chatgpt-responses-in-sensitive-conversations/ (accessed 26 January 2026).
Østergaard, SD (2025a) Emotion contagion through interaction with generative artificial intelligence chatbots may contribute to development and maintenance of mania. Acta Neuropsychiatrica 37, e79.
Østergaard, SD (2025b) Generative artificial intelligence chatbots and delusions: From guesswork to emerging cases. Acta Psychiatrica Scandinavica 152(4), 257–259.
Pierre, J, Gaeta, B, Raghavan, G and Sarma, K (2025) “You’re Not Crazy”: A case of new-onset AI-associated psychosis. Innovations in Clinical Neuroscience 22(10–12), 11–13.
Ruan, S, Wobbrock, JO, Liou, K, Ng, A and Landay, JA (2018) Comparing speech and keyboard text entry for short messages in two languages on touchscreen phones. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1(4), 23.
Rzepka, C, Berger, B and Hess, T (2022) Voice assistant vs. chatbot – Examining the fit between conversational agents’ interaction modalities and information search tasks. Information Systems Frontiers 24(3), 839–856.
Zhang, Y, Shen, SX, Bibic, A and Wang, X (2024) Evolutionary continuity and divergence of auditory dorsal and ventral pathways in primates revealed by ultra-high field diffusion MRI. Proceedings of the National Academy of Sciences of the United States of America 121(9), e2313831121.