Worldwide, around 800 million people use ChatGPT every week, making it the most prominent chatbot based on generative artificial intelligence (AI chatbot) (Bellan, 2025). While AI chatbots and their underlying technology arguably have positive potential, there are reports of psychologically adverse effects, such as AI-associated delusions, mania, or suicidality, affecting a fraction of users (Augustin, 2025; Hill, 2025; Morrin et al., 2025; Olsen et al., 2025; Østergaard, 2025a, 2025b; Pierre et al., 2025). Regarding AI-associated delusions or mania, OpenAI – the tech company behind ChatGPT – estimated in a press release from October 2025 ‘[…] that around 0.07% of users active in a given week indicate possible signs of mental health emergencies related to psychosis or mania’ (OpenAI, 2025). Indicators of ‘potential suicidal planning or intent’ were present in 0.15% of weekly active users (OpenAI, 2025). Assuming these estimates are valid (OpenAI has provided very little context about how they were derived), with 800 million users this amounts to roughly half a million individuals worldwide whose chat interactions show possible signs of psychosis or mania, and around 1.2 million users with potential suicidal planning or intent.
Notably, all of this happened with AI chatbots producing mostly text output, a mode of communication that might be considered more distant and cognitively demanding than hearing direct spoken language. The advanced voice mode, which made ChatGPT sound ‘more natural and expressive’, was released to all free users in July 2025, expanding access that had previously been available only to paying subscribers (ChatGPT – Release Notes, 2026). First reports about AI-associated delusions and mania preceded that release by months (Morrin et al., 2025). While, to our knowledge, no public data are available on the proportion of text versus voice use in AI chatbot interactions, OpenAI recently reported an increase in the use of dictation and conversational features in its ChatGPT app over the past year (Mims, 2026). The company is also developing a new device focused on dialogue (Mims, 2026). Similarly, Meta offers smart glasses with microphones and ear speakers, and Apple plans to extend AirPod capabilities to enable voice-based AI chatbot interaction (Mims, 2026). Therefore, the primary mode of communication with AI chatbots will likely soon shift from text to voice. Here, we hypothesize that this shift in modality will affect phenomena such as AI chatbot-associated delusions, mania, and suicidal ideation. Specifically, as voice is more immersive and may further blur perceptual boundaries between humans and AI chatbots, we propose that the shift from text to voice is likely to accentuate these severe psychological adverse effects.
While humans are hardwired to hear spoken language, reading text is an acquired skill. Auditory input, such as hearing a mother’s voice, is the primary mode of language processing in an infant. For the first few years, language processing in children is focused entirely on spoken language. Such input is processed directly by brain areas poised to adapt to spoken language. The human auditory dorsal and ventral fiber tracts have evolutionary roots in other primate species, such as marmosets and macaques, a finding that supports the notion that language-processing brain areas can be considered ‘old’ from an evolutionary perspective (Aboitiz, 2018; Zhang et al., 2024). Reading written language, on the other hand, typically begins much later, when children start attending school. Compared to how the brain processes spoken language, reading text is a much ‘younger’ brain skill, dating back only several thousand years to the invention of writing systems. According to the cortical recycling hypothesis, neural plasticity for culturally newer processes, such as reading, builds upon ‘older’ neural circuits that evolved for the processing of other stimuli, such as moving limbs or handling tools (Kubota et al., 2024). While earlier findings supported the concept that hearing and reading language activate different brain networks (Buchweitz et al., 2009), a more recent study showed that spoken and written language activate largely the same regions (Deniz et al., 2019).
Despite this overlap in brain network processing, reading – as an acquired skill – likely provides greater distance from what is communicated than spoken language does. Indeed, text lacks paralinguistic information that voice can convey, such as tone, pitch, rhythm, or emphasis. Such features may explain why voice-based reviews are more effective than text-based reviews at altering consumer behavior (Flavián et al., 2023). In a similar consumer setting, voice-based assistants were perceived as more efficient, satisfying, and enjoyable while requiring less cognitive effort compared with text-based chatbots (Rzepka et al., 2022). Notably, a randomized controlled study coauthored by OpenAI showed that psychosocial outcomes differed depending on whether ChatGPT interactions were based on text, a neutral voice, or an engaging voice (Fang et al., 2025). Participants spent more time with the voice-based ChatGPT than with the text-based version, suggesting greater engagement. Voice mode initially appeared to be associated with better outcomes, such as reduced loneliness. However, longer engagement was overall linked to more negative outcomes, leading the authors to conclude: ‘This implies that as people spend more time daily with the AI, the positive effects associated with voice modalities might diminish or become negative. The neutral voice modality in particular potentially leads to less socialization with real people and more problematic use of AI chatbots compared to using text’ (Fang et al., 2025).
The root of our concern is the alignment between the evolutionary age and significance of speech processing in the human brain and the early findings on the ‘superiority’ of voice-based over text-based digital assistants/AI chatbots in terms of user engagement. Specifically, we find it more likely than unlikely that this leap in anthropomorphism will translate into AI chatbots being associated with even higher risks of psychologically adverse events than is currently the case. Already, text-based AI chatbots seem remarkably ‘effective’ at eliciting and maintaining delusions and mania (Augustin, 2025; Morrin et al., 2025; Østergaard, 2025a), but the voice-based versions will probably be next-level in this regard for a variety of reasons: Human-like conversations are more salient and, thus, more likely to be taken at face value with cognitive guards down. The emotional impact likely increases when one is spoken to. The interaction is faster, with speech recognition almost three times as fast as typing in English (Ruan et al., 2018). It is also more seamless: reading and typing take time and thereby create natural breaks for reflection and push-back, which are likely essential for maintaining reality testing – breaks that voice interaction removes.
In conclusion, we argue that the change from text to voice mode risks intensifying adverse effects of AI chatbots such as delusions, mania, and suicidal ideation – or introducing other mental health risks related to increased immersion (e.g., addictive use of AI chatbots). Spoken language, even when generated by AI chatbots, activates evolutionarily older neural pathways, conveys richer paralinguistic information, and appears to foster greater engagement. Indeed, early evidence suggests that voice-based AI chatbot interactions are associated with increased usage time and potentially more negative psychosocial outcomes. As AI chatbots become increasingly voice-enabled, clinicians, researchers, and technology developers should monitor whether this modality shift alters the incidence, severity, or phenomenology of AI-associated mental health crises. In addition, experimental studies with appropriate safety precautions, directly comparing the psychological consequences of text- and voice-based AI chatbot use in vulnerable populations, are needed.
Data availability statement
This editorial did not involve data.
Acknowledgements
None.
Author contributions
MA conceived and drafted this editorial, which was subsequently revised for important intellectual content by SDØ.
Funding statement
There was no specific funding for this editorial. Outside this work, SDØ reports funding from Independent Research Fund Denmark (grant numbers: 7016-00048B and 2096-00055A), the Lundbeck Foundation (grant numbers: R358-2020-2341 and R344-2020-1073), the Danish Cancer Society (grant number: R283-A16461), the Central Denmark Region Fund for Strengthening of Health Science (grant number: 1-36-72-4-20), and the Danish Agency for Digitization Investment Fund for New Technologies (grant number: 2020-6720). These funders played no role in relation to this editorial.
Competing interests
MA holds minority shares in Vivam GmbH and provides unpaid consultancy services to the same company. He has contributed to the patent application WO2025109222A1. He owns/has owned stocks with stock tickers ABEA, ABNB, AFX, ALV, MO, T, AMV0, BAS, BAYN, BNTX, BTI, CSCO, KO, CON, DHL, DTE, EQNR, IBM, LNTH, MC, MCD, MRNA, MUV2, NSRGY, NEE, NVO, PANW, P911, PYPL, PEP, RED, RIO, SRPT, SESG, SHEL, SBUX, TTE, VZ, VOW3, VNA. SDØ received the 2020 Lundbeck Foundation Young Investigator Prize. SDØ owns/has owned units of mutual funds with stock tickers DKIGI, IAIMWC, SPIC25 KL, and WEKAFKI, and owns/has owned units of exchange-traded funds with stock tickers BATE, TRET, QDV5, QDVH, QDVE, SADM, IQQH, USPY, EXH2, 2B76, IS4S, OM3X, EUNL, and SXRV.