1 Introduction
In the twenty-first century, social networking and disinformation have become almost inseparable. Global platforms such as Facebook, TikTok and Twitter host immense quantities of user-generated content, and increasingly, users are taking advantage of the reach of these platforms to deliberately deceive or manipulate those who consume online media, as, for example, shown by Hoffmann (this volume) for Twitter use in political campaigns. One element that can underpin this wide-scale deception is social media algorithms. Social media algorithms are a form of ‘recommender system’ – a way of attempting to provide users with content they are likely to prefer (Kozyreva et al., Reference Kozyreva, Lorenz-Spreen, Hertwig, Lewandowsky and Herzog2021). This supposed preference is derived from analysing those users’ previous behaviours in an effort to show information that they are more likely to want to know, and to exclude content they may not be interested in. These algorithms, however, can be manipulated to promote disinformation, automating the delivery of deceptive content online to mass audiences.
Algorithmic disinformation has become a public concern in recent years. In a YouGov poll of 2,000 people, 65 per cent of respondents reported being ‘concerned’ or ‘very concerned’ when asked about the spread of ‘fake news’ in the UK (YouGov, Reference YouGov2022). Meanwhile, given various high-profile news stories involving the use of algorithms in the UK (Kolkman, Reference Kolkman2020; Williams, Reference Williams2021), concern surrounding algorithms is no longer a niche interest of tech design; it is a topic that has clearly entered mainstream, offline consciousness. Further, new-media technologies are not necessarily well understood by the people who use them. While a majority of people in the UK are familiar with concepts such as ‘AI’ or ‘targeted advertising’, only 12 per cent are familiar with the notion of recommender systems – even though 70 per cent of internet users in the UK use social networking sites (ONS, 2020). Despite the high penetration of the internet and social media in our lives, there is a very low baseline understanding of what a (recommender) algorithm is and how such algorithms affect us on a daily basis.
The purpose of this chapter is to break down some of the core components of disinformation to understand how it interacts with, and is impacted by, social networking algorithms. First, I will break down the disinformation ecosystem and describe the different components of disinformation dissemination including disentangling key terminology. Second, to ground the chapter in real-world discussions of disinformation online, the chapter will explore the key issues and topics that are mentioned when people talk about disinformation online. The purpose of this is to contribute to our understanding of not only what disinformation is, but how and why it matters to different individuals and the implications of this on how we address disinformation.
This chapter explores three core pillars of algorithmically fuelled disinformation online: amplification, reception and correction. For each of these, the chapter will explore how they are affected by social media algorithms, using case studies and making practical recommendations for how the effects of disinformation can be mitigated on both an individual and system-wide level. Table 11.1 describes each component of the disinformation ARC (amplification; reception; correction) and defines what is included in each of the pillars.
Table 11.1 The three components of ARC.
| Amplification | Reception | Correction |
|---|---|---|
| the spread of disinformation online and how recommender systems amplify the reach of false content | the impact of disinformation at the individual level and the outcomes and consequences of disinformation | what we can do to mitigate or stop the effects of algorithmic disinformation individually, at the platform level and through policy |
Disinformation is a socio-technical issue (Fernández et al., Reference Fernández, Bellogín and Cantador2021) that is affected by both individual factors such as emotions, psychological heuristics and information processing, and structural, technical ones such as how social networks are structured and how information is recommended to individuals. ARC covers both these social and technical aspects of algorithmic disinformation by considering the logistics of how content spreads, but also the human element of how it is received. Before the ARC topics are discussed and any meaningful discussion of disinformation or algorithms can begin, it is vital to define both.
1.1 Defining Disinformation
When looking at disinformation on social media, it is useful to consider both the noun disinformation and the verb disinform. This is because we are not just considering artefacts of false news online (i.e., the news stories themselves) but the processes by which people are deceived and manipulated online – how they are disinformed. While some authors may opt for misinformation in general discussions (referring to the unintentional dissemination of false information), this chapter will favour the term disinformation. This is because the term disinformation acknowledges ‘not just […] the content but the context in which it is presented – and the narratives, networks and actors behind it’ (GDI, 2019). When assessing the spread of false content on social media platforms, it is important to consider the whole ecosystem of how and why false content is produced, and not just the false information itself; focusing on disinformation allows us to do this.
It is not in the scope of this chapter to fully break down the term disinformation, but for clarity it is important to distinguish it from other associated concepts. Guess et al. (Reference Guess, Nagler and Tucker2019, p. 6) refer to disinformation as ‘knowingly false content’, while Allcott and Gentzkow (Reference Allcott and Gentzkow2017, p. 213) refer to content that is ‘intentionally and verifiably false, [that] could mislead readers’. These definitions highlight intentionality (the content is false on purpose) but do not distinguish disinformation from associated concepts such as satire (see below). It is important to clarify when studying disinformation that the content being looked at is not only deliberately false, but also designed to deceive – it is false content designed to be taken as factual. The definitions in Table 11.2 will be used in this chapter.
Table 11.2 Definitions of disinformation and disinform.
| Disinformation, n. | false information that is disseminated with the intent to deceive |
| Disinform, v. | the deliberate use of factually incorrect information to deceive and mislead for financial, political, ideological, hostile or other purposes |
Fabricated disinformation is different from mis-reported news such as premature obituaries because while these stories may be false, there is no intention to deceive; they are misinformation. It is also routine for corrections to be issued in cases such as these. Further, disinformation is distinct from satire because satire is fabricated but does not intend to deceive. However, while satire may not set out to deceive readers, it has the potential to be believed and further disseminated as legitimate news (thus becoming misinformation). Finally, disinformation is different from commonplace biased news practices such as selective reporting and misleading news because these are methods of news distortion that still use factual news, albeit for nefarious purposes; whereas disinformation fabricates news. That is not to say disinformation is exclusively fabricated – often truthful and false information are combined together.
Table 11.3 uses real examples of headlines to demonstrate how these differences exist on a cline, with fuzzy, subjective boundaries. It is also important to note that disinformation and misinformation are subjective: to one individual who is aware what they are reading is false, an item is disinformation; to another, who is not aware it is false, it is misinformation. The above section has defined disinformation itself, but it is important to understand how algorithmically fuelled disinformation functions and how algorithms work as a content delivery system.
Table 11.3 A misinformation-disinformation cline.Footnote 1
| LOW | | | Intention to deceive | | | HIGH |
|---|---|---|---|---|---|---|
| Mis-reported news | Parody | Satire | Bias | Selective reporting | Misleading news | Fabricated news |
| DO NOT PUBLISH- Former first lady Barbara Bush d*es at age 92 DO NOT PUBLISH! | HOLY SHIT MAN WALKS ON FUCKING MOON | Kim Kardashian becomes Archbishop of Kanterbury | Email scandal proves Hillary learned wrong lessons from Nixon and Watergate | Benghazi report points out Obama, Clinton lies | Annual tidal wave of 228,000 non-EU migrants who use European passports to gain access to Britain | Sturgeon bans Union flag for Queen’s birthday |
1 Sources from left to right: CBS News (redacted); The Onion; The Daily Mash; Fox News; The Washington Times; The Daily Express; The Scottish Daily Mail.
1.2 Algorithms and Recommender Systems
In its most basic form, an algorithm is a set of instructions (usually carried out by a computer) designed to complete a task such as solving a problem or creating a specific outcome. Algorithms receive some form of input, perform a task and then produce an output (Cormen et al., Reference Cormen, Leiserson, Rivest and Stein2022). A recommender algorithm, or recommender system (RS), is a system where particular content is recommended to users of a service based on data about them (Shokeen & Rana, Reference Shokeen and Rana2020). This can include speculative information such as (perceived) class and nationality, as well as concrete information such as user-inputted age, gender and the content previously interacted with on a website/app. RSs revolve around the idea of relevance – what it is a user wants to see. Social media RSs exist to prevent information overload by filtering content before users see it, so they are only shown a small proportion of the information available (Anandhan et al., Reference Anandhan, Shuib, Ismail and Mujtaba2018; Shokeen & Rana, Reference Shokeen and Rana2020). The result is users are shown information they are likely to be interested in, rather than being overwhelmed by the output of increasingly popular social media (Pew Research Center, 2022).
RSs are dynamic in that they do not rely on a fixed set of data, nor do they always make the same recommendations. Modern RSs use models that continuously and automatically update themselves, harnessing the ‘reservoir of inexhaustible information’ that social networks offer (Shokeen & Rana, Reference Shokeen and Rana2020, pp. 637–638). Using machine learning (see Mahesh, Reference Mahesh2020), RSs are designed to learn how to suggest increasingly relevant content as they are fed data and users respond to previous recommendations (thus generating even more user data). Disinformation and RSs interface when the latter recommends the former to users of a platform. For example, as demonstrated in Figure 11.1, a user could click on a set of news articles recommended to them because they have previously demonstrated interest in current affairs or technology news. This active engagement with news articles leads the RS to learn that the user is interested in articles with 5G in the title. It then self-adjusts and begins recommending more articles containing 5G – one of which may be disinformation.

Figure 11.1 A simplified illustration of how recommender systems can send disinformation to social media users.
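The feedback loop illustrated in Figure 11.1 can be sketched as a toy model. Everything here – the article titles, topic tags and scoring rule – is invented for illustration; real platform RSs are vastly more complex and proprietary. The key point the sketch captures is that engagement updates the model, and that veracity plays no part in the ranking:

```python
from collections import Counter

class ToyRecommender:
    """A minimal engagement-driven recommender (illustrative only)."""

    def __init__(self, catalogue):
        # catalogue: {article_title: set of topic tags}
        self.catalogue = catalogue
        self.interest = Counter()  # learned per-topic weights

    def record_click(self, title):
        # Active engagement feeds back into the model:
        # every topic tag on the clicked item gains weight.
        for topic in self.catalogue[title]:
            self.interest[topic] += 1

    def recommend(self, n=2):
        # Score each article by the user's learned interests;
        # truth or falsity of the content is never consulted.
        scored = {
            title: sum(self.interest[t] for t in topics)
            for title, topics in self.catalogue.items()
        }
        return sorted(scored, key=scored.get, reverse=True)[:n]

catalogue = {
    "New 5G masts planned for rural areas": {"5G", "tech"},
    "5G rollout delayed by supply issues": {"5G", "tech", "business"},
    "5G towers cause illness, insiders claim": {"5G", "conspiracy"},  # disinformation
    "Local election results announced": {"politics"},
}
rs = ToyRecommender(catalogue)
rs.record_click("New 5G masts planned for rural areas")
rs.record_click("5G rollout delayed by supply issues")
print(rs.recommend(3))
```

After two clicks on legitimate 5G stories, the fabricated 5G headline now outranks unrelated factual news – the self-adjustment described above, in miniature.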
The following sections will address each component of ARC exploring how they are affected by algorithms. To understand the less technical, and more human aspect of disinformation, the chapter will culminate in a study exploring how terms such as disinformation, misinformation and fake news are discussed online.
2 Amplification
At the most fundamental level, the purpose of disinformation is to be consumed and engaged with. This engagement can be both passive and active – passive engagement refers to reading the headline or viewing an article, while active engagement is doing something with that article, such as liking, sharing, retweeting, replying or forwarding it (Gerson et al., Reference Gerson, Plagnol and Corr2017). While some may classify reading an article as active engagement, here active/passive is seen as the difference between engaging with an article or communicating engagement with it to others.
Different types of disinformation campaigns rely on different types of engagement, or a hybrid of the two. For example, ‘zone flooding’ is a discursive practice whereby bad actors overload the information environment with all types of information to obfuscate the legitimate information, thus decreasing people’s ability to understand what is true or false (Illing, Reference Illing2020; McRae et al., Reference McRae, del Mar Quiroga, Russo-Batterham and Doyle2022). This type of action relies heavily on passive engagement, and in fact functions by encouraging only passive engagement. It is a brute-force form of information operation that does not necessarily rely upon algorithmic promotion; it essentially emphasises quantity over quality.
The type of disinformation that requires active engagement is the type that has most to benefit from algorithmic amplification. The drawback to curative algorithms is that they unintentionally act as a vector between disinformation producers and social media users, meaning RSs can be exploited to carry out mass information operations and disinformation campaigns online (Howard et al., Reference Howard, Bolsover, Kollanyi, Bradshaw and Neudert2017). These algorithms are usually programmed to favour high engagement websites but in turn this can promote sensationalist ‘clickbait’ sites (Benkler et al., Reference Benkler, Faris and Roberts2018) over heritage news sources that may give more balanced, and therefore less dramatic, coverage. The resulting information environment then becomes one where high engagement (i.e., shocking, dramatic, attention-grabbing) content is foregrounded, and information reporting (such as news) is backgrounded.
Social media algorithms can also be deliberately manipulated. Information operations and computational propaganda have seen the agency behind social media posts switch from individual social media users to automated computer programs designed to mimic humans (Woolley & Howard, Reference Woolley and Howard2016). These operations can expertly exploit algorithms to amplify selected content to hundreds of millions of people quickly and at a relatively low cost (Arnaudo, Reference Arnaudo2017). However, it is important to examine where the blame lies for algorithmic promotion of disinformation. Disinformation very closely mimics legitimate content in form and thus RSs may not be able to tell the difference between legitimate news and disinformation. This means RSs can be manipulated and abused by disinformation producers. For example, the first of the two headlines in Table 11.4 is true (reported by The Independent) while the second is false (from World News Daily Report).
Table 11.4 A truthful news headline and a disinforming headline.
| Truthful news story | Fabricated news story |
|---|---|
| Dozens Of Camels Barred From Saudi Beauty Pageant For Using Botox (Independent.co.uk, 2021) | Elderly Woman Accused Of Training Her 65 Cats To Steal From Neighbors (Worldnewsdailyreport.com, 2017) |
Both these headlines are equally (un)believable and could both be promoted by an RS that deems unusual content to be relevant to users. While these examples are largely trivial, RSs also treat much more serious content in the same way. This shows the inherent flaw that exists in RSs and curation algorithms: their function is to provide relevant content, and they perform that function irrespective of veracity. Further, RSs often experience popularity biases, whereby they privilege content that is already popular among social media users (Bellogín et al., Reference Bellogín, Castells and Cantador2017). Given that disinformation often takes the form of sensationalist clickbait (Mourão & Robertson, Reference Mourão and Robertson2019), this can further confound RSs, as such content satisfies the conditions for relevant information that they are programmed to promote. At the same time, it is the responsibility of social media platforms to create RSs that do not actively promote disinformation to new audiences. With mass deception and manipulation increasingly requiring less effort and cost (Tsikerdekis & Zeadally, Reference Tsikerdekis and Zeadally2014), RSs must be designed with disinformation in mind. This means integrating data such as information about news sources and reports from fact checkers into the RSs. This is discussed further below in the Correction section.
3 Reception
While the previous section has explained how and why (dis)information is amplified by algorithms, this section will detail the effects of this amplification on everyday users of social media networks. Research into the reception of disinformation largely focuses on two main areas: (1) the outcomes and consequences of disinformation; (2) reasons for belief in disinformation, largely relying on research concerning psychological heuristics (Pennycook & Rand, Reference Pennycook and Rand2021). The former will be discussed here and some of the main findings and arguments in the area will be examined and mapped onto the case of algorithmic disinformation.
3.1 Outcomes of Disinformation
With the COVID-19 pandemic it has become clear that disinformation and misinformation online can lead to real-world harms (Jolley & Paterson, Reference Jolley and Paterson2020; Wardle & Singerman, Reference Wardle and Singerman2021). Examples of this include criminal damage inspired by COVID disinformation in the form of arson attacks on 5G cellular towers (Ahmed et al., Reference Ahmed, Vidal-Alaball, Downing and López Seguí2020) and severe illness caused by self-administration of the antiparasitic drug ivermectin to treat COVID-19 infection based on health misinformation (CDC, 2021). The ability for false content to harm individuals demonstrates that disinformation and misinformation are inherently a security threat, and something that can harm institutions, processes and individuals.
In a study of 3,000 US respondents who answered surveys while their internet activity was monitored, Ognyanova et al. (Reference Ognyanova, Lazer, Robertson and Wilson2020) found that exposure to disinformation was associated with decreased trust in mainstream media. This scepticism and decreasing trust are not just caused by textual disinformation. Deepfakes – synthetic images, audio or video in which individuals are digitally inserted so that someone absent from the original media appears to be present – also negatively impact trust. In a UK context, Vaccari and Chadwick (Reference Vaccari and Chadwick2020) find that while deepfakes do not always successfully deceive, they do sow uncertainty and can decrease trust in news on social media. Similarly, in a study during the 2018 US midterm elections, Jones-Jang et al. (Reference Jones-Jang, Kim and Kenski2020) found that perceptions of exposure to disinformation increased feelings of political cynicism, and that this was stable across political belief. These studies demonstrate how exposure to commentary and discussion of disinformation can affect trust. That is to say, meta-discussion of whether information can be trusted can influence trust levels.
Perception is an important point here. Jones-Jang et al. (Reference Jones-Jang, Kim and Kenski2020, p. 3111) note that ‘perceived trustworthiness about a system is more important than whether a system is indeed trustworthy or not’, illustrating that perceptions can be as important as reality when it comes to trust. This demonstrates how one outcome of disinformation is the meta-concern that it causes. That is to say, people become (over-)concerned about disinformation and so its negative consequences are increased even more. In a study assessing whether so-called ‘elite discourses’ affect evaluations of news media, Van Duyn and Collier (Reference Van Duyn and Collier2019) find that participants who were exposed to tweets from ‘elites’ (politicians, journalists, activists) discussing disinformation experienced decreased trust in the media, demonstrating how even discussion of disinformation can further harm trust.
There is also a key difference here between people consuming misinformation and disinformation. In a survey of 400 participants, Tandoc et al. (Reference Tandoc, Duffy, Jones-Jang and Pin2021) found that individuals who were exposed to disinformation, and later told it was false, experienced a decrease in news media trust, whereas those who were not told it was false did not experience a change in perception of news media. The authors conclude from these findings that this demonstrates how ‘the social impact of fake news is not limited to its direct consequences of misinforming individuals, but also includes the potentially adverse effects of discussing fake news’ (Tandoc et al., Reference Tandoc, Duffy, Jones-Jang and Pin2021, p. 783). This leads to an information environment where people can overextend their distrust of disinformation to all types of information, because they become suspicious of everything. Echterhoff et al. (Reference Echterhoff, Groll and Hirst2007) refer to this phenomenon as the tainted truth effect, whereby prior warnings of disinformation can lead individuals to discredit truthful information.
On the surface the findings above about perception make sense: people who are warned that there may be deception in the information they consume seek to find this deception. But it is the hypercorrection and overextension of this scepticism that is stark here. This demonstrates how pernicious disinformation is; it is not just the false content itself that can decrease trust in institutions and the media, but the mere discussion of disinformation too. This leads to a dilemma whereby we need to balance education and awareness of disinformation with (over-)exposure of the topic.
4 Correction
There are two key barriers to dealing with disinformation: (1) social and (2) environmental. The former concerns understanding why disinformation spreads to then try and mitigate its impacts at the individual or population level, while the latter concerns what tangible changes can be made to/by social networks to reduce the spread of false content online. The following section will explore changes that can be made to RSs, focusing on three main areas: design, oversight and transparency.
4.1 Correcting Algorithms
As they currently stand, social media algorithms promote ‘negatively marked online behaviours’ (or NMOBs) (Hardaker, Reference Hardaker2012), whether this is extremist (Lawson, this volume; Whittaker et al., Reference Whittaker, Looney, Reed and Votta2021), conspiratorial (Faddoul et al., Reference Faddoul, Chaslot and Farid2020), or self-harm content (Walker, Reference Walker2022). This is something that needs to be addressed and corrected to create a safer online environment – social media platforms must restructure what algorithms prioritise. Similarly, algorithms must be human designed and human oriented. This means requiring consistent human input and correction so RSs can be corrected proactively, rather than companies reactively responding to harms caused by RSs.
Humans have biases, and these are coded into algorithms (Bozdag, Reference Bozdag2013). Institutional biases, such as gender disparities, racial discrimination and ableist biases (Ayre & Craner, Reference Ayre and Craner2018), can be coded into algorithms and then inadvertently perpetuated. For example, in 2018 at Google and Facebook, only 21 per cent and 22 per cent of technical roles, respectively, were held by women (Chin, Reference Chin2018) – this lack of diversity means algorithms are being designed and coded by a fairly homogeneous group that may lack the breadth of experience to be aware of the challenges and biases that affect others. This means that, whether unknowingly or otherwise, biases can be built into algorithms from their very inception. Given that these RSs then learn over time, if they are corrupt from the very beginning then they learn with these biases as their foundation. This is why algorithms should require internal and external oversight. In 2021, the UK government published an ‘algorithmic transparency standard’ for public sector bodies that advocates for transparency and scrutiny (Central Digital and Data Office, 2021). This follows a review from the UK Centre for Data Ethics and Innovation (CDEI) that concluded that ‘we must ensure decisions can be scrutinised, explained and challenged’ (CDEI, 2020, p. 6). While these reports concern public bodies using algorithms for decision making, their principles can, and should, be mapped onto private enterprises. However, oversight is confounded by another issue: transparency.
Social media algorithms and RSs are black-box, proprietary systems (Leerssen, Reference Leerssen2020), meaning there is very little transparency regarding how they function. The intricacies of their functionality are purely the domain of the companies themselves, as is the data they use and the impacts they have. This is chiefly for purposes of protection; social media algorithms are very complex, and decades of research have contributed to them, therefore it is logical that companies like Meta and TikTok would seek to protect their algorithms as companies like Coca-Cola and others protect their product recipes. However, this also conceals their design, what data they use, and the impacts they have on their users. The issue at present is that transparency and oversight are entirely discretionary. Tech giants like Meta, Google and Twitter can choose to be transparent (Twitter, 2023), but can equally choose not to be (Oversight Board, 2021, p. 7). In reality, there is little to no incentive for social media companies to be meaningfully transparent about how their algorithms operate, unless they are compelled to do so. This is where transparency and oversight intersect, demonstrating the need for algorithms to be scrutinised for biases, negative outcomes and other issues by external parties.
This is a complex issue that requires an equally complex solution – there is no silver bullet that will mitigate the harmful effects of algorithms and disinformation. Instead, a programme of regulation, policy, education, fact-checking and research is needed to help reduce the impacts algorithmic disinformation can have. This is part of the rationale for ARC: to demonstrate there are many moving parts and different areas which all need different responses. The various aspects of disinformation amplification, reception and correction will need different responses that will also need to account for changes over time and consider social, cultural, political and economic contexts.
5 How People Talk About False Information Online
With the advent of large social media platforms, studies of disinformation have grown considerably in the past 20 years. Research has traditionally focused on detecting disinformation by training natural language processing (NLP) models on large amounts of textual data (de Beer & Matthee, Reference De Beer, Matthee and Antipova2021). The aim of such research is generally to build a tool that can identify disinformation automatically, without a reliance on human fact-checkers. From a forensic linguistic perspective, Sousa-Silva (Reference Sousa-Silva2022) argues that disinformation satisfies the criteria of a language crime, demonstrating the need for forensic linguistic analysis of disinformation. Similarly, Moura et al. (Reference Moura, Sousa-Silva and Lopes Cardoso2021) draw on methods from forensic linguistics to create more accurate disinformation detection models. These recent studies demonstrate a much-needed shift in disinformation research – analysis that explores the nuances of false content online by examining micro-level text permutations in disinforming texts themselves.
There is also an increasing body of work that researches false content from a discourse analytic perspective. Maci (Reference Maci2019) explores how individuals (as opposed to organisations such as news outlets, etc.) use Twitter to spread (mis)information about vaccination, finding that posts regularly exploit imagery of death verbally and visually, with discussions of death often occurring with mention of children, government or conspiracy. Using a broader approach of looking at ‘fake news’ as a genre, Lorusso (Reference Lorusso2023) notes that disinformation is intricately connected with social paradigms and has many functions, including maintaining social relationships and allowing individuals to express themselves. These studies offer insights into how false content is discursively constructed and demonstrate the validity of using both quantitative and qualitative linguistic methods to explore the production and promulgation of false content. There are also insights to be gained from exploring how false content itself is represented discursively – by exploring meta-discussions of phenomena such as disinformation.
In a study using the News on the Web (NOW) corpus, Cunha et al. (Reference Cunha, Magno, Caetano, Teixeira, Almeida, Staab, Koltsova and Ignatov2018) explore how the 2016 US presidential election shifted perceptions of the search term fake news. They identify clear shifts: before the election, discussion of fake news focused on general topics such as the news industry and the internet; post-election, discussions shifted to specific political events (such as the election itself), with an increasing amount of meta-discussion of fake news. Studies of news media are valuable for understanding how common discourses are (re)produced and sustained in the press. Li and Su (Reference Li and Su2020) find that discussions surrounding the term fake news on Twitter are characterised by in-group and out-group forming language, while Brummette et al. (Reference Brummette, DiStaso, Vafeiadis and Messner2018) find that discussions surrounding the term fake news on Twitter are emotionally charged and often obscure the term by using it to de-legitimise legitimate information (see also Hoffmann, this volume, for analysis of the term ‘fake news’ online). These studies demonstrate the insights that can be gained by studying social media datasets to identify patterns of language in discussions of disinformation, going beyond looking solely at the disinformation itself.
The term fake news is, however, increasingly seen as problematic due to its use as an insult or slur to discredit legitimate information (American Dialect Society, 2018). The term is still of relevance and importance, but it should not be the only term considered when exploring representations of false content online. Rather than focusing on one term, the present study explores how users on the social media platform Twitter discuss the terms disinformation, misinformation and fake news online. The aim of this case study is to investigate one main question: how do people talk about false content online? By looking at metacommentary around issues of disinformation, we can monitor discussions for what people find important at a given moment in time. This method can help researchers understand how people online view disinformation and can help us to develop key areas of importance to focus on that align with real-world concerns.
5.1 Methods
To understand how people talk about disinformation, this section compares a dataset of disinformation discussion to a generic, randomly collected dataset. Using well-established corpus linguistics methods (McEnery & Hardie, Reference McEnery and Hardie2011), a corpus can be compared to a reference (comparator) corpus to isolate and identify words and phrases that are overused. This identification of ‘key’ items (Scott, Reference Scott1997) uses effect size measures and statistical confidence thresholds to identify unusually frequent words in the target corpus, but it is not the entire picture. Once results have been generated, they must be manually analysed and explored; this will be done thematically, exploring trends in the data and relating them back to the research question: how do people talk about false content?
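As an illustration of the kind of keyness calculation involved, the sketch below uses the widely used log-likelihood (G2) statistic for a single word. The word and all counts are invented, and this is not necessarily the exact computation performed by any particular corpus tool:

```python
import math

def log_likelihood(freq_target, freq_ref, total_target, total_ref):
    """Log-likelihood (G2) keyness statistic for one word: compares
    its frequency in the target corpus against a reference corpus."""
    # Expected frequencies if the word were spread evenly across both corpora
    combined = freq_target + freq_ref
    total = total_target + total_ref
    expected_t = total_target * combined / total
    expected_r = total_ref * combined / total
    g2 = 0.0
    if freq_target:
        g2 += freq_target * math.log(freq_target / expected_t)
    if freq_ref:
        g2 += freq_ref * math.log(freq_ref / expected_r)
    return 2 * g2

# Invented example: 'hoax' occurs 150 times in a 100,000-token target
# corpus but only 10 times in a 200,000-token reference corpus.
g2 = log_likelihood(150, 10, 100_000, 200_000)
print(round(g2, 1))  # values above 15.13 are conventionally significant at p < 0.0001
```

A high G2 flags the word as ‘key’ in the target corpus; as the section notes, such scores are only the starting point for manual, thematic analysis.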
Looking beyond individual words, this analysis explores semantic domains (Rayson et al., Reference Rayson, Archer, Piao and McEnery2004) as a way of grouping words together for analysis. A semantic domain is a group of ‘word senses that are related by virtue of their being connected at some level of generality with the same mental concept’ (Archer et al., Reference Archer, Wilson and Rayson2002, p. 1) and is an effective way of grouping together words with similar meanings in context. For example, Spanish test and classroom would belong to a semantic domain of education, while COVID test and prescription would belong to a domain of medical terms. The process of assigning words to semantic domains is automated and was carried out using the USAS tagset in the online corpus analysis environment Wmatrix (Rayson, Reference Rayson2008).
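The context-based disambiguation behind such tagging (e.g., Spanish test vs. COVID test) can be illustrated with a toy sketch. The real analysis used the USAS tagset in Wmatrix; the tiny lexicon and function below are hypothetical stand-ins, not the USAS tagger itself:

```python
# Toy illustration of semantic-domain disambiguation in the style of USAS.
# The lexicon maps an ambiguous word to candidate domains, each with
# context cue words; the domain whose cues appear in context wins.

TOY_LEXICON = {
    "test": {"education": ["spanish", "classroom", "exam"],
             "medical": ["covid", "prescription", "doctor"]},
}

def tag_ambiguous_word(word, context_words, lexicon=TOY_LEXICON):
    """Pick a semantic domain for `word` by counting context cues."""
    senses = lexicon.get(word)
    if senses is None:
        return None  # word not in the lexicon
    scores = {domain: sum(w in cues for w in context_words)
              for domain, cues in senses.items()}
    best = max(scores, key=scores.get)
    # Fall back to the first listed domain if no cue is found.
    return best if scores[best] > 0 else next(iter(senses))

print(tag_ambiguous_word("test", ["spanish", "homework"]))    # education
print(tag_ambiguous_word("test", ["covid", "prescription"]))  # medical
```

Real semantic taggers combine such contextual rules with part-of-speech information and multi-word-expression templates; this sketch only shows the core idea of resolving a word sense from its context.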
5.2 Data
The data comprise two corpora: a target corpus of randomly collected tweets containing the terms disinformation or misinformation or fake news, and a comparator (reference) corpus of randomly collected tweets. Tweets were collected for the first six months of 2022 using Twitter’s Academic API, which allows researchers to collect up to 10,000,000 tweets a month for academic purposes. For each month and each corpus, approximately 10,000 tweets were collected, evenly dispersed across 100 random time points; these time points were generated using PHP’s Mersenne Twister implementation (Joulain-Jay, Reference Joulain-Jay2021). The data were then de-duplicated to remove repeated tweets; in each instance one copy was preserved in the corpus. This resulted in a loss of 18,747 tweets (33%) for the target corpus and 300 tweets (0.5%) for the reference corpus. See Table 11.5 for details of the corpus make-up.
Table 11.5 Tweets in the target and reference corpus.
| | Target corpus | | Reference corpus | |
|---|---|---|---|---|
| Month | Tweets | Tokens | Tweets | Tokens |
| January | 6,540 | 142,860 | 9,251 | 138,002 |
| February | 6,361 | 139,576 | 9,410 | 138,867 |
| March | 6,649 | 142,838 | 9,441 | 140,305 |
| April | 6,422 | 138,628 | 9,256 | 137,318 |
| May | 6,142 | 130,407 | 9,440 | 141,150 |
| June | 6,264 | 136,316 | 9,467 | 141,703 |
| Totals | 38,378 | 830,625 | 56,265 | 837,345 |
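The sampling and de-duplication steps described above can be sketched as follows. This is a minimal illustration, not the collection pipeline itself: the study generated time points with PHP’s Mersenne Twister, while Python’s `random` module happens to use the same generator; the function names are illustrative.

```python
import random
from datetime import datetime, timedelta

def random_time_points(year, month, n=100, seed=None):
    """Draw n random time points within one calendar month."""
    rng = random.Random(seed)  # Python's random is also a Mersenne Twister
    start = datetime(year, month, 1)
    end = datetime(year + month // 12, month % 12 + 1, 1)  # first of next month
    span = (end - start).total_seconds()
    return sorted(start + timedelta(seconds=rng.uniform(0, span))
                  for _ in range(n))

def deduplicate(tweets):
    """Remove repeated tweets, preserving the first copy of each text."""
    seen, kept = set(), []
    for t in tweets:
        if t not in seen:
            seen.add(t)
            kept.append(t)
    return kept

points = random_time_points(2022, 1, n=100, seed=42)
print(len(points))                   # 100
print(deduplicate(["a", "b", "a"]))  # ['a', 'b']
```

Keeping the first copy of each duplicate mirrors the procedure reported above, where one instance of every repeated tweet was preserved in the corpus.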
5.3 Qualitative Analysis
Before the key domains are discussed, it is valuable to explore how algorithms are discussed in the data. A wildcard search for algorithm* retrieved four forms (algorithm; algorithms; algorithmically; algorithmic) occurring just 24 times across 24 tweets in the target corpus, a relative frequency of 0.29 per 10,000 wordsFootnote 1. It is immediately striking how seldom algorithms are mentioned in the data. Given how integral recommender systems (RSs) are to social media platforms, it is noteworthy that in a corpus of 38,378 tweets, just 0.06 per cent mention algorithms. To put this in context, the lemma dog* (retrieving dog and dogs) occurs 33 times in 31 tweets in the target corpus. That is to say, people discuss dogs more often than they discuss algorithms in relation to disinformation, misinformation and fake news.
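The frequency arithmetic reported above can be reproduced directly. The wildcard-to-regex helper is an illustrative stand-in for the corpus tool’s search; the counts are those reported in the text (24 hits in 830,625 target-corpus tokens and 38,378 tweets):

```python
import re

def wildcard_pattern(term):
    """Turn a corpus-style wildcard query like 'algorithm*' into a regex."""
    return re.compile(r"\b" + re.escape(term.rstrip("*")) + r"\w*",
                      re.IGNORECASE)

hits = 24             # occurrences of algorithm* in the target corpus
tokens = 830_625      # target-corpus tokens (Table 11.5)
tweets_with_hit = 24  # tweets containing a hit
total_tweets = 38_378 # target-corpus tweets (Table 11.5)

rel_freq_per_10k = hits / tokens * 10_000
pct_of_tweets = tweets_with_hit / total_tweets * 100

print(round(rel_freq_per_10k, 2))  # 0.29
print(round(pct_of_tweets, 2))     # 0.06
print(bool(wildcard_pattern("algorithm*").search("algorithmic bias")))  # True
```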
These findings seem to confirm the point made in the introduction of this chapter: while many people are concerned about disinformation, they are not necessarily aware of how RSs function or of the role RSs play in the amplification of disinformation on social media platforms. This has implications for how we seek to redress algorithmically fuelled disinformation. In this sample of tweets, very few people discuss algorithms, and individuals who are unaware of algorithms as an issue may well view attempts to tackle algorithmic disinformation as confusing or unnecessary. This is not to say that a lack of explicit mention equals a total lack of awareness, but it is nonetheless salient how rarely people discuss algorithms.
When algorithms are mentioned, the discussion centres exclusively on their negative impact on social media platforms (see Table 11.6 for examples). The only tweet that bucks this trend and mentions how algorithms can prevent fake news is citing a book title (Example 6). In the examples in Table 11.6, users discuss victims of disinformation, bias and harassment caused by algorithms, and how algorithms can stifle fair debate (Example 5). It would appear that those who do discuss algorithms understand how they operate; the issue is simply that few people discuss them at all, which again suggests a problem of awareness. This is something both policymakers and educators should bear in mind when addressing disinformation.
Table 11.6 Examples of algorithm* in context.
| Example | Left context | Node word | Right context |
|---|---|---|---|
| 1 | or people that have been targeted victims of profit driven | algorithmic | misinformation, government and capital is responsible – period. |
| 2 | And with sophisticated | algorithm, | the misinformation of dangerous ideas are amplified. |
| 3 | Facebook’s | algorithm | rewards engagement, these superusers have enormous influence over which posts. |
| 4 | companies shared more data, researchers would help platforms understand misinformation, | algorithmic | bias and harassment; develop better, safer systems; inform regulations. |
| 5 | The board of Misinformation wants to silence conservatives, social media | algorithms | already silence conservative opinion. |
| 6 | Check it out & also read his book ‘How | algorithms | Create and Prevent Fake News’ for more on the topic. |
5.4 Key Semantic Domains
For the analysis of key semantic domains, a log ratio (Hardie, Reference Hardie2014) threshold of 1.00+ was used; this means a semantic domain must be used at least twice as often in the target corpus as in the reference corpus (calculated using relative, not raw, frequencies). Similarly, a log likelihood cut-off of 15.13 (p < 0.0001) and a minimum frequency cut-off of 10 were applied. These measures are used in tandem to account for both effect size (log ratio) and confidence (log likelihood) (see Brezina, Reference Brezina2018 for more details). The results are listed in Table 11.7.
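A minimal sketch of these keyness statistics, using the standard formulas for log ratio (binary log of the ratio of relative frequencies) and log likelihood, with the thresholds given above. The example counts at the bottom are hypothetical, not figures from the study:

```python
import math

def log_ratio(o1, n1, o2, n2):
    """Effect size: log2 of the ratio of relative frequencies."""
    return math.log2((o1 / n1) / (o2 / n2))

def log_likelihood(o1, n1, o2, n2):
    """Confidence: G2 statistic for observed counts o1, o2 in corpora
    of n1, n2 tokens, against expected counts under a shared rate."""
    e1 = n1 * (o1 + o2) / (n1 + n2)
    e2 = n2 * (o1 + o2) / (n1 + n2)
    ll = 0.0
    for o, e in ((o1, e1), (o2, e2)):
        if o > 0:
            ll += o * math.log(o / e)
    return 2 * ll

def is_key(o1, n1, o2, n2, min_lr=1.0, min_ll=15.13, min_freq=10):
    """Apply the chapter's three cut-offs in tandem."""
    return (o1 >= min_freq
            and log_ratio(o1, n1, o2, n2) >= min_lr
            and log_likelihood(o1, n1, o2, n2) >= min_ll)

# Hypothetical domain: 200 hits in the target corpus (830,625 tokens),
# 40 hits in the reference corpus (837,345 tokens).
print(round(log_ratio(200, 830_625, 40, 837_345), 2))  # ≈ 2.33
print(is_key(200, 830_625, 40, 837_345))               # True
```

Note that a log likelihood of 15.13 is the chi-squared critical value for p = 0.0001 at one degree of freedom, which is why that particular cut-off corresponds to the stated significance level.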
Table 11.7 Key domains above log ratio 1.00+.
The first two key domains in Table 11.7 are artefacts of the corpus-building parameters and are both dominated by the three search terms disinformation, misinformation and fake (news). Similarly, it is unsurprising that domains such as Q4 The Media, Q4.2 The Media: Newspapers, etc. and X2.2+ Knowledgeable (comprising mostly the token news) were also key. These domains cover terms relating to the media and to newspapers (Archer et al., Reference Archer, Wilson and Rayson2002, p. 24). Again, they are artefacts of the search terms used to build the corpus (three terms describing false news); it is therefore unsurprising that people discuss other types of media at the same time.
While there are many domains that are key, the following sections discuss those that can be thematically split into three categories: (1) characterising disinformation; (2) the effects of disinformation; (3) solutions to disinformation. These groups of domains will be discussed in turn below. It is not possible to explore all the semantic domains in this chapter but where suitable, examples from multiple domains will be used to demonstrate how discourses are constructed.
5.4.1 Characterising Disinformation
Table 11.8 lists the semantic tags and domains which can be categorised as ‘characterising disinformation’.
Table 11.8 Semantic tags characterising disinformation.
| Semantic tag | Semantic domain | Example items in domain |
|---|---|---|
| S1.2.1 | Formal/unfriendly | enemy, enemies, divisive, hostile |
| A11.2+ | Noticeable | blatant, obvious, evident |
| A15− | Danger | dangerous, risk |
| X9.1− | Inability/unintelligence | idiot, morons, incompetent |
| G2.2− | Unethical | shame, corrupt, evil, scam |
Where adversaries such as enemy and enemies are discussed, there is a rough split between uses: some refer to disinformation itself as the enemy, while others refer to how disinformation is used by perceived adversaries such as other political parties, Russia and even historical actors, as in accounts of disinformation in Nazi Germany. These uses are similar to those found in A15− Danger, where people discuss dangerous disinformation and misinformation, but also dangerous rhetoric and dangerous claims. In this sense, a distinction seems to be made between the actors responsible for the content and the content itself. These findings support the comments earlier from the GDI that disinformation can describe both the disinformation environment as a whole and individual disinforming news items.
People also comment on how some actors behave brazenly when producing and sharing disinformation, with uses such as blatant and obvious in A11.2+. In these instances, we see people criticising others for seemingly openly carrying out NMOBs in the public sphere (see the examples in Table 11.9).
Table 11.9 Examples of blatant + [false content].
| Example | Left context | Node word | Right context |
|---|---|---|---|
| 7 | Why is the head of comms for UNICEF spreading | blatant misinformation | About what is happening in Ethiopia? |
| 8 | This is | blatant misinformation | and should be taken down if poss |
| 9 | This is literal, verifiable, | blatant fake news | Completely fraudulent academia. |
| 10 | Meanwhile, #Russia’s state media keeps spreading | blatant disinformation, | preemptively blaming #Ukraine |
| 11 | But you have been spreading | blatant disinformation, | Russian propaganda and untruths for the last month |
These accusations of knowingly and intentionally spreading false information tie in with G2.2− Unethical, where we start to see the actors behind alleged disinformation being evaluated. In these uses, Twitter users refer to accused disinformation producers as having no shame and being corrupt and evil. The category also contains moral evaluations, where terms such as nefarious and wickedFootnote 2 exemplify the negative evaluation of NMOBs. There is a clear pattern here: disinformation itself, and the actors responsible for producing it, are consistently labelled as having moral failings for knowingly spreading potentially dangerous content. This also highlights how terms such as misinformation and disinformation are used interchangeably: blatant implies blame, and so canonically would refer to intentional disinformation, but that is not the case in the data, where misinformation is also labelled as blatant. This shows that while their dictionary definitions may differ, in actual usage disinformation and misinformation are often used as synonyms.
5.4.2 The Effects of Disinformation
Table 11.10 lists the semantic tags and domains which were categorised as showing ‘the effects of disinformation’.
Table 11.10 Semantic tags showing the effects of disinformation.
| Semantic tag | Semantic domain | Example items in domain |
|---|---|---|
| G1.2 | Politics | propaganda, election, democracy |
| G2.1− | Crime | conspiracy, crime, fraud, illegal |
| B3 | Medicines and medical treatment | vaccine, medical, doctors, abortion |
| E2− | Dislike | hate, hatred, hateful, homophobic |
| G1.1 | Government | government, country, official |
These key domains again demonstrate the multiple words that are often used as synonyms and near-synonyms for disinformation, namely propaganda and conspiracy (theory/theories). Throughout the corpus, these are often used interchangeably or alongside each other. In part this demonstrates the importance of defining disinformation because it is often conflated with similar, yet distinct, terms such as these (see Seargeant, Reference Seargeant, Demata, Zorzi and Zottola2022 for an exploration of complementary forms of disinformation).
Similar to dangerous in the previous section, there is a clear strand of tweets that discuss the real-world impacts of disinformation. The examples below highlight some instances of these where people remark how disinformation can weaken democracy, spread hate and how health-related disinformation (abortion; vaccines) can be a threat to public health and is even destroying our country. In these uses, disinformation is positioned explicitly as a security threat that can affect our institutions and our individual freedoms.
The examples in Table 11.11 demonstrate not only that many people are aware of the impacts of disinformation, but also that a breadth of impacts is discussed in the data. A conclusion that can be drawn at this point is that, in general, the public are reasonably well versed in the implications of disinformation for society. This is reinforced by the finding that there is no single dominant topic in the data – one might have expected a skew towards certain high-profile disinformation cases (former President Trump, the Russo-Ukrainian War, COVID-19), but this is not the case, demonstrating public awareness that disinformation affects many aspects of life.
Table 11.11 Examples of the effects of disinformation.
5.4.3 Solutions to Disinformation
The final category, solutions to disinformation, consists of one key domain: S8− Hindering. This domain contains terms that denote forms of hindrance, resistance or obstruction. Looking at the top three lemmas, which make up 60 per cent (716) of all words in the domain, there is a clear pattern (see Table 11.12).
Table 11.12 Examples of fight*, combat*, and prevent*.
| Lemma | Tokens | Uses |
|---|---|---|
| fight | fight, fighting | Digital Empowerment Foundation’s Soochnapreneurs (Information Entrepreneurs) across 1000 locations in India are also working on a daily basis to fight #Misinformation & fraud • Democracies have focused a lot on fighting, blocking, stopping disinformation. |
| combat | combat, combating, combatting, combats | We all have a role to play in combating misinformation and preserving democracy. • More than 80 fact-checking organisations urged YouTube to better combat disinformation |
| prevent | prevent, preventing, prevents, prevention, prevented, preventative | Wonder if Twitter is gonna intervene to prevent the spread of misinformation • The establishment isn’t interested in preventing #misinformation; they want instead 2 manage the flow of info, regardless of its veracity |
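The lemma grouping behind Table 11.12 can be sketched as follows. The form-to-lemma mapping uses the token lists from the table; the sample token list and the resulting counts are hypothetical, not the corpus figures:

```python
from collections import Counter

# Collapse S8- Hindering word forms onto their lemmas (forms from
# Table 11.12) and compute the share of the top lemmas in the domain.
FORM_TO_LEMMA = {
    "fight": "fight", "fighting": "fight",
    "combat": "combat", "combating": "combat",
    "combatting": "combat", "combats": "combat",
    "prevent": "prevent", "preventing": "prevent", "prevents": "prevent",
    "prevention": "prevent", "prevented": "prevent",
    "preventative": "prevent",
}

def lemma_share(domain_tokens, top_n=3):
    """Return the top_n lemmas and their combined share of the domain."""
    lemmas = Counter(FORM_TO_LEMMA.get(t, t) for t in domain_tokens)
    top = lemmas.most_common(top_n)
    covered = sum(n for _, n in top)
    return top, covered / len(domain_tokens)

tokens = ["fight", "fighting", "combat", "prevent", "block"]
top, share = lemma_share(tokens)
print(top)    # [('fight', 2), ('combat', 1), ('prevent', 1)]
print(share)  # 0.8
```

In the study’s data the equivalent computation yields the figure cited above: the top three lemmas cover 716 tokens, 60 per cent of all words in the S8− domain.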
While combat* as a verb almost exclusively occurs in the formula [combat] + [disinformation], for fight* as a verb there is an interesting mix of literal and figurative use. While the most common bigram is again [fight] + [disinformation], there are also instances of literal use of fight, such as fighting wars and fight Russia, both referring to the Russo-Ukrainian War; this is an outcome of the parameters of the data collection, which coincided with Russia’s invasion of Ukraine. This finding highlights how disinformation is overwhelmingly seen as a problem to be fixed, and not something that is a given part of using social media. It also demonstrates how we frame disinformation: it is an enemy that must be fought. This metaphorical framing is similar to those used for COVID-19 where the disease is framed through metaphors of fights and battles (Semino, Reference Semino2021), as a way of embodying something which is not necessarily tangible.
Prevent offers a more complex view of the data as it is interspersed with mentions of COVID-19 and preventative health measures. When we focus on just instances of spread co-occurring with either dis-/misinformation, a dominant pattern in the corpus is preventing not just disinformation, but specifically the spread of disinformation (see Table 11.13).
Table 11.13 Examples of prevent.
| Example | Use |
|---|---|
| 18 | tips on how to help prevent the spread of disinformation |
| 19 | To help prevent the spread of misinformation, it’s important to verify sources and content posted on websites or social media |
| 20 | Wonder if Twitter is gonna intervene to prevent the spread of misinformation |
| 21 | #Telegram allegedly failed to prevent users from spreading #disinformation |
| 22 | What, the president is making a ministry meant to prevent the spread of lies and misinformation? OH NO! |
While fight and combat are used in vaguer senses and refer to disinformation as a general phenomenon, prevent is often used to refer specifically to the spread of disinformation on social media. In Examples 20 and 21, specific actors are named to tackle disinformation: Twitter and the messaging service Telegram. Here, users specifically call on these platforms to address the spread of dis-/misinformation. In Example 22, the US government is mocked for addressing misinformation at the federal level, showing how interventions to tackle disinformation can be both requested and decried.
Investigating how people discuss disinformation, misinformation and fake news gives a wealth of insights into what people deem (un)important and how they feel about these topics. The analysis of the corpus shows that a wide range of issues is represented in the data, including health misinformation, tech companies’ responses to disinformation, and how we can stop false content spreading. Given that a broad range of topics is discussed in the data, the lack of discussion of algorithms and RSs becomes even more stark. Every user of social media is affected by RSs, and RSs play a considerable role in the spread of disinformation, but there appears to be a general lack of awareness of this.
6 Conclusion
When we examine real-world discussions of disinformation, misinformation and fake news across a six-month period, clear patterns are identifiable. First, discussions of RSs and algorithms are very infrequent, with trivial, unrelated topics discussed more often in the data. This finding suggests an issue of awareness: while social media users discuss a wide range of topics relating to disinformation, discussions of algorithms are largely absent. There are also clear concerns in the data: namely, how social media companies will address the spread of disinforming content online, and how such content affects our society at all levels, including our security, democratic systems, human rights and healthcare. The findings show that disinformation is framed as an enemy, and the blame for the spread of this content is often aimed at the platforms on which it spreads, harming the reputation of social media platforms.
Social media algorithms and recommender systems are designed to prevent information overload online. They help provide a rich user experience to people worldwide and can foster learning, help develop social relationships, and promote business. RSs work by taking user data, filtering new content by relevance, and feeding this content back to users. This design, however, has an inherent flaw. The sole focus on relevance comes at the cost of safety, and RSs are responsible for massive quantities of disinformation being delivered directly to the devices of social media users. It is the intersection of algorithms and disinformation – algorithmically fuelled disinformation – that can bring about the worst of each. Disinformation affects billions of people worldwide, many without even knowing they have seen falsehoods (cf. misinformation), and can affect information environments, distort world views, and bring about real-world harms, such as radicalisation (as discussed by Lawson, this volume). Disinformation also has a knock-on effect on how we view information: it can decrease trust overall, making it harder for people to distinguish between disinformation and legitimate information.
The disinformation ARC in this chapter has demonstrated a practical way to break down the disinformation environment into separate, interrelated sections that can then be assessed and researched. It offers a useful way of highlighting the key areas to focus on and also demonstrates how different parts of the disinformation environment affect and complement each other. One element absent from this framework is the production of disinformation. While beyond the scope of this chapter, it is important to consider why disinformation exists and how it is (mass-)produced for dissemination. Further, disinformation is a burgeoning area of research that can be studied from various technical and social dimensions. It is important, however, to remember the human element of disinformation, and not to treat the technical and social aspects as mutually exclusive. Further research should explore how the real-world concerns of people in natural settings interface with the amplification, reception and correction of disinformation, to understand what people are concerned about and how changes to our online information environments can address real concerns surrounding the spread of disinformation.
1 Introduction
1.1 Features and Functions of Breathing
Vegetative, quiet (i.e., non-speech) breathing differs from speech breathing in a number of respects; see Fuchs and Rochet-Capellan (2020) for a summary. The ability of human beings to temporarily take conscious control of their breathing is a prerequisite for speech communication – a prerequisite of fundamental importance that, in terms of the magnitude of changes to quiet breathing, we share with few other species on this planet. For example, MacLarnon and Hewitt (Reference MacLarnon and Hewitt1999, p. 341) write: “One neglected aspect is the evolution of increased breathing control. […] Without sophisticated breath control, early hominids would only have been capable of short, unmodulated utterances, like those of extant nonhuman primates”; see also MacLarnon and Hewitt (Reference MacLarnon and Hewitt2004, p. 182ff.) for further details. Unlike quiet breathing, controlled speech breathing is characterized, among other things, by a strongly asymmetrical breath cycle, namely a short inhalation phase followed by a much longer exhalation phase (10% vs. 90%). In quiet breathing (and without physical exertion), the breath cycle is largely symmetric, with a cycle duration usually within a 3–4-second window, variation due to body size included. Thus, the exhalation phase is at most 1–2 seconds long (Gupta et al., Reference Gupta, Lin and Chen2010). In speech, by contrast, the exhalation phase is, more often than not, a multiple of this value, depending on the speaking task. For example, when reading, the exhalation phase is on average shorter (Rochet-Capellan & Fuchs, Reference Rochet-Capellan and Fuchs2013) than when giving presentations, where it often exceeds 5–6 seconds (Barbosa et al., Reference Barbosa, Niebuhr and Neitsch2019). Age also plays a role in exhalation duration.
Between the ages of 16 and 50, speakers use comparatively long exhalation durations, while both older and younger speakers show significantly shorter exhalation durations (Huber and Stathopoulos, Reference Huberand, Stathopoulos and Reford2015).
Furthermore, speech breathing is not just a ‘refueling process’ in which air pressure is built up and converted into acoustic energy. Rather, speech breathing is an integral part of the communication system of spoken language. That is, not only does speaking depend on breathing. Breathing also depends on speaking in that it adapts to linguistic structures and phonetic patterns (Fuchs & Rochet-Capellan, Reference Fuchs and Rochet-Capellan2021). For example, speakers change their voice quality in the direction of a more compressed voice in order to be able to reduce the breathing rate in certain communication situations (Aare et al., Reference Aare, Lippus, Włodarczak and Heldner2018). Furthermore, the breathing cycle interplays with syntactic structures (Rochet-Capellan & Fuchs, Reference Rochet-Capellan and Fuchs2013), to the point that backchannels or similarly brief isolated utterances tend to be timed such that they are produced at approximately 70% of an exhalation phase (Aare et al., Reference Aare, Włodarczak and Heldner2014).
With regard to the communicative functions of speech breathing, it has been shown that emotional and expressive differences are related less to the durations of inhalation and exhalation than to their amplitudes (Barbosa et al., Reference Barbosa, Madureira, Fontes and Menegon2020). In technical applications, breathing analyses can be used for speaker identification (or sex identification) and for diagnosing mental and respiratory diseases (Sharma et al., Reference Sharma, Krishnan, Kumar, Ramoji, Chetupalli, Ghosh and Ganapathy2020). In contexts of prosodic phonology, audible breathing has turn-taking and turn-yielding functions in dialogue situations (Rochet-Capellan & Fuchs, Reference Rochet-Capellan and Fuchs2014; Włodarczak & Heldner, Reference Włodarczak and Heldner2016). And long breath pauses help listeners understand the content of the subsequent utterance and remember it better or longer (Elmers et al., Reference Elmers, Werner, Muhlack, Möbius and Trouvain2021). After discomforting questions like “Will you help me move at the weekend?,” the addressee can use a long, audible inhalation to buy additional thinking time without making the answer sound insincere (Kohtz & Niebuhr, Reference Kohtz and Niebuhr2017). Note that, within these general patterns, the functions of nonverbal sounds like breath intakes can vary across languages and discourse genres. As for the latter, Mettouchi (Reference Mettouchi2018) compared recounts and folktales in Kabyle and found genre-specific distributions and phonetic patterns of breath intakes. Similarly, Winter and Grawunder (Reference Winter and Grawunder2012, p. 808) reported for Korean that “formality also affected breathing patterns, leading to a noticeable increase in the amount of loud ‘hissing’ breath intakes in formal speech.” With regard to cross-language differences, Oh (Reference Oh2012, p. 3516) stresses for a contrastive analysis of Chinese and Korean poem performances “that different linguistic structures operate powerfully upon preplanning and respiratory patterns [and] furthermore […] show the correlation of voice-quality variation and respiratory rhythm in different languages.”
Practically all current research on speech breathing deals primarily with the question of why and when speakers inhale and exhale (in relation to linguistic elements and structures), or with how often, for how long, how much, and how audibly speakers breathe. Despite some work on the coordination between the chest and abdominal muscles during speech (e.g., Hixon, Reference Hixon1987; Hoit et al., Reference Hoit, Jenks, Watson and Cleveland1996), details about the interrelation and/or dominance of chest and abdominal breathing and the associated phonetic effects have so far remained largely unexplored, especially in connection with speaking styles. However, it is precisely this question that plays a central role and fills entire chapters or sessions in rhetorical guidebooks, public-speaking workshops, and media-training exercises.
“Breathing is the foundation of a good delivery” (Atkinson, Reference Atkinson2004, p. 360). While this statement is undoubtedly true, most rhetorical guidebooks and trainers emphasize the essential importance of abdominal breathing (often referred to as ‘belly breathing’ in popular-science contexts). For example, Fox Cabane (Reference Cabane2012, p. 192) reminds her readers: “make sure you’re breathing deeply into your belly.” Similarly, Carnegie and Esenwein (Reference Carnegie and Esenwein2011, p. 223) claim that “deep breathing – breathing from the diaphragm – give[s] the voice a better support [and] a stronger resonance,” both of which they consider key features of the art of persuasive public speaking. Likewise, Barker (Reference Barker2011, p. 132f.) draws a direct connection between abdominal breathing and persuasive, charismatic speech by stating that “the deepest kind of breathing, which works from the stomach rather than the upper part of the lungs […] works wonders for the voice: it gives it depth and power, and makes for a more convincing delivery.”
There is, in fact, empirical evidence that abdominal breathing is useful to treat voice and breathing disorders (Xu et al., Reference Xu, Ikeda and Komiyama1991); and it can enhance the performance of singers as well (Salomoni et al., Reference Salomoni, van den Hoorn and Hodges2016; Thorpe et al., Reference Thorpe, Cala, Chapman and Davis2001). Many rhetorical guidebooks and trainers explicitly draw a comparison to singing. But is this comparison appropriate? (1) When singing, one can only breathe about every 5–12 seconds (sometimes even less frequently, Bernardi et al., Reference Bernardi, Snow, Peretz, Orozco Perez, Sabet-Kassouf and Lehmann2017). When speaking, it is possible to breathe at least twice as often. (2) When singing, specific tonal intervals must be hit exactly. When speaking, the speaker’s (almost) only crucial task is to produce a phonologically sufficiently pronounced tonal contrast between a high-pitched element and an adjacent low-pitched element (or vice versa), as is the case, for example, in the pitch accents of many languages and in the language-specific phonologization of f0 declination (and its subsequent f0 reset), see Ladd (Reference Ladd, Haspelmath, König, Oesterreicher and Raible2001, Reference Ladd2008). (3) When singing, tonal events must be sustained exactly. When speaking, only local tonal targets need to be approached (Ladd, Reference Ladd2008). (4) Finally, singing, unlike speaking, means to phonate continuously with a relatively loud voice; the difference to speech is about 12 dB (Monson et al., Reference Monson, Lotto and Story2014). So, in view of (1)–(4), why should we assume that what is known to support singing – namely, abdominal breathing – would also support speaking? In other words: does it also make sense from a scientific point of view to use, train, and improve abdominal breathing for public-speaking purposes?
1.2 Summary of Our Previous Research on Breathing and Charisma
Although the last years of phonetic research revealed how multifaceted the phenomenon of (speech) breathing is and that it does more than merely creating a phonatory basis for speech communication, no evidence was presented to evaluate the rhetorical statements about the relevance of breathing – and abdominal breathing in particular – for a speaker’s performance on stage. Against this background, Barbosa et al. (Reference Barbosa, Niebuhr and Neitsch2019) and Barbosa and Niebuhr (Reference Barbosa, Niebuhr, Elmentaler and Niebuhr2020) have started a separate line of research dedicated to the scientific evaluation of rhetorical statements on speech breathing.
In the first step, this line of research aims to address three questions: (1) Do speakers who naturally use more abdominal breathing automatically have a more charismatic acoustic voice profile? That is, is there a natural correlation between charismatic acoustics and abdominal breathing? (2) Do speakers who switch from a matter-of-fact to a charismatic presentation mode automatically use more abdominal breathing in the latter mode? (3) Are speakers who switch from a matter-of-fact to a charismatic presentation mode perceived as the more charismatic the more they increase their abdominal-breathing proportion in the latter mode? The results for the signal-related questions (1) and (2) are already available and are described in detail below. The perception-related question (3) is the subject of the present chapter.
Barbosa et al. (Reference Barbosa, Niebuhr and Neitsch2019) had 18 formally trained public speakers (9m, 9f, more details on the nature of training are provided below) perform the same investor-pitch presentation. The pitch (pre-formulated on paper) had a persuasive style and contained both overt expressions of persuasion as cross-linguistically examined by Fuchs (Chapter 7, this volume) and instances of the Charismatic Leadership Tactics (CLTs) summarized in Antonakis et al. (Reference Antonakis, Bastardoz, Jacquart and Shamir2016).
During the performances, Barbosa et al. synchronously recorded the speakers’ speech and breathing signals, the latter with two separate sensor belts wrapped around the chest and abdomen. The corresponding measurement system is called RespTrack (Heldner et al., Reference Heldner, Włodarczak, Branderud and Stark2019), a device that also allows setting an individual recording gain for the chest and abdomen signals. Special breathing exercises ensure that, after putting on the belts, a constant relative gain of the two signals can be set before the recording starts. The speech signal was recorded via a headset microphone (see the photo in Figure 12.1). The age of the speakers (22–49 years) was chosen with reference to Huber and Stathopoulos (Reference Huberand, Stathopoulos and Reford2015) such that no age-related artifacts in breathing behavior were to be expected.

Figure 12.1 Example of investor-pitch elicitation within the sound-treated area of the CIE Acoustics Lab.
The 18 speakers performed the investor pitch in four conditions, each implemented as a separate session with about one week in between. The four sessions (i.e., conditions) represented two factors with two levels each: Posture, that is, either sitting or standing; and Style, namely a performance of the investor pitch in either an emotionally neutral matter-of-fact news-reading style or in an enthusiastic, charismatic investor-oriented style. Thus, the four investor-pitch recording sessions were: news-reading style performed sitting, news-reading style performed standing, charismatic style performed sitting, and charismatic style performed standing. Session order was randomized across speakers, and performances were not repeated within a session.
All speakers had previously taken part in formal public-speaker training, which was offered as part of an employee further-education programFootnote 1 and which did not include breathing exercises but focused on prosody and gestures – and in which the concept of ‘charismatic speech’ was explicitly explained and defined as a competent, self-confident, and passionate way of speaking. So, on the persuasion-to-coercion continuum spanned by Sorlin in Chapter 2 (this volume), charismatic speech is clearly more on the side of persuasion, but basically undefined with respect to any additional manipulative and/or seductive character. However, given that charismatic speakers essentially aim to “attract and retain followers” (Rosenberg & Hirschberg, Reference Rosenberg and Hirschberg2009, p. 640) and assuming that charisma tactics and signals alone suffice to win over people, using means of manipulation and/or seduction is probably to some degree incompatible with charismatic skills. This is all the more true when the speakers or actors are not humans but robots, whose charisma signals have similar effects on humans as in human-human interactions; see Langedijk and Fischer (this volume) and the references therein.
Barbosa and Niebuhr (Reference Barbosa, Niebuhr, Elmentaler and Niebuhr2020) tested whether the switch from a matter-of-fact to a charismatic presentation mode would be accompanied by an increased breathing amplitude and, if so, whether a disproportionate shift toward more abdominal breathing would occur and to what degree such a shift would be specific to posture. In addition, it was tested whether such a shift would only or primarily show up for those speakers who managed to significantly increase their acoustic performance in the charismatic presentation mode.
Breathing amplitude here refers to the magnitude of chest and/or abdomen expansion measured by the RespTrack device. After each session, both the chest and the abdomen signals were converted into Praat sound objects (i.e., waveforms) that preserved the exact contours of the original signals. Accordingly, the physical measurements of the signals are given in (arbitrary) dB units below, without losing their value for the analysis and interpretation of breathing patterns related to changes in chest and abdomen geometry.
Figure 12.2 shows the average difference in breathing amplitude between matter-of-fact and charismatic investor-pitch presentations for all 18 speakers, separately for the two posture conditions ‘sitting’ and ‘standing.’ The figure shows very clearly that we found the opposite of what would have been expected based on the statements of rhetorical guidebooks and trainers. On the one hand, it is consistent with these statements that switching from the matter-of-fact to the charismatic presentation mode means an overall increase in breathing amplitude; that is, the values in Figure 12.2 are mostly ≥ 1. On the other hand, the main breathing activity does not shift to the abdomen. Instead, it is chest breathing that benefits disproportionately from the overall increase in breathing amplitude. When sitting, the breathing amplitude at the chest increases on average by 2.1 dB, and by as much as 2.7 dB when standingFootnote 2. In comparison, the increase in abdominal breathing amplitude is only about half as large when standing (1.5 dB) and just over 0.5 dB when sitting. In fact, for the latter posture, a substantial part of the change in mean abdominal breathing amplitude is negative (Figure 12.2, section (a)). That is, when performing their charismatic investor pitch while sitting, many speakers – especially male ones – not only intensified their chest breathing but also reduced their abdominal breathing.

Figure 12.2 Violin plots of how speech-breathing changes (in dB) from matter-of-fact to charismatic presentation mode when speakers (N = 18) are sitting (left) and standing (right). Changes at chest (grey) and abdomen (white) are shown in separate plots. Dashed lines show 0 dB, i.e., no change in breathing due to presentation mode, dotted lines show the average change at the chest level.
Accordingly, Barbosa et al. (Reference Barbosa, Niebuhr and Neitsch2019) only found a significant increase in breathing amplitude from matter-of-fact to charismatic for chest breathing. They also point out that this significant increase is primarily due to the male speakers, who generally used significantly larger breathing amplitudes than the female speakers. Furthermore, as is obvious from Figure 12.2 sections (a)−(b), posture did not emerge as a significant factor for explaining breathing behavior in investor-pitch presentations.
Next, in a follow-up study, Barbosa and Niebuhr (Reference Barbosa, Niebuhr, Elmentaler and Niebuhr2020) acoustically analyzed the vocal performances of the 18 speakers. The focus was on parameters that independent studies had identified as significantly positively correlated with perceived speaker charisma: mean f0, f0 range, f0 variability (standard deviation), maximum f0 (all f0 parameters were gender-normalized on a semitone basis), spectral emphasis (a measure of spectral slope), and speaking rate (Biadsy et al., Reference Biadsy, Rosenberg and Carlson2008; Niebuhr et al., Reference Niebuhr, Skarnitzl and Tylečková2018; Niebuhr & Skarnitzl, Reference Niebuhr and Skarnitzl2019; Rosenberg & Hirschberg, Reference Rosenberg and Hirschberg2009; Strangert & Gustafson, Reference Strangert and Gustafson2008). Note that, unlike traditional spectral-slope measures, which correlate negatively with charisma (Niebuhr et al., Reference Niebuhr, Skarnitzl and Tylečková2018), spectral emphasis does not relate low-frequency spectral sections to high-frequency spectral sections but rather subtracts the SPL (sound pressure level) of the entire frequency spectrum from the f0’s SPL (Eriksson et al., Reference Eriksson, Barbosa and Åkesson2013). It is therefore positively correlated with charisma. Spectral emphasis is also a good estimate of the speech signal’s perceived loudness (Heldner, Reference Heldner2003).
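The semitone-based gender normalization mentioned above can be sketched in a few lines of Python. The function name and the choice of a speaker-specific reference frequency are our illustrative assumptions, not the study’s actual implementation:

```python
import math

def hz_to_semitones(f0_hz, ref_hz):
    """Convert an f0 value in Hz to semitones relative to a reference.

    Expressing f0 relative to a speaker-specific reference (e.g., that
    speaker's own mean f0) removes sex-specific register differences:
    the same musical interval yields the same semitone value for male
    and female voices.
    """
    return 12.0 * math.log2(f0_hz / ref_hz)

# An octave jump corresponds to 12 st regardless of absolute register:
male_range = hz_to_semitones(200.0, 100.0)    # 100 -> 200 Hz
female_range = hz_to_semitones(400.0, 200.0)  # 200 -> 400 Hz
```

Because the semitone scale is logarithmic, a doubling of f0 always maps to 12 st, which is what makes pitch ranges comparable across speakers.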
The results of Barbosa and Niebuhr (Reference Barbosa, Niebuhr, Elmentaler and Niebuhr2020) show a considerable increase in all parameters from matter-of-fact to charismatic presentation performance both within and across the 18 speakers. Regarding the relative magnitude of these increases, the largest one concerns spectral emphasis, which almost doubles from matter-of-fact to charismatic performances. The speaking rate shows an increase of about 10–15 percent. F0 parameters increase by 4–8 semitones, independently of the specific measure. Figure 12.3 gives an overview of selected findings. Note that some measurement distributions appear somewhat bimodal. This is due to integrating posture and gender differences. These variables are therefore separately represented in the statistical models of Barbosa and Niebuhr (Reference Barbosa, Niebuhr, Elmentaler and Niebuhr2020).

Figure 12.3 Illustration of some of the increases in charisma-inducing acoustic-prosodic parameters found for the switch from matter-of-fact to charismatic presentation mode in the study of Barbosa and Niebuhr (Reference Barbosa, Niebuhr, Elmentaler and Niebuhr2020). Each violin plot represents 18 speakers.
While all speakers, in line with their extensive experience and training in public speaking, managed to implement the instruction and performed significantly better in the charismatic condition, Barbosa and Niebuhr (Reference Barbosa, Niebuhr, Elmentaler and Niebuhr2020) did not find a single significant link between increased acoustic parameters and changes in speech breathing. Based on Pearson product-moment correlation coefficients (PMCCs), the significance threshold for a sample of N = 18 is r[16] = 0.468. The PMCCs reported by Barbosa and Niebuhr (Reference Barbosa, Niebuhr, Elmentaler and Niebuhr2020) all remain clearly below this threshold. The only noticeable observation is that all PMCCs are positive. The strongest correlations concern mean f0 (chest: r = 0.27; abdomen: r = 0.31), f0 variability (chest: r = 0.15; abdomen: r = 0.26), and spectral emphasis (chest: r = 0.36; abdomen: r = 0.28). All other correlations remain at r < 0.1. That is, Barbosa and Niebuhr (Reference Barbosa, Niebuhr, Elmentaler and Niebuhr2020) could not find any empirical evidence for a natural connection between a more charismatic acoustic profile on the one hand and stronger abdominal (or chest) breathing on the other.
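The cited significance threshold of r[16] = 0.468 follows from the standard relation between Pearson’s r and the t-distribution. A minimal check, assuming a two-tailed α = 0.05 (the critical t for df = 16 is hard-coded from a t-table; `scipy.stats.t.ppf(0.975, 16)` would give the same value):

```python
import math

# Two-tailed critical t for alpha = 0.05 with df = N - 2 = 16
t_crit = 2.1199
df = 16

# Critical Pearson r: the smallest |r| that reaches significance
# for this sample size, via r = t / sqrt(t^2 + df)
r_crit = t_crit / math.sqrt(t_crit**2 + df)
print(round(r_crit, 3))  # 0.468, the threshold cited for N = 18
```

Any PMCC below this value (as all of those reported above are) fails to reach significance at the 5 per cent level for N = 18.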
2 Present Study: Breathing and Perceived Speaker Charisma
According to Barbosa et al. (Reference Barbosa, Niebuhr and Neitsch2019) and Barbosa and Niebuhr (Reference Barbosa, Niebuhr, Elmentaler and Niebuhr2020), the answers to questions (1) and (2) are negative. Compared to a matter-of-fact (news-reading-like) presentation style, a more charismatic presentation style does not result in a disproportionate increase in abdominal breathing relative to chest breathing. This holds true for female speakers and even more so for male speakers, who showed a statistically significant change in the opposite direction. Moreover, although speakers did not shift their breathing toward the abdominal component in the charismatic mode, their acoustic prosodies did shift significantly and considerably toward stronger charisma cues, such as higher f0 maxima, means, and standard deviations, along with higher values for spectral emphasis, an acoustic correlate of vocal effort (Traunmüller & Eriksson, Reference Traunmüller and Eriksson2000) and loudness (Heldner, Reference Heldner2003). This empirical picture is taken up in the present study. Charisma is not an acoustic phenomenon; it is a perceptual phenomenon. It is true that perceived speaker charisma can now be estimated very precisely from the prosodic acoustics (Amari et al., Reference Amari, Okada, Matsumoto, Sadamitsu, Nakamoto and Meiselwitz2021; Chen et al., Reference Chen, Feng, Joe, Leong, Kitchen and Lee2014; Niebuhr, Reference Niebuhr2021; Wörtwein et al., Reference Wörtwein, Chollet, Schauerte, Morency, Stiefelhagen and Scherer2015), especially when the wording is identical and the perception is limited to an audio signal. Nevertheless, the possibility remains that, due to unknown influencing factors, there is a positive connection between abdominal breathing and charisma that could not be discovered by the acoustic analyses of Barbosa and Niebuhr (Reference Barbosa, Niebuhr, Elmentaler and Niebuhr2020).
Therefore, in the present study, a cross-check is carried out in which acoustic measures are replaced by the speaker-charisma ratings of human listeners.
2.1 Method
2.1.1 Listeners
The listener sample consisted of 21 participants, 12 men and 9 women, between 29 and 45 years old. All listeners were fluent non-native speakers of English (like the speakers) and experienced in evaluating investor-pitch performances, for example, as lecturers, founders, or business angels. However, all were naïve with respect to the experiment insofar as they had neither phonetic nor rhetorical training, nor did they know anything about the experiment’s background, motivation, and independent variables. The listeners’ age range was deliberately chosen in view of Jokisch et al. (Reference Jokisch, Iaroshenko, Maruschke and Ding2018), whose findings suggest that 30–55-year-old listeners rate speaker charisma more similarly than significantly younger or older listeners. Listeners were not paid, but as a reward for their participation, they received a free, individual information session (from the first author) in which they were sensitized to the acoustic mechanism of perceived speaker charisma and its possible influence on investment decisions.
2.1.2 Stimuli
The investor-pitch performances of the 18 speakers (9m, 9f), whose breathing and prosody patterns Barbosa and Niebuhr (Reference Barbosa, Niebuhr, Elmentaler and Niebuhr2020) had analyzed (see 1.2 above), served as stimuli in the perception experiment. Thus, the stimulus language was (non-native) English. Previous studies (e.g., Biadsy et al., Reference Biadsy, Rosenberg and Carlson2008) showed that listeners can make systematic ratings of perceived speaker charisma in a language that is not their mother tongue (all listeners were fluent, non-native speakers of English who use English to a large extent in their professional daily life, see 2.1.1), and that these ratings can fit in with native-speaker perceptions when the prosodic phonologies are close (Niebuhr & Silber-Varod, Reference Niebuhr and Silber-Varod2021), as was the case here. Nevertheless, it must be assumed that relating non-native speakers’ stimuli to non-native listeners’ ratings always creates a certain amount of statistical noise due to language-specific prosodic details and stereotypical attributes associated with foreign accents (Planchenault & Poljak, Reference Planchenault and Poljak2021). However, such stereotype attributions also apply to regional and social varieties within a language and are thus probably, in most perception experiments on speaker traits, an inevitable source of variability.
As the 18 speakers’ investor pitches were all over a minute long, only a two-sentence excerpt was used: “I’ve come up with an easy way for both employees and employers to log and keep track of hours using just their cell phones and an app I have designed. It has reduced timecard inconsistencies and paycheck errors by 90%, saving both your time and money.” With regard to the known correlations between prosodic parameters and perceived speaker charisma, these two sentences represented the most charismatic core statements of the investor pitch; at the same time, they were the part of the investor pitch in which the speakers’ acoustic charisma cues differed the most. The particularly expressive nature of the sentences and their length also led to high breathing amplitudes, again with pronounced inter-speaker differences of up to 17 dB. In short, the two sentences were chosen as stimuli because they phenomenologically provided the richest (most diverse) and thus most sensitive basis for uncovering correlations between breathing and perceived charisma. With a duration of 17–25 seconds, they were also short enough to prevent artifacts due to fatigue in the course of the experiment. Figure 12.4 shows two stimulus examples, from the male speakers MON and MTO, who make very little or relatively strong use of abdominal breathing, respectively. The pronounced ups and downs in the intonation curves of both speakers underline the generally expressive character of the stimulus material.

Figure 12.4 Stimulus examples of male speakers MTO (left, 22.6 s) and MON (right, 23.6 s). Displayed are from top to bottom: abdominal breathing signal (thin line), chest-breathing signal (thick line), waveform, spectrogram, and time axis.
Recall that in the production experiment, each speaker presented the investor pitch four times, due to the cross-combination of the conditions Posture (sitting vs. standing) and Style (matter-of-fact vs. charismatic). All four variants per speaker were integrated in the perception experiment. This further increased the phenomenological range of the material in terms of both breathing and acoustics. Thus, the perception experiment included a total of 18 × 4 = 72 stimuli.
2.1.3 Procedure
The experiment was conducted in a series of six separate sessions in the CIE Acoustics Lab at the University of Southern Denmark. Per session, 3–5 participants took part in the experiment in parallel at separate PC workstations. As can be seen in Figure 12.5, the individual workstations were acoustically and visually isolated from one another. The participants also wore headphones (Bose Quiet Comfort II); first, to enhance the acoustic insulation and, secondly, to create the best possible acoustic rating conditions.

Figure 12.5 Experimental setup in which the 21 listeners rated perceived speaker charisma conveyed by the 72 stimuli.
At the beginning of each session, some metadata were collected from the participants (age, gender, language, and musical skills, etc.), and a printed informed-consent form was signed. Then, the start window of the experiment was shown on the screen in front of each participant. The start screen provided the participants with general background information about the experiment, in other words, the number, average length, and languageFootnote 3 of the stimuli, the type of speech from which the stimuli were excerpted, and so on. Charisma was defined as well, namely as a speaker trait that is composed of perceived competence, self-confidence, and passion. The concept of an investor pitch was not explained, as all participants were familiar with it.
When scrolling to the next screen page, participants were instructed that they were to rate each stimulus according to two questions: (1) How resonant does the speaker’s voice sound? (2) How charismatic does the speaker’s presentation sound? It was explained that ‘resonant’ is meant in the sense of “rich, sonorous, harmonious, and strong” and not in the sense of pressed, tense, thin, breathy, and soft. Charisma was to be rated according to its definition, that is, with respect to “the three pillars competence, self-confidence and passion.” In addition, the terms inspiring and rousing were used to contextualize the concept of charisma.
Finally, the participants were instructed to wait until the end of each stimulus and then to make their ratings spontaneously along two provided scales. The scales would offer six rating levels (1–6), according to the grading system in German schools. Thus, 1 would be the best and 6 the worst grade (not all participants were native German speakers, but all were familiar with the German school grading system). Ratings were to be given by ticking the respective box on the scale.
The experiment was conducted using Lime Survey. Each stimulus appeared only once in the experiment. The 72 stimuli were presented in inter-individually randomized order. The entire experiment took about 30 minutes, including instruction and metadata collection.
2.1.4 Variables and Statistics
Our study was designed to examine a common rhetorical practice. Its underlying assumption is that speakers can shift toward more pronounced abdominal breathing, and that this shift (e.g., brought about by training) entails stronger perceived speaker charisma. Our focus was thus on within-speaker differences. We were not interested in whether someone who uses more abdominal breathing per se sounds more charismatic – but in whether someone whose relative abdominal breathing increases from condition A to condition B is also rated as sounding more charismatic in B than in A. Here, A and B were the two styles in which the investor pitch had been performed, in other words, the emotionally neutral matter-of-fact news-reading style and the enthusiastic, charismatic investor-oriented style.
Accordingly, we used relative measurements as variables in our results figures and statistics. That is, we represented the changes between charismatic and matter-of-fact presentation styles as difference values, as in Figure 12.2. For example, a value of +2 dB in Figure 12.2 indicated that the breathing amplitude in the charismatic speech was 2 dB higher than in the matter-of-fact speech, no matter whether these 2 dB resulted from a low or a high absolute amplitude level.
Another advantage of these difference-based within-speaker measurements was that individual absolute baseline levels could not enter the analysis as statistical noise.
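The difference-based logic and its baseline invariance can be illustrated with a minimal sketch; the function and variable names, as well as the example amplitudes, are ours and purely illustrative:

```python
def style_difference(charismatic_db, matter_of_fact_db):
    """Within-speaker change in breathing amplitude (in dB) from the
    matter-of-fact to the charismatic presentation style."""
    return charismatic_db - matter_of_fact_db

# Two hypothetical speakers with very different absolute baselines
# but the same within-speaker change:
low_baseline = style_difference(5.0, 3.0)     # 3 dB -> 5 dB
high_baseline = style_difference(12.0, 10.0)  # 10 dB -> 12 dB
assert low_baseline == high_baseline == 2.0   # the baseline cancels out
```

Because subtraction removes each speaker’s absolute level, only the within-speaker change enters the subsequent statistics.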
The primary independent variables in our experiment were Breathing and Posture. Breathing concerned the relative amplitude differences (charismatic value minus matter-of-fact value) of either abdominal or chest breathing (2 levels). Posture concerned these relative amplitude differences for either sitting or standing speakers (2 levels). As supplementary independent variables, and in view of the close connection between questions (2) and (3) (see Section 1.1), we integrated acoustic measures into the statistical analysis, also in the form of difference values.
The two dependent variables were the listener ratings of the degree of resonance in the speakers’ voices and the degree of charisma in the speakers’ performances. Note that we asked for the additional resonant-voice ratings with a view to singing. As outlined in Section 1.1, there is empirical evidence that, for singing, abdominal breathing can make voices sound more resonant or pleasant (Salomoni et al., Reference Salomoni, van den Hoorn and Hodges2016; Thorpe et al., Reference Thorpe, Cala, Chapman and Davis2001). It is therefore possible that increased abdominal breathing – in terms of our listeners’ ratings – is linked to the investor pitches being performed with more resonant voices, but not to the speakers sounding more charismatic.
2.2 Results
2.2.1 Posture and Speaker
The first step in analyzing the data was to conduct a series of linear mixed-effects models with the independent variable Posture as a fixed factor and all rating and acoustic measures as dependent variables. Speakers (N = 18) were included as random effects. The significance threshold (i.e., the alpha-error level) was set to p < 0.05. The models aimed to find out, in preparation for the subsequent correlation series, whether Posture significantly affected listener ratings and acoustic measurements (note that this is not a mere replication of Barbosa & Niebuhr, Reference Barbosa, Niebuhr, Elmentaler and Niebuhr2020, as we used difference values here).
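A model of this kind can be sketched with the statsmodels library. The data below are synthetic, and all names, seeds, and effect sizes are illustrative assumptions rather than the study’s actual analysis script:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

# Hypothetical data: 18 speakers x 2 postures, one difference value each (N = 36)
speakers = np.repeat([f"S{i:02d}" for i in range(18)], 2)
posture = np.tile(["sitting", "standing"], 18)
speaker_offset = np.repeat(rng.normal(0.0, 0.2, 18), 2)  # random intercept per speaker
rating_diff = (0.5 + 0.3 * (posture == "standing")
               + speaker_offset + rng.normal(0.0, 0.5, 36))

df = pd.DataFrame({"speaker": speakers, "posture": posture,
                   "rating_diff": rating_diff})

# Fixed factor Posture, speakers as random effects (random intercepts)
model = smf.mixedlm("rating_diff ~ posture", df, groups=df["speaker"])
result = model.fit()
print(result.summary())
```

The summary reports the fixed-effect estimate for Posture alongside the group (speaker) variance, which is the quantity discussed below as the random factor’s share of residual variation.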
The results show, on average, slightly higher charisma and resonant-voice ratings for standing as compared to sitting speakers, as well as an increase in the mean f0 range and spectral emphasis if speakers stood rather than sat. However, none of these differences reached statistical significance. Only the difference in mean f0 range came closest to the significance threshold, with F[1,34] = 2.1, p = 0.15, d = 0.49. This outcome is consistent with Barbosa et al. (Reference Barbosa, Niebuhr and Neitsch2019) – as well as with Flory and Nolan (Reference Flory and Nolan2015, p. 4), who concluded that “speakers have been shown to become familiar with perturbations to their vocal tract, and to be able to develop compensatory strategies.” This applies in particular to normal (i.e., highly familiar) postures like sitting and standing.
The effect of the random factor Speaker was negligible for most parameters. Especially for the f0-related parameters like range and maximum, Speaker was responsible for only 2–5% of the models’ residual variation, that is, the variation not explained by the fixed factor Posture. This is very likely due to our use of experienced speakers, who all implemented the matter-of-fact and charismatic investor-pitch presentation styles similarly (well). In addition, we assume that our semitone normalization was effective in removing sex-specific differences from the measurements. It is probably for the latter reason, that is, sex-specific differences, that the random factor Speaker was more relevant for the mean spectral-emphasis measure, for which it contributed 21.7% to the respective model’s residual variation. For the two rating dimensions, the random factor Speaker also played only a minor role; yet this minor role was considerably larger for the charisma than for the resonant-voice ratings, that is, 13.3% as opposed to 1.7% of the models’ residual variation. This difference very likely reflects that rating speakers’ charisma is complex and always generates considerable inter-rater differences; compare Jokisch et al. (Reference Jokisch, Iaroshenko, Maruschke and Ding2018) and Niebuhr et al. (Reference Niebuhr, Skarnitzl and Tylečková2018). So, the variation in charisma ratings caused by our speaker sample does not undermine the validity of our data. Rather, the opposite is true: it is indicative of representative behavior of both speakers and listeners.
In summary, the results of the linear mixed-effects model analyses show that inter-speaker differences in implementing the matter-of-fact and charismatic investor-pitch presentations were negligible and that, at least for the trained and experienced speakers we used, Posture had no significant effects on the prosody or rating data. Thus, for the actual key statistics below, that is, a series of PMCC correlation analyses, we pooled the data of both Posture conditions and focused fully on the Breathing variable.
2.2.2 Breathing
Figure 12.6 sections (a)–(d) shows the correlations that emerged between the ratings of perceived speaker charisma on the one hand and the individual acoustic measures on the other. Recall in interpreting the figures that our analyses were based on difference values. Accordingly, the ‘charisma rating’ values along the x-axis indicate how much higher (or lower) a speaker’s perceived charisma was rated (on a 1–6 scale) in the charismatic presentation condition as opposed to the matter-of-fact presentation condition (cf. 2.1.4). So, the two Style conditions are represented by a single value, with positive values indicating an increase and negative values a decrease in perceived charisma from the matter-of-fact to the charismatic performance. The y-axis shows how the respective acoustic parameter changed from the one condition to the other. For instance, an f0-range value of +6 means that the given speaker had used a 6 st larger pitch range in the charismatic than in the matter-of-fact presentation condition (across the two Posture conditions that we pooled in these correlation analyses, cf. 2.2.1).

Figure 12.6 Significant correlations (based on PMCCs) between changes in charisma ratings (charismatic presentation values minus matter-of-fact values) and changes in acoustic parameters. Each speaker contributes one data point per Posture condition, i.e., N = 36.
Based on that, there were significant positive correlations between an increase in perceived speaker charisma and increases in the speakers’ f0 variability, range, and maximum, as well as in spectral emphasis (which corresponds to a negative correlation, that is, a decrease, for most other spectral measures, see 1.2 above). The PMCCs associated with these correlations are (a) r[34] = 0.66, p < 0.001, (b) r[34] = 0.46, p < 0.01, (c) r[34] = 0.54, p < 0.001, (d) r[34] = 0.68, p < 0.001. Thus, f0 variability and spectral emphasis were most strongly correlated with perceived speaker charisma; f0 range and maximum correlated more weakly with it. No significant correlations emerged for mean f0 level and speaking rate (with r < 0.3). Comparing the magnitudes of the significant correlations in z tests showed that the link between the charisma ratings and the effort/loudness measure of spectral emphasis was stronger than the links between the charisma ratings and the f0-related measures, especially f0 range, for which the difference in strength approached significance (z = 1.34, p = 0.089).
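Such z tests for comparing two correlation magnitudes can be reconstructed with Fisher’s r-to-z transformation. The sketch below assumes the standard formula for two correlations from samples of N = 36 each, which yields values close to those reported above:

```python
import math

def fisher_z_compare(r1, r2, n1, n2):
    """z test for the difference between two Pearson correlations,
    using Fisher's r-to-z transformation."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # SE of the z difference
    z = (z1 - z2) / se
    p_one_tailed = 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))
    return z, p_one_tailed

# Spectral emphasis (r = 0.68) vs. f0 range (r = 0.46), N = 36 each
z, p = fisher_z_compare(0.68, 0.46, 36, 36)
print(round(z, 2), round(p, 3))  # close to the reported z = 1.34, p = 0.089
```

The transformation makes correlation coefficients approximately normally distributed, so their difference can be evaluated against the standard normal distribution.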
Note that it is also interesting where the estimated correlation line crosses the y-axis. For all f0 parameters, this intersection is at a positive value. The value is about 1 st in the case of the parameters based on local measurements, that is, f0 range and maximum. For the distributional parameter f0 variability, it is 0.33 st. This means that, first, not every tiny parameter improvement triggers an increase in perceived speaker charisma. For example, as long as speakers do not expand their f0 range by at least 1 st or more from their matter-of-fact to their charismatic presentations, they will not be perceived as more charismatic by listeners. Second, as regards the magnitude of the required parameter improvement, the values that crystallize here match well with the pitch-related Just Noticeable Differences (JNDs) of human listeners, see Niebuhr et al. (Reference Niebuhr, Reetz, Barnes, Alan, Gussenhoven and Chen2020) for an overview. Furthermore, although spectral emphasis is particularly closely correlated with perceived speaker charisma, it does not need to increase itself in order to raise people’s perceived charisma. Both are consistent with previous studies. The close, sex-independent correlation of spectral emphasis (and similar spectral-slope measures) with charisma has already been stressed by Niebuhr and Skarnitzl (Reference Niebuhr and Skarnitzl2021); and that spectral emphasis itself does not have to increase is probably related to the interaction of such effort/loudness measures and f0 measures (cf. Signorello et al., Reference Signorello, Demolin, Bernardoni, Gerratt, Zhang and Kreiman2020). That is, increases in f0 parameters can already make speakers sound more charismatic before an increase in spectral emphasis comes in.
Unlike the charisma ratings, the resonant-voice ratings were less strongly guided by the measured prosodic parameters. A correlation analysis parallel to the one in Figure 12.6 sections (a)–(d) yielded only a single significant outcome: f0 variability was positively correlated with the resonant-voice ratings. That is, the more variable a speaker’s intonation, the more resonant-sounding his/her voice was rated (r[34] = 0.38, p < 0.05). That lower mean f0 levels and higher spectral-emphasis levels were linked to more resonant-sounding voices emerged in the data only as weak trends (r[34] = −0.27, p = 0.1; r[34] = 0.28, p < 0.1).
Sections (a)–(d) in Figure 12.7 summarize the results for the Breathing variable. The difference in breathing amplitude (in dB) between the matter-of-fact and charismatic performances (y-axis) is plotted against the rating difference from matter-of-fact to charismatic (x-axis). In sections (a)–(b) of Figure 12.7, the rating differences refer to perceived speaker charisma, whereas sections (c)–(d) of Figure 12.7 show the differences in resonant-voice ratings. As can be seen, an increase in breathing amplitude is overall beneficial both for how charismatic speakers are perceived and for how resonant their voices sound. The resonant-voice ratings are, however, less strongly related to breathing amplitude than the charisma ratings, with PMCCs of r[34] = 0.62 (p < 0.001) and r[34] = 0.57 (p < 0.001) at the abdomen and chest, respectively. For the charisma ratings, we found a correlation of r[34] = 0.73 (p < 0.001) between increases in chest breathing and increases in perceived speaker charisma from the matter-of-fact to the charismatic presentation performance. But there is one notable exception: abdominal breathing was correlated neither positively nor negatively with perceived speaker charisma. The corresponding PMCC of r[34] = 0.17 is far from even being a significant trend.

Figure 12.7 Significant correlations (based on PMCCs) between changes (charismatic presentation values minus matter-of-fact values) in charisma ratings (a/b, top) and resonant-voice ratings (c/d, bottom) and changes in abdominal (a/c) and chest (b/d) breathing amplitudes. Each data point represents a speaker’s overall performance in both Posture conditions, i.e., N = 36.
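The PMCCs reported in this section are ordinary Pearson product-moment correlations computed over per-speaker difference scores. As a minimal, self-contained sketch of that computation (the difference scores below are invented for illustration and are not the study’s data):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient (PMCC)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy difference scores, one pair per speaker (values invented):
# change in chest-breathing amplitude (dB) vs. change in charisma rating.
breath_diff = [0.5, 1.2, 0.8, 2.0, 1.5, 0.2]
rating_diff = [0.1, 0.9, 0.5, 1.4, 1.1, 0.0]
r = pearson_r(breath_diff, rating_diff)  # close to +1 for these toy data
```

A positive r close to 1 would mean, as in the chest-breathing result above, that speakers who increased their breathing amplitude more also gained more in perceived charisma.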
In general, rating differences from the matter-of-fact to the charismatic performance were mainly positive, in other words, most datapoints in Figures 12.6 and 12.7 were on the right side of the y-axis. This means that speakers were rated better in the charismatic than in the matter-of-fact condition, namely as sounding more charismatic and as having a more resonant voice. Moreover, as was shown in Figure 12.2 already, switching to the charismatic performance made almost all speakers increase their abdominal and chest-breathing amplitudes.
We also pointed out in Section 1.2 that Barbosa and Niebuhr (Reference Barbosa, Niebuhr, Elmentaler and Niebuhr2020) observed the overall increase in chest-breathing amplitude from matter-of-fact to charismatic presentations mainly for male speakers. Taking up that observation here, we conducted a post-hoc t-test to investigate whether this sex-specific behavior also manifested itself as a charismatic ‘gender gap’ in our perception data. Indeed, we found a marginally significant difference in perceived speaker charisma in favor of the male speakers. Their increase in charisma from the matter-of-fact to the charismatic presentation performance was about 0.3 scale points larger than for the female speakers (t[34] = 1.597, p = 0.059, Cohen’s d = 0.588). This is also interesting because it raises the further question of whether the often-reported ‘gender gap’ in charisma (Gutnyk et al., Reference Gutnyk, Niebuhr and Gu2021; Jokisch et al., Reference Jokisch, Iaroshenko, Maruschke and Ding2018) is a social phenomenon or – at least in part – a ‘self-induced’ problem due to sex-specific breathing habits, in which women (perhaps in order to comply with voice-reflected social roles; Aung & Puts, Reference Aung and Puts2020) rely more strongly on abdominal breathing even under circumstances that require a charismatic performance. We will flesh out this idea further in the discussion below.
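The reported effect size (Cohen’s d = 0.588) follows the standard pooled-standard-deviation definition for two independent samples. A small sketch of that computation; the gain scores below are invented for illustration, not our actual rating data:

```python
import math

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    ma = sum(a) / na
    mb = sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled_sd

# Invented charisma-gain scores (charismatic minus matter-of-fact rating).
male_gain   = [1.1, 0.8, 1.4, 0.9, 1.2, 1.0]
female_gain = [0.7, 0.6, 1.0, 0.5, 0.9, 0.8]
d = cohens_d(male_gain, female_gain)  # positive: larger gain for the male group
```

A positive d thus indicates a larger charisma gain for the first group, mirroring the direction of the 'gender gap' discussed above.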
2.3 Discussion
Research over the last decade has gained many new insights into what the frequency and amplitude of breathing and the relative durations of inhalation and exhalation look like in different contexts of speech communication, thanks also to new instruments like the RespTrack system (Heldner et al., Reference Heldner, Włodarczak, Branderud and Stark2019). However, the question of which breathing type – chest or abdominal breathing – dominates under which conditions has so far played hardly any role. Yet this is exactly a core matter of rhetoric and public-speaker or media training. Based on the concept of perceived speaker charisma (cf. 1.1), Barbosa et al. (Reference Barbosa, Niebuhr and Neitsch2019) and Barbosa and Niebuhr (Reference Barbosa, Niebuhr, Elmentaler and Niebuhr2020) took on this applied question and found clear evidence in both the breathing signals and the acoustic signals that, indeed, “breathing is the foundation of a good delivery,” as claimed, amongst others, by Atkinson (Reference Atkinson2004, p. 360). A higher breathing amplitude combined with increased breathing dynamics (e.g., shorter exhalation phases) characterizes more charismatic speech – and is, moreover, associated with stronger acoustic-prosodic charisma cues such as higher levels of f0 mean, range, and variability, higher spectral-emphasis values, and so on. However, neither on the level of breathing nor on the level of acoustics could a single piece of evidence be found for the core claim that has shaped public-speaking seminars and rhetorical guidebooks ever since: that abdominal (‘belly’) breathing is an essential requirement for a more charismatic vocal performance.
The present study aimed to look for such evidence at the level of perception. In addition, because there is also consistent evidence that abdominal breathing improves singers’ performances in terms of louder, richer, and more controlled voices, the listeners in our experiment rated not only how charismatic the speakers sounded, but also how resonant their voices were. Professional singers, unlike untrained singers, adapt their breathing behavior while singing such that abdominal breathing dominates (see Salomoni et al., Reference Salomoni, van den Hoorn and Hodges2016). That study further reported that “these adaptations have been associated with changes in sound power spectrum and may have implications for voice quality, as commonly advocated by classical singing teachers and professionals” (Salomoni et al., Reference Salomoni, van den Hoorn and Hodges2016, p. 16). Over and above its main aim, the present perception experiment is the first to provide supporting evidence for these voice-quality-related assumptions of Salomoni et al. (Reference Salomoni, van den Hoorn and Hodges2016) – but outside the area of sung speech.
That is, consistent with Salomoni et al. (Reference Salomoni, van den Hoorn and Hodges2016), we found that listeners’ ratings of how resonant the speakers’ voices sounded were positively correlated with the speakers’ breathing amplitudes for both chest and abdomen, more strongly so for the abdomen than for the chest. Thus, speakers who breathe in more and, in doing so, make particularly extensive use of abdominal breathing are perceived to have a more resonant voice.
Crucially, there was no similarly positive evidence in connection with perceived speaker charisma. While we do find that more chest breathing is associated with higher degrees of perceived speaker charisma, this does not apply to abdominal breathing – that is, precisely the type of breathing that is so strongly recommended in books and training courses. On the other hand, this asymmetry makes our perception findings agree with those of production: making speakers switch from a matter-of-fact to a charismatic presentation mode, all else being equal, triggered a disproportionate increase in chest breathing; and it was this significant shift in breathing behavior that was associated with the more charismatic acoustic voice profiles (cf. Section 1.2), especially for the male speakers. In a nutshell, our present findings can, in combination with previous evidence, be summarized in the conclusion: “Abdominal breathing may lend speakers a more resonant voice, but it is chest breathing that makes speakers sound more charismatic.” It seems that rhetoric’s extrapolation that what is good for singing must also be good for speaking was premature – or outdated, for example, because today, thanks to modern speech-enhancement and microphone technology, one no longer has to speak as loudly as in the days of Cicero and Plato.
From a practical point of view, this would mean that public-speaking seminars and guidebook chapters that dedicate entire sessions or pages to training ‘belly’ breathing probably overestimate its contribution to a speaker’s performance. Intensively training ‘belly’ breathing could even do indirect harm in that it takes away learners’ time and mental capacities from more effective exercises, such as those that help speakers increase their f0 variability or vocal effort/loudness. The latter (i.e., spectral emphasis) was not only most strongly correlated with the charisma ratings; it also represents a feature that, like gestures and tempo, can be trained effectively – see, for example, the method suggested by Frese et al. (Reference Frese, Beimel and Schoenborn2003) and the more recent VR-based approach of Selck et al. (Reference Selck, Albert and Niebuhr2022).
Finally, two points of our study need to be taken up again here. The first concerns the factor Posture. We omitted references to Posture in our main results section on breathing (Section 2.2.2) because the difference between sitting and standing turned out to have no significant effect either on the acoustic-prosodic charisma cues or on the listeners’ ratings of the speakers’ performances. What seems surprising at first glance is actually consistent with major studies. While there is occasional and often anecdotal evidence that a standing posture in speeches enhances the impact speakers have on their audiences, experimental studies like that of Flory and Nolan (Reference Flory and Nolan2015) mainly show the opposite and stress that speakers can physiologically compensate for changes in posture very quickly. In addition, our study used trained, experienced speakers who also made pronounced gestures when presenting while sitting. Gestures are closely related to charisma-relevant prosodic features. For example, Cravotta et al. (Reference Cravotta, Busà and Prieto2019) showed that encouraging speakers to gesture more led to higher pitch and intensity peaks (related to sentence ends or stressed words) and to a generally higher speaking volume; compare also Pouw et al. (Reference Pouw, de Jonge‐Hoekstra, Harrison, Paxton and Dixon2021) on how gesture and prosody are linked at a basic physiological level. So, the factor Posture might have played a bigger role in our study if we had used more naïve speakers who, among other things, had used no or fewer gestures when giving their investor-pitch presentations sitting rather than standing.
The second point concerns our female speakers’ breathing. Some studies found that, compared to male speakers, female speakers use a higher proportion of chest than abdominal breathing (Binazzi et al., Reference Binazzi, Lanini, Bianchi, Romagnoli, Nerini, Gigliotti, Duranti, Milic-Emili and Scano2006; Sharp et al., Reference Sharp, Goldberg, Druz and Danon1975). These results seem to be at odds with those of the present study. To a certain extent there is a contradiction, but one with a number of possible explanations. First, many previous studies deal with the relative proportion of chest and abdominal breathing, whereas our results relate to the absolute respiratory amplitudes, measured separately for chest and abdomen. Second, the analysis of Kaneko and Horie (Reference Kaneko and Horie2012) suggests that women’s higher level of chest breathing is an effect of body size rather than sex (and our nine male and nine female speakers were of similar height). Third, studies like those cited above often relate to quiet breathing or breathing during non-communicative tasks such as reading song lyrics and producing isolated or nonsense sentences. It is likely that, in communicative tasks such as that of the present study (presenting an investor pitch to a small audience), additional socio-phonetic factors come into play that make men and women strive for different voice-quality and volume/effort settings. In consequence, men and women apply different breathing patterns – patterns that give men a more charismatic voice (based on enhanced chest breathing) and women a more resonant voice (based on enhanced abdominal breathing). This assumption probably represents one of the most interesting cross-disciplinary questions for future research.
3 Conclusion and Outlook
Is abdominal breathing really as important for a charismatic public-speaking performance as rhetoric books and trainers make us believe? Our research raises legitimate doubts as to whether this question can be answered positively and thus calls common rhetorical practice into question. In view of this far-reaching implication, it seems imperative to increase the generalizability of our findings by using, in a follow-up study, a larger and more diverse sample of speakers. Our sample size of N = 18 is substantial and sex-balanced, and we pointed out several aspects that support the validity of our data. Yet, we used experienced speakers who presented (without apparent anxiety) to small audiences and whose performance was limited to a brief investor pitch. In principle, each of these factors could influence the role of abdominal breathing and must be examined in future experiments.
The factor public-speaking anxiety (PSA) in particular could play an important role. Breathing and stress are closely linked; see Kimani et al. (Reference Kimani, Shamekhi and Bickmore2021). We are currently carrying out a follow-up study in which we collect the PSA scores of 21 speakers in a real conference setting, along with the RespTrack and audio signals of their presentations. Preliminary results largely confirm the conclusions drawn here. However, they also show that particularly anxious speakers – if they primarily rely on chest breathing – tend to breathe increasingly fast and shallowly over the course of their presentation. In brief, there is evidence that, while chest breathing is associated with more perceived speaker charisma, it also increases the risk of hyperventilation in highly anxious speakers, with all its negative consequences such as dizziness and reduced attention, concentration, and psychomotor performance (Gilbert, Reference Gilbert1998). So, it seems that, while the more forceful, dynamic chest breathing gives less anxious speakers their vocal-charisma advantage, it works to the detriment of more anxious speakers. These preliminary results would not change the conclusion that conducting abdominal-breathing exercises on a blanket, one-size-fits-all basis in public-speaking and media courses is questionable. However, they could mean that such exercises are useful for selected, very anxious speakers, in line with Howe and Dwyer (Reference Howe and Dwyer2007).
A final point worth following up on in future studies is the question of whether the ‘gender gap’ in perceived speaker charisma (cf. Jokisch et al., Reference Jokisch, Iaroshenko, Maruschke and Ding2018; Niebuhr et al., Reference Niebuhr, Skarnitzl and Tylečková2018) is merely a social phenomenon, or whether the more strongly thoracically shaped breathing behavior of male speakers contributes to this gap. There is a need for further situationally, psychologically (e.g., PSA) and culturally differentiated speaker samples to investigate this question comprehensively. In general, there can be no doubt that we are only just beginning to explore and understand the foundations and limitations of the charisma myth of ‘belly’ breathing.
1 Introduction
Robots are being increasingly developed for healthcare applications. In such applications, robots can profit from being persuasive; for instance, to help people take their medication (e.g., Schweitzer & Hoerbst, Reference Schweitzer and Hoerbst2016), do their exercises (e.g., Görer et al., Reference Görer, Salah and Akın2017), or acquire new habits (Jelínek & Fischer, Reference Jelínek and Fischer2021). To achieve persuasiveness in robots, researchers have experimented with various strategies; one such strategy is to use persuasive utterances (e.g., Winkle et al., Reference Winkle, Lemaignan, Caleb-Solly, Leonards, Turton and Bremner2019). Other methods are less direct and rely, for instance, on the personalization of the robot’s behavior (Ham, Reference Ham2021) or on the coupling of different modalities, like speech and gaze (Fischer et al., Reference Fischer, Langedijk, Nissen, Ramirez and Palinko2020; Ham et al., Reference Ham, Bokhorst, Cuijpers, van der Pol and Cabibihan2011). Other studies have explored the effects of the robot’s speaking style (Fischer et al., Reference Fischer, Niebuhr, Jensen and Bodenhagen2019) or discourse cues; for instance, Torrey et al. (Reference Torrey, Fussell and Kiesler2013) made small changes to the robot’s speech regarding the presence or absence of hedges and discourse markers, and found that the robot was perceived as less controlling and as more considerate and likeable as a helper when it used such markers.
Thus, the speech-related behaviors of robots have been found to be able to influence people, ranging from what the robot says to how it says it, both pragmatically and prosodically (Fischer et al., Reference Fischer, Langedijk, Nissen, Ramirez and Palinko2020; Fischer et al., Reference Fischer, Niebuhr and Alm2021) – and this is also where results from other chapters in this volume might become particularly relevant in their potential transfer to human-robot interaction (e.g., Niebuhr & Barbosa, Chapter 12, on prosody; Humă, Chapter 3, on specific lexical choices).
In this chapter, we present an experimental study in which we test the effects of two persuasive utterances, specifically two different appeals to expertise. The study investigates how well a robot succeeds in persuading people to drink more water.
2 Background and Motivation
Guiding people subtly and unobtrusively into behaviors that are to their own benefit has become a vibrant research area in the health sciences, in social psychology, and in various technological fields (e.g., Fogg, Reference Fogg2002; Kaptein, Reference Kaptein2015; Thaler & Sunstein, Reference Thaler and Sunstein2009). One resource used in this research is the six strategies of influence defined by Cialdini (Reference Cialdini1987). We have used these principles as starting points for our human-robot experiments on persuasive dialog. The principles are widely used, but there is not much systematic work on their effectiveness, and only a few studies apply these principles to human-robot interaction (e.g., Lee & Liang, Reference Lee and Liang2019; Sandoval et al., Reference Sandoval, Brandstetter, Obaid and Bartneck2016; Winkle et al., Reference Winkle, Lemaignan, Caleb-Solly, Leonards, Turton and Bremner2019). Cialdini defines “six main roads to change” (Reference Cialdini2016, p. 151), which rely on mental shortcuts that have proven mostly useful over the course of human evolution, but which can also be exploited by ‘compliance practitioners’. While these six main roads to change are general strategies of influence that can be instantiated in many ways, they can also be evoked by simple cues. This, in turn, means that a single message can be powerful enough to activate the mental shortcut.
The first principle is commitment and consistency; it relies on the fact that we strive for a coherent identity such that “we want to be (and to be seen) as consistent with our previous commitments” (Cialdini, Reference Cialdini2016, p. 168), which makes us more predictable for ourselves and others. In human-robot interaction, this principle was investigated by Lee and Liang (Reference Lee and Liang2019), who found that a robot was more persuasive when it used the ‘foot-in-the-door’ principle than when it came out with its full request right away: first, the robot makes a small request (complete 30 tasks for 15 minutes) and then a larger request (complete 50 tasks for 25 minutes). The explanation is that, since participants had already complied with the first request, complying with the larger request kept them consistent with their own previous behavior.
The second principle is scarcity; for instance, if only a few items are available, the product appears to be more attractive, and people want it more. Cialdini (Reference Cialdini1987) calls this principle ‘the rule of a few’. To our knowledge, no studies on human-robot interaction have addressed this principle, but it is very common in advertising (e.g., ‘just three items left in stock’) and in online marketing (cf. Kaptein, Reference Kaptein2015).
The third principle is reciprocity; people want to repay others for what they have provided for them. This principle is also known as the ‘give and take’ principle. Lee and Liang (Reference Lee and Liang2016) show how reciprocity can be used in human-robot interaction. In their study, the robot and the participants played a game together. In one condition, the robot’s utterances were helpful; in the other condition, they were not. After the game, the robot asked the participants for help. The helpful robot received more help from participants than the unhelpful robot. Similarly, Sandoval et al. (Reference Sandoval, Brandstetter, Obaid and Bartneck2016) investigated reciprocity in a game scenario. Their results show that participants collaborated more with the human agents but were equally reciprocal with both human agents and robots.
The fourth principle is social proof; people tend to look at others in order to plan their own behavior. A study by Goldstein et al. (Reference Goldstein, Cialdini and Griskevicius2008) explores the personalization of social proof concerning towel re-use in a hotel; their results show that “the more important a social category is to an individual’s social identity, the more likely he or she will be to follow the norms of that category” (Goldstein et al., Reference Goldstein, Cialdini and Griskevicius2008, p. 478). In other words, the more specifically one can address the communication partner, the more effective the message is. In our own previous work, we investigated this principle and found significant effects of social proof on people’s water intake in human-robot interaction (Langedijk et al., under review). In another human-robot interaction study, Salomons et al. (Reference Salomons, Van Der Linden, Strohkorb Sebo and Scassellati2018) found that participants changed their initial belief if a group of robots communicated a different belief than the participant’s. The authors measured how many times participants changed their initial choice based on the robots’ answers: in one condition, the participants did not see the robots’ answers, whereas in the other condition they saw the answers given by the robots and could change their initial answer, which many of them did.
The fifth principle is liking. This principle comprises several factors, such as the physical attractiveness of, or similarity with, the persuader, which is likely to influence one’s choice. Compliments also evoke the principle of liking, as does expressing positive feelings toward the other. Examples from human interaction include sending Christmas cards to customers or using phrases like ‘our most valued customer’. Winkle et al. (Reference Winkle, Lemaignan, Caleb-Solly, Leonards, Turton and Bremner2019) applied the principle in a human-robot interaction study with 92 participants, investigating the effects of displays of expertise and goodwill and of emphasizing similarity with the user. The robot asked the participants to carry out therapeutic exercises, and the authors found significant effects of the display of goodwill and of emphasized similarity on participants’ compliance.
This brings us to our own study and the last principle, namely authority. One way of gaining authority is expertise, which may be alluded to by particular symbols, for instance, white lab coats or uniforms (see also Dayter & Rüdiger, this volume, for the role of expertise in persuasion). The principle of authority thus comprises genuinely educated specialists, like doctors and scientists, but may also be exploited by people who are merely associated with such expertise, like people wearing lab coats, or actors known for playing a doctor, which is then exploited in the advertising of allegedly healthier products. Regarding robots, Andrist et al. (Reference Andrist, Spannan and Mutlu2013) found that linguistic cues to expertise make robots (i.e., the sender) more persuasive. In contrast, Winkle et al. (Reference Winkle, Lemaignan, Caleb-Solly, Leonards, Turton and Bremner2019) did not find any effect of expertise; in their experiment, however, the cue to expertise – saying that the robot was programmed by an expert (i.e., a third party) – may have been too subtle, or it may have interfered with people’s beliefs about the autonomy of the robot.
In the current experiment, we address expertise explicitly and distinguish between appeals to the participants’ own professional expertise (i.e., the recipient) and appeals that concern other people’s expertise, namely the expertise of some unnamed researchers (i.e., a third party), where previous work on advertising has shown that even unspecific references to scientific research may affect consumers’ behavior (Van Mulken & Hornikx, Reference Van Mulken and Hornikx2011).
In particular, our research questions are:
Does appealing to users as experts have an effect on their water intake?
Does a reference to research findings (expert opinion) have an effect on participants’ water intake?
How does care personnel react to a robotic solution like ours?
That is, both interventions tested the effects of evoking expertise, which instantiates Cialdini’s principle of authority, but they do so in different ways: on the one hand, the participants are addressed as experts themselves by appealing to their professional identity and hence position of authority, on the other, findings from other experts as representatives of the scientific community and hence authorities are alluded to.
The experiment is carried out in a playful scenario, in which a robot guides a participant around the Living Lab of the local municipality and helps him/her to find and collect objects to set a table. In the experimental conditions, the robot produces the respective intervention and tells the participant about the importance of water intake. The experiment was conducted in collaboration with the local municipality, and the participants were all healthcare personnel.
3 Method
We designed a controlled lab study to investigate the research questions described above. The experiment was carried out in such a way that every participant only saw one of the three conditions (between-subject design). There is a baseline condition, in which no persuasion is used, and there are two persuasion conditions, one with an appeal to the participants’ expertise and one with an appeal to research findings.
3.1 Study Design: Our Experimental Conditions
Participants were asked to find and collect several objects together with the robot, like a plate, a napkin, and a placemat. The manipulation takes place when the robot instructs the participants to pick up a glass; here the robot tells them about the importance of water intake and then, depending on the condition, produces the respective intervention. In the first persuasion condition, the robot says: “Research shows that it is important to drink enough water. Most participants drink half a liter after this game.” Here, we use a reference to research findings in general as an appeal to expertise, that is, to a third party to which authority concerning factual information is attributed. We will refer to this condition as the research condition. In the second persuasion condition, the robot says: “You as an expert know that it is important to drink enough water. Most participants drink half a liter after this game.” Here, we address the participants as experts. Thus, in this case, it is the recipient’s own expertise that is evoked, presented in alignment with general expertise. Our participants were healthcare workers and thus indeed in possession of health-related knowledge, which, however, may or may not be in alignment with the general research findings. This persuasive utterance is thus similar to the research condition in appealing to general scientific authority, but it is personalized by appealing to participants’ own professional identity. We will refer to this condition as the expert condition. In both experimental conditions, the robot also used an additional social-proof utterance (Cialdini, Reference Cialdini1987). In our previous study, this message alone did not cause a significant difference in participants’ behaviors, but a personalized version did (cf. Langedijk et al., under review). In the baseline condition, the robot merely instructs the participants to pick up a glass.
At the end of the interaction, participants have collected everything they need to set a table and enjoy a snack and drink water while filling out the post-questionnaire (see Figure 13.1).

Figure 13.1 The set table at the end of the experiment after the participants have brought all the relevant objects together.
3.2 Materials and Measures
3.2.1 The Robot
The robot was semi-autonomous: navigation and speech required only minimal manual intervention. The robot moved autonomously from point to point, but the target locations were set by a remote wizard. The wizard also controlled the speech by choosing an utterance from a collection of predefined utterances. The robot is a Turtlebot platform with an anthropomorphic Styrofoam top serving as a low-fidelity prototype of a healthcare robot (see Figure 13.2). The robot is around 1.10 m in height and has a little smile to signal friendliness. Its design was based on initial design ideas for a service robot for elderly care (cf. Krüger et al., Reference Krüger, Fischer, Manoonpong, Palinko, Bodenhagen, Baumann, Kjærum, Rano, Naik, Juel, Haarslev, Ignasov, Marchetti, Langedijk, Kollakidou, Camillus Jeppesen, Heidtmann and Dalgaard2021).

Figure 13.2 Turtlebot.
3.2.2 Experimental Measures
Participants had to fill out a pre- and a post-experimental questionnaire. The pre-experimental questionnaire elicited demographic information and previous experience with robots. In the post-experimental questionnaire, the participants had to rate the robot and were asked to comment on their experience after the experiment. The questionnaire consists of the validated Robotic Social Attributes Scale (Carpinella et al., Reference Carpinella, Wyman, Perez and Stroessner2017), which measures to what extent people perceive the robot as social in terms of its competence and warmth, and elicits their degree of discomfort during the human-robot interaction experience. Each subscale contains a collection of adjectives on the basis of which the participants are asked to rate the robot or their experience. Competence includes the adjectives capable, responsive, interactive, reliable, competent, and knowledgeable. Warmth comprises the adjectives organic, happy, feeling, sociable, compassionate, and emotional. The discomfort scale includes the adjectives scary, strange, awful, awkward, dangerous, and aggressive. Seven additional questions address participants’ engagement with the robot by means of further adjectives, namely likeable, engaging, credible, persuasive, enthusiastic, boring, and convincing.
All items are rated on a 5-point Likert scale, where 1 corresponds to ‘not at all’ (Danish: slet ikke) and 5 corresponds to ‘very much’ (Danish: rigtig meget). The objective measure of our manipulations concerns water intake; in particular, we measure (in milliliters) how much water is missing from the 1-liter jug and from participants’ glasses after the experiment and thus to what extent people followed the robot’s suggestion to drink more water. In addition, we were interested in participants’ informal comments after they had interacted with the robot, and thus the participants were asked to comment on their experience and encouraged to voice everything that sprang to mind.
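Scores on scales of this kind are typically computed as the mean of the ratings within each adjective subscale. A minimal sketch of this aggregation for a single participant; the ratings are invented, while the adjective groupings follow the scale description above:

```python
# Subscale groupings as described above; ratings on a 1-5 Likert scale (invented).
SUBSCALES = {
    "competence": ["capable", "responsive", "interactive",
                   "reliable", "competent", "knowledgeable"],
    "warmth": ["organic", "happy", "feeling",
               "sociable", "compassionate", "emotional"],
    "discomfort": ["scary", "strange", "awful",
                   "awkward", "dangerous", "aggressive"],
}

def subscale_scores(ratings):
    """Mean rating per subscale for one participant."""
    return {name: sum(ratings[adj] for adj in adjs) / len(adjs)
            for name, adjs in SUBSCALES.items()}

# One invented participant's item-level ratings.
one_participant = {
    "capable": 4, "responsive": 4, "interactive": 5, "reliable": 4,
    "competent": 4, "knowledgeable": 3,
    "organic": 2, "happy": 3, "feeling": 2, "sociable": 4,
    "compassionate": 2, "emotional": 2,
    "scary": 1, "strange": 2, "awful": 1, "awkward": 2,
    "dangerous": 1, "aggressive": 1,
}
scores = subscale_scores(one_participant)  # e.g. high competence, low discomfort
```

Group-level comparisons between the three conditions would then operate on these per-participant subscale means.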
3.3 Procedure
The participants were greeted and led into a separate room by the experimenter, where they were asked to fill out the pre-experimental questionnaire and a consent form for the video recording. They were informed that their participation was voluntary and that they could stop the experiment at any point in time. After they gave their consent, they were led into the Living Lab, where the experimenter turned the cameras on and told the participants that the robot would start the experiment itself. The experimenter then left the room, and the robot started by explaining the goal of the ‘game’. After this, the robot moves to the nearest shelf. On its way, when it passes by the Paro robot, a seal-like robot used in dementia care, the robot says “Hi Paro. Paro is my kind colleague.” (“Hej Paro. Paro er min fine kollega.”). At the nearest shelf, the participant is then instructed to find matches in one of the eight boxes. During the search, the robot guides the participant by means of verbal instructions like “You can find them in one of the white containers on the top shelf.” (“Du finder dem i en af de hvide beholdere på den øverste hylde.”).
When the participant has found the matches, the robot turns toward the table and says “The next thing we need to collect is a glass. It is on the table over there.” (“Det næste vi skal indsamle, er et glas. Det står på bordet derover.”). Here, our persuasion manipulation is used. In the baseline condition, the robot says nothing else and waits for the participant to pick up a glass. In the expert condition, the robot says “You as an expert know that it is important to drink enough water. Most participants drink half a liter after this game.” (“Som ekspert ved du jo, at det er vigtigt at drikke rigelig med vand. De fleste deltagere drikker en halv liter efter den her leg.”). In the research condition, the robot says “Research shows that it is important to drink enough water. Most participants drink half a liter after this game.” (“Forskning viser, at det er vigtigt at drikke rigelig med vand. De fleste deltagere drikker en halv liter efter den her leg.”). Then the robot says “Now we need to go to the kitchen and collect more items.” (“Så skal vi hen til køkkenet og samle flere ting ind.”) and moves to the other room, followed by the participant. In the kitchen, the participant has to pick up a plate from the dishwasher and then choose between a cookie and some fruit, whereupon the robot shows its situation awareness and comments on the choice of snack, either by saying “Ah okay, you chose a cookie, then you do not need a knife.” (“Ah okay, du valgte en cookie, så behøver du ingen kniv.”) or by saying “Ah okay, you chose some fruit, then you also need a small knife, which you can find a bit further away on the kitchen table.” (“Ah okay, du valgte en frugt, så skal du også have en lille kniv. Den finder du lidt længere henne på køkkenbordet.”). If no items have yet been placed on the robot, it offers to carry them for the participant by saying “If you like, you can place everything on top of me.” (“Hvis du vil, er du velkommen til at lægge alt af på mig.”).
The next object is a candle, which is hidden in a drawer. Here, the robot again leads the participant through his/her search, for instance by saying “A bit more to the left.” (“Lidt længere til venstre.”).
Then the robot moves out of the kitchen area toward some shelves. Here the participant is instructed to take a napkin and a placemat. When the participant picks up one of the two placemats, the robot again shows its situation awareness by commenting on the color of the placemat, for instance, “Ah the green placemat, that is my favorite too.” (“Ah den grønne dækkeserviet. Det er også min favorit.”). Finally, the robot moves to the table and instructs the participant to set the table, to have a seat, to take some water, and to fill out the post-experimental questionnaire, which is prepared for him/her on a tablet. At the end of the experiment, participants were led out of the Living Lab and debriefed about the wizard controlling the robot. Debriefing is an important aspect of human-robot interaction research, as it helps people to understand the real capabilities of robots (Riek, Reference Riek2012).
3.4 Participants
Forty-six employees from the healthcare sector in the local municipality participated in our experiment. They were mainly nurses and professional caregivers in Danish elderly care facilities and were recruited through the employees at the Living Lab, which is run by the municipality in Sonderborg. The mean age of the participants was 42 (SD = 11). In accordance with the gender distribution in the profession, women were highly overrepresented: there were only two men among the participants. About 87 per cent of the participants had seen a robot only on TV, or once in some other setting. None of the participants had ever worked with a robot before, nor had they seen this particular robot. Most participants were native speakers of Danish (89.1 per cent), but there were also participants whose native languages were German, English, and Oromo (spoken in Ethiopia). Participants were randomly assigned to the conditions and distributed as equally as possible between them: there were N = 16 participants in the baseline (no-persuasion) condition, N = 16 in the research condition, and N = 14 in the expert condition.
3.5 Data Analysis
The data were analyzed quantitatively using common statistical tests; specifically, a one-way ANOVA was conducted to determine whether there was a difference between the three conditions (baseline, research, and expert). We conducted post hoc analyses to investigate the differences between the groups further. We furthermore carried out a qualitative analysis of participants’ comments in the free interview session after the experiments.
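The quantitative pipeline can be sketched as follows. The data here are synthetic placeholders (the real ratings are not reproduced), and the Bonferroni-style post hoc step shown is one common way to follow up an ANOVA; it is an illustration of the general approach, not the authors’ exact analysis script.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Synthetic Likert ratings (1-5) for one questionnaire item per condition
baseline = np.array([2, 1, 3, 2, 2, 1, 2, 3, 2, 1, 2, 3, 2, 2, 1, 2])
research = np.array([3, 2, 3, 2, 3, 1, 2, 3, 4, 2, 3, 2, 2, 3, 2, 3])
expert = np.array([4, 3, 5, 2, 4, 3, 5, 2, 4, 3, 4, 2, 5, 3])

# One-way ANOVA: is there any difference between the three conditions?
f_stat, p_value = stats.f_oneway(baseline, research, expert)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")

# Post hoc pairwise t-tests with a Bonferroni correction:
# multiply each p-value by the number of comparisons (capped at 1)
groups = {"baseline": baseline, "research": research, "expert": expert}
n_comparisons = 3
for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
    t, p = stats.ttest_ind(a, b)
    p_adj = min(p * n_comparisons, 1.0)
    print(f"{name_a} vs {name_b}: adjusted p = {p_adj:.3f}")
```

With the real data, the F-test reported in Section 4 would come from the first step, and the condition-pair comparisons from the second.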
4 Results
Our results concern participants’ subjective evaluations of the robot in the post-experimental questionnaire on the one hand, and behavioral results in terms of water intake, namely the extent to which they complied with the robot’s suggestion to drink more, on the other.
4.1 The Subjective Evaluation: Questionnaire Data
Regarding participants’ evaluation of the robot, our one-way ANOVA revealed statistically significant results only for the adjectives feeling and credible.
The analysis shows that there was a statistically significant difference between the conditions regarding the adjective feeling, F(2, 43) = 5.795, p = .006. Post hoc comparisons using the Bonferroni correction indicated that the mean score for the expert condition (M = 3.4, SD = 1.3) was significantly different from that for the baseline condition (M = 2.1, SD = .9), p = .005. That is, participants in the expert condition were more likely to rate the robot as having feelings. Participants in the research condition also rated the robot higher on feeling (M = 2.5, SD = .9) than those in the baseline condition, although this difference was not significant.
Our analysis also revealed a statistically significant difference with regard to the adjective credible, F(2, 43) = 4.063, p = .024. Post hoc comparisons using the Bonferroni correction indicated that the mean score for the expert condition (M = 4.5, SD = .8) was significantly higher than that for the research condition (M = 3.7, SD = .8), p = .026. This means that participants in the expert condition found the robot more credible than those in the research condition did. The robot in the baseline condition was rated between the two persuasion conditions (M = 4.3, SD = .9).
We did not find any further effects of the interventions in either the expert or the research condition.
4.2 The Behavior: Water Intake
We measured the water intake for N = 44 participants (two values were missing). A one-way ANOVA revealed that there was a statistically significant difference in water intake between the conditions, F(2, 41) = 3.993, p = .026.
Post hoc comparisons using the Bonferroni correction indicated that the mean score for the expert condition (M = 73 ml, SD = 55.9 ml) was significantly different from that for the baseline condition (M = 20 ml, SD = 38.6 ml), p = .033. The difference between the baseline and the research condition (M = 60 ml, SD = 63.9 ml) is not statistically significant, but on average, participants in the research condition still drank more than those in the baseline condition. The water intake is shown in Figure 13.3.

Figure 13.3 Mean water intake measured in ml.
4.3 Qualitative Feedback from Participants
The participants were exclusively professionals working in elderly care. During the experiments, we recorded comments directed at the robot, at us, and at the organizer of the Living Lab, who had invited the participants and who served as their main contact person.
In our thematic analysis, four categories emerged: comments on the experience in general, comments that bridge between the experience and the robot, comments on the robot design and its behavior, and finally comments on the robot as a persuasive agent. First, we received much positive feedback on the experience in general:
“Shut up, this was fun!”, “Utmost exciting”, and “Very funny”
“Food for thought”
“Very funny – much more of these kinds of tests and such”
These comments show that the participants were open to these types of experiments, to being part of the research, and to the development of such technologies. This also matches the general impression of the participants that we got during our tests.
We also had several comments that bridge between the experience and the robot design, namely comments that focus on both the robot and the human-robot interaction experience in general:
“It was sweet, it was funny [where ‘it’ refers to the robot] – in general it [where ‘it’ refers to the experiment] was just really exciting”
“This was exciting – but its dialog was slow so I waited because there could have come some more”
“While the robot guided me around, I was wondering ‘Should I interact with it and answer it, or should I just let it talk?’”
The last two comments reveal some insecurities among our participants. The robot’s response time was not as fast as in ordinary human-human interaction, which means that participants who were not familiar with human-robot interaction needed to get used to this type of exchange. During the interactions, participants did not know that a wizard was controlling the robot’s utterances, nor what the robot was able to say; the wizard was revealed to them only after they had finished the experiment (i.e., in the debriefing).
Furthermore, we have a category called robot design and behavior in which we have the following comments:
“It is sweet”
“A peculiar little fellow (pause) ahj, a nice fellow”
“It was less robotic because it was aware of the color of the placemat I took. It was really good because it made it more human”
The last comment shows that the participant did not assume that a wizard was controlling the robot, and it therefore also underlines the importance of debriefing (see Section 5). At the same time, this comment shows the effect of the indicators of situation awareness that the robot used, and it aligns with the statistical results for the questionnaire item feeling.
Within this category, we also have a few critical comments, for instance about the robot’s appearance:
“It is a robot, which stares at one”
This response concerns the design of the robot, which is a low-fidelity prototype. The eyes were made out of bottle caps, with some white paper for the sclera and black paper for the eyebrows and mouth. Low-fidelity prototypes are common in experiments like this one; however, we could have been clearer about the current state of the robot’s development.
Another critical comment concerned the robot’s behavior:
“When it began turning around itself, I thought it had lost and/or forgot it”
Sometimes the robot simply started turning in circles. This happened accidentally and could not be stopped. We did not abort the experiment but continued once the robot was responding to actions again. We are not sure what “lost and/or forgot it” referred to, but we assume “it” referred to the previous dialog.
Within the last category, comments concern the role of the robot as a persuasive agent directly:
“It should say ‘drink something’ – it will not be enough to just say ‘take something’”
“I had to wait and follow more instead of leading. It is kind of the opposite of what we are used to do.”
These comments reflect the perspective of healthcare workers and are therefore particularly helpful for researchers in the healthcare domain.
Another language-related comment concerns a near-homophone in Danish:
“Did it say higher [Danish: højere] or right [Danish: højre]?”
This is an important point for the development of Danish dialog systems.
To sum up, the qualitative analysis of participants’ comments shows that the experiments were perceived as exciting and engaging and that they gave participants an impression of where the research is heading. Participants confirmed that a robot would be a useful tool or helper and not necessarily something that would take away their jobs. Their comments also suggest that Danish elderly care professionals are generally open to robots in elderly care and see them as interesting and useful additions to their work once they have experienced them, just as Korn and Zallio (Reference Korn and Zallio2022) show in their research. Furthermore, the analysis shows that participants take all kinds of robot behaviors into account, suggesting that robot behavior design plays a crucial role in the resulting interactions.
5 Discussion
This study investigated the effects of appeals to expertise in a human-robot interaction context. The strategy was shown to have behavioral effects: participants’ water intake increased when the robot addressed them as experts or made a reference to research findings.
The study shows two significant results concerning participants’ perception of the robot: in the expert condition, the robot was rated more strongly as having feelings than in the baseline condition, and as more credible than in the research condition. The results thus also reveal a difference between the two persuasion conditions and confirm that a reference to expertise is especially persuasive when users are addressed as experts themselves: if the robot appeals to the expertise of the communication partner, it proves to be more persuasive. Nevertheless, a reminder of scientific findings also proved effective to some extent.
In contrast, participants’ subjective evaluations were not influenced much by the two persuasive utterances; out of 25 adjectives, only two were rated significantly differently. Our observations of the participants while they filled out the questionnaire, in particular their many clarification questions, suggest that this may have been affected by their unfamiliarity with responding to questionnaires and with rating perceptual effects.
The participants’ comments showed the importance of debriefing, especially when researchers work with actual end-users. Most of the participants assumed that the robot was autonomous, which can create unrealistic expectations about robot capabilities; these expectations could be corrected during the debriefing.
In general, we found that our participants have a high interest in robotic technology; this may be due to the fact that our participants are healthcare workers, and the robot was designed to be employed in a healthcare context. Our participants saw the added value of such a robot and gave input to improve our research. To ask participants for a brief comment about their experience after an experiment is therefore highly recommended when doing experimental human-robot interaction studies (see also Lee et al., Reference Lee, Cheon, Lim and Fischer2022).
Concerning persuasion, our results are promising and suggest that a robot’s verbal behaviors may have significant effects on its persuasiveness. We found clear behavioral effects based on only a single utterance in an interaction that took about ten minutes: the robot uttered the persuasive message about three minutes into the experiment, and only at the end did participants have the opportunity to fill their glasses with water and drink. As described above, we measured how much water was missing from both the glass and the jug and thus estimated how much participants actually drank. We argue that we did not use coercion to get participants to behave in a certain way; rather, the robot advised the participants to drink enough water during the day. Drinking water is a healthy behavior and in a person’s own interest (cf. Thaler, Reference Thaler2018; Thaler & Sunstein, Reference Thaler and Sunstein2009). The suggestion was also not a direct offer but rather a subtle reminder, which means that the interactional design plays an important role in the persuasiveness (see also Humă, this volume). Again, as surprising as it may be, one single utterance in an interaction of several minutes led to considerable differences in participants’ behavior. Furthermore, participants were recruited by the employee of the Living Lab and participated voluntarily. They were informed, both before starting the experiment and while filling out the consent form, that they could stop at any point in time. Based on this and the largely positive reactions after the experiment, we assume that participants perceived the experiment as a kind of game and as an opportunity to engage with a robot.
A possible limitation of our study may be the effect of the second sentence in our persuasion conditions. In both persuasion conditions, the robot added “Most participants drink half a liter after this game”, which makes use of social proof (Cialdini, Reference Cialdini1987) and which we found to influence people to some extent, though not significantly, in an earlier study (see Langedijk et al., under review). Yet the utterance is used in both persuasion conditions; while it may thus have reinforced the effect of the two persuasion conditions compared to the baseline, it cannot have influenced the differences identified between the persuasion conditions.
Another possible limitation is that we had only two male participants, so our results do not represent the whole population; on the other hand, our sample reflects the gender distribution in elderly care, so our study represents the real world in healthcare institutions.
Our clear results showing a persuasive effect of appeals to expertise contrast with the results by Winkle et al. (Reference Winkle, Lemaignan, Caleb-Solly, Leonards, Turton and Bremner2019). There may be several reasons for this; for instance, our participants actually were experts in the field of healthcare, and thus the expert intervention was credible; on the other hand, the strategy proved effective even in the research condition. Furthermore, in the study by Winkle et al. (Reference Winkle, Lemaignan, Caleb-Solly, Leonards, Turton and Bremner2019), expertise was presented implicitly, by informing participants that the robot had been programmed by an expert. This manipulation may have been undermined if people understood the robot as a social actor (Nass & Moon, Reference Nass and Moon2000). That is, if people did not regard the robot as an artefact whose behavior was implemented by programmers, but rather as an autonomous social actor in its own right, the information that an expert was involved in programming the robot may have been perceived as too indirect and too inconsequential. In contrast, our study used a direct appeal to research findings in the research condition and a direct appeal to participants’ own professional expertise in the expert condition. These two strategies are independent of how people view the robot.
6 Conclusion and Future Work
In this chapter, we explored how effectively appeals to the participants’ expertise and to researchers’ expertise make a robot more persuasive when it suggests a certain behavior, in our case drinking water. Even though a robot is a technical artefact, and in our case even a low-fidelity prototype of a healthcare robot, it can successfully persuade people to adopt a particular behavior. Furthermore, our study has shown that, on top of successfully guiding people into drinking more water, participants also attributed certain human characteristics to the robot. If the robot is understood as human-like, similar persuasive strategies can be expected to be as effective as in human interaction. On the other hand, participants’ comments show little anthropomorphizing of the robot and its behavior, which may suggest that the persuasive utterances used are effective independently of their source, human or robot. We leave it to future work to disentangle these two possible explanations.