This concluding chapter argues that language is a first-order driver of economic behaviour and outlines where the research should go next. It extends the LENS framework beyond one-shot decisions to strategic settings shaped by beliefs, and outlines the co-evolution between language and behaviour. Large language models are proposed as virtual laboratories, while a quantitative utility approach must accommodate multidimensional, non-linear emotions and norms, and expand to visual cues (VENS). The chapter highlights applications – from policy design to norm-sensitive AI – alongside serious risks of manipulation, surveillance, and bias. It closes with a call for transparent, ethically governed models that explain and responsibly influence decisions.
This chapter argues that language matters for economic decisions and that modern large language models (LLMs) can quantify this effect. After outlining the limits of lexicon-based tools, it examines BERT and MoralBERT, showing that generic sentiment scores struggle to predict human behaviour, while adding moral dimensions helps but the results remain imperfect. LLM-based chatbots (e.g., GPT-4) enable context-sensitive sentiment estimates that predict framing effects, particularly in Dictator Games. Building on this, the chapter formalises language-based utility functions that combine payoffs with sentiment or moral polarity and derives testable predictions. Evidence across Dictator, Equity–Efficiency, and Bribery games supports the approach, while highlighting caveats and avenues for refinement.
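The general shape of a language-based utility function of this kind can be sketched as follows. The linear form, the symbols, and the numeric polarities are illustrative assumptions for exposition, not the chapter's exact specification:

```python
def language_utility(payoff, polarity, weight=1.0):
    """Utility = material payoff plus a weighted term for the sentiment
    or moral polarity of the action's verbal framing (illustrative form)."""
    return payoff + weight * polarity

# A dictator keeping 8 of 10 units under two framings of the same action:
# a 'give'-framed action (positive polarity) versus a 'take'-framed one
# (negative polarity). The payoff is identical; only the language differs.
u_give = language_utility(payoff=8, polarity=+0.3)
u_take = language_utility(payoff=8, polarity=-0.4)
assert u_give > u_take  # framing alone shifts utility
```

Under this sketch, framing effects arise whenever the polarity term differs across descriptions of materially identical actions.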
The development of artificial intelligence and machine learning is leading to a revolution in the way we think about economic decisions. The Economics of Language explores how the use of generative AI and large language models (LLMs) can transform the way we think about economic behaviour. It introduces the LENS framework (Linguistic content triggers Emotions and suggests Norms, which shape Strategy choice) and presents empirical evidence that LLMs can predict human behaviour in economic games more accurately than traditional outcome-based models. It draws on years of research to provide a step-by-step development of the theory, combining accessible examples with formal modelling. Offering a roadmap for future research at the intersection of economics, psychology, and AI, this book equips readers with tools to quantify the role of language in decision-making and redefines how we think about utility, rationality, and human choice.
Large Language Models (LLMs) have the potential to profoundly transform and enrich experimental economic research. We propose a new software framework, “alter_ego”, which makes it easy to design experiments between LLMs and to integrate LLMs into oTree-based experiments with human subjects. Our toolkit is freely available at github.com/mrpg/ego. To illustrate, we run differently framed prisoner’s dilemmas with interacting machines as well as with human-machine interaction. Framing effects in machine-only treatments are strong and similar to those expected from previous human-only experiments, yet less pronounced and qualitatively different if machines interact with human participants.
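A framed prisoner's dilemma of the kind described can be rendered as a parameterised prompt along the following lines. The frame names, action labels, and payoffs are illustrative, and this is a generic sketch, not the alter_ego API:

```python
# Sketch of a framed prisoner's dilemma rendered as an LLM prompt.
# Frame names, action labels, and payoffs are hypothetical examples.
PAYOFFS = {"mutual": 3, "defect": 1, "sucker": 0, "temptation": 5}

def framed_prompt(frame_name, action_labels):
    a, b = action_labels
    return (
        f"You are playing the {frame_name} with one other participant.\n"
        f"If you both choose '{a}', each earns {PAYOFFS['mutual']} points. "
        f"If you both choose '{b}', each earns {PAYOFFS['defect']}. "
        f"If you choose '{a}' and the other chooses '{b}', you earn "
        f"{PAYOFFS['sucker']} and they earn {PAYOFFS['temptation']}.\n"
        f"Reply with exactly one word: '{a}' or '{b}'."
    )

# Same payoff structure, two linguistic frames:
cooperative_frame = framed_prompt("Community Game", ("share", "keep"))
competitive_frame = framed_prompt("Wall Street Game", ("invest", "withhold"))
```

Each prompt would be sent to a separate model instance (or to mixed human–machine pairs), so that differences in choice frequencies can be attributed to the frame rather than the payoffs.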
Chapter 5 addresses yet another aspect of word meanings. Back in the mid-twentieth century, the linguist J. R. Firth (1957, p. 11) stated that “you shall know a word by the company it keeps.” More recently, this idea has been supported by distributional semantic models (DSMs), which come from computational linguistics and demonstrate that a word’s meaning can in fact be derived partly from its statistical co-occurrence patterns with other words. For instance, part of the meaning of scissors can be derived from its tendency to be used together with certain other words like sharp, pointy, cut, snip, paper, hair, etc. DSMs are surprisingly good at predicting people’s performance on many (although not all) conceptual tasks, and they are now so sophisticated that they constitute the engines of many chatbots and AI systems. What’s more, by combining DSMs with brain mapping methods, a rapidly growing line of research has been accumulating evidence that the distributionally based properties of word meanings are not only captured by purely verbal representations in the core language network but also enable a “quick and dirty” shortcut to comprehension.
Chapter 1 introduces basic terminology. Terms such as artificial intelligence, data, algorithm, machine learning, neural networks, deep learning, large language models, generative AI and symbolic AI are presented to develop a sense of what AI is, how it has evolved, and what it does. This chapter also introduces some of the major conceptual disagreements in the field. Different ideas about how to develop AI in the best way drive disagreements, as well as philosophical differences over what intelligence means and whether machines can develop human-like intelligence.
Emphasizing how and why machine learning algorithms work, this introductory textbook bridges the gap between the theoretical foundations of machine learning and its practical algorithmic and code-level implementation. Over 85 thorough worked examples, in both Matlab and Python, demonstrate how algorithms are implemented and applied whilst illustrating the end result. Over 75 end-of-chapter problems empower students to develop their own code to implement these algorithms, equipping them with hands-on experience. Matlab coding examples demonstrate how a mathematical idea is converted from equations to code, and provide a jumping-off point for students, supported by in-depth coverage of essential mathematics including multivariable calculus, linear algebra, probability and statistics, numerical methods, and optimization. Accompanied online by instructor lecture slides, downloadable Python code and additional appendices, this is an excellent introduction to machine learning for senior undergraduate and graduate students in Engineering and Computer Science.
Creative thinking is a crucial step in the design ideation process, where analogical reasoning plays a vital role in expanding the design concept space. The emergence of Generative AI has brought a significant revolution in co-creative systems, with a growing number of studies on Design-by-Analogy support tools. However, there is a lack of studies investigating the creative performance of Large Language Model (LLM)-generated analogical content and benchmarking of language models in creative tasks such as design ideation. Through this study, we aim to (i) investigate the effect of creativity heuristics by leveraging LLMs to generate analogical stimuli for novice designers in ideation tasks and (ii) evaluate and benchmark language models across analogical creative tasks. We developed a support tool based on the proposed conceptual framework and validated it by conducting controlled ideation experiments with 24 undergraduate design students. Groups assisted with the support tool generated higher-rated ideas, thus validating the proposed framework and the effectiveness of analogical reasoning for augmenting creative output with LLMs. Benchmarking of the models revealed significant differences in the creative performance of analogies across various language models, suggesting that future studies should focus on evaluating language models across creative, subjective tasks.
Efforts to curb online hate speech depend on our ability to reliably detect it at scale. Previous studies have highlighted the strong zero-shot classification performance of large language models (LLMs), offering a potential tool to efficiently identify harmful content. Yet for complex and ambivalent tasks like hate speech detection, pre-trained LLMs can be insufficient and carry systemic biases. Domain-specific models fine-tuned for the given task and empirical context could help address these issues, but, as we demonstrate, the quality of data used for fine-tuning decisively matters. In this study, we fine-tuned GPT-4o-mini using a unique corpus of online comments annotated by diverse groups of coders with varying annotation quality: research assistants, activists, two kinds of crowd workers, and citizen scientists. We find that only annotations from those groups of annotators that are better than zero-shot GPT-4o-mini in recognizing hate speech improve the classification performance of the fine-tuned LLM. Specifically, fine-tuning using the highest-quality annotator group – trained research assistants – boosts classification performance by increasing the model’s precision without notably sacrificing the good recall of zero-shot GPT-4o-mini. In contrast, lower-quality annotations do not improve and may even decrease the ability to identify hate speech. By examining tasks reliant on human judgment and context, we offer insights that go beyond hate speech detection.
As people increasingly interact with large language models (LLMs), a critical question emerges: do humans process language differently when communicating with an LLM versus another human? While there is good evidence that people adapt comprehension based on their expectations toward their interlocutor in human–human interaction, human–computer interaction research suggests the adaptation to machines is often suspended until expectation violation occurs. We conducted two event-related potential experiments examining Chinese sentence comprehension, measuring neural responses to semantic and syntactic anomalies attributed to an LLM or a human. Experiment 1 revealed reduced N400 but larger P600 responses to semantic anomalies in LLM-attributed than in human-attributed text, suggesting participants anticipated semantic errors yet required increased composition/integration efforts. Experiment 2 showed enhanced P600 responses to LLM-attributed than human-attributed syntactic anomalies, reflecting greater reanalysis or integration difficulty in the former than in the latter. Notably, neural responses to LLM-attributed semantic anomalies (but not syntactic anomalies) were further modulated by participants’ belief about humanlike knowledge in LLMs, with a larger N400 and a smaller P600 in participants with a stronger belief in humanlike knowledge in LLMs. These findings provide the first neurocognitive evidence that people develop mental models of LLM capabilities and adapt neural processing accordingly, offering theoretical insights aligned with multidisciplinary frameworks and practical implications for designing effective human–AI communication systems.
The use of Artificial Intelligence (AI) in Health Technology Assessment (HTA) activities presents an opportunity to enhance the efficiency, accuracy, and speed of HTA processes worldwide. However, the adoption of AI tools in HTA comes with diverse challenges and concerns that must be carefully managed to ensure their responsible, ethical, and effective deployment. The 2025 Health Technology Assessment international Global Policy Forum (GPF) informed GPF members of the integration of AI into HTA activities, with a particular focus on the use of Generative AI (GenAI). With the overarching goal of illuminating and inspiring tangible outputs and actionable recommendations, the event brought together a diverse range of interest holders to explore the opportunities and challenges of AI in HTA. This article summarizes the key discussions and themes that informed the GPF outcomes, including trust, human agency, and risk-based approaches, culminating in a proposed set of priority next steps for the HTA community regarding the integration of GenAI. It also highlights insights into the current state of digital transformation within HTA organizations and the life sciences industry, providing insights into where the field stands and where it is heading.
Artificial Intelligence is an area of law where legal frameworks are still in early stages. The chapter discusses some of the core HCI-related concerns with AI, including deepfakes, bias and discrimination, and concepts within AI and intellectual property including AI infringement and AI protection.
This chapter defines data-intensive research in the context of the English language and explores its prospects. It argues that data intensiveness extends beyond a single digital method or the use of advanced statistical tools; rather it encompasses a broader transformation and fuller integration of digital tools and methods throughout the research process. We also address the potential pitfalls of data fetishism and over-reliance on data, and we draw parallels with the digital transformation in another discipline, specifically biosciences, to illustrate the fundamental changes proposed as a result of digitalization. The lessons learned from other fields underscore the need for increased multi- and interdisciplinary collaboration and the development of broader digital infrastructures. This includes investments in enhanced computing power, robust data management processes, and a greater emphasis on replicability and transparency in reporting methods, data, and analytical techniques.
Sentiment analysis and stance detection are key tasks in text analysis, with applications ranging from understanding political opinions to tracking policy positions. Recent advances in large language models (LLMs) offer significant potential to enhance sentiment analysis techniques and to evolve them into the more nuanced task of detecting stances expressed toward specific subjects. In this study, we evaluate lexicon-based models, supervised models, and LLMs for stance detection using two corpora of social media data—a large corpus of tweets posted by members of the U.S. Congress on Twitter and a smaller sample of tweets from general users—which both focus on opinions concerning presidential candidates during the 2020 election. We consider several fine-tuning strategies to improve performance—including cross-target tuning using an assumption of congressmembers’ stance based on party affiliation—and strategies for fine-tuning LLMs, including few-shot and chain-of-thought prompting. Our findings demonstrate that: 1) LLMs can distinguish stance on a specific target even when multiple subjects are mentioned, 2) tuning leads to notable improvements over pretrained models, 3) cross-target tuning can provide a viable alternative to in-target tuning in some settings, and 4) complex prompting strategies lead to improvements over pretrained models but underperform tuning approaches.
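Few-shot prompting of the kind evaluated here amounts to prepending labelled demonstrations to the query. A minimal sketch of such a prompt builder follows; the label set, target, and example tweets are hypothetical, not the study's materials:

```python
def few_shot_stance_prompt(examples, target, tweet):
    """Build a few-shot stance-detection prompt (illustrative format)."""
    shots = "\n".join(
        f"Tweet: {text}\nStance toward {target}: {label}"
        for text, label in examples
    )
    return (
        f"Classify the stance of each tweet toward {target} "
        f"as FAVOR, AGAINST, or NONE.\n\n{shots}\n\n"
        f"Tweet: {tweet}\nStance toward {target}:"
    )

demos = [("Four more years!", "FAVOR"), ("Vote him out.", "AGAINST")]
prompt = few_shot_stance_prompt(
    demos, "the incumbent", "Counting down the days to the election."
)
```

The completed prompt ends at the label slot, so the model's next tokens serve directly as the predicted stance.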
Social scientists have quickly adopted large language models (LLMs) for their ability to annotate documents without supervised training, an ability known as zero-shot classification. However, due to their computational demands, cost, and often proprietary nature, these models are frequently at odds with open science standards. This article introduces the Political Domain Enhanced BERT-based Algorithm for Textual Entailment (DEBATE) language models: Foundation models for zero-shot, few-shot, and supervised classification of political documents. As zero-shot classifiers, the models are designed to be used for common, well-defined tasks, such as topic and opinion classification. When used in this context, the DEBATE models are not only as good as state-of-the-art LLMs at zero-shot classification, but are orders of magnitude more efficient and completely open source. We further demonstrate that the models are effective few-shot learners. With a simple random sample of 10–25 documents, they can outperform supervised classifiers trained on hundreds or thousands of documents and state-of-the-art generative models. Additionally, we release the PolNLI dataset used to train these models—a corpus of over 200,000 political documents with highly accurate labels across over 800 classification tasks.
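Entailment-based zero-shot classification of this kind recasts each candidate label as a hypothesis and picks the label the document most strongly entails. The sketch below shows that framing logic with a toy word-overlap scorer standing in for a real NLI model; it is not the DEBATE models themselves, and the hypothesis template is an illustrative assumption:

```python
import re

def zero_shot_classify(premise, labels, entail_score):
    """Pick the label whose hypothesis the premise most strongly entails.
    `entail_score(premise, hypothesis)` stands in for a real NLI model's
    entailment probability."""
    hypotheses = {lab: f"This document is about {lab}." for lab in labels}
    return max(hypotheses, key=lambda lab: entail_score(premise, hypotheses[lab]))

def toy_score(premise, hypothesis):
    # Toy stand-in: lowercase word overlap. A real NLI model would return
    # an entailment probability instead.
    tok = lambda s: set(re.findall(r"[a-z]+", s.lower()))
    return len(tok(premise) & tok(hypothesis))

label = zero_shot_classify(
    "The senate debated the new immigration bill today.",
    ["immigration", "healthcare", "trade"],
    toy_score,
)
assert label == "immigration"
```

Because the label set lives entirely in the hypotheses, the same trained model can serve new classification tasks without retraining, which is what makes the approach attractive for zero-shot use.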
Large language models (LLMs) are increasingly used to address real-world design problems, especially during the design ideation phase. Although LLMs hold substantial promise for concept generation, the understanding of how they can effectively assist designers in enhancing the diversity of design concepts is still limited. In this study, we set up different strategies for prompting multiple professional personas to the LLM for design concept generation, including (1) multiple prompts for concept generation in parallel, each with a professional persona, (2) a single prompt for concept generation with multiple professional personas, and (3) a sequence of prompts for concept generation and update, each with a professional persona. We formulate and test several hypotheses on the effectiveness of different strategies. All hypotheses are tested by constructing professional knowledge bases, selecting design problems and personas, and designing the prompts. The results suggest that LLMs can facilitate the design ideation process and provide more diverse design concepts when they are given multiple prompts in parallel, each with a professional persona, or given a sequence of prompts with multiple professional personas to generate and update design concepts gradually.
Autoregressive language models generate text by predicting the next word from the preceding context. The regularities internalized from specific training data make this mechanism a useful proxy for historically situated readerly expectations, reflecting what earlier linguistic communities would find probable or meaningful. In this article, I pre-train a GPT model (223M parameters) on a broad corpus of Chinese texts (FineWeb Edu Chinese V2.1) and fine-tune it on the collected writings of Mao Zedong (1893–1976) to simulate the evolving linguistic landscape of post-1949 China. Identifying token sequences with the sharpest drops in perplexity – a measure of the model’s surprise – reveals the core phraseology of “Maospeak,” the militant language style that developed from Mao’s writings and pronouncements. A comparative analysis of modern Chinese fiction demonstrates how literature becomes unfamiliar to the fine-tuned model, generating perplexity spikes of increasing magnitude. The findings suggest a mechanism of attentional control: whereas propaganda backgrounds meaning through repetition (cognitive overfitting), literature foregrounds it through deviation (non-anomalous surprise). By visualizing token sequences as perplexity landscapes with peaks and valleys, the article reconceives style as a probabilistic phenomenon and showcases the potential of “cognitive stylometry” for literary theory and close reading.
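Perplexity over a token sequence reduces to the exponential of the mean negative log-probability per token. The sketch below shows the computation; the per-token log-probabilities are hypothetical numbers, not outputs of the article's fine-tuned model:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-probability per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token natural-log probabilities: a fine-tuned model
# finds rote phraseology highly predictable (log-probs near 0) and
# deviating literary prose surprising (strongly negative log-probs).
slogan_lp = [-0.1, -0.2, -0.1, -0.15]
literary_lp = [-2.5, -3.1, -2.8, -3.4]
assert perplexity(slogan_lp) < perplexity(literary_lp)
```

Low-perplexity valleys thus mark formulaic phraseology the model has overfit to, while perplexity peaks mark the deviations the article associates with literary style.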
The volumes of historical data locked behind unstructured formats have long been a challenge for researchers in the computational humanities. While optical character recognition (OCR) and natural language processing have enabled large-scale text mining projects, the irregular formatting, inconsistent terminology and evolving printing practices complicate automated parsing and information extraction efforts for historical documents. This study explores the potential of large language models (LLMs) in processing and structuring irregular and non-standardized historical materials, using the U.S. Department of Agriculture’s Plant Inventory books (1898–2008) as a test case. Given the frequent evolution of these historical records, we implemented a pipeline combining OCR, custom segmentation rules and LLMs to extract structured data from the scanned texts. It provides an example of how incorporating LLMs into data-processing pipelines can enhance the accessibility and usability of historical and archival materials for scholars.
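The rule-based segmentation stage of such a pipeline can be sketched as follows. The entry pattern and the sample OCR text are hypothetical, not the actual USDA Plant Inventory format, and in the study's pipeline an LLM would handle the fields that fixed rules cannot:

```python
import re

# Illustrative segmentation rule: split OCR output into entries that begin
# with an accession number, capturing number, species name, and free text.
ENTRY = re.compile(r"^(\d+)\.\s+([A-Z][a-z]+ [a-z]+)\.\s+(.*)$", re.M)

ocr_text = """101. Malus domestica. Seeds received from a northern orchard.
102. Triticum aestivum. Hardy winter wheat, drought tolerant."""

records = [
    {"accession": num, "species": name, "note": note}
    for num, name, note in ENTRY.findall(ocr_text)
]
```

Structured records like these can then be passed to an LLM for normalization of the irregular free-text fields, combining cheap deterministic rules with model calls only where they are needed.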
This study investigates the English of-NP (noun phrase) evaluation construction (e.g., It’s nice of you to help me plan this wedding), hypothesizing that its constructional meaning encodes socially mediated evaluation and imposes semantic constraints on the NP slot. We adopt a dual methodological approach, combining collostructional analysis to identify lexeme–construction associations with surprisal analysis using a large language model (LLM) (GPT-2) to assess predictive processing difficulty. The two methods complement each other, capturing both static distributional patterns and dynamic expectancy profiles. Three experimental manipulations were implemented: preposition alternation, variation in NP agentivity and variation in NP intentionality. Results show that NPs conforming to the hypothesized slot constraints yield lower surprisal values, whereas constraint-violating NPs trigger higher surprisal, aligning with the observed collostructional strengths. These findings provide empirical support for the view that constructional compatibility shapes predictive processing and contributes to integrating Construction Grammar (CxG) with prediction-based models of language processing.
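Surprisal is the negative log of a word's conditional probability given its context. The sketch below shows the definition with hypothetical probabilities for the NP slot; the numbers are illustrative, not GPT-2 outputs from the study:

```python
import math

def surprisal(prob):
    """Surprisal in bits: -log2 of the word's conditional probability."""
    return -math.log2(prob)

# Hypothetical slot probabilities in "It's nice of ___ to help": an
# agentive, intentional NP ("you") conforms to the constructional
# constraint; an inanimate NP ("the table") violates it.
conforming = surprisal(0.20)   # higher probability, lower surprisal
violating = surprisal(0.001)   # lower probability, higher surprisal
assert conforming < violating
```

Lower surprisal for constraint-conforming NPs is exactly the signature the study reports aligning with collostructional strength.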
The conceptual design of mission-tailored aircraft is increasingly shifting towards system of systems (SoS) perspectives that account for system interactions using a holistic view. Agent-based modelling and simulation (ABMS) is a common approach for analysing an SoS, but the behaviour of its agents tends to be defined by rigid behaviour trees. The present work aims to evaluate the suitability of a prompt-engineered large language model (LLM) acting as the Incident Commander (IC), replacing the fixed behaviour trees that govern the agents’ decisions. The research contributes by developing a prompting framework for operational guidelines, constraints, and priorities to obtain an LLM commander within a wildfire suppression SoS, capable of replicating human decisions. By enabling agents in a simulation model with decision-making capabilities closer to those expected from humans, the commander’s decisions and potential emergent patterns can be translated into more defined requirements for aircraft conceptual design (ACD) (e.g., endurance, payload, sensors, communications, or turnaround requirements). Results showed that an LLM commander facilitated adaptive and context-aware decisions that can be analysed via decision logs. The results allow designers to derive aircraft requirements for their specific roles from operational outcomes rather than a priori assumptions, linking SoS mission needs and ACD parameters.