Emphasizing how and why machine learning algorithms work, this introductory textbook bridges the gap between the theoretical foundations of machine learning and its practical algorithmic and code-level implementation. Over 85 thorough worked examples, in both Matlab and Python, demonstrate how algorithms are implemented and applied whilst illustrating the end result. Over 75 end-of-chapter problems empower students to develop their own code to implement these algorithms, equipping them with hands-on experience. Matlab coding examples demonstrate how a mathematical idea is converted from equations to code, and provide a jumping-off point for students, supported by in-depth coverage of essential mathematics including multivariable calculus, linear algebra, probability and statistics, numerical methods, and optimization. Accompanied online by instructor lecture slides, downloadable Python code and additional appendices, this is an excellent introduction to machine learning for senior undergraduate and graduate students in Engineering and Computer Science.
Creative thinking is a crucial step in the design ideation process, where analogical reasoning plays a vital role in expanding the design concept space. The emergence of Generative AI has brought a significant revolution in co-creative systems, with a growing number of studies on Design-by-Analogy support tools. However, there is a lack of studies investigating the creative performance of Large Language Model (LLM)-generated analogical content and benchmarking of language models in creative tasks such as design ideation. Through this study, we aim to (i) investigate the effect of creativity heuristics by leveraging LLMs to generate analogical stimuli for novice designers in ideation tasks and (ii) evaluate and benchmark language models across analogical creative tasks. We developed a support tool based on the proposed conceptual framework and validated it by conducting controlled ideation experiments with 24 undergraduate design students. Groups assisted with the support tool generated higher-rated ideas, thus validating the proposed framework and the effectiveness of analogical reasoning for augmenting creative output with LLMs. Benchmarking of the models revealed significant differences in the creative performance of analogies across various language models, suggesting that future studies should focus on evaluating language models across creative, subjective tasks.
Efforts to curb online hate speech depend on our ability to reliably detect it at scale. Previous studies have highlighted the strong zero-shot classification performance of large language models (LLMs), offering a potential tool to efficiently identify harmful content. Yet for complex and ambivalent tasks like hate speech detection, pre-trained LLMs can be insufficient and carry systemic biases. Domain-specific models fine-tuned for the given task and empirical context could help address these issues, but, as we demonstrate, the quality of data used for fine-tuning decisively matters. In this study, we fine-tuned GPT-4o-mini using a unique corpus of online comments annotated by diverse groups of coders with varying annotation quality: research assistants, activists, two kinds of crowd workers, and citizen scientists. We find that only annotations from those groups of annotators that are better than zero-shot GPT-4o-mini in recognizing hate speech improve the classification performance of the fine-tuned LLM. Specifically, fine-tuning using the highest-quality annotator group – trained research assistants – boosts classification performance by increasing the model’s precision without notably sacrificing the good recall of zero-shot GPT-4o-mini. In contrast, lower-quality annotations do not improve and may even decrease the ability to identify hate speech. By examining tasks reliant on human judgment and context, we offer insights that go beyond hate speech detection.
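To make the fine-tuning step above concrete, the sketch below shows one plausible way to turn a group's annotations into the chat-format JSONL that OpenAI's fine-tuning endpoint for GPT-4o-mini expects. The comments, labels, field names, and system prompt are illustrative placeholders, not the study's actual data or prompt.

```python
import json

# Hypothetical annotated corpus: each record pairs a comment with a binary
# hate-speech label from one annotator group (e.g., trained research assistants).
annotations = [
    {"text": "Example comment 1 ...", "label": 1},
    {"text": "Example comment 2 ...", "label": 0},
]

SYSTEM_PROMPT = (
    "You are a content moderation assistant. "
    "Answer 'hate' if the comment contains hate speech, otherwise 'not hate'."
)

def to_finetune_record(item):
    """Convert one annotated comment into the chat-format JSONL record
    used for fine-tuning chat models such as GPT-4o-mini."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": item["text"]},
            {"role": "assistant", "content": "hate" if item["label"] == 1 else "not hate"},
        ]
    }

with open("hate_speech_train.jsonl", "w", encoding="utf-8") as f:
    for item in annotations:
        f.write(json.dumps(to_finetune_record(item), ensure_ascii=False) + "\n")
```

Separate JSONL files per annotator group would allow the resulting fine-tuned models to be compared against the zero-shot baseline, as the study does.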
As people increasingly interact with large language models (LLMs), a critical question emerges: do humans process language differently when communicating with an LLM versus another human? While there is good evidence that people adapt comprehension based on their expectations toward their interlocutor in human–human interaction, human–computer interaction research suggests the adaptation to machines is often suspended until expectation violation occurs. We conducted two event-related potential experiments examining Chinese sentence comprehension, measuring neural responses to semantic and syntactic anomalies attributed to an LLM or a human. Experiment 1 revealed reduced N400 but larger P600 responses to semantic anomalies in LLM-attributed than in human-attributed text, suggesting participants anticipated semantic errors yet required increased composition/integration effort. Experiment 2 showed enhanced P600 responses to LLM-attributed than to human-attributed syntactic anomalies, reflecting greater reanalysis or integration difficulty in the former than in the latter. Notably, neural responses to LLM-attributed semantic anomalies (but not syntactic anomalies) were further modulated by participants’ beliefs about humanlike knowledge in LLMs, with a larger N400 and a smaller P600 in participants with a stronger belief in humanlike knowledge in LLMs. These findings provide the first neurocognitive evidence that people develop mental models of LLM capabilities and adapt neural processing accordingly, offering theoretical insights aligned with multidisciplinary frameworks and practical implications for designing effective human–AI communication systems.
The use of Artificial Intelligence (AI) in Health Technology Assessment (HTA) activities presents an opportunity to enhance the efficiency, accuracy, and speed of HTA processes worldwide. However, the adoption of AI tools in HTA comes with diverse challenges and concerns that must be carefully managed to ensure their responsible, ethical, and effective deployment. The 2025 Health Technology Assessment international Global Policy Forum (GPF) informed GPF members of the integration of AI into HTA activities, with a particular focus on the use of Generative AI (GenAI). With the overarching goal of illuminating and inspiring tangible outputs and actionable recommendations, the event brought together a diverse range of interest holders to explore the opportunities and challenges of AI in HTA. This article summarizes the key discussions and themes that informed the GPF outcomes, including trust, human agency, and risk-based approaches, culminating in a proposed set of priority next steps for the HTA community regarding the integration of GenAI. It also highlights the current state of digital transformation within HTA organizations and the life sciences industry, providing insight into where the field stands and where it is heading.
Artificial Intelligence is an area of law where legal frameworks are still in their early stages. The chapter discusses some of the core HCI-related concerns with AI, including deepfakes, bias and discrimination, and issues at the intersection of AI and intellectual property, such as AI infringement and AI protection.
This chapter defines data-intensive research in the context of the English language and explores its prospects. It argues that data intensiveness extends beyond a single digital method or the use of advanced statistical tools; rather, it encompasses a broader transformation and fuller integration of digital tools and methods throughout the research process. We also address the potential pitfalls of data fetishism and over-reliance on data, and we draw parallels with the digital transformation in another discipline, specifically the biosciences, to illustrate the fundamental changes proposed as a result of digitalization. The lessons learned from other fields underscore the need for increased multi- and interdisciplinary collaboration and the development of broader digital infrastructures. This includes investments in enhanced computing power, robust data management processes, and a greater emphasis on replicability and transparency in reporting methods, data, and analytical techniques.
Sentiment analysis and stance detection are key tasks in text analysis, with applications ranging from understanding political opinions to tracking policy positions. Recent advances in large language models (LLMs) offer significant potential to enhance sentiment analysis techniques and to evolve them into the more nuanced task of detecting stances expressed toward specific subjects. In this study, we evaluate lexicon-based models, supervised models, and LLMs for stance detection using two corpora of social media data—a large corpus of tweets posted by members of the U.S. Congress on Twitter and a smaller sample of tweets from general users—which both focus on opinions concerning presidential candidates during the 2020 election. We consider several fine-tuning strategies to improve performance—including cross-target tuning using an assumption of congressmembers’ stance based on party affiliation—and strategies for fine-tuning LLMs, including few-shot and chain-of-thought prompting. Our findings demonstrate that: 1) LLMs can distinguish stance on a specific target even when multiple subjects are mentioned, 2) tuning leads to notable improvements over pretrained models, 3) cross-target tuning can provide a viable alternative to in-target tuning in some settings, and 4) complex prompting strategies lead to improvements over pretrained models but underperform tuning approaches.
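As a minimal illustration of the few-shot prompting strategy mentioned above, the sketch below assembles a target-conditioned stance prompt from a handful of labelled examples. The example tweets, targets, and labels are invented for illustration; the study's actual prompts and data are not reproduced here, and the resulting string would be sent to whichever LLM is under evaluation.

```python
# Hypothetical labelled examples for few-shot prompting.
FEW_SHOT_EXAMPLES = [
    ("Four more years of failed leadership is not an option.", "Biden", "against"),
    ("He has done more for this country than any president in decades.", "Trump", "favor"),
    ("Hard pass on another four years of this administration.", "Trump", "against"),
]

def build_stance_prompt(text, target, examples=FEW_SHOT_EXAMPLES):
    """Assemble a few-shot prompt asking for the stance (favor/against/neutral)
    expressed toward a specific target, even when other subjects are mentioned."""
    lines = [
        "Classify the stance of each tweet toward the stated target "
        "as favor, against, or neutral.",
        "",
    ]
    for ex_text, ex_target, ex_label in examples:
        lines += [f"Tweet: {ex_text}", f"Target: {ex_target}", f"Stance: {ex_label}", ""]
    lines += [f"Tweet: {text}", f"Target: {target}", "Stance:"]
    return "\n".join(lines)

print(build_stance_prompt("I can't wait to vote in November.", "Biden"))
```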
Social scientists have quickly adopted large language models (LLMs) for their ability to annotate documents without supervised training, an ability known as zero-shot classification. However, due to their computational demands, cost, and often proprietary nature, these models are frequently at odds with open science standards. This article introduces the Political Domain Enhanced BERT-based Algorithm for Textual Entailment (DEBATE) language models: Foundation models for zero-shot, few-shot, and supervised classification of political documents. As zero-shot classifiers, the models are designed to be used for common, well-defined tasks, such as topic and opinion classification. When used in this context, the DEBATE models are not only as good as state-of-the-art LLMs at zero-shot classification, but are orders of magnitude more efficient and completely open source. We further demonstrate that the models are effective few-shot learners. With a simple random sample of 10–25 documents, they can outperform supervised classifiers trained on hundreds or thousands of documents and state-of-the-art generative models. Additionally, we release the PolNLI dataset used to train these models—a corpus of over 200,000 political documents with highly accurate labels across over 800 classification tasks.
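Since the DEBATE models are entailment-based zero-shot classifiers, they can be called through the standard Hugging Face zero-shot classification pipeline. The sketch below uses the generic NLI checkpoint facebook/bart-large-mnli as a stand-in, because the abstract does not give the DEBATE repository names; the document, candidate labels, and hypothesis template are likewise illustrative.

```python
from transformers import pipeline

# Entailment-based zero-shot classification; swap in a DEBATE checkpoint
# name for the stand-in model below to reproduce the intended setup.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

doc = "The government should expand access to affordable health care."
result = classifier(
    doc,
    candidate_labels=["supports expanding health care", "opposes expanding health care"],
    hypothesis_template="The author of this text {}.",
)

# Labels are returned sorted by entailment score, highest first.
print(result["labels"][0], round(result["scores"][0], 3))
```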
Large language models (LLMs) are increasingly used to address real-world design problems, especially during the design ideation phase. Although LLMs hold substantial promise for concept generation, the understanding of how they can effectively assist designers in enhancing the diversity of design concepts is still limited. In this study, we set up different strategies for prompting the LLM with multiple professional personas for design concept generation, including (1) multiple prompts for concept generation in parallel, each with a professional persona, (2) a single prompt for concept generation with multiple professional personas, and (3) a sequence of prompts for concept generation and update, each with a professional persona. We formulate and test several hypotheses on the effectiveness of the different strategies. All hypotheses are tested by constructing professional knowledge bases, selecting design problems and personas, and designing the prompts. The results suggest that LLMs can facilitate the design ideation process and provide more diverse design concepts when they are given multiple prompts in parallel, each with a professional persona, or given a sequence of prompts with multiple professional personas to generate and update design concepts gradually.
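A rough sketch of strategy (1) from the abstract above: one concept-generation prompt per professional persona, issued as independent requests whose outputs are pooled afterwards. The personas, design problem, and prompt wording are invented placeholders, not the study's materials.

```python
# Hypothetical personas and design problem for the parallel-persona strategy.
PERSONAS = ["mechanical engineer", "industrial designer", "materials scientist"]
DESIGN_PROBLEM = "Design a portable device that keeps vaccines cool without electricity."

def build_persona_prompt(persona, problem):
    """One self-contained prompt per persona; each would be sent to the LLM
    as a separate request, and the returned concepts pooled for comparison."""
    return (
        f"You are an experienced {persona}. "
        f"Propose three distinct design concepts for the following problem, "
        f"drawing on the methods and vocabulary of your profession.\n\n"
        f"Problem: {problem}"
    )

prompts = [build_persona_prompt(p, DESIGN_PROBLEM) for p in PERSONAS]
for prompt in prompts:
    print(prompt, end="\n\n")
```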
Autoregressive language models generate text by predicting the next word from the preceding context. The regularities internalized from specific training data make this mechanism a useful proxy for historically situated readerly expectations, reflecting what earlier linguistic communities would find probable or meaningful. In this article, I pre-train a GPT model (223M parameters) on a broad corpus of Chinese texts (FineWeb Edu Chinese V2.1) and fine-tune it on the collected writings of Mao Zedong (1893–1976) to simulate the evolving linguistic landscape of post-1949 China. Identifying token sequences with the sharpest drops in perplexity – a measure of the model’s surprise – reveals the core phraseology of “Maospeak,” the militant language style that developed from Mao’s writings and pronouncements. A comparative analysis of modern Chinese fiction demonstrates how literature becomes unfamiliar to the fine-tuned model, generating perplexity spikes of increasing magnitude. The findings suggest a mechanism of attentional control: whereas propaganda backgrounds meaning through repetition (cognitive overfitting), literature foregrounds it through deviation (non-anomalous surprise). By visualizing token sequences as perplexity landscapes with peaks and valleys, the article reconceives style as a probabilistic phenomenon and showcases the potential of “cognitive stylometry” for literary theory and close reading.
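The "perplexity landscape" idea can be sketched as a sliding-window perplexity over a text under a causal language model. The public English gpt2 checkpoint below stands in for the article's custom Chinese model, and the window and stride sizes are arbitrary choices for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in checkpoint; the article's model is a 223M-parameter GPT pre-trained
# on FineWeb Edu Chinese and fine-tuned on Mao Zedong's collected writings.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def window_perplexities(text, window=16, stride=8):
    """Perplexity of each overlapping token window: valleys mark phraseology
    the model has internalized, spikes mark passages it finds surprising."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    scores = []
    for start in range(0, max(len(ids) - window, 1), stride):
        chunk = ids[start:start + window].unsqueeze(0)
        with torch.no_grad():
            loss = model(chunk, labels=chunk).loss  # mean negative log-likelihood
        scores.append((start, torch.exp(loss).item()))
    return scores

sample = "The people, and the people alone, are the motive force in the making of world history."
for start, ppl in window_perplexities(sample):
    print(f"tokens from {start:>3}: perplexity {ppl:.1f}")
```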
The volumes of historical data locked behind unstructured formats have long been a challenge for researchers in the computational humanities. While optical character recognition (OCR) and natural language processing have enabled large-scale text mining projects, the irregular formatting, inconsistent terminology and evolving printing practices complicate automated parsing and information extraction efforts for historical documents. This study explores the potential of large language models (LLMs) in processing and structuring irregular and non-standardized historical materials, using the U.S. Department of Agriculture’s Plant Inventory books (1898–2008) as a test case. Given the frequent evolution of these historical records, we implemented a pipeline combining OCR, custom segmentation rules and LLMs to extract structured data from the scanned texts. It provides an example of how incorporating LLMs into data-processing pipelines can enhance the accessibility and usability of historical and archival materials for scholars.
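The LLM step of such a pipeline can be sketched as follows: after OCR and rule-based segmentation, each entry is passed to a chat model with instructions to return structured JSON. The model choice, field names, and the sample inventory entry are illustrative assumptions, not the study's actual schema or data.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative OCR'd entry; real Plant Inventory entries vary in layout over time.
ENTRY = """43127. ZEA MAYS L. Poaceae. Corn.
From Mexico. Seeds collected near Oaxaca, 1916. A drought-tolerant landrace."""

PROMPT = (
    "Extract the following fields from this plant inventory entry and return "
    "them as a JSON object: accession_number, scientific_name, family, "
    "common_name, origin, year.\n\n" + ENTRY
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": PROMPT}],
    response_format={"type": "json_object"},
)
record = json.loads(response.choices[0].message.content)
print(record)
```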
This study investigates the English of-NP (noun phrase) evaluation construction (e.g., It’s nice of you to help me plan this wedding), hypothesizing that its constructional meaning encodes socially mediated evaluation and imposes semantic constraints on the NP slot. We adopt a dual methodological approach, combining collostructional analysis to identify lexeme–construction associations with surprisal analysis using a large language model (LLM), GPT-2, to assess predictive processing difficulty. The two methods complement each other, capturing both static distributional patterns and dynamic expectancy profiles. Three experimental manipulations were implemented: preposition alternation, variation in NP agentivity and variation in NP intentionality. Results show that NPs conforming to the hypothesized slot constraints yield lower surprisal values, whereas constraint-violating NPs trigger higher surprisal, aligning with the observed collostructional strengths. These findings provide empirical support for the view that constructional compatibility shapes predictive processing and contributes to integrating Construction Grammar (CxG) with prediction-based models of language processing.
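For readers unfamiliar with collostructional analysis, the sketch below computes a collexeme association between one adjective and the of-NP construction from a 2x2 contingency table, using the Fisher exact test and reporting the commonly used -log10 p-value as collostruction strength. All corpus counts are invented for illustration.

```python
import math
from scipy.stats import fisher_exact

# Hypothetical corpus counts for how often "nice" occurs inside vs. outside
# the of-NP evaluation construction ("It's nice of you to ...").
freq_lexeme_in_cxn = 12        # "nice" inside the construction
freq_lexeme_elsewhere = 988    # "nice" elsewhere in the corpus
freq_cxn_without_lexeme = 488  # the construction with other adjectives
freq_rest = 998_512            # everything else, for a corpus of 1,000,000 slots

table = [
    [freq_lexeme_in_cxn, freq_lexeme_elsewhere],
    [freq_cxn_without_lexeme, freq_rest],
]
odds_ratio, p_value = fisher_exact(table, alternative="greater")
strength = -math.log10(max(p_value, 1e-300))  # guard against underflow to zero
print(f"collostruction strength: {strength:.2f} (odds ratio {odds_ratio:.1f})")
```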
The conceptual design of mission-tailored aircraft is increasingly shifting towards system of systems (SoS) perspectives that account for system interactions using a holistic view. Agent-based modelling and simulation (ABMS) is a common approach for analysing an SoS, but the behaviour of its agents tends to be defined by rigid behaviour trees. The present work aims to evaluate the suitability of a prompt-engineered large language model (LLM) acting as the Incident Commander (IC), replacing the fixed behaviour trees that govern the agents’ decisions. The research contributes by developing a prompting framework for operational guidelines, constraints, and priorities to obtain an LLM commander within a wildfire suppression SoS, capable of replicating human decisions. By enabling agents in a simulation model with decision-making capabilities closer to those expected from humans, the commander’s decisions and potential emergent patterns can be translated into more defined requirements for aircraft conceptual design (ACD) (e.g., endurance, payload, sensors, communications, or turnaround requirements). Results showed that an LLM commander facilitated adaptive and context-aware decisions that can be analysed via decision logs. The results allow designers to derive aircraft requirements for their specific roles from operational outcomes rather than a priori assumptions, linking SoS mission needs and ACD parameters.
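A hedged sketch of the kind of prompting framework described above: the Incident Commander persona is given operational guidelines, constraints, and priorities, and is then queried with the current simulation state at each decision step. All wording, field names, and rules below are illustrative, not the paper's actual framework.

```python
# Illustrative operational rules for an LLM Incident Commander persona.
GUIDELINES = [
    "Dispatch water bombers to fire fronts threatening inhabited areas first.",
    "Keep at least one aircraft on standby for new ignitions.",
]
CONSTRAINTS = [
    "Aircraft must return to base when fuel drops below 20%.",
    "No more than two aircraft may operate over the same fire cell.",
]
PRIORITIES = ["protect life", "protect infrastructure", "contain fire spread"]

def build_ic_system_prompt():
    """System prompt establishing the commander role, rules, and priorities."""
    return (
        "You are the Incident Commander of a wildfire suppression operation.\n"
        "Operational guidelines:\n- " + "\n- ".join(GUIDELINES) + "\n"
        "Constraints:\n- " + "\n- ".join(CONSTRAINTS) + "\n"
        "Priorities (highest first): " + ", ".join(PRIORITIES) + "\n"
        "At each step, reply with one order per available aircraft and a one-line rationale."
    )

def build_state_message(sim_state):
    """Serialize the agent-based simulation state for the commander at this tick."""
    return "Current situation:\n" + "\n".join(f"{k}: {v}" for k, v in sim_state.items())

print(build_ic_system_prompt())
print(build_state_message({"active_fire_cells": 3, "aircraft_available": 2, "wind": "25 km/h NW"}))
```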
The exploration and retrieval of information from large, unstructured document collections remain challenging. Unsupervised techniques, such as clustering and topic modeling, provide only a coarse overview of thematic structure, while traditional keyword searches often require extensive manual effort. Recent advances in large language models and retrieval-augmented generation (RAG) introduce new opportunities by enabling focused retrieval of relevant documents or chunks tailored to a user’s query. This allows for dynamic, chat-like interactions that streamline exploration and improve access to pertinent information. This article introduces Topic-RAG, a chat engine that integrates topic modeling with RAG to support interactive and exploratory document retrieval. Topic-RAG uses BERTopic to identify the most relevant topics for a given query and restricts retrieval to documents or chunks within those topics. This targeted strategy enhances retrieval relevance by narrowing the search space to thematically aligned content. We apply the pipeline to 4,711 articles related to nuclear energy from the Impresso historical Swiss newspaper corpus. Our experimental results demonstrate that Topic-RAG outperforms a baseline RAG architecture that does not incorporate topic modeling, as measured by widely recognized metrics, such as BERTScore (including Precision, Recall and F1), ROUGE and UniEval. Topic-RAG also achieves improvements in computational efficiency for both single and batch query processing. In addition, we performed a qualitative analysis in collaboration with domain experts, who assessed the system’s effectiveness in supporting historically grounded research. Although our evaluation is focused on historical newspaper articles, the proposed approach more generally integrates topic information to enhance retrieval performance within a transparent and user-configurable pipeline. It supports the targeted retrieval of contextually rich and semantically relevant content while also allowing users to adjust key parameters such as the number of documents retrieved. This flexibility provides greater control and adaptability to meet diverse research needs in historical inquiry, literary analysis and cultural studies. Due to copyright restrictions, the raw data cannot be publicly shared. Data access instructions are provided in the repository, and the replication code is available on GitHub: https://github.com/KeerthanaMurugaraj/Topic-RAG-for-Historical-Newspapers.
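In the spirit of the topic-restricted retrieval described above, the sketch below fits BERTopic on a corpus, maps a query to its closest topics, and ranks only documents from those topics by embedding similarity. It is a minimal approximation, not the Topic-RAG implementation: the encoder choice, parameters, and the load_corpus() placeholder are assumptions, and BERTopic needs a reasonably large corpus to form topics.

```python
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer, util

def topic_restricted_retrieval(docs, query, n_topics=3, top_k=5):
    """Retrieve the top_k documents most similar to the query, searching only
    within the n_topics BERTopic topics closest to the query."""
    topic_model = BERTopic()
    doc_topics, _ = topic_model.fit_transform(docs)

    # Topics most similar to the query.
    matched_topics, _ = topic_model.find_topics(query, top_n=n_topics)

    # Restrict the search space to documents assigned to those topics.
    candidates = [d for d, t in zip(docs, doc_topics) if t in matched_topics]

    # Rank candidates by cosine similarity to the query embedding.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    doc_emb = encoder.encode(candidates, convert_to_tensor=True)
    query_emb = encoder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, doc_emb)[0]
    return sorted(zip(candidates, scores.tolist()), key=lambda x: -x[1])[:top_k]

# docs = load_corpus()  # placeholder: e.g., newspaper article chunks
# for doc, score in topic_restricted_retrieval(docs, "nuclear energy debates in the 1970s"):
#     print(f"{score:.3f}  {doc[:80]}")
```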
Systematic reviews play a critical role in evidence-based research but are labor-intensive, especially during title and abstract screening. Compact large language models (LLMs) offer the potential to automate this process, balancing time/cost requirements and accuracy. The aim of this study is to assess the feasibility, accuracy, and workload reduction achieved by three compact LLMs (GPT-4o mini, Llama 3.1 8B, and Gemma 2 9B) in screening titles and abstracts. Records were sourced from three previously published systematic reviews, and the LLMs were asked to rate each record from 0 to 100 for inclusion, using a structured prompt. Predefined 25-, 50-, and 75-rating thresholds were used to compute performance metrics (balanced accuracy, sensitivity, specificity, positive and negative predictive value, and workload saving). Processing time and costs were registered. Across the systematic reviews, LLMs achieved high sensitivity (up to 100%) and low precision (below 10%) for records included by full text. Specificity and workload savings improved at higher thresholds, with the 50- and 75-rating thresholds offering optimal trade-offs. GPT-4o mini, accessed via application programming interface, was the fastest model (~40 minutes max.) and incurred usage costs of $0.14–$1.93 per review. Llama 3.1 8B and Gemma 2 9B ran locally with longer processing times (~4 hours max.) and were free to use. LLMs were highly sensitive tools for the title/abstract screening process. High specificity values were reached, allowing for significant workload savings at reasonable cost and processing time. Conversely, we found them to be imprecise. However, high sensitivity and workload reduction are key factors for their usage in the title/abstract screening phase of systematic reviews.
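To make the threshold evaluation concrete, the sketch below computes sensitivity, specificity, balanced accuracy, and workload saving from 0–100 LLM inclusion ratings at each predefined threshold, treating a rating at or above the threshold as "include." The ratings and gold labels are invented for illustration.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, confusion_matrix

# Hypothetical LLM ratings (0-100) and human inclusion decisions (1 = include).
ratings = np.array([5, 90, 35, 70, 10, 55, 95, 20, 60, 15])
included_by_humans = np.array([0, 1, 0, 1, 0, 0, 1, 0, 1, 0])

def screening_metrics(ratings, gold, threshold):
    pred = (ratings >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(gold, pred, labels=[0, 1]).ravel()
    return {
        "threshold": threshold,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "balanced_accuracy": balanced_accuracy_score(gold, pred),
        # Share of records the model would let reviewers skip.
        "workload_saving": (pred == 0).mean(),
    }

for thr in (25, 50, 75):
    print(screening_metrics(ratings, included_by_humans, thr))
```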
The use of large language models (LLMs) has exploded since November 2022, but there is sparse evidence regarding LLM use in health, medical, and research contexts. We aimed to summarise the current uses of and attitudes towards LLMs across our campus’ clinical, research, and teaching sites. We administered a survey about LLM uses and attitudes. We conducted summary quantitative analysis and inductive qualitative analysis of free text responses. In August–September 2023, we circulated the survey amongst all staff and students across our three campus sites (approximately n = 7500), comprising a paediatric academic hospital, research institute, and paediatric university department. We received 281 anonymous survey responses. We asked about participants’ knowledge of LLMs, their current use of LLMs in professional or learning contexts, and perspectives on possible future uses, opportunities, and risks of LLM use. Over 90% of respondents have heard of LLM tools and about two-thirds have used them in their work on our campus. Respondents reported using LLMs for various uses, including generating or editing text and exploring ideas. Many, but not necessarily all, respondents seem aware of the limitations and potential risks of LLMs, including privacy and security risks. Various respondents expressed enthusiasm about the opportunities of LLM use, including increased efficiency. Our findings show LLM tools are already widely used on our campus. Guidelines and governance are needed to keep up with practice. Insights from this survey were used to develop recommendations for the use of LLMs on our campus.
This chapter explores the potential and limitations of AI in the legal field, with a focus on its application in legal research through tools like Lexis+ AI. It critically evaluates Lexis+ AI’s capability in case retrieval, a crucial function for legal professionals who rely on accurate and comprehensive legal sources to inform their work. The study provides an empirical analysis of Lexis+ AI’s performance on cryptocurrency-related legal queries, revealing that while the tool can generate accurate responses, it often falls short in terms of relevance and completeness. This chapter concludes by discussing the implications for legal professionals and legal tech companies, emphasizing the need for ongoing refinement of AI technologies, the importance of keeping legal professionals involved in decision-making processes, and the necessity of further collaboration between the legal and tech sectors.
This study investigates unintended information flow in large language models (LLMs) by proposing a computational linguistic framework for detecting and analyzing domain anchorage. Domain anchorage is a phenomenon potentially caused by in-context learning or latent “cache” retention of prior inputs, which enables language models to infer and reinforce shared latent concepts across interactions, leading to uniformity in responses that can persist across distinct users or prompts. Using GPT-4 as a case study, our framework systematically quantifies the lexical, syntactic, semantic, and positional similarities between inputs and outputs to detect these domain anchorage effects. We introduce a structured methodology to evaluate the associated risks and highlight the need for robust mitigation strategies. By leveraging domain-aware analysis, this work provides a scalable framework for monitoring information persistence in LLMs, which can inform enterprise guardrails to ensure response consistency, privacy, and safety in real-world deployments.
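Two of the similarity signals such a framework might rely on can be sketched directly: lexical overlap (Jaccard similarity over word types) and semantic similarity (cosine similarity between sentence embeddings). The encoder choice and the example input/output pair below are assumptions for illustration; the abstract's full framework also covers syntactic and positional similarity, which are omitted here.

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

def lexical_similarity(a, b):
    """Jaccard overlap between the word types of two texts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def semantic_similarity(a, b):
    """Cosine similarity between sentence embeddings of two texts."""
    emb = encoder.encode([a, b], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

prior_input = "Summarize the quarterly revenue trends for our cloud division."
later_output = "Cloud division revenue trends show steady quarterly growth."
print(f"lexical: {lexical_similarity(prior_input, later_output):.2f}, "
      f"semantic: {semantic_similarity(prior_input, later_output):.2f}")
```

Tracking such scores between earlier users' inputs and later outputs is one way to flag the uniformity effects the abstract calls domain anchorage.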
Advances in natural language processing (NLP) and Big Data techniques have allowed us to learn about the human mind through one of its richest outputs – language. In this chapter, we introduce the field of computational linguistics and go through examples of how to find natural language and how to interpret the complexities that are present within it. The chapter discusses the major state-of-the-art methods being applied in NLP and how they can be applied to psychological questions, including statistical learning, N-gram models, word embedding models, large language models, topic modeling, and sentiment analysis. The chapter concludes with ethical discussions on the proliferation of chat “bots” that pervade our social networks, and the importance of balanced training sets for NLP models.