Emphasizing how and why machine learning algorithms work, this introductory textbook bridges the gap between the theoretical foundations of machine learning and its practical algorithmic and code-level implementation. Over 85 thorough worked examples, in both Matlab and Python, demonstrate how algorithms are implemented and applied whilst illustrating the end result. Over 75 end-of-chapter problems empower students to develop their own code to implement these algorithms, equipping them with hands-on experience. Matlab coding examples demonstrate how a mathematical idea is converted from equations to code, and provide a jumping-off point for students, supported by in-depth coverage of essential mathematics including multivariable calculus, linear algebra, probability and statistics, numerical methods, and optimization. Accompanied online by instructor lecture slides, downloadable Python code and additional appendices, this is an excellent introduction to machine learning for senior undergraduate and graduate students in Engineering and Computer Science.
Artificial Intelligence is an area of law where legal frameworks are still in their early stages. The chapter discusses some of the core HCI-related concerns with AI, including deepfakes, bias and discrimination, and concepts within AI and intellectual property, including AI infringement and AI protection.
This chapter defines data-intensive research in the context of the English language and explores its prospects. It argues that data intensiveness extends beyond a single digital method or the use of advanced statistical tools; rather it encompasses a broader transformation and fuller integration of digital tools and methods throughout the research process. We also address the potential pitfalls of data fetishism and over-reliance on data, and we draw parallels with the digital transformation in another discipline, specifically biosciences, to illustrate the fundamental changes proposed as a result of digitalization. The lessons learned from other fields underscore the need for increased multi- and interdisciplinary collaboration and the development of broader digital infrastructures. This includes investments in enhanced computing power, robust data management processes, and a greater emphasis on replicability and transparency in reporting methods, data, and analytical techniques.
Sentiment analysis and stance detection are key tasks in text analysis, with applications ranging from understanding political opinions to tracking policy positions. Recent advances in large language models (LLMs) offer significant potential to enhance sentiment analysis techniques and to evolve them into the more nuanced task of detecting stances expressed toward specific subjects. In this study, we evaluate lexicon-based models, supervised models, and LLMs for stance detection using two corpora of social media data—a large corpus of tweets posted by members of the U.S. Congress on Twitter and a smaller sample of tweets from general users—which both focus on opinions concerning presidential candidates during the 2020 election. We consider several fine-tuning strategies to improve performance—including cross-target tuning using an assumption of congressmembers’ stance based on party affiliation—and strategies for fine-tuning LLMs, including few-shot and chain-of-thought prompting. Our findings demonstrate that: 1) LLMs can distinguish stance on a specific target even when multiple subjects are mentioned, 2) tuning leads to notable improvements over pretrained models, 3) cross-target tuning can provide a viable alternative to in-target tuning in some settings, and 4) complex prompting strategies lead to improvements over pretrained models but underperform tuning approaches.
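As a rough, hypothetical illustration of the prompting strategies mentioned above (the study's actual prompts and data are not reproduced here), the sketch below assembles a few-shot stance-detection prompt with a brief chain-of-thought instruction; the example tweets, labels, and target are invented.

```python
# Minimal sketch of a few-shot, chain-of-thought style stance prompt.
# The example tweets, labels, and target below are invented for illustration;
# the study's actual prompts and data are not reproduced here.

TARGET = "Joe Biden"

FEW_SHOT_EXAMPLES = [
    ("Four more years of this leadership is exactly what we need.", "FAVOR"),
    ("His policies have been a disaster for working families.", "AGAINST"),
    ("The debate is scheduled for Tuesday night.", "NEUTRAL"),
]

def build_prompt(tweet: str) -> str:
    """Assemble a few-shot prompt asking for stance toward a single target."""
    lines = [
        f"Classify the stance of each tweet toward {TARGET} "
        "as FAVOR, AGAINST, or NEUTRAL.",
        "Reason step by step about which parts of the tweet refer to the "
        "target before giving the final label.",  # chain-of-thought instruction
        "",
    ]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Tweet: {text}\nStance: {label}\n")
    lines.append(f"Tweet: {tweet}\nStance:")
    return "\n".join(lines)

print(build_prompt("Both candidates spoke, but only one had a real plan."))
```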
Social scientists have quickly adopted large language models (LLMs) for their ability to annotate documents without supervised training, an ability known as zero-shot classification. However, due to their computational demands, cost, and often proprietary nature, these models are frequently at odds with open science standards. This article introduces the Political Domain Enhanced BERT-based Algorithm for Textual Entailment (DEBATE) language models: Foundation models for zero-shot, few-shot, and supervised classification of political documents. As zero-shot classifiers, the models are designed to be used for common, well-defined tasks, such as topic and opinion classification. When used in this context, the DEBATE models are not only as good as state-of-the-art LLMs at zero-shot classification, but are orders of magnitude more efficient and completely open source. We further demonstrate that the models are effective few-shot learners. With a simple random sample of 10–25 documents, they can outperform supervised classifiers trained on hundreds or thousands of documents and state-of-the-art generative models. Additionally, we release the PolNLI dataset used to train these models—a corpus of over 200,000 political documents with highly accurate labels across over 800 classification tasks.
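Because the DEBATE models are entailment-based classifiers, they can be driven through the standard Hugging Face zero-shot-classification pipeline. The sketch below shows that usage pattern with a generic NLI checkpoint as a stand-in; substitute the released DEBATE checkpoint for the placeholder model name, and note that the document and labels are invented for illustration.

```python
# Sketch of entailment-based zero-shot classification with the Hugging Face
# pipeline. MODEL_NAME is a stand-in NLI checkpoint, not the DEBATE release;
# swap in the released DEBATE model for realistic political-text performance.
from transformers import pipeline

MODEL_NAME = "facebook/bart-large-mnli"  # placeholder NLI model

classifier = pipeline("zero-shot-classification", model=MODEL_NAME)

doc = "The senator argued that the carbon tax would hurt rural communities."
labels = ["supports a carbon tax", "opposes a carbon tax", "unrelated to a carbon tax"]

result = classifier(doc, candidate_labels=labels)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.3f}")
```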
Large language models (LLMs) are increasingly used to address real-world design problems, especially during the design ideation phase. Although LLMs hold substantial promise for concept generation, the understanding of how they can effectively assist designers in enhancing the diversity of design concepts is still limited. In this study, we set up different strategies for prompting the LLM with multiple professional personas for design concept generation, including (1) multiple prompts for concept generation in parallel, each with a professional persona, (2) a single prompt for concept generation with multiple professional personas, and (3) a sequence of prompts for concept generation and update, each with a professional persona. We formulate and test several hypotheses on the effectiveness of different strategies. All hypotheses are tested by constructing professional knowledge bases, selecting design problems and personas, and designing the prompts. The results suggest that LLMs can facilitate the design ideation process and provide more diverse design concepts when they are given multiple prompts in parallel, each with a professional persona, or given a sequence of prompts with multiple professional personas to generate and update design concepts gradually.
This study investigates the English of-NP (noun phrase) evaluation construction (e.g., It’s nice of you to help me plan this wedding), hypothesizing that its constructional meaning encodes socially mediated evaluation and imposes semantic constraints on the NP slot. We adopt a dual methodological approach, combining collostructional analysis to identify lexeme–construction associations with surprisal analysis using a large language model (GPT-2) to assess predictive processing difficulty. The two methods complement each other, capturing both static distributional patterns and dynamic expectancy profiles. Three experimental manipulations were implemented: preposition alternation, variation in NP agentivity, and variation in NP intentionality. Results show that NPs conforming to the hypothesized slot constraints yield lower surprisal values, whereas constraint-violating NPs trigger higher surprisal, aligning with the observed collostructional strengths. These findings provide empirical support for the view that constructional compatibility shapes predictive processing and contributes to integrating Construction Grammar (CxG) with prediction-based models of language processing.
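For readers unfamiliar with the surprisal measure, the following minimal sketch computes token-level GPT-2 surprisal with the transformers library; the tooling choice and the contrasting sentences (a constraint-conforming versus a constraint-violating NP in the of-NP construction) are illustrative assumptions, not the study's stimuli.

```python
# Minimal sketch of token-level surprisal with GPT-2 via the transformers
# library. The contrasting sentences (agentive vs. non-agentive NP in the
# of-NP construction) are illustrative, not the study's stimuli.
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def surprisal(sentence: str):
    """Return (token, surprisal in bits) for each token after the first."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Log-probability of each actual next token given its left context.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    next_ids = ids[0, 1:]
    token_log_probs = log_probs[torch.arange(next_ids.size(0)), next_ids]
    bits = (-token_log_probs / math.log(2)).tolist()
    tokens = tokenizer.convert_ids_to_tokens(next_ids.tolist())
    return list(zip(tokens, bits))

for s in ["It's nice of you to help me.", "It's nice of the table to help me."]:
    print(s, [(t, round(v, 2)) for t, v in surprisal(s)])
```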
The conceptual design of mission-tailored aircraft is increasingly shifting towards system of systems (SoS) perspectives that account for system interactions using a holistic view. Agent-based modelling and simulation (ABMS) is a common approach for analysing an SoS, but the behaviour of its agents tends to be defined by rigid behaviour trees. The present work aims to evaluate the suitability of a prompt-engineered large language model (LLM) acting as the Incident Commander (IC), replacing the fixed behaviour trees that govern the agents’ decisions. The research contributes by developing a prompting framework for operational guidelines, constraints, and priorities to obtain an LLM commander within a wildfire-suppression SoS capable of replicating human decisions. By enabling agents in a simulation model with decision-making capabilities closer to those expected from humans, the commander’s decisions and potential emergent patterns can be translated into more defined requirements for aircraft conceptual design (ACD) (e.g., endurance, payload, sensors, communications, or turnaround requirements). Results showed that an LLM commander facilitated adaptive and context-aware decisions that can be analysed via decision logs. The results allow designers to derive aircraft requirements for their specific roles from operational outcomes rather than a priori assumptions, linking SoS mission needs and ACD parameters.
The exploration and retrieval of information from large, unstructured document collections remain challenging. Unsupervised techniques, such as clustering and topic modeling, provide only a coarse overview of thematic structure, while traditional keyword searches often require extensive manual effort. Recent advances in large language models and retrieval-augmented generation (RAG) introduce new opportunities by enabling focused retrieval of relevant documents or chunks tailored to a user’s query. This allows for dynamic, chat-like interactions that streamline exploration and improve access to pertinent information. This article introduces Topic-RAG, a chat engine that integrates topic modeling with RAG to support interactive and exploratory document retrieval. Topic-RAG uses BERTopic to identify the most relevant topics for a given query and restricts retrieval to documents or chunks within those topics. This targeted strategy enhances retrieval relevance by narrowing the search space to thematically aligned content. We apply the pipeline to 4,711 articles related to nuclear energy from the Impresso historical Swiss newspaper corpus. Our experimental results demonstrate that Topic-RAG outperforms a baseline RAG architecture that does not incorporate topic modeling, as measured by widely recognized metrics, such as BERTScore (including Precision, Recall and F1), ROUGE and UniEval. Topic-RAG also achieves improvements in computational efficiency for both single and batch query processing. In addition, we performed a qualitative analysis in collaboration with domain experts, who assessed the system’s effectiveness in supporting historically grounded research. Although our evaluation focuses on historical newspaper articles, the proposed approach generalizes: it integrates topic information to enhance retrieval performance within a transparent and user-configurable pipeline. It supports the targeted retrieval of contextually rich and semantically relevant content while also allowing users to adjust key parameters such as the number of documents retrieved. This flexibility provides greater control and adaptability to meet diverse research needs in historical inquiry, literary analysis and cultural studies. Due to copyright restrictions, the raw data cannot be publicly shared. Data access instructions are provided in the repository, and the replication code is available on GitHub: https://github.com/KeerthanaMurugaraj/Topic-RAG-for-Historical-Newspapers.
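The full Topic-RAG pipeline is available in the linked repository; the sketch below only illustrates its central topic-filtering idea: fit BERTopic, find the topics most similar to a query, and restrict the retrieval pool to documents assigned to those topics. A public corpus stands in for the restricted Impresso data, and the chunking, generation, and evaluation stages are omitted.

```python
# Sketch of the topic-filtering step: fit BERTopic, find the topics most
# similar to a query, and keep only documents assigned to those topics.
# A public corpus stands in for the restricted Impresso newspaper data.
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

docs = fetch_20newsgroups(
    subset="train", remove=("headers", "footers", "quotes")
).data[:2000]  # stand-in corpus

topic_model = BERTopic(verbose=False)
topics, _ = topic_model.fit_transform(docs)

query = "debates about nuclear energy and power plants"
similar_topics, _ = topic_model.find_topics(query, top_n=3)

# Keep only documents whose assigned topic is among the query-relevant topics.
candidate_pool = [doc for doc, t in zip(docs, topics) if t in similar_topics]
print(f"Retrieval restricted to {len(candidate_pool)} of {len(docs)} documents")
```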
Systematic reviews play a critical role in evidence-based research but are labor-intensive, especially during title and abstract screening. Compact large language models (LLMs) offer potential to automate this process, balancing time/cost requirements and accuracy. The aim of this study is to assess the feasibility, accuracy, and workload reduction achieved by three compact LLMs (GPT-4o mini, Llama 3.1 8B, and Gemma 2 9B) in screening titles and abstracts. Records were sourced from three previously published systematic reviews, and the LLMs were asked to rate each record from 0 to 100 for inclusion, using a structured prompt. Predefined 25-, 50-, and 75-rating thresholds were used to compute performance metrics (balanced accuracy, sensitivity, specificity, positive and negative predictive value, and workload saving). Processing time and costs were registered. Across the systematic reviews, LLMs achieved high sensitivity (up to 100%) and low precision (below 10%) for records included by full text. Specificity and workload savings improved at higher thresholds, with the 50- and 75-rating thresholds offering optimal trade-offs. GPT-4o-mini, accessed via application programming interface, was the fastest model (~40 minutes max.) and had usage costs ($0.14–$1.93 per review). Llama 3.1-8B and Gemma 2-9B were run locally in longer times (~4 hours max.) and were free to use. LLMs were highly sensitive tools for the title/abstract screening process. High specificity values were reached, allowing for significant workload savings, at reasonable costs and processing time. Conversely, we found them to be imprecise. However, high sensitivity and workload reduction are key factors for their usage in the title/abstract screening phase of systematic reviews.
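As a hedged illustration of the rating-based screening setup (the study's structured prompt and inclusion criteria are not reproduced), the sketch below asks GPT-4o mini for a 0 to 100 inclusion rating via the OpenAI API and applies one of the thresholds; the criteria, record, and parsing shown here are placeholders.

```python
# Sketch of rating-based title/abstract screening with GPT-4o mini via the
# OpenAI API. The prompt wording and inclusion criteria are illustrative,
# not the study's structured prompt; the threshold mirrors the 25/50/75 idea.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

INCLUSION_CRITERIA = "Randomized controlled trials of exercise interventions in adults."  # placeholder

def rate_record(title: str, abstract: str) -> int:
    """Ask the model for a 0-100 inclusion rating and parse it as an integer."""
    prompt = (
        f"Inclusion criteria: {INCLUSION_CRITERIA}\n"
        f"Title: {title}\nAbstract: {abstract}\n"
        "Rate from 0 to 100 how likely this record is to meet the criteria. "
        "Answer with a single integer only."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())

THRESHOLD = 50  # records rated at or above this value go to full-text review
rating = rate_record("A randomized trial of walking programs", "We randomized 200 adults ...")
print(rating, "-> include" if rating >= THRESHOLD else "-> exclude")
```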
The use of large language models (LLMs) has exploded since November 2022, but there is sparse evidence regarding LLM use in health, medical, and research contexts. We aimed to summarise the current uses of and attitudes towards LLMs across our campus’ clinical, research, and teaching sites. We administered a survey about LLM uses and attitudes. We conducted summary quantitative analysis and inductive qualitative analysis of free text responses. In August–September 2023, we circulated the survey amongst all staff and students across our three campus sites (approximately n = 7500), comprising a paediatric academic hospital, research institute, and paediatric university department. We received 281 anonymous survey responses. We asked about participants’ knowledge of LLMs, their current use of LLMs in professional or learning contexts, and perspectives on possible future uses, opportunities, and risks of LLM use. Over 90% of respondents have heard of LLM tools and about two-thirds have used them in their work on our campus. Respondents reported using LLMs for various uses, including generating or editing text and exploring ideas. Many, but not necessarily all, respondents seem aware of the limitations and potential risks of LLMs, including privacy and security risks. Various respondents expressed enthusiasm about the opportunities of LLM use, including increased efficiency. Our findings show LLM tools are already widely used on our campus. Guidelines and governance are needed to keep up with practice. Insights from this survey were used to develop recommendations for the use of LLMs on our campus.
This chapter explores the potential and limitations of AI in the legal field, with a focus on its application in legal research through tools like Lexis+ AI. It critically evaluates Lexis+ AI’s capability in case retrieval, a crucial function for legal professionals who rely on accurate and comprehensive legal sources to inform their work. The study provides an empirical analysis of Lexis+ AI’s performance on cryptocurrency-related legal queries, revealing that while the tool can generate accurate responses, it often falls short in terms of relevance and completeness. This chapter concludes by discussing the implications for legal professionals and legal tech companies, emphasizing the need for ongoing refinement of AI technologies, the importance of keeping legal professionals involved in decision-making processes, and the necessity of further collaboration between the legal and tech sectors.
This study investigates unintended information flow in large language models (LLMs) by proposing a computational linguistic framework for detecting and analyzing domain anchorage. Domain anchorage is a phenomenon potentially caused by in-context learning or latent “cache” retention of prior inputs, which enables language models to infer and reinforce shared latent concepts across interactions, leading to uniformity in responses that can persist across distinct users or prompts. Using GPT-4 as a case study, our framework systematically quantifies the lexical, syntactic, semantic, and positional similarities between inputs and outputs to detect these domain anchorage effects. We introduce a structured methodology to evaluate the associated risks and highlight the need for robust mitigation strategies. By leveraging domain-aware analysis, this work provides a scalable framework for monitoring information persistence in LLMs, which can inform enterprise guardrails to ensure response consistency, privacy, and safety in real-world deployments.
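The framework described above quantifies lexical, syntactic, semantic, and positional similarity; as a minimal, assumption-laden illustration, the sketch below computes just two of those signals, word-level Jaccard overlap and embedding cosine similarity between a prompt and a response. The sentence-transformers tooling and the example texts are assumptions, not the authors' implementation.

```python
# Sketch of two of the similarity signals the framework describes: lexical
# overlap (Jaccard) and embedding-based semantic similarity between a prompt
# and a response. Syntactic and positional measures are omitted; the texts
# and tooling are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

def jaccard(a: str, b: str) -> float:
    """Lexical overlap between two texts on lowercased word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

model = SentenceTransformer("all-MiniLM-L6-v2")

prompt = "Summarize the quarterly revenue outlook for the energy sector."
response = "The energy sector's quarterly revenue outlook remains cautiously positive."

semantic = util.cos_sim(model.encode(prompt), model.encode(response)).item()
print(f"lexical (Jaccard): {jaccard(prompt, response):.3f}")
print(f"semantic (cosine): {semantic:.3f}")
```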
Advances in natural language processing (NLP) and Big Data techniques have allowed us to learn about the human mind through one of its richest outputs – language. In this chapter, we introduce the field of computational linguistics and go through examples of how to find natural language data and how to interpret the complexities present within it. The chapter discusses the major state-of-the-art methods being applied in NLP and how they can be applied to psychological questions, including statistical learning, N-gram models, word embedding models, large language models, topic modeling, and sentiment analysis. The chapter concludes with ethical discussions on the proliferation of chat “bots” that pervade our social networks, and the importance of balanced training sets for NLP models.
In the realm of data-to-text generation tasks, the use of large language models (LLMs) has become common practice, yielding fluent and coherent outputs. Existing literature highlights that the quality of in-context examples significantly influences the empirical performance of these models, making the efficient selection of high-quality examples crucial. We hypothesize that the quality of these examples is primarily determined by two properties: their similarity to the input data and their diversity from one another. Based on this insight, we introduce a novel approach, Double Clustering-based In-Context Example Selection, specifically designed for data-to-text generation tasks. Our method involves two distinct clustering stages. The first stage aims to maximize the similarity between the in-context examples and the input data. The second stage ensures diversity among the selected in-context examples. Additionally, we have developed a batched generation method to enhance the token usage efficiency of LLMs. Experimental results demonstrate that, compared to traditional methods of selecting in-context learning samples, our approach significantly improves both time efficiency and token utilization while maintaining accuracy.
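The abstract does not spell out the exact Double Clustering procedure, so the following is only one plausible reading of the two-stage idea: stage one keeps candidate examples from the cluster closest to the input (similarity), and stage two re-clusters the survivors and takes one representative per cluster (diversity). The embeddings, data records, and parameters are illustrative.

```python
# Hedged sketch of a two-stage (similarity, then diversity) selection of
# in-context examples. This is one plausible reading of the idea, not the
# paper's exact algorithm; examples and parameters are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

candidates = [
    "name: Blue Spice | food: Italian | area: riverside",
    "name: The Phoenix | food: Chinese | priceRange: cheap",
    "name: Aromi | food: French | customer rating: high",
    "name: Giraffe | food: Fast food | familyFriendly: yes",
    "name: Zizzi | food: Italian | area: city centre",
    "name: Wildwood | food: Japanese | priceRange: moderate",
]
input_data = "name: Strada | food: Italian | area: riverside"

cand_emb = encoder.encode(candidates)
inp_emb = encoder.encode([input_data])

# Stage 1 (similarity): cluster candidates, keep those in the cluster whose
# centroid lies closest to the input embedding.
km1 = KMeans(n_clusters=3, n_init=10, random_state=0).fit(cand_emb)
closest = np.argmin(np.linalg.norm(km1.cluster_centers_ - inp_emb, axis=1))
pool_idx = [i for i, c in enumerate(km1.labels_) if c == closest]

# Stage 2 (diversity): re-cluster the pool and take the candidate nearest each
# centroid, so the final examples resemble the input but not each other.
k2 = min(2, len(pool_idx))
km2 = KMeans(n_clusters=k2, n_init=10, random_state=0).fit(cand_emb[pool_idx])
selected = []
for c in range(k2):
    members = [i for i, lab in zip(pool_idx, km2.labels_) if lab == c]
    dists = [np.linalg.norm(cand_emb[i] - km2.cluster_centers_[c]) for i in members]
    selected.append(candidates[members[int(np.argmin(dists))]])

print(selected)
```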
Everyone is talking about bots. Much of the discussion has focused on downsides. It is too easy to use bots to cheat, but there are also many ways to use bots to improve your writing. Good writers use thesauruses. It is not cheating to use bots as a modern version of a thesaurus. It is also not cheating to use recommendation systems in a responsible way.
A critical step in systematic reviews involves the definition of a search strategy, with keywords and Boolean logic, to filter electronic databases. We hypothesize that it is possible to screen articles in electronic databases using large language models (LLMs) as an alternative to search equations. To investigate this matter, we compared two methods to identify randomized controlled trials (RCTs) in electronic databases: filtering databases using the Cochrane highly sensitive search and an assessment by an LLM.
We retrieved studies indexed in PubMed with a publication date between September 1 and September 30, 2024, using the sole keyword “diabetes.” We compared the performance of the Cochrane highly sensitive search and the assessment of all titles and abstracts extracted directly from the database by GPT-4o-mini to identify RCTs. The reference standard was manual screening of retrieved articles by two independent reviewers.
The search retrieved 6377 records, of which 210 (3.5%) were primary reports of RCTs. The Cochrane highly sensitive search filtered 2197 records and missed one RCT (sensitivity 99.5%, 95% CI 97.4% to 100%; specificity 67.8%, 95% CI 66.6% to 68.9%). Assessment of all titles and abstracts from the electronic database by GPT filtered 1080 records and included all 210 primary reports of RCTs (sensitivity 100%, 95% CI 98.3% to 100%; specificity 85.9%, 95% CI 85.0% to 86.8%).
LLMs can screen all articles in electronic databases to identify RCTs as an alternative to the Cochrane highly sensitive search. This calls for the evaluation of LLMs as an alternative to rigid search strategies.
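The reported sensitivity and specificity figures can be reproduced from the counts given above (6377 records, 210 RCTs; the Cochrane filter kept 2197 records and missed one RCT; the GPT-4o-mini assessment kept 1080 records and missed none), as the short check below shows.

```python
# Reproducing the reported sensitivity/specificity from the counts in the text:
# 6377 records, 210 RCTs; Cochrane kept 2197 records and missed 1 RCT;
# GPT-4o-mini kept 1080 records and missed none.
def sens_spec(kept_total, kept_rcts, total=6377, rcts=210):
    non_rcts = total - rcts
    kept_non_rcts = kept_total - kept_rcts
    sensitivity = kept_rcts / rcts
    specificity = (non_rcts - kept_non_rcts) / non_rcts
    return sensitivity, specificity

for name, kept_total, kept_rcts in [("Cochrane", 2197, 209), ("GPT-4o-mini", 1080, 210)]:
    sens, spec = sens_spec(kept_total, kept_rcts)
    print(f"{name}: sensitivity {sens:.1%}, specificity {spec:.1%}")
# -> Cochrane: sensitivity 99.5%, specificity 67.8%
# -> GPT-4o-mini: sensitivity 100.0%, specificity 85.9%
```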
Informal caregivers, such as family members or friends, provide much of the care for people with physical or cognitive impairment. To address challenges in care, caregivers often seek information online via social media platforms for their health information wants (HIWs), the types of care-related information that caregivers wish to have. Some efforts have been made to use Artificial Intelligence (AI) to understand caregivers’ information behaviors on social media. In this chapter, we present achievements of research with a human–AI collaboration approach in identifying caregivers’ HIWs, focusing on dementia caregivers as one example. Through this collaboration, AI techniques such as large language models (LLMs) can be used to extract health-related domain knowledge for building classification models, while human experts can benefit from the help of AI to further understand caregivers’ HIWs. Our approach has implications for the caregiving of various groups. The outcomes of human–AI collaboration can provide smart interventions to help caregivers and patients.
Cadastral data reveal key information about the historical organization of cities but are often non-standardized due to diverse formats and human annotations, complicating large-scale analysis. As a case study, we explore Venice’s urban history during the critical period from 1740 to 1808, capturing the transition following the fall of the ancient Republic and the Ancien Régime. This era’s complex cadastral data, marked by its volume and lack of uniform structure, presents unique challenges that our approach adeptly navigates, enabling us to generate spatial queries that bridge past and present urban landscapes. We present a text-to-programs framework that leverages large language models to process natural language queries as executable code for analyzing historical cadastral records. Our methodology implements two complementary techniques: a SQL agent for handling structured queries about specific cadastral information, and a coding agent for complex analytical operations requiring custom data manipulation. We propose a taxonomy that classifies historical research questions based on their complexity and analytical requirements, mapping them to the most appropriate technical approach. This framework is supported by an investigation into the execution consistency of the system, alongside a qualitative analysis of the answers it produces. By ensuring interpretability and minimizing hallucination through verifiable program outputs, we demonstrate the system’s effectiveness in reconstructing past population information, property features and spatiotemporal comparisons in Venice.
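The article's SQL and coding agents are not reproduced here; the sketch below only illustrates the text-to-SQL step under stated assumptions, with a hypothetical cadastral schema, a placeholder model choice, and an in-memory SQLite database standing in for the historical records.

```python
# Hedged sketch of the text-to-SQL step: an LLM translates a question into a
# SQLite query over a cadastral table, and the query is executed so the output
# stays verifiable. Schema, column names, prompt, and model are hypothetical.
import sqlite3

from openai import OpenAI

SCHEMA = """CREATE TABLE parcels (
    parcel_id INTEGER PRIMARY KEY,
    parish TEXT,
    owner_name TEXT,
    property_function TEXT,  -- e.g. 'casa', 'bottega'
    rent_ducats REAL,
    survey_year INTEGER      -- e.g. 1740 or 1808
);"""

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def question_to_sql(question: str) -> str:
    """Ask the model for a single SQLite SELECT statement answering the question."""
    prompt = (
        f"Database schema:\n{SCHEMA}\n\n"
        f"Question: {question}\n"
        "Reply with one SQLite SELECT statement and nothing else."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    text = response.choices[0].message.content.strip()
    return text.strip("`").removeprefix("sql").strip()  # drop a possible code fence

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)  # empty toy table; real records would be loaded here

sql = question_to_sql("How many workshops ('bottega') were recorded per parish in 1740?")
print(sql)
print(conn.execute(sql).fetchall())  # verifiable output, even if empty here
```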
This article explores the potential of large language models (LLMs), particularly through the use of contextualized word embeddings, to trace the evolution of scientific concepts. It thus aims to extend the potential of LLMs, currently transforming much of humanities research, to the specialized field of history and philosophy of science. Using the concept of the virtual particle – a fundamental idea in understanding elementary particle interactions – as a case study, we domain-adapted a pretrained Bidirectional Encoder Representations from Transformers (BERT) model on nearly a century of Physical Review publications. By employing semantic change detection techniques, we examined shifts in the meaning and usage of the term “virtual.” Our analysis reveals that the dominant meaning of “virtual” stabilized after the 1950s, aligning with the formalization of the virtual particle concept, while the polysemy of “virtual” continued to grow. Augmenting these findings with dependency parsing and qualitative analysis, we identify pivotal historical transitions in the term’s usage. In a broader methodological discussion, we address challenges such as the complex relationship between words and concepts, the influence of historical and linguistic biases in datasets, and the exclusion of mathematical formulas from text-based approaches.
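As a minimal sketch of the underlying technique (not the study's domain-adapted model or its Physical Review corpus), the code below extracts contextual embeddings of the token "virtual" from stock bert-base-uncased and compares mean representations across two invented sentences standing in for earlier and later usages.

```python
# Sketch of the underlying technique: extract contextual embeddings of
# "virtual" with a stock BERT model and compare mean representations across
# two sets of sentences. The sentences are invented; the study's domain-adapted
# model and corpus are not reproduced here.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_target(sentence: str, target: str = "virtual") -> torch.Tensor:
    """Mean last-layer embedding of the subword tokens belonging to `target`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    target_ids = set(tokenizer(target, add_special_tokens=False).input_ids)
    positions = [i for i, tid in enumerate(enc.input_ids[0].tolist()) if tid in target_ids]
    return hidden[positions].mean(dim=0)

early = ["The principle of virtual displacement applies to the constrained system."]
late = ["The electron emits and reabsorbs a virtual photon in this diagram."]

early_vec = torch.stack([embed_target(s) for s in early]).mean(dim=0)
late_vec = torch.stack([embed_target(s) for s in late]).mean(dim=0)
print(f"Cosine similarity across usages: "
      f"{torch.cosine_similarity(early_vec, late_vec, dim=0).item():.3f}")
```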