Neural Machine Translation (NMT), a subfield of Natural Language Processing, has advanced significantly with the emergence of transformer architectures and generative artificial intelligence, demonstrating remarkable performance across many languages. However, translating Arabic dialects remains a notable challenge, primarily due to their morphological complexity and divergence from standardised grammatical rules. In this paper, we present a hybrid approach for translating the Maghrebi dialects into and from Modern Standard Arabic (MSA). The approach combines the strengths of the transformer architecture and the BERT language model, using BERT for transfer learning of representations. To achieve this, we incorporated BERT embeddings into the encoder and decoder stacks of the transformer architecture. The BERT model we used was trained in a self-supervised manner on Maghrebi dialect and Arabic corpora. On the raw data, the approach achieved BLEU/BERTScore/ChrF/METEOR scores of 14.148/79.414/28.885/28.428 in one translation direction and BLEU/ChrF/METEOR scores of 8.961/20.994/19.465 in the other, demonstrating competitive performance compared with the ChatGPT and Gemini Large Language Models (LLMs). Furthermore, we evaluated the approach in an ablation study against a fine-tuned NLLB-200 model and against three tokeniser configurations used in conjunction with the transformer architecture: a Byte-Pair Encoding (BPE) tokeniser, a WordPiece tokeniser, and the BERT tokeniser. Both evaluations, together with a human evaluation, confirm the efficacy of our method.
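ChrF, one of the reported metrics, scores a translation by character n-gram overlap, which suits morphologically rich dialects where word-level matches are sparse. The following is a minimal, simplified sentence-level sketch, assuming character n-grams up to order 6, spaces removed, and an F-score with beta = 2 (recall weighted more heavily). Standard toolkits such as sacreBLEU differ in details, so this sketch will not reproduce the reported values.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    # chrF conventionally ignores spaces when extracting character n-grams
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale: average character n-gram
    precision and recall over orders 1..max_n, then combine them with F-beta."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    # F-beta with beta = 2 weights recall more heavily than precision
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)
```

Because matching happens at the character level, a hypothesis that gets a stem right but an affix wrong still earns partial credit, unlike exact word matching in BLEU.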
Building energy management (BEM) tasks require processing and learning from a variety of time-series data. Existing solutions rely on bespoke task- and data-specific models to perform these tasks, limiting their broader applicability. Inspired by the transformative success of Large Language Models (LLMs), Time-Series Foundation Models (TSFMs), trained on diverse datasets, have the potential to change this. Were TSFMs to achieve a level of generalizability across tasks and contexts akin to that of LLMs, they could fundamentally address the scalability challenges pervasive in BEM. To understand where they stand today, we evaluate TSFMs across four dimensions: (1) generalizability in zero-shot univariate forecasting, (2) forecasting with covariates for thermal behavior modeling, (3) zero-shot representation learning for classification tasks, and (4) robustness to performance metrics and varying operational conditions. Our results reveal that TSFMs exhibit limited generalizability, performing only marginally better than statistical models on unseen datasets and modalities for univariate forecasting. Similarly, including covariates in TSFMs does not improve their performance, which remains inferior to that of conventional models that use covariates. While TSFMs generate effective zero-shot representations for downstream classification tasks, they may remain inferior to statistical models in forecasting when the statistical models are fitted at test time. Moreover, TSFMs’ forecasting performance is sensitive to the choice of evaluation metric, and they struggle in more complex building environments compared to statistical models. These findings underscore the need for targeted advancements in TSFM design, particularly in handling covariates and incorporating context and temporal dynamics into prediction mechanisms, to develop more adaptable and scalable solutions for BEM.
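To see why forecasting rankings can depend on the evaluation metric, compare a scale-dependent error (MAE) with a scale-free one (mean absolute scaled error, MASE) against a seasonal naive baseline, a typical "test-time fitted" statistical model. This is a minimal sketch; the abstract does not say which metrics the study used, and the series and season length are made up for illustration.

```python
def seasonal_naive(history, horizon, season):
    """Forecast by repeating the last observed season (a standard statistical baseline)."""
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]


def mae(actual, forecast):
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, history, season):
    """Mean absolute scaled error: forecast MAE divided by the in-sample MAE of
    the seasonal naive method on the training history (scale-free)."""
    scale = sum(abs(history[t] - history[t - season])
                for t in range(season, len(history))) / (len(history) - season)
    return mae(actual, forecast) / scale


# Hypothetical load series with a season length of 3 steps
history = [1, 2, 3, 1, 2, 3, 1, 2, 4]
actual = [1, 2, 3]
forecast = seasonal_naive(history, horizon=3, season=3)
print(forecast, mae(actual, forecast), mase(actual, forecast, history, season=3))
```

The same absolute error counts for more in MASE when the building's history is very regular (small seasonal-naive scale), so a model can rank well on MAE across buildings yet poorly on MASE, or vice versa.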
Inter-party communication is crucial in representative democracies, facilitating information exchange and dialogue among political parties. Despite its importance, research on this topic remains limited due to a lack of conceptual clarity and the challenges of large-scale measurement. This article offers a comprehensive definition of inter-party communication as public communication by parties about other parties, with a positive, neutral, or negative stance, focusing on collaboration, policy, or personal issues. To measure this phenomenon effectively, we introduce a novel transformer-based approach capable of automatically classifying large volumes of text. Case studies on coalition signals in Germany and negative campaigning in Austria demonstrate its effectiveness. The study deepens our understanding of party competition, advances methods of automated text classification, and enables new research on political communication.
This chapter offers a comprehensive overview of large language models (LLMs), examining their theoretical foundations, core mechanisms, and broad-ranging implications. We begin by situating LLMs within the domain of natural language processing (NLP), tracing the evolution of language modeling from early statistical approaches to modern deep learning methods.
The focus then shifts to the transformative impact of the Transformer architecture, introduced in the seminal paper "Attention Is All You Need". By leveraging self-attention and parallel computation, Transformers have enabled unprecedented scalability and efficiency in training large models.
We explore the pivotal role of transfer learning in NLP, emphasizing how pretraining on large text corpora followed by task-specific fine-tuning allows LLMs to generalize across a wide range of linguistic tasks. The chapter also discusses reinforcement learning from human feedback (RLHF), a crucial technique for refining model outputs to better align with human preferences and values.
Key theoretical developments are introduced, including scaling laws, which describe how model performance improves predictably with increased data, parameters, and compute resources, and emergence, the surprising appearance of complex behaviors in sufficiently large models.
Beyond technical aspects, the chapter engages with deeper conceptual questions: Do LLMs genuinely "understand" language? Could advanced AI systems one day exhibit a form of consciousness, however rudimentary or speculative? These discussions draw on perspectives from cognitive science, philosophy of mind, and AI safety.
Finally, we explore future directions in the field, including the application of Transformer architectures beyond NLP and the development of generative methods that extend beyond Transformer-based models, signaling a dynamic and rapidly evolving landscape in artificial intelligence.
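The self-attention mechanism at the heart of the Transformer computes softmax(Q K^T / sqrt(d_k)) V, where queries, keys, and values are linear projections of the same input sequence. The following is a minimal single-head sketch in plain Python (no batching, masking, or multi-head splitting), intended only to show why every position can be processed in parallel.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(vectors, W):
    """Multiply each row vector by the matrix W (len(v) x len(W[0]))."""
    return [[sum(v[i] * W[i][j] for i in range(len(v))) for j in range(len(W[0]))]
            for v in vectors]


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention:
    softmax(Q K^T / sqrt(d_k)) V, with Q = X Wq, K = X Wk, V = X Wv."""
    Q, K, V = project(X, Wq), project(X, Wk), project(X, Wv)
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Each position scores every position; rows are independent, hence parallelisable.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[t] * V[t][j] for t in range(len(V))) for j in range(len(V[0]))])
    return outputs, weights
```

Unlike a recurrent network, no output row depends on another row's result, which is what allows Transformers to exploit parallel hardware during training.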
In machine learning-based mortality models, interpretation methods are well established, and they can reveal structures resembling the age or time effects in traditional mortality models. In the reverse direction, however, using such traditional components to guide the initialization of a neural network remains highly challenging due to information loss during model interpretation. This study addresses this gap by exploring how components from pre-fitted traditional mortality models can be used to initialize neural networks, enabling structural information to be incorporated into a deep learning framework. We introduce Kolmogorov–Arnold Networks (KANs) to mortality modeling and first construct two shallow models, KAN[2,1] and ARIMAKAN, to examine their applicability. We then extend the Combined Actuarial Neural Network (CANN) into a KAN-based Actuarial Neural Network (KANN), in which classical model components calibrated via generalized nonlinear models or generalized additive models are naturally used for initialization. Three KANN variants, namely KANN[2,1], KANNLC, and KANNAPC, are proposed. In these models, the neural networks improve the accuracy of the traditional models and refine the original parameter estimates. All KANN-based models can also produce smooth mortality curves, as well as smooth age, period, and cohort effects, through simple regularization. Experiments on 34 populations demonstrate that KAN-based approaches achieve stable performance while balancing interpretability, smoothness, and predictive accuracy.
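For reference, the following are the standard forms of the two classical models that the names KANNLC and KANNAPC appear to refer to: Lee–Carter (LC) and age-period-cohort (APC). Here $m_{x,t}$ is the central death rate at age $x$ in year $t$. This mapping is our reading of the model names. The last line is only a schematic of the CANN idea, in which the classical predictor enters as a skip connection and the network learns a correction. The abstract does not specify exactly how the KANN variants use these components for initialization.

```latex
% Lee–Carter: age pattern \alpha_x, age sensitivity \beta_x, period index \kappa_t
\log m_{x,t} = \alpha_x + \beta_x \kappa_t
% Age-period-cohort: adds a cohort effect \gamma indexed by birth year c = t - x
\log m_{x,t} = \alpha_x + \kappa_t + \gamma_{t-x}
% CANN-style combination (schematic): a correction initialised at zero
% makes the combined model start exactly at the classical fit
\log \hat m_{x,t} = \log \hat m^{\mathrm{classical}}_{x,t} + \mathrm{NN}_\theta(x, t)
```

Starting the network at the classical fit means training only has to learn departures from the LC or APC structure, rather than rediscovering the age, period, and cohort effects from scratch.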
Underwater robots conducting inspections require autonomous obstacle avoidance capabilities to ensure safe operation. Reinforcement learning (RL)-based training can effectively develop autonomous obstacle avoidance strategies for underwater robots; however, training in real environments carries significant risks and can easily damage the robot. This paper proposes a Sim-to-Real pipeline for RL-based training of autonomous obstacle avoidance in underwater robots, addressing the challenges of training and deploying RL methods in this context. We build a simulation model and training environment based on the robot's mathematical model, comprehensively reducing the gap between simulation and reality in terms of system inputs, modeling, and outputs. Experimental results demonstrate that our high-fidelity simulation system effectively facilitates the training of autonomous obstacle avoidance algorithms, achieving a 94% obstacle avoidance success rate and collision-free operation exceeding 5000 steps in virtual environments. When transferred directly to a real robot, the trained policy successfully avoided obstacles in pool experiments, validating the effectiveness of our method for autonomous strategy training and sim-to-real transfer in underwater robots.
In recent years, passive motion paradigms (PMPs), derived from the equilibrium point hypothesis and impedance control, have been utilised as manipulation methods for humanoid robots and robotic manipulators. These paradigms are typically implemented by creating a kinematic chain that enables the manipulator to perform goal-directed actions without explicitly solving the inverse kinematics. This approach relies on a kinematic model learned by training artificial neural networks, aligning well with principles of cybernetics and cognitive computation by enabling adaptive and flexible control. Specifically, these networks model the relationship between joint angles and end-effector positions, from which the Jacobian matrix can be computed. Although this method does not require an accurate robot model, traditional neural networks often suffer from drawbacks such as overfitting and inefficient training, which can compromise the accuracy of the final PMP model. In this paper, we implement the method using a deep neural network and investigate the impact of activation functions and network depth on the performance of the kinematic model. Additionally, we propose a transfer learning approach to fine-tune the pre-trained model, enabling it to be transferred to other manipulator arms with different kinematic properties. Finally, we implement and evaluate the deep neural network-based PMP on Universal Robots manipulators, comparing it with traditional kinematic controllers and assessing its physical interaction capabilities and accuracy.
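The core PMP relaxation can be sketched as a virtual force field. The goal pulls the end-effector with a force F = K (x_target - x), and the Jacobian transpose maps that force to joint motion, dq = gain * J^T F, so no inverse kinematics is solved. This is a minimal sketch under stated assumptions. An analytic planar 2-link arm (link lengths 1.0) stands in for the trained network, stiffness and admittance are set to identity (hence the single `gain`), and the Jacobian is taken by finite differences so that any black-box learned model could be substituted.

```python
import math


def forward_model(q, l1=1.0, l2=1.0):
    """Stand-in for the trained network: planar 2-link arm, joint angles -> (x, y)."""
    q1, q2 = q
    return (l1 * math.cos(q1) + l2 * math.cos(q1 + q2),
            l1 * math.sin(q1) + l2 * math.sin(q1 + q2))


def numerical_jacobian(model, q, h=1e-6):
    """2x2 Jacobian of a black-box model by central differences."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        qp, qm = list(q), list(q)
        qp[j] += h
        qm[j] -= h
        fp, fm = model(qp), model(qm)
        for i in range(2):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


def pmp_relax(model, q0, target, gain=0.1, steps=5000):
    """Passive-motion relaxation: the goal exerts F = (target - x) on the end-effector,
    which is mapped to joint motion via dq = gain * J^T F (no inverse kinematics)."""
    q = list(q0)
    for _ in range(steps):
        x = model(q)
        force = (target[0] - x[0], target[1] - x[1])
        J = numerical_jacobian(model, q)
        for j in range(2):
            q[j] += gain * (J[0][j] * force[0] + J[1][j] * force[1])
    return q


q_final = pmp_relax(forward_model, (0.3, 0.5), (1.0, 1.0))
```

Because only forward predictions and their derivatives are needed, swapping `forward_model` for a network fine-tuned on a different arm leaves the controller unchanged. This is what makes transfer learning of the kinematic model attractive.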
Processing and extracting actionable information, such as fault or anomaly indicators, from vibration telemetry is both challenging and critical for accurately assessing mechanical system health and enabling subsequent predictive maintenance. In predictive maintenance for populations of similar assets, the knowledge gained from any single asset should be leveraged to improve predictions across the entire population. This paper presents a novel transfer learning approach to population-level health monitoring. The new methodology is applied to monitor multiple rotating plant assets in a power generation scenario. The focus is on detecting statistical anomalies in telemetry time series as a means of identifying deviations from the typical operating regime. This is a challenging task because each machine is observed under different operating regimes. The proposed methodology effectively transfers information across assets, automatically identifying segments with common statistical characteristics and using them to enrich the training of local supervised learning models. The proposed solution substantially reduces the mean squared error relative to a baseline model.
Prediction of dynamic environmental variables at unmonitored sites remains a long-standing challenge for water resources science. Most of the world’s freshwater resources lack adequate monitoring of the critical environmental variables needed for management. Yet the need for widespread predictions of hydrological variables such as river flow and water quality has become increasingly urgent due to climate and land use change over the past decades and their associated impacts on water resources. Modern machine learning methods increasingly outperform their process-based and empirical model counterparts for hydrologic time series prediction, owing to their ability to extract information from large, diverse data sets. We review relevant state-of-the-art applications of machine learning for streamflow, water quality, and other water resources prediction and discuss opportunities to improve the use of machine learning with emerging methods for incorporating watershed characteristics and process knowledge into classical, deep learning, and transfer learning methodologies. The analysis here suggests that most prior efforts have focused on deep learning frameworks built on many sites for predictions at daily time scales in the United States, but that comparisons between different classes of machine learning methods are few and inadequate. We identify several open questions for time series prediction at unmonitored sites, including incorporating dynamic inputs and site characteristics, mechanistic understanding and spatial context, and explainable AI techniques in modern machine learning frameworks.
Surrogate models of turbulent diffusion flames could play a strategic role in the design of liquid rocket engine combustion chambers. The present article introduces a method to obtain data-driven surrogate models for coaxial injectors by applying an inductive transfer learning strategy to a U-Net, using available multifidelity Large Eddy Simulation (LES) data. The resulting models preserve reasonable accuracy while reducing the offline computational cost of data generation. First, a database of about 100 low-fidelity LES of shear-coaxial injectors, operating with gaseous oxygen and gaseous methane as propellants, was created. The design of experiments explores three variables: the chamber radius, the recess length of the oxidizer post, and the mixture ratio. Subsequently, U-Nets were trained on this dataset to provide reasonable approximations of the time-averaged two-dimensional flow field. Although neural networks are efficient nonlinear data emulators, in purely data-driven approaches their quality is directly limited by the precision of the data they are trained on. Thus, a high-fidelity (HF) dataset of about 10 simulations was created, at a much greater cost per sample. Combining low-fidelity and HF data during the transfer learning process improves the surrogate model’s fidelity without excessive additional cost.
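The logic of inductive transfer across fidelities can be seen in a toy setting. The model is first pretrained on many cheap but biased low-fidelity samples. Most of it is then frozen, and only a small part is refit on a handful of expensive high-fidelity samples. In this minimal sketch, a straight line y = a*x + b is a hypothetical stand-in for the U-Net, the slope plays the role of the frozen pretrained layers, and all data are made up. The abstract does not state which U-Net layers the authors fine-tune.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def fine_tune_offset(a_frozen, xs, ys):
    """Transfer step: keep the pretrained slope frozen and refit only the offset
    on the scarce high-fidelity samples (least squares for b given a)."""
    return sum(y - a_frozen * x for x, y in zip(xs, ys)) / len(xs)


def mse(a, b, xs, ys):
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: ~100 cheap low-fidelity samples with a systematic bias,
# ~10 expensive high-fidelity samples of the "true" response.
lf_x = [i / 10 for i in range(100)]
lf_y = [2.0 * x for x in lf_x]
hf_x = [float(i) for i in range(10)]
hf_y = [2.0 * x + 0.5 for x in hf_x]

a_pre, b_pre = fit_line(lf_x, lf_y)          # pretraining on low-fidelity data
b_ft = fine_tune_offset(a_pre, hf_x, hf_y)   # fine-tuning on high-fidelity data
print(mse(a_pre, b_pre, hf_x, hf_y), mse(a_pre, b_ft, hf_x, hf_y))
```

Freezing most parameters keeps the number of values estimated from the ~10 high-fidelity samples small, which is why so few expensive runs can still correct the low-fidelity bias.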
In practice, nondestructive testing (NDT) procedures tend to treat experiments (and their respective models) as distinct, conducted in isolation, and associated with independent data. In contrast, this work aims to capture the interdependencies between acoustic emission (AE) experiments (as meta-models) and then use the resulting functions to predict the model hyperparameters of previously unobserved systems. We use a Bayesian multilevel approach (similar to deep Gaussian processes) in which a higher-level meta-model captures the inter-task relationships. Our key contribution is showing how knowledge of the experimental campaign can be encoded between tasks as well as within tasks. We present an example of AE time-of-arrival mapping for source localization to illustrate how multilevel models naturally lend themselves to representing aggregate systems in engineering. We constrain the meta-model based on domain knowledge, then use the inter-task functions for transfer learning, predicting hyperparameters for models of previously unobserved experiments (for a specific design).
Transfer learning has been highlighted as a promising framework for increasing the accuracy of data-driven models when data are sparse, by leveraging pretrained knowledge in the training of the target model. The objective of this study is to evaluate whether the number of requisite training samples can be reduced by using various transfer learning models to predict, for example, the chemical source terms of a data-driven reduced-order model (ROM) representing the homogeneous ignition of a hydrogen/air mixture. Principal component analysis is applied to reduce the dimensionality of the composition space of the hydrogen/air mixture. Artificial neural networks (ANNs) are used to regress the reaction rates of the principal components, and a system of ordinary differential equations is then solved. As the number of training samples in the target task decreases, the ROM fails to predict the ignition evolution of the hydrogen/air mixture. Three transfer learning strategies are then applied to train the ANN model with a sparse dataset. The performance of the ROM with a sparse dataset is remarkably enhanced if the training of the ANN model is constrained by a regularization term that controls the degree of knowledge transfer from source to target tasks. To this end, a novel transfer learning method is introduced, Parameter control via Partial Initialization and Regularization (PaPIR), whereby the amount of knowledge transferred is systematically adjusted through the initialization and regularization schemes of the ANN model in the target task.
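A regularization term that "controls the degree of knowledge transfer" can be illustrated with a penalty pulling target parameters toward the source parameters, lam * (theta - theta_source)^2, as in L2-SP-style fine-tuning. This is a swapped-in, well-known technique, not PaPIR's exact formulation, which the abstract does not give. The minimal sketch assumes a one-parameter model y = theta * x so that the minimiser has a closed form: setting the derivative of sum (y - theta*x)^2 + lam*(theta - theta_source)^2 to zero gives theta = (sum x*y + lam*theta_source) / (sum x^2 + lam).

```python
def fit_with_transfer_penalty(xs, ys, theta_source, lam):
    """Closed-form minimiser of  sum (y - theta*x)^2 + lam * (theta - theta_source)^2
    for the one-parameter model y = theta*x.

    lam = 0 ignores the source task (plain least squares on target data);
    lam -> infinity keeps the source parameter unchanged (full transfer).
    In iterative training, theta would also be initialised at theta_source."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (sxy + lam * theta_source) / (sxx + lam)


# Hypothetical sparse target data (two samples) and a source-task parameter
xs, ys, theta_src = [1.0, 2.0], [3.0, 6.0], 1.0
for lam in (0.0, 5.0, 1e12):
    print(lam, fit_with_transfer_penalty(xs, ys, theta_src, lam))
```

Sweeping `lam` traces a continuum between trusting only the scarce target data and copying the source model, which is the kind of knob the abstract describes.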
With global wind energy capacity ramping up, accurately predicting damage equivalent loads (DELs) and fatigue across wind turbine populations is critical, not only for ensuring the longevity of existing wind farms but also for designing new farms. However, estimating such quantities of interest is hampered by the inherent complexity of modeling critical underlying processes, such as the aerodynamic wake interactions between turbines that increase mechanical stress and reduce useful lifetime. While high-fidelity computational fluid dynamics and aeroelastic models can capture these effects, their computational requirements limit real-world usage. Recently, fast machine learning-based surrogates that emulate more complex simulations have emerged as a promising solution. Yet most surrogates are task-specific and lack flexibility for varying turbine layouts and types. This study explores the use of graph neural networks (GNNs) to create a robust, generalizable flow and DEL prediction platform. By representing wind turbine populations as graphs, GNNs effectively capture layout-dependent relational data, allowing extrapolation to novel configurations. We train a GNN surrogate on a large database of PyWake simulations of random wind farm layouts to learn basic wake physics, then fine-tune the model on limited data for a specific unseen layout simulated in HAWC2Farm to obtain accurate adapted predictions. This transfer learning approach circumvents data scarcity limitations and leverages fundamental physics knowledge from the low-resolution source data. The proposed platform aims to match simulator accuracy while enabling efficient adaptation to new higher-fidelity domains, providing a flexible blueprint for wake load forecasting across varying farm configurations.
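Treating a farm as a graph means each turbine is a node and nearby turbines exchange information along edges. The same update rule therefore applies to any layout, however many turbines it has. This is a minimal sketch of one round of mean-aggregation message passing on scalar features. The distance-threshold connectivity rule, the fixed mixing weight, and the feature values are all hypothetical. A real GNN surrogate would use learned weights and edge features such as relative position with respect to the wind direction.

```python
import math


def build_graph(positions, radius):
    """Adjacency lists linking turbines that lie within `radius` metres of each other
    (a hypothetical connectivity rule)."""
    edges = {i: [] for i in range(len(positions))}
    for i, (xi, yi) in enumerate(positions):
        for j, (xj, yj) in enumerate(positions):
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                edges[i].append(j)
    return edges


def message_passing_step(features, edges, self_weight=0.5):
    """One round of mean-aggregation message passing on scalar node features:
    each turbine mixes its own state with the mean state of its neighbours.
    Isolated turbines keep their own state."""
    updated = []
    for i, h in enumerate(features):
        nbrs = edges[i]
        agg = sum(features[j] for j in nbrs) / len(nbrs) if nbrs else h
        updated.append(self_weight * h + (1 - self_weight) * agg)
    return updated


# Hypothetical three-turbine layout (metres) and per-turbine features
positions = [(0.0, 0.0), (500.0, 0.0), (2000.0, 0.0)]
edges = build_graph(positions, radius=600.0)
print(edges, message_passing_step([1.0, 3.0, 5.0], edges))
```

Since nothing in the update depends on the number or arrangement of turbines, weights learned on random PyWake layouts can be reused, and then fine-tuned, on an unseen layout.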
The disassembly of power batteries poses significant challenges due to their complex sources, diverse types, variations in design and manufacturing processes, and diverse service conditions. Human memory capacity and robots' cognitive and understanding capabilities are limited when faced with different dismantling tasks for end-of-life power batteries. Insufficient human-computer interaction capabilities greatly hinder the efficiency of human-robot collaboration (HRC) operations. Existing HRC relies heavily on operator experience, and existing disassembly systems fail to update disassembly strategies in real time when facing new battery varieties. Therefore, this paper proposes an augmented reality-assisted human-robot collaboration (AR-HRC) power battery dismantling system based on transfer learning. It consists of three modules: AR-HRC knowledge modeling, dismantling subgraph similarity assessment, and strategy transfer update. The AR-HRC knowledge modeling module aims to establish an intelligent mapping from tasks to collaborative strategies based on part features. Based on the evaluation of task similarity, the mobility assessment model divides subtasks into similar and dissimilar classes. For similar subtasks, the original dismantling strategy can be applied to the current task. For dissimilar subtasks, however, operators can issue instructions to the AR-HRC system through the human-computer interaction function of AR and develop new collaborative strategies based on actual conditions. Finally, a case study of power battery dismantling is conducted, and the results show that, compared with traditional pre-programmed assembly, this system can improve dismantling efficiency and reduce cognitive burden.
This manuscript introduces deep learning models that simultaneously describe the dynamics of several yield curves. We aim to learn the dependence structure among the different yield curves induced by the globalization of financial markets and exploit it to produce more accurate forecasts. By combining the self-attention mechanism and nonparametric quantile regression, our model generates both point and interval forecasts of future yields. The architecture is designed to avoid quantile crossing issues affecting multiple quantile regression models. Numerical experiments conducted on two different datasets confirm the effectiveness of our approach. Finally, we explore potential extensions and enhancements by incorporating deep ensemble methods and transfer learning mechanisms.
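Quantile regression models are typically trained with the pinball (quantile) loss. A common way to rule out quantile crossing by construction is to predict the lowest quantile directly and each higher quantile as the previous one plus a strictly positive increment, here obtained via softplus. This is a minimal sketch of that generic construction, assuming the network emits a base value plus unconstrained raw increments. The abstract does not state which mechanism the authors use.

```python
import math


def softplus(z):
    # numerically stable log(1 + exp(z)); always > 0
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def monotone_quantiles(base, raw_increments):
    """Map unconstrained network outputs to non-crossing quantiles:
    q_1 = base and q_{k+1} = q_k + softplus(raw_k) > q_k."""
    qs = [base]
    for r in raw_increments:
        qs.append(qs[-1] + softplus(r))
    return qs


def pinball_loss(y, q_pred, tau):
    """Quantile (pinball) loss for level tau in (0, 1): under-prediction is
    penalised with weight tau, over-prediction with weight 1 - tau."""
    diff = y - q_pred
    return max(tau * diff, (tau - 1) * diff)


# Hypothetical raw outputs for, e.g., the 5%, 25%, 75%, 95% quantiles of a yield
print(monotone_quantiles(0.0, [-5.0, 0.0, 3.0]))
```

Summing the pinball loss over all quantile levels trains the whole vector jointly, and because every increment is positive, the predicted intervals stay properly nested whatever the raw outputs are.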
Traditionally, electricity distribution networks were designed for unidirectional power flow without the need to accommodate generation installed at the point of use. However, with the increase in Distributed Energy Resources and other Low Carbon Technologies, the role of distribution networks is changing. This shift brings challenges, including the need for intensive metering and more frequent reconfiguration to identify threats from voltage and thermal violations. Mitigating action through reconfiguration is informed by State Estimation, which is especially challenging for low voltage distribution networks, where low observability, non-linear load relationships, and highly unbalanced systems all make accurate state estimates difficult to produce. To counter low observability, this paper proposes a novel transfer learning methodology, based on the concept of conditional online Bayesian transfer, to make forward predictions of bus pseudo-measurements. Day-ahead load forecasts at a fully observed point on the network are adjusted using intraday residuals at other points in the network, providing those points with load forecasts without requiring a complete set of forecast models at all substations. These form pseudo-measurements that then inform the state estimates at future time points. This methodology is demonstrated on both a representative IEEE test network and an actual GB 11 kV feeder network.
Developing an artificial design agent that mimics human design behaviors through the integration of heuristics is pivotal for various purposes, including advancing design automation, fostering human-AI collaboration, and enhancing design education. However, this endeavor requires abundant behavioral data from human designers, which is a challenge because data are scarce for many design problems. One potential solution lies in transferring learned design knowledge from one problem domain to another. This article aims to gather empirical evidence and computationally evaluate the transferability of design knowledge represented at a high level of abstraction across different design problems. Initially, a design agent grounded in reinforcement learning (RL) is developed to emulate human design behaviors. A data-driven reward mechanism, informed by a Markov chain model, is introduced to reinforce prominent sequential design patterns. Subsequently, the design agent transfers the acquired knowledge from a source task to a target task using a problem-agnostic high-level representation. In a case study involving two solar system design problems, one dataset is used to train the design agent to mimic human behaviors, and another to evaluate the transferability of these learned behaviors to a distinct problem. Results demonstrate that the RL-based agent outperforms a baseline first-order Markov chain model both in the source task without knowledge transfer and in the target task with knowledge transfer. However, the model performs comparatively worse in predicting the decisions of low-performing designers, suggesting caution in its application, as it may yield unsatisfactory results when mimicking such behaviors.
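A reward "informed by the Markov chain model" can be sketched as follows. First, estimate first-order transition probabilities P(next action | current action) from human design sequences. Then reward the agent for transitions that human designers commonly make. This is a minimal sketch under stated assumptions. The action names are hypothetical, and using the raw transition probability as the reward is our simplification, not necessarily the authors' exact mechanism.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities P(next | current)
    from observed design-action sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    probs = {}
    for cur, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        probs[cur] = {nxt: c / total for nxt, c in nxt_counts.items()}
    return probs


def markov_reward(probs, prev_action, action):
    """Reward an agent's action by how typical the transition is among human designers;
    transitions never observed in the data earn zero."""
    return probs.get(prev_action, {}).get(action, 0.0)


# Hypothetical action logs from two designers
logs = [["add", "move", "add"], ["add", "move", "delete"]]
probs = transition_probabilities(logs)
print(probs, markov_reward(probs, "move", "add"))
```

Because the reward depends only on abstract action types rather than on problem-specific details, the same kind of high-level representation can, in principle, be carried from a source design problem to a target one.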
Domain adaptation is important in agriculture because each agricultural system has its own characteristics. Applying the same treatment practices (e.g., fertilization) to different systems may not have the desired effect because of those characteristics. Domain adaptation is also an inherent aspect of digital twins. In this work, we examine the potential of transfer learning for domain adaptation in pasture digital twins. We use a synthetic dataset of grassland pasture simulations to pretrain and fine-tune machine learning metamodels for nitrogen response rate prediction. We investigate the outcome in locations with diverse climates and examine how including more weather and agricultural management practice data during the pretraining phase affects the results. We find that transfer learning seems promising for adapting the models to new conditions. Moreover, our experiments show that adding more weather data during the pretraining phase has a small effect on fine-tuned model performance compared with adding more management practices. This finding merits further investigation in future studies.
Building on Chapter 6, this chapter extends the discussion of neural networks to networks with more than one hidden layer. Common structures such as the convolutional neural network (CNN) and the Long Short-Term Memory network (LSTM) are explained, then implemented and trained using Matlab’s Deep Network Designer (DND) App as well as Matlab scripts. Issues such as the vanishing or exploding gradient, normalization, and training strategies are discussed. Concepts that address overfitting and the vanishing or exploding gradient are introduced, including dropout and regularization. Transfer learning is discussed and showcased using Matlab’s DND App.
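Dropout, one of the anti-overfitting techniques mentioned, is simple to state in code. This is a minimal sketch of the common "inverted dropout" formulation in plain Python, not Matlab's implementation. During training, each unit is zeroed with probability p and survivors are rescaled so the expected activation is unchanged. At inference time, activations pass through untouched.

```python
import random


def dropout(activations, p, training=True, rng=None):
    """Inverted dropout.

    During training, each unit is zeroed with probability p and survivors are
    scaled by 1 / (1 - p), so the expected activation is unchanged; at inference
    time the activations pass through untouched.
    """
    if not training or p == 0.0:
        return list(activations)
    draw = rng.random if rng is not None else random.random
    keep = 1.0 - p
    return [a / keep if draw() < keep else 0.0 for a in activations]


# Usage: a seeded generator makes the random mask reproducible
print(dropout([1.0, 2.0, 3.0, 4.0], p=0.5, rng=random.Random(0)))
```

Rescaling during training, rather than at test time, means the trained network can be used for inference without any change to its weights.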
Supervised machine learning is an increasingly popular tool for analyzing large political text corpora. The main disadvantage of supervised machine learning is the need for thousands of manually annotated training data points. This issue is particularly important in the social sciences, where most new research questions require new training data for a new task tailored to the specific research question. This paper analyses how deep transfer learning can help address this challenge by accumulating “prior knowledge” in language models. Models like BERT can learn statistical language patterns through pre-training (“language knowledge”), and reliance on task-specific data can be reduced by training on universal tasks like natural language inference (NLI; “task knowledge”). We demonstrate the benefits of transfer learning on a wide range of eight tasks. Across these eight tasks, our BERT-NLI model fine-tuned on 100 to 2,500 texts performs on average 10.7 to 18.3 percentage points better than classical models without transfer learning. Our study indicates that BERT-NLI fine-tuned on 500 texts achieves similar performance to classical models trained on around 5,000 texts. Moreover, we show that transfer learning works particularly well on imbalanced data. We conclude by discussing limitations of transfer learning and by outlining new opportunities for political science research.
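The "task knowledge" idea reframes any classification task as NLI. The text becomes the premise, each class is verbalised as a hypothesis, and the predicted class is the one whose hypothesis the model judges most entailed. This is a minimal sketch of that framing. The hypothesis templates are hypothetical, and `toy_entailment_prob` is a keyword-overlap placeholder standing in for a real BERT-NLI model, which would first be fine-tuned on the labelled texts reformatted as premise-hypothesis pairs.

```python
def classify_with_nli(text, label_hypotheses, entailment_prob):
    """Frame classification as NLI: pair the text (premise) with one hypothesis
    per class and return the class whose hypothesis is most strongly entailed."""
    scores = {label: entailment_prob(text, hyp) for label, hyp in label_hypotheses.items()}
    return max(scores, key=scores.get), scores


def toy_entailment_prob(premise, hypothesis):
    """Placeholder for a (fine-tuned) BERT-NLI model: the fraction of the
    hypothesis's content words that also occur in the premise."""
    stop = {"the", "text", "is", "about", "a", "an"}
    content = [w.strip(".,").lower() for w in hypothesis.split()]
    content = [w for w in content if w not in stop]
    premise_words = {w.strip(".,").lower() for w in premise.split()}
    return sum(w in premise_words for w in content) / max(len(content), 1)


hypotheses = {  # hypothetical label templates
    "economy": "The text is about the economy.",
    "migration": "The text is about migration.",
}
label, scores = classify_with_nli("We will cut taxes to grow the economy.", hypotheses,
                                  toy_entailment_prob)
print(label, scores)
```

Because every task shares the same entailment format, a model already trained on NLI starts from useful task knowledge, which helps explain why a few hundred annotated texts can go a long way.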