Language models can produce fluent, grammatical text. Nonetheless, some maintain that language models do not really learn language, and that even if they did, this would not be informative for the study of human learning and processing. On the other side, there have been claims that the success of LMs obviates the need for studying linguistic theory and structure. We argue that both extremes are wrong. LMs can contribute to fundamental questions about linguistic structure, language processing, and learning. They force us to rethink arguments and ways of thinking that have been foundational in linguistics. While they do not replace linguistic structure and theory, they serve as model systems and working proofs of concept for gradient, usage-based approaches to language. We offer an optimistic take on the relationship between language models and linguistics.
This chapter covers regression and classification, where the goal is to estimate a quantity of interest (the response) from observed features. In regression, the response is a numerical variable. In classification, it belongs to a finite set of predetermined classes. We begin with a comprehensive description of linear regression and discuss how to leverage it to perform causal inference. Then, we explain under what conditions linear models tend to overfit or to generalize robustly to held-out data. Motivated by the threat of overfitting, we introduce regularization and ridge regression, and discuss sparse regression, where the goal is to fit a linear model that only depends on a small subset of the available features. Then, we introduce two popular linear models for binary and multiclass classification: logistic and softmax regression. At this point, we turn our attention to nonlinear models. First, we present regression and classification trees and explain how to combine them via bagging, random forests, and boosting. Second, we explain how to train neural networks to perform regression and classification. Finally, we discuss how to evaluate classification models.
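As a minimal sketch of the ridge regression this chapter introduces (the toy data and the regularization weight `lam` are illustrative, not from the text), the closed-form estimator can be written in a few lines:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# toy data: 5 samples, 2 features, known weights plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=5)
w = ridge_fit(X, y, lam=0.1)  # shrinks w toward zero as lam grows
```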
The engineering-to-order (ETO) sector, driven by the demands of new energy transition markets, is witnessing rapid innovation, especially in the design of complex systems of turbomachinery components. ETO involves tailoring products to meet specific customer requirements, often posing coordination challenges in integrating engineering and production. Meeting customer demands for short lead times without imposing high price premiums is a key industry challenge. This article explores the application of artificial neural networks as an enabler for design automation to deliver a first tentative optimal design solution in a short period of time with respect to more computationally demanding optimization methods. The research, conducted in collaboration with an energy company operating in the Oil & Gas and energy transition markets, focuses on the design process of reciprocating compressors as a means of study to develop and validate the developed methodology. Three case studies corresponding to as many representative jobs related to reciprocating compressor cylinders have been analyzed. The results indicate that the proposed method performs well within its training boundaries, delivering optimal solutions and providing reasonably accurate predictions for target configurations beyond these boundaries. However, in cases requiring a creative redesign using artificial neural networks may lead to errors that exceed acceptable tolerance levels. In any case, this methodology can significantly assist design engineers in the efficient design of complex systems of components, resulting in reduced operating and lead times.
Automatic precision herbicide application offers significant potential for reducing herbicide use in turfgrass weed management. However, developing accurate and reliable neural network models is crucial for achieving optimal precision weed control. Neural network models reported in previous research have been limited to specific geographic regions, weed species, and turfgrass management practices, restricting their broader applicability. The objective of this research was to evaluate the feasibility of deploying a single, robust model for weed classification across a diverse range of weed species, considering variations in species, ecotypes, densities, and growth stages in bermudagrass turfgrass systems across different regions in both China and the United States. Among the models tested, ResNeXt152 emerged as the top performer, demonstrating strong weed detection capabilities across 24 geographic locations and effectively identifying 14 weed species under varied conditions. Notably, the ResNeXt152 model achieved an F1 score and recall exceeding 0.99 across multiple testing scenarios, with a Matthews correlation coefficient (MCC) value surpassing 0.98, indicating its high effectiveness and reliability. These findings suggest that a single neural network model can reliably detect a wide range of weed species in diverse turf regimes, significantly reducing the costs associated with model training and confirming the feasibility of using one model for precision weed control across different turf settings and broad geographic regions.
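For reference, the metrics reported above can be computed directly from confusion-matrix counts; a small sketch (the counts are made up for illustration) of the F1 score and the Matthews correlation coefficient:

```python
import math

def f1_and_mcc(tp, tn, fp, fn):
    """F1 score and Matthews correlation coefficient from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return f1, mcc

print(f1_and_mcc(tp=990, tn=980, fp=10, fn=5))  # hypothetical counts
```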
Beginning with linear programming and ending with neural network training, this chapter features seven applications of the divide-and-concur approach to solving problems with RRR.
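To make the RRR iteration concrete, here is a generic sketch of the update applied to a toy two-set feasibility problem (this is not one of the chapter's seven applications; the sets and step size are illustrative):

```python
import numpy as np

def rrr_step(x, proj_a, proj_b, beta=0.5):
    """One RRR update: x <- x + beta * (P_B(2 P_A(x) - x) - P_A(x))."""
    pa = proj_a(x)
    return x + beta * (proj_b(2 * pa - x) - pa)

# toy feasibility problem: find a point on both a line and a circle
proj_line = lambda x: np.array([x[0], 1.0])            # project onto y = 1
proj_circle = lambda x: 2.0 * x / np.linalg.norm(x)    # project onto |x| = 2

x = np.array([3.0, -1.0])
for _ in range(200):
    x = rrr_step(x, proj_line, proj_circle)
solution = proj_line(x)  # at a fixed point, P_A(x) lies in both sets
```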
How areas of land are allocated for different uses, such as forests, urban areas, and agriculture, has a large effect on the terrestrial carbon balance and, therefore, climate change. Based on available historical data on land-use changes and a simulation of the associated carbon emissions and removals, a surrogate model can be learned that makes it possible to evaluate the different options available to decision-makers efficiently. An evolutionary search process can then be used to discover effective land-use policies for specific locations. Such a system was built on the Project Resilience platform and evaluated with the Land-Use Harmonization dataset LUH2 and the bookkeeping model BLUE. It generates Pareto fronts that trade off carbon impact and amount of land-use change customized to different locations, thus providing a proof-of-concept tool that is potentially useful for land-use planning.
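As a small illustration of the Pareto-front idea used here (the objective values are hypothetical), non-dominated filtering of candidate policies scored by a surrogate model looks like:

```python
import numpy as np

def pareto_front(points):
    """Return the non-dominated subset when minimizing both objectives,
    here carbon impact and amount of land-use change."""
    pts = np.asarray(points)
    keep = []
    for i, p in enumerate(pts):
        dominated = ((pts <= p).all(axis=1) & (pts < p).any(axis=1)).any()
        if not dominated:
            keep.append(i)
    return pts[keep]

# candidate policies scored (carbon impact, land-use change); made-up numbers
scores = [(3.0, 1.0), (2.0, 2.0), (1.0, 4.0), (2.5, 2.5), (4.0, 0.5)]
front = pareto_front(scores)  # (2.5, 2.5) is dominated by (2.0, 2.0)
```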
Artificial neural networks are increasingly used for geophysical modeling to extract complex nonlinear patterns from geospatial data. However, it is difficult to understand how networks make predictions, limiting trust in the model, debugging capacity, and physical insights. Explainable Artificial Intelligence (XAI) techniques expose how models make predictions, but XAI results may be influenced by correlated features. Geospatial data typically exhibit substantial autocorrelation. With correlated input features, learning methods can produce many networks that achieve very similar performance (e.g., arising from different initializations). Since the networks capture different relationships, their attributions can vary. Correlated features may also cause inaccurate attributions because XAI methods typically evaluate isolated features, whereas networks learn multifeature patterns. Few studies have quantitatively analyzed the influence of correlated features on XAI attributions. We use a benchmark framework of synthetic data with increasingly strong correlation, for which the ground truth attribution is known. For each dataset, we train multiple networks and compare XAI-derived attributions to the ground truth. We show that correlation may dramatically increase the variance of the derived attributions, and investigate the cause of the high variance: is it because different trained networks learn highly different functions, or because XAI methods become less faithful in the presence of correlation? Finally, we show that applying XAI to superpixels, instead of single grid cells, substantially decreases attribution variance. Our study is the first to quantify the effects of strong correlation on XAI, to investigate the reasons that underlie these effects, and to offer a promising way to address them.
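A toy sketch of the superpixel remedy (using synthetic attribution maps, not the paper's benchmark): averaging attributions over blocks of grid cells damps the variance across retrained networks:

```python
import numpy as np

def superpixel_attributions(attr, block):
    """Aggregate a 2-D attribution map over non-overlapping blocks
    ('superpixels'), averaging out per-grid-cell attribution noise."""
    h, w = attr.shape
    return attr[: h - h % block, : w - w % block] \
        .reshape(h // block, block, w // block, block).mean(axis=(1, 3))

# attributions from several retrained networks: shared signal + seed noise
rng = np.random.default_rng(1)
signal = np.outer(np.sin(np.linspace(0, 3, 32)), np.cos(np.linspace(0, 3, 32)))
attrs = np.stack([signal + 0.5 * rng.normal(size=(32, 32)) for _ in range(10)])

pixel_var = attrs.var(axis=0).mean()
super_var = np.stack([superpixel_attributions(a, 4) for a in attrs]).var(axis=0).mean()
# super_var << pixel_var: coarse-graining reduces across-network variance
```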
This paper utilizes neural networks (NNs) for cycle detection in the insurance industry. The efficacy of NNs is compared on simulated data to the standard methods used in the underwriting cycles literature. The results show that NN models perform well in detecting cycles even in the presence of outliers and structural breaks. The methodology is applied to a granular data set of prices per risk profile from the Brazilian insurance industry.
Machine learning has exhibited substantial success in the field of natural language processing (NLP). For example, large language models have empirically proven to be capable of producing text of high complexity and cohesion. However, at the same time, they are prone to inaccuracies and hallucinations. As these systems are increasingly integrated into real-world applications, ensuring their safety and reliability becomes a primary concern. There are safety-critical contexts where such models must be robust to variability or attack and give guarantees over their output. Computer vision pioneered the use of formal verification of neural networks for such scenarios and developed common verification standards and pipelines, leveraging precise formal reasoning about geometric properties of data manifolds. In contrast, NLP verification methods have only recently appeared in the literature. While presenting sophisticated algorithms in their own right, these papers have not yet crystallised into a common methodology. They are often light on the pragmatic issues of NLP verification, and the area remains fragmented. In this paper, we attempt to distil and evaluate general components of an NLP verification pipeline that emerges from the progress in the field to date. Our contributions are twofold. First, we propose a general methodology to analyse the effect of the embedding gap – a problem that refers to the discrepancy between verification of geometric subspaces and the semantic meaning of sentences which the geometric subspaces are supposed to represent. We propose a number of practical NLP methods that can help to quantify the effects of the embedding gap. Second, we give a general method for training and verification of neural networks that leverages a more precise geometric estimation of semantic similarity of sentences in the embedding space and helps to overcome the effects of the embedding gap in practice.
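The geometric reasoning referred to here can be illustrated with interval bound propagation, a standard verification building block (this generic sketch is not the paper's pipeline; the embedding and layer sizes are hypothetical): a box around a sentence embedding is pushed through one ReLU layer to bound the network's outputs:

```python
import numpy as np

def ibp_linear_relu(lo, hi, W, b):
    """Propagate an axis-aligned box [lo, hi] through y = relu(W x + b),
    giving sound elementwise output bounds (ReLU is monotone)."""
    mid, rad = (lo + hi) / 2.0, (hi - lo) / 2.0
    mid_out = W @ mid + b
    rad_out = np.abs(W) @ rad
    return np.maximum(mid_out - rad_out, 0.0), np.maximum(mid_out + rad_out, 0.0)

# box of radius 0.1 around a (hypothetical) 4-d sentence embedding
emb = np.array([0.2, -0.1, 0.5, 0.0])
W, b = np.eye(2, 4), np.zeros(2)
lo_out, hi_out = ibp_linear_relu(emb - 0.1, emb + 0.1, W, b)
```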
Numerical solutions of partial differential equations require expensive simulations, limiting their application in design optimization, model-based control, and large-scale inverse problems. Surrogate modeling techniques aim to decrease computational expense while retaining dominant solution features and characteristics. Existing frameworks based on convolutional neural networks and snapshot-matrix decomposition often rely on lossy pixelization and data-preprocessing, limiting their effectiveness in realistic engineering scenarios. Recently, coordinate-based multilayer perceptron networks have been found to be effective at representing 3D objects and scenes by regressing volumetric implicit fields. These concepts are leveraged and adapted in the context of physical-field surrogate modeling. Two methods toward generalization are proposed and compared: design-variable multilayer perceptron (DV-MLP) and design-variable hypernetworks (DVH). Each method utilizes a main network which consumes pointwise spatial information to provide a continuous representation of the solution field, allowing discretization independence and a decoupling of solution and model size. DV-MLP achieves generalization through the use of a design-variable embedding vector, while DVH conditions the main network weights on the design variables using a hypernetwork. The methods are applied to predict steady-state solutions around complex, parametrically defined geometries on non-parametrically-defined meshes, with model predictions obtained in less than a second. The incorporation of random Fourier features greatly enhanced prediction and generalization accuracy for both approaches. DVH models have more trainable weights than a similar DV-MLP model, but an efficient batch-by-case training method allows DVH to be trained in a similar amount of time as DV-MLP. A vehicle aerodynamics test problem is chosen to assess the method’s feasibility. Both methods exhibit promising potential as viable options for surrogate modeling, being able to process snapshots of data that correspond to different mesh topologies.
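A compressed sketch of the DVH idea under stated assumptions (the layer sizes, the single generated linear layer, and the Fourier scale are all illustrative, not the paper's configuration): a hypernetwork maps design variables to the weights of a main network that consumes Fourier-encoded spatial coordinates:

```python
import torch
import torch.nn as nn

class FourierFeatures(nn.Module):
    """Random Fourier features: x -> [sin(Bx), cos(Bx)] with fixed random B."""
    def __init__(self, in_dim, n_feats, scale=10.0):
        super().__init__()
        self.register_buffer("B", torch.randn(in_dim, n_feats) * scale)
    def forward(self, x):
        proj = 2 * torch.pi * x @ self.B
        return torch.cat([proj.sin(), proj.cos()], dim=-1)

class DVH(nn.Module):
    """Hypernetwork maps design variables to the weights of a linear
    main layer acting on Fourier-encoded coordinates (toy version)."""
    def __init__(self, n_design, coord_dim=3, n_feats=64, out_dim=1):
        super().__init__()
        self.ff = FourierFeatures(coord_dim, n_feats)
        self.hidden, self.out_dim = 2 * n_feats, out_dim
        self.hyper = nn.Linear(n_design, self.hidden * out_dim + out_dim)
    def forward(self, coords, design):
        w_b = self.hyper(design)                        # flat weights + bias
        W = w_b[: self.hidden * self.out_dim].view(self.out_dim, self.hidden)
        b = w_b[self.hidden * self.out_dim :]
        return self.ff(coords) @ W.t() + b              # pointwise field value

model = DVH(n_design=4)
field = model(torch.rand(128, 3), torch.rand(4))  # 128 spatial query points
```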
Algorithmic automatic item generation can be used to obtain large quantities of cognitive items in the domains of knowledge and aptitude testing. However, conventional item models used by template-based automatic item generation techniques are not ideal for the creation of items for non-cognitive constructs. Progress in this area has been made recently by employing long short-term memory recurrent neural networks to produce word sequences that syntactically resemble items typically found in personality questionnaires. To date, such items have been produced unconditionally, without the possibility of selectively targeting personality domains. In this article, we offer a brief synopsis of past developments in natural language processing and explain why the automatic generation of construct-specific items has become attainable only due to recent technological progress. We propose that pre-trained causal transformer models can be fine-tuned to achieve this task using implicit parameterization in conjunction with conditional generation. We demonstrate this method in a tutorial-like fashion and finally compare aspects of validity in human- and machine-authored items using empirical data. Our study finds that approximately two-thirds of the automatically generated items show good psychometric properties (factor loadings above .40) and that one-third even have properties equivalent to established and highly curated human-authored items. Our work thus demonstrates the practical use of deep neural networks for non-cognitive automatic item generation.
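A minimal sketch of conditional generation via control tokens, using the Hugging Face transformers API (the base model, the token name, and the training lines are hypothetical; the fine-tuning loop itself is elided):

```python
# Condition a causal LM on a personality domain by prefixing a control token.
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# add one control token per targeted personality domain (hypothetical name)
tok.add_special_tokens({"additional_special_tokens": ["<EXTRAVERSION>"]})
model.resize_token_embeddings(len(tok))

# fine-tune on lines like "<EXTRAVERSION> I am the life of the party."
# ... (standard causal-LM fine-tuning loop elided) ...

prompt = tok("<EXTRAVERSION>", return_tensors="pt")
out = model.generate(**prompt, max_new_tokens=20, do_sample=True)
print(tok.decode(out[0], skip_special_tokens=True))
```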
Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource and testing agencies incur high costs in the process of continuous renewal of item banks to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types such as those found in language-free intelligence tests (e.g., Raven’s progressive matrices) or on an extensive analysis of task components and derivation of schemata to produce items with pre-specified variability that are expected to have predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike what Google Brain and Amazon Alexa use for language processing and generation.
Several neural networks have been proposed in the general literature for pattern recognition and clustering, but little empirical comparison with traditional methods has been done. The results reported here compare neural networks using Kohonen learning with a traditional clustering method (K-means) in an experimental design using simulated data with known cluster solutions. Two types of neural networks were examined, both of which used unsupervised learning to perform the clustering. One used Kohonen learning with a conscience and the other used Kohonen learning without a conscience mechanism. The performance of these nets was examined with respect to changes in the number of attributes, the number of clusters, and the amount of error in the data. Generally, the K-means procedure had fewer points misclassified while the classification accuracy of neural networks worsened as the number of clusters in the data increased from two to five.
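A sketch of Kohonen competitive learning with a conscience mechanism (DeSieno-style; the learning rate and bias constant are illustrative): units that win too often are handicapped so that all units get a chance to represent the data:

```python
import numpy as np

def kohonen_conscience(X, k, epochs=20, lr=0.1, bias=0.3, seed=0):
    """Competitive (Kohonen) learning with a 'conscience': the winner is
    chosen by distance minus a bias that penalizes frequently winning units."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), k, replace=False)].astype(float)
    win_freq = np.full(k, 1.0 / k)          # running win-probability estimate
    for _ in range(epochs):
        for x in rng.permutation(X):
            d = np.linalg.norm(W - x, axis=1)
            j = np.argmin(d - bias * (1.0 / k - win_freq))   # biased winner
            win_freq += 0.01 * ((np.arange(k) == j) - win_freq)
            W[j] += lr * (x - W[j])                          # move winner
    return W

# toy: two Gaussian clusters, k = 2 prototypes
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
prototypes = kohonen_conscience(X, k=2)
```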
Component-based approaches have been regarded as a tool for dimension reduction to predict outcomes from observed variables in regression applications. Extended redundancy analysis (ERA) is one such component-based approach, which reduces predictors to components explaining maximum variance in the outcome variables. In many instances, ERA can be extended to capture nonlinearity and interactions between observed variables and components, but only by specifying the functional form a priori. Meanwhile, machine learning methods like neural networks are typically used in a data-driven manner to capture nonlinearity without specifying the exact functional form. In this paper, we introduce a new method that integrates neural network algorithms into the framework of ERA, called NN-ERA, to capture any non-specified nonlinear relationships among multiple sets of observed variables for constructing components. Simulations and empirical datasets are used to demonstrate the usefulness of NN-ERA. The conclusion is that in social science datasets with unstructured data, where we expect nonlinear relationships that cannot be specified a priori, NN-ERA with its neural network algorithmic structure can serve as a useful tool to specify and test models otherwise not captured by the conventional component-based models.
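A rough sketch of the NN-ERA structure (the layer sizes and activations are hypothetical, not the paper's specification): each predictor set is reduced to a component by a small network, and the components are combined linearly to predict the outcomes:

```python
import torch
import torch.nn as nn

class NNERA(nn.Module):
    """Toy NN-ERA: one small encoder network per predictor set produces a
    1-d component; a linear head maps components to the outcomes."""
    def __init__(self, set_dims, n_out):
        super().__init__()
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(d, 8), nn.Tanh(), nn.Linear(8, 1))
            for d in set_dims)               # one component per predictor set
        self.head = nn.Linear(len(set_dims), n_out)
    def forward(self, x_sets):
        comps = torch.cat([enc(x) for enc, x in zip(self.encoders, x_sets)],
                          dim=-1)
        return self.head(comps)

model = NNERA(set_dims=[5, 3], n_out=2)      # two predictor sets, two outcomes
y_hat = model([torch.rand(10, 5), torch.rand(10, 3)])
```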
Neural Network models are commonly used for cluster analysis in engineering, computational neuroscience, and the biological sciences, although they are rarely used in the social sciences. In this study we compare the classification capabilities of the 1-dimensional Kohonen neural network with two partitioning (Hartigan and Späth k-means) and three hierarchical (Ward's, complete linkage, and average linkage) cluster methods in 2,580 data sets with known cluster structure. Overall, the performance of the Kohonen networks was similar to, or better than, the performance of the other methods.
The popularity of green, social and sustainability-linked bonds (GSS bonds) continues to rise, with circa US$939 billion of such bonds issued globally in 2023. Given the rising popularity of ESG-related investment solutions, their relatively recent emergence, and limited research in this field, continued investigation is essential. Extending non-traditional techniques such as neural networks to these fields creates a good blend of innovation and potential. This paper follows on from our initial publication, where we aim to replicate the S&P Green Bond Index (a time series problem) over a period using non-traditional techniques (neural networks), predicting 1 day ahead. We take the novel approach of applying an N-BEATS model architecture. N-BEATS is a feedforward neural network architecture built from basic blocks organized into stacks, which introduces a novel doubly residual stacking of backcasts and forecasts. In this paper, we also revisit the neural network architectures from our initial publication, which include DNNs, CNNs, GRUs and LSTMs. We continue the univariate time series problem, increasing the data input window from 1 day to 2 and 5 days respectively, whilst still aiming to predict 1 day ahead.
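For orientation, a stripped-down generic N-BEATS block and the doubly residual stack (the hidden sizes and depth are illustrative; the paper's configuration may differ), shown here with a 5-day window predicting 1 day ahead:

```python
import torch
import torch.nn as nn

class NBeatsBlock(nn.Module):
    """Generic N-BEATS block: an MLP emits a backcast (what the block
    explains of its input window) and a forecast contribution."""
    def __init__(self, backcast_len, forecast_len, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(backcast_len, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, backcast_len + forecast_len))
        self.split = backcast_len
    def forward(self, x):
        out = self.mlp(x)
        return out[..., : self.split], out[..., self.split :]

def nbeats_forward(blocks, x):
    """Doubly residual stacking: subtract each backcast from the running
    residual and sum the per-block forecasts."""
    forecast = 0.0
    for block in blocks:
        backcast, f = block(x)
        x = x - backcast
        forecast = forecast + f
    return forecast

blocks = nn.ModuleList(NBeatsBlock(5, 1) for _ in range(3))  # 5-day window -> 1 day ahead
y_hat = nbeats_forward(blocks, torch.rand(8, 5))             # batch of 8 windows
```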
With the emerging developments in millimeter-wave/5G technologies, the potential for wireless Internet of things devices to achieve widespread sensing, precise localization, and high data-rate communication systems becomes increasingly viable. The surge in interest surrounding virtual reality (VR) and augmented reality (AR) technologies is attributed to the vast array of applications they enable, ranging from surgical training to motion capture and daily interactions in VR spaces. To further elevate the user experience through real-time, accurate detection of the user's orientation, the authors propose the use of a frequency-modulated continuous-wave (FMCW) radar system coupled with an ultra-low-power, sticker-like millimeter-wave identification (mmID). The mmID features four backscattering elements, multiplexed in amplitude, frequency, and spatial domains. The design trains a supervised-learning classification convolutional neural network, enabling accurate real-time three-axis orientation detection of the user. The proposed orientation detection system exhibits exceptional performance, achieving a noteworthy accuracy of 90.58% over three axes at a distance of 8 m. This high accuracy underscores the precision of the orientation detection system, particularly tailored for medium-range VR/AR applications. The integration of the FMCW-based mmID system with machine learning proves to be a promising advancement, contributing to seamless and immersive interaction within virtual and augmented environments.
Machine vision–based herbicide applications relying on object detection or image classification deep convolutional neural networks (DCNNs) demand high memory and computational resources, resulting in lengthy inference times. To tackle these challenges, this study assessed the effectiveness of three teacher models, each trained on datasets of varying sizes, including D-20k (comprising 10,000 true-positive and true-negative images) and D-10k (comprising 5,000 true-positive and true-negative images). Additionally, knowledge distillation was performed on their corresponding student models across a range of temperature settings. After student–teacher learning, the parameter counts of all student models were reduced. ResNet18 not only achieved higher accuracy (ACC ≥ 0.989) but also maintained higher frames per second (FPS ≥ 742.9) under its optimal temperature condition (T = 1). Overall, the results suggest that employing knowledge distillation in the machine vision models enabled accurate and reliable weed detection in turf while reducing the need for extensive computational resources, thereby facilitating real-time weed detection and contributing to the development of smart, machine vision–based sprayers.
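A minimal sketch of the temperature-based distillation loss used in student–teacher learning (the blending weight `alpha` and the toy batch are illustrative):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=1.0, alpha=0.5):
    """Hinton-style knowledge distillation: blend the hard-label loss with
    KL divergence between temperature-softened teacher/student outputs."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean") * (T * T)   # T^2 keeps gradient scale comparable
    return alpha * hard + (1 - alpha) * soft

# toy batch: 4 samples, 2 classes (weed / no weed)
s = torch.randn(4, 2, requires_grad=True)
t = torch.randn(4, 2)
loss = distillation_loss(s, t, labels=torch.tensor([0, 1, 1, 0]), T=1.0)
loss.backward()
```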
The standard two-step scheme for modeling extracellular signals is to first compute the neural membrane currents using multicompartment neuron models (step 1) and next use the volume-conductor theory to compute the extracellular potential resulting from these membrane currents (step 2). We here give a brief introduction to the multicompartment modeling of neurons in step 1. The formalism presented, which has become the gold standard within the field, combines a Hodgkin-Huxley-type description of membrane mechanisms with the cable theory description of the membrane potential in dendrites and axons.
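Step 2 of this scheme is commonly computed with the point-source volume-conductor formula V_e = (1/(4πσ)) Σ_n I_n / r_n; a toy sketch (the compartment currents and positions are made up):

```python
import numpy as np

def extracellular_potential(i_mem, src_pos, elec_pos, sigma=0.3):
    """Step 2: point-source volume-conductor formula. With currents in nA,
    distances in um, and sigma in S/m, the result comes out in mV."""
    r = np.linalg.norm(src_pos - elec_pos, axis=1)
    return (i_mem / r).sum() / (4 * np.pi * sigma)

# toy step 1 output: membrane currents of 3 compartments (nA), positions in um;
# currents sum to ~0, as charge conservation requires for a whole neuron
i_mem = np.array([1.0, -0.6, -0.4])
pos = np.array([[0.0, 0.0, 0.0], [0.0, 50.0, 0.0], [0.0, 100.0, 0.0]])
electrode = np.array([30.0, 50.0, 0.0])
v_e = extracellular_potential(i_mem, pos, electrode)  # potential in mV
```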