Fake news detection is an emerging topic that has attracted considerable attention from researchers and industry. This paper treats fake news detection as a text classification problem: given five publicly available corpora with documents labeled as true or fake, the task is to distinguish the two classes automatically, without relying on fact-checking. The aim of our research was to test the feasibility of a universal model, one that produces satisfactory results on all data sets tested in this article. We attempted to do so by training a set of classification models on one collection and testing them on another. As it turned out, this resulted in a sharp performance degradation. Therefore, this paper focuses on finding the most effective approach to utilizing information in a transferable manner. We examined a variety of methods: feature selection, machine learning approaches to data set shift (instance re-weighting and projection-based), and deep learning approaches based on domain transfer. These methods were applied to various feature spaces: linguistic and psycholinguistic features, embeddings obtained from the Universal Sentence Encoder, and GloVe embeddings. A detailed analysis showed that some combinations of these methods and selected feature spaces bring significant improvements. When using linguistic data, feature selection yielded the best overall mean improvement (across all train-test pairs) of 4%. Among the domain adaptation methods, the greatest improvement, 3%, was achieved by subspace alignment.
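To make the domain-adaptation result concrete, the following is a minimal sketch of subspace alignment for cross-corpus transfer, assuming generic feature matrices and a logistic regression classifier; the dimensionality, classifier choice, and variable names are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def subspace_alignment_fit(X_src, y_src, X_tgt, n_components=50):
    """Align the source PCA subspace to the target one, then train on the aligned source."""
    pca_src = PCA(n_components=n_components).fit(X_src)
    pca_tgt = PCA(n_components=n_components).fit(X_tgt)
    Xs = pca_src.components_.T                      # d x k source basis
    Xt = pca_tgt.components_.T                      # d x k target basis
    M = Xs.T @ Xt                                   # alignment matrix between the two subspaces
    Z_src = (X_src - pca_src.mean_) @ Xs @ M        # source features in the target-aligned subspace
    Z_tgt = (X_tgt - pca_tgt.mean_) @ Xt            # target features in their own subspace
    clf = LogisticRegression(max_iter=1000).fit(Z_src, y_src)
    return clf, Z_tgt

# Hypothetical usage: train on one corpus, predict labels for another.
# clf, Z_tgt = subspace_alignment_fit(X_corpus_A, y_corpus_A, X_corpus_B)
# y_pred = clf.predict(Z_tgt)
```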
In this paper, a hydraulic soft gripper for underwater applications is designed to improve both the gripping force and the sensing capability of soft grippers. The gripper is made of silicone and has an integrated semi-circular hydraulic network inside. To enhance its rigidity and grasping performance, we integrated a restriction layer consisting of a spring steel plate into the gripper. To enhance its sensing capability, we designed a water pressure sensor based on resistance strain gauges and mounted it on the spring steel plate. Before fabrication, we determined the structural parameters of the gripper by geometric analysis. We then experimentally evaluated its pressure-bearing capacity, bending performance, the contribution of the spring steel plate, and the accuracy of the sensor. The experimental results show that the spring steel plate improves the gripping force of the soft gripper, that the sensor has high accuracy, and that the four-finger gripping system built from it adapts well to objects of different shapes and weights. Compared with existing solutions, this design takes a simpler structural form while improving both the gripping force and the sensing ability of the soft gripper, and it addresses these two issues in an integrated way. The spring steel plate not only improves the gripping force of the soft gripper but also provides a stable and reliable platform for mounting the sensor.
A growing trend in requirements elicitation is the use of machine learning (ML) techniques to automate the cumbersome requirement handling process. This literature review summarizes and analyzes studies that incorporate ML and natural language processing (NLP) into requirements elicitation. We answer the following research questions: (1) What requirements elicitation activities are supported by ML? (2) What data sources are used to build ML-based requirements solutions? (3) What technologies, algorithms, and tools are used to build ML-based requirements elicitation? (4) How is an ML-based requirements elicitation method constructed? (5) What tools are available to support ML-based requirements elicitation? Keywords derived from these research questions led to 975 records initially retrieved from 7 scientific search engines; 86 articles were finally selected for inclusion in the review. As the primary research finding, we identified 15 ML-based requirements elicitation tasks and classified them into four categories. Twelve different data sources for building a data-driven model were identified and classified in this literature review. In addition, we categorized the techniques for constructing ML-based requirements elicitation methods into five parts: Data Cleansing and Preprocessing, Textual Feature Extraction, Learning, Evaluation, and Tools. More specifically, 3 categories of preprocessing methods, 3 different feature extraction strategies, 12 different families of learning methods, 2 different evaluation strategies, and various off-the-shelf publicly available tools were identified. Furthermore, we discuss the limitations of the current studies and propose eight potential directions for future research.
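As an illustration of how the categories above typically fit together, here is a small composite pipeline covering preprocessing, textual feature extraction, learning, and evaluation; the sample requirement statements, labels, and parameter choices are assumptions for demonstration only and are not taken from any reviewed study.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# Hypothetical requirement statements and labels (e.g., functional vs. non-functional).
docs = [
    "The system shall encrypt all stored user data.",
    "Search results shall be returned within two seconds.",
    "As a user, I want to export my reports as PDF.",
    "As an admin, I want to invite new team members.",
]
labels = [1, 1, 0, 0]

pipeline = Pipeline([
    ("features", TfidfVectorizer(lowercase=True, stop_words="english")),  # preprocessing + feature extraction
    ("learner", LinearSVC()),                                             # learning stage
])

# Evaluation stage: simple cross-validation (fold count chosen only for this tiny example).
scores = cross_val_score(pipeline, docs, labels, cv=2)
print(scores)
```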
Complexity in product design increases when cause and effect are poorly understood. As a consequence, the impact of design decisions (or changes) on the product is difficult to predict and control. This article presents a model of cause and effect for design decisions that avoids circular dependencies: the so-called attribute dependency graph (ADG) models complex system behaviour and properties, and increases transparency by carefully distinguishing between what is realised and what is required. An ADG is a polyhierarchy, with design variables (directly controllable) at the bottom, quantities of interest (not directly controllable) at the top, and intermediate attributes in between. The dependencies represent causality in a simple sense: assigning values to the design variables (the cause) determines the values of the dependent attributes (the effect). ADGs do not account for what is required, but for what effects emerge from design activity. A set of rules makes them independent of individual designers' views. They provide the structure for so-called INUS conditions, that is, insufficient but necessary parts of unnecessary but sufficient conditions, which can be used for requirement development. The modelling approach is applied first to a simple synthetic design problem and then to two real-world ones: the design of a water hose box and of a passenger vehicle.
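The following is a minimal sketch of how an ADG could be represented and evaluated in code, with design variables as source nodes and dependent attributes computed from their parents; the three-level example (two design variables, one intermediate attribute, one quantity of interest) and all relations are hypothetical and not taken from the article's case studies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Attribute:
    name: str
    depends_on: List[str] = field(default_factory=list)   # empty list => design variable
    relation: Callable[..., float] = None                  # how parent values determine this attribute

def evaluate(graph: Dict[str, Attribute], design_values: Dict[str, float]) -> Dict[str, float]:
    """Propagate values from design variables (causes) to dependent attributes (effects)."""
    values = dict(design_values)
    def resolve(name):
        if name not in values:
            attr = graph[name]
            values[name] = attr.relation(*(resolve(p) for p in attr.depends_on))
        return values[name]
    for name in graph:
        resolve(name)
    return values

# Hypothetical three-level ADG: two design variables at the bottom, one intermediate
# attribute, and one quantity of interest at the top.
adg = {
    "width": Attribute("width"),
    "height": Attribute("height"),
    "cross_section_area": Attribute("cross_section_area", ["width", "height"], lambda w, h: w * h),
    "mass": Attribute("mass", ["cross_section_area"], lambda a: 7850.0 * a * 2.0),  # density * area * length
}
print(evaluate(adg, {"width": 0.05, "height": 0.1}))
```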
Maps play an important role in environmental science applications by allowing practitioners to monitor changes at national and global scales. Over the last decade, it has become increasingly popular to use satellite imagery data and machine learning techniques (MLTs) to construct such maps. Given the black-box nature of many of these MLTs, though, quantifying uncertainty in these maps often relies on sampling reference data under stricter conditions. However, practical constraints can make sampling such data expensive, which forces stakeholders to make a trade-off between the degree of uncertainty in predictions and the costs of collecting appropriately sampled reference data. Furthermore, quantifying any trade-off is often difficult, as it will depend on many interdependent factors that cannot be fully understood until more data are collected. This paper investigates how a combination of Bayesian inference and an adaptive approach to sampling reference data can offer a generalizable way of managing such trade-offs. The approach is illustrated and evaluated using a woodland map of England as a case study, in which reference data are collected under constraints motivated by COVID-19 travel restrictions. The key findings of this paper are as follows: (a) an adaptive approach to sampling reference data allows an informed approach to quantifying this trade-off; and (b) Bayesian inference is naturally suited to adaptive sampling and can make use of Monte Carlo methods when dealing with more advanced problems and analytical techniques.
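To illustrate the adaptive idea in a minimal form, the sketch below updates a Bayesian posterior for a map's overall accuracy (a simple Beta-Binomial model) as reference points arrive and stops sampling once a Monte Carlo credible interval is narrow enough; the model, prior, simulated accuracy, and tolerance are assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 1.0, 1.0            # uniform Beta prior over the map's accuracy
tolerance = 0.05                  # stop when the 95% credible interval is this narrow

for n_collected in range(1, 2000):
    correct = rng.random() < 0.9                  # hypothetical field check of one reference point
    alpha, beta = alpha + correct, beta + (1 - correct)
    # Monte Carlo credible interval from posterior samples.
    lo, hi = np.percentile(rng.beta(alpha, beta, 10000), [2.5, 97.5])
    if hi - lo < tolerance:
        print(f"stopping after {n_collected} reference points; accuracy ~ {alpha / (alpha + beta):.3f}")
        break
```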
Nigeria commenced its national foundational digital identity project in 2007 and had enrolled 60 million people by July 2021. The project, led by the National Identity Management Commission (NIMC), seeks to unify the country’s public and private functional identity databases, and aims to improve government services and national security. Although the enrolment process had encountered initial challenges such as the absence of enrolment centers in some communities across the country, enrolment for the biometric ID had proceeded without any significant public objection to its objectives. Following the EndSARS protests of October 2020, where youths protesting police violence and perceived poor governance were shot at by government security forces and protesters placed under surveillance, the government announced an updated national identity policy mandating citizens link their National Identity Number (NIN) with their SIM card information. For the first time, significant pockets of resistance arose against the national ID project by sections of the public who perceived the EndSARS violence as signaling a change in government behavior, and the updated ID policy as a mechanism for empowering government surveillance and authoritarianism. The resistance to the ID project marked a shift in public perception which threatens its future. This paper argues that mistrust in government data collection projects grows when data collection is perceived to be increasing government power to the detriment of human rights and freedom. It also puts forward a proposal on how to restore trust within the low-trust environment in Nigeria including the passage of a data protection law and amendments to the NIMC Act and Policies/Regulations, establishing Federated identity providers which give choices to end-users, and delinking the NIN from functional identity databases.
This paper proposes an approach to diagnosing the skill of a machine-learning prediction model based on finding combinations of variables that minimize the normalized mean square error of the predictions. This technique is attractive because it compresses the positive skill of a forecast model into the smallest number of components. The resulting components can then be analyzed much like principal components, including the construction of regression maps for investigating sources of skill. The technique is illustrated with a machine-learning model of week 3–4 predictions of western US wintertime surface temperatures. The technique reveals at least two patterns of large-scale temperature variations that are skillfully predicted. The predictability of these patterns is generally consistent between climate model simulations and observations. The predictability is determined largely by sea surface temperature variations in the Pacific, particularly the region associated with the El Niño-Southern Oscillation. This result is not surprising, but the fact that it emerges naturally demonstrates that the technique can be helpful in “explaining” the source of predictability in machine-learning models.
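As a hedged sketch of the general idea (not necessarily the paper's exact formulation), one way to find linear combinations that minimize a normalized mean square error is to solve a generalized eigenvalue problem between the forecast error covariance and the observed covariance; the function, variable names, and array shapes below are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def skill_components(forecasts, observations):
    """forecasts, observations: (n_samples, n_variables) anomaly arrays (hypothetical shapes)."""
    err = forecasts - observations
    E = np.cov(err, rowvar=False)            # forecast error covariance
    C = np.cov(observations, rowvar=False)   # observed covariance (normalization)
    # Generalized eigenproblem: small eigenvalues => low normalized MSE => high skill.
    evals, evecs = eigh(E, C)
    return evals, evecs                      # columns of evecs are the skill-ordered combinations

# Regression maps for the leading component could then be formed by regressing the
# projected time series (observations @ evecs[:, 0]) onto candidate predictor fields.
```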
Border artificial intelligence (AI), that is, biometrics-based AI systems used in border control contexts, has proliferated as a common tool in border securitization projects. Such systems classify some migrants as posing risks such as identity fraud, other forms of criminality, or terrorism. From a human rights perspective, using such risk framings for algorithmically facilitated evaluations of migrants’ biometrics systematically calls into question whether these kinds of systems can be built to be trustworthy for migrants. This article provides a thought experiment: we use a bottom-up responsible design lens, the guidance-ethics approach, to evaluate whether responsible, trustworthy Border AI might constitute an oxymoron. The proposed European AI Act merely limits the use of Border AI systems by classifying them as high risk. In parallel with these AI regulatory developments, large-scale civic movements have emerged throughout Europe to ban the use of facial recognition technologies in public spaces in defense of EU citizens’ privacy. That such systems remain acceptable for states to use in evaluating migrants, we argue, insufficiently protects migrants’ lives. In part, this is because regulations and ethical frameworks are top-down and technology driven, focusing more on the safety of AI systems than on the safety of migrants. We conclude that bordering technologies developed from a responsible design angle would entail the development of entirely different technologies: ones that refrain from harmful sorting based on biometric identification and instead start from the premise that migration is not a societal problem.
With the rapid development of the national economy, the demand for electricity is also growing. Thermal power generation accounts for the highest proportion of power generation, and coal is the most commonly used fuel. The massive combustion of coal has led to serious environmental pollution, so it is important to improve energy conversion efficiency and effectively reduce pollutant emissions. In this paper, an extreme learning machine model based on improved Kalman particle swarm optimization (ELM-IKPSO) is proposed to establish a boiler combustion model. The proposed modeling method is applied to the combustion modeling process of a 300 MWe pulverized coal boiler. The simulation results show that, compared with modeling methods of the same type, ELM-IKPSO better predicts boiler thermal efficiency and NOx emission concentration and also shows better generalization performance. Finally, multi-objective optimization is carried out on the established model, and a set of mutually non-dominated boiler combustion solutions is obtained.
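For context on the base learner, here is a minimal sketch of a plain extreme learning machine (random hidden-layer weights plus a least-squares output solve); the improved Kalman particle swarm optimization (IKPSO) step that the paper uses to tune the hidden parameters is omitted, and the shapes, activation, and variable names are assumptions.

```python
import numpy as np

class ELM:
    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_inputs, n_hidden))   # random input weights (IKPSO would optimize these)
        self.b = rng.normal(size=n_hidden)                # random hidden biases

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))   # sigmoid hidden-layer activation

    def fit(self, X, y):
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ y                  # output weights via Moore-Penrose pseudo-inverse
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Hypothetical usage with boiler data: inputs could be coal feed rates, air flows, etc.,
# and outputs thermal efficiency and NOx concentration.
# model = ELM(n_inputs=20, n_hidden=50).fit(X_train, Y_train)
# Y_pred = model.predict(X_test)
```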
Variational Bayesian inference is an important machine learning tool that finds applications from statistics to robotics. The goal is to find an approximate probability density function (PDF) from a chosen family that is in some sense “closest” to the full Bayesian posterior. Closeness is typically defined through the selection of an appropriate loss functional such as the Kullback-Leibler (KL) divergence. In this paper, we explore a new formulation of variational inference by exploiting the fact that (most) PDFs are members of a Bayesian Hilbert space under careful definitions of vector addition, scalar multiplication, and an inner product. We show that, under the right conditions, variational inference based on KL divergence can amount to iterative projection, in the Euclidean sense, of the Bayesian posterior onto a subspace corresponding to the selected approximation family. We work through the details of this general framework for the specific case of the Gaussian approximation family and show the equivalence to another Gaussian variational inference approach. We furthermore discuss the implications for systems that exhibit sparsity, which is handled naturally in the Bayesian Hilbert space, and give an example of a high-dimensional robotic state estimation problem that can be handled as a result. We provide some preliminary examples of how the approach could be applied to non-Gaussian inference and discuss its limitations in detail to encourage follow-on work along these lines.
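As a simple point of reference for the standard KL-based formulation the paper starts from (not the Bayesian Hilbert space projection itself), the sketch below fits a one-dimensional Gaussian to an unnormalized log-target by stochastic gradient descent on the KL divergence using the reparameterization trick; the target density, step sizes, and sample counts are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
log_target = lambda x: -0.5 * (x - 2.0) ** 4       # hypothetical non-Gaussian posterior (unnormalized)

mu, log_s = 0.0, 0.0                                # parameters of q = N(mu, exp(log_s)^2)
for _ in range(2000):
    eps = rng.normal(size=256)
    x = mu + np.exp(log_s) * eps                    # reparameterized samples from q
    dlogp = -2.0 * (x - 2.0) ** 3                   # d/dx of log_target
    # Monte Carlo gradients of KL(q || p) = E_q[log q - log p]; the entropy term is handled analytically.
    grad_mu = -np.mean(dlogp)
    grad_log_s = -np.mean(dlogp * eps) * np.exp(log_s) - 1.0
    mu -= 0.01 * grad_mu
    log_s -= 0.01 * grad_log_s

print(f"approximate posterior: N(mu={mu:.2f}, sigma={np.exp(log_s):.2f})")
```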
This chapter explores the regulatory regimes that may be applied to data science. They could be legally mandated, established by voluntary trade groups, or adopted for business reasons (e.g., to minimize insurance costs). Perhaps growing societal norms will become de facto standards. The authors focus on just a few topics and refer the reader to a vast and growing literature on regulation from public policy, economic, technology, and legal perspectives.
This chapter summarizes societal concerns surrounding data and automation. It is informed not only by what is being discussed by politicians and journalists, but also by the challenges discussed in previous chapters. Examples include societal concerns over data science’s impact on inequality, the scale of large data-science-oriented companies, and the influence of social networks.
This chapter makes two R&D recommendations to address concerns raised in Chapter 15: increasing focused and transdisciplinary research; and fostering innovation.
This chapter discusses the importance of education and of rigor in the definition and use of vocabulary surrounding automation and data. More education helps individuals by enhancing their ability to understand data and data science’s growing impact, and to both contribute to and benefit from the field. A more knowledgeable public and a clear vocabulary for discourse are needed for better communication and debate.