Community detection is one of the most important methodological fields of network science, and one which has attracted a significant amount of attention over the past decades. This area deals with the automated division of a network into fundamental building blocks, with the objective of providing a summary of its large-scale structure. Despite its importance and widespread adoption, there is a noticeable gap between what is arguably the state-of-the-art and the methods which are actually used in practice in a variety of fields. This Element attempts to address this discrepancy by dividing existing methods according to whether they have a 'descriptive' or an 'inferential' goal. While descriptive methods find patterns in networks based on context-dependent notions of community structure, inferential methods articulate a precise generative model and attempt to fit it to data. In this way, they are able to provide insights into formation mechanisms and to separate structure from noise. This title is also available as open access on Cambridge Core.
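As a rough illustration of the distinction drawn above, the following minimal sketch contrasts a descriptive method (modularity maximization via networkx) with an inferential one (fitting a stochastic block model via graph-tool). The libraries and the karate-club example are illustrative choices, not taken from the Element itself.

```python
# Minimal sketch contrasting a descriptive and an inferential approach on a toy graph.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()

# Descriptive: find communities by maximizing modularity.
descriptive_partition = greedy_modularity_communities(G)

# Inferential: fit a stochastic block model (requires graph-tool, an optional dependency).
import graph_tool.all as gt

g = gt.Graph(directed=False)
g.add_edge_list(list(G.edges()))
state = gt.minimize_blockmodel_dl(g)     # MDL-based SBM fit
inferred_blocks = state.get_blocks().a   # block label per node
```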
Cirrus clouds are key modulators of Earth’s climate. Their dependencies on meteorological and aerosol conditions are among the largest uncertainties in global climate models. This work uses 3 years of satellite and reanalysis data to study the link between cirrus drivers and cloud properties. We use a gradient-boosted machine learning model and a long short-term memory network with an attention layer to predict the ice water content and ice crystal number concentration. The models show that meteorological and aerosol conditions can predict cirrus properties with R² = 0.49. Feature attributions are calculated with SHapley Additive exPlanations (SHAP) to quantify the link between meteorological and aerosol conditions and cirrus properties. For instance, the minimum concentration of supermicron-sized dust particles required to cause a decrease in predicted ice crystal number concentration is 2 × 10⁻⁴ mg/m³. Conditions in the last 15 hr before the observation are predictive of all cirrus properties.
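As an illustration of the attribution step described above, the sketch below fits a gradient-boosted model on synthetic data and computes SHAP values; the predictor names, scales, and target are hypothetical stand-ins, not the study's data or pipeline.

```python
# Minimal sketch: gradient boosting + SHAP attributions on synthetic cirrus-like data.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "temperature_K": rng.normal(220, 10, 2000),
    "vertical_velocity_ms": rng.normal(0.0, 0.1, 2000),
    "dust_supermicron_mgm3": rng.lognormal(-9, 1, 2000),   # hypothetical scale
})
# Toy target standing in for ice water content.
y = (0.5 * X["vertical_velocity_ms"]
     - 0.01 * (X["temperature_K"] - 220)
     + rng.normal(0, 0.05, 2000))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print("R2 on held-out data:", model.score(X_te, y_te))

# Per-prediction feature attributions, as used to quantify driver-property links.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
```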
This work aims to classify catchments through the lens of causal inference and cluster analysis. In particular, it uses causal effects (CEs) of meteorological variables on river discharge while only relying on easily obtainable observational data. The proposed method combines time series causal discovery with CE estimation to develop features for a subsequent clustering step. Several ways to customize and adapt the features to the problem at hand are discussed. In an application example, the method is evaluated on 358 European river catchments. The resulting clusters are analyzed using the causal mechanisms that drive them and their environmental attributes.
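A minimal sketch of the clustering step is given below; the causal-effect feature matrix is a random placeholder for estimates that would come from the causal discovery and effect-estimation stage.

```python
# Minimal sketch: cluster catchments on (placeholder) causal-effect features.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Rows: catchments; columns: estimated causal effects of meteorological drivers
# on discharge (e.g., precipitation at several lags). Random values stand in for
# estimates produced by time series causal discovery + effect estimation.
causal_effects = np.random.default_rng(1).normal(size=(358, 6))

features = StandardScaler().fit_transform(causal_effects)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)
```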
A common observation in the field of pattern recognition for atmospheric phenomena using supervised machine learning is that recognition performance decreases for events with few observed cases, such as extreme weather events. Here, we aimed to mitigate this issue by using numerical simulation and satellite observational data for training. However, as simulation and observational data possess distinct characteristics, we employed neural style transformation learning to transform the simulation data to more closely resemble the observational data. The resulting transformed cloud images of the simulation data were found to possess physical features comparable to those of the observational data. By utilizing the transformed data for training, we successfully improved the classification performance of cloud images of tropical cyclone precursors 7, 5, and 3 days before their formation by 40.5, 90.3, and 41.3%, respectively.
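A core ingredient of neural style transfer is a feature-statistics (Gram matrix) loss; the sketch below shows such a loss in PyTorch purely as an illustration, not the authors' transformation network or training setup.

```python
# Minimal sketch: Gram-matrix style loss used in neural style transfer.
import torch
import torch.nn.functional as F

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    # feat: (batch, channels, H, W) feature maps from a pretrained CNN layer.
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_loss(sim_feat: torch.Tensor, obs_feat: torch.Tensor) -> torch.Tensor:
    # Penalizes differences between feature statistics of simulated and observed cloud images.
    return F.mse_loss(gram_matrix(sim_feat), gram_matrix(obs_feat))

loss = style_loss(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
```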
Streamflow predictions are vital for detecting flood and drought events. Such predictions are even more critical to Sub-Saharan African regions that are vulnerable to the increasing frequency and intensity of such events. These regions are sparsely gaged, with few available gaging stations that are often plagued with missing data due to various causes, such as harsh environmental conditions and constrained operational resources. This work presents a novel workflow for predicting streamflow in the presence of missing gage observations. We leverage bias correction of the Group on Earth Observations Global Water and Sustainability Initiative ECMWF streamflow service (GESS) forecasts for missing data imputation and predict future streamflow using the state-of-the-art temporal fusion transformers (TFTs) at 10 river gaging stations in the Benin Republic. We show by simulating missingness in a testing period that GESS forecasts have a significant bias that results in poor imputation performance over the 10 Beninese stations. Our findings suggest that overall bias correction by Elastic Net and Gaussian Process regression achieves superior performance relative to traditional imputation by established methods. We also show that the TFT yields high predictive skill and further provides explanations for predictions through the weights of its attention mechanism. The findings of this work provide a basis for integrating global streamflow prediction model data and state-of-the-art machine learning models into operational early-warning decision-making systems in resource-constrained countries vulnerable to drought and flooding due to extreme weather events.
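The bias-correction-for-imputation idea can be sketched as follows on synthetic data; Elastic Net is one of the correctors named above, and the variable names and toy bias are illustrative only.

```python
# Minimal sketch: Elastic Net bias correction of (synthetic) GESS-like forecasts,
# then imputation of missing gauge observations with the corrected forecasts.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
forecast = rng.gamma(2.0, 50.0, size=500)                      # raw forecast (m^3/s)
observed = 0.7 * forecast + 20 + rng.normal(0, 5, size=500)    # gauge record with a toy bias
observed[::7] = np.nan                                         # simulated missingness

mask = ~np.isnan(observed)
corrector = ElasticNet(alpha=0.1).fit(forecast[mask, None], observed[mask])

imputed = observed.copy()
imputed[~mask] = corrector.predict(forecast[~mask, None])      # fill gaps with corrected forecasts
```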
Undirected, binary network data consist of indicators of symmetric relations between pairs of actors. Regression models of such data allow for the estimation of effects of exogenous covariates on the network and for prediction of unobserved data. Ideally, estimators of the regression parameters should account for the inherent dependencies among relations in the network that involve the same actor. To account for such dependencies, researchers have developed a host of latent variable network models; however, estimation of many latent variable network models is computationally onerous, and it may not be clear which model is best to base inference upon. We propose the probit exchangeable (PX) model for undirected binary network data that is based on an assumption of exchangeability, which is common to many of the latent variable network models in the literature. The PX model can represent the first two moments of any exchangeable network model. We leverage the EM algorithm to obtain an approximate maximum likelihood estimator of the PX model that is extremely computationally efficient. Using simulation studies, we demonstrate the improvement in estimation of regression coefficients of the proposed model over existing latent variable network models. In an analysis of purchases of politically aligned books, we demonstrate political polarization in purchase behavior and show that the proposed estimator significantly reduces runtime relative to estimators of latent variable network models, while maintaining predictive performance.
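For orientation, the sketch below fits an ordinary probit regression to dyadic indicators of a synthetic undirected network with statsmodels. Unlike the PX model, it ignores the dependence among dyads that share an actor, which is exactly the structure the proposed estimator adds.

```python
# Minimal sketch: plain probit on dyads of a synthetic undirected network.
# (Baseline only; the PX model additionally captures exchangeable within-actor dependence.)
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 30
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

x = rng.normal(size=len(pairs))                  # one dyadic covariate
latent = -1.0 + 0.8 * x + rng.normal(size=len(pairs))
y = (latent > 0).astype(int)                     # binary tie indicators

X = sm.add_constant(x)
probit_fit = sm.Probit(y, X).fit(disp=False)
print(probit_fit.params)
```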
Predictive methods for tool wear and remaining useful life (RUL) are usually based on different algorithmic predictors, different source data, and different sensing devices. In general, it is challenging to model all of the cutting conditions encountered in the actual cutting process for sustainable manufacturing. In order to avoid large amounts of experimental data and to predict the RUL of different tools, this study combines analytical force modeling, back-propagation neural network (BPNN) machine learning, and a current sensor, all integrated in a smart machine box (SMB), to realize practical RUL prediction for on-line and real-time applications. An analytical model of the shear and ploughing cutting force coefficients was established from average cutting forces; it reduces the number of experiments required and allows prediction under different cutting conditions. Because the loading current of the spindle motor is generally easier to acquire than the resultant cutting forces, the spindle loading currents are used to train the BPNN model and to predict the cutting forces during intelligent manufacturing. The SMB architecture performs autonomous actions across the edge, fog, and cloud layers via TCP/IP, the MQTT protocol, and a unified communication library. Results showed that the predictive error at the end of tool life, based on the cumulative current ratio, is about 3–10%.
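The current-to-force mapping can be sketched with a small feed-forward (back-propagation) network on synthetic data; this is an illustration of the idea, not the SMB implementation.

```python
# Minimal sketch: train a small BPNN to map spindle loading current to cutting force.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
current = rng.uniform(1.0, 8.0, size=(1000, 1))                # spindle current (A)
force = 120.0 * current[:, 0] + rng.normal(0, 10, 1000)        # resultant force (N), toy relation

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0),
)
model.fit(current, force)
print(model.predict([[4.5]]))                                   # predicted force at 4.5 A
```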
The effects of anthropogenic aerosol, solid or liquid particles suspended in the air, are the biggest contributor to uncertainty in current climate perturbations. Heavy industry sites such as coal power plants and steel manufacturers, which are large sources of greenhouse gases, also emit large amounts of aerosol in a small area. This makes them ideal places to study aerosol interactions with radiation and clouds. However, existing data sets of heavy industry locations are either not public or suffer from reporting gaps. Here, we develop a supervised deep learning algorithm to detect unreported industry sites in high-resolution satellite data, using the existing data sets for training. For the pipeline to be viable at global scale, we employ a two-step approach: the first step scans 10 m resolution data for potential industry sites, before 1.2 m resolution images are used to confirm or reject detections. On held-out test data, the models perform well, with the lower-resolution one reaching up to 94% accuracy. Deployed to a large test region, the first-stage model yields many false positive detections. The second-stage, higher-resolution model shows promising results in filtering these out while keeping the true positives, improving the overall precision to 42% so that human review becomes feasible. In the deployment area, we find five new heavy industry sites which were not in the training data. This demonstrates that the approach can be used to complement existing data sets of heavy industry sites.
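The two-stage screening logic can be sketched as below; the model calls are placeholders rather than the trained networks from the study, and the thresholds are illustrative.

```python
# Minimal sketch: coarse 10 m screening followed by 1.2 m confirmation.
from typing import Callable, Iterable, List, Tuple

Tile = Tuple[float, float]  # (lon, lat) of a candidate tile centre

def two_stage_detect(tiles: Iterable[Tile],
                     coarse_model: Callable[[Tile], float],   # P(site | 10 m tile)
                     fine_model: Callable[[Tile], float],     # P(site | 1.2 m image)
                     coarse_threshold: float = 0.5,
                     fine_threshold: float = 0.5) -> List[Tile]:
    candidates = [t for t in tiles if coarse_model(t) >= coarse_threshold]
    # Only candidates trigger the expensive high-resolution confirmation step.
    return [t for t in candidates if fine_model(t) >= fine_threshold]
```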
Climate models are primary tools for investigating processes in the climate system, projecting future changes, and informing decision makers. The latest generation of models provides increasingly complex and realistic representations of the real climate system, while there is also growing awareness that not all models produce equally plausible or independent simulations. Therefore, many recent studies have investigated how models differ from observed climate and how model dependence affects model output similarity, typically drawing on climatological averages over several decades. Here, we show that temperature maps of individual days drawn from datasets never used in training can be robustly identified as “model” or “observation” using the CMIP6 model archive and four observational products. An important exception is a prototype storm-resolving simulation from ICON-Sapphire which cannot be unambiguously assigned to either category. These results highlight that persistent differences between simulated and observed climate emerge at short timescales already, but very high-resolution modeling efforts may be able to overcome some of these shortcomings. Moreover, temporally out-of-sample test days can be assigned their dataset name with up to 83% accuracy. Misclassifications occur mostly between models developed at the same institution, suggesting that effects of shared code, previously documented only for climatological timescales, already emerge at the level of individual days. Our results thus demonstrate that the use of machine learning classifiers, once trained, can overcome the need for several decades of data to evaluate a given model. This opens up new avenues to test model performance and independence on much shorter timescales.
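A minimal sketch of such a classifier is shown below: an untrained toy CNN that maps a single daily temperature map to "model" vs. "observation" logits. The architecture and input size are hypothetical, not those used in the study.

```python
# Minimal sketch: toy CNN classifier for daily temperature maps (model vs. observation).
import torch
import torch.nn as nn

class DayClassifier(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):
        # x: (batch, 1, lat, lon) daily temperature fields
        return self.head(self.features(x).flatten(1))

logits = DayClassifier()(torch.randn(4, 1, 96, 192))   # 4 example days
```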
This article fragments and processes Debris, a project developed to formalise the creative recycling of digital audio byproducts. Debris began as an open call for electronic compositions that take as their point of departure gigabytes of audio material generated through training and calibrating Demiurge, an audio synthesis platform driven by machine learning. The Debris project led us down rabbitholes of structural analysis: what does it mean to work with digital waste, how is it qualified, and what new relationships and methodologies does this foment? To chart the fluid boundaries of Debris and pin down its underlying conceptualisation of sound, this article introduces a framework ranging from archaeomusicology to intertextuality, from actor-network theory to Deleuzian assemblage, from Adornian constellation to swarm intelligence to platform and network topology. This diversity of approaches traces connective frictions that may allow us to understand, from the perspective of Debris, what working with sound means under the regime of machine intelligence. How has machine intelligence fundamentally altered the already shaky diagram connecting humans, creativity and history? We advise the reader to approach the text as a multisensory experience, listening to Debris while navigating the circuitous theoretical alleys below.
We generalize the notion of consequence relation standard in abstract treatments of logic to accommodate intuitions of relevance. The guiding idea follows the use criterion, according to which in order for some premises to have some conclusion(s) as consequence(s), the premises must each be used in some way to obtain the conclusion(s). This relevance intuition turns out to require not just a failure of monotonicity, but also a move to considering consequence relations as obtaining between multisets. We motivate and state basic definitions of relevant consequence relations, both in single conclusion (asymmetric) and multiple conclusion (symmetric) settings, as well as derivations and theories, guided by the use intuitions, and prove a number of results indicating that the definitions capture the desired results (at least in many cases).
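One plausible shape for such a definition is sketched below as an illustration only, not the authors' official formulation: a single-conclusion relevant consequence relation holds between finite multisets of premises and a conclusion, validates identity and a multiset-union cut rule, and drops monotonicity (weakening).

```latex
% Illustrative definition sketch (assumed form, not necessarily the authors' exact one).
% $\uplus$ denotes multiset union; $[A]$ the singleton multiset containing $A$.
\begin{itemize}
  \item (Identity)\quad $[A] \vdash A$
  \item (Cut)\quad if $\Gamma \vdash A$ and $\Delta \uplus [A] \vdash B$, then $\Gamma \uplus \Delta \vdash B$
  \item (No weakening)\quad $\Gamma \vdash A$ need not imply $\Gamma \uplus \Delta \vdash A$
\end{itemize}
```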
Uncertainty quantification (UQ) plays a crucial role in data assimilation (DA) since it impacts both the quality of the reconstruction and the near-future forecast. However, traditional UQ approaches are often limited in their ability to handle complex datasets and may have a large computational cost. In this paper, we present a new ensemble-based approach to extend the 4DVarNet framework, an end-to-end deep learning scheme backboned on variational DA used to estimate the mean of the state along a given DA window. We use conditional 4DVarNet simulations compliant with the available observations to estimate the 4DVarNet probability density function. Our approach enables us to combine the efficiency of 4DVarNet in terms of computational cost and validation performance with a fast and memory-saving Monte-Carlo-based post-processing of the reconstruction, leading to the so-called En4DVarNet estimation of the state pdf. We demonstrate our approach in a case study involving the sea surface height: 4DVarNet is pretrained on an idealized Observation System Simulation Experiment (OSSE), then used on a real-world dataset (OSE). The sampling of independent realizations of the state is made from the catalogue of model-based data used during training. To illustrate our approach, we use a nadir altimeter constellation in January 2017 and show how the uncertainties retrieved by combining 4DVarNet with the statistical properties of the training dataset provide relevant information, yielding in most cases a confidence interval compliant with the Cryosat-2 nadir along-track dataset kept for validation.
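The Monte-Carlo post-processing idea can be sketched as follows, with a placeholder reconstruction function standing in for conditional 4DVarNet simulations and random arrays standing in for the catalogue of model-based states.

```python
# Minimal sketch: ensemble of conditional reconstructions -> mean state and confidence interval.
import numpy as np

def reconstruct(observations: np.ndarray, prior_state: np.ndarray) -> np.ndarray:
    # Placeholder for a conditional 4DVarNet-style reconstruction of the state.
    return 0.5 * observations + 0.5 * prior_state

rng = np.random.default_rng(0)
observations = rng.normal(size=100)       # e.g., along-track SSH samples over the DA window
catalogue = rng.normal(size=(50, 100))    # model-based states from the training catalogue

ensemble = np.stack([reconstruct(observations, s) for s in catalogue])
mean_state = ensemble.mean(axis=0)
lower, upper = np.percentile(ensemble, [2.5, 97.5], axis=0)   # 95% interval for validation
```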
Mobile robots are a key component for the automation of many tasks that either require high precision or are deemed too hazardous for human personnel. One of the typical duties for mobile robots in the industrial sector is to perform trajectory tracking, which involves pursuing a specific path through both space and time. In this paper, an iterative learning-based procedure for highly accurate tracking is proposed. This contribution shows how data-based techniques, namely Gaussian process regression, can be used to tailor a motion model to a specific reoccurring reference. The procedure is capable of explorative behavior, meaning that the robot automatically explores states around the prescribed trajectory, enriching the data set for learning and increasing the robustness and practically achievable tracking accuracy. The trade-off between highly accurate tracking and exploration is handled automatically by an optimization-based reference generator using a suitable cost function minimizing the posterior variance of the underlying Gaussian process model. While this study focuses on omnidirectional mobile robots, the scheme can be applied to a wide range of mobile robots. The effectiveness of this approach is validated in meaningful real-world experiments on a custom-built omnidirectional mobile robot, where it is shown that explorative behavior can outperform purely exploitative approaches.
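The exploration-exploitation trade-off in the reference generator can be sketched in one dimension as below; the data, kernel, and cost weighting are illustrative rather than taken from the paper.

```python
# Minimal sketch: pick the next reference offset by trading deviation from the nominal
# path against the Gaussian process model's posterior uncertainty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

visited = np.array([[-0.2], [0.0], [0.1]])         # offsets from the nominal trajectory
errors = np.array([0.05, 0.02, 0.03])               # measured tracking errors at those offsets

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2)).fit(visited, errors)

candidates = np.linspace(-0.5, 0.5, 101)[:, None]
_, std = gp.predict(candidates, return_std=True)

weight = 0.5                                         # exploration weight (illustrative)
cost = np.abs(candidates[:, 0]) - weight * std       # stay close, but visit uncertain states
next_offset = candidates[np.argmin(cost), 0]
```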
Coarse spatial resolution in gridded precipitation datasets, reanalyses, and climate model outputs restricts their ability to characterize localized extreme rain events and limits the use of coarse-resolution information for local- to regional-scale climate management strategies. Deep learning models have recently been developed to rapidly downscale coarse-resolution precipitation to high local-scale resolution at a much lower cost than dynamical downscaling. However, existing super-resolution deep learning studies have not rigorously evaluated the models' skill in producing fine-scale spatial variability, particularly over topographic features, and current deep learning models also have difficulty predicting the complex spatial structure of extreme events. Here, we develop a model based on a super-resolution deconvolution neural network (SRDN) to downscale hourly precipitation and evaluate the predictions. We apply three versions of the SRDN model: (a) SRDN (no orography), (b) SRDN-O (orography only at the final resolution enhancement), and (c) SRDN-SO (orography at each step of resolution enhancement). We assess the ability of SRDN-based models to reproduce fine-scale spatial variability and compare it with a previously used deep learning model (DeepSD). All the models are trained and tested using Conformal Cubic Atmospheric Model (CCAM) data to downscale hourly precipitation from 100 to 12.5 km over the Australian region. We find that SRDN-based models that include orography deliver better fine-scale spatial structures of both climatology and extremes and significantly improve deep-learning downscaling. The SRDN-SO model performs well both qualitatively and quantitatively in reconstructing the fine-scale spatial variability of climatology and rainfall extremes over complex orographic regions.
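A minimal deconvolution-style upsampling network of the kind described (a factor of 8, roughly 100 km to 12.5 km, with orography concatenated at the final resolution as in SRDN-O) could look like the sketch below; it is not the authors' architecture.

```python
# Minimal sketch: transposed-convolution (deconvolution) downscaling of coarse precipitation.
import torch
import torch.nn as nn

class TinySRDN(nn.Module):
    def __init__(self):
        super().__init__()
        self.upsample = nn.Sequential(
            nn.ConvTranspose2d(1, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(32 + 1, 1, kernel_size=3, padding=1)

    def forward(self, coarse_precip, orography):
        # coarse_precip: (batch, 1, H, W); orography: (batch, 1, 8H, 8W)
        fine = self.upsample(coarse_precip)
        return self.head(torch.cat([fine, orography], dim=1))

out = TinySRDN()(torch.randn(2, 1, 16, 16), torch.randn(2, 1, 128, 128))
```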
In this paper, we propose a novel way of improving named entity recognition (NER) in the Korean language using its language-specific features. While the field of NER has been studied extensively in recent years, the mechanism of efficiently recognizing named entities (NEs) in Korean has hardly been explored. This is because the Korean language has distinct linguistic properties that present challenges for modeling. Therefore, an annotation scheme for Korean corpora by adopting the CoNLL-U format, which decomposes Korean words into morphemes and reduces the ambiguity of NEs in the original segmentation that may contain functional morphemes such as postpositions and particles, is proposed herein. We investigate how the NE tags are best represented in this morpheme-based scheme and implement an algorithm to convert word-based and syllable-based Korean corpora with NEs into the proposed morpheme-based format. Analyses of the results of traditional and neural models reveal that the proposed morpheme-based format is feasible, and the varied performances of the models under the influence of various additional language-specific features are demonstrated. Extrinsic conditions were also considered to observe the variance of the performances of the proposed models, given different types of data, including the original segmentation and different types of tagging formats.
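The redistribution of a word-level entity label over morphemes can be sketched as below; the segmentation, POS tags, and BIO-style entity labels are illustrative, not the exact scheme of the corpora described above.

```python
# Minimal sketch: push a word-level NE label down to morphemes, leaving functional
# morphemes (e.g., the topic particle) outside the entity span.
def to_morpheme_bio(morphemes, functional_flags, ne_label):
    """morphemes: list of (surface, pos) pairs for one Korean word (eojeol)."""
    tags, begun = [], False
    for (surface, pos), is_functional in zip(morphemes, functional_flags):
        if is_functional or ne_label is None:
            tags.append((surface, pos, "O"))
        else:
            tags.append((surface, pos, ("I-" if begun else "B-") + ne_label))
            begun = True
    return tags

# "김철수는" = proper noun 김철수 ("Kim Cheolsu") + topic particle 는
print(to_morpheme_bio([("김철수", "NNP"), ("는", "JX")], [False, True], "PS"))
```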
Temporal prepositions trigger various temporal relations over events and times. In this chapter, I categorize such temporal relators into five types: (i) anchoring (at, in), (ii) ordering (before, after), (iii) metric (for, in), (iv) bounding (from ... till), and (v) orienting (time interval + before, after). These temporal relators are analyzed with respect to the tripartite temporal configurations <E, R, T>, where E is a set of eventualities, R is a set of temporal relators, and T is a set of associated temporal structures, which subsume metric structures. Temporal relators combine with temporal expressions to form temporal adjuncts, either simple or complex. Complex temporal adjuncts introduce time intervals as nonconsuming tags in annotation, while relating eventualities to temporal structures. Each temporal relator r in R combines with a temporal structure t in T as its argument to form a temporal adjunct, while relating an eventuality e in E of various aspectual types, such as state, process, or transition, to an appropriate temporal structure t in T. This chapter clarifies such temporal relations by annotating and interpreting event and temporal base structures and their relations.
In this chapter, I explain how TimeML, a specification language for the annotation of event-associated temporal expressions in language, was normalized as an ISO international standard, known as ISO-TimeML, with some modifications. ISO’s working group developed TimeML into an ISO standard on event-associated temporal annotation by making four modifications to TimeML: (i) abstract specification of the annotation scheme, (ii) adoption of standoff annotation, (iii) merging of two tags, <EVENT/> and <MAKEINSTANCE/>, into a single tag <EVENT/>, and (iv) treating duration (e.g., two hours) as measurement. Following Bunt’s (2010) proposal for the construction of semantic annotation schemes and his subsequent work, I then formalize ISO-TimeML by presenting a metamodel for its design, an abstract syntax for formulating its specification language in set-theoretic terms, and an XML-based concrete syntax. I also analyze the base structures of the annotation structures of the normalized TimeML as consisting of two substructures, anchoring and content structures.
This chapter formulates an annotation-based semantics (ABS) for the annotation and interpretation of temporal and spatial information in language. It consists of two modules, one for representation and another for interpretation. The representation module consists of a type-theoretic first-order logic with a small set of merge operators. The theory of types is based on the extended list of basic types, which treats eventualities and points of time and space as basic types besides the two basic types e for entities and t for truth-values. These types extend the Neo-Davidsonian semantics to all types of objects including paths and vectors (trajectories) triggered by motions. The merge operators in ABS allow the compositional process of combining the semantic representation of base structures into that of the link structures that combine them without depending on complex lambda operations. ABS adopts shallow semantics to represent complex structures of eventuality or quantification with simple logical predicates defined as part of admissible interpretation models.