This chapter introduces continuous random variables, which enable us to model uncertain continuous quantities. We again begin with a formal definition, but quickly move on to describe how to manipulate continuous random variables in practice. We define the cumulative distribution function and quantiles (including the median) and explain how to estimate them from data. We then introduce the concept of probability density and describe its main properties. We present two approaches to obtain nonparametric models of probability densities from data: the histogram and kernel density estimation. Next, we define two celebrated continuous parametric distributions – the exponential and the Gaussian – and show how to fit them to data using maximum-likelihood estimation. We use these distributions to model the interarrival time of calls at a call center, and height in a population, respectively. Finally, we discuss how to simulate continuous random variables via inverse transform sampling.
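As a rough illustration of the last two ideas in this abstract, the sketch below draws exponential samples by inverse transform sampling and then recovers the rate by maximum likelihood. The function names, the rate of 2.0 and the sample size are assumptions chosen for the example; they do not come from the chapter.

```python
import math
import random


def sample_exponential(rate, n, seed=0):
    """Draw n exponential samples via inverse transform sampling."""
    rng = random.Random(seed)
    # If U ~ Uniform[0, 1), then -ln(1 - U) / rate follows an exponential
    # distribution with the given rate (the inverse of the CDF applied to U).
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]


def fit_exponential_rate(samples):
    """Maximum-likelihood estimate of the rate: the reciprocal of the sample mean."""
    return len(samples) / sum(samples)


# Hypothetical usage: simulate with a known rate, then estimate it back.
samples = sample_exponential(rate=2.0, n=100_000)
estimated_rate = fit_exponential_rate(samples)
print(f"estimated rate: {estimated_rate:.3f}")
```

With this many samples the estimate should land close to the true rate, which is a quick sanity check that the sampler and the estimator agree.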
This chapter examines the related objectives of defining spatial clusters and delineating spatial boundaries in discontinuous data. The former often proceeds by grouping together adjacent locations when they have the most similar characteristics; the latter proceeds by estimating boundaries between locations that are most different. For this, there are several methods available that suggest ‘boundary elements’ as possible components of a final division or complete boundary, depending on the kind of data (e.g. binary versus qualitative versus continuous quantitative) and the arrangement of the measured locations (e.g. regular lattice versus irregular spatial network). Once boundaries have been established, statistics are available to evaluate them, including boundary overlap measures. Clusters and boundaries represent two aspects of the same phenomenon, with the same challenge of formalizing similarity and difference in continuous spatial data.
The presence of autocorrelation in data violates the usual assumption of independence in the data for evaluating inferential statistics. We describe several models of autocorrelation in spatial data (both positive and negative). Given two serial variables, x and y, autocorrelation observed in y can be due to inherent autoregression in the variable itself, to autoregression induced by its dependence on x, which has its own autocorrelation, or to double autoregression, with autocorrelation in both variables. This effect can be addressed by estimating the effective sample size (the number of independent observations equivalent in information content to the n that are autocorrelated). We present the calculation of the effective sample size for many inferential statistics, including correlation, partial correlation, t-tests and ANOVA. The use of restricted randomization is explained as a method for testing when other approaches are not available. We also provide recommendations for sampling and experimental design in the presence of spatial autocorrelation.
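To make the idea of an effective sample size concrete, here is a minimal sketch for the simplest case: the mean of a series with first-order autoregressive structure, using the standard large-sample approximation n_eff = n(1 - rho)/(1 + rho). This particular formula and the function names are assumptions for illustration; the chapter's own calculations for correlation, t-tests and ANOVA may differ.

```python
def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a one-dimensional series."""
    n = len(series)
    mean = sum(series) / n
    numerator = sum(
        (series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1)
    )
    denominator = sum((value - mean) ** 2 for value in series)
    return numerator / denominator


def effective_sample_size(n, rho):
    """Approximate effective n for the mean of an AR(1) series (assumed model)."""
    # Positive rho shrinks the effective sample size below n;
    # negative rho inflates it above n.
    return n * (1.0 - rho) / (1.0 + rho)


# Hypothetical usage with two assumed autocorrelation values.
print(effective_sample_size(100, 0.5))   # fewer independent observations than 100
print(effective_sample_size(100, -0.5))  # more than 100
```

The two calls show why the sign of the autocorrelation matters: positive autocorrelation makes tests too liberal if the nominal n is used, while negative autocorrelation makes them too conservative.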
Quantifying the relationships between variables is affected by the spatial structure in which they occur and the scales of the processes that affect them. First, this chapter covers the topics of spatial regression, spatial causal inference and the Mantel and partial Mantel statistics. These are all methods designed to assess the relationships between variables of interest within a spatial structure. Then, multiscale analysis is presented because it is key to understanding how ecological processes and patterns change with the scale of observation. Indeed, multiscale analysis has become increasingly important as ecologists address studies at larger and larger scales with increasing probability of significant spatial heterogeneity. We describe several approaches, including multiscale ordination (MSO), Moran’s eigenvector maps (MEMs) and wavelet decomposition.
This chapter introduces probability. We begin with an informal definition which enables us to build intuition about the properties of probability. Then, we present a more rigorous definition, based on the mathematical framework of probability spaces. Next, we describe conditional probability, a concept that makes it possible to update probabilities when additional information is revealed. In our first encounter with statistics, we explain how to estimate probabilities and conditional probabilities from data, as illustrated by an analysis of votes in the United States Congress. Building upon the concept of conditional probability, we define independence and conditional independence, which are critical concepts in probabilistic modeling. The chapter ends with a surprising twist: In practice, probabilities are often impossible to compute analytically! Fortunately, the Monte Carlo method provides a pragmatic solution to this challenge, allowing us to approximate probabilities very accurately using computer simulations. We apply it to a 3 × 3 basketball tournament from the 2020 Tokyo Olympics.
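The Monte Carlo idea can be sketched on a toy problem whose answer is known exactly, so the approximation can be checked. The example below estimates the probability that two fair dice sum to at least 10 (exactly 1/6); the dice setting and the function name are assumptions for illustration and are unrelated to the basketball application in the chapter.

```python
import random


def estimate_probability(n_trials, seed=0):
    """Monte Carlo estimate of P(sum of two fair dice >= 10)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total >= 10:
            hits += 1
    # The fraction of simulated trials in which the event occurred.
    return hits / n_trials


estimate = estimate_probability(200_000)
print(f"estimate: {estimate:.4f}, exact: {1 / 6:.4f}")
```

The same pattern (simulate, count, divide) carries over to events that are too complicated to enumerate by hand, which is where the method earns its keep.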
This first chapter sets the context for the topics covered throughout the book by introducing the relationship between ecological processes and spatial structure, and by clarifying terminology related to both. These processes and spatial analysis methods are classified by several criteria, including static versus dynamic data and one versus several species. The concept of scale is applied to spatial, temporal and organizational contexts. The chapter provides a discussion regarding the background and motivation for spatial analysis in ecological research.
Case-control studies can provide attribution estimates of the likely sources of zoonotic pathogens. We applied a meta-analytical model within a Bayesian estimation framework to pool population attributable fractions (PAFs) from European case-control studies of sporadic campylobacteriosis and salmonellosis. The input data were obtained from two existing systematic reviews, supplemented with additional literature searches, covering the period 2000–2021. In total, 12 studies on Campylobacter providing data for 180 PAFs referring to 5983 cases and 13213 controls, and five studies on Salmonella providing data for 75 PAFs referring to 2908 cases and 5913 controls, were included. All these studies were conducted in Western or Northern European countries. Both pathogens were estimated as being predominantly linked to food- and waterborne transmission, which explained nearly half of the cases, with Campylobacter being mainly attributable to poultry (meat), and Salmonella to poultry (eggs and meat) and pig (meat), as specific foodborne exposures. When also considering contact with animals, around 60% of cases could be explained by the larger group of zoonotic transmission pathways. While environmental transmission was also sizeable (around 10%), about a quarter of cases could be explained by factors such as travel, underlying diseases/medicine use, person-to-person transmission and occupational exposure.
Thermal integrity profiling (TIP) is a nondestructive testing technique that takes advantage of the concrete heat of hydration (HoH) to detect inclusions during the casting process. This method is becoming more popular due to its ease of application, as it can be used to predict defects in most concrete foundation structures requiring only the monitoring of temperatures. Despite its advantages, challenges remain with regard to data interpretation and analysis, as temperature is only known at discrete points within a given cross-section. This study introduces a novel method for the interpretation of TIP readings using neural networks. Training data are obtained through numerical finite element simulation spanning an extensive range of soil, concrete, and geometrical parameters. The developed algorithm first classifies concrete piles, establishing the presence or absence of defects. This is followed by a regression algorithm that predicts the defect size and its location within the cross-section. In addition, the regression model provides reliable estimates for the reinforcement cage misalignment and concrete hydration parameters. To make these predictions, the proposed methodology only requires temperature data in the form standard in TIP, so it can be seamlessly incorporated within the TIP workflows. This work demonstrates the applicability and robustness of machine learning algorithms in enhancing nondestructive TIP testing of concrete foundations, thereby improving the safety and efficiency of civil engineering projects.
The Hawkes process is a popular candidate for researchers to model phenomena that exhibit a self-exciting nature. The classical Hawkes process assumes the excitation kernel takes an exponential form, thus suggesting that the peak excitation effect of an event is immediate and the excitation effect decays towards 0 exponentially. While the assumption of an exponential kernel makes it convenient for studying the asymptotic properties of the Hawkes process, it can be restrictive and unrealistic for modelling purposes. A variation on the classical Hawkes process is proposed where the exponential assumption on the kernel is replaced by integrability and smoothness type conditions. However, it is substantially more difficult to conduct asymptotic analysis under this setup since the intensity process is non-Markovian when the excitation kernel is non-exponential, rendering techniques for studying the asymptotics of Markov processes inappropriate. By considering the Hawkes process with a general excitation kernel as a stationary Poisson cluster process, the intensity process is shown to be ergodic. Furthermore, a parametric setup is considered, under which, by utilising the recently established ergodic property of the intensity process, consistency of the maximum likelihood estimator is demonstrated.
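For readers unfamiliar with the classical case that this abstract generalises, the following sketch simulates a Hawkes process with an exponential kernel using Ogata-style thinning. The intensity is taken to be mu + sum of alpha * exp(-beta * (t - t_i)) over past events t_i; the parameter names, the values and the choice of simulation method are assumptions for illustration, not the paper's setup.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) for an exponential-kernel Hawkes process."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # current value of the summed kernel contributions
    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound until the next accepted event.
        upper = mu + excitation
        wait = rng.expovariate(upper)
        excitation *= math.exp(-beta * wait)
        t += wait
        if t >= horizon:
            return events
        # Accept the candidate with probability (true intensity) / (upper bound).
        if rng.random() * upper <= mu + excitation:
            events.append(t)
            excitation += alpha  # each event raises the intensity immediately


# Hypothetical parameters; alpha / beta < 1 keeps the process stable.
events = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=2000.0)
print(len(events) / 2000.0)  # long-run rate should be near mu / (1 - alpha / beta)
```

The single running variable `excitation` is enough here only because the kernel is exponential, which is the Markov property the abstract says is lost for general kernels.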
The assessment of soil–structure interaction (SSI) under dynamic loading conditions remains a challenging task due to the complexities of modeling this system and the interplay of SSI effects, which is also characterized by uncertainties across varying loading scenarios. This field of research encompasses a wide range of engineering structures, including underground tunnels. In this study, a surrogate model based on a regression ensemble model has been developed for real-time assessment of underground tunnels under dynamic loads. The surrogate model utilizes synthetic data generated using Latin hypercube sampling, significantly reducing the required dataset size while maintaining accuracy. The synthetic dataset is constructed using an accurate numerical model that integrates the two-and-a-half-dimensional singular boundary method for modeling wave propagation in the soil with the finite element method for structural modeling. This hybrid approach allows for a precise representation of the dynamic interaction between tunnels and the surrounding soil. The validation and optimization algorithms are evaluated for two problems: underground railway tunnels with circular and rectangular cross-sections, both embedded in a homogeneous full-space medium. Both geometrical and material characteristics of the underground tunnel are incorporated into the optimization process. The optimization target is to minimize elastic wave propagation in the surrounding soil. The results demonstrate that the proposed optimization framework, which combines the Bayesian optimization algorithm with surrogate models, effectively explores trade-offs among multiple design parameters. This enables the design of underground railway tunnels that achieve an optimal balance between elastic wave propagation performance, material properties, and geometric constraints.
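Latin hypercube sampling, mentioned above as the source of the synthetic training data, can be sketched in a few lines. The version below works on the unit hypercube and uses only the standard library; the function name and the sample sizes are assumptions for illustration, and the study's actual parameter ranges and tooling are not given in the abstract.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum per axis."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # Split the axis into n_samples equal-width strata and draw one
        # value uniformly inside each stratum.
        values = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired at random across dimensions.
        rng.shuffle(values)
        columns.append(values)
    # Transpose the per-dimension columns into a list of points.
    return [list(point) for point in zip(*columns)]


# Hypothetical usage: 8 design points over 3 normalised parameters.
design = latin_hypercube(n_samples=8, n_dims=3)
for point in design:
    print([round(x, 3) for x in point])
```

Each axis is covered evenly even with few points, which is why this kind of design can keep a simulation-generated dataset small; the unit-cube values would then be rescaled to the physical ranges of each parameter.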
Distribution channels such as bancassurance, brokers, agents, direct online sales, and insurance aggregators have been key to ensuring premium growth for both life and non-life insurers. However, in recent years, an emerging channel known as embedded insurance has started to provide insurers with a brand-new growth driver. In this paper, we first present an introduction to embedded insurance – what it is and how it will shape insurance distribution in the industry. We then introduce a framework to classify embedded insurance recommendation systems. Finally, we propose a novel insurance recommendation system using supervised learning algorithms that can be applied to e-commerce platforms. This needs-based collaborative filtering technique recommends one of three insurance products that would be most appropriate for each buyer on the Olist e-commerce platform based on order-level data. Our work is relevant for actuaries in this field interested in the pricing of embedded insurance risk as well as insurers seeking to improve insurance penetration on such platforms.
Monkeypox (mpox) has re-emerged as a global public health concern, including in several non-endemic countries. This study aims to characterize monkeypox virus (MPXV) genomes in Indonesia to explore viral evolution and transmission. Genomic analysis was conducted on 53 isolates from Indonesian mpox patients between 2023 and 2024. All sequences belonged to Clade IIb, with identified sub-clades including A.1.1, B.1, B.1.3, and C.1 – of which C.1 became dominant during this period. Out of 87 mpox-confirmed cases, 60.9% (53/87) were successfully sequenced and submitted to the Global Initiative on Sharing All Influenza Data (GISAID). The majority of cases in Indonesia occurred among males (95.4%), men who have sex with men (59.8%), and people living with HIV/AIDS (71.3%). Notably, a large portion of cases had no travel history, suggesting local transmission. Initially, only clade IIb (B.1) was detected in October 2022. By August 2023, lineage diversity had increased, with B.1.3 and C.1 emerging as the predominant sub-clades. A time-calibrated phylogenetic tree revealed genetic relatedness and shared ancestry within clade IIb. Integrating genomic and epidemiological data offers valuable insights to improve mpox surveillance and public health response in Indonesia and the broader region.
The coupling of the disruptive processes of digitalization and the green transformation in a so-called “Twin Transformation” is already being considered a strategic step within the European Union and is discussed in the academic sphere. Strategically, this coupling is necessary and meaningful to realize synergies and to avoid counterproductive effects, such as rebound effects or lock-in effects, particularly given the time constraints imposed by climate change. The European data strategy not only calls for the establishment of various data spaces, such as the data space for the European Green New Deal, but also calls for the opening, integration, and utilization of European data for stakeholders from administration, business, and civil society. Considering this, it is argued that administrative informatics as a discipline could be integrated as an additional analytical perspective into the political science heuristic of the policy cycle. This integration offers substantial added value for analyzing and shaping the policy processes of the European Green transformation. Moreover, this heuristic approach enables the ex-ante prediction of changes in policymaking based on the theories, models, methods, and application areas of administrative informatics. Building on this premise, this article provides insights into the application of the proposed heuristic using the example of the European Green transformation. It analyzes the resulting implications for the analysis of policymaking considering an increasingly digitalized public administration.
We initiate a study of large deviations for block model random graphs in the dense regime. Following [14], we establish an LDP for dense block models, viewed as random graphons. As an application of our result, we study upper tail large deviations for homomorphism densities of regular graphs. We identify the existence of a ‘symmetric’ phase, where the graph, conditioned on the rare event, looks like a block model with the same block sizes as the generating graphon. In specific examples, we also identify the existence of a ‘symmetry breaking’ regime, where the conditional structure is not a block model with compatible dimensions. This identifies a ‘reentrant phase transition’ phenomenon for this problem – analogous to one established for Erdős–Rényi random graphs [13, 14]. Finally, extending the analysis of [34], we identify the precise boundary between the symmetry and symmetry breaking regimes for homomorphism densities of regular graphs and the operator norm on Erdős–Rényi bipartite graphs.
Dengue, the most prevalent urban arbovirus in the world, has triggered recurrent epidemics in Rio de Janeiro, Brazil, since the 1980s. This study aimed to describe the spatial–temporal patterns of dengue spread during the epidemic years of 2002, 2008, 2011, 2012, 2013, and 2024 in Rio de Janeiro. This is an ecological study using secondary data on notified confirmed dengue cases aggregated by neighbourhood. The incidence rates were estimated via the local empirical Bayes method. The local spatial autocorrelation indicators assessed incidence clusters, and the monthly geographic trajectory was outlined for each year. The results revealed changes in the spatial distribution of dengue over time, with clusters of high incidences predominating in the northern and central neighbourhoods in 2002 and 2008, and in the western zone in 2011, 2012, and 2013. In 2024, the distribution was predominant throughout the city, with emphasis in the central and western zones. The monthly geographic centre of dengue cases shifted from the west to the north during the peak of the epidemic. These results highlight the heterogeneous nature of dengue transmission in Rio de Janeiro. The incorporation of spatial and temporal analyses in epidemiological studies can enhance targeted and localized dengue control strategies.