The term moderate deviations is often used in the literature to mean a class of large deviation principles that, in some sense, fills the gap between a convergence in probability of some random variables to a constant, and a weak convergence to a centered Gaussian distribution (when such random variables are properly centered and rescaled). We talk about noncentral moderate deviations when the weak convergence is towards a non-Gaussian distribution. In this paper we prove a noncentral moderate deviation result for the bivariate sequence of sums and maxima of independent and identically distributed random variables bounded from above. We also prove a result where the random variables are not bounded from above, and the maxima are suitably normalized. Finally, we prove a moderate deviation result for sums of partial minima of independent and identically distributed exponential random variables.
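To fix ideas, here is a schematic statement of the classical (central) moderate deviation principle for i.i.d. sums that noncentral results of this kind generalize; the scaling and rate function below are the textbook ones, not specific to this paper.

```latex
% X_1, X_2, \dots i.i.d. with mean \mu, variance \sigma^2 \in (0,\infty),
% S_n = X_1 + \cdots + X_n, and a_n \to \infty with a_n/\sqrt{n} \to 0.
% Then, informally, for suitable sets A,
\frac{1}{a_n^{2}} \log P\!\left( \frac{S_n - n\mu}{\sqrt{n}\, a_n} \in A \right)
\;\approx\; - \inf_{x \in A} \frac{x^{2}}{2\sigma^{2}},
% an LDP with speed a_n^{2} and the Gaussian rate function x^2/(2\sigma^2),
% interpolating between the LLN (constant limit) and the CLT (Gaussian limit).
```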
Physics-informed neural networks (PINNs), which are a recent development that incorporates physics-based knowledge into neural networks (NNs) in the form of constraints (e.g., displacement and force boundary conditions, and governing equations) or loss-function terms, offer promise for generating digital twins of physical systems and processes. Although recent advances in PINNs have begun to address the challenges of structural health monitoring, significant issues remain unresolved, particularly in modeling the governing physics through partial differential equations (PDEs) under temporally variable loading. This paper investigates potential solutions to these challenges. Specifically, it examines the performance of PINNs that enforce boundary conditions and utilize sensor data from a limited number of locations within the structure, demonstrated through three case studies. Case Study 1 assumes a constant uniformly distributed load (UDL) and analyzes several PINN setups for four distinct simulated measurement cases obtained from a finite element model. In Case Study 2, the UDL is included as an input variable for the NNs. Results from these two case studies show that modeling the structure’s boundary conditions enables the PINNs to approximate the behavior of the structure without requiring satisfaction of the PDEs across the whole domain of the plate. In Case Study 3, we explore the efficacy of PINNs in a setting resembling real-world conditions, wherein the simulated measurement data incorporate deviations from idealized boundary conditions and contain measurement noise. Results illustrate that PINNs can effectively capture the overall physics of the system while managing deviations from idealized assumptions and data noise.
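The composite-loss idea behind PINNs can be sketched in a few lines. The following is a minimal PyTorch example for a 1D Euler–Bernoulli beam under a constant UDL with simply supported ends and a handful of "sensor" points; it is illustrative only and is not the plate model, network sizes, or training setup of the paper.

```python
# Minimal PINN sketch (PyTorch): EI * w'''' = q on [0, L] with w = w'' = 0
# at both ends, plus a data term at three hypothetical sensor locations.
import torch

torch.manual_seed(0)
EI, q, L = 1.0, 1.0, 1.0

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def derivatives(x):
    """Return w and its first four derivatives at x via autograd."""
    x = x.requires_grad_(True)
    w = net(x)
    d, grads = w, []
    for _ in range(4):
        d = torch.autograd.grad(d.sum(), x, create_graph=True)[0]
        grads.append(d)
    return w, grads  # grads = [w', w'', w''', w'''']

# Hypothetical sensor data: exact simply-supported-beam deflection at 3 points.
xs = torch.tensor([[0.25], [0.5], [0.75]])
w_exact = q / (24 * EI) * xs * (L**3 - 2 * L * xs**2 + xs**3)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x_col = torch.linspace(0.0, L, 50).reshape(-1, 1)       # collocation points
x_bc = torch.tensor([[0.0], [L]])                       # boundary points

for step in range(2000):
    opt.zero_grad()
    _, g = derivatives(x_col)
    pde_loss = ((EI * g[3] - q) ** 2).mean()            # PDE residual term
    w_b, g_b = derivatives(x_bc)
    bc_loss = (w_b ** 2).mean() + (g_b[1] ** 2).mean()  # w = w'' = 0 at ends
    data_loss = ((net(xs) - w_exact) ** 2).mean()       # sensor data term
    (pde_loss + bc_loss + data_loss).backward()
    opt.step()
```

Dropping or down-weighting `pde_loss` while keeping `bc_loss` and `data_loss` mimics the paper's finding that enforcing boundary conditions can substitute for satisfying the PDE over the whole domain.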
The advent of smart and digital cities is bringing data to the forefront as a critical resource for addressing the multifaceted transitions faced by African cities, from rapid urbanization to the climate crisis. However, this commentary highlights the formidable considerations that must be addressed to realize the potential of data-driven urban planning and management. We argue that data should be viewed as a tool, not a panacea, drawing from our experience in modeling and mapping the accessibility of transport systems in Accra and Kumasi, Ghana. We identify five key considerations: data choice, imperfections, resource intensity, validation, and data market dynamics, and propose three actionable points for progress: local data sharing, centralized repositories, and capacity-building. While our focus is on Kumasi and Accra, the considerations discussed are relevant to cities across the African continent.
This study uses anonymized GPS traces to explore travel patterns within six suburban zones and a central area in Mexico City. The descriptive analysis presented in this paper profiles trips by distance and investigates their distribution within each zone. It examines the prevalence of local trips, walkability, and the availability and spread of entertainment sites within 15-min isochrones accessible by foot, bicycle, transit, and private vehicle. Notably, the central zone boasts diverse entertainment offerings, commendable walkability, and a substantial proportion of short and long trips. Most trips recorded in the GPS traces remain within the travelers’ home zones. However, the share of long trips for the inhabitants of central zones is considerably larger than that for the suburbs. The study highlights suburban zones that could benefit from governmental intervention to enhance transportation and pedestrian conditions. Additionally, it identifies other suburban zones that resemble the central areas in terms of walkability, trip distribution by distance, and the accessibility of entertainment places.
There has been substantial interest in developing Markov chain Monte Carlo algorithms based on piecewise deterministic Markov processes. However, existing algorithms can only be used if the target distribution of interest is differentiable everywhere. The key to adapting these algorithms so that they can sample from densities with discontinuities is to define appropriate dynamics for the process when it hits a discontinuity. We present a simple condition for the transition of the process at a discontinuity which can be used to extend any existing sampler for smooth densities, and give specific choices for this transition which work with popular algorithms such as the bouncy particle sampler, the coordinate sampler, and the zigzag process. Our theoretical results extend and make rigorous arguments that have been presented previously, for instance constructing samplers for continuous densities restricted to a bounded domain, and we present a version of the zigzag process that can work in such a scenario. Our novel approach to deriving the invariant distribution of a piecewise deterministic Markov process with boundaries may be of independent interest.
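To make the class of samplers concrete, here is a minimal one-dimensional zigzag process targeting a standard Gaussian, i.e., the smooth-density case; the discontinuous targets treated in the paper additionally need the boundary transition rules derived there. Event times are simulated exactly because the switching rate integrates in closed form.

```python
# 1-D zigzag sampler for N(0,1): deterministic linear drift with velocity
# v in {-1, +1}, velocity flips at events of rate max(0, v*x). Positions are
# recorded on a uniform time grid, since time averages along the trajectory
# (not the event points) target the invariant distribution.
import numpy as np

rng = np.random.default_rng(0)

def zigzag_gaussian(t_max=20000.0, dt=0.5):
    x, v, t, t_snap = 0.0, 1.0, 0.0, 0.0
    out = []
    while t < t_max:
        a = v * x                       # rate along the ray is max(0, a + s)
        e = rng.exponential()
        # solve the integrated rate equal to e for the next event time tau
        tau = -a + np.sqrt(a * a + 2 * e) if a >= 0 else -a + np.sqrt(2 * e)
        while t_snap <= t + tau:        # record within this linear segment
            out.append(x + v * (t_snap - t))
            t_snap += dt
        x += v * tau                    # drift to the switching event
        v = -v                          # flip the velocity
        t += tau
    return np.array(out)

samples = zigzag_gaussian()
print(samples.mean(), samples.std())    # approximately 0 and 1
```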
China faces challenges in meeting the World Health Organization (WHO)’s target of reducing hepatitis B virus (HBV) infections by 95% using 2015 as the baseline. Using Global Burden of Disease (GBD) 2019 data, joinpoint regression models were used to analyse the temporal trends in the crude incidence rates (CIRs) and age-standardized incidence rates (ASIRs) of acute HBV (AHBV) infections in China from 1990 to 2019. The age–period–cohort model was used to estimate the effects of age, period, and birth cohort on AHBV infection risk, while the Bayesian age–period–cohort (BAPC) model was applied to predict the annual number and ASIRs of AHBV infections in China through 2030. The joinpoint regression model revealed that CIRs and ASIRs decreased from 1990 to 2019, with a faster decline occurring among males and females younger than 20 years. According to the age–period–cohort model, age effects showed a steep increase followed by a gradual decline, whereas period effects showed a linear decline, and cohort effects showed a gradual rise followed by a rapid decline. The number of cases of AHBV infections in China was predicted to decline until 2030, but it is unlikely to meet the WHO’s target. These findings provide scientific support and guidance for hepatitis B prevention and control.
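As a toy illustration of the joinpoint idea (a continuous piecewise-linear trend whose slope changes at estimated breakpoints), the sketch below grid-searches a single joinpoint on synthetic log-rates; production analyses such as the one above use dedicated software with permutation tests or model-selection criteria to choose the number of joinpoints.

```python
# Single-joinpoint piecewise-linear fit by grid search (illustrative only;
# the data below are synthetic, not the GBD 2019 series).
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1990, 2020)
# hypothetical log-rates: a decline that steepens after 2004
log_rate = np.where(years < 2004,
                    5 - 0.02 * (years - 1990),
                    4.72 - 0.06 * (years - 2004))
log_rate = log_rate + rng.normal(0, 0.02, size=years.size)

def sse_with_joinpoint(k):
    """SSE of a continuous piecewise-linear fit with a joinpoint at year k."""
    X = np.column_stack([np.ones(years.size), years - years[0],
                         np.maximum(years - k, 0)])   # hinge term bends the slope
    beta, *_ = np.linalg.lstsq(X, log_rate, rcond=None)
    return ((log_rate - X @ beta) ** 2).sum()

candidates = years[2:-2]                 # keep a few points on either side
best = min(candidates, key=sse_with_joinpoint)
print("estimated joinpoint year:", best)
```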
Researchers theorize that many real-world networks exhibit community structure where within-community edges are more likely than between-community edges. While numerous methods exist to cluster nodes into different communities, less work has addressed this question: given some network, does it exhibit statistically meaningful community structure? We answer this question in a principled manner by framing it as a statistical hypothesis test in terms of a general and model-agnostic community structure parameter. Leveraging this parameter, we propose a simple and interpretable test statistic used to formulate two separate hypothesis testing frameworks. The first is an asymptotic test against a baseline value of the parameter while the second tests against a baseline model using bootstrap-based thresholds. We prove theoretical properties of these tests and demonstrate how the proposed method yields rich insights into real-world datasets.
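The flavor of such a test can be conveyed with a simple stand-in statistic (not the authors' parameter): the gap between within-community and between-community edge densities, calibrated against a bootstrap null of Erdős–Rényi graphs of matching overall density.

```python
# Illustrative bootstrap test for community structure. A: symmetric 0/1
# adjacency matrix; labels: community assignment per node.
import numpy as np

rng = np.random.default_rng(2)

def density_gap(A, labels):
    """Within-community minus between-community edge density."""
    same = labels[:, None] == labels[None, :]
    off = ~np.eye(len(A), dtype=bool)
    return A[same & off].mean() - A[~same].mean()

def bootstrap_test(A, labels, n_boot=500):
    n = len(A)
    p = A[~np.eye(n, dtype=bool)].mean()          # overall edge density
    obs = density_gap(A, labels)
    null = []
    for _ in range(n_boot):
        U = rng.random((n, n)) < p                # Erdos-Renyi null graph
        B = np.triu(U, 1)
        B = (B | B.T).astype(float)
        null.append(density_gap(B, labels))
    return obs, np.mean(np.array(null) >= obs)    # one-sided p-value

# Usage on a planted two-block graph:
n, labels = 60, np.repeat([0, 1], 30)
P = np.where(labels[:, None] == labels[None, :], 0.3, 0.05)
U = rng.random((n, n)) < P
A = np.triu(U, 1)
A = (A | A.T).astype(float)
print(bootstrap_test(A, labels))   # positive gap, small p-value
```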
Tuberculosis (TB) contact tracing and TB preventive treatment are key tools in preventing the transmission of TB, with the aim of eliminating the disease. Our study seeks to demonstrate how the infection spread from an individual patient to the entire community and how proactive contact tracing facilitated prompt diagnosis and treatment. Our work was conducted as a retrospective analysis of the spread of TB infection within the Roma community in the Czech Republic, following the case of an index patient who succumbed to pulmonary TB. Several levels of care and preventive and treatment measures are outlined. The identity of the Mycobacterium tuberculosis strain was confirmed using molecular methods. Among the 39 individuals examined, TB disease was detected in eight patients and TB infection was detected in six patients. The investigation of contacts within this group yielded positive results in 36% of cases, necessitating treatment. The study’s findings provide evidence that actively tracing individuals at risk can lead to early detection of cases, prompt treatment, and prevention of further disease transmission. The study also indicates that the highest risk of infection occurs within the sick person’s household and that young children under the age of 5 are most susceptible to falling ill.
The market for green bonds and environmentally aligned investment solutions is growing. As of 2022, green bond issuance exceeded USD 2 trillion, with India, for example, having issued its first-ever sovereign green bonds, totalling R80bn (c. USD 1bn), in January 2023. This paper lays the foundation for future papers and summarises the initial stages of our analysis, in which we try to replicate the S&P Green Bond Index (i.e. a time series problem) over a period using non-traditional techniques. The models we use include neural networks such as CNNs, LSTMs and GRUs. We extend our analysis with an open-source decision tree model called XGBoost. For the purposes of this paper, we use one day’s prior index information to predict today’s value and repeat this over a period of time. We set aside, for example, stationarity considerations and extensions of the input window/output horizon, as these will be discussed in future papers. The paper explains the methodology used in our analysis and provides background on the model architectures (CNNs, LSTMs, GRUs and XGBoost), as well as on regularisation techniques, specifically L2 regularisation, loss curves and hyperparameter optimisation, in particular the open-source library Optuna.
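The lag-1 setup described above is easy to sketch. The example below frames one-step-ahead prediction as supervised learning with XGBoost and runs a small Optuna search over two hyperparameters; the series is synthetic, since the S&P Green Bond Index data are licensed and not bundled here.

```python
# One-lag prediction sketch: today's index level from yesterday's, with
# XGBoost and Optuna hyperparameter optimisation. Synthetic data only.
import numpy as np
import optuna
import xgboost as xgb

rng = np.random.default_rng(3)
index = 100 + np.cumsum(rng.normal(0, 0.3, 1000))   # synthetic index levels
X, y = index[:-1].reshape(-1, 1), index[1:]         # lag-1 feature and target
split = int(0.8 * len(y))                           # chronological split
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

def objective(trial):
    model = xgb.XGBRegressor(
        n_estimators=trial.suggest_int("n_estimators", 50, 400),
        max_depth=trial.suggest_int("max_depth", 2, 6),
        learning_rate=0.05,
    )
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return float(np.mean((pred - y_te) ** 2))       # out-of-sample MSE

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```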
We propose a new approach to the semiparametric analysis of panel data binary choice models with fixed effects and dynamics (lagged dependent variables). The model under consideration has the same random utility framework as in Honoré and Kyriazidou (2000, Econometrica 68, 839–874). We demonstrate that, with additional serial dependence conditions on the process of deterministic utility and tail restrictions on the error distribution, the (point) identification of the model can proceed in two steps, and requires matching only the value of an index function of explanatory variables over time, rather than the value of each explanatory variable. Our identification method motivates an easily implementable, two-step maximum score (2SMS) procedure – producing estimators whose rates of convergence, in contrast to Honoré and Kyriazidou’s (2000, Econometrica 68, 839–874) methods, are independent of the model dimension. We then analyze the asymptotic properties of the 2SMS procedure and propose bootstrap-based distributional approximations for inference. Evidence from Monte Carlo simulations indicates that our procedure performs satisfactorily in finite samples.
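As a reference point for readers, the classical (Manski) maximum score criterion that "maximum score" steps build on is shown below; this is the textbook objective, not the paper's two-step version, which instead matches an index of the explanatory variables over time.

```latex
% Manski's maximum score estimator for a binary outcome y_i \in \{0,1\}
% with linear index x_i'\beta (classical reference point, not the 2SMS
% objective itself):
\hat{\beta} \;\in\; \arg\max_{b \,:\, \|b\| = 1} \;
\sum_{i=1}^{n} (2 y_i - 1)\, \operatorname{sign}(x_i' b).
```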
We study a variant of the color-avoiding percolation model introduced by Krause et al., namely we investigate the color-avoiding bond percolation setup on (not necessarily properly) edge-colored Erdős–Rényi random graphs. We say that two vertices are color-avoiding connected in an edge-colored graph if, after the removal of the edges of any color, they are in the same component in the remaining graph. The color-avoiding connected components of an edge-colored graph are maximal sets of vertices such that any two of them are color-avoiding connected. We consider the fraction of vertices contained in color-avoiding connected components of a given size, as well as the fraction of vertices contained in the giant color-avoiding connected component. It is known that these quantities converge, and the limits can be expressed in terms of probabilities associated with edge-colored branching process trees. We provide explicit formulas for the limit of the fraction of vertices contained in the giant color-avoiding connected component, and we give a simpler asymptotic expression for it in the barely supercritical regime. In addition, in the two-colored case we also provide explicit formulas for the limit of the fraction of vertices contained in color-avoiding connected components of a given size.
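The definition translates directly into code: the color-avoiding connected components are the common refinement of the component partitions obtained by deleting all edges of each color in turn. A small sketch using networkx:

```python
# Color-avoiding connected components straight from the definition.
import networkx as nx

def color_avoiding_components(n, colored_edges):
    """colored_edges: list of (u, v, color) triples on vertices 0..n-1."""
    colors = {c for _, _, c in colored_edges}
    labels = {v: [] for v in range(n)}
    for c in colors:
        G = nx.Graph()
        G.add_nodes_from(range(n))
        # keep every edge except those of the removed color c
        G.add_edges_from((u, v) for u, v, col in colored_edges if col != c)
        for i, comp in enumerate(nx.connected_components(G)):
            for v in comp:
                labels[v].append(i)    # component id with color c removed
    groups = {}
    for v, key in labels.items():      # same label vector in every reduced
        groups.setdefault(tuple(key), set()).add(v)  # graph => CA-connected
    return list(groups.values())

# Toy example with two colors 'r' and 'b':
edges = [(0, 1, 'r'), (0, 1, 'b'), (1, 2, 'r'), (1, 2, 'b'), (3, 4, 'r')]
print(color_avoiding_components(5, edges))
# {0, 1, 2} stays connected whichever color is removed; 3 and 4 do not.
```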
Empirical articles vary considerably in how they measure child and adolescent friendship networks. This meta-analysis examines four methodological moderators of children’s and adolescents’ average outdegree centrality in friendship networks: boundary specification, operational definition of friendship, unlimited vs. fixed choice design, and roster vs. free recall design. Specifically, multi-level random effects models were conducted using 261 average outdegree centrality estimates from 71 English-language peer-reviewed articles and 55 unique datasets. There were no significant differences in average outdegree centrality for child and adolescent friendship networks bounded at the classroom, grade, and school levels. Using a name generator focused on best/close friends yielded significantly lower average outdegree centrality estimates than using a name generator focused on friends. Fixed choice designs with fewer than 10 nominations were associated with significantly lower estimates of average outdegree centrality, while fixed choice designs with 10 or more nominations were associated with significantly higher estimates, relative to unlimited choice designs. Free recall designs were associated with significantly lower estimates of average outdegree centrality than roster designs. Results are discussed within the context of their implications for the future measurement of child and adolescent friendship networks.
This book provides statistics instructors and students with complete classroom material for a one- or two-semester course on applied regression and causal inference. It is built around 52 stories, 52 class-participation activities, 52 hands-on computer demonstrations, and 52 discussion problems that allow instructors and students to explore in a fun way the real-world complexity of the subject. The book fosters an engaging 'flipped classroom' environment with a focus on visualization and understanding. The book provides instructors with frameworks for self-study or for structuring the course, along with tips for maintaining student engagement at all levels, and practice exam questions to help guide learning. Designed to accompany the authors' previous textbook Regression and Other Stories, its modular nature and wealth of material allow this book to be adapted to different courses and texts or be used by learners as a hands-on workbook.
This paper demonstrates workflows to incorporate text data into actuarial classification and regression tasks. The main focus is on methods employing transformer-based models. A dataset of car accident descriptions with an average length of 400 words, available in English and German, and a dataset with short property insurance claims descriptions, are used to demonstrate these techniques. The case studies tackle challenges related to a multilingual setting and long input sequences. They also show ways to interpret model output and to assess and improve model performance, by fine-tuning the models to the domain of application or to a specific prediction task. Finally, the paper provides practical approaches to handle classification tasks in situations with no or only few labelled data. The results achieved by using the language-understanding skills of off-the-shelf natural language processing (NLP) models with only minimal pre-processing and fine-tuning clearly demonstrate the power of transfer learning for practical applications.
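For the "no labelled data" situation mentioned above, zero-shot classification with an off-the-shelf NLI model is one standard approach. The sketch below uses the Hugging Face transformers pipeline; the model name and candidate labels are illustrative choices, not the paper's setup.

```python
# Zero-shot classification of a short claims description: the NLI model
# scores each candidate label without any task-specific training data.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")
claim = "Water leaked from a burst pipe and damaged the kitchen floor."
labels = ["water damage", "fire damage", "theft", "storm damage"]
result = classifier(claim, candidate_labels=labels)
print(result["labels"][0], result["scores"][0])   # top label and its score
```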
Population-based structural health monitoring (PBSHM) systems use data from multiple structures to make inferences about health states. An area of PBSHM recently recognized as having potential for development is the use of multitask learning (MTL) algorithms, which differ from traditional single-task learning. This study presents an application of the MTL approach, Joint Feature Selection with LASSO, to provide automatic feature selection. The algorithm is applied to two structural datasets. The first dataset covers a binary classification between the port and starboard sides of an aircraft tailplane, for samples from two aircraft of the same model. The second dataset covers normal and damaged conditions, pre- and post-repair, of the same aircraft wing. Both case studies demonstrate that the MTL results are interpretable, highlighting features that relate to structural differences by considering the patterns shared between tasks. This contrasts with single-task learning, which improved accuracy at the cost of interpretability, selecting features that failed to generalize in previously unobserved experiments.
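The joint-selection mechanism can be illustrated with scikit-learn's MultiTaskLasso, whose group penalty zeroes out the same features in every task; this is analogous in spirit to the approach above, though the paper's tasks are classification rather than regression, and the data below are synthetic.

```python
# Joint feature selection across two related tasks with a group penalty:
# MultiTaskLasso forces each feature to be kept or dropped for all tasks.
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 30))
W = np.zeros((30, 2))
W[[3, 7, 12]] = rng.normal(size=(3, 2))        # 3 shared informative features
Y = X @ W + 0.1 * rng.normal(size=(200, 2))    # two related tasks

model = MultiTaskLasso(alpha=0.1).fit(X, Y)
shared = np.where(np.any(model.coef_ != 0, axis=0))[0]
print("features selected jointly across tasks:", shared)
```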
This chapter uses the history of polling to explain how pollsters have dealt with the challenges of nonresponse. It tells the tale of three polling paradigms: large-scale polling, quota sampling, and random sampling. The first two paradigms came crashing down after pollsters made poor predictions for presidential elections. The third paradigm remains intellectually vibrant but is increasingly difficult to implement. We do not yet know whether the poor polling predictions of 2016 and 2020 will push the field to a new paradigm, but they have certainly raised doubts about the current state of the field.
This chapter focuses on next-generation selection models that expand on the Heckman model using copula and control function methods, which allow one to estimate selection models for a much larger range of statistical distributions. The chapter also shows how to generate weights that account for nonignorable nonresponse: not only do these weights increase the weight on demographic groups that respond with lower probabilities, they also increase the weights on people whose opinions may make them less inclined to respond. Finally, the chapter shows how to modify a Heckman model to estimate a nonignorable nonresponse selection model when a response-related variable is available only for people in the survey sample.
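For orientation, the baseline that these extensions generalize is the classic two-step Heckman correction: a probit selection equation followed by an outcome regression augmented with the inverse Mills ratio. The sketch below uses synthetic data and illustrative variable names; it is the standard textbook procedure, not the chapter's copula or control function variants.

```python
# Two-step Heckman correction on synthetic data: selection depends on z,
# and the selection and outcome errors are correlated (rho = 0.5).
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 5000
z = rng.normal(size=n)                       # exclusion-restriction covariate
x = rng.normal(size=n)
u, e = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=n).T
selected = (0.5 + z + u) > 0                 # y observed only if selected
y = 1.0 + 2.0 * x + e

# Step 1: probit selection equation, then the inverse Mills ratio
Zs = sm.add_constant(np.column_stack([z, x]))
probit = sm.Probit(selected.astype(float), Zs).fit(disp=0)
idx = Zs @ probit.params
imr = norm.pdf(idx) / norm.cdf(idx)

# Step 2: outcome regression on the selected sample with IMR as a regressor
Xs = sm.add_constant(np.column_stack([x[selected], imr[selected]]))
ols = sm.OLS(y[selected], Xs).fit()
print(ols.params)   # slope on x close to 2 after the correction
```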