We study a family of Crump–Mode–Jagers branching processes in a random environment that explode, i.e. that grow infinitely large in finite time with positive probability. Building on recent work of Iyer and the author (‘On the structure of genealogical trees associated with explosive Crump–Mode–Jagers branching processes’, arXiv:2311.14664, 2023), we weaken certain assumptions required to prove that the branching process, at the time of explosion, contains a (unique) individual with infinite offspring. We then apply these results to super-linear preferential attachment models. In particular, we fill gaps in some of the cases analysed in Appendix A of the work of Iyer and the author and study a large range of previously unattainable cases.
Genomic epidemiology was essential for characterizing SARS-CoV-2 transmission during the early COVID-19 pandemic. This systematic review examined how whole-genome sequencing was used in local outbreak investigations published between March 2020 and March 2021. Searches of PubMed, Scopus, and Web of Science identified 32 studies from 18 countries that integrated genomic and epidemiological data for local outbreak investigations. Most studies were conducted in healthcare settings or in high-income countries. A limited number of studies were conducted in low- and middle-income countries, except for China and Vietnam. Illumina or Oxford Nanopore platforms and tiled-amplicon protocols were the most common sequencing methods. Phylogenetic trees were the most common genomic epidemiology analytical approach. Genomic data enabled confirmation of suspected transmission links, detection of multiple introductions, and identification of asymptomatic or presymptomatic transmission. Important enablers of early implementation included open-access genomics databases, standardized protocols (e.g. ARTIC), open-source tools (e.g. Nextstrain), and cross-sector partnerships and funding. Study quality and adherence to common observational study reporting guidelines varied widely. Familiarity with the STROME-ID guidelines for molecular epidemiology studies would have improved overall quality. These findings highlight the utility of genomic epidemiology in outbreak response and support its continued integration into public health surveillance systems.
In this paper we derive identities for the upward and downward exit problems and resolvents for a process whose motion changes between two Lévy processes if it is above (or below) a barrier b and coincides with a Poissonian arrival time. This can be expressed in the form of a (hybrid) stochastic differential equation, for which the existence of its solution is also discussed. All identities are given in terms of new generalisations of scale functions (counterparts of the scale functions from the theory of Lévy processes). To illustrate the applicability of our results, the probability of ruin is obtained for a risk process with delays in the dividend payments.
This study assesses whether a hybrid prediction–optimisation workflow can be used as an exploratory exercise for Brazilian federal budget allocation under severe data constraints. Using executed expenditure by budgetary function (2000–2023; N = 24), a multi-output XGBoost model is estimated to link spending profiles to GDP growth, inflation, and the Gini index; Bayesian optimisation (Tree-structured Parzen Estimator/Optuna) is then applied to search, within explicit bounds and penalties, for allocation vectors that maximise a stated objective function favouring higher growth and lower inflation and inequality. To mitigate data scarcity, the short series is augmented with 1048 synthetic observations generated through controlled noise injection, bootstrapped resampling and variational autoencoder reconstruction. Under randomised K-fold cross-validation on the augmented dataset, the model achieves mean $R^2$ = 0.97 and mean MSE = 0.04, while diagnostics indicate larger errors at extreme values and a persistent training–validation gap. A secondary robustness check uses an anti-leakage design by applying cross-validation to the 24 real observations and generating synthetic data only within each training fold. This yields markedly weaker generalisation for GDP growth and inflation (overall mean MSE = 1.03; overall mean $R^2$ = −0.45), with positive performance remaining only for the Gini index ($R^2$ = 0.60). Under these conditions, the optimisation step identifies a scenario that satisfies the objective function on standardised outputs (GDP growth = 1.15; inflation = −0.04; Gini = −0.17). The results support the use of the workflow to compare scenarios under explicit assumptions, rather than to produce prescriptive budget guidance.
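The anti-leakage design described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the choice of Gaussian noise injection as the only augmentation step, and the parameters `copies` and `scale` are all assumptions made for illustration. The point it shows is that folds are formed on the real observations first, and synthetic rows are generated only from the training part of each fold.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def augment_with_noise(rows, copies, scale, rng):
    """Return `copies` noisy duplicates of every row (hypothetical augmentation step)."""
    out = []
    for row in rows:
        for _ in range(copies):
            out.append([v + rng.gauss(0.0, scale) for v in row])
    return out


def leakage_safe_folds(data, k=4, copies=3, scale=0.05, seed=0):
    """Yield (train, validation) pairs where synthetic rows come only from training rows."""
    rng = random.Random(seed)
    for held_out in kfold_indices(len(data), k, seed):
        held = set(held_out)
        train_real = [data[i] for i in range(len(data)) if i not in held]
        val_real = [data[i] for i in held_out]
        # Augmentation happens after the split, so no held-out row can
        # influence any synthetic training row.
        train = train_real + augment_with_noise(train_real, copies, scale, rng)
        yield train, val_real
```

By contrast, augmenting the full dataset first and then applying randomised K-fold lets noisy copies of a validation row appear in the training set, which is one plausible reason for the gap between the two sets of scores reported above.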
We study the number of triangles $T_n$ in the sparse $\beta$-model on n vertices, a random graph model that captures degree heterogeneity in real-world networks. Using the norms of the heterogeneity parameter vector, we first determine the asymptotic mean and variance of $T_n$. Next, by applying the Malliavin–Stein method, we derive a non-asymptotic upper bound on the Kolmogorov distance between the normalized $T_n$ and the standard normal distribution. Under an additional assumption on degree heterogeneity, we further prove the asymptotic normality for $T_n$ as $n\to\infty$.
Failure extropy, introduced by Nair and Sathar [(2020). On dynamic failure extropy. J. Indian Soc. Probab. Stat. 21: 287–313], provides a complementary perspective to entropy for quantifying uncertainty in lifetime distributions. However, it becomes mathematically invalid for distributions with unbounded support. To overcome this limitation, Tahmasebi and Toomaj [(2022). On negative cumulative extropy with applications. Commun. Stat. Theory Methods 51(15): 5025–5047] proposed the concept of negative cumulative extropy (NCEx), offering a bounded and interpretable alternative. In this paper, we extend the notion of NCEx to the bivariate dynamic setting, where uncertainty is assessed for systems whose components have failed at specified times. The proposed formulation effectively captures the uncertainty associated with past lifetimes under dependence, which the existing NCEx cannot address. The measure is further generalized to a vector-valued form, and its fundamental properties are established, including monotonicity, invariance, bounds expressed in terms of the expected inactivity time, and key characterizations. A new stochastic ordering based on the proposed measure is also established. To facilitate practical implementation, a nonparametric estimator is developed and its performance evaluated through extensive Monte Carlo simulations. The practical relevance of the proposed measure is demonstrated using a real dataset, and its superiority over existing entropy-based approaches is shown on an additional dataset.
Well-posedness is established for multi-dimensional mean-field stochastic Volterra equations with Lipschitz-continuous coefficients, allowing for singular kernels as well as for one-dimensional mean-field stochastic Volterra equations with Hölder-continuous diffusion coefficients and sufficiently regular kernels. In these different settings, quantitative, pointwise propagation of chaos results are derived for the associated Volterra-type interacting particle systems.
We describe a prolonged outbreak of Salmonella enterica serotype Poona (S. Poona) sequence type (ST) 308, which comprised 13 cases occurring intermittently in North West England between 2016 and 2021. Whole genome sequencing (WGS) results indicated potential exposure to a single source but a lack of good quality data from routine surveillance questionnaires initially made it challenging to identify the cause. Continuing identification of cases in young children in a small geographical area prompted further public health actions, including trawling interviews which identified that ten cases attended the same nursery. As part of enhanced case finding in this nursery, childcare staff were asked to submit faecal samples. One asymptomatic staff member was positive for S. Poona and had worked at another nursery, attended at the time by the first S. Poona child case in this outbreak. Further investigations revealed that the case had previously undergone a cholecystectomy. We report an outbreak caused by persistent carriage and shedding of S. Poona in an asymptomatic individual working with vulnerable groups, which necessitated introduction of risk management measures similar to those for typhoidal Salmonella. We also demonstrate the utility of combining epidemiological and WGS data in the public health response to Salmonella outbreaks.
In low-prevalence settings, the epidemiological yield of screening strategies for controlling vancomycin-resistant enterococci (VRE) outbreaks has not been fully established. We retrospectively analysed a prolonged VRE outbreak at a 536-bed tertiary-care hospital in Japan from 2010 to 2021 to evaluate sequential screening strategies across epidemic phases and to identify risk factors for VRE acquisition. Hospital-wide, admission-based, antimicrobial exposure-based, passive, and haemodialysis-targeted screening strategies were implemented over time. Screening yields were compared longitudinally, and a retrospective case–control study was performed using data from the initial hospital-wide screening phase. Molecular epidemiology was assessed by pulsed-field gel electrophoresis (PFGE). In total, 169 VRE-positive patients were identified, including seven infections and 162 asymptomatic carriers. Hospital-wide screening in the early epidemic phase showed the highest positivity rate (0.91%), whereas targeted strategies consistently yielded lower rates (0.09–0.34%). Haemodialysis, specific oral care practices, and prior exposure to carbapenems, glycopeptides, and piperacillin/tazobactam were independently associated with VRE acquisition. PFGE revealed substantial genetic diversity, suggesting sustained nosocomial transmission with repeated introductions. Early broad-based screening may be epidemiologically efficient in the initial phase of VRE outbreaks in low-prevalence settings, followed by adaptive refinement for long-term control.
The COVID-19 pandemic has highlighted limitations in case-based surveillance due to inconsistent testing and reporting. Wastewater-based epidemiology (WBE) has emerged as a complementary surveillance approach for tracking SARS-CoV-2 transmission, capturing both symptomatic and asymptomatic infections. The aim of this study was to evaluate the effectiveness of WBE in estimating the effective reproduction number ($ {R}_t $) of SARS-CoV-2 in Georgia, USA. We used a Generalized Linear Mixed Model (GLMM) to analyse viral concentration data from multiple wastewater treatment plants (WWTPs) collected between 1 June 2022 and 15 December 2022. After controlling for flow rates and site-level heterogeneity, model residuals were transformed into a non-negative incidence-like series used to estimate wastewater-based $ {R}_t $. Wastewater-based $ {R}_t $ was compared with case-based $ {R}_t $ estimates using Spearman correlation. The two $ {R}_t $ estimates showed concordant temporal patterns across most sites, with stronger correlations in areas with higher case counts (Spearman correlations ranging from 0.39 to 0.84, $ p<0.001 $). Wastewater-based $ {R}_t $ tracked increases and decreases in transmission over similar time scales as case-based estimates, while exhibiting reduced sensitivity to short-term changes in clinical testing and reporting behaviour. These findings suggest that WBE can support estimation of transmission trends and complement traditional case-based surveillance for public health monitoring.
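As a reminder of what the comparison statistic measures, the following is a minimal, dependency-free sketch of Spearman rank correlation: the Pearson correlation of the ranks, with tied values given their average rank. The function names are hypothetical and the inputs stand in for two aligned $ {R}_t $ series; nothing here reproduces the study's GLMM or its $ {R}_t $ estimation.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, the statistic rewards series that rise and fall together even when their magnitudes differ, which suits a comparison between wastewater-based and case-based estimates.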
Let $X_1,\ldots, X_n$ be independent integers distributed uniformly on [M], $M\ge 2$. A partition S of [n] into $\nu$ non-empty subsets $S_1,\ldots, S_{\nu}$ is called perfect if all $\nu$ values $\sum_{j\in S_{\alpha}}X_j$ are equal. For a perfect partition to exist, $\sum_j X_j$ has to be divisible by $\nu$. In 2001, for $\nu=2$, Christian Borgs, Jennifer Chayes, and the author proved that, conditioned on $\sum_j X_j$ being even, with high probability a perfect partition exists if $\kappa\;:\!=\; \lim {{n}/{\log M}}>{{1}/{\log 2}}$, and that with high probability no perfect partition exists if $\kappa<{{1}/{\log 2}}$. Responding to a question by George Varghese, we prove that for $\nu\ge 3$ with high probability no perfect partition exists if $\kappa<{{2}/{\log \nu}}$, which is twice as large as the naive threshold $1/\log 3$ for $\nu=3$. We identify the range of $\kappa$ where the expected number of perfect partitions is exponentially high. We show that for $\kappa> {{2(\nu-1)}/{\log[(1-2\nu^{-2})^{-1}]}}$ the total number of perfect partitions is exponentially high with probability $\gtrsim (1+\nu^2)^{-1}$, i.e. below $1/\nu$, the limiting probability that $\sum_j X_j$ is divisible by $\nu$.
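The definition of a perfect partition can be made concrete with a brute-force counter for very small $n$. This is only an illustrative sketch: the function name is hypothetical, the enumeration costs $\nu^n$ steps, and it plays no role in the asymptotic arguments of the paper.

```python
from itertools import product
from math import factorial


def count_perfect_partitions(xs, nu):
    """Count partitions of the indices of xs into nu non-empty unlabelled
    subsets whose element sums are all equal."""
    total = sum(xs)
    # Divisibility of the total by nu is necessary for a perfect partition.
    if total % nu != 0:
        return 0
    target = total // nu
    labelled = 0
    for labels in product(range(nu), repeat=len(xs)):
        sums = [0] * nu
        sizes = [0] * nu
        for x, a in zip(xs, labels):
            sums[a] += x
            sizes[a] += 1
        if all(s == target for s in sums) and all(sizes):
            labelled += 1
    # Each unlabelled partition into nu non-empty blocks is counted nu! times.
    return labelled // factorial(nu)
```

For example, `count_perfect_partitions([1, 2, 3], 2)` finds the single split $\{3\}$ versus $\{1,2\}$, while any input with an odd total and $\nu=2$ returns 0 by the divisibility check.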
This article proposes sequential randomized tests to locate the presence of jumps on the paths of efficient asset prices in a continuous-time model. The randomized statistics are generated by artificially adding randomness to the robust approximations of the locally averaged returns of the efficient price. In the case of finite activity jumps, we derive the asymptotic distribution of the maximum of all the local statistics unaffected by jumps, which makes it feasible to control the limiting probability of the global type I error and demonstrate the power of the test. We also present the theoretical results to illustrate the behaviors of the test statistics in the presence of infinite activity jumps. Simulation studies indicate the favorable performance of the proposed test in finite samples, and we also apply the test to the stock price data of Apple and Microsoft.
Over the past years, the concept of open research data (ORD) has gained traction as part of broader Open Science initiatives. The benefits of ORD, such as increased cost-effectiveness, transparency, and visibility, are well documented. However, researchers face barriers, which may be perceived rather than real, hindering the adoption of ORD practices. To address this challenge, we propose using ORD support services as sustainable enablers to stimulate cultural change around ORD. We engaged stakeholders across the University of Zurich and the Swiss ORD community, differentiating between researchers and ORD experts, to identify which services would best serve as sustainable enablers. After defining ORD support services and categorizing them into six key areas, we conducted surveys and interviews to gather insights on service preferences and barriers to ORD adoption. Among researchers, we identified a trend toward simpler and lower-resource services, highlighting the need for user-friendly and easily accessible support. ORD experts emphasized the importance of professional data stewardship, robust research data management (RDM) practices, and customized support to address discipline-specific needs. By combining survey and interview results, we provide a detailed overview of stakeholders’ ideas and suggestions for each proposed support area. Our study results in recommendations for academic institutions aiming to stimulate a cultural shift toward ORD. By focusing on findable, accessible, and user-friendly services, equipping researchers with fundamental RDM skills, and professionalizing data stewardship to provide customized support, institutions can foster the adoption of ORD practices. Ultimately, these measures can enhance the impact and reproducibility of scientific research.
Marco Lippi was born in Rome in 1943. An indefatigable and inspiring pedagogue, he has been teaching mathematics, economics, the history of economic thought, and econometrics to generations of students at the Universities of Perugia, Rome (La Sapienza, Tor Vergata, and LUISS), Modena, the Scuola Superiore Sant’Anna in Pisa, and the European Center for Advanced Research in Economics and Statistics (ECARES) in Brussels. As a fellow of the Einaudi Institute for Economics and Finance (EIEF), he still teaches, with the indomitable enthusiasm that has become legendary among his students and colleagues, Master and Ph.D. courses offered by this renowned Roman institution.
Influenza increases the risk of secondary diseases, but other than pneumonia, many of these diseases (e.g., sinusitis, otitis media, acute myocardial infarctions) are not consistently considered in estimates of influenza burden. We used the Merative Marketscan database (2001–2019) and time-series methods to identify age-specific categories of diseases that were temporally associated with patterns of influenza activity. Next, we estimated hypothetical reductions in the incidence and costs of these diseases if influenza incidence were reduced. Of 282 different disease categories evaluated, 23 (8.2%) were strongly associated with influenza (e.g., acute bronchitis, otitis media, myocardial infarctions, sinusitis, COPD) in at least one age group. For example, we estimated a 20% decrease in peak influenza incidence could decrease acute bronchitis cases by 6.5% and pneumonia cases by 5.3%, corresponding to a $1.6 billion reduction in healthcare costs. Excluding secondary diseases associated with influenza may lead to substantial underestimates of influenza’s burden and costs.
We study the asymptotic properties, in the weak sense, of regenerative processes and Markov renewal processes. For the latter, we derive both renewal-type results, also concerning the related counting process, and ergodic-type results, including the so-called $\varphi$-mixing property. This theoretical framework permits us to study the weak limit of the integral of a semi-Markov process, which can be interpreted as the position of a particle moving with finite velocities, taken for a random time according to the Markov renewal process underlying the semi-Markov one. Under mild conditions, we obtain the weak convergence to scaled Brownian motion. As a particular case, this result establishes the weak convergence of the classical generalized telegraph process.
This study assessed whether systematically using finetype data in national surveillance of invasive meningococcal disease serogroup B (IMD-B) in the Netherlands could improve cluster detection in order to prevent further cases through public health actions. We analysed 2005–2023 data, including 1,642 IMD-B cases with complete finetype and municipality information (95%; N = 1,729). Using a generalized linear model, we calculated expected baselines for each finetype, including temporal trends. Using SaTScan™, we applied Poisson scan-statistics with a 365-day window to identify spatiotemporal clusters, comparing results to epidemiological and core-genome multi-locus sequence typing (cgMLST) data. Of 453 finetypes, 308 (68%) occurred once; diversity was high (Gini-Simpson index 0.96). We identified 42 spatiotemporal clusters across 37 finetypes, comprising 132 cases (8%), with a median cluster size of two (range 2–21) and duration of 45 days (range 6–356). Between zero and five clusters were detected yearly. Among 18 cases with known epidemiological links, 14 (78%) were within detected spatiotemporal clusters. CgMLST data from eight clusters supported some clusters but rejected others. Systematic cluster detection using finetype could reveal missed epidemiological links, potentially enabling public health action. However, its impact in preventing additional IMD-B cases is likely limited due to small cluster sizes, though meaningful given the severity of IMD-B. Simple finetype mapping may provide a resource-efficient alternative to SaTScan™.
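The diversity figure quoted in this abstract can be read through a short sketch of the Gini-Simpson index, $1-\sum_i p_i^2$, where $p_i$ is the share of cases with finetype $i$. The abstract does not say whether the authors used this plug-in form or a finite-sample corrected variant, so the version below is an assumption, and the function name and example labels are hypothetical.

```python
from collections import Counter


def gini_simpson(labels):
    """Plug-in Gini-Simpson index: the probability that two cases drawn
    with replacement carry different labels."""
    counts = Counter(labels)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

A value near 1, such as the 0.96 reported, means that two randomly chosen cases almost always have different finetypes; a sample with a single finetype gives 0.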
Construction safety inspections typically involve a human inspector identifying safety concerns on-site. With the rise of powerful vision language models (VLMs), researchers are exploring their use for tasks such as detecting safety rule violations from on-site images. However, there is a lack of open datasets to comprehensively evaluate and further fine-tune VLMs in construction safety inspection. Current applications of VLMs use small, supervised datasets, limiting their applicability in tasks they are not directly trained for. In this article, we propose ConstructionSite 10k, a dataset featuring 10,000 construction site images with annotations for three inter-connected tasks: image captioning, safety rule violation visual question answering (VQA), and construction element visual grounding. Our subsequent evaluation of current state-of-the-art large pre-trained VLMs shows notable generalization abilities in zero-shot and few-shot settings, while additional training is needed to make them applicable to actual construction sites. This dataset allows researchers to train and evaluate their own VLMs with new architectures and techniques, providing a valuable benchmark for construction safety inspection.
Global mortality rates continue to decline, and life expectancy continues its upward trend. Besides mortality levels, policymakers and providers of financial and health services would also be interested in disability prevalence and its potential future trajectories. The length of time in good health versus the duration with major disabilities or long-term illnesses has significant financial implications for both individuals and society. In this paper, we develop Bayesian common factor models to analyse Australian age- and sex-specific disability prevalence rates. In particular, there are one or more common factors shared by both sexes, as well as specific factors for each sex. Retirement villages are purpose-built residential complexes designed for relatively healthy retirees to live as neighbours and share a communal lifestyle. We apply the model forecasts and simulations to valuate a typical retirement village contract. The cost of this accommodation service is determined by the resident’s total length of stay, which can be estimated using forecasted and simulated disability prevalence rates and mortality rates from our proposed models.