Diagnostic test accuracy studies assess a diagnostic test’s performance against a reference standard. In this review, we explore and compare statistical methods used in meta-analyses of diagnostic test accuracy studies. Specifically, we evaluate two frequentist methods – split component synthesis (SCS) and bivariate model (BM) – alongside two Bayesian approaches: Bayesian hierarchical summary receiver operating characteristic (BHSROC) and Bayesian bivariate model (BBM). We also include their latent class variants (LC-BHSROC and LC-BBM). Using a meta-analysis of various multiplex nucleic acid amplification tests (NAATs/PCRs) against Campylobacter spp. as a case study, we illustrate the practical applications of these methods. The reference standard was culture, and due to differences in cut-off values and primers among the NAAT/PCR brands, substantial heterogeneity was anticipated. Our findings reveal that the BM and BBM methods tend to estimate higher sensitivities than the other approaches, even when the number of studies is substantial and heterogeneity is moderate – as observed in this case study. In such a scenario, the SCS method or the BHSROC model may offer more robust and reliable outcomes. While our review is based on a real-life meta-analysis rather than simulations, it offers practical insights into the strengths and limitations of these statistical approaches for diagnostic test accuracy studies.
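As a reminder of the per-study quantities that such meta-analyses pool, the following is a minimal sketch of how sensitivity and specificity are computed from a single study's 2×2 table against the reference standard. The function name and the counts are hypothetical and are not taken from the review; none of the pooling models (SCS, BM, BHSROC, BBM) is implemented here.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) for one study's 2x2 table.

    tp, fn: reference-positive samples that the index test called positive/negative.
    tn, fp: reference-negative samples that the index test called negative/positive.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference-positive and one reference-negative sample")
    sensitivity = tp / (tp + fn)  # share of reference positives detected
    specificity = tn / (tn + fp)  # share of reference negatives correctly ruled out
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=90, fp=5, fn=10, tn=95)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A meta-analytic model would then combine these pairs across studies while accounting for their correlation and for between-study heterogeneity.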
Herpes zoster (HZ) and its complication, postherpetic neuralgia (PHN), disproportionately affect older adults and impose a considerable disease and economic burden. Although the Korea Disease Control and Prevention Agency recommends vaccination for adults aged ≥50 years, uptake remains limited. This study assessed the epidemiologic and economic impact of expanding vaccine coverage through inclusion of the live-attenuated zoster vaccine (ZVL) and the recombinant zoster vaccine (RZV) in the National Immunization Program (NIP). Using incidence data from the Health Insurance Review and Assessment Service Open Statistics, along with vaccine effectiveness, coverage rates, and treatment cost estimates adjusted to 2025 values, we projected cases averted and associated cost savings. At the current 10% coverage, ZVL and RZV were estimated to prevent 36,269 and 56,681 HZ cases and 15,988 and 20,712 PHN cases, yielding societal cost savings of USD 78.47 million and USD 107.14 million, respectively. Expanding NIP coverage to 70% amplified benefits approximately sevenfold, yielding cost savings of USD 549.34 million (ZVL) and USD 749.98 million (RZV). These results demonstrate the substantial value of zoster vaccination and underscore the need for policy measures to improve vaccine coverage among older adults in South Korea.
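The "approximately sevenfold" figure can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that savings scale linearly with coverage; this proportional rule is an assumption of the example, not the study's projection model. The input figures are the ones reported in the abstract.

```python
# Societal cost savings reported in the abstract (USD million).
SAVINGS_AT_10_PCT = {"ZVL": 78.47, "RZV": 107.14}
REPORTED_AT_70_PCT = {"ZVL": 549.34, "RZV": 749.98}


def linear_scale(value, from_coverage=0.10, to_coverage=0.70):
    """Scale a value proportionally to coverage (an illustrative assumption)."""
    return value * to_coverage / from_coverage


for vaccine, base in SAVINGS_AT_10_PCT.items():
    scaled = linear_scale(base)
    reported = REPORTED_AT_70_PCT[vaccine]
    print(f"{vaccine}: linear estimate {scaled:.2f}, reported {reported:.2f}, "
          f"ratio reported/base {reported / base:.3f}")
```

Under this assumption the RZV figure is reproduced to the cent, while the ZVL figure differs by about USD 0.05 million, which is consistent with the abstract's "approximately".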
To mitigate a life annuity provider’s (insurer’s) longevity risk exposure, we propose a general longevity risk transfer policy between the insurer and a reinsurer. The reinsurance premium is calculated according to the expected premium principle. Under an expected utility maximization framework, we apply the variational method to derive the necessary and sufficient condition for the optimal longevity risk transfer policy. We find that the optimal strategy takes the form of an excess-of-age policy, which means that the insurer is only liable for the benefit payment up to the optimal deductible age, and the remaining benefit payment is covered by the reinsurer. Furthermore, we assess the viability of the reinsurer underwriting the optimal longevity risk transfer policy. Numerical examples show that the optimal longevity risk transfer policy can effectively improve the insurer’s relative gains and reduce the insurer’s longevity risk exposure.
Citizen-generated data (CGD) is increasingly embraced as a strategy for filling data gaps to achieve the United Nations Sustainable Development Goals (SDGs), including SDG 5, Gender Equality and the Empowerment of Women and Girls. Existing frameworks to guide the design and use of CGD, however, do not reflect the unique considerations of CGD projects addressing issues of gender inequality. Answering recent calls for study of “data practices,” this article analyzes common CGD principles through findings from an action research project to address gender-based violence at the Colombia–Venezuela border. We suggest that while existing frameworks provide generative pathways forward, several principles around which consensus appears to be emerging are too rigid or insufficiently nuanced to account for the dynamics that many gender-focused CGD projects confront. These include dynamics such as physical security, the role of emotion in shaping project implementation, unequal access to resources, and political will. We suggest that this rigidity truncates the utility of these frameworks for CGD actors navigating highly sensitive issues in risky environments where serious violations of women’s human rights are taking place—and where the generation of gender data is one of several motivating factors for the work being done. This article reads gender into these frameworks to broaden the range of sustainable development issues for which CGD can help catalyze progress.
This work is devoted to the theoretical and numerical derivation of the moving average coefficients for a first-order autoregressive random field $\{X(\mathbf{t}),\, \mathbf{t}\in \mathbb{Z}^{d}\}$ where $d\geq 2$ and $\mathbb{Z}^d$ is the lattice of points with integer coordinates in the d-dimensional Euclidean space. We develop formulations for the autocorrelation function, the forecast recipe and the forecast error. A sufficient condition for causality is also provided, in addition to an algorithm and corresponding Wolfram Mathematica code for the numerical computation of the moving average coefficients.
Recent investigations have argued that there is a simple explicit representation for the Kolmogorov constant $c$ associated with the subcritical Galton–Watson branching process. We exhibit examples showing that although this representation can be valid, it more often is not. Our work is presented in terms of the limiting conditional mean population size $\mu=c^{-1}$. The analogous quantity for the Markov branching process is denoted by $\widehat\mu$. We show that the simple representation put forward for $\widehat\mu$ in fact is an upper bound that is attained only if the offspring-number probability-generating function is quadratic. The conditional mean $\mu$ is the limit of a computable increasing sequence $(\mu_n)$. Estimates of $n$ are determined ensuring that, for any small positive number $\varepsilon$, $0<\mu-\mu_n\le \varepsilon$.
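To make the idea of a computable increasing sequence concrete, the sketch below uses the standard identity $E[Z_n \mid Z_n>0] = m^n/(1-f_n(0))$ for a subcritical Galton–Watson process with offspring mean $m<1$ and probability-generating function $f$, where $f_n$ is the $n$-fold iterate of $f$. Whether this is exactly the sequence $(\mu_n)$ used in the paper is an assumption; the offspring distribution and the function name are hypothetical.

```python
def conditional_means(offspring_probs, n_max):
    """Return [mu_1, ..., mu_n_max] with mu_n = m**n / (1 - f_n(0)).

    offspring_probs[k] is P(one individual has k offspring).
    The process must be subcritical (0 < m < 1).
    """
    m = sum(k * pk for k, pk in enumerate(offspring_probs))
    if not 0.0 < m < 1.0:
        raise ValueError("offspring mean must lie strictly between 0 and 1")

    def pgf(s):
        return sum(pk * s ** k for k, pk in enumerate(offspring_probs))

    q = 0.0  # q = f_n(0) = P(Z_n = 0), starting from f_0(0) = 0
    means = []
    for n in range(1, n_max + 1):
        q = pgf(q)
        means.append(m ** n / (1.0 - q))
    return means


# Hypothetical offspring law with mean 0.3 + 2 * 0.2 = 0.7.
mus = conditional_means([0.5, 0.3, 0.2], n_max=30)
print(mus[0], mus[-1])
```

Convexity of $f$ gives $1-f(s)\le m(1-s)$, so the sequence is non-decreasing; the last terms approximate the limit. For large $n$ the subtraction $1-q$ loses floating-point precision, so this naive iteration is only suitable for moderate $n$.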
In the classical model of diffusion limited aggregation (DLA), introduced by Witten and Sander, the process begins with a single-particle cluster placed at the origin of a space. Then, one at a time, particles make a random walk from infinity until they halt by colliding with the existing cluster. We consider an analogous version of this process on large but finite graphs with a designated source and sink vertex. Initially the cluster of halted particles contains a single particle at the sink vertex. Starting one at a time from the source, each particle makes a random walk in the direction of the sink vertex. The particle halts at the last unoccupied vertex before the walk enters the cluster for the first time, thus increasing the size of the cluster. This continues until the source vertex becomes occupied, at which point the process ends. We study this DLA process on several classes of layered graphs, including Cayley trees of branching factor at least two with a sink vertex attached to the leaves. We determine the finish time of the process for the given classes of graphs and show that the subcomponent of the final cluster linking source to sink is essentially a unique path.
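The process described above can be simulated in a few lines. The following is a minimal sketch on a complete binary tree whose leaves are all attached to the sink. It assumes that "in the direction of the sink" means each step moves to a uniformly random child; that reading, the heap-style vertex numbering, and the function name are assumptions of this example, not details confirmed by the abstract.

```python
import random


def run_dla_binary_tree(depth, rng):
    """Simulate the source-to-sink DLA process on a complete binary tree.

    Vertices are heap-indexed: the source (root) is 1 and the children of v
    are 2v and 2v + 1. Leaves have indices >= 2**depth and are all attached
    to the sink, which is occupied from the start.
    Returns (number of particles used, set of occupied tree vertices).
    """
    first_leaf = 2 ** depth
    occupied = set()
    particles = 0
    while 1 not in occupied:  # stop once the source is occupied
        particles += 1
        v = 1
        while v < first_leaf:  # at a leaf, the next step would enter the sink
            child = 2 * v + rng.randint(0, 1)
            if child in occupied:  # the next step would enter the cluster
                break
            v = child
        occupied.add(v)  # halt at the last unoccupied vertex visited
    return particles, occupied


finish_time, cluster = run_dla_binary_tree(depth=6, rng=random.Random(0))
print(finish_time, len(cluster))
```

Each particle occupies exactly one new vertex, so the finish time equals the final cluster size, and every occupied internal vertex has an occupied child, which guarantees an occupied path from source to sink.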
Web-enabled large language models (LLMs) frequently answer queries without crediting the web pages they consume, creating an “attribution gap” in responsible artificial intelligence (AI) usage—defined as the difference between relevant URLs read and those actually cited. Drawing on approximately 14,000 real-world LMArena conversation logs with search-enabled LLM systems, we document three exploitation patterns: (1) no search: 34% of Google Gemini and 24% of OpenAI GPT-4o responses are generated without explicitly fetching any online content; (2) no citation: Gemini provides no clickable citation source in 92% of answers; (3) high-volume, low-credit: Perplexity’s Sonar visits approximately 10 relevant pages per query but cites only three to four. A negative binomial hurdle model shows that the average query answered by Gemini or Sonar leaves about three relevant websites uncited, whereas GPT-4o’s tiny uncited gap is best explained by its selective log disclosures rather than by better attribution. Citation efficiency—extra citations provided per additional relevant web page visited—varies widely across models, from 0.19 to 0.45 on identical queries, underscoring that retrieval design, not technical limits, shapes ecosystem impact. To advance auditing and monitoring of AI systems, we recommend a transparent LLM search architecture based on standardized telemetry and full disclosure of search traces and citation logs.
A fieldworker got more involved in research than intended when he contracted a Giardia duodenalis infection shortly after collecting faecal samples from wild Norwegian reindeer. Almost 50% of the reindeer samples showed heavy infections with G. duodenalis assemblage AI. Molecular comparison with the fieldworker’s infection revealed identical sequences at the loci successfully amplified. Although causality is inherently difficult to establish in wildlife-associated infections, the worker’s long history without previous infection, his intense exposure during sampling, absence of alternative known risk factors, and onset of symptoms consistent with exposure indicate that the reindeer samples were the most plausible source. These findings suggest a rare case of well-supported wildlife-associated Giardia transmission.
Ferry quays experience rapid deterioration due to their exposure to harsh maritime environments and operational ferry impacts. Vibration-based structural health monitoring offers a valuable approach to assessing structural integrity and enhancing the understanding of the structural implications of these impacts. However, practical limitations often restrict sensor placement at critical locations. Consequently, virtual sensing techniques become essential for establishing a Digital Twin and estimating the structural response. This study investigates the application of the Gaussian Process Latent Force Model (GPLFM) for virtual sensing through Bayesian inference on the Magerholm ferry quay, combining in-operation experimental data collected during a ferry impact with a detailed physics-based model. The proposed Physics-Encoded Machine Learning model integrates a reduced-order structural model with a data-driven GPLFM representing the unknown impact forces via their modal contributions. This formulation enables joint probabilistic inference of system states and unmeasured responses while accounting for modeling and measurement uncertainties. Results show that the GPLFM provides accurate posterior mean acceleration response estimates at most locations, even under simplifying modeling assumptions such as linear time-invariant behavior during the impact phase. Lower accuracy was observed at locations in the impact zone. A numerical study was conducted to explore an optimal real-world sensor placement strategy using a Backward Sequential Sensor Placement approach. Sensitivity analyses examined the influence of measurement noise, sensor types, incorrectly assumed damping ratios, and sampling frequencies. The results suggest that the GP latent forces can help accommodate modeling and measurement uncertainties, maintaining acceptable estimation accuracy across scenarios.
Consider n points independently sampled from a density p of class $\mathcal{C}^2$ on a smooth compact d-dimensional submanifold $\mathcal{M}$ of $\mathbb{R}^m$, and consider the random walk visiting these points according to a transition kernel K. We study the almost sure uniform convergence of the generator of this process to the diffusive Laplace–Beltrami operator when n tends to infinity, from which we establish the convergence of the random walk to a diffusion process on the manifold. In contrast to known results, our result does not require the kernel K to be continuous, which covers the cases of walks exploring k-nearest neighbor (kNN) and geometric graphs, and convergence rates are given. The distance between the random walk generator and the limiting operator is separated into several terms: a statistical term, related to the law of large numbers, which is treated with concentration tools, and an approximation term, which we control with tools from differential geometry. The case of kNN Laplacians is detailed. The convergence of the stochastic processes having these operators as generators is also studied, by establishing additional tightness results of their distributions on the space of càdlàg functions.
This paper extends the traditional group self-annuitisation framework by explicitly incorporating mortality heterogeneity among participants. Heterogeneity stems from multiple factors that lead individuals to age at different paces, despite being born in the same year. Ageing is modelled as a finite-state continuous-time Markov process where each state represents a distinct phase of physiological deterioration, and transitions capture the stochastic progression towards death. Benefits are differentiated by ageing state and, after issue, they are dynamically adjusted in response to the realised evolution of both ageing and mortality. Our design is novel in its use of the Markov ageing framework within a risk-sharing scheme and in how benefits are updated. Indeed, both benefits and their respective adjustment coefficients are state-specific. Through the explicit modelling of cross-subsidies across states, the design ensures that actuarial equivalence between benefits and available resources is preserved both at the pool level and within each ageing state. However, we find that benefit adjustments based on actuarial equivalence may display undesirable patterns in some ageing classes when their size shrinks substantially; this happens, in particular, in the younger ageing states, which are likely to empty out. To counter such effects, we introduce a design preserving a target level of differentiation across states that mitigates the unfavourable impact of a declining size for younger ages. In our analysis, we point out that such a design (which is desirable in many respects) implies solidarity effects across states. Such effects can be identified by comparing benefit amounts under the two assumptions (i.e., benefits adjusted according to actuarial equivalence or so as to preserve a predefined level of differentiation). The proposed framework is tested using Australian mortality data.
The distribution of a random initial age (age composition) of an item is crucial for obtaining its remaining lifetime. A random initial age naturally arises when, for example, an item is drawn at random from a population of items that are continuously manufactured and put into operation. We consider heterogeneous populations of items with lifetime distributions indexed by a frailty parameter. We study different stochastic comparisons for the random age and the remaining (residual) lifetime for items from these populations. The ageing properties for the age composition and remaining lifetime are also discussed. Some examples are provided.
Understanding change over time is a critical component of social science. However, data measured over time – time series – require their own set of statistical and inferential tools. In this book, Suzanna Linn, Matthew Lebo, and Clayton Webb explain the most commonly used time series models and demonstrate their applications using examples. The guide outlines the steps taken to identify a series, make determinations about exogeneity/endogeneity, and make appropriate modelling decisions and inferences. Detailing challenges and explanations of key techniques not covered in most time series textbooks, the authors show how navigating between data and models, deliberately and transparently, allows researchers to clearly explain their statistical analyses to a broad audience.
Hamiltonian Monte Carlo (HMC) is a very popular collection of Markov chain Monte Carlo (MCMC) algorithms. One explanation for the popularity of HMC algorithms is their excellent performance as the dimension d of the target becomes large: theoretical analyses show that popular versions of HMC can have a running time that scales as well as $d^{0.25}$ in good conditions, while even an optimally tuned random-walk Metropolis (RWM) algorithm will not do better than d. In this paper, we investigate a different scaling question: does HMC beat RWM for targets with well-separated modes? We find that the answer is often no. Our main tool for answering this question is a novel and simple formula for the conductance of HMC based on Liouville’s theorem, and we also show how this new formula can be used to give very short proofs of results that seem tedious to show with the usual formula. We also use this result to compute the spectral gap of HMC algorithms, for both the classical HMC with isotropic momentum and the recent Riemannian HMC, for multimodal targets. While we focus on the concrete comparison of RWM and HMC, we expect qualitatively similar conclusions to hold for other gradient-based algorithms.
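For readers unfamiliar with the baseline, the following is a minimal sketch of a random-walk Metropolis sampler on a one-dimensional target with two well-separated modes. The target (an equal mixture of unit-variance normals at ±4), the step size, and all function names are hypothetical choices for illustration. The sketch does not implement HMC or the paper's conductance formula; it only shows the kind of chain whose mode-switching behaviour is at issue.

```python
import math
import random


def log_target(x, a=4.0):
    """Unnormalised log-density of an equal mixture of N(-a, 1) and N(a, 1)."""
    u = -0.5 * (x - a) ** 2
    v = -0.5 * (x + a) ** 2
    top = max(u, v)  # log-sum-exp for numerical stability
    return top + math.log(math.exp(u - top) + math.exp(v - top))


def rwm(n_steps, step_size, rng, x0=-4.0):
    """Random-walk Metropolis chain with Gaussian proposals."""
    x, log_p = x0, log_target(x0)
    chain = []
    for _ in range(n_steps):
        proposal = x + step_size * rng.gauss(0.0, 1.0)
        log_q = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, log_q - log_p)):
            x, log_p = proposal, log_q
        chain.append(x)
    return chain


def mode_switches(chain):
    """Count sign changes between consecutive states (crossings of 0)."""
    return sum(1 for a, b in zip(chain, chain[1:]) if a * b < 0)


chain = rwm(n_steps=5000, step_size=1.0, rng=random.Random(1))
print("mode switches:", mode_switches(chain))
```

Increasing the separation parameter `a` makes crossings rarer, which is the regime the abstract refers to as well-separated modes.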
An intricate landscape of bias permeates biomedical research. In this groundbreaking exploration, the myriad sources of bias shaping research outcomes, from cognitive biases inherent in researchers to the selection of study subjects and data interpretation, are examined in detail. With a focus on randomized controlled trials, pharmacologic studies, genetic research, animal studies, and pandemic analyses, it illuminates how bias distorts the quest for scientific truth. Historical and contemporary examples vividly illustrate the impact of biases across research domains. Offering insights on recognizing and mitigating bias, this comprehensive work equips scientists and research teams with tools to navigate the complex terrain of biased research practices. A must-read for anyone seeking a deeper understanding of the critical role biases play in shaping the reliability and reproducibility of biomedical research.
We investigate why conservative online news media are often seen as niche, whereas liberal outlets have ideologically broader audiences. We examine two explanatory mechanisms for this asymmetry. The behavioral explanation focuses on differences in homophily, where one ideological camp would be exposed to more cross-cutting content due to more diverse networking preferences. The structural explanation highlights how a platform’s user base places some in the minority, naturally exposing them to more cross-cutting content. We analyze network exposure and sharing of news media content among 420,000 US Twitter users in 2022, prior to Musk’s acquisition of the platform. We find that conservative users, as the minority, were overexposed to cross-cutting media content through their network contacts, while liberal users, as the majority, were underexposed. Consequently, liberal media were shared across party lines, while conservative media were overlooked by liberals and circulated mostly within a tight network of conservative accounts. This apparent paradox suggests that although conservatives primarily engage with their own media, liberal outlets attract a broader audience, including many conservatives. By combining observational data with simulated benchmarks, we find that the structural mechanism plays a primary role in the observed asymmetry, as exposure to liberal content extends farther into conservative online communities.