We define a random graph obtained by connecting each point of $\mathbb{Z}^d$ independently and uniformly to a fixed number $1 \leq k \leq 2d$ of its nearest neighbors via a directed edge. We call this graph the directed k-neighbor graph. Two natural associated undirected graphs are the undirected and the bidirectional k-neighbor graph, where we connect two vertices by an undirected edge whenever there is a directed edge in the directed k-neighbor graph between the vertices in at least one, respectively precisely two, directions. For these graphs we study the question of percolation, i.e. the existence of an infinite self-avoiding path. Using different kinds of proof techniques for different classes of cases, we show that for $k=1$ even the undirected k-neighbor graph never percolates, while the directed k-neighbor graph percolates whenever $k \geq d+1$, $k \geq 3$, and $d \geq 5$, or $k \geq 4$ and $d=4$. We also show that the undirected 2-neighbor graph percolates for $d=2$, the undirected 3-neighbor graph percolates for $d=3$, and we provide some positive and negative percolation results regarding the bidirectional graph as well. A heuristic argument for high dimensions indicates that this class of models is a natural discrete analogue of the k-nearest-neighbor graphs studied in continuum percolation, and our results support this interpretation.
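The construction of the three graphs can be illustrated on a finite box. The following is a minimal sketch, not the authors' code: it assumes periodic boundary conditions (a torus of side n) so that every vertex has exactly 2d nearest neighbours, and the function names are hypothetical.

```python
import itertools
import random


def directed_k_neighbor_graph(n, d, k, seed=0):
    """Sample the directed k-neighbor graph on the torus (Z/nZ)^d.

    Assumption: periodic boundaries replace the infinite lattice Z^d.
    Each vertex picks k of its 2d nearest neighbours uniformly without
    replacement, independently of all other vertices.
    """
    if n < 3:
        raise ValueError("n must be at least 3 so that the 2d neighbours are distinct")
    if not 1 <= k <= 2 * d:
        raise ValueError("k must satisfy 1 <= k <= 2d")
    rng = random.Random(seed)
    out_edges = {}
    for v in itertools.product(range(n), repeat=d):
        neighbours = []
        for axis in range(d):
            for step in (-1, 1):
                w = list(v)
                w[axis] = (w[axis] + step) % n
                neighbours.append(tuple(w))
        out_edges[v] = set(rng.sample(neighbours, k))
    return out_edges


def undirected_edges(out_edges):
    """Edge {v, w} is present if there is a directed edge in at least one direction."""
    return {frozenset((v, w)) for v, ws in out_edges.items() for w in ws}


def bidirectional_edges(out_edges):
    """Edge {v, w} is present only if directed edges exist in both directions."""
    return {
        frozenset((v, w))
        for v, ws in out_edges.items()
        for w in ws
        if v in out_edges[w]
    }
```

A finite torus cannot decide percolation, which concerns infinite paths, but it makes the definitions concrete: for k = 2d every lattice edge is chosen in both directions, so the undirected and bidirectional graphs coincide with the full lattice.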
Participation is a prevalent topic in many areas, and data-driven projects are no exception. While the term generally has positive connotations, ambiguities in participatory approaches between facilitators and participants are often noted. However, how facilitators can handle these ambiguities has been less studied. In this paper, we conduct a systematic literature review of participatory data-driven projects. We analyse 27 cases regarding their openness for participation and where participation most often occurs in the data life cycle. From our analysis, we describe three typical project structures of participatory data-driven projects, combining a focus on labour and resource participation and/or rule- and decision-making participation with the general set-up of the project as participatory-informed or participatory-at-core. From these combinations, different ambiguities arise. We discuss mitigations for these ambiguities through project policies and procedures for each type of project. Mitigating and clarifying ambiguities can support a more transparent and problem-oriented application of participatory processes in data-driven projects.
Spoken term discovery (STD) is challenging when a large volume of spoken content is generated without annotations. Unsupervised approaches resolve this challenge by directly computing pattern matches from the acoustic feature representation of the speech signal. However, this approach produces many false alarms due to inherent speech variabilities, leading to performance degradation in the STD task. To overcome these challenges and improve performance, we propose a two-stage approach. First, we identify an acoustic feature representation that emphasizes spoken content irrespective of the variability challenge. Second, we employ the proposed diagonal pattern search to capture spoken term matches in an unsupervised way without any transcriptions. The proposed approach, validated using the Microsoft Speech Corpus for Low-Resource Languages, achieves an 18% gain in hit ratio and a 37% reduction in false alarm ratio compared with state-of-the-art methods.
Leptospirosis in NZ has historically been associated with male workers in livestock industries; however, the disease epidemiology is changing. This study identified risk factors amid these shifts. Participants (95 cases:300 controls) were recruited nationwide between 22 July 2019 and 31 January 2022, and controls were frequency-matched by sex (90% male) and rurality (65% rural). Multivariable logistic regression models, adjusted for sex, rurality, age, and season (with one model additionally including occupational sector), identified risk factors including contact with dairy cattle (aOR 2.5; 95% CI: 1.0–6.0), activities with beef cattle (aOR 3.0; 95% CI: 1.1–8.2), cleaning urine/faeces from yard surfaces (aOR 3.9; 95% CI: 1.5–10.3), uncovered cuts/scratches (aOR 4.6; 95% CI: 1.9–11.7), evidence of rodents (aOR 2.2; 95% CI: 1.0–5.0), and work water supply from multiple sources, especially creeks/streams (aOR 7.8; 95% CI: 1.5–45.1) or roof-collected rainwater (aOR 6.6; 95% CI: 1.4–33.7). When adjusted for occupational sector, risk factors remained significant except for contact with dairy cattle, and slaughter without gloves emerged as a risk (aOR 3.3; 95% CI: 0.9–12.9). This study highlights novel behavioural factors, such as uncovered cuts and inconsistent glove use, alongside environmental risks from rodents and natural water sources.
In this paper, we study discrepancy questions for spanning subgraphs of $k$-uniform hypergraphs. Our main result is that, for any integers $k \ge 3$ and $r \ge 2$, any $r$-colouring of the edges of a $k$-uniform $n$-vertex hypergraph $G$ with minimum $(k-1)$-degree $\delta (G) \ge (1/2+o(1))n$ contains a tight Hamilton cycle with high discrepancy, that is, with at least $n/r+\Omega (n)$ edges of one colour. The minimum degree condition is asymptotically best possible and our theorem also implies a corresponding result for perfect matchings. Our tools combine various structural techniques such as Turán-type problems and hypergraph shadows with probabilistic techniques such as random walks and the nibble method. We also propose several intriguing problems for future research.
We describe an outbreak of Legionnaires’ disease linked to an exclusive cold-water source in a private residential setting in Yorkshire. The cold-water source was identified following microbiological testing of clinical and environmental samples. Legionella pneumophila was only detected in the cold-water system. Three cases were identified over the course of the outbreak: two confirmed and one probable. Conditions favourable to bacterial growth included system ‘dead legs’ and significant heat transfer to the cold-water system. We describe challenges in implementing control measures at the venue and highlight the importance of using enforcement powers, where necessary, to reduce risk.
Random matrix theory is at the intersection of linear algebra, probability theory and integrable systems, and has a wide range of applications in physics, engineering, multivariate statistics and beyond. This volume is based on a Fall 2010 MSRI program which generated the solution of long-standing questions on universalities of Wigner matrices and beta-ensembles and opened new research directions especially in relation to the KPZ universality class of interacting particle systems and low-rank perturbations. The book contains review articles and research contributions on all these topics, in addition to other core aspects of random matrix theory such as integrability and free probability theory. It will give both established and new researchers insights into the most recent advances in the field and the connections among many subfields.
For a spectrally negative Lévy process X, consider $g_t$, the last time X is below the level zero before time $t\geq 0$. We use a perturbation method for Lévy processes to derive an Itô formula for the three-dimensional process $\{(g_t,t, X_t), t\geq 0 \}$ and its infinitesimal generator. Moreover, with $U_t\,:\!=\,t-g_t$ the length of a current positive excursion, we derive a general formula that allows us to calculate a functional of the whole path of $ (U, X)=\{(U_t, X_t),t\geq 0\}$ in terms of the positive and negative excursions of the process X. As a corollary, we find the joint Laplace transform of $(U_{\mathbf{e}_q}, X_{\mathbf{e}_q})$, where $\mathbf{e}_q$ is an independent exponential time, and the q-potential measure of the process (U, X). Furthermore, using the results mentioned above, we find a solution to a general optimal stopping problem depending on (U, X) with an application in corporate bankruptcy. Lastly, we establish a link between the optimal prediction of $g_{\infty}$ and optimal stopping problems in terms of (U, X) as per Baurdoux, E. J. and Pedraza, J. M., $L_p$ optimal prediction of the last zero of a spectrally negative Lévy process, Annals of Applied Probability, 34 (2024), 1350–1402.
In this paper, we introduce a non-homogeneous version of the generalized counting process (GCP). We time-change this process by an independent inverse stable subordinator and derive the system of governing differential–integral equations for the marginal distributions of its increments. We then consider the GCP time-changed by a multistable subordinator and obtain its Lévy measure and the distribution of its first passage times. We discuss an application of a time-changed GCP, namely the time-changed generalized counting process-I (TCGCP-I) in ruin theory. A fractional version of the TCGCP-I is studied, and its long-range dependence property is established.
We consider a single server queue that has a threshold to change its arrival process and service speed by its queue length, which is referred to as a two-level GI/G/1 queue. This model is motivated by an energy saving problem for a single server queue whose arrival process and service speed are controlled. To obtain its performance in tractable form, we study the limit of the stationary distribution of the queue length in this two-level queue under scaling in heavy traffic. Except for a special case, this limit corresponds to its diffusion approximation. It is shown that this limiting distribution is truncated exponential (or uniform if the drift is null) below the threshold level and exponential above it under suitably chosen system parameters and generally distributed interarrival times and workloads brought by customers. This result is proved under a mild limitation on arrival parameters using the so-called basic adjoint relationship (BAR) approach studied in Braverman, Dai, and Miyazawa (2017, 2024) and Miyazawa (2017, 2024). We also give an intuitive discussion of a diffusion process corresponding to the limit of the stationary distribution under scaling.
The Wright–Fisher model, originating in Wright (1931), is one of the canonical probabilistic models used in mathematical population genetics to study how genetic type frequencies evolve in time. In this paper we bound the rate of convergence of the stationary distribution for a finite population Wright–Fisher Markov chain with parent-independent mutation to the Dirichlet distribution. Our result improves the rate of convergence established in Gan et al. (2017) from $\mathrm{O}(1/\sqrt{N})$ to $\mathrm{O}(1/N)$. The results are derived using Stein’s method, in particular, the prelimit generator comparison method.
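For readers unfamiliar with the chain, one generation of a finite-population Wright–Fisher model with parent-independent mutation can be sketched as follows. This is an illustrative sketch under one common parametrisation, which the abstract does not spell out: each offspring becomes type j with probability `mutation_probs[j]` regardless of its parent, and otherwise copies the type of a uniformly chosen parent. The function name and arguments are hypothetical.

```python
import random


def wright_fisher_step(counts, mutation_probs, rng):
    """Return the type counts of the next generation.

    counts[j]         -- number of individuals of type j in the current generation
    mutation_probs[j] -- assumed probability that an offspring mutates to type j,
                         independent of the parent's type (sum must be <= 1)
    rng               -- a random.Random instance
    """
    population_size = sum(counts)
    types = list(range(len(counts)))
    total_mutation = sum(mutation_probs)
    new_counts = [0] * len(counts)
    for _ in range(population_size):
        if rng.random() < total_mutation:
            # Mutation: the new type does not depend on the parent.
            child = rng.choices(types, weights=mutation_probs)[0]
        else:
            # No mutation: copy the type of a uniformly chosen parent.
            child = rng.choices(types, weights=counts)[0]
        new_counts[child] += 1
    return new_counts
```

Iterating this step for many generations and recording the type frequencies gives an empirical view of the stationary distribution that the abstract compares with the Dirichlet distribution.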
This paper investigates structural changes in the parameters of first-order autoregressive (AR) models by analyzing the edge eigenvalues of the precision matrices. Specifically, edge eigenvalues in the precision matrix are observed if and only if there is a structural change in the AR coefficients. We show that these edge eigenvalues correspond to the zeros of a determinantal equation. Additionally, we propose a consistent estimator for detecting outliers within the panel time series framework, supported by numerical experiments.
This paper uses a two-step approach to modelling the probability of a policyholder making an auto insurance claim. We perform clustering via Gaussian mixture models and cluster-specific binary regression models. We use telematics information along with traditional auto insurance information and find that the best model incorporates telematics, without the need for dimension reduction via principal components. We also utilise the probabilistic estimates from the mixture model to account for the uncertainty in the cluster assignments. The clustering process allows for the creation of driving profiles and offers a fairer method for policyholder segmentation than when clustering is not used. By fitting separate regression models to the observations from the respective clusters, we are able to offer differential pricing, which recognises that policyholders have different exposures to risk despite having similar covariate information, such as total miles driven. The approach outlined in this paper offers an explainable and interpretable model that can compete with black box models. Our comparisons are based on a synthesised telematics data set that was emulated from a real insurance data set.
In this paper, we study asymptotic behaviors of a subcritical branching Brownian motion with drift $-\rho$, killed upon exiting $(0, \infty)$, and offspring distribution $\{p_k{:}\; k\ge 0\}$. Let $\widetilde{\zeta}^{-\rho}$ be the extinction time of this subcritical branching killed Brownian motion, $\widetilde{M}_t^{-\rho}$ the maximal position of all the particles alive at time t and $\widetilde{M}^{-\rho}:\!=\max_{t\ge 0}\widetilde{M}_t^{-\rho}$ the all-time maximal position. Let $\mathbb{P}_x$ be the law of this subcritical branching killed Brownian motion when the initial particle is located at $x\in (0,\infty)$. Under the assumption $\sum_{k=1}^\infty k ({\log}\; k) p_k <\infty$, we establish the decay rates of $\mathbb{P}_x(\widetilde{\zeta}^{-\rho}>t)$ and $\mathbb{P}_x(\widetilde{M}^{-\rho}>y)$ as t and y respectively tend to $\infty$. We also establish the decay rate of $\mathbb{P}_x(\widetilde{M}_t^{-\rho}> z(t,\rho))$ as $t\to\infty$, where $z(t,\rho)=\sqrt{t}z-\rho t$ for $\rho\leq 0$ and $z(t,\rho)=z$ for $\rho>0$. As a consequence, we obtain a Yaglom-type limit theorem.
In this paper, we study the asymptotic behavior of the generalized Zagreb indices of the classical Erdős–Rényi (ER) random graph G(n, p), as $n\to\infty$. For any integer $k\ge1$, we first give an expression for the kth-order generalized Zagreb index in terms of the number of star graphs of various sizes in any simple graph. The explicit formulas for the first two moments of the generalized Zagreb indices of an ER random graph are then obtained from this expression. Based on the asymptotic normality of the numbers of star graphs of various sizes, several joint limit laws are established for a finite number of generalized Zagreb indices with a phase transition for p in different regimes. Finally, we provide a necessary and sufficient condition for any single generalized Zagreb index of G(n, p) to be asymptotically normal.
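The idea of writing an index in terms of star counts can be illustrated numerically. The sketch below assumes, since the abstract does not give the definition, that the kth-order generalized Zagreb index is the sum of the kth powers of the vertex degrees. Under that assumption the standard identity $x^k=\sum_{j=1}^{k} S(k,j)\, j!\, \binom{x}{j}$, with $S(k,j)$ the Stirling numbers of the second kind, expresses the index through the quantities $\sum_v \binom{d_v}{j}$, which count pairs of a centre vertex and a set of j of its neighbours. All function names are hypothetical, and the paper's exact formula may be organised differently.

```python
import random
from math import comb, factorial


def er_degrees(n, p, seed=0):
    """Degree sequence of one sample of the Erdős–Rényi graph G(n, p)."""
    rng = random.Random(seed)
    degrees = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degrees[i] += 1
                degrees[j] += 1
    return degrees


def stirling2(k, j):
    """Stirling number of the second kind via the inclusion-exclusion formula."""
    total = sum((-1) ** (j - i) * comb(j, i) * i ** k for i in range(j + 1))
    return total // factorial(j)


def zagreb_direct(degrees, k):
    """Assumed definition: sum of the kth powers of the degrees."""
    return sum(d ** k for d in degrees)


def zagreb_via_stars(degrees, k):
    """Same quantity, computed from counts of (centre, j-neighbour-set) pairs."""
    total = 0
    for j in range(1, k + 1):
        star_count = sum(comb(d, j) for d in degrees)
        total += stirling2(k, j) * factorial(j) * star_count
    return total
```

For a sampled graph, `zagreb_direct(degrees, k)` and `zagreb_via_stars(degrees, k)` agree for every k, which is the kind of reduction that lets limit laws for star counts transfer to the indices.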
We use the framework of multivariate regular variation to analyse the extremal behaviour of preferential attachment models. To this end, we follow a directed linear preferential attachment model for a random, heavy-tailed number of steps in time and treat the incoming edge count of all existing nodes as a random vector of random length. By combining martingale properties, moment bounds and a Breiman-type theorem, we show that the resulting quantity is multivariate regularly varying, both as a vector of fixed length formed by the edge counts of a finite number of oldest nodes, and also as a vector of random length viewed in sequence space. A Pólya urn representation allows us to explicitly describe the extremal dependence between the degrees with the help of Dirichlet distributions. As a by-product of our analysis, we establish new results for almost sure convergence of the edge counts in sequence space as the number of nodes goes to infinity.
The systemic nature of climate risk is well established, but the extent may be more severe than previously understood, particularly with regard to cyber risk and economic security. Cyber security relies on the availability of insurance capital to mitigate economic security sector risks and support the reversibility of attacks. However, the cyber insurance industry is still in its infancy. Pressure on insurance capital from increasing natural disaster activity could consume the resources necessary for economic security in the cyber domain in the near term and create long-term conditions that increase the scarcity of capital to support cyber security risks. This article makes an original contribution by exploring the under-researched connection between the nexus of cyber and economic security and the climate change threat. Although the immediate pressure on economic resources for cyber security is limited, recent natural disaster activity has clearly shown that access to capital for cyber risks could come under significant pressure in the future.