Most community detection methods focus on clustering actors with common features in a network. However, clustering edges offers a more intuitive way to understand the network structure in many real-life applications. Among the existing methods for network edge clustering, the majority are algorithmic, with the exception of the latent space edge clustering (LSEC) model proposed by Sewell (Journal of Computational and Graphical Statistics, 30(2), 390–405, 2021). LSEC was shown to have good performance in simulation and real-life data analysis, but fitting this model requires prior knowledge of the number of clusters and latent dimensions, which are often unknown to researchers. Within a Bayesian framework, we propose an extension to the LSEC model using a sparse finite mixture prior that supports automated selection of the number of clusters. We refer to our proposed approach as the automated LSEC or aLSEC. We develop a variational Bayes generalized expectation-maximization approach and a Hamiltonian Monte Carlo-within-Gibbs algorithm for estimation. Our simulation study showed that aLSEC reduced run time by a factor of 10 to over 100 compared to LSEC. Like LSEC, aLSEC maintains a computational cost that grows linearly with the number of actors in a network, making it scalable to large sparse networks. We developed the R package aLSEC, which implements the proposed methodology.
We consider linear preferential attachment trees with additive fitness, where fitness is the random initial vertex attractiveness. We show that when the fitnesses are independent and identically distributed and have positive bounded support, the local weak limit can be constructed using a sequence of mixed Poisson point processes. We also provide a rate of convergence for the total variation distance between the r-neighbourhoods of a uniformly chosen vertex in the preferential attachment tree and the root vertex of the local weak limit. The proof uses a Pólya urn representation of the model, for which we give new estimates for the beta and product beta variables in its construction. As applications, we obtain limiting results and convergence rates for the degrees of the uniformly chosen vertex and its ancestors, where the latter are the vertices that are on the path between the uniformly chosen vertex and the initial vertex.
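The model in this abstract can be illustrated with a short simulation. This is a minimal sketch, not the paper's construction: it assumes that each new vertex attaches to one existing vertex chosen with probability proportional to that vertex's in-degree plus its fitness, and that the tree starts from a single vertex. The function name `simulate_pa_tree` and the `fitness_sampler` argument are hypothetical.

```python
import random


def simulate_pa_tree(n, fitness_sampler, seed=None):
    """Grow a preferential attachment tree with additive fitness on n vertices.

    Assumption (not taken from the paper): the attachment weight of an
    existing vertex is its in-degree plus its fitness.
    Returns (parent, in_degree, fitness), indexed by vertex label 0..n-1.
    """
    rng = random.Random(seed)
    fitness = [fitness_sampler(rng)]  # fitness of the initial vertex
    in_degree = [0]
    parent = [None]                   # the initial vertex has no parent
    for v in range(1, n):
        # Attractiveness of each existing vertex: in-degree + fitness.
        attract = [d + x for d, x in zip(in_degree, fitness)]
        u = rng.choices(range(v), weights=attract, k=1)[0]
        parent.append(u)
        in_degree[u] += 1
        fitness.append(fitness_sampler(rng))
        in_degree.append(0)
    return parent, in_degree, fitness


if __name__ == "__main__":
    # Fitness with positive bounded support, here uniform on [1, 2].
    parent, in_degree, fitness = simulate_pa_tree(
        1000, lambda rng: rng.uniform(1.0, 2.0), seed=0
    )
    print("maximum in-degree:", max(in_degree))
```

Each step recomputes all attachment weights, so the sketch costs quadratic time in n and is intended only for small experiments.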
We study the asymptotic growth rate of the labels of high-degree vertices in weighted recursive graphs (WRGs) when the weights are independent, identically distributed, almost surely bounded random variables, and as a result confirm a conjecture by Lodewijks and Ortgiese (‘The maximal degree in random recursive graphs with random weights’, preprint, 2020). WRGs are a generalisation of the random recursive tree and directed acyclic graph models, in which vertices are assigned vertex-weights and where new vertices attach to $m\in\mathbb{N}$ predecessors, each selected independently with a probability proportional to the vertex-weight of the predecessor. Prior work established the asymptotic growth rate of the maximum degree of the WRG model, and here we show that there exists a critical exponent $\mu_m$ such that the typical label size of the maximum-degree vertex equals $n^{\mu_m(1+o(1))}$ almost surely as $n$, the size of the graph, tends to infinity. These results extend results on the asymptotic behaviour of the location of the maximum degree, formerly only known for the random recursive tree model, to the more general weighted multigraph case of the WRG model. Moreover, for the weighted recursive tree model, that is, the WRG model with $m=1$, we prove the joint convergence of the rescaled degree and label of high-degree vertices under additional assumptions on the vertex-weight distribution, and also extend results on the growth rate of the maximum degree obtained by Eslava, Lodewijks, and Ortgiese (Stoch. Process. Appl. 158, 2023).
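The attachment rule described in this abstract can be sketched as a small simulation. This is an illustrative sketch under stated assumptions, not the authors' code: the graph is assumed to start from a single vertex, and the names `simulate_wrg` and `weight_sampler` are hypothetical. Predecessors are drawn independently with replacement, so multiple edges can occur, in line with the multigraph setting.

```python
import random


def simulate_wrg(n, m, weight_sampler, seed=None):
    """Grow a weighted recursive graph on n vertices.

    Each new vertex connects to m predecessors, chosen independently
    (with replacement) with probability proportional to vertex-weight.
    Assumption: the graph starts from a single vertex.
    Returns (weights, in_degrees), indexed by vertex label 0..n-1.
    """
    rng = random.Random(seed)
    weights = [weight_sampler(rng)]
    in_degrees = [0]
    for _ in range(1, n):
        # Sample m predecessors proportional to their vertex-weights.
        targets = rng.choices(range(len(weights)), weights=weights, k=m)
        for t in targets:
            in_degrees[t] += 1
        weights.append(weight_sampler(rng))
        in_degrees.append(0)
    return weights, in_degrees


if __name__ == "__main__":
    # Bounded i.i.d. weights, here uniform on [0.5, 1].
    weights, in_degrees = simulate_wrg(
        2000, 3, lambda rng: rng.uniform(0.5, 1.0), seed=1
    )
    label = max(range(len(in_degrees)), key=in_degrees.__getitem__)
    print("label of a maximum-degree vertex:", label)
```

Running this for increasing n and recording the label of a maximum-degree vertex gives a rough empirical view of the growth the abstract describes, although a simulation of this size cannot confirm the exponent.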
We consider Gaussian approximation in a variant of the classical Johnson–Mehl birth–growth model with random growth speed. Seeds appear randomly in $\mathbb{R}^d$ at random times and start growing instantaneously in all directions with a random speed. The locations, birth times, and growth speeds of the seeds are given by a Poisson process. Under suitable conditions on the random growth speed, the time distribution, and a weight function $h\;:\;\mathbb{R}^d \times [0,\infty) \to [0,\infty)$, we prove a Gaussian convergence of the sum of the weights at the exposed points, which are those seeds in the model that are not covered at the time of their birth. Such models have previously been considered, albeit with fixed growth speed. Moreover, using recent results on stabilization regions, we provide non-asymptotic bounds on the distance between the normalized sum of weights and a standard Gaussian random variable in the Wasserstein and Kolmogorov metrics.
In this chapter, we look at the moments of a random variable. Specifically, we demonstrate that moments capture useful information about the tail of a random variable while often being simpler to compute or at least bound. Several well-known inequalities quantify this intuition. Although they are straightforward to derive, such inequalities are surprisingly powerful. Through a range of applications, we illustrate the utility of controlling the tail of a random variable, typically by allowing one to dismiss certain “bad events” as rare. We begin by recalling the classical Markov and Chebyshev inequalities. Then we discuss three of the most fundamental tools in discrete probability and probabilistic combinatorics. First, we derive the complementary first and second moment methods, and give several standard applications, especially to threshold phenomena in random graphs and percolation. Then we develop the Chernoff–Cramér method, which relies on the “exponential moment” and is the building block for large deviations bounds. Two key applications in data science are briefly introduced: sparse recovery and empirical risk minimization.
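For reference, the inequalities named in this abstract take the following standard textbook forms. The notation here is an assumption and may differ from the chapter's own. The first is Markov's inequality, the second is Chebyshev's inequality (assuming finite variance), and the third is the Chernoff–Cramér bound, obtained by applying Markov's inequality to $e^{sX}$.

```latex
\[
\mathbb{P}[X \ge a] \le \frac{\mathbb{E}[X]}{a}
\quad (X \ge 0,\ a > 0),
\qquad
\mathbb{P}\big[\,|X - \mathbb{E}[X]| \ge a\,\big] \le \frac{\mathrm{Var}[X]}{a^2}
\quad (a > 0),
\]
\[
\mathbb{P}[X \ge a] \le \inf_{s > 0} e^{-sa}\, \mathbb{E}\big[e^{sX}\big].
\]
```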
In this chapter, we move on to coupling, another probabilistic technique with a wide range of applications (far beyond discrete stochastic processes). The idea behind the coupling method is deceptively simple: to compare two probability measures, it is sometimes useful to construct a joint probability space with the corresponding marginals. We begin by defining coupling formally and deriving its connection to the total variation distance through the coupling inequality. We illustrate the basic idea on a classical Poisson approximation result, which we apply to the degree sequence of an Erdős–Rényi graph. Then we introduce the concept of stochastic domination and some related correlation inequalities. We develop a key application in percolation theory. Coupling of Markov chains is the next topic, where it serves as a powerful tool to derive mixing time bounds. Finally, we end with the Chen–Stein method for Poisson approximation, a technique that applies in particular in some natural settings with dependent variables.
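The coupling inequality mentioned in this abstract can be stated as follows, in standard notation that is assumed here rather than taken from the chapter. If $(X, Y)$ is any coupling of probability measures $\mu$ and $\nu$, meaning that $X$ has law $\mu$ and $Y$ has law $\nu$ on a common probability space, then:

```latex
\[
\|\mu - \nu\|_{\mathrm{TV}}
= \sup_{A} \big|\mu(A) - \nu(A)\big|
\le \mathbb{P}[X \ne Y].
\]
```

A coupling under which $X$ and $Y$ rarely differ therefore certifies that the two measures are close in total variation.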
In this chapter, we develop spectral techniques. We highlight some applications to Markov chain mixing and network analysis. The main tools are the spectral theorem and the variational characterization of eigenvalues, which we review together with some related results. We also give a brief introduction to spectral graph theory and detail an application to community recovery. Then we apply the spectral theorem to reversible Markov chains. In particular, we define the spectral gap and establish its close relationship to the mixing time. We also show that the spectral gap can be bounded using certain isoperimetric properties of the underlying network. We prove Cheeger’s inequality, which quantifies this relationship, and introduce expander graphs, an important family of graphs with good “expansion.” Applications to mixing times are also discussed. One specific technique is the “canonical paths method,” which bounds the spectral gap by formalizing a notion of congestion in the network.
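As a point of reference, one standard form of Cheeger's inequality for a reversible Markov chain is given below. The notation is an assumption and may not match the chapter's. The chain has transition matrix $P$ and stationary distribution $\pi$, $\lambda_2$ is the second-largest eigenvalue of $P$, $\gamma = 1 - \lambda_2$ is the spectral gap, and $\Phi_*$ is the bottleneck ratio.

```latex
\[
\Phi_* = \min_{S \,:\, 0 < \pi(S) \le 1/2}
\frac{\sum_{x \in S,\ y \notin S} \pi(x) P(x, y)}{\pi(S)},
\qquad
\frac{\Phi_*^2}{2} \le \gamma \le 2\,\Phi_*.
\]
```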
Little information exists concerning the spatial relationship between invasive meningococcal disease (IMD) cases and Neisseria meningitidis (N. meningitidis) carriage. The aim of this study was to examine whether there is a relationship between IMD and asymptomatic oropharyngeal carriage of meningococci by spatial analysis to identify the distribution and patterns of cases and carriage in South Australia (SA). Carriage data geocoded to participants’ residential addresses and meningococcal case notifications using Postal Area (POA) centroids were used to analyse spatial distribution by disease- and non-disease-associated genogroups, as well as overall from 2017 to 2020. The majority of IMD cases were genogroup B with the overall highest incidence of cases reported in infants, young children, and adolescents. We found no clear spatial association between N. meningitidis carriage and IMD cases. However, analyses using carriage and case genogroups showed differences in the spatial distribution between metropolitan and regional areas. Regional areas had a higher rate of IMD cases and carriage prevalence. While no clear relationship between cases and carriage was evident in the spatial analysis, the higher rates of both carriage and disease in regional areas highlight the need to maintain high vaccine coverage outside of the well-resourced metropolitan area.
In this chapter, we describe a few discrete probability models to which we will come back repeatedly throughout the book. While there exists a vast array of well-studied random combinatorial structures (permutations, partitions, urn models, Boolean functions, polytopes, etc.), our focus is primarily on a limited number of graph-based processes, namely percolation, random graphs, Ising models, and random walks on networks. We will not attempt to derive the theory of these models exhaustively here. Instead, we will employ them to illustrate some essential techniques from discrete probability. Note that the toolkit developed in this book is meant to apply to other probabilistic models of interest as well, and in fact many more will be encountered along the way. After a brief review of graph basics and Markov chain theory, we formally introduce our main models. We also formulate various key questions about these models that will be answered (at least partially) later on. We assume that the reader is familiar with the measure-theoretic foundations of probability. A refresher of all required concepts and results is provided in the appendix.
Branching processes, which are the focus of this chapter, arise naturally in the study of stochastic processes on trees and locally tree-like graphs. Similarly to martingales, finding a hidden branching process within a probabilistic model can lead to useful bounds and insights into asymptotic behavior. After a review of the extinction theory of branching processes and of a fruitful random-walk perspective, we give a couple of examples of applications in discrete probability. In particular, we analyze the height of a binary search tree, a standard data structure in computer science. We also give an introduction to phylogenetics, where a “multitype” variant of the Galton–Watson branching process plays an important role; we use the techniques derived in this chapter to establish a phase transition in the reconstruction of ancestral molecular sequences. We end this chapter with a detailed look into the phase transition of the Erdős–Rényi graph model. The random-walk perspective mentioned above allows one to analyze the “exploration” of a largest connected component, leading to information about the “evolution” of its size as edge density increases.
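The extinction theory mentioned in this abstract rests on a classical fact: the extinction probability of a Galton–Watson process started from one individual is the smallest fixed point in [0, 1] of the offspring probability generating function, and iterating that function from 0 converges to it. The sketch below illustrates this for Poisson offspring. The function names are hypothetical and the code is not taken from the chapter.

```python
import math


def extinction_probability(pgf, tol=1e-12, max_iter=100_000):
    """Approximate the smallest fixed point of pgf in [0, 1].

    Iterating q <- pgf(q) from q = 0 increases monotonically to the
    extinction probability of the Galton-Watson process.
    """
    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(q)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q  # not converged within max_iter (e.g. near criticality)


def poisson_pgf(lam):
    """Generating function s -> exp(lam * (s - 1)) of a Poisson(lam) law."""
    return lambda s: math.exp(lam * (s - 1.0))


if __name__ == "__main__":
    for lam in (0.5, 2.0):
        q = extinction_probability(poisson_pgf(lam))
        print(f"mean offspring {lam}: extinction probability ~ {q:.4f}")
```

With mean offspring below 1 the iteration converges to 1 (certain extinction), while above 1 it converges to a value strictly less than 1. Convergence is slow near the critical mean of 1.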
In this chapter, we turn to martingales, which play a central role in probability theory. We illustrate their use in a number of applications to the analysis of discrete stochastic processes. After some background on stopping times and a brief review of basic martingale properties and results, we develop two major directions. We show how martingales can be used to derive a substantial generalization of our previous concentration inequalities – from the sums of independent random variables we focused on previously to nonlinear functions with Lipschitz properties. In particular, we give several applications of the method of bounded differences to random graphs. We also discuss bandit problems in machine learning. In the second thread, we give an introduction to potential theory and electrical network theory for Markov chains. This toolkit in particular provides bounds on hitting times for random walks on networks, with important implications in the study of recurrence among other applications. We also introduce Wilson’s remarkable method for generating uniform spanning trees.
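The method of bounded differences mentioned in this abstract is commonly stated as McDiarmid's inequality, given here in standard notation that is assumed rather than taken from the chapter. Let $X_1, \dots, X_n$ be independent, and suppose that changing the $i$-th argument of $f$ alone changes its value by at most $c_i$. Then, for all $t > 0$:

```latex
\[
\mathbb{P}\big[f(X_1, \dots, X_n) - \mathbb{E} f(X_1, \dots, X_n) \ge t\big]
\le \exp\!\left(-\frac{2 t^2}{\sum_{i=1}^{n} c_i^2}\right).
\]
```

The same bound holds for the lower tail, and it is typically derived by applying the Azuma–Hoeffding inequality to the Doob martingale of $f$.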