Lymphocytic choriomeningitis virus (LCMV) is one of the arenaviruses infecting humans. LCMV infections have been reported worldwide in humans with varying levels of severity. To detect arenavirus RNA and LCMV-reactive antibodies in different geographical regions of Finland, we screened human serum and cerebrospinal fluid (CSF) samples, taken from suspected tick-borne encephalitis (TBE) cases, using reverse transcriptase polymerase chain reaction (RT-PCR) and immunofluorescence assay (IFA). No arenavirus nucleic acids were detected, and the overall LCMV seroprevalence was 4.5%. No seroconversions were detected in paired serum samples. The highest seroprevalence (5.2%) was detected among individuals of age group III (40–59 years), followed by age group I (under-20-year-olds, 4.9%), while the lowest seroprevalence (3.8%) was found in age group IV (60 years or older). A lower LCMV seroprevalence in older age groups may suggest waning of immunity over time. The observation of a higher seroprevalence in the younger age group and the decreasing population size of the main reservoir host, the house mouse, may suggest exposure to another LCMV-like virus in Finland.
We consider inference for possibly misspecified GMM models based on possibly nonsmooth moment conditions. While it is well known that misspecified GMM estimators with smooth moments remain $\sqrt{n}$-consistent and asymptotically normal, globally misspecified nonsmooth GMM estimators are $n^{1/3}$-consistent when the weighting matrix is either fixed or estimated at the $n^{1/3}$ rate or faster. Because the estimator’s nonstandard asymptotic distribution cannot be consistently estimated using the standard bootstrap, we propose an alternative rate-adaptive bootstrap procedure that consistently estimates the asymptotic distribution regardless of whether the GMM estimator is smooth or nonsmooth, correctly or incorrectly specified. Monte Carlo simulations for the smooth and nonsmooth cases confirm that our rate-adaptive bootstrap confidence intervals exhibit empirical coverage close to the nominal level.
We investigated the symptoms of SARS-CoV-2 infection, their dynamics, and their discriminatory power for the disease using longitudinally and prospectively collected information reported at the time of their occurrence. We analysed data from a large phase 3 clinical UK COVID-19 vaccine trial. The alpha variant was the predominant strain. Participants were assessed for SARS-CoV-2 infection via nasal/throat PCR at recruitment, vaccination appointments, and when symptomatic. Statistical techniques were implemented to infer estimates representative of the UK population, accounting for multiple symptomatic episodes associated with one individual. An optimal diagnostic model for SARS-CoV-2 infection was derived. The 4-month prevalence of SARS-CoV-2 was 2.1%, increasing to 19.4% (16.0%–22.7%) in participants reporting loss of appetite and 31.9% (27.1%–36.8%) in those with anosmia/ageusia. The model identified anosmia and/or ageusia, fever, congestion, and cough to be significantly associated with SARS-CoV-2 infection. Symptom dynamics differed markedly between the two groups: in PCR+ participants, symptoms started slowly, peaked later, and lasted longer, whilst in PCR- participants they declined consistently, with, on average, fewer than 3 days of symptoms reported. Anosmia/ageusia peaked late in confirmed SARS-CoV-2 infection (day 12), indicating a low discriminatory power for early disease diagnosis.
In this paper, we develop methods for statistical inference in a partially identified nonparametric panel data model with endogeneity and interactive fixed effects. Under some normalization rules, we can concentrate out the large-dimensional parameter vector of factor loadings and specify a set of conditional moment restrictions that involve only the finite-dimensional factor parameters along with the infinite-dimensional nonparametric component. For a conjectured restriction on the parameter, we consider testing the null hypothesis that the restriction is satisfied by at least one element in the identified set and propose a test statistic based on a novel martingale difference divergence measure for the distance between a conditional expectation object and zero. We derive a tight asymptotic distributional upper bound for the resultant test statistic under the null and show that it diverges at rate N under the global alternative. To obtain the critical values for our test, we propose a version of the multiplier bootstrap and establish its asymptotic validity. Simulations demonstrate the finite sample properties of our inference procedure. We apply our method to study Engel curves for major nondurable expenditures in China by using a panel dataset from the China Family Panel Studies.
We study heterogeneously interacting diffusive particle systems with mean-field-type interaction characterized by an underlying graphon and their finite particle approximations. Under suitable conditions, we obtain exponential concentration estimates over a finite time horizon for both 1- and 2-Wasserstein distances between the empirical measures of the finite particle systems and the averaged law of the graphon system.
Most community detection methods focus on clustering actors with common features in a network. However, clustering edges offers a more intuitive way to understand the network structure in many real-life applications. Among the existing methods for network edge clustering, the majority are algorithmic, with the exception of the latent space edge clustering (LSEC) model proposed by Sewell (Journal of Computational and Graphical Statistics, 30(2), 390–405, 2021). LSEC was shown to have good performance in simulation and real-life data analysis, but fitting this model requires prior knowledge of the number of clusters and latent dimensions, which are often unknown to researchers. Within a Bayesian framework, we propose an extension to the LSEC model using a sparse finite mixture prior that supports automated selection of the number of clusters. We refer to our proposed approach as the automated LSEC or aLSEC. We develop a variational Bayes generalized expectation-maximization approach and a Hamiltonian Monte Carlo-within-Gibbs algorithm for estimation. Our simulation study showed that aLSEC reduced run time by a factor of 10 to over 100 compared to LSEC. Like LSEC, aLSEC maintains a computational cost that grows linearly with the number of actors in a network, making it scalable to large sparse networks. We developed the R package aLSEC, which implements the proposed methodology.
We consider linear preferential attachment trees with additive fitness, where fitness is the random initial vertex attractiveness. We show that when the fitnesses are independent and identically distributed and have positive bounded support, the local weak limit can be constructed using a sequence of mixed Poisson point processes. We also provide a rate of convergence for the total variation distance between the r-neighbourhoods of a uniformly chosen vertex in the preferential attachment tree and the root vertex of the local weak limit. The proof uses a Pólya urn representation of the model, for which we give new estimates for the beta and product beta variables in its construction. As applications, we obtain limiting results and convergence rates for the degrees of the uniformly chosen vertex and its ancestors, where the latter are the vertices that are on the path between the uniformly chosen vertex and the initial vertex.
We study the asymptotic growth rate of the labels of high-degree vertices in weighted recursive graphs (WRGs) when the weights are independent, identically distributed, almost surely bounded random variables, and as a result confirm a conjecture by Lodewijks and Ortgiese (‘The maximal degree in random recursive graphs with random weights’, preprint, 2020). WRGs are a generalisation of the random recursive tree and directed acyclic graph models, in which vertices are assigned vertex-weights and where new vertices attach to $m\in\mathbb{N}$ predecessors, each selected independently with a probability proportional to the vertex-weight of the predecessor. Prior work established the asymptotic growth rate of the maximum degree of the WRG model, and here we show that there exists a critical exponent $\mu_m$ such that the typical label size of the maximum-degree vertex equals $n^{\mu_m(1+o(1))}$ almost surely as $n$, the size of the graph, tends to infinity. These results extend results on the asymptotic behaviour of the location of the maximum degree, formerly only known for the random recursive tree model, to the more general weighted multigraph case of the WRG model. Moreover, for the weighted recursive tree model, that is, the WRG model with $m=1$, we prove the joint convergence of the rescaled degree and label of high-degree vertices under additional assumptions on the vertex-weight distribution, and also extend results on the growth rate of the maximum degree obtained by Eslava, Lodewijks, and Ortgiese (Stoch. Process. Appl. 158, 2023).
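The attachment rule described above can be sketched in a few lines. This is only an illustrative simulation, not the authors' code: the function names `weighted_recursive_graph` and `in_degrees`, the single-vertex initial graph, and the uniform weight distribution in the usage lines are all assumptions made for the example.

```python
import random

def weighted_recursive_graph(n, m, weights, rng):
    """Simulate a WRG on vertices 1..n (hypothetical helper).

    graph[v] lists the m predecessors chosen by vertex v. Each choice is
    made independently, with probability proportional to the predecessor's
    vertex-weight, so repeated choices (multi-edges) are allowed.
    Assumption: the graph starts from the single vertex 1.
    """
    graph = {1: []}
    for v in range(2, n + 1):
        predecessors = list(range(1, v))
        w = [weights[u - 1] for u in predecessors]
        graph[v] = rng.choices(predecessors, weights=w, k=m)
    return graph

def in_degrees(graph):
    """Count how many edges point to each vertex."""
    deg = {v: 0 for v in graph}
    for targets in graph.values():
        for u in targets:
            deg[u] += 1
    return deg

# Usage: bounded i.i.d. weights (uniform on [0.1, 1], an arbitrary choice).
rng = random.Random(0)
n, m = 200, 3
weights = [rng.uniform(0.1, 1.0) for _ in range(n)]
graph = weighted_recursive_graph(n, m, weights, rng)
degrees = in_degrees(graph)
max_label = max(degrees, key=degrees.get)  # label of a maximum-degree vertex
```

Tracking `max_label` as `n` grows is the quantity whose growth rate the abstract describes; the quadratic-time loop here is chosen for clarity rather than speed.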
We consider Gaussian approximation in a variant of the classical Johnson–Mehl birth–growth model with random growth speed. Seeds appear randomly in $\mathbb{R}^d$ at random times and start growing instantaneously in all directions with a random speed. The locations, birth times, and growth speeds of the seeds are given by a Poisson process. Under suitable conditions on the random growth speed, the time distribution, and a weight function $h\;:\;\mathbb{R}^d \times [0,\infty) \to [0,\infty)$, we prove a Gaussian convergence of the sum of the weights at the exposed points, which are those seeds in the model that are not covered at the time of their birth. Such models have previously been considered, albeit with fixed growth speed. Moreover, using recent results on stabilization regions, we provide non-asymptotic bounds on the distance between the normalized sum of weights and a standard Gaussian random variable in the Wasserstein and Kolmogorov metrics.
In this chapter, we look at the moments of a random variable. Specifically, we demonstrate that moments capture useful information about the tail of a random variable while often being simpler to compute or at least bound. Several well-known inequalities quantify this intuition. Although they are straightforward to derive, such inequalities are surprisingly powerful. Through a range of applications, we illustrate the utility of controlling the tail of a random variable, typically by allowing one to dismiss certain “bad events” as rare. We begin by recalling the classical Markov and Chebyshev inequalities. Then we discuss three of the most fundamental tools in discrete probability and probabilistic combinatorics. First, we derive the complementary first and second moment methods, and give several standard applications, especially to threshold phenomena in random graphs and percolation. Then we develop the Chernoff–Cramér method, which relies on the “exponential moment” and is the building block for large deviations bounds. Two key applications in data science are briefly introduced: sparse recovery and empirical risk minimization.
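As a small numerical illustration of how successive moments tighten tail control, the sketch below compares the exact upper tail of a binomial random variable with the Markov, Chebyshev, and Chernoff bounds. The function name `tail_bounds` and the parameter choices are assumptions for the example; the Chernoff bound is written in its standard relative-entropy form for the binomial.

```python
from math import comb, exp, log

def tail_bounds(n, p, a):
    """Exact P[X >= a] for X ~ Bin(n, p), with three upper bounds.

    Requires n*p < a < n so that all three bounds are valid and finite.
    Returns (exact, markov, chebyshev, chernoff).
    """
    mean = n * p
    var = n * p * (1 - p)
    if not mean < a < n:
        raise ValueError("need n*p < a < n")
    exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a, n + 1))
    # Markov (first moment): P[X >= a] <= E[X] / a.
    markov = min(1.0, mean / a)
    # Chebyshev (second moment): P[|X - E X| >= a - E X] <= Var[X] / (a - E X)^2.
    chebyshev = min(1.0, var / (a - mean) ** 2)
    # Chernoff (exponential moment): P[X >= a] <= exp(-n * KL(a/n || p)).
    q = a / n
    kl = q * log(q / p) + (1 - q) * log((1 - q) / (1 - p))
    chernoff = exp(-n * kl)
    return exact, markov, chebyshev, chernoff

exact, markov, chebyshev, chernoff = tail_bounds(100, 0.5, 75)
print(f"exact={exact:.3e} chernoff={chernoff:.3e} "
      f"chebyshev={chebyshev:.3e} markov={markov:.3e}")
```

For these parameters the bounds are ordered as one would hope: the exponential-moment bound is far closer to the true tail than the second-moment bound, which in turn improves on the first-moment bound.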
In this chapter, we move on to coupling, another probabilistic technique with a wide range of applications (far beyond discrete stochastic processes). The idea behind the coupling method is deceptively simple: to compare two probability measures, it is sometimes useful to construct a joint probability space with the corresponding marginals. We begin by defining coupling formally and deriving its connection to the total variation distance through the coupling inequality. We illustrate the basic idea on a classical Poisson approximation result, which we apply to the degree sequence of an Erdős–Rényi graph. Then we introduce the concept of stochastic domination and some related correlation inequalities. We develop a key application in percolation theory. Coupling of Markov chains is the next topic, where it serves as a powerful tool to derive mixing time bounds. Finally, we end with the Chen–Stein method for Poisson approximation, a technique that applies in particular in some natural settings with dependent variables.
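To make the Poisson approximation concrete, the following sketch computes the total variation distance between a Binomial$(n, \lambda/n)$ and a Poisson$(\lambda)$ distribution and compares it with the classical bound $\lambda^2/n$, which can be obtained from a coupling argument. The function names are hypothetical, and whether the chapter states the bound with exactly this constant is an assumption.

```python
from math import comb, exp

def poisson_pmf(lam, kmax):
    """Poisson(lam) probabilities for k = 0..kmax, built recursively."""
    pmf = [exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf

def tv_binomial_poisson(n, lam):
    """Total variation distance between Bin(n, lam/n) and Poisson(lam).

    Uses d_TV = (1/2) * sum_k |P(k) - Q(k)|; the Poisson mass above n,
    where the binomial puts no mass, is added as a single tail term.
    """
    p = lam / n
    binom = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    poisson = poisson_pmf(lam, n)
    poisson_tail = max(0.0, 1.0 - sum(poisson))
    return 0.5 * (sum(abs(b - q) for b, q in zip(binom, poisson)) + poisson_tail)

lam = 2.0
for n in (10, 50, 500):
    print(n, tv_binomial_poisson(n, lam), lam**2 / n)
```

In the Erdős–Rényi setting with edge probability $\lambda/n$, the degree of a fixed vertex is binomial with $n-1$ trials, so the same computation applies with the number of trials adjusted accordingly.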
In this chapter, we develop spectral techniques. We highlight some applications to Markov chain mixing and network analysis. The main tools are the spectral theorem and the variational characterization of eigenvalues, which we review together with some related results. We also give a brief introduction to spectral graph theory and detail an application to community recovery. Then we apply the spectral theorem to reversible Markov chains. In particular, we define the spectral gap and establish its close relationship to the mixing time. We also show that the spectral gap can be bounded using certain isoperimetric properties of the underlying network. We prove Cheeger’s inequality, which quantifies this relationship, and introduce expander graphs, an important family of graphs with good “expansion.” Applications to mixing times are also discussed. One specific technique is the “canonical paths method,” which bounds the spectral gap by formalizing a notion of congestion in the network.
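The link between the spectral gap and isoperimetry can be checked by hand on a tiny example. The sketch below, which assumes the lazy simple random walk on an 8-cycle as the test chain, estimates the second-largest eigenvalue by power iteration, computes the bottleneck ratio $\Phi_*$ by brute force, and verifies a commonly stated form of Cheeger's inequality, $\Phi_*^2/2 \le \gamma \le 2\Phi_*$. All function names are hypothetical, and the brute-force search is only feasible for very small state spaces.

```python
from itertools import combinations
from math import cos, pi, sqrt

def lazy_cycle(n):
    """Transition matrix of the lazy simple random walk on the n-cycle."""
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        P[x][x] = 0.5
        P[x][(x + 1) % n] += 0.25
        P[x][(x - 1) % n] += 0.25
    return P

def second_eigenvalue(P, stat, iters=500):
    """Power iteration on functions with zero mean under stat.

    Assumes P is reversible with respect to stat and has nonnegative
    eigenvalues (true for lazy chains), so the iteration converges to
    the second-largest eigenvalue.
    """
    n = len(P)
    f = [float(x * x + 1) for x in range(n)]  # arbitrary deterministic start
    lam = 0.0
    for _ in range(iters):
        mean = sum(stat[x] * f[x] for x in range(n))
        f = [v - mean for v in f]  # project out the constant eigenfunction
        norm = sqrt(sum(stat[x] * f[x] ** 2 for x in range(n)))
        f = [v / norm for v in f]
        Pf = [sum(P[x][y] * f[y] for y in range(n)) for x in range(n)]
        lam = sum(stat[x] * f[x] * Pf[x] for x in range(n))  # Rayleigh quotient
        f = Pf
    return lam

def bottleneck_ratio(P, stat):
    """Minimum of Q(S, S^c) / stat(S) over nonempty S with stat(S) <= 1/2."""
    n = len(P)
    best = float("inf")
    for size in range(1, n):
        for S in combinations(range(n), size):
            mass = sum(stat[x] for x in S)
            if mass > 0.5 + 1e-12:
                continue
            inside = set(S)
            flow = sum(stat[x] * P[x][y]
                       for x in S for y in range(n) if y not in inside)
            best = min(best, flow / mass)
    return best

n = 8
P = lazy_cycle(n)
stat = [1.0 / n] * n  # the uniform distribution is stationary on the cycle
gap = 1.0 - second_eigenvalue(P, stat)
phi = bottleneck_ratio(P, stat)
print(gap, (1 - cos(2 * pi / n)) / 2, phi)
```

The printed gap agrees with the closed form $(1-\cos(2\pi/n))/2$ for the lazy walk on the cycle, and the bottleneck ratio is attained by an arc containing half of the vertices.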
Little information exists concerning the spatial relationship between invasive meningococcal disease (IMD) cases and Neisseria meningitidis (N. meningitidis) carriage. The aim of this study was to examine whether there is a relationship between IMD and asymptomatic oropharyngeal carriage of meningococci by spatial analysis to identify the distribution and patterns of cases and carriage in South Australia (SA). Carriage data geocoded to participants’ residential addresses and meningococcal case notifications using Postal Area (POA) centroids were used to analyse spatial distribution by disease- and non-disease-associated genogroups, as well as overall from 2017 to 2020. The majority of IMD cases were genogroup B with the overall highest incidence of cases reported in infants, young children, and adolescents. We found no clear spatial association between N. meningitidis carriage and IMD cases. However, analyses using carriage and case genogroups showed differences in the spatial distribution between metropolitan and regional areas. Regional areas had a higher rate of IMD cases and carriage prevalence. While no clear relationship between cases and carriage was evident in the spatial analysis, the higher rates of both carriage and disease in regional areas highlight the need to maintain high vaccine coverage outside of the well-resourced metropolitan area.
In this chapter, we describe a few discrete probability models to which we will come back repeatedly throughout the book. While there exists a vast array of well-studied random combinatorial structures (permutations, partitions, urn models, Boolean functions, polytopes, etc.), our focus is primarily on a limited number of graph-based processes, namely percolation, random graphs, Ising models, and random walks on networks. We will not attempt to derive the theory of these models exhaustively here. Instead, we will employ them to illustrate some essential techniques from discrete probability. Note that the toolkit developed in this book is meant to apply to other probabilistic models of interest as well, and in fact many more will be encountered along the way. After a brief review of graph basics and Markov chain theory, we formally introduce our main models. We also formulate various key questions about these models that will be answered (at least partially) later on. We assume that the reader is familiar with the measure-theoretic foundations of probability. A refresher of all required concepts and results is provided in the appendix.