This chapter is not about one particular method (or family of methods). Instead, it provides a set of tools useful for better pattern recognition, especially in real-world applications: the definition of distance metrics, vector norms, a brief introduction to distance metric learning, and power mean kernels (a family of useful metrics). We also establish through examples that proper normalization of our data is essential, and introduce a few data normalization and transformation methods.
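As a minimal illustration of the tools listed above, the following sketch computes the Minkowski family of distances and applies z-score normalization. The function names are ours, not from the chapter:

```python
import numpy as np

def minkowski_distance(x, y, p=2):
    """L_p (Minkowski) distance between two vectors; p=1 gives the
    Manhattan distance, p=2 the Euclidean distance."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def standardize(X):
    """Z-score normalization: zero mean, unit variance per feature (column)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return (X - mu) / sigma
```

Z-scoring makes features comparable in scale before any distance is computed, which is one reason normalization matters for distance-based pattern recognition.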
Starting from this chapter, Part III introduces several commonly used algorithms in pattern recognition and machine learning. The support vector machine (SVM) starts from a simple and beautiful idea: the large margin. We first show that, to realize this idea, we may need to simplify the problem setup by assuming a linearly separable binary classification task. We then visualize and calculate the margin to reach the SVM formulation, which is complex and difficult to optimize. We apply the simplification procedure again until the formulation becomes tractable, briefly mention the primal–dual relationship, but do not go into the details of its optimization. Finally, we show that the simplifying assumptions (linear, separable, and binary) can be relaxed so that SVMs can solve more difficult tasks; the key ideas here, slack variables and kernel methods, are also useful in other tasks.
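The large-margin idea with slack variables can be written as minimizing a regularized hinge loss. The toy subgradient-descent trainer below is an illustrative sketch of that primal formulation only, not the optimization strategy developed in the chapter:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Primal SVM sketch: minimize 0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b))
    by (sub)gradient descent. Labels y must be in {-1, +1}. Illustrative only."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                      # points inside the margin (nonzero slack)
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Only the margin violators contribute to the gradient, which is exactly the role the slack variables play in the relaxed formulation.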
Information theory was developed in the communications community, but it turns out to be very useful for pattern recognition. In this chapter, we start with an example that develops the idea of uncertainty and its measurement, i.e., entropy. A few core results of information theory are introduced: entropy, joint and conditional entropy, mutual information, and their relationships. We then move to differential entropy for continuous random variables and find distributions with maximum entropy under certain constraints, which are useful in pattern recognition. Finally, we introduce applications of information theory in our context: maximum entropy learning, minimum cross-entropy, feature selection, and decision trees (a widely used family of models for pattern recognition and machine learning).
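The core quantities named above can be computed directly from probability tables; a minimal sketch (function names are ours), using the identity $I(X;Y) = H(X) + H(Y) - H(X,Y)$:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) in bits; zero-probability terms contribute 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    """Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y),
    computed from a 2-D joint probability table."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)  # marginal of X
    py = joint.sum(axis=0)  # marginal of Y
    return entropy(px) + entropy(py) - entropy(joint.ravel())
```

For a fair coin the entropy is 1 bit; for two independent variables the mutual information is 0, and it equals the full entropy when one variable determines the other.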
What does a probabilistic program actually compute? How can one formally reason about such probabilistic programs? This valuable guide covers such elementary questions and more. It provides a state-of-the-art overview of the theoretical underpinnings of modern probabilistic programming and their applications in machine learning, security, and other domains, at a level suitable for graduate students and non-experts in the field. In addition, the book treats the connection between probabilistic programs and mathematical logic as well as security (what is the probability that software leaks confidential information?), and presents three programming languages for different applications: Excel tables, program testing, and approximate computing. This title is also available as Open Access on Cambridge Core.
Children are important transmitters of infection. Within schools they encounter large numbers of contacts, and infections can spread easily, causing outbreaks. However, not all schools are affected equally. We conducted a retrospective analysis of school outbreaks to identify factors associated with the risk of gastroenteritis, influenza, rash or other outbreaks. Data on reported school outbreaks in England were obtained from Public Health England and linked with data from the Department for Education and the Office for Standards in Education, Children's Services and Skills (Ofsted). Primary and all-through schools were found to be at increased risk of outbreaks compared with secondary schools (odds ratio (OR) 5.82, 95% confidence interval (CI) 4.50–7.58 and OR 4.66, 95% CI 3.27–6.61, respectively). School size was also significantly associated with the risk of outbreaks, with higher odds associated with larger schools. Attack rates were higher in gastroenteritis and influenza outbreaks, with lower attack rates associated with rashes (relative risk 0.17, 95% CI 0.15–0.20). Deprivation and Ofsted rating were not associated with either outbreak occurrence or the subsequent attack rate. This study identifies primary and all-through schools as key settings for health protection interventions. Public health teams need to work closely with these schools to encourage early identification and reporting of outbreaks.
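Odds ratios of the kind quoted above follow the standard 2×2-table construction with a Woolf (logit) confidence interval; a sketch in which the counts are hypothetical, not from the study:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI from a 2x2 table:
        a = exposed cases,   b = exposed non-cases,
        c = unexposed cases, d = unexposed non-cases.
    OR = (a*d)/(b*c); SE of log(OR) = sqrt(1/a + 1/b + 1/c + 1/d)."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi
```

For example, `odds_ratio_ci(10, 90, 5, 95)` yields an OR of about 2.11 with its 95% interval; an interval excluding 1 indicates a statistically significant association at the 5% level.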
Multicomponent polymer systems are of interest in organic photovoltaic and drug-delivery applications, among others in which diverse morphologies influence performance. An improved understanding of morphology classification, driven by composition-informed prediction tools, will aid polymer engineering practice. We use a modified Cahn–Hilliard model to simulate polymer precipitation. Such physics-based models require high-performance computation, which prevents rapid prototyping and iteration in engineering settings. To reduce the computational cost, we apply machine learning (ML) techniques for clustering and subsequent prediction of the simulated polymer-blend images in conjunction with simulations. Integrating ML and simulations in this manner reduces the number of simulations needed to map out the morphology of polymer blends as a function of input parameters, and also generates a data set which others can use to this end. We explore dimensionality reduction, via principal component analysis and autoencoder techniques, and analyze the resulting morphology clusters. Supervised ML using Gaussian process classification was subsequently used to predict morphology clusters from species molar fraction and interaction parameter inputs. Manual pattern clustering yielded the best results, but ML techniques were able to predict the morphology of polymer blends with ≥90% accuracy.
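Dimensionality reduction via principal component analysis, one of the techniques mentioned above, can be sketched in a few lines with an SVD. This is a generic PCA, not the authors' exact pipeline:

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components via SVD.
    Returns (scores, components, explained_variance_ratio)."""
    Xc = X - X.mean(axis=0)                    # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                     # low-dimensional representation
    evr = (S ** 2) / np.sum(S ** 2)            # variance explained per component
    return scores, Vt[:k], evr[:k]
```

In a morphology workflow, each simulated image would first be flattened into a row of X; the low-dimensional scores are then what a clustering step or a downstream classifier operates on.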
Green–Griffiths–Kerr introduced Hodge representations to classify the Hodge groups of polarized Hodge structures, and the corresponding Mumford–Tate subdomains. We summarize how, given a fixed period domain $ \mathcal{D} $, one enumerates the Hodge representations and corresponding Mumford–Tate subdomains $ D \subset\mathcal{D} $. The procedure is illustrated in two examples: (i) weight two Hodge structures with $ {p}_g={h}^{2,0}=2 $; and (ii) weight three CY-type Hodge structures.
Technical challenges associated with telomere length (TL) measurement have prompted concerns regarding its utility as a biomarker of aging. Several factors influence TL assessment via qPCR, the most common measurement method in epidemiological studies, including storage conditions and DNA extraction method. Here, we tested the impact of power supply during the qPCR assay. Momentary fluctuations in power can affect the functioning of high-performance electronics, including real-time thermocyclers. We investigated whether mitigating these fluctuations by using an uninterruptible power supply (UPS) influenced TL assessment via qPCR. Samples run with a UPS had significantly lower standard deviation (p < 0.001) and coefficient of variation (p < 0.001) across technical replicates than those run without a UPS. UPS usage also improved exponential amplification efficiency at the replicate, sample, and plate levels. Together these improvements translated to increased performance across metrics of external validity, including correlation with age, within-person correlation across tissues, and correlation between parents and offspring.
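The replicate-level coefficient of variation used above is a standard precision summary; a minimal sketch, with hypothetical replicate values:

```python
import statistics

def replicate_cv(values):
    """Coefficient of variation (%) across technical replicates:
    100 * sample standard deviation / mean. Lower is more precise."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return 100.0 * sd / mean
```

Comparing, say, `replicate_cv([9.0, 10.0, 11.0])` against `replicate_cv([9.8, 10.0, 10.2])` quantifies how much tighter one set of technical replicates is than the other.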
Pertussis is a highly contagious infectious disease and remains an important cause of mortality and morbidity worldwide. Over the last decade, vaccination has greatly reduced the burden of pertussis. Yet uncertainty in individual vaccination coverage and ineffective case surveillance systems make it difficult to estimate burden and the related quantity of population-level susceptibility, which determines population risk. These issues are more pronounced in low-income settings, where coverage is often overestimated and case numbers are under-reported. Serological data provide a direct characterisation of the landscape of susceptibility to infection, and can be combined with vaccination coverage and basic theory to estimate rates of exposure to natural infection. Here, we analysed cross-sectional data on seropositivity against pertussis to identify spatial and age patterns of susceptibility in children in Madagascar. A large proportion of individuals surveyed were seronegative; however, there were patterns suggestive of natural infection in all the regions analysed. Improvements in vaccination coverage are needed to help prevent additional burden of pertussis in the country.
We answer four questions from a recent paper of Rao and Shinkar [17] on Lipschitz bijections between functions from $\{0,1\}^n$ to $\{0,1\}$. (1) We show that there is no $O(1)$-bi-Lipschitz bijection from Dictator to XOR such that each output bit depends on $O(1)$ input bits. (2) We give a construction for a mapping from XOR to Majority which has average stretch $O(\sqrt{n})$, matching a previously known lower bound. (3) We give a 3-Lipschitz embedding $\phi \colon \{0,1\}^n \to \{0,1\}^{2n+1}$ such that $\mathrm{XOR}(x) = \mathrm{Majority}(\phi(x))$ for all $x \in \{0,1\}^n$. (4) We show that with high probability there is an $O(1)$-bi-Lipschitz mapping from Dictator to a uniformly random balanced function.
Typical enteropathogenic Escherichia coli (tEPEC) infection is a major cause of diarrhoea and contributor to mortality in children <5 years old in developing countries. Data were analysed from the Global Enteric Multicenter Study, examining children <5 years old seeking care for moderate-to-severe diarrhoea (MSD) in Kenya. Stool specimens were tested for enteric pathogens, including by multiplex polymerase chain reaction for gene targets of tEPEC. Demographic, clinical and anthropometric data were collected at enrolment and ~60 days later; multivariable logistic regressions were constructed. Of 1778 MSD cases enrolled from 2008 to 2012, 135 (7.6%) children tested positive for tEPEC. In a case-to-case comparison among MSD cases, tEPEC was independently associated with presentation at enrolment with a loss of skin turgor (adjusted odds ratio (aOR) 2.08, 95% confidence interval (CI) 1.37–3.17) and convulsions (aOR 2.83, 95% CI 1.12–7.14). At follow-up, infants with tEPEC were more likely than those without to be underweight (OR 2.2, 95% CI 1.3–3.6) and wasted (OR 2.5, 95% CI 1.3–4.6). Among MSD cases, tEPEC was associated with mortality (aOR 2.85, 95% CI 1.47–5.55). This study suggests that tEPEC contributes to morbidity and mortality in children. Interventions aimed at defining and reducing the burden of tEPEC and its sequelae should be urgently investigated, prioritised and implemented.
We introduce new definitions of sectional, Ricci, and scalar curvatures for networks and their higher-dimensional counterparts, derived from two classical notions of curvature for curves in general metric spaces, namely the Menger curvature and the Haantjes curvature. These curvatures are applicable to unweighted or weighted and undirected or directed networks, and are more intuitive and easier to compute than other network curvatures. In particular, the proposed curvatures based on the interpretation of the Haantjes definition as geodesic curvature allow us to give a network analogue of the classical local Gauss–Bonnet theorem. Furthermore, we propose even simpler and more intuitive proxies for the Haantjes curvature that allow for faster and easier computations in large-scale networks. In addition, we investigate the embedding properties of the proposed Ricci curvatures. Lastly, we compare the behavior, on both model and real-world networks, of the curvatures introduced herein with more established notions of Ricci curvature and other widely used network measures.
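The classical Menger curvature underlying these definitions assigns to any three points the reciprocal of the circumradius of their triangle, $c(p,q,r) = 4A/(|pq|\,|qr|\,|rp|)$ with $A$ the triangle's area. A sketch for points in the plane, illustrating the classical notion rather than the authors' network extension:

```python
import math

def menger_curvature(p, q, r):
    """Menger curvature of three planar points: 4*Area / (|pq|*|qr|*|rp|),
    i.e. the reciprocal of the circumradius of triangle pqr."""
    ab = math.dist(p, q)
    bc = math.dist(q, r)
    ca = math.dist(r, p)
    # twice the triangle area via the cross product
    area2 = abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))
    if area2 == 0:
        return 0.0  # collinear points have zero curvature
    return (2.0 * area2) / (ab * bc * ca)
```

Three points on a unit circle give curvature 1, and collinear points give 0, matching the intuition that curvature measures deviation from a straight path.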
The prevalence of asymptomatic infection by coronavirus disease 2019 (COVID-19), a critical measure of the effectiveness of mitigation strategies, has been reported to vary widely. In this study, we aimed to determine the prevalence of asymptomatic infection using a serosurvey of the general population. In a cross-sectional seroprevalence survey in Guilan province, Iran, specific antibodies against COVID-19 were detected in a representative sample using rapid test kits. Among 117 seropositive subjects, the prevalence of asymptomatic infection was determined from the history of symptoms during the preceding 3 months. The design-adjusted prevalence of asymptomatic infection was 57.2% (95% confidence interval (CI) 44–69). The prevalence was significantly lower in subjects with previous contact with COVID-19 patients (12%, 95% CI 2–49) than in those without (69%, 95% CI 46–86). The lowest prevalence was observed for the painful-body symptom (74.4%). This study revealed that more than half of infected COVID-19 patients had no symptoms. The implications of our findings include the importance of adopting public health measures such as social distancing, and the insufficiency of contact tracing alone to interrupt epidemic transmission.
Stochastic clearing theory has widespread applications in the context of supply chain and service operations management. Historical application domains include bulk service queues, inventory control, and transportation planning (e.g., vehicle dispatching and shipment consolidation). In this paper, motivated by a fundamental application in shipment consolidation, we revisit the notion of service performance for stochastic clearing system operation. More specifically, our goal is to evaluate and compare the service performance of alternative operational policies for clearing decisions, as quantified by a measure of timely service referred to as Average Order Delay ($\textrm{AOD}$). All stochastic clearing systems are subject to service delay due to the inherent clearing practice, and $\textrm{AOD}$ can be thought of as a benchmark for evaluating timely service. Although stochastic clearing theory has a long history, the existing literature on the analysis of $\textrm{AOD}$ as a service measure has several limitations. Hence, we extend the previous analysis by proposing a more general method for a generic analytical derivation of $\textrm{AOD}$ for any renewal-type clearing policy, including but not limited to the alternative shipment consolidation policies in the previous literature. Our proposed method utilizes a new martingale point of view and lends itself to a generic analytical characterization of $\textrm{AOD}$, leading to a complete comparative analysis of alternative renewal-type clearing policies. Hence, we also close the gaps in the literature on shipment consolidation via a complete set of analytically provable results regarding $\textrm{AOD}$, which previously were only illustrated through numerical tests.
In this paper, we derive the asymptotic properties of the density-weighted average derivative estimator when a regressor is contaminated with classical measurement error and the density of this error must be estimated. Average derivatives of conditional mean functions are used extensively in economics and statistics, most notably in semiparametric index models. In addition to ordinary smooth measurement error, we provide results for supersmooth error distributions. This is a particularly important class of error distributions, as it includes the Gaussian density. We show that under either type of measurement error, despite using nonparametric deconvolution techniques and an estimated error characteristic function, we are able to achieve a $\sqrt{n}$-rate of convergence for the average derivative estimator. Interestingly, if the measurement error density is symmetric, the asymptotic variance of the average derivative estimator is the same irrespective of whether the error density is estimated or not. The promising finite-sample performance of the estimator is shown through a Monte Carlo simulation.
Google's ‘Community Mobility Reports’ (CMR) detail changes in activity and mobility occurring in response to COVID-19. They thus offer a unique opportunity to examine the relationship between mobility and disease incidence. Our objective was to examine whether an association between confirmed COVID-19 case numbers and levels of mobility was apparent and, if so, whether such data enhance disease modelling and prediction. CMR data for countries worldwide were cross-correlated with corresponding confirmed COVID-19 case numbers. Models were fitted to explain the case numbers of each country's epidemic. Models using numerical date, contemporaneous CMR data, and distributed-lag CMR data were contrasted using the Bayesian Information Criterion. Notable were negative correlations between CMR data and case incidence for prominent industrialised countries of Western Europe and North America. Continent-wide examination found a negative correlation for all continents with the exception of South America. In the modelling, CMR-expanded models proved superior to the model without CMR data, and predictions made with the distributed-lag model significantly outperformed all other models. The observed relationship between CMR data and case incidence, and its ability to enhance model quality and prediction, suggests that data related to community mobility could prove useful in future COVID-19 modelling.
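The cross-correlation-at-a-lag idea behind the distributed-lag comparison can be sketched as follows; the series and lag range here are hypothetical, not the study's data:

```python
import numpy as np

def lagged_correlation(mobility, cases, max_lag=14):
    """Pearson correlation between a mobility series and case counts shifted
    forward by each lag in [0, max_lag]. Returns the lag with the strongest
    correlation in absolute value, plus all per-lag correlations."""
    results = {}
    for lag in range(max_lag + 1):
        if lag == 0:
            a, b = mobility, cases
        else:
            a, b = mobility[:-lag], cases[lag:]   # mobility at t vs cases at t+lag
        results[lag] = float(np.corrcoef(a, b)[0, 1])
    best = max(results, key=lambda l: abs(results[l]))
    return best, results
```

Scanning lags in this way reveals whether mobility changes lead case incidence by some number of days, which is the motivation for preferring a distributed-lag specification over a purely contemporaneous one.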