In this chapter, we explore how (Type-2) computable distributions can be used to give both (algorithmic) sampling and distributional semantics to probabilistic programs with continuous distributions. To this end, we sketch an encoding of computable distributions in a fragment of Haskell and show how topological domains can be used to model the resulting PCF-like language. We also examine the implications that a (Type-2) computable semantics has for implementing conditioning. We hope to draw out the connection between an approach based on (Type-2) computability and ordinary programming throughout the chapter as well as highlight the relation with constructive mathematics (via realizability).
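The chapter's encoding is in Haskell; as a language-agnostic illustration of the Type-2 view, a distribution can be read as a computable function from an infinite stream of fair coin flips to a value, consuming only finitely many bits per finite-precision answer. A minimal Python sketch (all names and the encoding are ours, not the chapter's):

```python
import math
import random

def coin_flips(rng):
    """An infinite stream of fair coin flips: the 'oracle' that a
    Type-2 computable distribution consumes."""
    while True:
        yield rng.getrandbits(1)

def sample_uniform(bits, precision=32):
    """A uniform [0, 1) sample to finite precision; crucially, only
    finitely many bits are consumed for each finite approximation."""
    return sum(b * 2.0 ** -(i + 1) for i, b in zip(range(precision), bits))

def sample_exponential(bits, rate=1.0):
    """Push the uniform sample through a computable map (the inverse
    CDF) to obtain an exponential sample."""
    u = sample_uniform(bits)
    return -math.log(1.0 - u) / rate

bits = coin_flips(random.Random(0))
u = sample_uniform(bits)
assert 0.0 <= u < 1.0
```

Sampling semantics then interprets a program as such a bit-stream consumer, while the distributional semantics describes the pushforward measure it induces.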
This chapter is concerned with analysing the expected runtime of probabilistic programs by exploiting program verification techniques. We introduce a weakest pre-conditioning framework à la Dijkstra that enables determining the expected runtime in a compositional manner. Like weakest pre-conditions, it is a reasoning framework at the syntax level of programs. Applications of the weakest pre-conditioning framework include determining the expected runtime of randomised algorithms, as well as determining whether a program is positively almost-surely terminating, i.e., whether the expected number of computation steps until termination is finite for every possible input. For Bayesian networks, a restricted class of probabilistic programs, we show that the expected runtime analysis can be fully automated. In this way, the simulation time under rejection sampling can be determined. This is particularly useful for ill-conditioned inference queries.
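For flavour, the core rules of such an expected-runtime transformer can be sketched as follows (notation ours; the exact placement of the unit-cost "clock ticks" varies across presentations):

```latex
\begin{align*}
\mathrm{ert}[\texttt{skip}](t) &= 1 + t\\
\mathrm{ert}[x := e](t) &= 1 + t[x/e]\\
\mathrm{ert}[C_1; C_2](t) &= \mathrm{ert}[C_1]\bigl(\mathrm{ert}[C_2](t)\bigr)\\
\mathrm{ert}[\{C_1\}\,[p]\,\{C_2\}](t) &= 1 + p\cdot\mathrm{ert}[C_1](t) + (1-p)\cdot\mathrm{ert}[C_2](t)
\end{align*}
```

Here $t$ is a runtime continuation mapping final states to remaining expected runtime, and loops are handled by a least fixed point, exactly as weakest pre-conditions handle them.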
Monads are a popular feature of the programming language Haskell because they can model many different notions of computation in a uniform and purely functional way. Our particular interest here is the probability monad, which can be -- and has been -- used to synthesise models for probabilistic programming. Quantitative Information Flow, or QIF, arises when security is combined with probability, and concerns the measurement of the amount of information that 'leaks' from a probabilistic program's state to a (usually) hostile observer: that is, not 'whether' leaks occur but rather 'how much?' Recently it has been shown that QIF can be seen monadically, a 'lifting' of the probability monad so that programs become functions from distributions to distributions of distributions: the codomain is 'hyper distributions'. Haskell's support for monads therefore suggests a synthesis of an executable model for QIF. Here, we provide the first systematic and thorough account of doing that: using distributions of distributions to synthesise a model for Quantitative Information Flow in terms of monads in Haskell.
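A minimal analogue of the monadic picture, written in Python rather than the chapter's Haskell (dicts as finite distributions; all names are ours): a channel maps a prior to a hyper-distribution, i.e., a distribution over the posteriors that each observation would induce.

```python
from collections import defaultdict

def push(prior, channel):
    """Lift a channel (state -> dist over observations) to a map from a
    prior (dist over states) to a hyper-distribution: a dict mapping
    each posterior (frozen as a sorted tuple) to its outer probability."""
    # joint probability of (observation, state)
    joint = defaultdict(float)
    for s, ps in prior.items():
        for o, po in channel(s).items():
            joint[(o, s)] += ps * po
    # marginal probability of each observation
    marg = defaultdict(float)
    for (o, s), p in joint.items():
        marg[o] += p
    # posterior for each observation, collected into the outer distribution
    hyper = defaultdict(float)
    for o, po in marg.items():
        post = tuple(sorted((s, joint[(o, s)] / po)
                            for s in prior if joint[(o, s)] > 0))
        hyper[post] += po
    return dict(hyper)

# a secret coin, observed through a noisy test
prior = {'heads': 0.5, 'tails': 0.5}
noisy = lambda s: ({'yes': 0.9, 'no': 0.1} if s == 'heads'
                   else {'yes': 0.2, 'no': 0.8})
hyper = push(prior, noisy)
assert abs(sum(hyper.values()) - 1.0) < 1e-9
```

The resulting `hyper` has one entry per possible observation: the posterior the observer would hold after seeing it, weighted by how likely that observation is — precisely the "distribution of distributions" from which QIF measures leakage.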
Tabular is a domain-specific language for expressing probabilistic models of relational data. Tabular has several features that set it apart from other probabilistic programming languages including: (1) programs and data are stored as spreadsheet tables; (2) programs consist of probabilistic annotations on the relational schema of the data; and (3) inference returns estimations of missing values and latent columns, as well as parameters. Our primary implementation is for Microsoft Excel and relies on Infer.NET for inference. Still, the language can be called independently of Excel and can target alternative inference engines.
Equational logic has been a central theme in mathematical reasoning and in reasoning about programs. We introduce a quantitative analogue of equational reasoning that allows one to reason about approximate equality. The equality symbol is annotated with a real number that describes how far apart two terms can be. We develop the counterparts of standard results of equational logic, in particular, a completeness theorem. We define quantitative algebras and free quantitative algebras which yield monads on categories of metric spaces. We show that key examples of probability metrics, in particular, the Kantorovich metric and the Wasserstein p-metrics, arise from simple quantitative theories. Finally we develop a quantitative version of the theory of effects in programming languages.
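For a taste of the formalism (notation ours): judgments are indexed equalities $s =_\varepsilon t$, read "$s$ and $t$ are within distance $\varepsilon$", subject to rules such as:

```latex
\begin{align*}
&\vdash s =_0 s &&\text{(reflexivity)}\\
s =_\varepsilon t &\vdash t =_\varepsilon s &&\text{(symmetry)}\\
s =_\varepsilon t,\; t =_{\varepsilon'} u &\vdash s =_{\varepsilon + \varepsilon'} u &&\text{(triangle inequality)}\\
s =_\varepsilon t &\vdash s =_{\varepsilon'} t \quad (\varepsilon \le \varepsilon') &&\text{(weakening)}
\end{align*}
```

These rules force the indexed equalities of any model to form a (pseudo)metric, which is how the metric-space and monad structure mentioned above arises.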
Probabilistic couplings are a powerful abstraction for analysing probabilistic properties. Originating from research in probability theory, a coupling is a distribution over pairs that relates – or couples – two given distributions. If we can find a coupling with certain properties, then we can conclude properties about the two related distributions. In this way, probabilistic relational properties – properties comparing two executions of a probabilistic program – can be established by building a suitable coupling. Couplings have also been explored in the logic and verification literature. For example, probabilistic bisimulation asserts that there exists a coupling; in this way, couplings can be used to verify equivalence of finite state probabilistic transition systems. However, their use in mathematics suggests that couplings can prove more sophisticated properties for richer probabilistic computations, such as imperative programs and infinite state systems. Furthermore, we can borrow a tool from probability theory, called proof by coupling, to construct couplings in a compositional fashion. This chapter describes how coupling proofs can be naturally encoded in pRHL, a relational program logic originally designed for verifying cryptographic protocols. Several examples are presented, showing how to use this proof technique to verify equivalence, stochastic domination and probabilistic convergence.
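As a concrete illustration (ours, not taken from the chapter), consider the classic monotone coupling of two Bernoulli distributions: drawing both coins from one shared uniform sample yields a joint distribution whose marginals are Bernoulli(p) and Bernoulli(q), and which witnesses stochastic domination when p <= q.

```python
def monotone_coupling(p, q):
    """Coupling of Bernoulli(p) and Bernoulli(q) via a shared uniform U:
    X = [U < p], Y = [U < q]. Returns the joint distribution over (X, Y)."""
    assert 0.0 <= p <= q <= 1.0
    return {
        (1, 1): p,          # U < p  implies  U < q
        (0, 1): q - p,      # p <= U < q
        (0, 0): 1.0 - q,    # U >= q
        (1, 0): 0.0,        # impossible: X <= Y holds everywhere
    }

joint = monotone_coupling(0.3, 0.7)
# the marginals recover the two coins ...
assert abs(sum(pr for (x, _), pr in joint.items() if x == 1) - 0.3) < 1e-9
assert abs(sum(pr for (_, y), pr in joint.items() if y == 1) - 0.7) < 1e-9
# ... and X <= Y with probability 1, witnessing stochastic domination
assert joint[(1, 0)] == 0.0
```

Relational logics such as pRHL reason about exactly this kind of object, but compositionally over program syntax rather than by constructing the joint distribution explicitly.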
For non-probabilistic programs, a key question in static analysis is termination, which asks whether a given program terminates under a given initial condition. In the presence of probabilistic behaviour, there are two fundamental extensions of the termination question: (a) the almost-sure termination question, which asks whether the termination probability is 1; and (b) the bounded-time termination question, which asks whether the expected termination time is bounded. There are many active research directions to address these two questions; one important such direction is the use of martingale theory for termination analysis. In this chapter, we survey the main techniques of the martingale-based approach to the termination analysis of probabilistic programs.
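As a one-line taste of the approach (notation ours; details vary across the literature): a ranking supermartingale is a nonnegative function $V$ on program states whose value decreases in expectation by at least some fixed $\varepsilon > 0$ at each step, as long as the program has not terminated:

```latex
\mathbb{E}\!\left[\, V(s_{n+1}) \mid s_0, \dots, s_n \,\right] \;\le\; V(s_n) - \varepsilon
\qquad \text{whenever } s_n \text{ is non-terminal.}
```

The existence of such a $V$ certifies positive almost-sure termination, with expected termination time bounded by $V(s_0)/\varepsilon$; weaker martingale conditions address the almost-sure termination question.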
Let G be a graph of minimum degree at least k and let Gp be the random subgraph of G obtained by keeping each edge independently with probability p. We are interested in the size of the largest complete minor that Gp contains when p = (1 + ε)/k with ε > 0. We show that with high probability Gp contains a complete minor of order $\tilde{\Omega}(\sqrt{k})$, where the ~ hides a polylogarithmic factor. Furthermore, in the case where the order of G is also bounded above by a constant multiple of k, we show that this polylogarithmic term can be removed, giving a tight bound.
Bayesian probability models uncertain knowledge and learning from observations. As a defining feature of optimal adversarial behaviour, Bayesian reasoning forms the basis of safety properties in contexts such as privacy and fairness. Probabilistic programming is a convenient implementation of Bayesian reasoning, but the adversarial setting imposes obstacles to its use: approximate inference can underestimate adversary knowledge, and exact inference is impractical for large state spaces. By abstracting distributions, the semantics of a probabilistic language and inference, jointly termed probabilistic abstract interpretation, we demonstrate adversary models that are both approximate and sound. We apply these techniques to build a privacy-protecting monitor and describe how to trade off precision against computational cost in its implementation while remaining sound with respect to privacy risk bounds.
This chapter offers an accessible introduction to the channel-based approach to Bayesian probability theory. This framework rests on algebraic and logical foundations, inspired by the methodologies of programming language semantics. It offers a uniform, structured and expressive language for describing Bayesian phenomena in terms of familiar programming concepts, like channel, predicate transformation and state transformation. The introduction also covers inference in Bayesian networks, which will be modelled by a suitable calculus of string diagrams.
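A tiny Python rendering of two of these concepts (names and encoding ours): a channel as a conditional-probability table, with state transformation pushing a distribution forward along the channel and predicate transformation pulling a [0, 1]-valued predicate back.

```python
def state_transform(state, channel):
    """Push a state (distribution over X) forward along a channel
    (X -> distribution over Y) to obtain a distribution over Y."""
    out = {}
    for x, px in state.items():
        for y, py in channel[x].items():
            out[y] = out.get(y, 0.0) + px * py
    return out

def predicate_transform(channel, pred):
    """Pull a fuzzy predicate on Y (Y -> [0, 1]) back to one on X:
    the expected value of the predicate after one step of the channel."""
    return {x: sum(py * pred[y] for y, py in dist.items())
            for x, dist in channel.items()}

# hypothetical example: weather -> ground wetness
wet_given_weather = {'rain': {'wet': 0.9, 'dry': 0.1},
                     'sun':  {'wet': 0.2, 'dry': 0.8}}
state = {'rain': 0.4, 'sun': 0.6}
assert abs(state_transform(state, wet_given_weather)['wet'] - 0.48) < 1e-9
```

State transformation moves forward with the flow of the channel, predicate transformation moves against it; the string-diagram calculus mentioned above makes this duality graphical.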
Influenza vaccine effectiveness (VE) wanes over the course of a temperate climate winter season, but little data are available from tropical countries with year-round influenza virus activity. In Singapore, a retrospective cohort study of adults vaccinated from 2013 to 2017 was conducted. Influenza vaccine failure was defined as hospital admission with polymerase chain reaction-confirmed influenza infection 2–49 weeks after vaccination. Relative VE was calculated by splitting the follow-up period into 8-week episodes (Lexis expansion) and comparing the odds of influenza infection in the first 8-week period after vaccination (weeks 2–9) with those in subsequent 8-week periods, using multivariable logistic regression adjusted for patient factors and influenza virus activity. Records of 19 298 influenza vaccinations were analysed with 617 (3.2%) influenza infections. Relative VE was stable for the first 26 weeks post-vaccination, but then declined for all three influenza types/subtypes to 69% at weeks 42–49 (95% confidence interval (CI) 52–92%, P = 0.011). VE declined fastest in older adults, in individuals with chronic pulmonary disease and in those who had been previously vaccinated within the last 2 years. Vaccine failure was significantly associated with a change in recommended vaccine strains between vaccination and observation period (adjusted odds ratio 1.26, 95% CI 1.06–1.50, P = 0.010).
Two hundred days after the first confirmed case of COVID-19 in Brazil, the epidemic has rapidly spread in metropolitan areas and advanced throughout the countryside. We followed the temporal pattern of the epidemic in São Paulo State, the most populous in the country, the first to have a confirmed case of COVID-19, and the one with the largest number of cases to date. We analysed the number of new cases per day in each regional health department and calculated the effective reproduction number (Rt) over time. Social distancing measures, along with improvements in testing and isolating positive cases, general population mask-wearing and standard health security protocols for essential and non-essential activities, helped slow the epidemic's spread but were insufficient to stop transmission.
Most textbooks on regression focus on theory and the simplest of examples. Real statistical problems, however, are complex and subtle. This is not a book about the theory of regression. It is about using regression to solve real problems of comparison, estimation, prediction, and causal inference. Unlike other books, it focuses on practical issues such as sample size and missing data and a wide range of goals and techniques. It jumps right in to methods and computer code you can use immediately. Real examples, real stories from the authors' experience demonstrate what regression can do and its limitations, with practical advice for understanding assumptions and implementing methods for experiments and observational studies. They make a smooth transition to logistic regression and GLM. The emphasis is on computation in R and Stan rather than derivations, with code available online. Graphics and presentation aid understanding of the models and model fitting.
The epidemic of coronavirus disease 2019 (COVID-19) began in China and has spread rapidly to many other countries. This study aimed to identify risk factors associated with delayed negative conversion of SARS-CoV-2 in COVID-19 patients. In this retrospective single-centre study, we included 169 consecutive patients with confirmed COVID-19 in Zhongnan Hospital of Wuhan University from 15th January to 2nd March. The cases were divided into two groups according to the median time of SARS-CoV-2 negative conversion, and the differences between groups were compared. In total, the 169 patients had a median virus negative conversion time of 18 days (interquartile range: 11–25) from symptom onset. Compared with the patients with short-term negative conversion, those with long-term conversion were older, had a higher incidence of comorbidities, chief complaints of cough and chest distress/shortness of breath, more severe illness on admission, higher levels of leucocytes, neutrophils, aspartate aminotransferase, creatine kinase and erythrocyte sedimentation rate (ESR), lower levels of CD3+CD4+ lymphocytes and albumin, and were more likely to receive mechanical ventilation. In multivariate analysis, cough, leucocytes, neutrophils and ESR were positively correlated with delayed virus negative conversion, and CD3+CD4+ lymphocytes were negatively correlated. The integrated indicator of leucocytes, neutrophils and CD3+CD4+ lymphocytes showed a good performance in predicting the negative conversion within 2 weeks (area under ROC curve (AUC) = 0.815), 3 weeks (AUC = 0.804), 4 weeks (AUC = 0.812) and 5 weeks (AUC = 0.786). In conclusion, longer quarantine periods might be more justified for COVID-19 patients with cough, higher levels of leucocytes, neutrophils and ESR and lower levels of CD3+CD4+ lymphocytes.
A general multi-type population model is considered, where individuals live and reproduce according to their age and type, but also under the influence of the size and composition of the entire population. We describe the dynamics of the population as a measure-valued process and obtain its asymptotics as the population grows with the environmental carrying capacity. Thus, a deterministic approximation is given, in the form of a law of large numbers, as well as a central limit theorem. This general framework is then adapted to model sexual reproduction, with a special section on serial monogamic mating systems.
Insurance companies make extensive use of Monte Carlo simulations in their capital and solvency models. To overcome the computational problems associated with Monte Carlo simulations, most large life insurance companies use proxy models such as replicating portfolios (RPs). In this paper, we present an example based on a variable annuity guarantee, showing the main challenges faced by practitioners in the construction of RPs: the feature engineering step and subsequent basis function selection problem. We describe how neural networks can be used as a proxy model and how to apply risk-neutral pricing on a neural network to integrate such a model into a market risk framework. The proposed model naturally solves the feature engineering and feature selection problems of RPs.
It is well known that in a small Pólya urn, i.e., an urn in which the second-largest real part of the eigenvalues of the replacement matrix is at most half the largest eigenvalue, the distribution of the numbers of balls of different colours in the urn is asymptotically normal under weak additional conditions. We consider the balanced case, and then give asymptotics of the mean and the covariance matrix, showing that after appropriate normalization, the mean and covariance matrix converge to the mean and covariance matrix of the limiting normal distribution.