Science is all about identifying and understanding organized structures or patterns in nature. In this regard, periodic patterns have proven especially important. Nowhere is this more evident than in the field of astronomy. Periodic phenomena allow us to determine fundamental properties like mass and distance, enable us to probe the interior of stars through the new techniques of stellar seismology, detect new planets, and discover exotic states of matter like neutron stars and black holes. Clearly, any fundamental advance in our ability to detect periodic phenomena will have profound consequences in our ability to unlock nature's secrets. The purpose of this chapter is to describe advances that have come about through the application of Bayesian probability theory, and provide illustrations of its power through several examples in physics and astronomy. We also examine how non-uniform sampling can greatly reduce some signal aliasing problems.
New insights on the periodogram
Arthur Schuster introduced the periodogram in 1905 as a means of detecting a periodicity and estimating its frequency. If the data are evenly spaced, the periodogram can be computed directly from the Discrete Fourier Transform (DFT), which justifies using the DFT for such detection and measurement problems. In 1965, Cooley and Tukey introduced the Fast Fourier Transform (FFT), a very efficient algorithm for the DFT that removes redundancies in the computation and reduces its cost from O(N²) to O(N log N) operations for N data points.
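For the evenly spaced case, here is a minimal sketch, assuming NumPy and the normalization C(f_k) = |Σ_j d_j exp(-2πi f_k t_j)|² / N (normalization conventions vary between texts). The function name schuster_periodogram is hypothetical. For N samples spaced dt apart, the DFT frequencies are f_k = k/(N dt), and numpy.fft.rfft evaluates the sum at all of them at once with an FFT.

```python
import numpy as np


def schuster_periodogram(d, dt):
    """Schuster periodogram C(f_k) = |sum_j d_j exp(-2 pi i f_k t_j)|^2 / N
    for N evenly spaced samples, evaluated at the DFT frequencies f_k = k/(N dt).

    d  : 1-D array of samples taken every dt time units
    dt : sampling interval
    Returns (freqs, power) for the non-negative frequencies only.
    """
    d = np.asarray(d, dtype=float)
    n = d.size
    spectrum = np.fft.rfft(d)           # DFT of real data, computed with an FFT
    power = np.abs(spectrum) ** 2 / n   # Schuster's 1/N normalization
    freqs = np.fft.rfftfreq(n, d=dt)    # f_k = k / (N dt), k = 0 .. N/2
    return freqs, power
```

As a usage example with hypothetical synthetic data, 256 samples at dt = 0.1 of a unit-amplitude sinusoid of frequency 1.25 (cycles per unit time) plus Gaussian noise should give a periodogram peak at the bin f_32 = 1.25, since the frequency spacing is 1/(N dt) = 1/25.6.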
This book is primarily concerned with the philosophy and practice of inferring the laws of nature from experimental data and prior information. The role of inference in the larger framework of the scientific method is illustrated in Figure 1.1.
In this simple model, the scientific method is depicted as a loop which is entered through initial observations of nature, followed by the construction of testable hypotheses or theories as to the working of nature, which give rise to the prediction of other properties to be tested by further experimentation or observation. The new data lead to the refinement of our current theories, and/or development of new theories, and the process continues.
The role of deductive inference in this process, especially in deriving the testable predictions of a theory, has long been recognized. (In this book, the terms deductive inference and deductive reasoning are used interchangeably.) Of course, any theory rests on certain assumptions about nature, which are taken to be true and which form the axioms of the deductive inference process. For example, Einstein's Special Theory of Relativity rests on two important assumptions: that the vacuum speed of light is the same in all inertial reference frames, and that the laws of nature have the same form in all inertial frames.
In the last chapter, we discussed a variety of approaches to estimating the most probable set of parameters for nonlinear models. The primary rationale for these approaches is that they avoid the multi-dimensional integrals required in a full Bayesian computation of the desired marginal posteriors. This chapter introduces a very efficient mathematical tool for estimating the posterior distributions of high-dimensional models, one that has recently received a great deal of attention: Markov Chain Monte Carlo (MCMC). MCMC was first introduced in the early 1950s by statistical physicists (N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller) as a method for simulating simple fluids. Monte Carlo methods are now widely employed throughout science and economics to simulate complex systems and to evaluate integrals in many dimensions. Among all Monte Carlo methods, MCMC offers enormous scope for dealing with very complicated systems. In this chapter we focus on its use in evaluating the multi-dimensional integrals required in a Bayesian analysis of models with many parameters.
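The simplest form of Monte Carlo integration draws points uniformly in a box and multiplies the box volume by the average of the integrand. The sketch below assumes NumPy; the names mc_integrate and gaussian_3d are illustrative. It also hints at why plain sampling scales poorly: for a posterior-like integrand concentrated in a small region, most uniformly drawn points contribute almost nothing.

```python
import numpy as np


def mc_integrate(f, lower, upper, n_samples, rng):
    """Plain Monte Carlo estimate of the integral of f over the box [lower, upper].

    f must accept an (n_samples, dim) array and return n_samples values.
    Returns (estimate, standard_error), with estimate = V * mean(f(x_i)),
    where the x_i are drawn uniformly in the box and V is its volume.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    volume = np.prod(upper - lower)
    x = rng.uniform(lower, upper, size=(n_samples, lower.size))
    fx = f(x)
    estimate = volume * fx.mean()
    std_err = volume * fx.std(ddof=1) / np.sqrt(n_samples)
    return estimate, std_err


def gaussian_3d(x):
    # Unnormalized 3-D Gaussian; its integral over all space is (2 pi)^(3/2)
    return np.exp(-0.5 * np.sum(x**2, axis=1))
```

For example, mc_integrate(gaussian_3d, [-5.0] * 3, [5.0] * 3, 200_000, np.random.default_rng(0)) should return an estimate within a few standard errors of (2π)^(3/2) ≈ 15.75. The integrand is appreciable over only a small fraction of the box volume of 1000, and that wasted effort grows rapidly with dimension.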
The chapter starts with an introduction to Monte Carlo integration and examines how a Markov chain, implemented by the Metropolis–Hastings algorithm, can be employed to concentrate samples in regions of significant probability. Next, we investigate tempering improvements that prevent the MCMC from getting stuck near a local peak of the probability distribution.
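A minimal random-walk Metropolis sketch, assuming NumPy. With a symmetric Gaussian proposal the Hastings correction cancels, so a proposed move from x to x′ is accepted with probability min(1, p(x′|D,I)/p(x|D,I)). This is the simplest special case of Metropolis–Hastings, not a general implementation, and it includes no tempering. The names metropolis, gaussian_log_post, and step_size are illustrative.

```python
import numpy as np


def metropolis(log_post, x0, n_steps, step_size, rng):
    """Random-walk Metropolis sampler (Metropolis-Hastings with a symmetric
    Gaussian proposal, so the acceptance ratio is just the posterior ratio).

    log_post  : function returning the log of the unnormalized posterior
    x0        : starting parameter vector
    step_size : standard deviation of the Gaussian proposal
    Returns (chain, acceptance_fraction); chain has shape (n_steps, dim).
    """
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    lp = log_post(x)
    chain = np.empty((n_steps, x.size))
    n_accept = 0
    for i in range(n_steps):
        proposal = x + step_size * rng.standard_normal(x.size)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, p(proposal) / p(x)), done in log space
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = proposal, lp_prop
            n_accept += 1
        chain[i] = x  # a rejected move repeats the current state
    return chain, n_accept / n_steps


def gaussian_log_post(x):
    # Hypothetical 2-D target: means (1, -2), standard deviations (1, 0.5)
    return -0.5 * ((x[0] - 1.0) ** 2 + ((x[1] + 2.0) / 0.5) ** 2)
```

Running, say, 50,000 steps from (0, 0) with step_size = 0.7 and discarding the first 5,000 as burn-in should leave samples whose means and standard deviations approximate those of the target. The step size trades acceptance rate against how far each move travels, and usually needs tuning for a real posterior.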
This chapter can be thought of as an extension of the material covered in Chapter 4, which was concerned with how to encode a given state of knowledge into a probability distribution suitable for use in Bayes' theorem. Sometimes, however, the information takes a form that does not by itself determine a unique probability distribution p(Y|I). For example, suppose our prior information expresses the following constraint:
I ≡ “the mean value of cos y = 0.6.”
This information alone does not determine a unique p(Y|I), but we can use I to test whether any proposed probability distribution is acceptable. For this reason, we call this type of constraint information testable information. In contrast, consider the following prior information:
I1 ≡ “the mean value of cos y is probably > 0.6.”
This latter information, although clearly relevant to inference about Y, is too vague to be testable because of the qualifier “probably.”
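Testability can be made concrete: given any candidate distribution, we can compute the mean of cos y and check it against 0.6. The sketch below assumes, for illustration only, that y takes values on a uniform grid over [0, 2π); the text does not specify the range, and the function name, grid size, and tolerance are hypothetical. The uniform distribution fails the test, while two very different two-point distributions both pass it, which is why I alone does not single out a unique p(Y|I).

```python
import numpy as np


def satisfies_constraint(y, p, target=0.6, tol=1e-3):
    """Test a discretized candidate distribution against <cos y> = target.

    y : grid of y values; p : nonnegative weights on the grid (normalized here).
    Returns (passes, mean_cos).
    """
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    mean_cos = np.sum(p * np.cos(y))
    return abs(mean_cos - target) < tol, mean_cos


# Candidate distributions on a grid over [0, 2 pi) (illustrative assumption)
y = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
uniform = np.ones_like(y)
two_point_a = np.zeros_like(y)
two_point_a[0], two_point_a[100] = 0.6, 0.4   # mass at y = 0 and y = pi/2
two_point_b = np.zeros_like(y)
two_point_b[0], two_point_b[200] = 0.8, 0.2   # mass at y = 0 and y = pi
```

Here satisfies_constraint(y, uniform) fails because the uniform distribution has a mean of cos y equal to zero, while both two-point candidates give 0.6 and pass.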
Jaynes (1957) demonstrated how to combine testable information with Claude Shannon's entropy measure of the uncertainty of a probability distribution to arrive at a unique probability distribution. This principle has become known as the maximum entropy principle or simply MaxEnt.
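For a discrete distribution {p_i}, Shannon's measure is S = -Σ_i p_i ln p_i. A minimal sketch, assuming NumPy and natural logarithms (the function name shannon_entropy is illustrative): S is largest for the uniform distribution, where we are most uncertain, and zero when one outcome is certain.

```python
import numpy as np


def shannon_entropy(p):
    """Shannon entropy S = -sum_i p_i ln p_i of a discrete distribution, in nats.

    The weights are normalized first; terms with p_i = 0 contribute nothing,
    consistent with the limit p ln p -> 0 as p -> 0.
    """
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))
```

For four outcomes, the uniform distribution gives ln 4 ≈ 1.386, a peaked distribution such as (0.7, 0.1, 0.1, 0.1) gives a smaller value, and (1, 0, 0, 0) gives zero.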
We will first investigate how to measure the uncertainty of a probability distribution and show how this measure is related to the entropy of the distribution. We will then examine three simple constraint problems and derive their corresponding probability distributions.
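To see the principle at work on the constraint "the mean value of cos y = 0.6," here is a rough numerical sketch. It assumes, since the text does not say, that y ranges over [0, 2π) with a uniform base measure, discretized on a grid. Maximizing the entropy subject to normalization and the mean of cos y equal to 0.6, using Lagrange multipliers, gives p(y) ∝ exp(λ cos y); the code finds λ by bisection. The function name and grid size are hypothetical choices.

```python
import numpy as np


def maxent_cos_constraint(target=0.6, n_grid=2000, lam_hi=50.0):
    """MaxEnt distribution on a uniform grid over [0, 2 pi) subject to
    <cos y> = target, assuming 0 <= target < 1.

    The Lagrange-multiplier solution has the form p_i ∝ exp(lam * cos y_i).
    <cos y> increases monotonically with lam (its derivative is Var[cos y] >= 0),
    so lam can be found by bisection on [0, lam_hi].
    Returns (y, p, lam).
    """
    y = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)

    def mean_cos(lam):
        w = np.exp(lam * (np.cos(y) - 1.0))  # shifted exponent avoids overflow
        p = w / w.sum()
        return np.sum(p * np.cos(y)), p

    lo, hi = 0.0, lam_hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mean_cos(mid)[0] < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return y, mean_cos(lam)[1], lam
```

In the continuous limit the normalized solution is the von Mises form p(y|I) = exp(λ cos y) / (2π I₀(λ)), with λ fixed by I₁(λ)/I₀(λ) = 0.6, where I₀ and I₁ are modified Bessel functions; for this constraint λ comes out at roughly 1.5.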
The first part of this chapter is devoted to a brief description of the methods and terminology employed in Bayesian inference and can be read as a stand-alone introduction to doing Bayesian analysis. Following a review of the basics in Section 3.2, we consider the two main inference problems: parameter estimation and model selection. This includes how to specify credible regions for parameters and how to eliminate nuisance parameters through marginalization. We also learn that Bayesian model comparison has a built-in “Occam's razor,” which automatically penalizes complicated models, assigning them large probabilities only if the complexity of the data justifies the additional complication of the model, and we see how this penalty arises through marginalization and depends on both the number of parameters and their prior ranges.
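The Occam penalty can be seen numerically in a toy problem. The sketch below is hypothetical: it assumes a known line shape s_i and Gaussian noise with known σ, and compares M0 (no signal) with M1 (amplitude A with a uniform prior on [0, a_max]). Because p(D|M1) = (1/a_max) ∫ L(A) dA, once a_max is much wider than the likelihood peak, each tenfold increase in the prior range lowers the Bayes factor roughly tenfold. The function name bayes_factor is illustrative.

```python
import numpy as np


def bayes_factor(d, s, sigma, a_max, n_grid=20001):
    """Bayes factor B_10 = p(D|M1) / p(D|M0) for a toy signal model.

    M0: d_i = e_i (no signal).
    M1: d_i = A s_i + e_i, with a uniform prior p(A|M1) = 1/a_max on [0, a_max].
    The e_i are independent Gaussian errors with known sigma, and the line
    shape s_i is assumed known. p(D|M1) is obtained by marginalizing over A.
    """
    d = np.asarray(d, dtype=float)
    s = np.asarray(s, dtype=float)
    a = np.linspace(0.0, a_max, n_grid)
    # log[L(A) / L(0)]: the Gaussian normalization constants cancel
    resid = d[None, :] - a[:, None] * s[None, :]
    log_ratio = -0.5 * (np.sum(resid**2, axis=1) - np.sum(d**2)) / sigma**2
    ratio = np.exp(log_ratio)
    # Trapezoid rule for the integral of L(A)/L(0) over the prior range
    integral = np.sum(0.5 * (ratio[1:] + ratio[:-1]) * np.diff(a))
    return integral / a_max  # the 1/a_max prior density is the Occam penalty
```

With synthetic data, for example a Gaussian line of width 2 samples and unit amplitude in unit-variance noise, bayes_factor(d, s, 1.0, 10.0) should come out about ten times larger than bayes_factor(d, s, 1.0, 100.0), even though the data are identical.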
We illustrate these features with a detailed analysis of a toy spectral line problem and in the process introduce the Jeffreys prior and learn how different choices of priors affect our conclusions. We also have a look at a general argument for selecting priors for location and scale parameters in the early phases of an investigation when our state of ignorance is very high. The final section illustrates how Bayesian analysis provides valuable new insights on systematic errors and how to deal with them.
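For a scale parameter known only to lie in [t_min, t_max], the normalized Jeffreys prior is p(t|I) = 1/(t ln(t_max/t_min)), which assigns equal probability to each decade. A minimal sketch, assuming NumPy; the function names and the range 0.1 to 100 used below are hypothetical.

```python
import numpy as np


def jeffreys_pdf(t, t_min, t_max):
    """Normalized Jeffreys prior p(t) = 1 / (t ln(t_max / t_min)) on [t_min, t_max],
    zero outside. Assumes 0 < t_min < t_max and t > 0."""
    t = np.asarray(t, dtype=float)
    inside = (t >= t_min) & (t <= t_max)
    return np.where(inside, 1.0 / (t * np.log(t_max / t_min)), 0.0)


def jeffreys_prob(a, b, t_min, t_max):
    """Prior probability that t lies in [a, b], a subinterval of [t_min, t_max]."""
    return np.log(b / a) / np.log(t_max / t_min)


def uniform_prob(a, b, t_min, t_max):
    """Same probability under a uniform prior on [t_min, t_max], for comparison."""
    return (b - a) / (t_max - t_min)
```

Over three decades, 0.1 to 100, the Jeffreys prior gives each decade probability 1/3, whereas a uniform prior puts about 90% of its probability in the top decade [10, 100]. That difference is one way the choice of prior can change conclusions when the plausible range spans several orders of magnitude.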
I recommend reading Sections 3.2 to 3.5 of this chapter twice: once quickly, and again after seeing these ideas applied in the detailed example treated in Sections 3.6 to 3.11.
The goal of science is to unlock nature's secrets. This involves the identification and understanding of nature's observable structures or patterns. Our understanding comes through the development of theoretical models capable of explaining existing observations as well as making testable predictions. The focus of this book is on what happens at the interface between the predictions of scientific models and the data from the latest experiments. The data are always limited in accuracy and incomplete (we always want more), so we cannot use deductive reasoning to prove or disprove a theory. How, then, do we extend our theoretical framework of understanding? Fortunately, a variety of sophisticated mathematical and computational approaches have been developed to help us through this interface; these go under the general heading of statistical inference. Statistical inference provides a means for assessing the plausibility of one or more competing models, and for estimating the model parameters and their uncertainties. Most physicists refer to these topics simply as “data analysis.”
We are currently in the throes of a major paradigm shift in our understanding of statistical inference based on a powerful theory of extended logic. For historical reasons, it is referred to as Bayesian Inference or Bayesian Probability Theory. To get a taste of how significant this development is, consider the following: probabilities are commonly quantified by a real number between 0 and 1.