The simplest models for the formation of large-scale structure (LSS) are reviewed. On the assumption that the dark matter is cold and collisionless, LSS data are able to measure the total amount of matter, together with the baryon fraction and the spectral index of primordial fluctuations. There are degeneracies between these parameters, but these are broken by the addition of extra information such as CMB fluctuation data. The CDM models are confronted with recent data, especially the 2dF Galaxy Redshift Survey (2dFGRS), which was the first to measure more than 100,000 redshifts. The 2dFGRS power spectrum is measured to ≲ 10% accuracy for k > 0.02 h Mpc⁻¹, and is well fitted by a CDM model with Ωm h = 0.20 ± 0.03 and a baryon fraction of 0.15 ± 0.07. In combination with CMB data, a flat universe with Ωm ≃ 0.3 is strongly favored. In order to use LSS data in this way, an understanding of galaxy bias is required. A recent approach to bias, known as the ‘halo model’, allows important insights into this phenomenon and gives a calculation of the extent to which bias can depend on scale.
Structure formation in the CDM model
The origin and formation of large-scale structure in cosmology is a key problem that has generated much work over the years. Out of all the models that have been proposed, this talk concentrates on the simplest: gravitational instability of small initial density fluctuations.
The stock market is an excellent economic forecaster. It has predicted six of the last three recessions.
(Paul Samuelson)
In contrast to previous chapters, we now consider data transformation: how to transform data in order to produce better statistics, either to extract a signal or to enhance it.
Many observations consist of sequential data: intensity as a function of position as a radio telescope is scanned across the sky, or as the signal varies along a row of a CCD detector; single-slit spectra; time series of intensity (or of any other property). What sort of issues might concern us?
baseline detection and/or assessment, so that signal on this baseline can be analysed;
signal detection, identification for example of a spectral line or source in sequential data for which the noise may be comparable in magnitude to the signal;
filtering to improve signal-to-noise ratio;
quantifying the noise;
period-finding; searching the data for periodicities;
trend-finding; can we predict the future behaviour of subsequent data?
correlation of time series to find correlated signal between antenna pairs or to find spectral lines;
modelling; many astronomical systems give us our data convolved with some more-or-less known instrumental function, and we need to take this into account to get back to the true data.
The distinctive aspect of these types of analysis is that the feature of interest only emerges after a transformation.
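Two of the tasks listed above, baseline removal and period-finding, can be illustrated together. What follows is a minimal sketch, not the book's own method: it assumes evenly sampled data, subtracts the mean as the crudest possible baseline, and locates the strongest periodicity in a plain discrete-Fourier-transform periodogram. The function names and the synthetic series are hypothetical.

```python
import cmath
import math


def periodogram(x):
    """Power |DFT|^2 / N at frequencies k = 0 .. N//2 of an evenly sampled series."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]  # crude baseline removal: subtract a constant
    power = []
    for k in range(n // 2 + 1):
        s = sum(centred[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2 / n)
    return power


def dominant_period(x):
    """Period, in samples, of the strongest non-zero frequency in the periodogram."""
    power = periodogram(x)
    k_peak = max(range(1, len(power)), key=lambda k: power[k])
    return len(x) / k_peak


# Hypothetical test series: a period-16 sinusoid plus a weaker period-16/3 one,
# sitting on a constant baseline of 10 units.
n = 64
series = [
    10.0
    + math.sin(2 * math.pi * t / 16)
    + 0.3 * math.sin(2 * math.pi * 12 * t / n)
    for t in range(n)
]
print(dominant_period(series))  # 16.0
```

The direct O(N²) sum is used only for transparency; for long series one would use an FFT (for example `numpy.fft.rfft`), and for unevenly sampled data a Lomb–Scargle periodogram is the usual choice.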
Watson, you are coming along wonderfully. You have really done very well indeed. It is true that you have missed everything of importance, but you have hit upon the method…
(Sherlock Holmes in ‘A Case of Identity’, Sir Arthur Conan Doyle)
‘Detection’ is one of the commonest words in the practising astronomer's vocabulary. It is the preliminary to much else that happens in astronomy, whether it means locating a spectral line, a faint star or a gamma-ray burst. Of its wide range of meanings, here we take it to mean the location, and confident measurement, of some feature in a fixed region of an image or spectrum.
When a detection is obvious to even the most sceptical referee, statistical questions usually do not arise in the first instance. The parameters that result from such a detection have a signal-to-noise ratio so high that the detection finds its way into the literature as fact. However, elusive objects or features at the limit of detectability tend to become the focus of interest in any branch of astronomy. Then, the notion of detection (and non-detection) requires careful examination and definition.
Non-detections are especially important because they define how representative any catalogue of objects may be. This set of non-detections can represent vital information in deducing the properties of a population of objects; if something is never detected, that too is a fact, and can be exploited statistically. Every observation potentially contains information.
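What counts as a detection at the limit can be made quantitative in the simplest case. A minimal sketch, assuming the noise is Gaussian with known σ and that the trials (pixels, channels, resolution elements) are independent, neither of which holds automatically in real data: the expected number of pure-noise ‘detections’ above a kσ threshold is the number of trials times the one-sided tail probability. The function names are illustrative.

```python
from statistics import NormalDist


def false_alarm_probability(k_sigma):
    """One-sided probability that Gaussian noise alone exceeds k_sigma standard deviations."""
    return NormalDist().cdf(-k_sigma)  # upper tail via symmetry, avoids 1 - cdf rounding


def expected_false_detections(k_sigma, n_trials):
    """Expected number of noise peaks above threshold among n_trials independent trials."""
    return n_trials * false_alarm_probability(k_sigma)


# e.g. an image treated as 10**6 independent resolution elements
for k in (3, 4, 5):
    print(k, expected_false_detections(k, 10**6))
```

With 10⁶ trials this gives roughly 1350 spurious 3σ ‘detections’, about 32 at 4σ and about 0.3 at 5σ, which illustrates why blind searches of large images or spectra tend to use thresholds well above 3σ.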
If your experiment needs statistics, you ought to have done a better experiment.
(Ernest Rutherford)
Science is about decision. Building instruments, collecting data, reducing data, compiling catalogues, classifying, doing theory – all of these are tools, techniques or aspects which are necessary. But we are not doing science unless we are deciding something; only decision counts. Is this hypothesis or theory correct? If not, why not? Are these data self-consistent or consistent with other data? Adequate to answer the question posed? What further experiments do they suggest?
We decide by comparing. We compare by describing properties of an object or sample, because lists of numbers or images do not present us with immediate results enabling us to decide anything. Is the faint smudge on an image a star or a galaxy? We characterize its shape, crudely perhaps, by a property, say the full width at half maximum (FWHM), which we compare with the FWHM of the point-spread function. We have represented a dataset, the image of the object, by a statistic, and in so doing we reach a decision.
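The star-or-galaxy comparison can be sketched for a one-dimensional profile. This is a hypothetical illustration, assuming a background-subtracted, single-peaked profile sampled on a grid: the FWHM is found by linear interpolation at the two half-maximum crossings, and Gaussian profiles stand in for both the point-spread function and the object (for a Gaussian, FWHM = 2√(2 ln 2) σ ≈ 2.355σ). Any ratio threshold for calling an object ‘resolved’ would have to come from the noise in real data; none is assumed here.

```python
import math


def fwhm(xs, ys):
    """Full width at half maximum of a single-peaked sampled profile, by linear interpolation."""
    peak = max(range(len(ys)), key=lambda i: ys[i])
    half = ys[peak] / 2.0  # assumes the background has already been subtracted

    # walk left from the peak to the last sample still at or above half maximum
    i = peak
    while i > 0 and ys[i - 1] >= half:
        i -= 1
    if i == 0:
        raise ValueError("profile does not fall to half maximum on the left")
    x_left = xs[i - 1] + (half - ys[i - 1]) * (xs[i] - xs[i - 1]) / (ys[i] - ys[i - 1])

    # walk right in the same way
    j = peak
    while j < len(ys) - 1 and ys[j + 1] >= half:
        j += 1
    if j == len(ys) - 1:
        raise ValueError("profile does not fall to half maximum on the right")
    x_right = xs[j] + (half - ys[j]) * (xs[j + 1] - xs[j]) / (ys[j + 1] - ys[j])

    return x_right - x_left


def gaussian_profile(sigma, step=0.01, half_range=20.0):
    """Hypothetical noiseless Gaussian profile of unit peak, sampled every `step`."""
    n = int(round(2 * half_range / step))
    xs = [-half_range + i * step for i in range(n + 1)]
    return xs, [math.exp(-0.5 * (x / sigma) ** 2) for x in xs]


psf_fwhm = fwhm(*gaussian_profile(2.0))     # stand-in for the point-spread function
object_fwhm = fwhm(*gaussian_profile(3.0))  # a broader smudge
print(object_fwhm / psf_fwhm)  # about 1.5: broader than the PSF, i.e. resolved
```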
Statistics are there for decision, and they serve it because we know a background against which the decision is taken. To this end, every measurement we make, and every parameter or value we derive, requires an error estimate: a measure of range (expressed in terms of probability) that encompasses our belief about the true value of the parameter. We are taught this by our masters in the course of interminable undergraduate lab experiments.
Whether He does or not, the concepts of probability are important in astronomy for two reasons.
Astronomical measurements are subject to random measurement error, perhaps more so than in most physical sciences because of our inability to rerun experiments and our perpetual wish to observe at the extreme limit of instrumental capability. We have to express these errors as precisely and usefully as we can. Thus when we say ‘an interval of 10⁻⁶ units, centred on the measured mass of the Moon, has a 95 per cent chance of containing the true value’, it is a much more quantitative statement than ‘the mass of the Moon is 1 ± 10⁻⁶ units’. The second statement means something only because of some unspoken assumption about the distribution of errors. Knowing the error distribution allows us to assign a probability, or measure of confidence, to the answer.
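The point about unspoken assumptions can be made concrete if, purely for illustration, we assume the errors are Gaussian and that ±10⁻⁶ is a one-standard-deviation error. Under that assumption the bare ‘1 ± 10⁻⁶’ statement covers the true value only about 68 per cent of the time, and a 95 per cent interval needs a half-width of about 1.96σ. The helper names are hypothetical; a different error distribution would give different numbers, which is exactly the point.

```python
from statistics import NormalDist


def gaussian_interval(measured, sigma, confidence=0.95):
    """Central interval containing the true value with probability `confidence`,
    assuming Gaussian measurement error with standard deviation sigma."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95 per cent
    return measured - z * sigma, measured + z * sigma


def coverage(half_width, sigma):
    """Probability that a Gaussian error of standard deviation sigma lies within +/- half_width."""
    error = NormalDist(0.0, sigma)
    return error.cdf(half_width) - error.cdf(-half_width)


# Read as a Gaussian 1-sigma error, '1 +/- 1e-6' is only a ~68 per cent statement ...
print(coverage(1e-6, 1e-6))
# ... whereas a 95 per cent interval with the same sigma is about twice as wide.
print(gaussian_interval(1.0, 1e-6))
```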
The inability to do experiments on our subject matter leads us to draw conclusions by contrasting properties of controlled samples. These samples are often small and subject to uncertainty in the same way that a Gallup poll is subject to ‘sampling error’. In astronomy we draw conclusions such as ‘the distributions of luminosity in X-ray-selected Type I and Type II objects differ at the 95 per cent level of significance’. Very often the strength of this conclusion is dominated by the number of objects in the sample and is virtually unaffected by observational error.
In embarking on statistics we are entering a vast area, enormously developed for the Gaussian distribution in particular. This is classical territory; historically, statistics were developed because the approach now called Bayesian had fallen out of favour. Hence direct probabilistic inferences were superseded by the indirect and conceptually different route, going through statistics and intimately linked to hypothesis testing. The use of statistics is not particularly easy. The alternatives to Bayesian methods are subtle and not very obvious; they are also associated with some fairly formidable mathematical machinery. We will avoid this, presenting only results and showing the use of statistics, while trying to make clear the conceptual foundations.
Statistics
Statistics are designed to summarize, reduce or describe data. The formal definition of a statistic is that it is some function of the data alone. For a set of data X₁, X₂, …, some examples of statistics might be the average, the maximum value or the average of the cosines. Statistics are therefore combinations of finite amounts of data. In the following discussion, and indeed throughout, we try to distinguish particular fixed values of the data, and functions of the data alone, by upper case (except for Greek letters). Possible values, being variables, we will denote in the usual algebraic spirit by lower case.
The summarizing aspect of statistics is exemplified by those describing (1) location and (2) spread or scatter.
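The examples of statistics given above, together with the two summarizing roles of location and spread, can be written down directly. A minimal sketch using Python's standard `statistics` module; the median and the median absolute deviation are our additions, included because they show how differently measures of location and spread can respond to a single wild point. The data values are made up.

```python
import math
import statistics


def summarize(data):
    """Some statistics of a dataset X1, X2, ...: each is a function of the data alone."""
    data = list(data)
    med = statistics.median(data)
    return {
        # the examples in the text
        "mean": statistics.fmean(data),
        "maximum": max(data),
        "mean_cosine": statistics.fmean(math.cos(x) for x in data),
        # location and spread
        "median": med,                                         # robust location
        "std": statistics.stdev(data),                         # spread, n - 1 denominator
        "mad": statistics.median(abs(x - med) for x in data),  # median absolute deviation
    }


# Hypothetical data with one wild point: the mean and standard deviation are dragged
# by the outlier, the median and median absolute deviation are not.
print(summarize([1.0, 2.0, 3.0, 4.0, 100.0]))
```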
(interchange between Peter Scheuer and his then student, CRJ)
It is often the case that we need to do sample comparison: we have someone else's data to compare with ours; or someone else's model to compare with our data; or even our data to compare with our model. We need to make the comparison and to decide something. We are doing hypothesis testing: are our data consistent with a model, or with somebody else's data? In searching for correlations, as we were in Chapter 4, we were hypothesis testing; in the model fitting of Chapter 6 we will be involved in data modelling and parameter estimation.
Classical methods of hypothesis testing may be either parametric or non-parametric (sometimes called distribution-free). Bayesian methods necessarily involve a known distribution. We have described the concepts of Bayesian versus frequentist and parametric versus non-parametric in the introductory Chapters 1 and 2. Table 5.1 summarizes these apparent dichotomies and indicates appropriate usage.
That non-parametric Bayesian tests do not exist appears self-evident, as the key Bayesian feature is the probability of a particular model in the face of the data. However, it is not quite this clear-cut, and there has been consideration of non-parametric methods in a Bayesian context (Gull & Fielden 1986). If we understand the data so that we can model its collection process, then the Bayesian route beckons (see Chapter 2 and its examples).
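As an example of the classical non-parametric route, consider comparing two samples, such as the luminosities of two classes of object, without assuming any parent distribution. A minimal sketch of the two-sample Kolmogorov–Smirnov statistic D, the largest vertical gap between the two empirical cumulative distributions; the 5 per cent critical value uses the standard large-sample approximation 1.36√((n + m)/(nm)), which should not be trusted for very small samples. Function names and data are illustrative.

```python
from bisect import bisect_right


def ks_two_sample_d(a, b):
    """Largest gap between the empirical cumulative distribution functions of a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the maximum gap occurs at one of the observed values
        fa = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_critical_d_5pc(n, m):
    """Large-sample approximation to the 5 per cent critical value of D."""
    return 1.36 * ((n + m) / (n * m)) ** 0.5


# Hypothetical samples of 40 and 50 values, offset from one another
sample_1 = [0.1 * i for i in range(40)]
sample_2 = [0.1 * i + 1.5 for i in range(50)]
d = ks_two_sample_d(sample_1, sample_2)
print(d, ks_critical_d_5pc(40, 50), d > ks_critical_d_5pc(40, 50))
```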
An examination of the distribution of the numbers of galaxies recorded on photographic plates shows that it does not conform to the Poisson law and indicates the presence of a factor causing ‘contagion’.
(Neyman, Scott & Shane 1953)
The distribution of objects on the celestial sphere, or on an imaged patch of this sphere, has always been a major preoccupation of astronomers. Avoiding here the science of image processing, the province of thousands of books and papers, we consider some of the common statistical approaches used to quantify sky distributions in order to permit contact with theory. Before we turn to the adopted statistical weaponry for galaxy distributions, we discuss some general statistics applicable to the spherical surface.
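The Neyman, Scott & Shane test for ‘contagion’ quoted above has a simple modern counterpart. As a minimal sketch, assuming the sky has been divided into equal-area cells and objects counted in each: for a Poisson (unclustered) distribution the variance of the counts equals their mean, so a variance-to-mean ratio well above 1 points to clustering. The names and simulations are hypothetical, and a real analysis would also have to allow for survey boundaries, masks and varying sensitivity.

```python
import random
import statistics


def index_of_dispersion(counts):
    """Sample variance of counts-in-cells divided by their mean (about 1 for Poisson)."""
    return statistics.variance(counts) / statistics.fmean(counts)


def scatter_uniformly(n_points, n_cells, rng):
    """Counts-in-cells for points placed independently and uniformly at random."""
    counts = [0] * n_cells
    for _ in range(n_points):
        counts[rng.randrange(n_cells)] += 1
    return counts


def scatter_in_clumps(n_clumps, per_clump, n_cells, rng):
    """Counts-in-cells when points arrive in clumps that each land in a single cell."""
    counts = [0] * n_cells
    for _ in range(n_clumps):
        counts[rng.randrange(n_cells)] += per_clump
    return counts


rng = random.Random(1)
print(index_of_dispersion(scatter_uniformly(4000, 400, rng)))     # close to 1
print(index_of_dispersion(scatter_in_clumps(400, 10, 400, rng)))  # roughly 10
```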
Statistics on a spherical surface
Abstractly, the distribution of objects on the celestial sphere is simply the distribution of directions of a set of unit vectors. In this respect, other three-dimensional spaces may be of interest, like the Poincaré sphere with unit vectors indicating the state of polarization of radiation.
This is a thriving subfield of statistics, and there is an excellent handbook (Fisher, Lewis & Embleton 1987). Much of the motivation comes from geophysical topics (the orientation of palaeomagnetism, for instance), but many other ‘spaces’ are of interest. The emphasis is on statistical modelling, and a variety of distributions is available. The Fisher distribution, one of the most popular, plays a similar role in spherical statistics to that played by the Gaussian in ordinary statistics.
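The most basic statistic on the sphere is the mean direction, obtained by adding unit vectors rather than by averaging the angles. A minimal sketch, assuming positions are given as (RA, Dec) in degrees: it returns the mean direction, the mean resultant length R/n (1 for identical directions, near 0 for directions spread over the whole sphere), and (n − 1)/(n − R), the commonly quoted approximate estimate of the Fisher concentration parameter κ for reasonably concentrated samples. The function names are hypothetical.

```python
import math


def unit_vector(ra_deg, dec_deg):
    """Cartesian unit vector for a position on the celestial sphere."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def mean_direction(positions):
    """Mean direction (RA, Dec in degrees), mean resultant length R/n, and approximate kappa."""
    sx = sy = sz = 0.0
    for ra, dec in positions:
        x, y, z = unit_vector(ra, dec)
        sx += x
        sy += y
        sz += z
    n = len(positions)
    r = math.sqrt(sx * sx + sy * sy + sz * sz)  # length R of the resultant vector
    if r == 0.0:
        raise ValueError("resultant is zero: mean direction undefined")
    ra_mean = math.degrees(math.atan2(sy, sx)) % 360.0
    dec_mean = math.degrees(math.atan2(sz, math.hypot(sx, sy)))
    kappa = (n - 1) / (n - r) if n > r else math.inf  # approximate Fisher concentration
    return ra_mean, dec_mean, r / n, kappa


# Two points either side of RA = 0: the vector mean is RA = 0, not the naive average 180
print(mean_direction([(350.0, 0.0), (10.0, 0.0)]))
```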
Peter Scheuer started this. In 1977 he walked into JVW's office in the Cavendish Lab and quietly asked for advice on what further material should be taught to the new intake of Radio Astronomy graduate students (that year including the hapless CRJ). JVW, wrestling with simple chi-square testing at the time, blurted out ‘They know nothing about practical statistics’. Peter left thoughtfully. A day later he returned. ‘Good news! The Management Board has decided that the students are going to have a course on practical statistics.’ ‘Can I sit in?’ JVW asked innocently. ‘Better news! The Management Board has decided that you're going to teach it.’
So, for us, began the notion of practical statistics. A subject that began with gambling is not an arcane academic pursuit, but it is certainly subtle as well. It is fitting that Peter Scheuer was involved at the beginning of this (lengthy) project; his style of science exemplified both subtlety and pragmatism. We hope that we can convey something of both. If an echo of Peter's booming laugh is sometimes heard in these pages, it is because we both learned from him that a useful answer is often much easier – and certainly much more entertaining – than you at first think.
After the initial course, the material for this book grew out of various further courses, journal articles, and the abundant personal experience that results from understanding just a little of any field of knowledge that counts Gauss and Laplace amongst its originators.
There is a vast literature. Here we point to a few works which we have found useful, binning these into five types: popular, the basic text, the rigorous text, the data analysis manual, and the books of specialist interest to astronomers.
Classic popular books have legendary titles: How to Lie with Statistics (Huff 1973), Facts from Figures (Moroney 1965), Statistics in Action (Sprent 1977) and Statistics without Tears (Rowntree 1981). They are all fun. A modern version with a twist in the title is Seeing through Statistics (Utts 1996), which entertains, serves as a statistics primer, and is almost a member of the next group.
Basic texts come in types (a) and (b), both of which cover similar material for the first two-thirds of each book. They start with descriptive or summarizing statistics (mean, standard deviation) and the distributions of these statistics, then move to the concept of probability and hence to statistical inference and hypothesis testing, including correlation of two variables. They then diverge, choosing from a menu including analysis of variance (ANOVA), regression analysis, non-parametric statistics, etc. Modern versions come in bright colours and flavours, perhaps to help present to undergraduates a subject with which excitement is not always associated. The value of many such books is exceptional because of the sales they generate. They are complete with tables, ready summaries of tests and formulae inside covers or in coloured insets, and frequently arrive with CDs and floppy disks including test datasets. […]
Arguing that the trial judge had failed to explain clearly the use of Bayes' theorem, the defence lodged an appeal. But in a bizarre irony, the Appeal Court last month upheld the appeal and ordered a retrial – on the grounds that the original judge had spent too much time explaining the scientific assessment of evidence. In their ruling, the Appeal judges said: ‘To introduce Bayes' theorem, or any similar method, into a criminal trial plunges the jury into inappropriate and unnecessary realms of theory and complexity’.
(Robert Matthews, New Scientist 1996)
When we make a set of measurements, it is instinctive to try to correlate the observations with other results. One or more motives may be involved in this instinct: for instance we might wish (1) to check that other observers' measurements are reasonable, (2) to check that our measurements are reasonable, (3) to test a hypothesis, perhaps one for which the observations were explicitly made, or (4) in the absence of any hypothesis, any knowledge, or anything better to do with the data, to find if they are correlated with other results in the hope of discovering some new and universal truth.
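The dangers of motive (4) are easy to demonstrate numerically. The sketch below, a hypothetical illustration rather than anything from the text, computes the Pearson correlation coefficient and then ‘fishes’ through many pairs of small, completely unrelated random samples; the largest |r| found by such a search can look impressive even though every pair is unrelated by construction. The seed, sample size and number of trials are arbitrary choices.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of paired samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def largest_chance_correlation(n_points, n_trials, rng):
    """Largest |r| among n_trials pairs of independent Gaussian samples of size n_points."""
    best = 0.0
    for _ in range(n_trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        best = max(best, abs(pearson_r(x, y)))
    return best


print(largest_chance_correlation(n_points=10, n_trials=50, rng=random.Random(3)))
```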
4.1 The fishing trip
Take the last point first. Suppose that we have plotted something against something, on a fishing expedition of this type. There are grave dangers on this expedition, and we must ask ourselves the following questions.
By
C. Leitherer, Space Telescope Science Institute, 3700 San Martin Drive, Baltimore, MD 21218
Edited by
Mario Livio, Space Telescope Science Institute, Baltimore; Keith Noll, Space Telescope Science Institute, Baltimore; Massimo Stiavelli, Space Telescope Science Institute, Baltimore
The contributions of the Hubble Space Telescope to our understanding of starburst galaxies are reviewed. Over the past decade, HST's imagers and spectrographs have returned high-quality data from the far-ultraviolet to the near-infrared at unprecedented spatial resolution. A representative set of HST key observations is used to address several relevant issues: Where are starbursts found? What is their stellar content? How do they evolve with time? How do the stars and the interstellar medium interact? The review concludes with a list of science highlights and a forecast for the second decade.
Overview
Almost exactly 10 years ago STScI hosted its annual symposium, entitled Massive Stars in Starbursts (Leitherer et al. 1991). Those were the weeks immediately prior to HST's launch, and the conference organizers felt it appropriate to have a meeting on the subject of starbursts because HST had the potential for significant contributions. Starbursts are compact (10⁰–10³ pc), young (∼10⁶–10⁸ yr) sites of star formation, often with high dust obscuration. These properties make starbursts ideal targets for HST, given its superior spatial resolution, ultraviolet (UV) sensitivity, and (later on) infrared (IR) capabilities.
As we all know, the high hopes were not immediately fulfilled, and it was not until after the First Servicing Mission that HST lived up to the expectations.
By
F. D. MacChetto, Space Telescope Science Institute, 3700 San Martin Drive, Baltimore, MD 21218
One of the important topics of current astrophysical research is the role that supermassive black holes play in shaping the morphology of their host galaxies. There is increasing evidence for the presence of massive black holes at the centers of all galaxies and many efforts are directed at understanding the processes that lead to their formation, the duty cycle for the active phase and the question of the fueling mechanism. Related issues are the epoch of formation of the supermassive black holes, their time evolution and growth and the role they play in the early ionization of the Universe. Considerable observational and theoretical work has been carried out in this field over the last few years and I will review some of the recent key areas of progress.
Introduction
It is now widely accepted that quasars (QSOs) and Active Galactic Nuclei (AGN) are powered by accretion onto massive black holes. This has led to extensive theoretical and observational studies to elucidate the properties of the black holes, the characteristics of the accretion mechanisms and the mechanisms responsible for the production and transportation of the energy from the central regions to the extended radio lobes.
However, over the last few years there has been an increasing realization that Massive Dark Objects (MDOs) may actually reside at the centers of all galaxies (Ho 1998, Magorrian et al. 1998, Richstone et al. 1998, Gebhardt et al. 2000a, Gebhardt et al. 2000b, Merritt & Ferrarese 2001, van der Marel 1999).