The use of statistical data analysis in physics means different things to different people. The reason for this is that most problems are different, and so someone concentrating on areas where the experimental data collected are relatively straightforward to analyse will naturally tend to use techniques that are simpler than those required for a more complicated problem or for a sparse data sample. Ultimately we all need to use statistical methods in order to translate data into some measure of a physical observable. This book will discuss a number of different concepts and techniques in order of increasing complexity. Before embarking on a discussion of statistics, the remainder of this chapter introduces three common experimental problems encountered by students studying physics: (i) using a pendulum to measure the acceleration due to gravity (Section 1.1), (ii) testing the validity of Ohm's law for a conductor (Section 1.2), and (iii) measuring the half-life of a radioactive isotope (Section 1.3). These examples rely on material covered in Chapters 4 through 7 and Chapter 9. Readers who already appreciate the context of the material in the remainder of this book may wish to skip forward to the next chapter.
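As a flavour of what follows, the pendulum measurement of Section 1.1 reduces to a simple formula: for small oscillations a pendulum of length L has period T = 2π√(L/g), so that g = 4π²L/T². A minimal sketch (in Python, with invented timing data purely for illustration) of how repeated period measurements might be turned into an estimate of g with a statistical uncertainty is:

import math
import statistics

# Hypothetical repeated measurements of the period (in seconds) of a pendulum
# of length L = 0.995 m; the numbers are illustrative only.
L = 0.995
periods = [2.003, 1.998, 2.006, 2.001, 1.995, 2.004]

T_mean = statistics.mean(periods)
T_err = statistics.stdev(periods) / math.sqrt(len(periods))  # standard error on the mean

# g = 4 pi^2 L / T^2; only the timing uncertainty is propagated, for simplicity.
g = 4 * math.pi**2 * L / T_mean**2
g_err = 2 * g * T_err / T_mean  # |dg/dT| * sigma_T = 2 g sigma_T / T

print(f"g = {g:.3f} +/- {g_err:.3f} m s^-2")

The techniques needed to justify steps such as the error propagation used here are developed in Chapters 4 through 7.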
Consider a data sample Ω, described by the set of variables x, that is composed of two (or more) populations. Often we are faced with the task of trying to identify or separate one sub-sample from the other (as these are different classes or types of events). In practice it is often not possible to completely separate a sample of one class A from another class B, as was seen in the case of likelihood fits to data. There are a number of techniques that can be used to try to identify or separate, as optimally as possible, a sub-sample of data from the whole, and some of these are described below. Each of the techniques described has its own benefits and disadvantages, and the final choice of the ‘optimal’ solution for how to separate A and B can require subjective input from the analyst. In general this type of situation requires the use of multivariate analysis (MVA).
The simplest approach is that of cutting on the data to improve the purity of a class of events, as described in Section 10.1. More advanced classifiers such as Bayesian classifiers, Fisher discriminants, neural networks, and decision trees are subsequently discussed. The Fisher discriminant described in Section 10.3 has the advantage that the coefficients required to optimally separate two populations of events are determined analytically up to an arbitrary scale factor. The neural network (Section 10.4) and decision tree (Section 10.5) algorithms described here require a numerical optimisation to be performed.
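To make the analytic nature of the Fisher discriminant concrete, the following sketch (a minimal illustration, not the book's code; the toy data and variable names are assumptions) computes the coefficients, up to the arbitrary scale factor mentioned above, as a ∝ W⁻¹(μ_A − μ_B), where W is the sum of the within-class covariance matrices of the two populations:

import numpy as np

rng = np.random.default_rng(1)

# Toy two-variable samples for classes A and B (invented for illustration).
A = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.3], [0.3, 1.0]], size=1000)
B = rng.multivariate_normal(mean=[1.5, 1.0], cov=[[1.0, -0.2], [-0.2, 1.0]], size=1000)

# Fisher coefficients: a ~ W^-1 (mu_A - mu_B), defined only up to an arbitrary scale.
mu_A, mu_B = A.mean(axis=0), B.mean(axis=0)
W = np.cov(A, rowvar=False) + np.cov(B, rowvar=False)
a = np.linalg.solve(W, mu_A - mu_B)

# The discriminant t = a . x projects each event onto a single separating variable,
# on which a cut can then be placed to select a sub-sample enriched in class A.
t_A, t_B = A @ a, B @ a
print("coefficients:", a)
print("mean discriminant for A and B:", t_A.mean(), t_B.mean())

No numerical optimisation is required here, in contrast to the neural network and decision tree algorithms of Sections 10.4 and 10.5.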
This chapter develops the notion introduced in Chapter 6 on how one defines a statistical error and extends this to look at one- and two-sided intervals (see Sections 7.1 and 7.2). One-sided intervals fall into the categories of upper or lower limits, which we can place on a hypothesised effect or process that has not been observed (see Section 7.2). Each of these concepts can be interpreted in terms of a frequentist or Bayesian approach. Section 9.7.3 discusses the concept of Bayesian upper limits in the context of a fit to data, and Appendix E contains tables of integrals of several common PDFs that can be used to determine confidence intervals and limits.
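As an illustration of a one-sided limit of the kind referred to above, consider a counting experiment with a Poisson-distributed number of events and negligible background. A minimal sketch (an illustrative toy, not the book's treatment) of the classical frequentist upper limit, obtained by solving P(k ≤ n_obs; λ_up) = 1 − CL for λ_up, is:

import math

def poisson_cdf(n, lam):
    # P(k <= n) for a Poisson distribution with mean lam.
    return sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(n + 1))

def poisson_upper_limit(n_obs, cl=0.90):
    # Solve poisson_cdf(n_obs, lam) = 1 - cl for lam by bisection.
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(poisson_upper_limit(0))  # ~2.30 events at 90% CL for zero observed events
print(poisson_upper_limit(3))  # ~6.68 events at 90% CL for three observed events

The corresponding Bayesian construction, and the role of the prior, are discussed in Section 9.7.3.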
Two-sided intervals
The relevance of two-sided confidence intervals is discussed in the context of useful distributions used to represent PDFs. In particular, the following sections highlight the use of the Gaussian, Poisson, and binomial distributions as use cases for such intervals.
In general, for a distribution f(x) of some variable x, we can define a region of interest in x, called a two-sided interval, such that x1 < x < x2.
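For example, a central two-sided interval containing a fraction α of a unit Gaussian can be found by requiring equal probability (1 − α)/2 in each tail. A minimal sketch of this (using bisection on the cumulative distribution; the 68.3% and 95.4% choices are purely illustrative) is:

import math

def gaussian_cdf(x):
    # Cumulative distribution of a unit Gaussian (mean 0, width 1).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def central_interval(alpha):
    # Find x1 < x < x2 containing a fraction alpha of the unit Gaussian,
    # with (1 - alpha)/2 of the probability in each tail (bisection search).
    target = 0.5 + alpha / 2.0
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gaussian_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    x2 = 0.5 * (lo + hi)
    return -x2, x2

print(central_interval(0.683))  # approximately (-1, 1): the familiar +/- 1 sigma interval
print(central_interval(0.954))  # approximately (-2, 2)

For asymmetric distributions such as the Poisson and binomial cases discussed below, the interval is in general not symmetric about the mean.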
Before embarking upon a detailed discussion of statistics, it is useful to introduce some notation to help describe data. This section introduces elementary set theory notation and Venn diagrams.
A set is a collection of objects or elements. This collection can also be referred to as data, and the individual elements in the data can themselves be referred to as data (in the singular sense), or as an event or element. We usually denote a set with a capital letter, for example Ω. An element of a set is denoted by a lower case letter, for example either ω or ωi, where the latter explicitly references the ith element of the set. The elements of a set are written within curly braces ‘{’ and ‘}’. For example we can write a set ΩBinary that contains the elements 1 and 0 as
ΩBinary = {1, 0}.
This is called the binary set, as it contains the elements required to represent a binary system. The order of elements in a set is irrelevant, so we could write ΩBinary in an equivalent form as
ΩBinary = {0, 1}.
If we want to express the information that a given element is or is not a part of a set, we use the symbols ∈ and ∉, respectively. For example we may write
0 ∈ ΩBinary, 1 ∈ ΩBinary, 2 ∉ ΩBinary
to express that both 0 and 1 are elements of ΩBinary, but 2 is not an element of this set.
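As an aside, these ideas map directly onto the built-in set type of many programming languages; the short Python illustration below (the language choice is ours, not part of the text) shows membership tests and the union, intersection, and difference operations that appear in the exercises:

# A small illustration of set notation using Python's built-in set type.
omega_binary = {1, 0}          # order is irrelevant: {1, 0} == {0, 1}
print(omega_binary == {0, 1})  # True

print(0 in omega_binary)       # True  (0 is an element of the set)
print(2 not in omega_binary)   # True  (2 is not an element of the set)

A = {1, 2, 3, 4}
B = {1, 3, 5, 9}
print(A | B)   # union A ∪ B        -> {1, 2, 3, 4, 5, 9}
print(A & B)   # intersection A ∩ B -> {1, 3}
print(A - B)   # difference A \ B   -> {2, 4}
print(B - A)   # difference B \ A   -> {5, 9}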
2.1 (i) A ∪ B = {1, 2, 3, 4, 5, 9}; (ii) A ∩ B = {1, 3}; (iii) A \ B = {2, 4}; (iv) B \ A = {5, 9}; (v) Ā = {5, 9} and B̄ = {2, 4}
2.2 (i) A ∪ B = {0, 1, 2, 3, 5, 6, 9}; (ii) A ∩ B = {5}; (iii) A \ B = {0, 2, 6}; (iv) B \ A = {1, 3, 9}
2.3 (i) For example see the top part of Figure 2.3; (ii) see Figure 2.2, where A and B are interchanged.
2.4 A ∪ B = R
2.5 A ∪ B = R
2.6 A ∩ B = {1, 2, 3, 4, 5}
2.7 A ∩ B = ∅
2.8 A ∪ B ∩ C = Z+
2.9 A ∪ B ∩ C = {0, Z-}
2.10 ΩDecimal = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
2.11 {red, green, blue}
2.12 C
2.13 Z
2.14 {x|5 < x < 10}
2.15 {x | x ∈ R}
3. Probability
3.1 4/52 = 0.0769 to 3 s.f.
3.2 0.45%
3.3 20/52 = 0.385 to 3 s.f.
3.4 The probability of getting an ace followed by a 10 point card, where the other player also gets a 10 point card, is 0.72%. The probability of getting an ace followed by a 10 point card, where the other player gets a card with a different value, is 1.69%. The total probability of being dealt an ace followed by a 10 point card is the sum: 2.41%.
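Assuming the dealing order implied by the answer (your ace first, then the other player's card, then your second card, all from a single 52-card deck containing sixteen 10 point cards), the arithmetic in 3.4 can be checked directly; this is just a verification sketch, not part of the solutions:

from fractions import Fraction

# Single 52-card deck: 4 aces and 16 ten-point cards (10, J, Q, K).
# Assumed dealing order: your ace, the other player's card, your second card.
p_other_ten = Fraction(4, 52) * Fraction(16, 51) * Fraction(15, 50)
p_other_not_ten = Fraction(4, 52) * Fraction(35, 51) * Fraction(16, 50)

print(f"{float(p_other_ten):.2%}")                    # 0.72%
print(f"{float(p_other_not_ten):.2%}")                # 1.69%
print(f"{float(p_other_ten + p_other_not_ten):.2%}")  # 2.41%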
The foundations of science are built upon centuries of careful observation. These constitute measurements that are interpreted in terms of hypotheses, models, and ultimately well-tested theories that may stand the test of time for only a few years or for centuries. In order to understand what a single measurement means we need to appreciate a diverse range of statistical methods. Without such an appreciation it would be impossible for the scientific method to turn observations of nature into theories that describe the behaviour of the Universe from sub-atomic to cosmic scales. In other words, science would be impracticable without statistical data analysis. The data analysis principles underpinning the scientific method pervade our everyday lives, from the statistics we are subjected to in advertising to the smooth operation of the spam filters that we take for granted as we read our e-mail. These methods also impact upon the wider economy, as some areas of the financial industry use data mining and other statistical techniques to predict trading performance or to perform risk analysis for insurance purposes.
This book evolved from a one-semester advanced undergraduate course on statistical data analysis for physics students at Queen Mary, University of London, with the aim of covering the rudimentary techniques required for many disciplines, as well as some of the more advanced topics that can be employed when dealing with limited data samples. It has been written by a physicist with a non-specialist audience in mind. This is not a statistics book for statisticians, and references have been provided for the interested reader who wants a more rigorous treatment of the techniques discussed here.
The probability of something occurring is the quantification of the chance of observing a particular outcome given a single event. The event itself may be the result of a single experiment, or one single data point collected by an un-repeatable experiment. We refer to a single event or an ensemble of events as data, and the way we refer to data implies whether it is being treated as singular or plural. If we quantify the probability of a repeatable experiment, then this understanding can be used to make predictions of the outcomes of future experiments. We cannot predict the outcome of a given experiment with certainty; however, we can assign a level of confidence to our predictions that incorporates the uncertainty from our previous knowledge and any information about the limitations of the experiment to be performed.
Consider the following. A scientist builds an experiment with two distinct outputs A and B. Having prepared the experiment, the scientist configures the apparatus to always return the result A and never return the result B. If the experiment is performed over and over again, one will always obtain the result A with certainty. The probability of obtaining this result is 1.0 (100%). The result B will never be observed, and so the probability of obtaining that result is 0.0 (0%).
Based on course notes from over twenty years of teaching engineering and physical sciences at Michigan Technological University, Tomas Co's engineering mathematics textbook is rich with examples, applications and exercises. Professor Co uses analytical approaches to solve smaller problems to provide mathematical insight and understanding, and numerical methods for large and complex problems. The book emphasises applying matrices with strong attention to matrix structure and computational issues such as sparsity and efficiency. Chapters on vector calculus and integral theorems are used to build coordinate-free physical models with special emphasis on orthogonal co-ordinates. Chapters on ODEs and PDEs cover both analytical and numerical approaches. Topics on analytical solutions include similarity transform methods, direct formulas for series solutions, bifurcation analysis, Lagrange–Charpit formulas, shocks/rarefaction and others. Topics on numerical methods include stability analysis, DAEs, high-order finite-difference formulas, Delaunay meshes, and others. MATLAB® implementations of the methods and concepts are fully integrated.
Advances in scientific computing have made modelling and simulation an important part of the decision-making process in engineering, science, and public policy. This book provides a comprehensive and systematic development of the basic concepts, principles, and procedures for the verification and validation of models and simulations. The emphasis is placed on models that are described by partial differential and integral equations and on the simulations that result from their numerical solution. The methods described can be applied to a wide range of technical fields, from the physical sciences, engineering, technology, and industry, through to environmental regulations and safety, product and plant safety, financial investing, and governmental regulations. This book will be genuinely welcomed by researchers, practitioners, and decision makers in a broad range of fields who seek to improve the credibility and reliability of simulation results. It will also be appropriate either for university courses or for independent study.
A unique and comprehensive graduate text and reference on numerical methods for electromagnetic phenomena, from atomistic to continuum scales, in biology, optical-to-micro waves, photonics, nanoelectronics and plasmas. The state-of-the-art numerical methods described include:
- Statistical fluctuation formulae for the dielectric constant
- Particle-Mesh-Ewald, Fast-Multipole-Method and image-based reaction field method for long-range interactions
- High-order singular/hypersingular (Nyström collocation/Galerkin) boundary and volume integral methods in layered media for Poisson–Boltzmann electrostatics, electromagnetic wave scattering and electron density waves in quantum dots
- Absorbing and UPML boundary conditions
- High-order hierarchical Nédélec edge elements
- High-order discontinuous Galerkin (DG) and Yee finite-difference time-domain methods
- Finite element and plane wave frequency-domain methods for periodic structures
- Generalized DG beam propagation method for optical waveguides
- NEGF (non-equilibrium Green's function) and Wigner kinetic methods for quantum transport
- High-order WENO, Godunov and central schemes for hydrodynamic transport
- Vlasov-Fokker-Planck, PIC and constrained MHD transport in plasmas