In the models discussed here, there is a hierarchy of variation that corresponds to groupings within the data. For example, students may be sampled from different classes that are in turn sampled from different schools. Or, rather than being nested, groups may be crossed. Important notions are those of fixed and random effects, and of variance components. Analysis of data from designs that have the balance needed to allow an analysis of variance breakdown is a special case. Further types of mixed models are generalized linear mixed models and repeated measures models. Repeated measures models are multilevel models where measurements consist of multiple profiles in time or space, resulting in time or spatial dependence. Relative to the length of time series that is required for a realistic analysis, each individual repeated measures profile may have values for only a few time points.
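As a concrete illustration of the nesting described above, here is a minimal sketch in R, assuming the lme4 package and a hypothetical data frame results with columns score, school, and class:

```r
## Minimal sketch, assuming the lme4 package and a hypothetical data
## frame 'results' with columns score, school, and class.
library(lme4)

## Random intercepts for schools, and for classes nested within schools:
fit.nested <- lmer(score ~ 1 + (1 | school/class), data = results)

## Crossed (rather than nested) grouping factors, e.g. raters and items:
## fit.crossed <- lmer(score ~ 1 + (1 | rater) + (1 | item), data = results)

## The summary reports the variance components:
## school, class within school, and residual.
summary(fit.nested)
```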
This chapter explores ways to set up a model matrix so that linear combinations of the columns can fit curves and multidimensional surfaces. These extend to methods, within a generalized additive model framework, that use a penalization approach to constrain over-fitting. A further extension is to fitting quantiles of the data. The methodologies are important both for direct use in modeling data and for checking for pattern in residuals from models of a more classical parametric style. The methodology is extended, in later chapters, to include smoothing terms in generalized linear models and in models that allow for time series errors.
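By way of illustration, a minimal sketch using the mgcv package, with hypothetical variables y, x1, and x2 in a data frame df; the smooth terms are penalized regression splines, with the penalty constraining over-fitting:

```r
## Minimal sketch, assuming the mgcv package and a hypothetical data
## frame 'df' with columns y, x1, and x2.
library(mgcv)

## Additive model with penalized spline smooths; smoothness chosen by REML:
fit <- gam(y ~ s(x1) + s(x2), data = df, method = "REML")

## A two-dimensional surface, via a tensor product smooth:
## fit2 <- gam(y ~ te(x1, x2), data = df, method = "REML")

plot(fit, pages = 1)   # fitted smooth terms, with confidence bands
```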
We give a simple method to estimate the number of distinct copies of some classes of spanning subgraphs in hypergraphs with a high minimum degree. In particular, for each $k\geq 2$ and $1\leq \ell \leq k-1$, we show that every $k$-graph on $n$ vertices with minimum codegree at least
contains $\exp\!(n\log n-\Theta(n))$ Hamilton $\ell$-cycles as long as $(k-\ell)\mid n$. When $(k-\ell)\mid k$, this gives a simple proof of a result of Glock, Gould, Joos, Kühn, and Osthus, while when $(k-\ell)\nmid k$, this gives a weaker count than that given by Ferber, Hardiman, and Mond, or, when $\ell < k/2$, by Ferber, Krivelevich, and Sudakov, but one that holds for an asymptotically optimal minimum codegree bound.
The notes in this appendix provide a brief and limited overview of R syntax, semantics, and the R package system, as background for working with the R code included in the text. They are intended for use alongside the R help pages and the wealth of tutorial material that is available online.
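For orientation, a few of the basic constructs that notes of this kind cover (a sketch only; the appendix itself is the reference):

```r
x <- c(2, 3, 5, 7)    # c() combines values into a vector; <- assigns
x[x > 3]              # subscripting with a logical condition
mean(x)               # functions are called with parentheses
## Packages extend base R: install once, then attach in each session.
## install.packages("MASS")
library(MASS)
```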
We give an introduction to a topic in the “stable algebra of matrices,” as related to certain problems in symbolic dynamics. We introduce enough symbolic dynamics to explain these connections, but the algebra is of independent interest and can be followed with little attention to the symbolic dynamics. This “stable algebra of matrices” involves the study of properties and relations of square matrices over a semiring S, which are invariant under two fundamental equivalence relations: shift equivalence and strong shift equivalence. When S is a field, these relations are the same, and matrices over S are shift equivalent if and only if the nonnilpotent parts of their canonical forms are similar. We give a detailed account of these relations over other rings and semirings. When S is a ring, this involves module theory and algebraic K-theory. We discuss in detail and contrast the problems of characterizing the possible spectra, and the possible nonzero spectra, of nonnegative real matrices. We also review key features of the automorphism group of a shift of finite type; the recently introduced stabilized automorphism group; and the work of Kim, Roush and Wagoner giving counterexamples to Williams’ shift equivalence conjecture.
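For readers meeting these relations for the first time, the standard definitions take the following form (the notation here follows common usage, as in Lind and Marcus, rather than anything fixed by this summary):

```latex
% Elementary strong shift equivalence of square matrices A, B over S:
\[
  A = RS, \qquad B = SR, \qquad \text{for some matrices } R,\, S \text{ over } S;
\]
% strong shift equivalence is the transitive closure of this relation.
% Shift equivalence of lag $\ell \geq 1$: there exist $R$, $S$ over $S$ with
\[
  AR = RB, \qquad SA = BS, \qquad A^{\ell} = RS, \qquad B^{\ell} = SR.
\]
```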
Common time series models allow for a correlation between observations that is likely to be largest for points that are close together in time. Adjustments can be made, also, for seasonal effects. Variation in a single spatial dimension may have characteristics akin to those of time series, and comparable models find application there. Autoregressive models, which make good intuitive sense and are simple to describe, are the starting point for discussion; the account then moves on to autoregressive moving average (ARMA) models, with possible differencing. The "forecast" package for R has mechanisms that allow automatic selection of model parameters. Exponential smoothing state space (exponential time series, or ETS) models are an important alternative that has often proved effective in forecasting applications. ARCH and GARCH heteroskedasticity models are further classes that have been developed to handle the special characteristics of financial time series.
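A minimal sketch of automatic model selection with the forecast package, using the AirPassengers series that ships with R as a stand-in for real data:

```r
## Minimal sketch, assuming the forecast package; AirPassengers (shipped
## with R) stands in for a series of real interest.
library(forecast)
y <- AirPassengers

fit.arima <- auto.arima(y)   # automatic ARIMA order selection
fit.ets   <- ets(y)          # exponential smoothing state space model

plot(forecast(fit.arima, h = 12))     # twelve-step-ahead forecasts
accuracy(fit.arima); accuracy(fit.ets)   # in-sample accuracy comparison
```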
These lecture notes provide quantum probabilistic concepts and methods for the spectral analysis of graphs, in particular for the study of the asymptotic behavior of the spectral distributions of growing graphs. Quantum probability theory is an algebraic generalization of classical (Kolmogorovian) probability theory, in which an element of a (not necessarily commutative) ∗-algebra is treated as a random variable. In this spirit, concepts and methods peculiar to quantum probability are applied to the spectral analysis of adjacency matrices of graphs. In particular, we focus on the method of quantum decomposition and on the use of various concepts of independence. The former discloses the noncommutative nature of adjacency matrices and gives a systematic method for computing spectral distributions. The latter is related to various graph products and provides a unified approach to obtaining the limit spectral distributions as corollaries of various central limit theorems.
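In its standard form (as in the monograph of Hora and Obata; the summary above does not fix notation), the quantum decomposition of an adjacency matrix runs as follows:

```latex
% Fix a root o and stratify the vertex set by graph distance:
\[
  V = \bigcup_{n \ge 0} V_n, \qquad V_n = \{\, x \in V : \partial(o,x) = n \,\}.
\]
% The adjacency matrix then splits into raising, lowering, and diagonal parts,
\[
  A = A^{+} + A^{-} + A^{\circ}, \qquad
  (A^{\varepsilon})_{xy} =
    \begin{cases}
      A_{xy} & \text{if } \partial(o,x) = \partial(o,y) + \varepsilon,\\
      0 & \text{otherwise,}
    \end{cases}
  \quad \varepsilon \in \{+1, -1, 0\},
\]
% and it is the noncommutativity of these three parts that the method of
% quantum decomposition exploits.
```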
Inferences are never assumption free. Data summaries that do not account for all relevant effects readily mislead. Distributions for the Pearson correlation and for counts are noted, along with extensions that handle extra-binomial and extra-Poisson variation. Notions of statistical power are introduced. Resampling methods, the bootstrap, and permutation tests extend the available inferential approaches. Regression with a single explanatory variable is used as a context in which to introduce residual plots, outliers, influence, robust regression, and standard errors of predicted values. There are two regression lines: that of y on x and that of x on y. Power transformations, with the logarithmic transformation as a special case, are often effective in giving a linear relationship. The training/test approach, and the closely allied cross-validation approach, can be important for avoiding over-fitting. Other topics include one- and two-way comparisons, adjustments when there are multiple comparisons, and the estimation of false discovery rates when there is severe multiplicity. Discussion of theories of inference, including likelihood, Bayes factors, and other Bayesian perspectives, ends the chapter.
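As one small illustration of the resampling theme, a two-sample permutation test can be coded in a few lines of base R (the vectors y1 and y2 here are simulated stand-ins):

```r
## Minimal sketch of a two-sample permutation test, in base R.
set.seed(29)
y1 <- rnorm(15, mean = 0.5)   # stand-in data
y2 <- rnorm(15)
obs <- mean(y1) - mean(y2)    # observed difference in means

pooled <- c(y1, y2)
perm <- replicate(10000, {
  take <- sample(length(pooled), length(y1))   # random relabeling
  mean(pooled[take]) - mean(pooled[-take])
})

mean(abs(perm) >= abs(obs))   # two-sided permutation p-value
```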
The strengths of this book include the directness of its encounter with research data, its advice on practical data analysis issues, careful critiques of analysis results, its use of modern data analysis tools and approaches, its use of simulation and other computer-intensive methods where these provide insight or give results that are not otherwise available, its attention to graphical and other presentation issues, its use of examples drawn from across the range of statistical applications, the links that it makes into the debate over reproducibility in science, and the inclusion of code that reproduces analyses. The methods that we cover have wide application. The datasets, many of which have featured in published papers, are drawn from many different fields. They reflect a journey in learning and understanding, alike for the authors and for those with whom they have worked, that has ranged widely over many different research areas. The R system has brought into a common framework a huge range of abilities for data analysis, data manipulation, and graphics. Our text aims to help its readers take full advantage of those abilities.
Generalized linear models extend classical linear models in two ways. They allow the fitting of a linear model to a dependent variable whose expected values have been transformed using a "link" function. They allow for a range of error families other than the normal. They are widely used to fit models to count data and to binomial-type data, including models with errors that may exhibit extra-binomial or extra-Poisson variation. The discussion extends to models in the generalized additive model framework, and to ordinal regression models. Survival analysis, also referred to as time-to-event analysis, is principally concerned with the time duration of a given condition, often but not necessarily sickness or death. In nonmedical contexts, it may be referred to as failure time or reliability analysis. Applications include the failure times of industrial machine components, electronic equipment, kitchen toasters, light bulbs, businesses, loan defaults, and more. There is an elegant methodology for dealing with "censoring", where all that can be said is that the event of interest occurred before or after a certain time, or in a specified interval.
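A minimal sketch of a count-data fit, using the quine data on days absent from school in the MASS package; the quasipoisson family illustrates the allowance for extra-Poisson variation:

```r
## Minimal sketch, using the quine data (days absent) from MASS.
library(MASS)

fit.pois  <- glm(Days ~ Sex + Age + Lrn, family = poisson, data = quine)

## Counts commonly show extra-Poisson variation; quasipoisson estimates a
## dispersion parameter instead of fixing it at 1:
fit.quasi <- glm(Days ~ Sex + Age + Lrn, family = quasipoisson, data = quine)

summary(fit.quasi)$dispersion   # well above 1 indicates overdispersion
```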
Tree-based methods use methodologies that are radically different from those discussed in previous chapters. They are relatively easy to use and can be applied to a wide class of problems. As with many of the newer machine learning methods, construction of a tree, or (in the random forest approach) of many trees, follows an algorithmic process. Single-tree methods occupy the first part of this chapter. An important aspect of the methodology is the determination of error estimates. By building a large number of trees and using a voting process to make predictions, the random forests methodology that occupies the latter part of this chapter can often greatly improve on what can be achieved with a single tree. The methodology operates more as a black box, but with implementation details that are simpler to describe than those for single-tree methods. In large-sample classification problems, the methodology has often proved superior to other contenders.
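A minimal sketch contrasting a single tree with a random forest, using the iris data shipped with R as a stand-in for a real classification problem, and assuming the rpart and randomForest packages:

```r
## Minimal sketch, assuming the rpart and randomForest packages.
library(rpart)
library(randomForest)

## A single classification tree:
tree1 <- rpart(Species ~ ., data = iris)

## A random forest: many trees, each grown on a bootstrap sample,
## with predictions made by majority vote over trees.
set.seed(31)
rf <- randomForest(Species ~ ., data = iris)

## The out-of-bag (OOB) error gives an honest error estimate:
rf$err.rate[nrow(rf$err.rate), "OOB"]
```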